Steve Nasypany
[email protected]
PowerVM Performance Updates
HMC v8 Performance Capacity Monitor
Dynamic Platform Optimizer
PowerVP 1.1.2
VIOS Performance Advisor 2.2.3
© 2014 IBM Corporation
First, a HIPER APAR…
XMGC NOT TRAVERSING ALL KERNEL HEAPS
– AIX 6.1 TL9 SP1: APAR IV53582 (systems running 6100-09 Technology Level with bos.mp64 below the 6.1.9.2 level)
– AIX 7.1 TL3 SP1: APAR IV53587 (systems running 7100-03 Technology Level with bos.mp64 below the 7.1.3.2 level)
PROBLEM DESCRIPTION: xmalloc garbage collector is not traversing all kernel heaps,
causing pinned and virtual memory growth. This can lead to low memory or low paging
space issues, resulting in performance degradations and, in some cases, a system hang
or crash
You can’t easily diagnose this with vmstat or svmon. Systems simply run out of memory:
pinned or computational memory keeps climbing and cannot be accounted to any process
Optimization Redbook
Draft available now!
POWER7 & POWER8
PowerVM Hypervisor
AIX, i & Linux
Java, WAS, DB2…
Compilers & optimization
Performance tools & tuning
http://www.redbooks.ibm.com/redpieces/abstracts/sg248171.html
HMC Version 8
Performance Capacity Monitor
Power Systems Performance Monitoring
From HMC 780 or earlier to HMC 810:
Evolution from disjoint set of OS tools to integrated monitoring solution
System resource monitoring via a single touch-point (HMC)
Data collection and aggregation of performance metrics via Hypervisor
REST API (WEB APIs) for integration with IBM and third-party products
Trending of the utilization data
Assists in first level of performance analysis & capacity planning
Performance Metrics (complete set, firmware dependent)
 Physical System Level Processor & Memory Resource Usage Statistics
– System Processor Usage Statistics (w/ LPAR, VIOS & Power Hypervisor usage
breakdown)
– System Dedicated Memory Allocation and Shared Memory Usage Statistics (w/
LPAR, VIOS & Power Hypervisor usage breakdown)
 Advanced Virtualization Statistics
– Per LPAR Dispatch Wait Time Statistics
– Per LPAR Placement Indicator (for understanding whether the LPAR placement is
good / bad based on score)
 Virtual IO Statistics
– Virtual IO Server’s CPU / Memory Usage (Aggregated, Breakdown)
– SEA Traffic & Bandwidth usage Statistics (Aggregated & Per Client, Intra/Inter
LPAR breakdown)
– NPIV Traffic & Bandwidth usage Statistics (HBA & Per Client breakdown)
– vSCSI Statistics (Aggregated & Per Client Usage)
– VLAN Traffic & Bandwidth usage Statistics (Adapter & LPAR breakdown)
 SR-IOV Traffic & Bandwidth usage Statistics (Physical & Virtual Function Statistics w/
LPAR breakdown)
Performance Metrics (cont.)
 Raw Metrics
– Cumulative counters (since IPL) or quantities (size, config, etc.)
– Fixed sampling intervals
• General purpose monitoring: 30 seconds, 30 minute cache
• Short term problem diagnosis: 5 seconds, 15 minute cache
 Processed Metrics
– Utilization (CPU, I/O, etc.)
– Fixed interval of 30 seconds, preserved for 4 hrs
 Aggregated Metrics
– Rolled-up Processed Metrics
– Rolled-up data at 15 minute, 2-hour & daily (Min, Average & Max)
– Preserved for a max of 365 days (configurable per HMC & limited by
storage space)
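The rollup behind the aggregated metrics can be sketched in a few lines of shell. This is an illustration only (the sample values and the awk pipeline are ours, not the HMC's): it reduces a series of 30-second utilization samples to the min/average/max shape kept at the 15-minute tier.

```shell
# Hypothetical rollup of 30-second CPU-utilization samples (values are
# made up) into the min/average/max tuple kept per aggregation interval.
printf '42.0\n55.0\n48.0\n61.0\n' | awk '
  NR == 1 { min = $1; max = $1 }
  { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
  END { printf "min=%.1f avg=%.1f max=%.1f\n", min, sum / NR, max }'
# prints: min=42.0 avg=51.5 max=61.0
```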
New control for storage and enablement
Aggregate Server: Current Usage (CPU, Memory, IO)
Partition: Entitlement vs Usage Spread, Detail
Partition: Processor Utilization
Partition: Network, including SR-IOV support
Storage by VIOS, vSCSI or NPIV
HMC v8 Monitor Support (June 2014 GA)
Minimum features with all POWER6 & above models:
 Managed System CPU Utilization (Point In Time & Historical)
 Managed System Memory Assignment (Point In Time & Historical)
 Server Overview Section of Historical Data with LPAR & VIOS view
 Processor Trend Views with LPAR, VIOS & Processor Pool (no System
Firmware Utilization or Dispatch Metrics; these will be shown as zero)
 Memory Trend Views with LPAR & VIOS view
These metrics were available via legacy HMC performance data collection
mechanisms and are picked up by the monitor.
HMC v8 Monitor Support (new firmware-based function)
 FW 780 & VIOS 2.2.3: all functions, except on 770/780-MxB models:
– No support for LPAR Dispatch Wait Time
– No support for Power Hypervisor Utilization
 FW 780 or above with a VIOS level below 2.2.3: the following functions
are not available (essentially, no I/O utilization):
– Network Bridge / Virtual Storage Trend Data
– VIOS Network / Storage Utilization
 FW 770 or less with VIOS 2.2.3 or later: these are not provided:
– Network Bridge Trend Data
– LPAR Dispatch Wait Time
– Power Hypervisor Utilization
 FW 770 or less with a VIOS level below 2.2.3: the tool will not provide:
– Network Bridge / Virtual Storage Trend Data
– VIOS Network / Storage Utilization
– LPAR Dispatch Wait Time
– Power Hypervisor Utilization
Dynamic Platform Optimizer
Update
What is Dynamic Platform Optimizer - DPO
 DPO is a PowerVM virtualization feature that enables users to
improve partition memory and processor placement (affinity) on
Power Servers after they are up and running.
 DPO performs a sequence of memory and processor
relocations to transform the existing server layout to the optimal
layout based on the server topology.
 Client Benefits
–Ability to run without a platform IPL (entire system)
–Improved performance in a cloud or highly virtualized environments
–Dynamically adjust topology after mobility
What is Affinity?
 Affinity is a locality measurement of an entity with respect to physical resources
– An entity could be a thread within AIX/i/Linux or the OS instance itself
– Physical resources could be a core, chip, node, socket, cache (L1/L2/L3),
memory controller, memory DIMMs, or I/O buses
 Affinity is optimal when the number of cycles required to access resources is
minimized
POWER7+ 760 Planar
Note x & z buses between chips, and A & B
buses between Dual Chip Modules (DCM)
In this model, each DCM is a “node”
Partition Affinity: Why is it not always optimal?
Partition placement can become sub-optimal because of:
 Poor choices in Virtual Processor, Entitlement or Memory sizing
–The Hypervisor uses Entitlement & Memory settings to place a
partition. Wide use of 10:1 Virtual Processor to Entitlement ratios
gives it little information for optimal placement.
–Before you ask, there is no single golden rule, magic formula, or
IBM-wide Best Practice for Virtual Processor & Entitlement sizing.
If you want education in sizing, ask for it.
 Dynamic creation/deletion, processor and memory ops (DLPAR)
 Hibernation (Suspend or Resume)
 Live Partition Mobility (LPM)
 CEC Hot add, Repair, & Maintenance (CHARM)
 Older firmware levels are less sophisticated in placement and
dynamic operations
Partition Affinity: Hypothetical 4 Node Frame
(Diagram: partitions X, Y, and Z are initially spread across the four nodes; the
DPO operation consolidates each partition, and the free LMBs, onto as few nodes
as possible.)
Current and Predicted Affinity enhancement with V7R780 firmware
Scores at the partition level along with the system-wide score:
lsmemopt -m managed_system -o currscore -r [sys | lpar]
lsmemopt -m managed_system -o calcscore -r [sys | lpar]
    [--id requested_partition_list]
    [--xid protected_partition_list]
sys = system-wide score (the default if -r is not specified)
lpar = per-partition scores
Example: V7R780 firmware current affinity score
lsmemopt -m calvin -o currscore -r sys
>curr_sys_score=97

lsmemopt -m calvin -o currscore -r lpar
>lpar_name=calvinp1,lpar_id=1,curr_lpar_score=100
lpar_name=calvinp2,lpar_id=2,curr_lpar_score=100
lpar_name=calvinp50,lpar_id=50,curr_lpar_score=100
lpar_name=calvinp51,lpar_id=51,curr_lpar_score=none
lpar_name=calvinp52,lpar_id=52,curr_lpar_score=100
lpar_name=calvinp53,lpar_id=53,curr_lpar_score=74
lpar_name=calvinp54,lpar_id=54,curr_lpar_score=none

Get predicted score:
lsmemopt -m calvin -o calcscore -r sys
>curr_sys_score=97,predicted_sys_score=100,requested_lpar_ids=none,protected_lpar_ids=none
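Because the HMC returns these scores as flat name=value lines, they are easy to post-process. A sketch run against captured output rather than a live HMC (the 90-point threshold is an arbitrary example of ours) that flags low-scoring partitions, skipping the "none" entries:

```shell
# Filter captured 'lsmemopt -o currscore -r lpar' output for low scores.
# Partitions reporting 'none' (no score available) are skipped.
scores='lpar_name=calvinp1,lpar_id=1,curr_lpar_score=100
lpar_name=calvinp51,lpar_id=51,curr_lpar_score=none
lpar_name=calvinp53,lpar_id=53,curr_lpar_score=74'
echo "$scores" | awk -F'[=,]' '$6 != "none" && $6 + 0 < 90 { print $2, $6 }'
# prints: calvinp53 74
```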
HMC CLI: Starting/Stopping a DPO Operation
optmem -m managed_system -t affinity -o start
    [--id requested_partition_list]
    [--xid protected_partition_list]
Use the --id/--xid switches to include or exclude partitions by number (for
example, to exclude partitions that are not DPO aware):
– Partition lists are comma-separated and can include ranges, e.g. --id 1,3,5-8
– Requested partitions: partitions that should be prioritized (default = all LPARs)
– Protected partitions: partitions that should not be touched (default = no LPARs)
– Exclude by name with -x CAB,ZIN or by LPAR id number with --xid 5,10,16-20
optmem -m managed_system -t affinity -o stop
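The invocation can be assembled from such lists; a sketch that only echoes the resulting command (the system name and partition lists are illustrative, and on a real HMC you would execute the command rather than echo it):

```shell
# Build an optmem start command from requested/protected partition lists.
managed_system="calvin"          # example system name
requested="1,3,5-8"              # partitions to prioritize
protected="5,10,16-20"           # partitions DPO must not touch
cmd="optmem -m $managed_system -t affinity -o start"
[ -n "$requested" ] && cmd="$cmd --id $requested"
[ -n "$protected" ] && cmd="$cmd --xid $protected"
echo "$cmd"
# prints: optmem -m calvin -t affinity -o start --id 1,3,5-8 --xid 5,10,16-20
```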
HMC CLI: DPO Status
lsmemopt -m managed_system
>in_progress=0,status=Finished,type=affinity,opt_id=1,progress=100,requested_lpar_ids=none,protected_lpar_ids=none,"impacted_lpar_ids=106,110"
The output reports:
• A unique optimization identifier
• Estimated progress %
• LPARs that were impacted by the optimization (i.e. had CPUs, memory, or their hardware page table moved)
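Scripts polling for completion can pull the fields out of that single status line. A sketch against the captured output above (the sed expressions are ours, not an HMC feature):

```shell
# Extract progress and impacted partition ids from a captured status line.
status='in_progress=0,status=Finished,type=affinity,opt_id=1,progress=100,requested_lpar_ids=none,protected_lpar_ids=none,"impacted_lpar_ids=106,110"'
progress=$(echo "$status" | sed 's/.*,progress=\([0-9]*\).*/\1/')
impacted=$(echo "$status" | sed 's/.*impacted_lpar_ids=\([0-9,]*\).*/\1/')
echo "progress=$progress impacted=$impacted"
# prints: progress=100 impacted=106,110
```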
What’s New (V7R7.8.0): DPO Schedule, Thresholds, Notifications
Thresholds apply to the system affinity score, not the LPAR affinity score
DPO – Supported Hardware and Firmware levels
 Introduced in fall 2012 (with feature code EB33)
• 770-MMD and 780-MHD with firmware level 760.00
• 795-FHB with firmware level 760.10 (760 with fix pack 1)
• Recommend firmware level 760_069, which includes the enhancements below
 Additional systems added spring 2013 with firmware level 770
– 710,720,730,740 D-models with firmware level 770.00
– 750,760 D-models with firmware level 770.10 (770 with fix pack 1)
– 770-MMC and 780-MHC with firmware level 770.20 (770 with fix pack 2)
– Performance enhancements – DPO memory movement time reduced
– Scoring algorithm improvements
– Recommend firmware at 770_021
 Affinity scoring at the LPAR level with firmware level 780, delivered Dec 2013
– 770-MMB, 780-MHB added with 780.00
– 795-FHB updated with 780.00
– 770-MMD, 780-MHD (AM780_056_040 level released 4/30/2014)
http://www-304.ibm.com/support/customercare/sas/f/power5cm/power7.html
* Some Power models and firmware releases listed above are currently planned for the future and have not yet been
announced. * All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
Running DPO
 DPO aware Operating Systems
– AIX: 6.1 TL8 or later, AIX 7.1 TL2 or later
– IBM i: 7.1 TR6 or later
– Linux: Some reaffinitization in RHEL7/SLES12.
(Fully implemented in follow-on releases)
– VIOS 2.2.2.0 or later
– HMC V7R7.6.1
 Partitions that are DPO aware are notified after DPO completes
 Re-affinitization Required
– Performance team measurements show reaffinitization is critical
– For older OS levels, users can exclude those partitions from optimization, or
reboot them after running the DPO Optimizer
 Affinity (at a high level) is as good as CEC IPL
– (assuming unconstrained DPO)
More Information
 IBM PowerVM Virtualization Managing and Monitoring (June 2013)
– SG24-7590-04: http://www.redbooks.ibm.com/abstracts/sg247590.html?Open
 IBM PowerVM Virtualization Introduction and Configuration (June 2013)
– SG24-7940-05: http://www.redbooks.ibm.com/abstracts/sg247940.html?Open
 POWER7 Information Center under logical partitioning topics
– http://pic.dhe.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=%2Fp7hat%2Fiphblmanagedlparp6.htm
 IBM DeveloperWorks
– https://www.ibm.com/developerworks/community/blogs/PowerFW/entry/dynamic_platform_optimizer5?lang=en
 POWER7 Logical Partitions “Under the Hood”
– http://www-03.ibm.com/systems/resources/power_software_i_perfmgmt_processor_lpar.pdf
PowerVP
PowerVP Redbook
Draft available now!
http://www.redbooks.ibm.com/redpieces/pdfs/redp5112.pdf
Review - POWER7+ 750/760 Four Socket Planar Layout
Note x & z buses between chips, and A & B
buses between Dual Chip Modules (nodes)
Power 750/760 D Technical Overview
Review - POWER7+ 770/780 Four Socket Planar Layout
Not as pretty as the 750+ diagram; note the x, w & z buses between chips on
this model. The buses to other nodes (not pictured) and to I/O are a little
more cryptic
Power 770/780 D Technical Overview
PowerVP - Virtual/Physical Topology Utilization
Why PowerVP - Power Virtualization Performance
 During an IPL of the entire Power System, the Hypervisor
determines an optimal resource placement strategy for the server
based on the partition configuration and the hardware topology of
the system.
 There was a desire to have a visual understanding of how the
hardware resources were assigned and being consumed by the
various partitions that were running on the platform.
 It was also desired to have a visual indication of each resource’s
consumption, showing when it passes a warning threshold (yellow) and when it
enters an overcommitted threshold (red).
PowerVP Overview
 Graphically displays data from existing and new performance tools
 Converges performance data from across the system
 Shows CEC, node & partition level performance data
 Illustrates topology utilization with colored “heat” threshold settings
 Enables drill down for both physical and logical approaches
 Allows real-time monitoring and recording function
 Simplifies physical/virtual environment, monitoring, and analysis
 Not intended to replace any current monitoring or management product
PowerVP Environment
Partition Collectors (required for the logical view):
– LPAR CPU utilization
– Disk activity
– Network activity
– CPI analysis
– Cache analysis
System-wide Collector (one required per system):
– P7 topology information
– P7 chip/core utilizations
– P7 Power bus utilizations
– Memory and I/O utilization
– LPAR entitlements, utilization
(Diagram: a partition collector runs in each operating system (IBM i, AIX, VIOS,
Linux), while the system collector uses Hypervisor interfaces; the
firmware/Hypervisor reads chip and core HPMCs, PMUlets, and thread PMUs on the
Power hardware.)
You only need to install a single system-wide collector to see global metrics.
PowerVP – System, Node and Partition Views
Three linked views: System Topology, Node Drill Down, and Partition Drill Down.
PowerVP – System Topology
• The initial view shows the
hardware topology of the system
you are logged into
• In this view, we see a Power 795
with all eight books and/or nodes
installed, each with four sockets
• Values within boxes show CPU
usage
• Lines between nodes show SMP
fabric activity
PowerVP – Node drill down
• This view appears when
you click on a node and
allows you to see the
resource assignments or
consumption
• In this view, we see a
POWER7 780 node with
four chips each with four
cores
• Active buses are shown with solid colored lines. These can be
between nodes, chips, memory controllers and IO buses.
PowerVP 1.1.2: Node View (POWER7 780)
PowerVP 1.1.2: Chip (POWER7 780 with 4 cores)
(Diagram labels: SMP bus, I/O, memory controller, chip, DIMMs, LPARs)
PowerVP 1.1.2: CPU Affinity
LPAR 7 has 8 VPs. As we select cores, 2 VPs are “homed” to each
core. The fourth core has 4 VPs from four LPARs “homed” to it.
This does not prevent VPs from being dispatched elsewhere in the
pool as utilization requirements demand
PowerVP 1.1.2: Memory Affinity
LPAR 7 Online Memory is 32768 MB, 50% of 64 GB in DIMMs
Note: LPARs will be listed in color order in shipping version
PowerVP - Partition drill down
• View allows us to drill down
on resources being used by
selected partition
• In this view, we see CPU,
Memory, Disk IOPS, and
Ethernet being consumed.
We can also get an idea of
cache and memory affinity.
• We can drill down on several of these resources. Example: we can
drill down on the disk transfer or network activity by selecting the
resource
PowerVP - Partition drill down (CPU, CPI)
PowerVP - Partition drill down (Disk)
PowerVP – How do I use this?
• PowerVP is not intended to replace traditional performance
management products
• It does not let you manage CPU, memory or IO resources
• It does provide an overview of hardware resource activity that
allows you to get a high-level view
• Node/socket activity
• Cores assigned to dedicated and shared pool
• VM’s Virtual Processors assigned to cores
• VM’s memory assigned to DIMMs
• Memory bus activity
• IO bus activity
• Provides partition activity related to
• Storage & Network
• CPU
• Software Cycles-Per-Instruction
PowerVP – How do I use this? High-Level
• High-level view can allow visual identification of node and bus
stress
• Thresholding is largely arbitrary, but if one memory controller
is obviously saturated and others are inactive, you have an
indication that a more detailed review is required
• There are no rules-of-thumb or best practices for thresholds
• You can review system Redbooks and determine where you
are with respect to bus performance (not always available, but
newer Redbooks are more informative)
• This tool provides high-level diagnosis with some detailed view (if
partition-level collectors are installed)
PowerVP – How do I use this? Low-Level
• Cycles-Per-Instruction (CPI) is a complicated subject; assessing it in
detail is beyond the capacity of most customers
• In general, a lower CPI is better: the fewer CPU cycles per
instruction, the more instructions get done
• PowerVP gives you various CPI values; these, in conjunction
with OS tools, can tell you whether you have good affinity
• Affinity is a measurement of a thread’s locality to physical resources.
Resources can be many things: L1/L2/L3 cache, core(s), chip,
memory controller, socket, node, drawer, etc.
AIX Enhanced Affinity
 AIX on POWER7 and above uses Enhanced Affinity instrumentation to localize
threads by Scheduler Resource Allocation Domain (SRAD)
 AIX Enhanced Affinity measures:
– Local: usually a chip
– Near: the local node/DCM (intranode)
– Far: another node/drawer/CEC (internode)
 These are logical mappings, which may or may not be exactly 1:1 with
physical resources (diagrams: POWER7 770/780/795 and POWER8 S824 DCM)
AIX topas Logical Affinity (‘M’ option)
Topas Monitor for host: claret4    Interval: 2
===================================================================
REF1 SRAD TOTALMEM  INUSE  FREE   FILECACHE HOMETHRDS CPUS
-------------------------------------------------------------------
0    2    4.48G     515M   3.98G  52.9M     134.0     12-15
0    0    12.1G     1.20G  10.9G  141M      236.0     0-7
1    1    4.98G     537M   4.46G  59.0M     129.0     8-11
1    3    3.40G     402M   3.01G  39.7M     116.0     16-19
===================================================================
CPU SRAD TOTALDISP LOCALDISP% NEARDISP% FARDISP%
----------------------------------------------------------
0   0    303.0     43.6       15.5      40.9
2   0    1.00      100.0      0.0       0.0
3   0    1.00      100.0      0.0       0.0
4   0    1.00      100.0      0.0       0.0
5   0    1.00      100.0      0.0       0.0
6   0    1.00      100.0      0.0       0.0
(REF1 = node, SRAD = chip, TOTALDISP = dispatches; local dispatch is optimal.)
What’s a bad FARDISP% rate? There is no rule-of-thumb, but thousands of far
dispatches per second will likely indicate lower performance
How do we fix it? Entitlement & Memory sizing Best Practices + current firmware +
Dynamic Platform Optimizer
PowerVP Physical Affinity: VM View
• PowerVP can show us physical affinity (local, remote & distant)
• AIX topas can show us logical affinity (local, near & far)
• More local, more ideal
Cache Affinity
DIMM Affinity
Local is optimal
Computed CPI is an inverse calculation, lower is typically better
PowerVP supported Power models and ITEs
 Power System models and ITEs with 770 firmware support
• 710-E1D, 720-E4D, 730-E2D, 740-E6D (also includes Linux D models)
• 750-E8D, 760-RMD
• 770-MMC, 780-MHC, ESE 9109-RMD
• p260-22X, p260-23X, p460-42X, p460-43X, p270-24X, p470-44X, p24L-7FL
• 71R-L1S, 71R-L1C, 71R-L1D, 71R-L1T, 7R2-L2C, 7R2-L2S, 7R2-L2D, 7R2-L2T
 Power System models added with 780 firmware support
– 770-MMB and 780-MHB added with 780.00 (eConfig support 1/28/2014)
– 795-FHB updated with 780.00 (Dec 2013)
 Power System models with 780 firmware support
– 770-MMD, 780-MHD (4/30/2014)
780 Power Firmware: http://www-304.ibm.com/support/customercare/sas/f/power5cm/power7.html
 Pre-770 firmware models do not have instrumentation to support PowerVP
* Some Power models and firmware releases listed above are currently planned for the future and have not yet been announced.
* All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
PowerVP OS Support
 Announced and GA in 4Q 2013
 PowerVP 1.1.2 ships 6/2014
 Available as standalone product or with PowerVM Enterprise
Edition
 Agents will run on IBM i, AIX, Linux on Power and VIOS
–System i V7R1, AIX 6.1 & 7.1, any VIOS level that supports
POWER7
–RHEL 6.4, SUSE 11 SP 3
 Client supported on Windows, Linux, and AIX
–Client requires Java 1.6 or greater
–Installer provided for Windows, Linux, and AIX
–Also includes a Java installer, which has worked under VMware and
OS X where the platform installers don’t (limited testing)
VIOS Performance Advisor 2.2.3
VIOS Performance Advisor: What is it?
 Not another performance monitoring tool, but an integrated report that
leverages other tools and the lab’s knowledge base
 Summarizes the overall performance of a VIOS
 Identifies potential bottlenecks and performance inhibitors
 Proposes actions to be taken to address the bottlenecks
 The “beta” VIOS Performance Advisor, productized and shipped with
the Virtual I/O Server since VIOS 2.2.2
Performance Advisor: How does it work?
 Polls key performance metrics over a period of time
 Analyzes the data
 Produces an XML-formatted report for viewing in a browser
 The “part” command is available in the VIOS restricted shell
– pronounced “p-Art” (Performance Analysis & Reporting Tool)
 The “part” command can be executed in two different modes
– Monitoring mode (actually uses nmon recording now)
– Post-processing nmon recording mode
 The final report, along with the supporting files, is bundled into a
“tar” formatted file
 Users can download and extract it to a PC or any machine with a browser
installed to view the report
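The download-and-extract step looks like the sketch below, using a stand-in tar file (the directory and file names are hypothetical; a real "part" bundle will use its own names):

```shell
# Simulate receiving the advisor's tar bundle and extracting the XML report.
workdir=$(mktemp -d)
cd "$workdir"
mkdir report && echo '<?xml version="1.0"?><report/>' > report/advisor.xml
tar -cf advisor_bundle.tar report && rm -r report   # stand-in for the download
tar -xf advisor_bundle.tar                          # extract on the workstation
ls report/*.xml                                     # open this file in a browser
# prints: report/advisor.xml
```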
VIOS Performance Advisor: Process
Collect data (Monitoring Mode: 5 to 60 minutes):
IBM Virtual I/O Server
login: padmin
$ part -i 30
- or post-process an existing nmon recording:
$ part -f vio1_130915_1205.nmon
Transfer & view the report:
 Transfer the generated tar file to a machine with browser support
 Extract the tar file
 Load the *.xml file in a browser
VIOS Performance Advisor: Browser View
VIOS Performance Advisor: Legend, Risk & Impact
Advisor legend: Informative, Investigate, Optimal, Warning, Critical, Help/Info
Risk: level of risk, on a scale of 1 to 5, of making the suggested value change
Impact: potential performance impact, on a scale of 1 to 5, of making the
suggested value change
VIOS Performance Advisor: Config
VIOS Performance Advisor: Tunable Information
When you select the help icon, a pop-up with guidance appears
VIOS Performance Advisor: CPU Guidance
VIOS Performance Advisor: Shared Pool Guidance
 If shared pool monitoring is enabled, the Advisor will report status,
settings, and whether there is a constraint
 Enablement is via partition
properties panel as: Allow
performance information collection
VIOS Performance Advisor: Memory Guidance
VIOS Performance Advisor: IO Total & Disks
VIOS Performance Advisor: Disk Adapters
VIOS Performance Advisor: FC Details
FC Utilization
based on peak
IOPS rates
VIOS Performance Advisor: NPIV Breakdowns
VIOS Performance Advisor: Storage Pool
VIOS Performance Advisor: Shared Ethernet
Accounting feature must be enabled on VIOS
chdev -dev ent* -attr accounting=enabled
VIOS Performance Advisor: Shared Tunings
Performance Advisor: Overhead
 CPU overhead of running this tool in the VIOS is the same as that of running
an nmon recording: very low
 The memory footprint of the command is also kept to a minimum
 However, in post-processing mode, if the recording contains a high
number of samples, the part command will consume noticeable CPU
when executed
– Example: an nmon recording with 4000 samples and a size of 100 MB,
collected on a VIOS with 255 disks configured, will take about 2
minutes to analyze on a VIOS with an entitlement of 0.2
– A typical (default) nmon recording contains 1440 samples, so the
above example is on the high end of the scale
Affinity Backup
What is Affinity?
 Affinity is a locality measurement of an entity with respect to physical resources
– An entity could be a thread within AIX/i/Linux or the OS instance itself
– Physical resources could be a core, chip, node, socket, cache (L1/L2/L3),
memory controller, memory DIMMs, or I/O buses
 Affinity is optimal when the number of cycles required to access resources is
minimized
POWER7+ 760 Planar
Note x & z buses between chips, and A & B
buses between Dual Chip Modules (DCM)
In this model, each DCM is a “node”
Thread Affinity
 Performance is closer to optimal when threads stay close to physical
resources. Thread Affinity is a measurement of proximity to a
resource
– Examples of resources: L2/L3 cache, memory, core, chip and node
– Cache Affinity: threads in different domains need to communicate
with each other, or cache needs to move with thread(s) migrating
across domains
– Memory Affinity: threads need to access data held in a different
memory bank not associated with the same chip or node
 Modern highly multi-threaded workloads are architected to have lightweight threads and distributed application memory
– Can span domains with limited impact
– Unix scheduler/dispatch/memory manager mechanisms spread
workloads
Partition Affinity: Why is it not always optimal?
Partition placement can become sub-optimal because of:
 Poor choices in Virtual Processor, Entitlement or Memory sizing
–The Hypervisor uses Entitlement & Memory settings to place a
partition. Wide use of 10:1 Virtual Processor to Entitlement settings
does not lend much information for best placement.
–Before you ask, there is no single golden rule, magic formula, or
IBM-wide Best Practice for Virtual Processor & Entitlement sizing.
If you want education in sizing, ask for it.
 Dynamic creation/deletion, processor and memory ops (DLPAR)
 Hibernation (Suspend or Resume)
 Live Partition Mobility (LPM)
 CEC Hot add, Repair, & Maintenance (CHARM)
 Older firmware levels are less sophisticated in placement and
dynamic operations
How does partition placement work?
 PowerVM knows the chip types and memory configuration, and
attempts to pack partitions onto the smallest number of chips / nodes /
drawers
– Optimizing placement results in higher exploitation of local CPU
and memory resources
– Dispatches across node boundaries incur longer latencies,
and both AIX and PowerVM actively try to minimize that
via Enhanced Affinity mechanisms
 It considers the partition profiles and calculates optimal placements
– Placement is a function of Desired Entitlement, Desired &
Maximum Memory settings
– Virtual Processor counts are not considered
– Maximum memory defines the size of the Hardware Page Table
maintained for each partition. For POWER7, it is 1/64th of
Maximum and 1/128th on POWER7+ and POWER8
– Ideally, Desired + (Maximum/HPT ratio) < node memory size if
possible
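A quick worked example of that sizing rule (figures are illustrative): with a Maximum Memory setting of 128 GB, the HPT reserves 2 GB on POWER7 (1/64) but only 1 GB on POWER7+/POWER8 (1/128), which is why inflated Maximum Memory settings waste placeable memory.

```shell
# Hardware Page Table size for a given maximum-memory setting.
max_mem_gb=128
echo "POWER7  (1/64):  $((max_mem_gb / 64)) GB HPT"
echo "POWER7+ (1/128): $((max_mem_gb / 128)) GB HPT"
# prints:
# POWER7  (1/64):  2 GB HPT
# POWER7+ (1/128): 1 GB HPT
```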
What tools exist for optimizing affinity?
 Within AIX, two technologies are used to maximize thread affinity
– AIX dispatcher uses Enhanced Affinity services to keep a thread
within the same POWER7 multiple-core chip to optimize chip and
memory controller use
– Dynamic System Optimizer (DSO) proactively monitors, measures
and moves threads, their associated memory pages and memory
pre-fetch algorithms to maximize core, cache and DIMM efficiency.
We do not cover this feature in this presentation.
 Within a PowerVM frame, three technologies assist in maximizing
partition(s) affinity
– The PowerVM Hypervisor determines an optimal resource placement
strategy for the server based on the partition configuration and the
hardware topology of the system.
– Dynamic Platform Optimizer relocates OS instances within a frame
for optimal physical placement
– PowerVP allows us to monitor placement, node, memory bus, IO bus
and Symmetric Multi-Processor (SMP) bus activity
AIX Enhanced Affinity
 AIX on POWER7 and above uses Enhanced Affinity instrumentation to localize
threads by Scheduler Resource Allocation Domain (SRAD)
 AIX Enhanced Affinity measures:
– Local: usually a chip
– Near: the local node/DCM (intranode)
– Far: another node/drawer/CEC (internode)
 These are logical mappings, which may or may not be exactly 1:1 with
physical resources (diagrams: POWER7 770/780/795 and POWER8 S824 DCM)
AIX Affinity: lssrad tool shows logical placement
View of a 24-way, two-socket POWER7+ 760 with Dual Chip Modules (DCM):
6 cores per chip, 12 in each DCM
5 Virtual Processors x 4-way SMT = 20 logical CPUs
Terms:
REF1  Node (drawer or DCM/MCM socket)
SRAD  Scheduler Resource Allocation Domain

# lssrad -av
REF1   SRAD   MEM       CPU
0
       0      12363.94  0-7
       2      4589.00   12-15
1
       1      5104.50   8-11
       3      3486.00   16-19

If a thread’s ‘home’ node was SRAD 0, SRAD 2 would be ‘near’;
SRAD 1 & 3 would be ‘far’.
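The near/far logic can be spelled out as a short sketch; the node-to-SRAD mapping below is hard-coded from the lssrad output above (REF1 0 holds SRADs 0 and 2, REF1 1 holds SRADs 1 and 3):

```shell
# Classify each SRAD relative to a home SRAD using the node layout above.
node_of() { case $1 in 0|2) echo 0 ;; 1|3) echo 1 ;; esac; }
home=0
for srad in 0 1 2 3; do
  if [ "$srad" = "$home" ]; then rel=local
  elif [ "$(node_of "$srad")" = "$(node_of "$home")" ]; then rel=near
  else rel=far; fi
  echo "SRAD $srad: $rel"
done
# prints:
# SRAD 0: local
# SRAD 1: far
# SRAD 2: near
# SRAD 3: far
```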
Affinity: Cycles-Per-Instruction
Another way to look at affinity is by watching how many cycles a
thread uses
 This can be done via Cycles-Per-Instruction (CPI) measurements
 POWER Architectures are instrumented with a variety of CPI values
provided for chip resources
 These measurements are a complicated subject, usually the
domain of hardware and software developers
 In general, a lower CPI is better – the fewer number of CPU cycles
per instruction, the more efficient it is
 We will return to this concept in the PowerVP section
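As a trivial worked example (numbers invented, not from a real PMU): a thread that burned 1,500,000 cycles to complete 1,000,000 instructions ran at a CPI of 1.5; the same work at a CPI of 1.2 would have finished in 20% fewer cycles.

```shell
# CPI = cycles / completed instructions. POSIX shell arithmetic is
# integer-only, so scale by 100 before dividing (exact for this example).
cycles=1500000
instructions=1000000
cpi_x100=$(( cycles * 100 / instructions ))
echo "CPI = $(( cpi_x100 / 100 )).$(( cpi_x100 % 100 ))"
# prints: CPI = 1.50
```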
Affinity: Diagnosis
When may I have a problem?
- SRAD has CPUs but no memory or vice-versa
- When CPU or Memory are very unbalanced
But how do I really know?
- Tools tell you: lssrad/topas/mpstat/svmon (AIX), numactl (Linux) &
PowerVP
- High percentage of threads with far dispatches
- Disparity in performance between equivalent systems
PowerVM & POWER8 provide a variety of improvements
- PowerVM has come a long way in the last three years – firmware,
AIX, Dynamic Platform Optimizer and PowerVP give you a lot of
options
- Cache (sizes, pre-fetch, L4, Non-Uniform Cache Access logic),
Controller, massive DIMM bandwidth improvement
- Inter-socket latencies and efficiency have progressively improved
from POWER7 to POWER7+ and now POWER8
How do I Optimize Affinity?
• This is a separate topic, but an overview of options
• Follow POWER7 Best Practices for sizing (in general, tailor partition
entitlement, desired & maximum memory settings to real usage and
to chip/node sizes)
• Update to newer firmware levels – they are much smarter about
physical placement of virtualized OS instances
• Use Dynamic Platform Optimizer (DPO) to optimally place
partitions within a frame
• Monitor Enhanced Affinity metrics in AIX (topas ‘M’)
• Use Dynamic System Optimizer (DSO) to optimally place
threads within AIX. DSO does this by monitoring core, cache
and DIMM memory use by individual threads.
• Use software products that are affinity-aware (newer levels of
some WebSphere products are capable of this)
• Manually create Resource Sets (rsets) of CPU & memory
resources and assign workloads to them (expert level)