Soft Capping in z196 Processors - Individual CMG Regions and SIGs
Technology & Operations - Enterprise Infrastructure
Enterprise Platform Services
Customer Experiences with
HiperDispatch & Soft Capping in
IBM Mainframe Systems
Prepared for presentation at CMG Canada
on April 18, 2012
Jonathan Gladstone, P.Eng.
Senior Tech. Specialist, Mainframe & Mid-range Systems Capacity Planning
Technology & Operations
BMO Financial Group
Agenda
HiperDispatch
A high-level discussion of BMO’s implementation of this feature
Soft Capping
Detailed presentation with circles and arrows and a paragraph… and apologies to Arlo Guthrie
BONUS TOPIC!
Transition to z196 – Performance Implications
Just a preview; detailed analysis not yet complete
April 18, 2012
Customer Experiences with HiperDispatch & Soft Capping in IBM Mainframe Systems - J.Gladstone, BMOFG
HiperDispatch: how is it supposed to work?
HiperDispatch aligns workloads “vertically” on physical CPs
Builds a strong affinity between logical and physical processors; details are in the zJournal article at
https://www.mainframezone.com/article/hiperdispatch-a-conceptual-overview
Applies to all processors by type, when logically shared: zAAPs, zIIPs, GCPs
Logical CPs are classed by weight as Vertical High (VH, 100%), Vertical Medium (VM, 50-99%) or Vertical Low (VL, <50%; discretionary), avoiding VL where possible
Purports to improve performance by reducing latency times, e.g. for CP state re-loads
Keep data and instructions in lowest-level (fastest) cache
Performance improvement claims vary depending on configuration
Largest (8-10%) for large, multi-book CECs with many large systems sharing logical resources extensively; smallest (0-2%) for single-book CECs with few systems and limited sharing.
Performance improvements baked in to LSPR ratings for z/OS 1.11 and up
Turning off HiperDispatch now yields less than optimal performance.
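The three polarization bands quoted above can be written as a small classifier. This is only a sketch of the thresholds as stated on the slide, not PR/SM's actual assignment algorithm; the function name `polarization` and the idea of passing in a percentage share of one physical CP are assumptions made for illustration.

```python
def polarization(share_pct: float) -> str:
    """Classify a logical CP by its share of one physical CP.

    Bands are taken from the slide: VH = 100%, VM = 50-99%, VL = below 50%.
    The input representation (a percentage) is an assumption for this sketch.
    """
    if not 0 <= share_pct <= 100:
        raise ValueError("share must be between 0 and 100 percent")
    if share_pct == 100:
        return "VH"  # Vertical High: effectively a dedicated physical CP
    if share_pct >= 50:
        return "VM"  # Vertical Medium
    return "VL"      # Vertical Low: discretionary, may be parked


for share in (100, 75, 50, 30):
    print(f"{share:>3}% share -> {polarization(share)}")
```

A low-weight LPAR falls into the last band, which fits the later observation that VL engines can be left parked.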
HiperDispatch: how is it applied and working at BMO?
Turned on when we went to z/OS 1.11
Default in z/OS 1.11 and up is “ON” for HiperDispatch; we left it that way
Nasty surprises!
Specialty workload flowing back to GCPs
VL engines left parked while workloads not meeting WLM target performance
Causes?
Investigated multiple changes: new z/OS, zAAP-on-zIIP, HiperDispatch
Fixes?
Set zIIP (and zAAP) weights properly – never mattered before
Changed GCP weights to minimize impacts
Reviewing WLM profiles
Results?
Things working much better now (see third discussion regarding z196 performance)
Conclusions
HiperDispatch appears to yield performance benefits claimed by IBM, but…
Weights are now important all the time (not just when box is maxed out)
WLM profiles are more important than ever
Some situations still difficult (e.g. K-LPARs for GDPS)
Soft Capping: how is it supposed to work?
Soft Capping is available for single LPARs or “Capacity Groups” (CGs) of LPARs
Available since 2005 or earlier
Applies only to GCPs
Uses same MSU ratings as SCRT reports for VWLC
Limits CPU utilization of LPAR or Group based on four-hour rolling average (4HRA)
Checked by PR/SM every 5 minutes
Utilization can go as high as enabled capacity until the 4HRA hits the cap; PR/SM then limits utilization until the 4HRA drops below the cap again
A little more complicated for Capacity Groups: cap is applied to individual software products
rather than for LPARs, and LPAR weights are used as needed
4HRA can exceed cap
After the cap is reached, utilization held at the cap will often keep raising the 4HRA for a few intervals until it settles back
Reports suggest 4HRA will exceed cap by about 3% in these circumstances
IBM VWLC charge is based on cap rather than on actual utilization
True for whole CPC if CG includes all LPARs
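The capping behaviour described above can be illustrated with a toy simulation. It is a minimal sketch under stated assumptions, not a model of PR/SM or WLM internals: the function name `simulate`, the zero-filled starting window, and the rule of holding usage at exactly the cap while capped are all simplifications chosen for illustration. The 5-minute check interval and the "cap on when the 4HRA reaches the cap, off when it drops below" rule come from the slide.

```python
from collections import deque

INTERVAL_MIN = 5                    # check interval quoted on the slide
WINDOW = 4 * 60 // INTERVAL_MIN     # 48 intervals in four hours


def simulate(demand_msu, cap_msu, capacity_msu):
    """Return (used MSU, 4HRA, capped?) for each 5-minute interval.

    Assumption: the window starts filled with zeros, so the 4HRA ramps up
    from nothing, as it would after a restart.
    """
    window = deque([0.0] * WINDOW, maxlen=WINDOW)
    capped = False
    trace = []
    for demand in demand_msu:
        # While capped, usage is limited to the cap; otherwise to capacity.
        limit = cap_msu if capped else capacity_msu
        used = min(demand, limit)
        window.append(used)         # oldest interval drops out automatically
        rolling = sum(window) / WINDOW
        capped = rolling >= cap_msu
        trace.append((used, rolling, capped))
    return trace


# Hypothetical demand: a quiet period followed by a sustained spike.
demand = [100] * 40 + [400] * 60
trace = simulate(demand, cap_msu=312, capacity_msu=408)
first_capped = next(i for i, (_, _, c) in enumerate(trace) if c)
peak_4hra = max(r for _, r, _ in trace)
print(f"capping starts at interval {first_capped}; peak 4HRA {peak_4hra:.1f} MSU")
```

Because low-usage intervals are still leaving the window after capping starts, the 4HRA in this toy run keeps climbing above the cap for a while, which is the same mechanism the slide describes. The size of the overshoot here depends entirely on the made-up demand profile.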
Soft Capping: how is it applied and working at BMO?
In place at BMO in four instances (three current)
One z10 Production CEC from Jul/09 through Jan/10
One z196 Dev/Test/QA CEC from Sep/11 through present
Two z196 Production CECs from Jan-Feb/12 through present
SCRT reports show one instance of capping at BMO
Nov. 19, 2011 in z196D1: the SCRT report shows utilization reaching 321 MSU against a cap of 312 MSU, on a CEC with capacity of 408 MSU
Analysis based on data from TDS/z and from SCRT reports as submitted to IBM
All IBM tools
Interesting results, with differences to customer benefit
Data clearly show Soft Capping working, as documented
No charge for over-utilization, as documented
Unexplained difference between computed value and SCRT report is to customer advantage
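As a quick check on the November figures, the overshoot can be worked out directly. This is plain arithmetic on the three numbers quoted above; the variable names are invented for this sketch.

```python
cap_msu = 312         # Capacity Group cap
scrt_peak_msu = 321   # peak 4HRA shown in the SCRT report
capacity_msu = 408    # enabled capacity of the CEC

overshoot_pct = (scrt_peak_msu - cap_msu) / cap_msu * 100
cap_share_pct = cap_msu / capacity_msu * 100

print(f"SCRT peak is {overshoot_pct:.1f}% over the cap")
print(f"the cap is {cap_share_pct:.1f}% of enabled capacity")
```

The overshoot comes out a little under 3%, in line with the "about 3%" reported elsewhere for a 4HRA exceeding its cap.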
Soft Capping: November, 2011 – Cap takes effect
Chart: Interval versus 4HRA CPU Consumption, z196D1 in November, 2011
Y axis: CPU consumption (MSU), 0 to 450
X axis: Date & Time, 2011-11-18 00:00 through 2011-11-19 22:00
Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG Limit, SCRT Report, Capacity
Capacity 408 MSU; CG cap 312 MSU
Utilization above cap for several intervals beforehand
4HRA computed from TDS/z crosses cap around 01:20 and rises to 325.7 MSU
Peak hourly 4HRA 325 MSU according to TDS/z, only 321 MSU according to SCRT
1.3% difference due to truncation instead of averaging?
PR/SM not counted
NUM_CONSUMED_MSU is corrected by multiplying by number of intervals in one hour (6), per APAR PK29312
CG values include all LPARs but exclude *PHYSCAL
Working exactly as expected!
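The correction and rolling average behind the chart's computed series can be sketched as follows. This is an illustration under assumptions, not the author's actual TDS/z query: it assumes NUM_CONSUMED_MSU arrives as one value per 10-minute interval (six per hour, as the slide's correction factor implies), and the function names and the choice to average six interval values into one hourly value are hypothetical.

```python
INTERVALS_PER_HOUR = 6             # correction factor quoted on the slide
WINDOW = 4 * INTERVALS_PER_HOUR    # 24 intervals in a four-hour window


def corrected_msu(raw_values):
    """Scale per-interval NUM_CONSUMED_MSU up to an hourly MSU rate."""
    return [v * INTERVALS_PER_HOUR for v in raw_values]


def rolling_4hra(corrected):
    """4HRA for every interval that has a full four-hour window behind it."""
    return [
        sum(corrected[end - WINDOW:end]) / WINDOW
        for end in range(WINDOW, len(corrected) + 1)
    ]


def hourly_values(series):
    """Average each complete run of six interval values (an assumption)."""
    return [
        sum(series[i:i + INTERVALS_PER_HOUR]) / INTERVALS_PER_HOUR
        for i in range(0, len(series) - INTERVALS_PER_HOUR + 1, INTERVALS_PER_HOUR)
    ]


# Hypothetical raw values: five hours of steady consumption.
raw = [50.0] * 30
four_hra = rolling_4hra(corrected_msu(raw))
print(len(four_hra), four_hra[0], hourly_values(four_hra))
```

Whether an hourly figure is produced by averaging or by truncating is exactly the open question the slide raises about the gap between TDS/z and SCRT, so the averaging step here should be read as one possible choice.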
Soft Capping: December, 2011 – Cap doesn’t kick in
Chart: Interval versus 4HRA CPU Consumption, z196D1 in December, 2011
Y axis: CPU consumption (MSU), 0 to 450
X axis: Date & Time, 2011-12-13 00:00 through 2011-12-14 22:00
Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG Limit, SCRT Report, Capacity
Capacity 408 MSU; CG cap 312 MSU
Utilization above cap for several intervals in morning hours
4HRA computed from TDS/z never crosses cap; rises to 302.1 MSU
Peak hourly 4HRA 301 MSU according to TDS/z, only 297 MSU according to SCRT
1.2% difference due to truncation instead of averaging?
PR/SM not counted
NUM_CONSUMED_MSU is corrected by multiplying by number of intervals in one hour (6), per APAR PK29312
CG values include all LPARs but exclude *PHYSCAL
Still working exactly as expected!
Soft Capping: January, 2012 – Effects of POR
Chart: Interval versus 4HRA CPU Consumption, z196D1 in January, 2012
Y axis: CPU consumption (MSU), 0 to 450
X axis: Date & Time, 2012-01-30 00:00 through 2012-01-31 22:00
Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG SMF70LAC, CG Limit, SCRT Report, Capacity
Capacity 408 MSU; CG cap 312 MSU
POR around 00:50 drops 4HRA to 1 MSU as documented
POR captured by SMF70LAC but not by computed 4HRA
SMF70LAC numbers exclude *PHYSCAL, GDPS LPAR and Test system
SMF70LAC catches up with results computed from TDS/z around 04:40
Utilization above cap for several intervals
4HRA computed from TDS/z never crosses cap; rises to 295.6 MSU
A second copy of this callout on the slide reads 302.1 MSU, the same figure as on the December chart
Peak hourly 4HRA 291 MSU according to TDS/z, only 285 MSU according to SCRT
2.1% difference due to truncation instead of averaging?
PR/SM not counted
NUM_CONSUMED_MSU is corrected by multiplying by number of intervals in one hour (6), per APAR PK29312
CG values include all LPARs but exclude *PHYSCAL
Still working exactly as expected!
Soft Capping: February, 2012 – Cap takes effect, unreported
Chart: Interval versus 4HRA CPU Consumption, z196D1 in February, 2012
Y axis: CPU consumption (MSU), 0 to 450
X axis: Date & Time, 2012-02-26 00:00 through 2012-02-27 22:00
Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG Limit, SCRT Report, Capacity
Capacity 408 MSU; CG cap 312 MSU
Utilization above cap for several intervals beforehand
4HRA computed from TDS/z crosses cap around 17:10 and rises to 316.5 MSU
Peak hourly 4HRA 316 MSU according to TDS/z, only 303 MSU according to SCRT
4.3% difference due to truncation instead of averaging?
PR/SM not counted
NUM_CONSUMED_MSU is corrected by multiplying by number of intervals in one hour (6), per APAR PK29312
CG values include all LPARs but exclude *PHYSCAL
Capping never shows in SCRT, but capping effect still clear
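The gap between the TDS/z and SCRT peaks across the four months can be recomputed from the figures on the slides. This sketch assumes the percentage is taken relative to the SCRT value; that choice is an assumption, and the dictionary layout and function name are invented for illustration. With it, January and February reproduce the annotated 2.1% and 4.3%, while November and December land within a tenth of a point of the annotated 1.3% and 1.2%.

```python
# (peak hourly 4HRA per TDS/z, peak per SCRT), in MSU, as quoted on the slides
peaks = {
    "2011-11": (325, 321),
    "2011-12": (301, 297),
    "2012-01": (291, 285),
    "2012-02": (316, 303),
}


def pct_gap(tds_msu, scrt_msu):
    """Difference between the two peaks as a percentage of the SCRT value."""
    return (tds_msu - scrt_msu) / scrt_msu * 100


for month, (tds, scrt) in peaks.items():
    print(f"{month}: TDS/z {tds}, SCRT {scrt}, gap {pct_gap(tds, scrt):.2f}%")
```

In every month the SCRT figure is the lower of the two, which is the "customer advantage" noted earlier.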
Transition to z196 – Production Site Experience
Before upgrades: z10ECs, one book with GCPs, zAAPs and zIIPs
After upgrade: z196s, one book with GCPs and zAAP-on-zIIP
Chart: Peak Hour GCP & Total Consumption, Average for 2-3PM EST, Business Days Only
Actual demand from Nov. 1, 2011 through Apr. 8, 2012
Y axis: GCP MIPS, 0 to 18,000
X axis: monthly ticks from 1-Nov-2011 through 1-Apr-2012
Series: BCC GCP, BCC Total
Refreshed one of three CECs to z196 on Jan. 22
Refreshed another CEC to z196 on Feb. 12
Drops in CPU demand evident for GCP and Total utilization
CPU demand normally rises through February to a peak at month-end (RRSP season)
Driven by transactions
MIPS normalized using LSPR (1.9 for z10, 1.11 for z196)
Analysis will look at MIPS per transaction for several workload classes: CICS, DB2, WebSphere, Batch
*NOTE: GCP capacity and GCP consumption exclude z**P capacity & utilization.
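The planned MIPS-per-transaction comparison could take roughly the following shape. Everything here is hypothetical: the sample numbers are made up purely to exercise the code, the function names are invented, and the real analysis was still pending when the slides were written. The workload class names are the ones listed on the slide.

```python
def mips_per_transaction(mips_consumed, transactions):
    """Normalized MIPS consumed per transaction in the measured peak hour."""
    if transactions <= 0:
        raise ValueError("transaction count must be positive")
    return mips_consumed / transactions


def relative_change_pct(before, after):
    """Percentage change from the pre-upgrade to the post-upgrade value."""
    return (after - before) / before * 100


# Hypothetical peak-hour samples: (normalized MIPS, transaction count).
samples = {
    "CICS":      {"z10": (4000.0, 2_000_000), "z196": (3600.0, 2_100_000)},
    "DB2":       {"z10": (3000.0, 1_500_000), "z196": (2800.0, 1_500_000)},
    "WebSphere": {"z10": (1200.0, 300_000),   "z196": (1100.0, 320_000)},
    "Batch":     {"z10": (2500.0, 40_000),    "z196": (2400.0, 41_000)},
}

for workload, runs in samples.items():
    before = mips_per_transaction(*runs["z10"])
    after = mips_per_transaction(*runs["z196"])
    change = relative_change_pct(before, after)
    print(f"{workload}: {change:+.1f}% MIPS per transaction")
```

Dividing by transaction volume separates a genuine efficiency change from a simple change in business volume, which matters here because demand normally rises toward the end of February.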
Transition to z196 – Dev/Test/QA Site Experience
Before upgrade: z10EC, two books with GCPs, zAAPs and zIIPs
After upgrade: z196, one book with GCPs and zAAP-on-zIIP
Chart: Peak Hour GCP & Total Consumption, Average for 2-3PM EST, Business Days Only
Actual demand from Nov. 1, 2011 through Apr. 8, 2012
Y axis: MIPS, 0 to 7,000
X axis: monthly ticks from 1-May-2011 through 1-Apr-2012
Series: SCC GCP, SCC Total
Refreshed one of two CECs to z196 on Sept. 11
CPU demand rises for GCP and Total utilization
Not explained
Harder to analyse D/T/Q environment
MIPS normalized using LSPR (1.9 for z10, 1.11 for z196)
Analysis will look at MIPS per transaction for several workload classes: CICS, DB2, WebSphere, Batch
*NOTE: GCP capacity and GCP consumption exclude zAAP capacity & utilization.
Summary
HiperDispatch
Drives performance benefit, but…
Requires extra vigilance in setting LPAR weights (GCP, zAAP, zIIP)
Requires careful review of WLM profiles
Has difficulty where normally low-utilization systems (e.g. GDPS K-systems) need high weights
Soft Capping
Performs as expected – yay!
Minor added benefits (SCRT calculation, PR/SM left out)
Transition to z196
Performance expectations based on LSPR for z/OS 1.11
Includes HiperDispatch
Performing better than expected
Detailed analysis pending
Questions?
About the Author
Jonathan Gladstone is an IT Capacity Management professional with well over 20
years of experience in computer systems management and planning. He has been at
BMO Financial Group for almost 15 years, and working in capacity planning for over a
decade. He is BMO’s representative on Georgian College’s Computer Studies Advisory
Committee, is certified in ITIL v2 & v3 fundamentals and holds a B.A.Sc. degree in
Electrical Engineering from the University of Toronto and P.Eng. certification from the
Province of Ontario.
Jonathan wishes to thank many colleagues who helped with this presentation, in
particular Steve Pritchard (BMO Financial Group), Horace Dyke (independent
consultant) and Don Mackay (IBM Canada).
Jonathan can be found on LinkedIn, Twitter (@jbglad59) and on his own (largely I/T)
blog, http://alwaysgrumpy.wordpress.com.