Soft Capping in z196 Processors - Individual CMG Regions and SIGs



Technology & Operations - Enterprise Infrastructure
Enterprise Platform Services
Customer Experiences with HiperDispatch & Soft Capping in IBM Mainframe Systems
Prepared for presentation at CMG Canada on April 18, 2012
Jonathan Gladstone, P.Eng.
Senior Tech. Specialist, Mainframe & Mid-range Systems Capacity Planning
Technology & Operations
BMO Financial Group
Agenda
• HiperDispatch
  • A high-level discussion of BMO’s implementation of this feature
• Soft Capping
  • Detailed presentation with circles and arrows and a paragraph… and apologies to Arlo Guthrie
• BONUS TOPIC! Transition to z196 – Performance Implications
  • Just a preview; detailed analysis not yet complete
April 18, 2012
Customer Experiences with HiperDispatch & Soft Capping in IBM Mainframe Systems - J.Gladstone, BMOFG
HiperDispatch: how is it supposed to work?
• HiperDispatch aligns workloads “vertically” on physical CPs
  • Builds a strong affinity between logical and physical processors; details are available in a zJournal article viewable at www.mainframezone.com:
    https://www.mainframezone.com/article/hiperdispatch-a-conceptual-overview
  • Applies to each processor type (zAAPs, zIIPs, GCPs) when logically shared
  • Logical processors are classified by weight as Vertical High (VH, 100% share), Vertical Medium (VM, 50-99%) or Vertical Low (VL, <50%; discretionary), avoiding VL where possible
• Purports to improve performance by reducing latency, e.g. for CP state reloads
  • Keeps data and instructions in the lowest-level (fastest) cache
• Performance improvement claims vary depending on configuration
  • Largest (8-10%) for large, multi-book CECs with many large systems sharing logical resources extensively; least (0-2%) for single-book CECs with few systems and limited sharing
• Performance improvements are baked into LSPR ratings for z/OS 1.11 and up
  • Turning off HiperDispatch now yields less than optimal performance
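To make the VH/VM/VL split concrete, here is a minimal Python sketch of how an LPAR's weight might translate into vertical logical processors. It assumes a simplified version of the commonly described rule: share = weight ÷ total weight × shared physical CPs; whole CPs become VH; a small remainder is spread over two VMs so each keeps at least a 50% share. The function `polarize` and the LPAR names and weights are hypothetical, and real PR/SM behavior covers more cases than this.

```python
import math


def polarize(weight, total_weight, shared_physical_cps, logical_cps):
    """Split one LPAR's logical CPs into VH, VM and VL (simplified model).

    Assumptions (not PR/SM's exact rules): the LPAR's share, in physical
    CPs, is weight / total_weight * shared_physical_cps; whole CPs become
    VH; a remainder of at least 0.5 becomes one VM; a smaller remainder is
    combined with one VH and split over two VMs so each VM keeps at least
    a 50% share (with no VH to combine, it stays a single VM); remaining
    logical CPs are VL.
    """
    share = weight * shared_physical_cps / total_weight
    vh = math.floor(share)
    remainder = share - vh
    if remainder == 0:
        vm_shares = []
    elif remainder >= 0.5 or vh == 0:
        vm_shares = [remainder]
    else:
        vh -= 1
        vm_shares = [(1 + remainder) / 2] * 2
    vl = logical_cps - vh - len(vm_shares)
    if vl < 0:
        raise ValueError("LPAR share needs more logical CPs than are defined")
    return vh, vm_shares, vl


# Hypothetical LPARs sharing 7 physical GCPs, each defined with 5 logical CPs
weights = {"PRD1": 500, "PRD2": 300, "TST1": 200}
total = sum(weights.values())
for name, weight in weights.items():
    vh, vm_shares, vl = polarize(weight, total, 7, 5)
    print(name, "VH:", vh, "VM:", [round(s, 2) for s in vm_shares], "VL:", vl)
```

In this example PRD1 ends up with 3 VH, one VM at 50% and 1 VL; PRD2 with 1 VH, two VMs at about 55% and 2 VL; TST1 with no VH, two VMs at about 70% and 3 VL. Even in this toy model, the weights decide which logical CPs behave almost like dedicated engines and which are only discretionary, whether or not the box is busy.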
HiperDispatch: how is it applied and working at BMO?
• Turned on when we went to z/OS 1.11
  • Default in z/OS 1.11 and up is “ON” for HiperDispatch; we left it that way
• Nasty surprises!
  • Specialty-engine workload flowing back to GCPs
  • VL engines left parked while workloads were not meeting WLM performance targets
• Causes?
  • Investigated multiple changes: new z/OS, zAAP-on-zIIP, HiperDispatch
• Fixes?
  • Set zIIP (and zAAP) weights properly; this never mattered before
  • Changed GCP weights to minimize impacts
  • Reviewing WLM profiles
• Results?
  • Things working much better now (see the third topic, on z196 performance)
• Conclusions
  • HiperDispatch appears to yield the performance benefits claimed by IBM, but…
  • Weights are now important all the time (not just when the box is maxed out)
  • WLM profiles are more important than ever
  • Some situations are still difficult (e.g. K-LPARs for GDPS)
Soft Capping: how is it supposed to work?
• Soft Capping is available for single LPARs or for “Capacity Groups” (CGs) of LPARs
  • Available since 2005 or earlier
  • Applies only to GCPs
  • Uses the same MSU ratings as SCRT reports for VWLC
• Limits CPU utilization of an LPAR or Group based on its four-hour rolling average (4HRA)
  • Checked by PR/SM every 5 minutes
  • Utilization can go as high as enabled capacity until the 4HRA hits the cap; PR/SM then limits utilization until the 4HRA drops below the cap again
  • A little more complicated for Capacity Groups: the cap is applied to individual software products rather than to LPARs, and LPAR weights are used as needed
• The 4HRA can exceed the cap
  • After the cap is reached, running at the cap will often keep raising the 4HRA for a few intervals (the older intervals leaving the window are lower) until it settles back
  • Reports suggest the 4HRA will exceed the cap by about 3% in these circumstances
• When the 4HRA exceeds the cap, the IBM VWLC charge is based on the cap rather than on actual utilization
  • True for the whole CPC if the CG includes all LPARs
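The rolling-average mechanism, and why the 4HRA can overshoot the cap, can be illustrated with a small Python simulation. This is a simplified model under stated assumptions, not PR/SM's actual algorithm: 5-minute intervals (48 per four hours), intervals before the start of the data counted as zero, and consumption held to the cap whenever the 4HRA at the start of an interval is at or above it. The function `simulate_soft_cap` and the demand profile are hypothetical.

```python
from collections import deque

INTERVALS_PER_4H = 48  # assumed: 4 hours of 5-minute intervals


def simulate_soft_cap(demand_msu, cap_msu, capacity_msu):
    """Return (consumed, four_hra) lists, one entry per 5-minute interval.

    Simplified model: if the 4HRA at the start of an interval is at or
    above the cap, consumption is held to the cap; otherwise it is limited
    only by the enabled capacity. Intervals before the data starts count
    as zero.
    """
    window = deque([0.0] * INTERVALS_PER_4H, maxlen=INTERVALS_PER_4H)
    consumed, four_hra = [], []
    for demand in demand_msu:
        current_avg = sum(window) / INTERVALS_PER_4H
        limit = cap_msu if current_avg >= cap_msu else capacity_msu
        used = min(demand, limit)
        window.append(used)  # the oldest interval drops out of the window
        consumed.append(used)
        four_hra.append(sum(window) / INTERVALS_PER_4H)
    return consumed, four_hra


# Hypothetical demand: 4 hours at 200 MSU, then 4 hours at 400 MSU
demand = [200] * 48 + [400] * 48
consumed, four_hra = simulate_soft_cap(demand, cap_msu=312, capacity_msu=408)
first_capped = next(i for i, used in enumerate(consumed) if used < demand[i])
print("first capped interval:", first_capped)
print("peak 4HRA:", max(four_hra))
```

With the z196D1 values (cap 312 MSU, capacity 408 MSU), the 400 MSU demand runs uncapped for 27 intervals and is then held at 312 MSU, yet the 4HRA keeps climbing to 361.5 MSU, because each capped interval still replaces an older 200 MSU interval. A step this abrupt greatly exaggerates the overshoot; the reports cited above suggest about 3% with real workloads.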
Soft Capping: how is it applied and working at BMO?
• In place at BMO in four instances (three current)
  • One z10 Production CEC from Jul/09 through Jan/10
  • One z196 Dev/Test/QA CEC from Sep/11 through present
  • Two z196 Production CECs from Jan-Feb/12 through present
• SCRT reports show one instance of capping at BMO
  • Nov. 19, 2011 on z196D1: SCRT report shows MSU utilization hit 321 MSU against a cap of 312 MSU, with a capacity of 408 MSU
• Analysis based on data from TDS/z and from SCRT reports as submitted to IBM
  • All IBM tools
• Interesting results, with differences to the customer’s benefit
  • Data clearly show Soft Capping working, as documented
  • No charge for over-utilization, as documented
  • Unexplained difference between the computed value and the SCRT report is to the customer’s advantage
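The Nov. 19 figures can be checked with a little arithmetic. The sketch below assumes the rule implied above (no charge for over-utilization), i.e. that the billable value is the lower of the peak 4HRA and the cap; the function names are illustrative only.

```python
def billable_msu(peak_4hra_msu, cap_msu):
    # Assumed rule: with a defined-capacity cap in place, the value that
    # counts for VWLC is the lower of the peak 4HRA and the cap.
    return min(peak_4hra_msu, cap_msu)


def overshoot_pct(peak_4hra_msu, cap_msu):
    # How far the reported 4HRA went above the cap, as a percent of the cap.
    return 100.0 * (peak_4hra_msu - cap_msu) / cap_msu


peak, cap, capacity = 321, 312, 408  # z196D1, Nov. 19, 2011, per the SCRT report
print("billable MSU:", billable_msu(peak, cap))
print(f"overshoot: {overshoot_pct(peak, cap):.1f}% of cap")
print(f"cap as share of capacity: {100.0 * cap / capacity:.1f}%")
```

The 9 MSU overshoot is about 2.9% of the cap, in line with the roughly 3% mentioned earlier, and under this rule none of it is billed; the cap itself is about 76.5% of the 408 MSU capacity.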
Soft Capping: November, 2011 – Cap takes effect
[Chart: Interval versus 4HRA CPU Consumption: z196D1 in November, 2011. CPU consumption (MSU, 0-450) by date & time, Nov. 18-19, 2011. Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG Limit, SCRT Report, Capacity.]
• Capacity 408 MSU
• CG cap 312 MSU
• Utilization above cap for several intervals beforehand
• 4HRA computed from TDS/z crosses cap around 01:20
  • Rises to 325.7 MSU
• Peak hourly 4HRA 325 MSU according to TDS/z, only 321 MSU according to SCRT
  • 1.3% difference due to truncation instead of averaging?
  • PR/SM not counted
• NUM_CONSUMED_MSU is corrected by multiplying by the number of intervals in one hour (6), per APAR PK29312
• CG values include all LPARs but exclude *PHYSCAL
• Working exactly as expected!
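The “corrected” interval values and the TDS/z-based 4HRA in these charts can be reproduced along the following lines. This is a minimal Python sketch, assuming 10-minute intervals (hence the factor of 6 from the chart note), a 4HRA taken as the plain mean of the last 24 corrected intervals (a shorter window while less than four hours of data exist), and intervals labeled by their start time; the function names and sample numbers are hypothetical, and this is not necessarily how TDS/z or SCRT compute their own values.

```python
from datetime import datetime, timedelta

INTERVAL_MINUTES = 10                          # assumed RMF interval length
INTERVALS_PER_HOUR = 60 // INTERVAL_MINUTES    # 6, the factor cited for APAR PK29312
WINDOW = 4 * INTERVALS_PER_HOUR                # 24 intervals in four hours


def corrected_msu(raw_num_consumed_msu):
    """Scale each interval's NUM_CONSUMED_MSU to an hourly MSU rate."""
    return [raw * INTERVALS_PER_HOUR for raw in raw_num_consumed_msu]


def rolling_4hra(msu_per_interval):
    """Mean of the latest WINDOW intervals (a shorter window at the start)."""
    result = []
    for i in range(len(msu_per_interval)):
        window = msu_per_interval[max(0, i - WINDOW + 1): i + 1]
        result.append(sum(window) / len(window))
    return result


def first_crossing(start, four_hra, cap_msu):
    """Start time of the first interval whose 4HRA reaches the cap, else None."""
    for i, value in enumerate(four_hra):
        if value >= cap_msu:
            return start + timedelta(minutes=INTERVAL_MINUTES * i)
    return None


# Hypothetical raw values: 4 hours at 50, then 4 hours at 55 per interval
raw = [50] * 24 + [55] * 24
four_hra = rolling_4hra(corrected_msu(raw))    # 300 MSU rising toward 330 MSU
print(first_crossing(datetime(2011, 11, 19), four_hra, cap_msu=312))
```

Here the corrected load steps from 300 to 330 MSU, and the 4HRA first reaches the 312 MSU cap ten intervals later, so the sketch prints 2011-11-19 05:30:00.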
Soft Capping: December, 2011 – Cap doesn’t kick in
[Chart: Interval versus 4HRA CPU Consumption: z196D1 in December, 2011. CPU consumption (MSU, 0-450) by date & time, Dec. 13-14, 2011. Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG Limit, SCRT Report, Capacity.]
• Capacity 408 MSU
• CG cap 312 MSU
• Utilization above cap for several intervals in morning hours
• 4HRA computed from TDS/z never crosses cap
  • Rises to 302.1 MSU
• Peak hourly 4HRA 301 MSU according to TDS/z, only 297 MSU according to SCRT
  • 1.2% difference due to truncation instead of averaging?
  • PR/SM not counted
• NUM_CONSUMED_MSU is corrected by multiplying by the number of intervals in one hour (6), per APAR PK29312
• CG values include all LPARs but exclude *PHYSCAL
• Still working exactly as expected!
Soft Capping: January, 2012 – Effects of POR
[Chart: Interval versus 4HRA CPU Consumption: z196D1 in January, 2012. CPU consumption (MSU, 0-450) by date & time, Jan. 30-31, 2012. Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG SMF70LAC, CG Limit, SCRT Report, Capacity.]
• Capacity 408 MSU
• CG cap 312 MSU
• POR around 00:50 drops 4HRA to 1 MSU, as documented
  • POR captured by SMF70LAC but not by computed 4HRA
  • SMF70LAC numbers exclude *PHYSCAL, GDPS LPAR and Test system
• SMF70LAC catches up with results computed from TDS/z around 04:40
• Utilization above cap for several intervals
• 4HRA computed from TDS/z never crosses cap
  • Rises to 295.6 MSU
• Peak hourly 4HRA 291 MSU according to TDS/z, only 285 MSU according to SCRT
  • 2.1% difference due to truncation instead of averaging?
  • PR/SM not counted
• NUM_CONSUMED_MSU is corrected by multiplying by the number of intervals in one hour (6), per APAR PK29312
• CG values include all LPARs but exclude *PHYSCAL
• Still working exactly as expected!
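One simple way to picture the POR effect is to assume that the system's own 4HRA history (as reflected in SMF70LAC) starts empty after the POR, with missing intervals counted as zero, while the 4HRA computed from TDS/z keeps the full history. The Python sketch below is built on that assumption only; it is not IBM's documented algorithm, and the function name and the flat 300 MSU load are hypothetical.

```python
WINDOW = 24  # four hours of 10-minute intervals (assumed interval length)


def four_hra(msu, por_index=None):
    """Rolling 4HRA per interval, dividing by the full window so that
    missing intervals count as zero. If por_index is given, intervals
    before it are ignored from the POR onward (assumed reset behavior)."""
    out = []
    for i in range(len(msu)):
        lo = max(0, i - WINDOW + 1)
        if por_index is not None and i >= por_index:
            lo = max(lo, por_index)  # pre-POR history is lost
        out.append(sum(msu[lo:i + 1]) / WINDOW)
    return out


load = [300.0] * 60                  # hypothetical flat load, 10-minute intervals
full = four_hra(load)                # history kept (like the TDS/z computation)
reset = four_hra(load, por_index=30)

# First interval after the POR where the two series agree again
catch_up = next(i for i in range(30, len(load)) if reset[i] == full[i])
print(catch_up - 30, "intervals after the POR interval")
```

Under this model the reset average rejoins the full-history one 23 intervals (3 h 50 min) after the interval in which the POR occurred; counting from an interval at 00:50, that is 04:40, which is consistent with when SMF70LAC caught up in the January data. The model does not explain why the reset value dropped as low as 1 MSU.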
Soft Capping: February, 2012 – Cap takes effect, unreported
[Chart: Interval versus 4HRA CPU Consumption: z196D1 in February, 2012. CPU consumption (MSU, 0-450) by date & time, Feb. 26-27, 2012. Series: CG Corrected NUM_CONSUMED_MSU, CG Corrected 4HRA, CG Hourly value of 4HRA, CG Limit, SCRT Report, Capacity.]
• Capacity 408 MSU
• CG cap 312 MSU
• Utilization above cap for several intervals beforehand
• 4HRA computed from TDS/z crosses cap around 17:10
  • Rises to 316.5 MSU
• Peak hourly 4HRA 316 MSU according to TDS/z, only 303 MSU according to SCRT
  • 4.3% difference due to truncation instead of averaging?
  • PR/SM not counted
• NUM_CONSUMED_MSU is corrected by multiplying by the number of intervals in one hour (6), per APAR PK29312
• CG values include all LPARs but exclude *PHYSCAL
• Capping never shows in SCRT, but capping effect still clear
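The percentage differences quoted on the four Soft Capping charts compare the peak hourly 4HRA from TDS/z with the SCRT value. The Python sketch below recomputes them from the rounded MSU values shown on the slides, expressing the difference relative to the SCRT value (an assumption that reproduces the January and February figures), and checks each source against the 312 MSU cap.

```python
CAP_MSU = 312

# Peak hourly 4HRA (MSU) on z196D1, as stated on the slides: (TDS/z, SCRT)
peaks = {
    "Nov 2011": (325, 321),
    "Dec 2011": (301, 297),
    "Jan 2012": (291, 285),
    "Feb 2012": (316, 303),
}


def pct_diff(tdsz, scrt):
    # Difference expressed relative to the SCRT value.
    return 100.0 * (tdsz - scrt) / scrt


for month, (tdsz, scrt) in peaks.items():
    print(f"{month}: TDS/z {pct_diff(tdsz, scrt):.1f}% higher; "
          f"over cap per TDS/z={tdsz > CAP_MSU}, per SCRT={scrt > CAP_MSU}")
```

This also shows why the February capping goes unreported: TDS/z puts the peak hourly 4HRA at 316 MSU, above the cap, while SCRT reports 303 MSU, below it. From the rounded inputs, November and December come out at 1.2% and 1.3%, the reverse of the charts' 1.3% and 1.2%; the chart figures were presumably computed from unrounded values.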
Transition to z196 – Production Site Experience
[Chart: Peak Hour GCP & Total Consumption, Average for 2-3PM EST Business Days Only. Actual demand from Nov. 1, 2011 through Apr. 8, 2012; GCP MIPS (0-18,000) by month, Nov. 2011 to Apr. 2012. Series: BCC GCP, BCC Total.]
• Before upgrades: z10ECs, one book with GCPs, zAAPs and zIIPs
• After upgrade: z196s, one book with GCPs and zAAP-on-zIIP
• Refreshed one of three CECs to z196 on Jan. 22, and another CEC on Feb. 12
• Drops in CPU demand evident for GCP and Total utilization
• CPU demand normally rises through February to a peak at month-end (RRSP season)
  • Driven by transactions
• MIPS normalized using LSPR ratings (z/OS 1.9 tables for z10, z/OS 1.11 tables for z196)
• Analysis will look at MIPS per transaction for several workload classes:
  • CICS
  • DB2
  • WebSphere
  • Batch
*NOTE: GCP capacity and GCP consumption exclude z**P capacity & utilization.
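Normalizing consumption to MIPS and then to CPU per transaction, as the planned analysis describes, comes down to simple arithmetic. A minimal Python sketch, assuming consumed MIPS = busy fraction × the LSPR-based MIPS rating of the configuration; the ratings, utilizations and transaction rates below are hypothetical placeholders, not BMO figures.

```python
def consumed_mips(busy_fraction, rated_mips):
    # Scale utilization by the configuration's LSPR-based rating (z10 rated
    # from the z/OS 1.9 tables, z196 from the z/OS 1.11 tables).
    return busy_fraction * rated_mips


def mi_per_transaction(mips, transactions_per_second):
    # MIPS divided by a transaction rate gives million instructions per transaction.
    return mips / transactions_per_second


# Hypothetical peak-hour figures for one workload class (e.g. CICS)
before = consumed_mips(0.80, 10_000)   # z10 configuration
after = consumed_mips(0.60, 12_000)    # z196 configuration
rate = 500.0                           # transactions per second in both periods

print(f"z10:  {mi_per_transaction(before, rate):.1f} MI per transaction")
print(f"z196: {mi_per_transaction(after, rate):.1f} MI per transaction")
```

Comparing per-transaction figures rather than total MIPS separates a genuine efficiency change on the z196 from changes in transaction volume, such as the February RRSP peak.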
Transition to z196 – Dev/Test/QA Site Experience
[Chart: Peak Hour GCP & Total Consumption, Average for 2-3PM EST, Business Days Only. Actual demand from Nov. 1, 2011 through Apr. 8, 2012; MIPS (0-7,000) by month, May 2011 to Apr. 2012. Series: SCC GCP, SCC Total.]
• Before upgrade: z10EC, two books with GCPs, zAAPs and zIIPs
• After upgrade: z196, one book with GCPs and zAAP-on-zIIP
• Refreshed one of two CECs to z196 on Sept. 11
• CPU demand rises for GCP and Total utilization
  • Not explained
  • Harder to analyse D/T/Q environment
• MIPS normalized using LSPR ratings (z/OS 1.9 tables for z10, z/OS 1.11 tables for z196)
• Analysis will look at MIPS per transaction for several workload classes:
  • CICS
  • DB2
  • WebSphere
  • Batch
*NOTE: GCP capacity and GCP consumption exclude zAAP capacity & utilization.
Summary
• HiperDispatch
  • Drives performance benefit, but…
  • Requires extra vigilance in setting LPAR weights (GCP, zAAP, zIIP)
  • Requires careful review of WLM profiles
  • Has difficulty where normally low-utilization systems (e.g. GDPS K-systems) need high weights
• Soft Capping
  • Performs as expected – yay!
  • Minor added benefits (SCRT’s peak 4HRA comes out slightly below our own calculation; PR/SM overhead is not counted)
• Transition to z196
  • Performance expectations based on LSPR for z/OS 1.11
    • Includes HiperDispatch
  • Performing better than expected
    • Detailed analysis pending
Questions?
About the Author
• Jonathan Gladstone is an IT Capacity Management professional with well over 20 years’ experience in computer systems management and planning. He has been at BMO Financial Group for almost 15 years, and working in capacity planning for over a decade. He is BMO’s representative on Georgian College’s Computer Studies Advisory Committee, is certified in ITIL v2 & v3 fundamentals and holds a B.A.Sc. degree in Electrical Engineering from the University of Toronto and P.Eng. certification from the Province of Ontario.
• Jonathan wishes to thank many colleagues who helped with this presentation, in particular Steve Pritchard (BMO Financial Group), Horace Dyke (independent consultant) and Don Mackay (IBM Canada).
• Jonathan can be found on LinkedIn, Twitter (@jbglad59) and on his own (largely I/T) blog, http://alwaysgrumpy.wordpress.com.