Transcript Document

zSeries Scalability and Availability Improvements
Latest hardware and software advancements
Fri Sep 26, 2014: 10:15-11:30
Track 2
Speaker: Donald Zeunert (BMC Software)
Abstract / Summary
• Recent IBM system and subsystem scalability improvements allow significant consolidation.
– This saves CPU and memory resources
– Avoids creating more server instances to manage
• Bottlenecks - Faster CPUs mean everything else is relatively slower
– Need to look at what the bottlenecks are and what we ignored when it was released
Areas of improvements
• Performance / Scalability
– Hardware, Microcode, PR/SM
– Software – z/OS, CICS, DB2
• Availability
– Software / hardware
IT Budgets – Save $
• Need to look at technology impact on Software License Costs?
• New hardware or software costs money
– Does it save me any?
Relative speed of devices
CPU faster, everything else relatively slower
• RAM is now slow
– > CHIP Cache on z196, EC12
• Paging devices are slower
– EC12 Flash memory
Topic 1 – Hardware / Microcode Performance
• zEC12 Benefits
– Workloads may use fewer MSUs
– Warning track
– Other exclusives
• CF Sharing - Thin interrupts
• HiperDispatch
– Why do we need it?
– Parked engines
z10 vs zEC12 Chip Cache
• Each generation, CPs are getting
– more levels of chip cache
– more MB at each level
• Chip cache hits = fewer CPU secs
zEC12
RNI = Ratio of Chip / local sourcing to total sourcing (Local + Remote)
Disclaimer – “Your Mileage may vary”
Migrating from a gasoline exotic to an electric supercar will improve MPG.
But what if you also have an 18-wheeler?
zPCR Workload RNI Characteristics
zEC12 – Workloads may use fewer MSUs
– Onlines may get better response and use significantly fewer CPU secs and MSUs. But does this help my hardware and software bills?
• Yes, if onlines are driving your box capacity and 4HRA
– Step 1 – Run or find your last Sub-Capacity Reporting Tool (SCRT) report and determine what hour / shift is the peak(s).
– Step 2 – Determine 4HRA contributors: % Batch, CICS, etc.
– Step 3 – Look at the zPCR chart for the RNI (high?) of your workload
– Step 4 – If the workload has high RNI, run a zPCR model of an upgrade to zEC12 using your hardware capacity SMF 113 cache stats
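Steps 1 and 2 can be sketched in Python. This is only an illustration, not SCRT itself: the hourly MSU figures and workload split are made up, and the 4HRA is approximated as the plain average of the trailing four hourly samples.

```python
from collections import deque


def rolling_average(hourly_msus, window=4):
    """Trailing average per hour; None until `window` samples exist."""
    recent = deque(maxlen=window)
    averages = []
    for msu in hourly_msus:
        recent.append(msu)
        averages.append(sum(recent) / window if len(recent) == window else None)
    return averages


def peak_window(hourly_msus, window=4):
    """Return (end_hour_index, average) of the highest rolling average."""
    averages = rolling_average(hourly_msus, window)
    filled = [(avg, hour) for hour, avg in enumerate(averages) if avg is not None]
    avg, hour = max(filled)
    return hour, avg


def contributors(msus_by_workload, end_hour, window=4):
    """Percent of MSUs per workload over the window ending at end_hour."""
    start = end_hour - window + 1
    totals = {name: sum(series[start:end_hour + 1])
              for name, series in msus_by_workload.items()}
    grand_total = sum(totals.values())
    return {name: 100 * total / grand_total for name, total in totals.items()}


# Hypothetical hourly MSUs for one LPAR, split by workload.
workloads = {
    "CICS":  [100, 120, 300, 320, 340, 330, 150, 100],
    "Batch": [200, 180, 60, 50, 40, 50, 220, 260],
}
lpar_total = [sum(hour) for hour in zip(*workloads.values())]
peak_hour, peak_avg = peak_window(lpar_total)
print(peak_hour, peak_avg, contributors(workloads, peak_hour))
```

If the online workload dominates the peak window, as CICS does in this made-up data, Steps 3 and 4 are worth pursuing.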
RNI by Hour correlated with 4HRA
[Chart: RNI and MSU by hour, with 3:00 AM and 1:00 PM marked]
• Day shift – OLTP high RNI matches 4HRA
• Peak 4HRA – high RNI at peak
• Night shift – 4HRA lower RNI, not as CHIP cache constrained
zEC12 – Warning Track Interrupt
More consistent response time
PR/SM CP Stealing
Old Way
• z/OS is not directly involved in the undispatching of a logical processor from a physical CP. LCP saved status info is stored off z/OS
• Con: Any pending work on that LCP would remain undispatched until that LCP is put back on any PCP
New Way - Warning Track Interrupt
• EC12 presents z/OS a "Warning Track" interrupt when it wants to undispatch a logical processor. This gives z/OS a grace period to undispatch (save status) the workunit.
• Pro: If undispatched in the grace period, then z/OS can dispatch it on another logical processor.
• Con: If the grace period expires before z/OS returns the logical CP to PR/SM, PR/SM undispatches the logical CP. Then it works like before.
Warning Track - % Successful and Rate
• High rate of WTIs / second (typical of low weight / overcommitted LPARs)
• Logical removed from physical successfully > 98% of the time
• Making work eligible to be re-dispatched on a different logical.
• Provides more consistent OLTP response time
WTI metrics are in SMF 70 records; support to collect them is in:
CMF PTFs BQM0868 (5.8) or BQM0867 (5.9) or
RMF APAR OA37803
zEC12 – Other exclusives
• Storage Class Memory (SCM) – AKA Flash Express
– PCIe - 1.4 TB of memory per mirrored card pair
• PR/SM Absolute Capping
– Expressed in terms of 1/100ths of a processor (0.01 to 255.0)
• Smoother capping
• 4HRA Max Spike(s)
• Data Compression Express (zEDC)
– PCIe device – Offload software compression
• SMF Logstreams
– Get Softinflate (OA41156) for machines w/o zEDC
• Numerous other exploiters
CF Thin interrupts – Improved Service times
CF Polling (Parm: DYNDISP=NO)
– Dispatch algorithm: LPAR time slicing
– Pros: Best performance for dedicated CF, not suitable for shared CF
– Cons: Holds CF processor, yields very poor / erratic Sync Service times
Dynamic CF Dispatching (Parm: DYNDISP=YES)
– Dispatch algorithm: CF time based algorithm for CF engine sharing
– Pros: More effective engine sharing than polling loop. CF now managing slice.
– Cons: Uses whole slice even if no work, cannot be interrupted / stolen
Coupling Thin Interrupts (Parm: DYNDISP=THIN)
– Dispatch algorithm: Event driven dispatching. CF releases engine if no work left.
– Pros: Improved Sync Service times. Thin interrupt to dispatch processor when new work arrives
– Cons: Requires z/OS 1.12 or higher w/ PTFs and CFCC microcode level 19
High Sync times = GCP spin loop = wasted MSUs = contribute to 4HRA
Logical / Physical Guaranteed Share calculations
• SJS* LPAR weight < 10% of 16 CPs so guaranteed < 1.6 CPs, actually max is 2 -> 3
• These LPARs are using about twice their guarantee
• Using white space / more than guaranteed share. Impacting 4HRA?
Name  LP CT  Wgt   Rel Shr  Guar CPs  %MSU  Over commit
VMR   4      500   9.01%    1.44      36%   178%
SYSP  3      500   9.01%    1.44      48%   108%
VM4   2      500   9.01%    1.44      72%   39%
VM5   2      500   9.01%    1.44      72%   39%
VM9   2      500   9.01%    1.44      72%   39%
DB2A  3      400   7.21%    1.15      38%   160%
DB2B  3      400   7.21%    1.15      38%   160%
SYSN  3      400   7.21%    1.15      38%   160%
SJSE  3      300   5.41%    0.86      29%   247%
SYSM  3      300   5.41%    0.86      29%   247%
SJSB  2      300   5.41%    0.86      43%   131%
SJSC  3      250   4.50%    0.72      24%   316%
SJSD  3      250   4.50%    0.72      24%   316%
ESAJ  3      200   3.60%    0.58      19%   420%
IMSA  3      150   2.70%    0.43      14%   594%
SJSH  2      50    0.90%    0.14      7%    1288%
IOC2  1      50    0.90%    0.14      14%   594%
Total        5550
• %MSU is % of 1 CP so work runs 100-%MSU slower than if full CP
– VMR CPs are 64% slower than full due to over commit
– Unless using HiperDispatch with parked CPs
– Warning track helps
• If not over commit can’t exceed # of CPs to LPAR
– How much white space do you need?
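The columns of the guaranteed share table can be reproduced with a few lines of Python. The formulas below are inferred from the table values rather than stated on the slide, so treat them as an assumption: relative share is weight over total weight, guaranteed CPs is that share of the 16 physical CPs, %MSU is guaranteed CPs per logical CP, and over commit is logical CPs over guaranteed CPs minus one.

```python
def guaranteed_share(weight, total_weight, physical_cps, logical_cps):
    """Return (relative share, guaranteed CPs, %MSU, over commit) as fractions."""
    rel_share = weight / total_weight
    guar_cps = rel_share * physical_cps
    pct_msu = guar_cps / logical_cps          # share of one CP behind each logical CP
    over_commit = logical_cps / guar_cps - 1  # logical CPs beyond the guarantee
    return rel_share, guar_cps, pct_msu, over_commit


TOTAL_WEIGHT = 5550   # "Total" row of the table
PHYSICAL_CPS = 16     # CP count quoted on the slide

# (name, logical CPs, weight) for a few rows of the table
lpars = [("SYSP", 3, 500), ("DB2A", 3, 400), ("IOC2", 1, 50)]

for name, logical_cps, weight in lpars:
    rel, guar, pct, over = guaranteed_share(
        weight, TOTAL_WEIGHT, PHYSICAL_CPS, logical_cps)
    print(f"{name:5} {rel:7.2%} {guar:5.2f} {pct:5.0%} {over:6.0%}")
```

The "slower than full CP" figure in the bullets is then simply 1 minus %MSU, for example about 64% for VMR with 4 logical CPs.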
HiperDispatch – view of over commit
[Chart: HiperDispatch view of LPARs with guaranteed shares of 0.86, 0.72 and 1.15 CPs]
Lots of parked Med/Low = slower effective MSUs and overhead
Performance / Scalability improvements in Software
How does IBM charge for sub-capacity?
Sub-Cap Reporting Tool (SCRT)
• 4448 MSUs - CEC utilization of an LPAR w/ DB2 licensed
• 199 MSUs – DB2 Service class CPU (not complete DB2), but not DB2 peak
• IBM MLC costs 30% of IT budget
• Reducing 4HRA MSUs lowers bills
– Paid on peak LPAR where product(s) run
– Less GCP MSUs
• zIIP
– LPAR consolidation
DB2 Software Performance / Scalability
• DB2 v10 / 11 – ($ / MSU higher, but fewer MSUs?)
– Max # concurrent threads,
– zIIP offload sequential prefetch (batch)
• Watch for zIIP overcommit / zIIP on GCP
– IDAA – offload and WLM goal awareness
– V11 DDF Enclave classification enhancements
• Also requires z/OS 2.1 WLM
• Package Name: 128 characters (instead of 8)
• Procedure Name: 128 characters (instead of 18)
DB2 V10 – Seq prefetch zIIP overcommit
• DB2 V10 Sequential prefetch zIIP eligibility can overcommit zIIPs
• Monitor zIIP > 50% potential for GCP
• Monitor DB2 WLM Service classes w/ SMF72s or DBM1 jobs w/ SMF30s for zIIP eligible on GCP
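The two monitoring rules above can be expressed as a small check. This is a sketch under assumptions: the Interval fields are hypothetical names for values you would already have extracted from SMF 72 or SMF 30 data, and the 50% threshold is the one quoted on the slide.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    label: str                # hypothetical: interval start time
    ziip_busy_pct: float      # zIIP utilization for the interval
    ziip_on_gcp_secs: float   # zIIP-eligible CPU seconds that ran on GCPs


def flag_ziip_overcommit(intervals, busy_threshold=50.0):
    """Return (label, reasons) for intervals that suggest zIIP overcommit."""
    flagged = []
    for iv in intervals:
        reasons = []
        if iv.ziip_busy_pct > busy_threshold:
            reasons.append("zIIP busy above threshold")
        if iv.ziip_on_gcp_secs > 0:
            reasons.append("zIIP-eligible work ran on GCP")
        if reasons:
            flagged.append((iv.label, reasons))
    return flagged


# Made-up sample intervals
samples = [
    Interval("01:00", 35.0, 0.0),
    Interval("02:00", 62.0, 0.0),
    Interval("03:00", 71.0, 14.5),
]
for label, reasons in flag_ziip_overcommit(samples):
    print(label, "; ".join(reasons))
```

Intervals flagged for both reasons are the ones most likely to be adding GCP MSUs to the 4HRA.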
DB2 V10 – 10x more Threads per SSID
• Higher concurrent threads = fewer SSIDs / LPAR
Software Performance / Scalability
• CICS –
– TOR CPU Starvation
– Threadsafe vs QR, fewer regions = less function shipping = Save CPU & memory
• Convert via 80/20 rule, and whatever is left on QR runs faster
• Target with CICS stats - #context switches, CPU secs
CICS Scalability - L8 running threadsafe
Lower QR TCB CPU
Originally DB2 was the only OPENAPI TRUE
Now MQ, CICS/DLI, native sockets
• When a non-threadsafe program calls DB2, CICS switches from the QR TCB to an open TCB, and back to the QR TCB after the DB2 request
• A threadsafe program stays on the open TCB until a non-threadsafe CICS API call
• Force to OTE / L8 w/o OPENAPI TRUE call
– CONCURRENCY(THREADSAFE) and API=OPENAPI
CICS TOR CPU Starvation / responsiveness
• Problem from TORs and AORs running at the same dispatch priority.
– AORs heavily consume CPU.
– TORs need to wait too long to receive work and return results to the caller
• Especially when a large # of transactions enter AORs w/o TOR
• Old Circumventions: Move TORs to a service class with higher importance than AORs
– Option 1 - Exempt all regions from being managed by response time goals and classify TORs to a service class with higher importance than AORs.
• Disadvantages:
• Lose WLM server management of variable dispatching priorities, memory protection, etc.
• No response time data available in service class reports
– Option 2 - Exempt only AORs and move them to a service class with lower importance than the CICS service classes with response time goals.
• Disadvantage:
• WLM Server Mgmt lost
• WLM SrvCls BTE reports for highest CPU consumption eliminated.
CICS – Work Manager / Consumer Model
Allows CICS to use a similar model to WAS with DB2 / DDF.
WLM classification BOTH from APAR OA35428
• Use NEW WLM SrvCls classification option “BOTH” for TORs
• Define a STC service class for TORs which has a higher importance than the AORs’ response time goals
• Retain “Manage Regions by Goals of Transaction” for AORs.
• TORs managed towards the goals of the STC’s service class
– WLM ensures bookkeeping of transaction completions for the CICS response time service class
– CICS transactions are managed towards CICS response time goals
• AORs’ response time goal management unchanged
Software Performance / Scalability
• IMS 13 - 117K trans / second
– 4x max # PSTs
– IMS Connect WLM routing
• z/OS –
– CF "fair queueing" algorithm for queued CF requests
• Selection was FIFO prior to APAR OA41203 (z/OS 1.12+)
• A burst to a single structure could negatively affect all structures.
• Now allows low volume, high priority signaling to get processed
– SMF logstreams, SMF Compress (zEDC)
– VSAM RLS based User Catalogs (enqueues < CPU)
• IP - RoCE (Remote Direct Memory Access over Converged Ethernet)
z/OS 2.1 VSAM RLS User Catalog Benchmarks
Non-VSAM Delete Performance
             NonRLS   RLS     RLS Improve %
Elapsed Min  80.42    8.42    89.51
CPU Sec      1269.3   298.7   76.46
Non-VSAM Define Performance
             NonRLS   RLS     RLS Improve %
Elapsed Min  48.84    21.42   56.13
CPU Sec      685.6    130.8   76.46
CPU measured in the CATALOG, GRS, SMSVSAM, and XCFAS address spaces. Source: Terri Menendez of IBM RLS/VSAM/Catalog R&D
• Helps batch elapsed time when there is a high # of temporary cataloged datasets
– Reduce batch window squeeze
– Possibly reduce 4HRA MSUs by moving peak
Monitor VSAM RLS for User Catalogs
• Monitor z/OS Health checkers for RLS
– Enable RLS HCs
– 8 RLS Msgs that go to MVS console
– CF IGWLOCK00
• Monitor RLS
– See SHARE RLS Performance sessions for what to monitor
• True / False contention
• DSN false invalidates
RLS stats in SMF 42s, BP in Subtype 19
Availability improvements from Hardware / Software
Topics - Availability
• z/OS –
– V2.1 Serial Coupling Facility structure rebuild
• Processing re-designed to help improve performance and availability by rebuilding coupling facility structures more quickly and in priority order.
– Previously all structures were rebuilt in parallel, causing contention
– Now critical system structures are recovered 1st and other structures are optionally prioritized by policy
– zAware – Analytics – Detects anomalies to avoid outages
• MQ ShrQ CF full avoidance –
– EC12 Storage Class Memory (SCM) (AKA Flash Memory)
• May hurt performance as it is slower than CF memory
Questions
SHARE Sessions with more details
• 15841: EWCP: Project Open and IBM ATS Hot Topics
– Monday, August 4, 2014: 1:30 PM-2:30 PM Room 303
– Speaker: Kathy Walsh (IBM Corporation)
• 15806: IBM zEnterprise EC12 and BC12 Update
– Tuesday, August 5, 2014: 10:00 AM-11:00 AM Room 310
– Speaker: Harv Emery (IBM Corporation)
• 15105: z/OS Parallel Sysplex z/OS 2.1 Update (Anaheim)
– https://share.confex.com/share/122/webprogram/Session15105.html
• 14142: Unclog Your Systems with z/OS 2.1 – Something New and Exciting for Catalog (VSAM RLS UCATs) (Boston)
– http://proceedings.share.org/conference/abstract.cfm?abstract_id=26822
Thank you
Backup Slides
z/OS 1.12 – Servers w/ no active enclaves
• Controlled by new IEAOPT Parameter
• ManageNonEnclaveWork = {No |Yes}
– No: (default) Non-enclave work is managed based on the most important enclave.
• Doesn’t work well when there are no active enclaves and other work to do
• Example: Significant work unrelated to an enclave:
– Garbage collection for a JVM (WAS)
– Yes: (new / recommended) Non-enclave work is managed towards the goals of the address space’s external service class
• For enclave managed address spaces, service class goals / importance are more important than they used to be.
– Verify high enough before switching to Yes
SMF70 - Warning-Track Interrupt Metrics
SMF RECORD 70 SUBTYPE 1
Field Name  Offset  Description
SMF70WTS    x50     The number of times PR/SM issued a warning-track interruption to a logical processor and z/OS was able to return the logical processor within the grace period.
SMF70WTU    x54     The number of times PR/SM issued a warning-track interruption to a logical processor and z/OS was unable to return the logical processor within the grace period.
SMF70WTI    x58     Amount of time in milliseconds that a logical processor was yielded to PR/SM due to warning-track processing.
Metric of interest
% WTI Successful = SMF70WTS / (SMF70WTS + SMF70WTU)
CMF PTFs BQM0868 (5.8) or BQM0867 (5.9)
RMF APAR OA37803
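The metric of interest is easy to compute once the two counters have been pulled from the SMF 70 subtype 1 record. The sketch below assumes the counters are already available as integers; reading and parsing the SMF record itself is out of scope, and the sample counts and interval length are made up.

```python
def wti_success_pct(smf70wts, smf70wtu):
    """% WTI Successful = SMF70WTS / (SMF70WTS + SMF70WTU), as a percentage."""
    total = smf70wts + smf70wtu
    if total == 0:
        return None  # no warning-track interrupts in the interval
    return 100.0 * smf70wts / total


def wti_rate_per_sec(smf70wts, smf70wtu, interval_secs):
    """Warning-track interrupts per second over the reporting interval."""
    return (smf70wts + smf70wtu) / interval_secs


# Made-up counts for one 15-minute interval
successful, unsuccessful, interval_secs = 9850, 150, 900
print(wti_success_pct(successful, unsuccessful))
print(round(wti_rate_per_sec(successful, unsuccessful, interval_secs), 2))
```

A success percentage well below the roughly 98% quoted earlier in the deck, combined with a high rate, would be a reason to review LPAR weights and over commit.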
SMF 42 – SMSVSAM - VSAM RLS
• Subtype 15: Storage Class Response Time Summary
• Subtype 16: Dataset Response Time Summary
– Only generated if DSN Monitoring is enabled
• using the V SMS,MONDS command
– Remembered by SMSplex, not available via Parm
• Subtype 17: Coupling Facility Lock Structure Usage
• Subtype 18: CF Cache Partition Usage
• Subtype 19: Local Buffer Manager LRU Statistics