Capacity Purchase Planning
Edward Jones IS Capacity Planning and
Performance Management
Jim Poletti
October 23, 2007
About Edward Jones. . .
• Full service investment firm
• 10,000+ branches – US, Canada, UK
• 1 "broker" and 1 branch office administrator
per branch
• Land-line WAN – DSL or T1
• St Louis datacenter is hub for most traffic
• Tempe datacenter primarily DR for mainframe
• 21,000 users signed on to CICS at high-water
IS Capacity Planning & Performance Management
Rich Unnerstall (Director – Data Center Operations)
Art Morlock (Department Leader)
• Jim Poletti (MF Performance Analyst)
• Gerry Oliver (MF Performance
Analyst)
• Greg Volk (Network Performance
Analyst)
• Rick Pranger (Open Systems
Performance Analyst)
• Dwayne Allen (Open Systems
Performance Analyst)
• Tom Siech (Load Tester)
• Brandy Brown (Load Tester)
St. Louis Mainframe Hardware
• All LPARs run on 1 physical mainframe
• IBM z9 2094-707 – 3516 MIPS – z/OS 1.7
• 80 GB memory
• 40 TB DASD – EMC RAID-1 and -7, 5 ms
• Older Symmetrix – replacing with DMX-4
• Data replication to Tempe using SRDF
CPU by LPAR
Production Environment/LPAR
• 1 LPAR (no data-sharing SYSPLEX yet)
• 25 CICS regions – 19 AORs, 5 TORs, 1 FOR
• 32 million CICS transactions/day = 7 million
user "enters"
• DB2 – 1 subsystem
• IDMS – 5 regions, 15 million run units/day
• RRDF replication in DB2 and IDMS to Tempe
Responsibilities
• Assure system performance and scalability.
• Provide capacity planning support for
purchasing decisions.
• Tune the mainframe hardware "till the
wheels come off", then buy capacity.
• Hotline, war room participation.
• Performance Testing.
Early Morning "System Checks"
• Check system "barometers" from yesterday
• Check performance graphs and reports
• CICS transactions – Volume, CPU, Response
• LPAR CPU
• Memory
• DASD
• DB2
• IDMS
• Development response time – TSO, compiles
Houston, we have a problem!
• Go into detective mode
• Start at high level, look at service classes
within LPAR for abnormalities
Daily Workload Statistics
For 9:30-10:30 on Wed, Oct 17, 2007
Compared to Prior 4 Wednesdays

Service    CPU Util   CPU Util     Change in   % Change   Real Memory   Real Memory
Class      17-Oct     Prior 4      CPU Util    CPU        Gb            Prior 4
                      Wednesdays                                        Wednesdays
BAT_HOT    0.3        0.3          0           -8         7.6           8.6
BAT_1      1.6        1.5          0.1         5          20.1          15.7
BAT_2      3.6        3.6          0           1          52            126.2
CICS_1     11.8       11.2         0.6         6          1490          1490
CICS_2     33.4       34.5         -1.2        -3         2037          2246
CICS_3     0.6        0.8          -0.2        -27        315.5         352.5
DB2_HI     1.6        1.8          -0.2        -11        6648          6636
DB2_LO     0.6        0.6          -0.1        -11        21.9          25.5
IDMS       11.3       11.9         -0.6        -5         1390          1398
MQSERIES   0.3        0.2          0.1         35         775           418.7
NEWWORK    0          0            0           -44        0
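The comparison against the prior four Wednesdays can be sketched in a few lines of Python. This is only an illustration: the function name `flag_abnormal`, the 10% threshold and the minimum-utilization floor are assumptions, not part of the Jones reports, and the percentages will not match the slide exactly because the slide values are rounded.

```python
# Flag service classes whose CPU utilization moved noticeably against a baseline.
# The threshold and floor defaults below are illustrative assumptions.

def flag_abnormal(today, baseline, pct_threshold=10.0, min_util=1.0):
    """Return {service_class: pct_change} for classes worth a closer look."""
    flagged = {}
    for name, util in today.items():
        base = baseline.get(name)
        if base is None or base == 0:
            continue  # no usable baseline to compare against
        if max(util, base) < min_util:
            continue  # tiny workloads produce large, meaningless percentages
        pct = (util - base) / base * 100.0
        if abs(pct) >= pct_threshold:
            flagged[name] = round(pct, 1)
    return flagged

# CPU utilization values as shown (rounded) in the statistics above.
today = {"BAT_1": 1.6, "CICS_1": 11.8, "CICS_2": 33.4, "DB2_HI": 1.6, "MQSERIES": 0.3}
prior4 = {"BAT_1": 1.5, "CICS_1": 11.2, "CICS_2": 34.5, "DB2_HI": 1.8, "MQSERIES": 0.2}

print(flag_abnormal(today, prior4))
```

The floor matters: MQSERIES shows a 35% change on the slide, but at 0.3% of the CPU it is rarely the place to start digging.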
Dig deeper into details of the workload

Program    SUM CPU   CICS+DB2   CICS+DB2   % Change   DB2 CPU    DB2 CPU    Pct      Resp    Resp Time
Name       Time      CPU Time   CPU Time   CPU Time   Time       Time       Change   Time    Prior 4
           9:30 to   Per Tran   Per Tran              Per Tran   Per Tran   DB2              Weds
           10:30                Prior 4                          Prior 4
                                Weds                             Weds
CMSOC300   884       0.0025     0.0025     1          0.0021     0.0021     1        0.076   0.078
DFHMIRS    424       0.0006     0.0006     -2         0          0                   0.031   0.034
MYDOC016   391       0.0072     0.0075     -3         0.006      0.0062     -3       0.301   0.314
PRTOC515   284       0.0141     0.0145     -3         0.0102     0.0104     -3       0.189   0.21
BRHOC053   190       0.0008     0.0008     1          0.0006     0.0006     1        0.011   0.012
PRTOC630   188       0.0111     0.0116     -4         0.0053     0.0056     -5       0.07    0.077
CMSOC320   187       0.0052     0.0052     1          0.0048     0.0048     1        0.149   0.153
CHSOC120   133       0.0025     0.0025     -2         0.0006     0.0006     -2       0.052   0.057
CMSOC330   95        0.006      0.0059     2          0.0058     0.0057     2        0.182   0.184
BRIOC022   93        0.001      0.001      0          0          0          1        0.018   0.019
IAAOC222   91        0.0156     0.0156     0          0.0116     0.0116     0        0.482   0.485
PRTOC001   84        0.005      0.005      0          0.0019     0.0019     0        0.074   0.08
Once problem is found, find cause
• Run Strobe on CICS or
batch job.
• Ask if program was
changed.
• Was a system parm
changed?
• Lurking problem
surfaced when user
patterns changed
• Did a new system go in?
Recommend change to fix problem
• Code fix
• Parameter change
• SQL or IDMS call change
• Run workload at a different time; smooth peaks
• Redesign database or add index
• Completely shut down workload
• If you don't know how to fix it, ask others
It helps to make performance recommendations if…
• You were a programmer in a previous life
• You were a DBA in a previous life
• You are knowledgeable in MVS, CICS, DASD, etc.
Integrity matters
• Be right, study before you speak
• Go for tuning that gives a payback
• If the workload isn't measurable, put in
mechanisms to measure it before doing the
tuning change
• Do some PR work - Send tuning results to
programmer and their management
Mainframe tools
• SAS
• MXG
• Strobe
• Jones-built performance repositories
• Our performance website
• RMF 3
• Omegamon
Capacity Management’s Prime Objective:
When Do We Run Out?
• When do we need more of a resource?
• How much lead time do you need?
– Approval cycle
– Floor space
– Vendor Delivery Time
– Installation Time
– Acceptable Risk
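The "when do we run out?" question can be sketched as a straight-line trend plus a lead-time subtraction. This is a minimal illustration, not the Jones forecasting method: the monthly peak figures, the 90% threshold and the 6-month lead time are all hypothetical, and a real forecast would also fold in business growth and acceptable risk.

```python
def months_until_threshold(history, threshold):
    """Fit a least-squares line to monthly peak utilization and return how many
    months after the last observation the line reaches `threshold`.
    Returns None when the trend is flat or falling."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope  # month index where the line hits the threshold
    return max(0.0, crossing - (n - 1))

def months_before_ordering(months_left, lead_time_months):
    """Slack remaining before the purchase process has to start."""
    return months_left - lead_time_months

history = [60, 62, 64, 66, 68, 70]  # hypothetical monthly peak CPU %
left = months_until_threshold(history, threshold=90)
print(left, months_before_ordering(left, lead_time_months=6))
```

The lead time here stands for the sum of the items listed above: approval cycle, floor space, vendor delivery and installation.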
Forecasting Processes
• Business Forecasts
• Performance and Workload Data Repositories
• Resource Utilization Trends
• Workload Models
• Resource Utilization Models
• Performance Prediction
• Validate, Assess and Revise
Performance Tuning:
• We continually tune hardware and software, as well
as their interrelationships, to improve the
performance of systems.
• Ownership is shared across multiple departments.
• Highly iterative – never done!
• Why:
– Direct positive impact upon end user experience.
– Cost avoidance through tuning.
Performance Tuning: How do we improve programs?
• Divide and Conquer:
– Which program in a batch job takes the longest?
– Which program uses the most CPU?
– Profile Code
– Tune infrastructure (including
network).
– Prioritize process
Performance Tuning
Identify Opportunities for Improvement – aka
"Hawgs" and "Dawgs".
• Which programs are slowest
(Dawgs)?
• Which programs use the most
resources (Hawgs)?
• Which programs are used the
most?
• Business criticality: How
important are they to the business?
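As a rough illustration of the Hawgs and Dawgs idea, the sketch below ranks a handful of programs by total CPU time and by response time. The figures are read from the workload detail slide earlier in the deck (units as in that report); the helper name `top` and the choice of the top three are assumptions for the example.

```python
# (program, total CPU time 9:30 to 10:30, average response time)
programs = [
    ("CMSOC300", 884, 0.076),
    ("DFHMIRS", 424, 0.031),
    ("MYDOC016", 391, 0.301),
    ("PRTOC515", 284, 0.189),
    ("IAAOC222", 91, 0.482),
]

def top(rows, key_index, n=3):
    """Names of the n rows with the largest value in column key_index."""
    ranked = sorted(rows, key=lambda r: r[key_index], reverse=True)
    return [r[0] for r in ranked[:n]]

hawgs = top(programs, 1)  # biggest resource consumers
dawgs = top(programs, 2)  # slowest responders
both = [p for p in hawgs if p in dawgs]

print("Hawgs:", hawgs)
print("Dawgs:", dawgs)
print("On both lists:", both)
```

A program that appears on both lists is a natural first candidate, subject to how critical it is to the business.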
Performance Data Repositories
• We maintain many performance data repositories –
these tend to be collections of statistics, not detail
data.
• For example, we will not retain CICS transaction
detail, but we will calculate counts of transactions by
region by transaction name as well as average,
maximum and percentile statistics for a variety of
variables and intervals.
• SAS is our primary tool.
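The repositories themselves are built in SAS. Purely to illustrate the shape of the summarization, here is a Python sketch that reduces detail records to counts, average, maximum and a percentile by region and transaction name. The region names, transaction names and response times are made up, and the nearest-rank percentile is one of several possible definitions.

```python
from collections import defaultdict
import math

def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]

def summarize(records, pct=95):
    """Collapse (region, tran, response_time) detail into per-group statistics."""
    groups = defaultdict(list)
    for region, tran, resp in records:
        groups[(region, tran)].append(resp)
    summary = {}
    for key, values in groups.items():
        values.sort()
        summary[key] = {
            "count": len(values),
            "avg": sum(values) / len(values),
            "max": values[-1],
            "p%d" % pct: percentile(values, pct),
        }
    return summary

# Hypothetical detail records: (CICS region, transaction name, response time)
detail = [
    ("AOR1", "TRNA", 0.10),
    ("AOR1", "TRNA", 0.30),
    ("AOR1", "TRNA", 0.20),
    ("AOR2", "TRNB", 0.50),
]

for key, stats in summarize(detail).items():
    print(key, stats)
```

Only the summary rows would be kept; the detail records are discarded once the statistics are computed.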
Performance Data Repositories: Data Sources
• CICS – by day, by tran
• DASD Type 74 – by day, by LPAR, by VOLSER
• Jones application instrumentation
• MVS level – by day, by LPAR
• IDMS – by day, by program
• DB2 – by day, by tran
• Service and report classes – by day, by service class
• PROC SUMMARY, PROC APPEND
Business Metrics and Workloads
• Business Metrics typically use different time frames
than workload metrics.
• Business doesn’t forecast in terms of megabytes of
DASD, CPU seconds used, interactive sessions,
concurrent users or paging rates.
• They refer to branches, IRs, customers, trades,
purchases, $$$, payments, visits, exorbitant cost of
IT,…
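One way to bridge the two vocabularies is to express a workload metric per unit of a business metric and then apply the business forecast. The sketch below uses the rounded figures from the opening slides (32 million CICS transactions a day, about 10,000 branches); the 11,000-branch forecast is hypothetical, and the assumption that transactions scale linearly with branches is a simplification.

```python
def per_unit(workload_total, business_units):
    """Workload per business unit, e.g. CICS transactions per branch per day."""
    return workload_total / business_units

def project(workload_per_unit, forecast_units):
    """Projected workload, assuming it scales linearly with the business metric."""
    return workload_per_unit * forecast_units

tx_per_branch = per_unit(32_000_000, 10_000)   # rounded figures from the slides
projected_tx = project(tx_per_branch, 11_000)  # 11,000 branches is a hypothetical forecast

print(tx_per_branch, projected_tx)
```

The same pattern applies to any pairing the business does forecast in, such as trades or customers, provided the per-unit rate is measured rather than guessed.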
Loved Ones: Sorry, all apps are not equal
• What is the business importance of
the application / workload?
• If there are diverse workloads on a
system it is necessary to prioritize
the work to ensure that the work is
processed in an order that reflects its
business priority.
• To understand priorities you have
to understand the business.
• Capacity planning activities should
also ensure that when work is
constrained, the highest priority work
is favored.
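The idea of favoring the highest priority work under constraint can be shown with a strict-priority allocation. This is a deliberately simplified sketch: the workload names, priorities and demands are hypothetical, and a real workload manager uses goals and importance levels rather than a single ordered pass.

```python
def allocate(capacity, workloads):
    """Grant capacity in priority order (1 = most important).

    workloads: list of (name, priority, demand) tuples.
    Returns {name: granted}; lower priority work absorbs any shortfall.
    """
    grants = {}
    remaining = capacity
    for name, _priority, demand in sorted(workloads, key=lambda w: w[1]):
        grant = min(demand, remaining)
        grants[name] = grant
        remaining -= grant
    return grants

# Hypothetical demands, in percent of one machine.
workloads = [("batch", 3, 40), ("cics_online", 1, 50), ("db2_reports", 2, 30)]

print(allocate(100, workloads))
```

With 120 units of demand against 100 of capacity, the online work and reports are served in full and batch takes the whole shortfall.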
Performance testing
• Jones has a clone environment of production
• Use the LoadRunner tool to generate transactions
• Think time is adjustable
• A few hundred users is usually enough
• All major system enhancements are load tested
Load Testing: Objectives
• Is End User Performance acceptable?
• Will the introduction of these new features threaten the
health of other applications?
• How does response & resource utilization compare to
current production levels?
• Reproduce and troubleshoot production problems.
• Will we need to add capacity?
• In stress testing we measure response times at production peak
load and 5x production peak.
• Often identify 'Break Points' to watch for in production.
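A break point can be thought of as the lowest load level at which response time exceeds what users will accept. The sketch below finds it from stress test results. The load multiples, response times and the 1-second limit are hypothetical, not measured Jones figures.

```python
def find_break_point(results, max_response):
    """Return the lowest load multiple whose response time exceeds max_response.

    results: list of (load_multiple, avg_response_seconds) from a stress test.
    Returns None if every tested load stays within the limit.
    """
    for load, response in sorted(results):
        if response > max_response:
            return load
    return None

# Hypothetical results from 1x to 5x production peak.
results = [(1, 0.20), (2, 0.25), (3, 0.40), (4, 1.10), (5, 3.00)]

print(find_break_point(results, max_response=1.0))
```

The load level just below the break point is the one to watch for in production monitoring.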
Interaction with Availability
• A badly performing application is
effectively the same as the application
being unavailable.
• Capacity and Availability Management
share common goals / tools and
complement each other.
• Capacity Management needs to be aware
of Availability techniques deployed, such
as mirroring, load balancers or clustering,
in order to plan accurately for Capacity.
Questions: