Performance Tuning Mainframe Applications at Westpac


Performance Tuning Mainframe Applications
“It’s Not So Hard”
Tony Shediak
Ampdev Pty Ltd
[email protected]
Compuware User Conference 2007
Why is Performance “Painful”?
• Significant lack of mainframe skills – and dropping. Performance and competence go together.
• Little focus (and budget) on mainframe training. “Management believes every mainframe programmer is automatically competent in COBOL” – it’s only COBOL, right? What about VSAM, CICS, DB2, IMS etc.? If it is on your resume then surely you must know it... How many programmers can read a DUMP?
• Lack of performance disciplines, standards and tools usage, and more importantly – their ENFORCEMENT. Most programmers will NOT care unless you MAKE THEM CARE.
• All the focus is on cost-cutting, functionality, change control, audit compliance, processes, processes and more processes. But we are not stopping the rubbish getting into Production.
The COBOL Experiment
• Set up an anonymous COBOL skills test
• Large organisation (still a client, can’t name them - before 6 beers)
• 150+ COBOL programmers
• Out of 100%:
  • Lowest mark: 4%
  • Highest mark: 93%
  • Average mark: 28%
• Weakest areas – mainframe fundamentals, data types, and Indexes vs Subscripts
• Remediation – assign and train a COBOL practice lead to raise the skill level by running an internal half-day COBOL skills workshop and a half-day COBOL dumps workshop
The Usual Suspects
• Inefficient use of the programming language’s data types
• Data conversions caused by mixing data types unnecessarily
• Inefficient compiler options
• Inefficient initialisation of large structures/groups
• Over-use of built-in functions or language constructs that generate subroutine calls
• Inadequate VSAM buffering for the required function
• Long-running jobs processing several large databases randomly in a single step
• Inefficient Date/Time processing
• Using SQL to process tables like files, record by record
• Using overly complex or inefficient SQL
• Over-qualifying IMS DL/I calls
• Inefficient file block size for the device
• PL/I ONKEY condition – very expensive since Language Environment, because of the condition-handling architecture employed by LE. A COBOL “key not found” condition is about 7 times faster than a PL/I one.
• Re-reading small files/databases/reference data over and over rather than loading them into program storage. Don’t be afraid to use storage – Enterprise COBOL 3.4+ has extended the WORKING-STORAGE limit to 134MB.
• Not using the most optimal utility for the job, e.g. SORT COPY vs REPRO.
Programming Efficiently
• Understand the data types:
  • Binary – integers, loop control, subscripts etc. - fast
  • Packed decimal – fractions, money
  • Floating point – large range of numbers
• Subscripting tables/arrays – if a subscript is not binary then it will be converted to binary whether you like it or not
• COBOL Indexes – optimised sequential array processing. But how many programmers actually understand this?
• Use optimal compiler options as much as possible
• Utilise the “LIST” compile option and browse the assembler
• Avoid heavy INITIALIZE, as it initialises one element at a time
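The data-type guidance above can be sketched in a few lines of COBOL. This is a minimal illustration, not code from the deck; all data names are made up, and the data and procedure fragments are shown together for brevity:

```cobol
      * Illustrative only - all data names are made up.
      * Display (zoned) subscript: the compiler must convert it
      * to binary (PACK + CVB) on every table reference.
       01  SUB-DISPLAY        PIC 9(3).
      * Binary subscript: used directly, no conversion.
       01  SUB-BINARY         PIC S9(8) COMP.
      * Packed decimal: the natural choice for money/fractions.
       01  WS-AMOUNT          PIC S9(9)V99 COMP-3.
       01  WS-TABLE.
           05  WS-ENTRY       PIC X(10) OCCURS 100 TIMES
                              INDEXED BY WS-IDX.
      * Sequential processing via an index avoids recomputing
      * the element offset on each reference.
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 100
               MOVE SPACES TO WS-ENTRY (WS-IDX)
           END-PERFORM
```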
Compile with “LIST”
• Utilise the “LIST” compile option, browse the assembler, and look for CVB, CVD, PACK, BALR etc.
E.g. find all occurrences of CVB:

COMMAND INPUT ===> F CVB WORD ALL
.
.
008119       MOVE
  002CB0  F272 A9B8 34F4   PACK  2488(8,10),1268(3,3)   TS2=0
  002CB6  960F A9BF        OI    2495(10),X'0F'         TS2=7
  002CBA  4F60 A9B8        CVB   6,2488(0,10)           TS2=0
  002CBE  4C60 53C4        MH    6,964(0,5)             PGMLIT AT +44
  002CC2  1A62             AR    6,2
  002CC4  D202 6CBD 4EEF   MVC   3261(3,6),3823(4)      LKG6K-CNTYCODE()
Built-in Functions
• Use them wisely. Check the “LIST” compile and see if a subroutine call is generated, e.g. IGZCSTG for STRING and IGZCIN1 for INSPECT.
• In a lot of cases the “do it yourself” code is simple and far more efficient.

      *
      * Concatenate AAA (up to but not including the first
      * blank) with all of BBB into DDD
      *
           STRING AAA DELIMITED SPACE
                  BBB DELIMITED SIZE
                  INTO DDD.
      * STRING generates a subroutine call because of the
      * search for the space.

      * Do it yourself - no subroutine call; this code is
      * about 65% more efficient than using STRING:
           PERFORM VARYING I FROM 1 BY 1
                   UNTIL I > LENGTH OF AAA
                      OR AAA(I:1) = SPACE
           END-PERFORM.
           COMPUTE LEN-AAA = I - 1
           MOVE AAA(1:LEN-AAA) TO DDD(1:LEN-AAA).
           MOVE BBB TO DDD(LEN-AAA + 1:LENGTH OF BBB).
Built-in Functions
• But sometimes the built-in function can be more efficient. INSPECT CONVERTING with BOTH the 2nd and 3rd arguments as constants will generate a TR (Translate) machine instruction.

      *
      * Change ALL '*' to SPACE and leave everything else as is
      *
           INSPECT AAA CONVERTING '*' TO SPACE.
      * This code is about 90% more efficient than
      * "do it yourself".

      * Do it yourself:
           PERFORM VARYING I FROM 1 BY 1
                   UNTIL I > LENGTH OF AAA
               IF AAA(I : 1) = '*'
                   MOVE SPACE TO AAA(I : 1)
               END-IF
           END-PERFORM.
VSAM Buffering
• NSR – good for sequential access
  • Read ahead
  • One set of buffers per file
• LSR – good for random access
  • No read ahead
  • Buffers can be shared by several files
• SMB – System Managed Buffering (good Redbook: “VSAM Demystified”)
  • Enabled by SMS DATACLAS
  • Allocates NSR or LSR buffers depending on how the file is opened. Watch out for DYNAMIC opens (direct vs sequential?)
  • Makes JCL simple
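One way to pick up SMB without any buffer tuning in the program is the AMP route mentioned in Example 5 later in the deck. A hedged sketch (the dataset name is made up; the file must be SMS-managed and extended format for SMB to apply):

```jcl
//* Illustrative only - dataset name is made up.
//* Force Direct Optimized (LSR-style) buffering for a KSDS
//* that is processed randomly, without changing the program:
//ACCTFILE DD DSN=PROD.ACCOUNT.KSDS,DISP=SHR,
//            AMP=('ACCBIAS=DO')
```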
VSAM Buffering – Use IAM
• IAM (Innovation Access Method) is a 3rd-party product that transparently intercepts VSAM I/O and uses its own optimised access method with its own internal data structures.
• Default buffering more than caters for most processing requirements, with little (if any) tweaking required for most jobs.
• JCL is kept simple and for the most part unchanged.
• Significant CPU savings (30%+) as installed with all default settings.
IMS Considerations
• Are your programmers well skilled/trained in IMS application programming? This is the first hurdle.
• Understand your data before you do anything else.
• Over-qualifying SSAs is expensive – a simple test program shows an 18% CPU reduction from minimal qualification.
• Avoid single-step processing with heavy random access to several large databases. It is best to split the processing into several steps (extract – sort – process) per large DB.
• When processing HDAM/DEDB databases it is most efficient if the driving input is sorted in the same physical sequence as the database – RAPSORT.
• Load heavily hit reference data into IMS MPRs.
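To illustrate the SSA point, here is a minimal sketch of a qualified GU call. This is not code from the deck; the segment, field, and PCB names are all made up:

```cobol
      * Illustrative only - all names are made up.
      * Minimally qualified SSA: qualify only the key field
      * that actually drives the navigation.  Adding extra '&'
      * qualifications at every level is what makes a call
      * "over-qualified" and expensive to evaluate.
       01  SSA-ACCT.
           05  FILLER    PIC X(9)  VALUE 'ACCTSEG ('.
           05  FILLER    PIC X(10) VALUE 'ACCTNO   ='.
           05  SSA-KEY   PIC X(8).
           05  FILLER    PIC X     VALUE ')'.

           MOVE ACCT-NO TO SSA-KEY
           CALL 'CBLTDLI' USING GU-FUNC, ACCT-PCB,
                                IO-AREA, SSA-ACCT
```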
CICS Considerations
• Use dynamic CALL/FETCH of subroutines rather than CICS LINK (as LINK creates an LE enclave). Enterprise PL/I has removed the FETCH restrictions.
• Use THREADSAFE programs with DB2 – can save 5 to 15% CPU by minimising TCB switching. But do your homework – read the Redbook “Threadsafe Considerations for CICS”.
• Only turn CICS trace on when you need it. Otherwise turn it off – saves you about 3%.
• Check your STROBE report for LSR buffer hits and LE Heap/Stack allocation. There is more storage available these days, so why not use it – bump up the default LE parms if you need to.
• Use VSAM data tables to reduce I/O.
DB2 Considerations
• Understand your data before you do anything else.
• Watch out for the “record-oriented” SQL approach:

    Open Cursor A
    For every A row
        Open Cursor B
        For every B row
            ....
        End
        Close Cursor B
    End

• Simple can be effective – a tablespace scan is very fast as long as you do one pass.
• Learn to use EXPLAIN – even at a basic level you will pick up easy-fix issues.
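The record-oriented pattern above is usually replaced by a single join cursor, which is exactly the fix applied in Example 4 later in the deck. A hedged sketch with made-up table and column names:

```sql
-- Illustrative only - table and column names are made up.
-- Instead of opening cursor B once per A row, declare one
-- join cursor and let DB2 do the matching in a single pass.
DECLARE C1 CURSOR FOR
  SELECT A.ACCT_NO, A.ACCT_NAME, B.RATE_PCT
  FROM   ACCOUNT A
         LEFT OUTER JOIN RATE B
           ON B.ACCT_NO = A.ACCT_NO
  ORDER BY A.ACCT_NO
```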
Date/Time Processing
• Use COBOL/PL/I functions instead of DB2.
• DB2 Date/Time arithmetic is expensive – so use it wisely.
• LE Date/Time arithmetic is better than DB2 but still expensive.
• In-house written functions for Date/Time validation and arithmetic are by far the best performers.
• When developing in-house routines:
  • Utilise internal tables as much as possible to store constant information rather than deriving it each time. For example, a leap-year indicator can be stored rather than calculated each time – storage is not so much an issue these days.
  • Make your routine “reducible” – i.e. if the input parameters are exactly the same as the last invocation then return the last saved output parameters.
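A minimal sketch of a “reducible” routine (all names and the parameter layout are made up; the LINKAGE SECTION is omitted for brevity):

```cobol
      * Illustrative only - names are made up.
       WORKING-STORAGE SECTION.
       01  WS-LAST-INPUT-DATE  PIC X(8)  VALUE LOW-VALUES.
       01  WS-LAST-DAY-NUM     PIC S9(8) COMP.

       PROCEDURE DIVISION USING LS-INPUT-DATE LS-DAY-NUM.
      * Reducible: same input as last time means we can return
      * the saved answer with no recomputation at all.
           IF LS-INPUT-DATE = WS-LAST-INPUT-DATE
               MOVE WS-LAST-DAY-NUM TO LS-DAY-NUM
               GOBACK
           END-IF
      *    ... full date conversion done only when needed ...
           MOVE LS-INPUT-DATE TO WS-LAST-INPUT-DATE
           MOVE LS-DAY-NUM    TO WS-LAST-DAY-NUM
           GOBACK.
```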
Example 1 – Compiler Options
Description: Subroutine performing name/address abbreviation
Language: COBOL
Performance problems identified: Using PIC 999 display data for arithmetic operations and array subscripts. Compiled with SSRANGE and TRUNC(BIN) - pre COBOL V2.2
How identified: Compile listing and STROBE
Tuning applied: Change all display data to BINARY, as all are integers. Compile with NOSSRANGE and TRUNC(OPT)
Effort required: 4 hours
Performance improvement: 70% CPU reduction
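The change in Example 1 amounts to something like the following sketch (the data name is made up; the options are those named in the slide):

```cobol
      * Before: zoned-decimal item used for arithmetic and
      * subscripting - forces conversions on every reference.
      *    01  WS-COUNT   PIC 999.
      * After: binary, since the value is always an integer.
       01  WS-COUNT   PIC S9(4) COMP.
      * And compile with the cheaper options:
       CBL NOSSRANGE,TRUNC(OPT)
```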
Example 2 – Lots of Logical I/O
Description: Program performing account range validation via a VSAM(IAM) KSDS by checking the range on each record against all subsequent records
Language: PL/I
Performance problems identified: Program issuing over 300M logical I/Os even though the file contains only 30,000 records
How identified: IAM I/O report
Tuning applied: Load the file into an array allocated above the 16M line to eliminate further I/Os. Rework the checking algorithm to eliminate multiple passes through each entry
Effort required: 3 days
Performance improvement: 99.97% CPU reduction. CPU secs dropped from 2000 to 0.5
Example 3 – High Hit Static Reference Data
Description: Expensive CICS tran used to create a web drop-down list
Language: COBOL
Performance problems identified: DB2 table containing relatively static reference data has a very high hit rate
How identified: STROBE report
Tuning applied: Load the data into the CICS region using GETMAIN SHARED and refresh every hour. Use ENQ/DEQ to serialise updates to the storage, to allow the program to run as THREADSAFE
Effort required: 2 days
Performance improvement: 99% CPU reduction
Example 4 – Record Oriented SQL
Description: DB2 program performing rate reporting
Language: PL/I
Performance problems identified: SQL used in a record-oriented approach, e.g.

    Open Cursor A
    For every A row
        Open Cursor B
        For every B row
            Open Cursor C
            ...etc
        End
        Close Cursor B
    End
    Close Cursor A

How identified: STROBE report and eyeballing the source
Tuning applied: Rewrite the SQL to utilise table JOINs (specifically a LEFT OUTER JOIN in this case) and create and process a single cursor, so DB2 does the work once
Effort required: 2 days
Performance improvement: Elapsed time dropped from 12 hours to 3 mins
Example 5 – VSAM Buffering and SMB
Description: Program performing a specialised data extract across VSAM files
Language: COBOL
Performance problems identified: Using default buffering (2 data and 1 index). Using NSR but most of the processing is random
How identified: STROBE report
Tuning applied: Change the file open mode from DYNAMIC to RANDOM. Enable SMB (System Managed Buffering) by adding a DATACLAS to the DEFINE CLUSTER and reorganising to allow the SMB DATACLAS to apply. Because the file mode is random, SMB will use LSR buffering. Note that alternatively we could have left the file open mode as DYNAMIC and added AMP='ACCBIAS=DO' to the JCL to force LSR
Effort required: 1 hour
Performance improvement: 86% CPU reduction. 95% elapsed time reduction
Example 6 – Compiler Generated Subroutine Calls
Description: Subroutine performing name and address compression into a cross-reference key
Language: COBOL
Performance problems identified: Using complex INSPECT and STRING extensively
How identified: STROBE report and eyeballing the source code
Tuning applied: Total rewrite of the code, eliminating ALL STRING and INSPECT functions and replacing them with simple iterative search loops and sub-string manipulations
Effort required: 6 days
Performance improvement: 75% CPU reduction
Example 7 – Data Conversions
Description: Program performing a specialised data extract
Language: PL/I
Performance problems identified: Many compiler-generated subroutine calls due to manipulation of UNALIGNED bit strings in a STRUCTURE, and various data conversions caused by mixing data types
How identified: STROBE report and compile listing
Tuning applied: Changed all unaligned bit strings within STRUCTUREs to aligned. Eliminated all other data conversions requiring subroutine calls. 70 compiler-generated subroutine calls were eliminated from the object code
Effort required: 1 day
Performance improvement: 65% CPU reduction
Example 8 – Bad SQL - 1
Description: DB2 program performing an online query
Language: COBOL
Performance problems identified: SQL NOT utilising the table index in the join because of mismatched data types, hence causing a tablespace scan:

    SELECT P.PROD_CD
           .
           .
    FROM  (SELECT A.PARAMETER_NUM_WHLE AS PROD_CD
           .
           .
          ) AS P
          INNER JOIN P.SD700T00 AS S
             ON S.SD700_PRODUCT_CODE     <-- DEC(4)
              = P.PROD_CD                <-- DEC(15)

How identified: STROBE report and EXPLAIN or ISTROBE
Example 8 – Bad SQL - 2
Tuning applied: Use the CAST function to convert to the correct data type:

    SELECT P.PROD_CD
           .
           .
    FROM  (SELECT CAST(A.PARAMETER_NUM_WHLE AS DECIMAL(4))
                  AS PROD_CD
           .
           .
          ) AS P
          INNER JOIN P.SD700T00 AS S
             ON S.SD700_PRODUCT_CODE = P.PROD_CD

Now DB2 uses the index on the SD700T00 table. The tablespace scan is gone
Effort required: 1 hour
Performance improvement: 50% CPU reduction
Example 9 – Bad SQL - 1
Description: DB2 program performing a batch query
Language: COBOL
Performance problems identified: SQL performing an entire index scan and a minor sort unnecessarily:

    SELECT DISTINCT 1         <-- Sort Unique (DISTINCT) to
    FROM   LM135T00               eliminate multiple rows
    WHERE  LM135_RACFID = :LM135-RACFID
       OR  SUBSTR(LM135_RACFID,2,7) = :LM135-RACFID

Index scan with NO matching columns. But silly to use SUBSTR(..2,7), because the data tells us that the first character can only be a ‘*’ anyway; e.g. we are looking for ‘X123456’ OR ‘*X123456’
How identified: STROBE report and EXPLAIN or ISTROBE
Example 9 – Bad SQL - 2
Tuning applied: Understand the data and hence eliminate the index scan by searching for only the valid combinations:

    ASTER-LM135-RACFID = '*' || LM135-RACFID

    SELECT 1
    FROM   LM135T00
    WHERE  LM135_RACFID IN (:LM135-RACFID,
                            :ASTER-LM135-RACFID)
    FETCH FIRST 1 ROW ONLY

FETCH FIRST 1 ROW ONLY removes the problem of multiple rows returned, hence we don’t need the DISTINCT; e.g. we are still looking for ‘X123456’ OR ‘*X123456’
Effort required: 2 hours
Performance improvement: 98% CPU reduction
Example 10 – Initialising a Large Table
Description: Program performing a data extract
Language: COBOL
Performance problems identified: High-volume initialisation of a large table
How identified: STROBE report and compile listing
Tuning applied: Removed the initialisation, as it was not really required - the program was already keeping track of the number of elements via a counter anyway
Effort required: 2 hours
Performance improvement: 30% CPU reduction
The AAPT Story – From 650 to 400 MIPS in 15 Months
• 35 initiatives implemented
• Mostly bad SQL or COBOL or both. Lots of code changed
• Some quick wins – package REBINDs, modifying or adding an index, using IEBGENER (SORT COPY) instead of REPRO, changing job schedules to run less often, and fixing a web front-end bug that initiated a CICS tran too many times
• CICS THREADSAFE implemented, with 15% CPU savings across the region. Programs were targeted by heavy SQL execution
• Upgraded DASD subsystem significantly helped I/O and gave more room to reduce MIPS
• Test smart vs test hard, e.g. if only the SQL was changed then only the SQL was tested
• Strong senior management support was given
Make Performance Part of the Culture
• Training, training and more training (can be internal)
• Establish mentoring programs
• Implement practice leadership for your key technical areas, e.g. COBOL, DB2, CICS, VSAM/IAM, IMS etc. These people should:
  • Own and set standards - and have the authority to enforce them
  • Give regular technical updates/presentations
  • Consult to the apps teams
  • Investigate new features/versions and their benefits
  • Be the ultimate authority in that area and have full management support – otherwise it is pointless
Make Performance Part of the Culture
• Establish an internal Performance Team – ideally the practice leads would be in this team. Responsibilities:
  • Regularly monitor mainframe health using STROBE, SMF reports etc. and check out new software & OS features
  • Work with the apps teams to fix issues
  • Constantly look for performance opportunities
• Implement and enforce Performance Management as part of the software development/maintenance methodology and processes. E.g. a STROBE report MUST be provided for new/modified programs as a deliverable before approval to Production.
  - Criteria can be set to exclude low-volume, inexpensive trans and trivially quick, low-frequency batch jobs. BUT remember: the Performance Team is watching