CPExpert2010

Transcript CPExpert2010

An Expert System designed to evaluate IBM z/OS systems

©Copyright 1998-2010, Computer Management Sciences, Inc., Alexandria, VA www.cpexpert.com

Product Overview

Helps analyze performance of z/OS systems.

Written in SAS (only SAS/BASE is required).

Runs as a batch job on mainframe (or on PC).

Processes data in a standard performance data base (either MXG, SAS/ITRM, or MICS).

Produces narrative reports showing results from analysis!

Product is updated every six months

45-day trial is available (see license agreement for details).


Components Delivered

SRM Component*      March 1991

TSO Component*      April 1991

MVS Component*      June 1991

DASD Component      October 1991

CICS Component      May 1992

WLM Component       April 1995

DB2 Component       October 1999

WMQ Component       June 2004

* These legacy components apply only to Compatibility Mode


Product Documentation

Each component has an extensive User Manual, available in hard copy, on CD, and in web-enabled form

Describes the likely impact of each finding

Discusses the performance issues associated with each finding

Suggests ways to improve performance and describes alternative solutions

Provides specific references to IBM or other documents relating to the findings

More than 4,000 pages for all components


WLM Component

Checks for problems in service definition

Identifies reasons performance goals were missed

Analyzes general system problems:

• Coupling facility/XCF

• Paging subsystem

• System logger

• WLM-managed initiators

• Excessive CPU use by SYSTEM or SYSSTC

• IFA/zAAP, zIIP, and IOP/SAP processors

• PR/SM, LPAR, and HiperDispatch problems

• Intelligent Resource Director (IRD) problems


WLM Component - sample report

RULE WLM103: SERVICE CLASS DID NOT ACHIEVE VELOCITY GOAL

DB2HIGH (Period 1): Service class did not achieve its velocity goal during the measurement intervals shown below. The velocity goal was 50% execution velocity, with an importance level of 2. The '% USING' and '% TOTAL DELAY' percentages are computed as a function of the average address space ACTIVE time. The 'PRIMARY,SECONDARY CAUSES OF DELAY' are computed as a function of the execution delay samples on the local system.

                       ------LOCAL SYSTEM-------
                          %   % TOTAL  EXEC  PERF  PLEX  PRIMARY,SECONDARY
MEASUREMENT INTERVAL   USING   DELAY   VELOC INDX   PI   CAUSES OF DELAY
21:15-21:30,08SEP1998   16.6    83.4    17%  3.02  2.36  DASD DELAY(99%)

RULE WLM361: NON-PAGING DASD I/O ACTIVITY CAUSED SIGNIFICANT DELAYS

DB2HIGH (Period 1): A significant part of the delay to the service class can be attributed to non-paging DASD I/O delay. The data below shows intervals when non-paging DASD delay caused DB2HIGH to miss its performance goal:

                       AVG DASD  AVG DASD   --AVERAGE DASD I/O TIMES--
MEASUREMENT INTERVAL   I/O RATE  USING/SEC  RESP   WAIT   DISC   CONN
21:15-21:30,08SEP1998     31       1.405    0.010  0.003  0.004  0.002
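The arithmetic behind the WLM103 figures can be sketched in a few lines of Python (the function names are illustrative, not part of CPExpert): execution velocity is using/(using + delay), and for a velocity goal the performance index is goal divided by achieved velocity.

```python
def execution_velocity(using_pct: float, delay_pct: float) -> float:
    """Execution velocity = using / (using + delay), expressed as a percent."""
    return 100.0 * using_pct / (using_pct + delay_pct)

def velocity_perf_index(goal_velocity: float, actual_velocity: float) -> float:
    """For velocity goals, PI = goal / achieved; PI > 1.0 means the goal was missed."""
    return goal_velocity / actual_velocity

# Figures from the WLM103 sample: 16.6% using, 83.4% delay, 50% velocity goal
vel = execution_velocity(16.6, 83.4)   # the report rounds this to 17%
pi = velocity_perf_index(50.0, vel)    # close to the reported PI of 3.02
```

With the sample's using/delay percentages, the computed velocity rounds to the 17% shown in the report, and the 50% goal yields a performance index of roughly 3, matching the reported 3.02 within rounding.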


WLM Component - sample report

RULE WLM601: TRANSPORT CLASS MAY NEED TO BE SPLIT

You should consider whether the DEFAULT transport class should be split. A large percentage of the messages were too small, while a significant percentage of messages were too large. Storage is wasted when buffers are used by messages that are too small, while unnecessary overhead is incurred when XCF must expand the buffers to fit a message. The CLASSLEN parameter establishes the size of each message buffer, and the CLASSLEN parameter was specified as 16,316 for this transport class.

This finding applies to the following RMF measurement intervals:

                        SENT   SMALL    MESSAGES  MESSAGES   TOTAL
MEASUREMENT INTERVAL     TO   MESSAGES  THAT FIT  TOO BIG   MESSAGES
10:00-10:30,26MAR1996   JA0     4,296         0        57     4,353
12:00-12:30,26MAR1996    Z0     2,653         6       762     3,421
12:30-13:00,26MAR1996    Z0     2,017         0       109     2,126

RULE WLM316: PEAK BLOCKED WORK WAS MORE THAN GUIDANCE

The SMF statistics showed that blocked workload waited longer than specified by the BLWLINTHD parameter in IEAOPTxx. A maximum of more than 2 address spaces and enclaves were concurrently blocked during the interval.

                       BLWLINTHD  BLWLTRPCT  --BLOCKED WORKLOAD--
MEASUREMENT INTERVAL   IN IEAOPT  IN IEAOPT  AVERAGE     PEAK
 7:14- 7:29,01OCT2010      20         5       0.002        63
 7:29- 7:44,01OCT2010      20         5       0.000        22
 7:44- 7:59,01OCT2010      20         5       0.001        49
 7:59- 8:14,01OCT2010      20         5       0.001        63
 8:14- 8:29,01OCT2010      20         5       0.002        62
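The WLM601 message-fit test can be sketched as follows. This is a minimal illustration, not CPExpert's implementation: the small-message cutoff (`small_fraction`) is a hypothetical parameter introduced here for the example, while "too big" simply means longer than CLASSLEN, which forces XCF to expand the buffer.

```python
def transport_class_profile(lengths, classlen, small_fraction=0.25):
    """Profile XCF message lengths against a transport class CLASSLEN.

    Messages longer than CLASSLEN force buffer expansion (overhead);
    messages far shorter than CLASSLEN waste buffer storage.
    small_fraction is an illustrative cutoff, not an XCF parameter.
    """
    total = len(lengths)
    small = sum(1 for n in lengths if n <= classlen * small_fraction)
    too_big = sum(1 for n in lengths if n > classlen)
    return {"total": total, "small": small, "too_big": too_big,
            "pct_small": 100.0 * small / total,
            "pct_too_big": 100.0 * too_big / total}

# Synthetic traffic: 90 tiny messages and 10 oversized ones against
# the CLASSLEN of 16,316 cited in the report
profile = transport_class_profile([100] * 90 + [20000] * 10, classlen=16316)
```

A mix like this (mostly tiny messages plus a tail of oversized ones) is exactly the pattern WLM601 flags as a candidate for splitting the transport class.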


WLM Component - sample report

RULE WLM893: LOGICAL PROCESSORS IN LPAR HAD SKEWED ACCESS TO CAPACITY

LPAR SYSC: HiperDispatch was specified for one or more LPARs in this CPC, and at least one LPAR used one or more high polarity central processors. LPAR SYSC was not operating in HiperDispatch Management Mode, and experienced a skew in its access to physical processors because of the high polarity and medium polarity processors used by LPARs running in HiperDispatch Management Mode.

The information below shows the number of logical processors that were assigned to LPAR SYSC and each logical processor's share of a physical processor. The CPU activity skew is shown for each RMF interval, showing the minimum, average, and maximum CPU busy for the logical processors assigned to LPAR SYSC.

                       LOGICAL CPUS  % PHYSICAL  CPU ACTIVITY SKEW
MEASUREMENT INTERVAL     ASSIGNED    CPU SHARE   MIN   AVG   MAX
13:59-14:14,15SEP2009       2           45.5     28.2  43.3  58.4

RULE WLM537: ZAAP-ELIGIBLE WORK HAD HIGH GOAL IMPORTANCE

Rule WLM530 or Rule WLM535 was produced for this system, indicating that a relatively large amount of zAAP-eligible work was processed on a central processor. One possible cause of this situation is that the zAAP-eligible work was assigned a relatively high Goal Importance (the Goal Importance was either Importance 1 or Importance 2). Please see the discussion in the WLM Component User Manual for an explanation of this issue.


DB2 Component

Analyzes standard DB2 interval statistics

Applies analysis from DB2 Administration Guide and DB2 Performance Guide (with DB2 9.1)

Analyzes DB2 Versions 3, 4, 5, 6, 7, 8, and 9

Evaluates overall DB2 constraints, buffer pools, EDM pool, RID list processing, Lock Manager, Log Manager, DDF, and data sharing

All analysis can be tailored to your site!


DB2 Component

Typical DB2 local buffer constraints

There might be insufficient buffers for work files

There were insufficient buffers for work files in merge passes

Buffer pool was full

Hiperpool read requests failed (pages stolen by system)

Hiperpool write requests failed (expanded storage not available)

Buffer pool page fault rate was high

Data Management Threshold (DMTH) was reached

DWQT and VDWQT might be too large

DWQT, VDWQT, or VPSEQT might be too small
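As a rough sketch of how the DWQT and VDWQT findings relate to buffer pool state: DWQT is a pool-wide percentage threshold of unavailable (updated or in-use) pages that triggers deferred write, while VDWQT applies the same idea per data set. The function below is illustrative only; the defaults shown are common DB2 defaults, and your actual values come from ALTER BUFFERPOOL.

```python
def deferred_write_triggers(unavailable_pct: float,
                            dataset_updated_pages: int,
                            vpsize: int,
                            dwqt: float = 30.0,
                            vdwqt: float = 5.0):
    """Illustrative DWQT/VDWQT check for one DB2 buffer pool.

    DWQT: deferred write is scheduled pool-wide when the percentage of
    unavailable pages exceeds DWQT. VDWQT: deferred write is scheduled
    for one data set when its updated pages exceed VDWQT percent of the
    pool (VPSIZE). Thresholds set too large delay writes into bursts;
    too small causes excessive small writes.
    """
    pool_trigger = unavailable_pct > dwqt
    dataset_trigger = 100.0 * dataset_updated_pages / vpsize > vdwqt
    return pool_trigger, dataset_trigger

# 40% of pages unavailable and one data set holding 100 of 1,000 buffers:
# both thresholds are exceeded, so deferred writes would be scheduled.
both = deferred_write_triggers(40.0, 100, 1000)
neither = deferred_write_triggers(20.0, 30, 1000)
```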


DB2 Component

Typical DB2 I/O prefetch constraints

Sequential prefetch was disabled, buffer shortage

Sequential prefetch was disabled, unavailable read engine

Sequential prefetch not scheduled, prefetch quantity = 0

Synchronous read I/O and sequential prefetch were high

Dynamic sequential prefetch was high (before DB2 8.1)

Synchronous read I/O was high


DB2 Component

Typical DB2 parallel processing constraints

Parallel groups fell back to sequential mode

Parallel groups reduced due to buffer shortage

Prefetch quantity reduced to one-half of normal

Prefetch quantity reduced to one-quarter of normal

Prefetch I/O streams were denied, shortage of buffers

Page requested for a parallel query was unavailable


DB2 Component

Typical DB2 EDM pool constraints

Failures were caused by full EDM pool

Low percent of DBDs found in EDM pool

Low percent of CT Sections found in EDM pool

Low percent of PT Sections found in EDM pool

Size of EDM pool could be reduced

Excessive Class 24 (EDM LRU) latch contention


DB2 Component

Typical DB2 Lock Manager constraints

Work was suspended because of lock conflict

Locks were escalated to shared mode

Locks were escalated to exclusive mode

Lock escalation was not effective

Work was suspended for longer than time-out value

Deadlocks were detected


DB2 Component

Typical DB2 Log Manager constraints

Archive log read allocations exceeded guidance

Archive log write allocations exceeded guidance

Waits were caused by unavailable output log buffer

Log reads satisfied from active log data set

Log reads were satisfied from archive log data set

Failed look-ahead tape mounts


DB2 Component

Typical DB2 Data Sharing constraints

Group buffer pool is too small

Incorrect directory entry/data entry ratio

Directory reclaims resulting in cross-invalidations

Castout processing occurring in “spurts”

Excessive lock contention or false lock contention

GBPCACHE ALL inappropriately specified

GBPCACHE CHANGED inappropriately specified

Conflicts between applications


DB2 Component - sample report

RULE DB2-208: VIRTUAL BUFFER POOL WAS FULL

Buffer Pool 2: A usable buffer could not be located in virtual Buffer Pool 2, because the virtual buffer pool was full. This condition should not normally occur, as there should be ample buffers. You should consider using the -ALTER BUFFERPOOL command to increase the virtual buffer pool size (VPSIZE) for the virtual buffer pool. This situation occurred during the intervals shown below:

                         BUFFERS    NUMBER OF TIMES
MEASUREMENT INTERVAL     ALLOCATED  POOL WAS FULL
10:54-11:24, 15SEP1999       100          12
11:24-11:54, 15SEP1999       100          13

RULE DB2-216: BUFFER POOLS MIGHT BE TOO LARGE

Buffer Pool 1: The page fault rates for read and write I/O indicated that the buffer pools might be too large for the available processor storage. This situation occurred for Buffer Pool 1 during the intervals shown below:

                         BUFFERS    PAGE-IN FOR  PAGE-IN FOR  PAGE
MEASUREMENT INTERVAL     ALLOCATED  READ I/O     WRITE I/O    RATE
11:15-11:45, 16SEP1999    25,000      36,904         195      41.2
11:45-12:15, 16SEP1999    25,000      30,892         563      35.0
12:45-13:15, 16SEP1999    25,000      23,890         170      26.7


DB2 Component - sample report

RULE DB2-230: SEQUENTIAL PREFETCH WAS DISABLED - BUFFER SHORTAGE

Buffer Pool BP1: Sequential prefetch is disabled when there is a buffer shortage, as controlled by the Sequential Prefetch Threshold (SPTH). Ideally, sequential prefetch should not be disabled, since performance is adversely affected. If sequential prefetch is disabled a large number of times, the buffer pool size might be too small. The sequential prefetch threshold was reached for Buffer Pool BP1 during the intervals shown below.

                        BUFFERS    TIMES SEQUENTIAL PREFETCH
MEASUREMENT INTERVAL    ALLOCATED  DISABLED (BUFFER SHORTAGE)
 5:00- 5:15, 15MAY2009   268,000         125  BP1
 5:15- 5:30, 15MAY2009   268,000       1,533  BP1

RULE DB2-234: WRITE ENGINES WERE NOT AVAILABLE FOR ASYNCHRONOUS I/O

Buffer Pool BP13: DB2 has 600 deferred write engines available for asynchronous I/O operations. When all 600 write engines are used, synchronous writes are performed. The application is suspended during synchronous writes, and performance is adversely affected. This situation occurred for Buffer Pool BP13 during the intervals shown below:

                        BUFFERS    TIMES WRITE ENGINES
MEASUREMENT INTERVAL    ALLOCATED  WERE NOT AVAILABLE
 5:45- 6:00, 15MAY2009    12,800          44  BP13


DB2 Component - sample report

RULE DB2-423: DATABASE ACCESS THREAD WAS QUEUED, ZPARM LIMIT WAS REACHED

Database access threads were queued because the ZPARM maximum for active remote threads was reached. You should consider increasing the maximum number of database access threads allowed. This situation occurred during the intervals shown below:

                         DATABASE ACCESS THREADS QUEUED
MEASUREMENT INTERVAL     ZPARM LIMIT REACHED
11:24-11:54, 01OCT2010                9

RULE DB2-512: LOG READS WERE SATISFIED FROM ACTIVE LOG DATA SET

The DB2 Log Manager statistics revealed that more than 25% of the log reads were satisfied from the active log data set. It is preferable that the data be in the output buffer, but this is not always possible with an active DB2 environment. However, if a large percent of reads are satisfied from the active log, you should ensure that the output buffer is as large as possible. This finding occurred during the intervals shown below:

                         TOTAL LOG  LOG READS FROM
MEASUREMENT INTERVAL     READS      ACTIVE LOG DATA SET  PERCENT
14:24-14:54, 01OCT2010    6,554          4,678             71.4
14:54-15:24, 01OCT2010    7,274          3,695             50.8
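The DB2-512 test is straightforward percentage arithmetic against the 25% guidance value; a minimal sketch using the sample report's own numbers (the function name is illustrative):

```python
def active_log_read_pct(total_reads: int, active_log_reads: int) -> float:
    """Percent of DB2 log reads satisfied from the active log data set."""
    return 100.0 * active_log_reads / total_reads

# From the DB2-512 sample: 4,678 of 6,554 log reads came from the active log
pct = active_log_read_pct(6554, 4678)
flagged = pct > 25.0   # exceeds the 25% guidance, so the rule fires
```

This reproduces the 71.4% shown in the first interval of the report.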


DB2 Component - sample report

RULE DB2-601: COUPLING FACILITY READ REQUESTS COULD NOT COMPLETE

Group Buffer Pool 6: Coupling facility read requests could not be completed because of a lack of coupling facility storage resources. This situation occurred for Group Buffer Pool 6 during the intervals shown below:

                         GROUP BUFFER POOL  TIMES CF READ
MEASUREMENT INTERVAL     ALLOCATED SIZE     REQUESTS NOT COMPLETE
11:01-11:31, 14OCT1999         38M                 130

RULE DB2-610: GBPCACHE(NO) OR GBPCACHE NONE MIGHT BE APPROPRIATE

Group Buffer Pool 4: This buffer pool had a very small amount of read activity relative to write activity. Pages read were less than 1% of the pages written. Since so few pages were read from this group buffer pool, you should consider specifying GBPCACHE(NO) for the group buffer pool or specifying GBPCACHE NONE for the page sets using the group buffer pool. This situation occurred for Group Buffer Pool 4 during the intervals shown below:

                         GROUP BUFFER POOL  PAGES  PAGES    READ
MEASUREMENT INTERVAL     ALLOCATED SIZE     READ   WRITTEN  PERCENT
10:34-11:04, 14OCT1999         38M            14    18,268    0.07%


CICS Component

Processes CICS Interval Statistics contained in MXG Performance Data Base (standard SMF 110)

Analyzes all releases of CICS (CICS/ESA, CICS/TS for OS/390, and CICS/TS for z/OS)

Applies most analysis techniques contained in IBM’s CICS Performance Guides

Produces specific suggestions for improving CICS performance


CICS Component (Major areas analyzed)

• Virtual and real storage (MXG/AMXT/TCLASS)

• VSAM and File Control (NSR and LSR pools)

• Database management (DL/I, IMS, DB2)

• Journaling (System and User journals)

• Network and VTAM (RAPOOL, RAMAX)

• CICS Facilities (temp storage, transient data)

• ISC/IRC (MRO, LU6.1, LU6.2 modegroups)

• System logger

• Temporary Storage

• Coupling Facility Data Tables (CFDT)

• CICS-DB2 Interface

• Open TCB pools

• TCP/IP and SSL


CICS Component - sample report

RULE CIC101: CICS REACHED MAXIMUM TASKS TOO OFTEN

The CICS statistics revealed that the number of attached tasks was restricted by the MXT operand, but storage did not appear to be constrained. CPExpert suggests that you consider increasing the MXT value in the System Initialization Table (SIT) for this region.

This finding applies to the following CICS statistics intervals:

                                            TIMES    PEAK     TIME
STATISTICS               MXT   -PEAK TASKS- MAXTASK  MAXTASK  WAITING
COLLECTION TIME  APPLID  VALUE TOTAL  USER  REACHED  QUEUE    MAXTASK
0:00,01OCT2010 CICSIDG.   20     46    20     36        8     0:02:29.0

RULE CIC140: THE NUMBER OF TRANSACTION ERRORS IS HIGH

The CICS statistics revealed that more than 5 transaction errors were related to terminals. These transaction errors may indicate that there is an attempted security breach, there may be problems with the terminal, or perhaps additional operator training is indicated.

This finding applies to the following CICS statistics intervals:

STATISTICS
COLLECTION TIME  APPLID    TERMINAL  NUMBER OF ERRORS
0:00,01OCT2010   CICSPROD    T2M1          348
0:00,01OCT2010   CICSPROD    T2M2           60
0:00,01OCT2010   CICSPROD    T2M6          348
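A minimal sketch of the CIC101 decision, assuming a storage-constraint indicator has already been derived from the region statistics (the function name and parameters are illustrative, not CPExpert's internals):

```python
def cic101_fires(mxt: int, peak_tasks: int, times_at_maxtask: int,
                 storage_constrained: bool) -> bool:
    """CIC101-style check: tasks repeatedly hit the MXT ceiling while
    storage was not the limiting factor, so raising MXT is worth
    considering. If storage *was* constrained, raising MXT alone
    would only move the bottleneck."""
    return (peak_tasks >= mxt
            and times_at_maxtask > 0
            and not storage_constrained)

# From the CIC101 sample: MXT=20, peak user tasks=20, MAXTASK reached 36 times
fires = cic101_fires(20, 20, 36, storage_constrained=False)
ok = cic101_fires(20, 12, 0, storage_constrained=False)
```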


CICS Component - sample report

RULE CIC170: MORE THAN ONE STRING SPECIFIED FOR WRITE-ONLY ESDS FILE

More than one string was specified for a VSAM ESDS file that was used exclusively for write operations. Specifying more than one string can significantly affect performance because of the exclusive control conflict that can occur. If this finding occurs for all normal CICS processing, you should consider specifying only one string in the ESDS file definition.

STATISTICS                                   NUMBER OF
COLLECTION TIME  APPLID  VSAM FILE      WRITE OPERATIONS
0:00,16MAR2010   CICSYA  LNTEMSTR            431,436


CICS Component - sample report

RULE CIC267: INSUFFICIENT SESSIONS MAY HAVE BEEN DEFINED

CPExpert believes that an insufficient number of sessions may have been defined for the CICS DAL1 connection, or the application system could have been issuing ALLOCATE requests too often. The number of ALLOCATE requests returned was greater than the value specified for the ALLOCQ guidance variable in USOURCE(CICGUIDE). CPExpert suggests you consider increasing the number of sessions defined for the connection, or you should increase the ALLOCQ guidance variable to cause CPExpert to signal a potential problem only when you view the problem as serious. For APPC modegroups, this finding applies only to generic ALLOCATE requests.

This finding applies to the following CICS statistics intervals:

STATISTICS                  ALLOCATE REQUESTS
COLLECTION TIME  APPLID     RETURNED TO USERS
10:00,26MAR2008  CICSDTL1          335
11:00,26MAR2008  CICSDTL1           12
12:00,26MAR2008  CICSDTL1           27


CICS Component - sample report

RULE CIC307: FREQUENT LOG STREAM DASD-SHIFTS OCCURRED

CICS75.A075CICS.DFHLOG: More than 1 log stream DASD-shift was initiated for this log stream during the intervals shown below. A DASD-shift event occurs when system logger determines that a log stream must stop writing to one log data set and start writing to a different data set. You normally should allocate sufficiently large log data sets so that a DASD-shift occurs infrequently.

                 ------NUMBER OF DASD LOG SHIFTS------
SMF INTERVAL      DURING INTERVAL    DURING PAST HOUR
14:45,16MAR2010          1                   2

RULE CIC650: CICS EVENT PROCESSING WAS DISABLED IN CICS EVENTBINDING

Event Processing was disabled in EVENTBINDING, with the result that events defined in the EVENTBINDING were not captured by CICS Event Processing. You should investigate the Event Binding to determine whether the Binding should be enabled or disabled for the region.

This finding applies to the following CICS statistics intervals:

STATISTICS
COLLECTION TIME
0:00,12MAR2009
3:00,12MAR2009
6:00,12MAR2009


DASD Component

Processes SMF Type 70-series records to automatically build a model of your I/O configuration.

Identifies performance problems with the devices that have the most potential for improvement.

• PEND delays

• Disconnect delays

• Connect delays

• IOSQ delays

• Shared DASD conflicts

Analyzes SMF Type 42(DS) and Type 64 to identify VSAM performance problems.


DASD Component - sample report

RULE DAS100: VOLUME WITH WORST OVERALL PERFORMANCE

VOLSER DB2327 (device 2A1F) had the worst overall performance during the entire measurement period (10:00, 16FEB2001 to 11:00, 16FEB2001). This volume had an overall average of 56.8 I/O operations per second, was busy processing I/O for an average of 361% of the time, and had I/O operations queued for an average of 1% of the time. Please note that percentages greater than 100% and Average Per Second Delays greater than 1 indicate that multiple I/O operations were concurrently delayed. This can happen, for example, if multiple I/O operations were queued or if multiple I/O operations were PENDing.

The following summarizes significant performance characteristics of VOLSER DB2327:

                       I/O   ---AVERAGE PER SECOND DELAYS---  MAJOR
MEASUREMENT INTERVAL   RATE  RESP   CONN   DISC   PEND   IOSQ  PROBLEM
10:00-10:30,16FEB2001  59.1  1.308  0.316  0.004  0.988  0.000 PEND TIME
10:30-11:00,16FEB2001  57.2  3.792  0.300  0.004  3.483  0.006 PEND TIME
11:00-11:30,16FEB2001  54.2  5.769  0.279  0.004  5.464  0.023 PEND TIME
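DASD response time decomposes as RESP ≈ CONN + DISC + PEND + IOSQ, so identifying the major problem component, as DAS100 does, can be sketched as (function name illustrative):

```python
def major_delay_component(conn: float, disc: float,
                          pend: float, iosq: float) -> str:
    """Return the largest contributor to DASD response time.

    RMF device response time is approximately the sum of connect,
    disconnect, pending, and IOS queue time.
    """
    parts = {"CONN": conn, "DISC": disc, "PEND": pend, "IOSQ": iosq}
    return max(parts, key=parts.get)

# DAS100 sample, 10:00-10:30 interval: 0.316 + 0.004 + 0.988 + 0.000
# accounts for the reported 1.308 RESP, with PEND dominating.
major = major_delay_component(conn=0.316, disc=0.004, pend=0.988, iosq=0.000)
```

For the sample interval this returns "PEND", matching the MAJOR PROBLEM column of the report.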


DASD Component - sample report

RULE DAS130: PEND TIME WAS MAJOR CAUSE OF I/O DELAY

A major cause of the I/O delay with VOLSER DB2327 was PEND time. The average per-second PEND delay for I/O is shown below:

                       PEND   PEND      PEND     PEND    PEND   TOTAL
MEASUREMENT INTERVAL   CHAN   DIR PORT  CONTROL  DEVICE  OTHER  PEND
10:00-10:30,16FEB2001  0.492   0.000     0.000    0.000  0.495  0.988
10:30-11:00,16FEB2001  1.927   0.000     0.000    0.000  1.556  3.483
11:00-11:30,16FEB2001  2.840   0.000     0.000    0.000  2.624  5.464

RULE DAS160: DISCONNECT TIME WAS MAJOR CAUSE OF I/O DELAY

A major cause of the I/O delay with VOLSER DB26380 was DISCONNECT time. DISC time for modern systems is a result of cache read miss operations, potentially back-end staging delay for cache write operations, peer-to-peer remote copy (PPRC) operations, and other miscellaneous reasons.

                       ----CACHE----  --PERCENT--  DASD   CACHE
                                      READ  WRITE   TO      TO
MEASUREMENT INTERVAL   READS  WRITES  HITS  HITS   CACHE  DASD  PPRC  BPCR  ICLR
 8:30- 8:45,22OCT2001  14615    932   19.2  100.0  11825   903     0     0     0
 8:45- 9:00,22OCT2001  14570    921   20.7  100.0  11567   907     0     0     0


DASD Component - sample report

RULE DAS300: PERHAPS SHARED DASD CONFLICTS CAUSED PERFORMANCE PROBLEMS

Accessing conflicts caused by sharing VOLSER DB2700 between systems might have caused performance problems for the device during the measurement intervals shown below. Conflicting systems had the indicated I/O rate, average CONN time per second, average DISC time per second, average PEND time per second, and average RESERVE time to the device. Even moderate CONN, DISC, or RESERVE can cause delays to shared devices.

                       I/O   MAJOR    OTHER   -------OTHER SYSTEM DATA-------
MEASUREMENT INTERVAL   RATE  PROBLEM  SYSTEM  I/O RATE  CONN   DISC   PEND   RESV
 8:30- 8:45,22OCT2001  31.3  QUEUING  SY1        35.0   0.041  0.001  0.455  0.000
                                      SY2        88.2   0.100  0.003  0.714  0.000
                                      SY3       109.0   0.123  0.003  0.723  0.000
                                      TOTAL     232.2   0.264  0.006  1.892  0.000
 8:45- 9:00,22OCT2001  25.7  QUEUING  SY1        46.4   0.054  0.001  0.565  0.000
                                      SY2        98.2   0.112  0.003  0.836  0.000
                                      SY3       119.0   0.136  0.003  0.846  0.000
                                      TOTAL     263.5   0.303  0.007  2.247  0.000


DASD Component - sample report

RULE DAS607: VSAM DATA SET IS CLOSE TO MAXIMUM NUMBER OF EXTENTS

VOLSER: RLS003. More than 225 extents were allocated for the VSAM data sets listed below. The VSAM data sets are approaching the maximum number of extents allowed. The table below shows the number of extents and the primary and secondary space allocation:

                                                              TOTAL    EXTENTS    --ALLOCATIONS--
SMF TIME STAMP   JOB NAME  VSAM DATA SET                      EXTENTS  THIS OPEN  PRIMARY SECONDARY
10:30,11MAR2002  CICS2ABA  RLSADSW.VF01D.DATAENDB.DATA......    229        4      30 CYL   1 CYL

RULE DAS625: NSR WAS USED, BUT LARGE PERCENT OF ACCESS WAS DIRECT

VOLSER: MVS902. Non-Shared Resources (NSR) was specified as the buffering technique for the VSAM data sets below, but more than 75% of the I/O activity was direct access. NSR is not designed for direct access, and many of the advantages of NSR are not available for direct access. You should consider Local Shared Resources (LSR) for the VSAM data sets below (perhaps using System Managed Buffers to facilitate the use of LSR). The I/O RATE is for the time the data set was open. The SMF TIME STAMP and JOB NAME are from the last record for the data set.

                                                                       I/O   OPEN      -ACCESS TYPE (PCT)-
SMF TIME STAMP   JOB NAME  VSAM DATA SET                               RATE  DURATION  SEQUENTIAL  DIRECT
13:19,19SEP2002  NRXX807.  SDPDPA.PK.MVSP.RT.NDMGIX.DATA.............   8.4  0:07:08      0.0       100.0
13:19,19SEP2002  NRXX807.  SDPDPA.PR.MVSP.RT.NDMGIXD.DATA............  11.2  0:06:42      0.0       100.0
13:33,19SEP2002  TSJHM...  SDPDPA.PR.MVSP.RT.NDMRQFDA.DATA...........   0.3  2:21:58      0.0       100.0
13:33,19SEP2002  TSJHM...  SDPDPA.PR.MVSP.RT.NDMRQF.DATA.............   2.8  3:37:53      0.0       100.0
13:33,19SEP2002  TSJHM...  SDPDPA.PK.MVSP.RT.NDMTCF.DATA.............  11.1  6:24:10      0.1        99.9
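The DAS625 test is a percentage of direct versus sequential access compared against the 75% guidance; a minimal sketch (function name illustrative, counts assumed to come from the SMF Type 42/64 data):

```python
def nsr_lsr_check(seq_ios: int, direct_ios: int,
                  direct_threshold_pct: float = 75.0):
    """Flag an NSR-buffered VSAM data set whose access is mostly direct
    as a candidate for LSR buffering (NSR is tuned for sequential access)."""
    total = seq_ios + direct_ios
    direct_pct = 100.0 * direct_ios / total
    return direct_pct, direct_pct > direct_threshold_pct

# A data set with no sequential activity at all, like the 100.0% DIRECT
# entries in the DAS625 sample
direct_pct, consider_lsr = nsr_lsr_check(seq_ios=0, direct_ios=3600)
```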


DASD Component (Application Analysis)

Requires simple modification to MXG or MICS

Modification collects job step data while processing SMF Type 30 (Interval) records

Typically requires less than 10 cylinders

Data is correlated with Type 74 information

CPExpert associates performance problems with specific applications (jobs and job steps)

CPExpert can perform “Loved one” analysis of DASD performance problems


WMQ Component

Analyzes SMF Type 115 statistics, as processed by MXG or MICS and placed into the performance data base.

• MQMLOG - Log manager statistics

• MQMMSGDM - Message/data manager statistics

• MQMBUFER - Buffer manager statistics

• MQMCFMGR - Coupling Facility manager statistics

Type 115 records should be synchronized with the SMF recording interval.

IBM says the overhead to collect this data is negligible.


WMQ Component

Optionally analyzes SMF Type 116 accounting data, as processed by MXG or MICS and placed into the performance data base.

• MQMACCTQ - Thread-level accounting data

• MQMQUEUE - Queue-level accounting data

Type 116 records should be synchronized with the SMF recording interval.

IBM says the overhead to collect accounting data is 5-10%.


WebSphere MQ Typical queue manager problems

• Assignment of queues to page sets

• Assignment of page sets to buffer pools

• Queue manager parameters

• Index characteristics of queues

• Characteristics of messages in queues

• Characteristics of MQ calls

CPExpert analysis uses SMF Type 116 records.


WebSphere MQ Typical buffer manager problems

• Buffer thresholds exceeded for pool

• Buffers assigned per pool (too few/too many)

• Message traffic

• Message characteristics

• Application design

CPExpert analysis uses SMF Type 115 records.


WebSphere MQ Typical log manager problems

• Log buffers assigned

• Active log use characteristics

• Archive log use characteristics

• Tasks backing out

• System paging of log buffers

• Excessive checkpoints taken

CPExpert analysis uses SMF Type 115 records.


WebSphere MQ Typical DB2-interface problems

• Thread delays

• DB2 server processing delays

• Server requests queued

• Server tasks experienced ABENDs

• Deadlocks in DB2

• Maximum request queue depth was too large

CPExpert analysis uses SMF Type 115 records.


WebSphere MQ Typical Shared queue problems

• Structure was full

• Large number of application structures defined

• MINSIZE is less than SIZE for CSQ.ADMIN

• SIZE is more than double MINSIZE

• ALLOWAUTOALT(YES) not specified

• FULLTHRESHOLD value might be incorrect

CPExpert analysis uses SMF Type 115 records and Type 74 (Coupling Facility) records.


WebSphere MQ – sample report

RULE WMQ100: MESSAGES WERE WRITTEN TO PAGE SET ZERO

More than 0 messages were written to Page Set Zero during the intervals shown below. Messages should not be written to Page Set Zero, since serious WebSphere MQ system problems could occur if Page Set Zero should become full. This finding relates to queue SYSTEM.COMMAND.INPUT.

                         MESSAGES WRITTEN
STATISTICS INTERVAL      TO PAGE SET ZERO
13:16-14:45, 28AUG2003         624

RULE WMQ122: DEAD.LETTER QUEUE IS INAPPROPRIATE FOR PAGE SET ZERO

Buffer Pool 0. The DEAD.LETTER queue was assigned to Page Set Zero. A dead-letter queue stores messages that cannot be routed to their correct destinations. If the DEAD.LETTER queue grows large unexpectedly, Page Set Zero can become full, and WebSphere MQ can enter a serious stress condition. You should redefine the DEAD.LETTER queue to a page set other than Page Set Zero. This finding relates to queue SYSTEM.DEAD.LETTER.QUEUE.


WebSphere MQ – sample report

RULE WMQ110: EXPYRINT VALUE IS OFF OR TOO SMALL

Buffer Pool 3. There were more than 25 expired messages skipped when scanning a queue for a specific message. Processing expired messages adds both CPU time and elapsed time to the message processing. With WebSphere MQ 5.3, the EXPYRINT keyword was introduced to allow the queue manager to automatically determine whether queues contained expired messages and to eliminate expired messages at the interval specified by the EXPYRINT value. This finding applies to queue DPS.REPLYTO.RCB.IVR04.

                         GET       BROWSE    EXPIRED MESSAGES
STATISTICS INTERVAL      SPECIFIC  SPECIFIC  PROCESSED
13:41-13:41, 03JUL2003      0         0          313

RULE WMQ320: APPLICATIONS WERE SUSPENDED FOR LOG WRITE BUFFERS

Applications were suspended while in-storage log buffers were being written to the active log. This finding normally means that too few log buffers were assigned. However, the finding could mean that there is an I/O configuration problem and the log buffer writes to the active log are delayed for I/O reasons. This finding applies to the following statistics intervals.

                         NUMBER OF SUSPENSIONS
STATISTICS INTERVAL      WAITING ON OUTPUT BUFFERS
14:19-14:44, 12SEP2003            139


WebSphere MQ – sample report

RULE WMQ201: BUFFER POOL ENCOUNTERED SYNCHRONOUS (5%) THRESHOLD

Buffer Pool 0. This buffer pool encountered the Synchronous Write threshold (fewer than 5% of the pages in the buffer pool were "stealable", or equivalently, more than 95% of the pages were on the Deferred Write queue). While the Synchronous Page Writer is executing, an update to any page causes the page to be written immediately to the page set as a synchronous write operation, rather than being placed on the Deferred Write queue. This situation harms application performance and indicates that the buffer pool is in danger of encountering a Short on Storage condition.

                         BUFFERS    TIMES AT       IMMEDIATE
STATISTICS INTERVAL      ASSIGNED   5% THRESHOLD   WRITES
17:08-17:09, 07OCT2003      1,050             19          19

RULE WMQ205: HIGH I/O RATE TO PAGE SETS WITH SHORT-LIVED MESSAGES

Buffer Pool 0. This buffer pool had short-lived messages assigned.

The total I/O rate (read and write activity) to page sets for the short-lived messages was more than 0.5 pages per second. Writing pages to the page set and subsequently reading the pages back from the page set causes I/O overhead and delay to the application. This finding applies to the following intervals:

                         BUFFERS    PAGES     PAGES   I/O RATE
STATISTICS INTERVAL      ASSIGNED   WRITTEN   READ    WITH DASD
11:32-11:32, 24JUL2006     50,000       101       0        50.5
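The I/O RATE WITH DASD column above is consistent with total pages read plus pages written divided by the interval length in seconds. A worked check in Python (the two-second interval length is inferred from the reported rate, since the 11:32-11:32 timestamp does not show seconds; it is an assumption, not stated in the report):

```python
# Back-of-the-envelope check of the WMQ205 sample row: 101 pages written,
# 0 pages read, and a reported I/O rate of 50.5 pages/second imply an
# interval of roughly two seconds.

def io_rate(pages_written, pages_read, interval_seconds):
    """Total page-set I/O activity per second."""
    return (pages_written + pages_read) / interval_seconds

print(io_rate(101, 0, 2))  # matches the reported 50.5
```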


RULE WMQ300: ARCHIVE LOGS WERE USED FOR BACKOUT

WebSphere MQ applications issued log reads to the archive log file for backout more than 0 times during the WebSphere MQ statistics intervals shown below. Most log read requests should be satisfied from the output buffer or the active log. Using archive logs for backout purposes often indicates that either the active log files were too small or long-running applications were backing out work.

                         NUMBER OF LOG READS
STATISTICS INTERVAL      FROM ARCHIVE LOG
 4:30- 5:00, 12SEP2003                  192

RULE WMQ611: LARGE NUMBER OF APPLICATION STRUCTURES WERE DEFINED

SMF TYPE74 (Structure) statistics showed that more than 5 application structures were defined to a coupling facility. IBM suggests that you should have as few application structures as possible. Having multiple application structures in a coupling facility can degrade performance.

                         WEBSPHERE MQ
COUPLING FACILITY        STRUCTURES DEFINED
CF1                                       8
CF2                                       9
CF3                                       8
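Rule WMQ611's trigger is a simple count threshold per coupling facility. A minimal Python sketch using the sample figures above (the 5-structure guideline comes from the rule text; the data layout and function name are illustrative, since CPExpert's actual logic is written in SAS):

```python
# Illustrative WMQ611-style check: flag coupling facilities with more
# than five WebSphere MQ application structures defined.

def crowded_coupling_facilities(structure_counts, limit=5):
    """structure_counts: dict of coupling facility name -> structure count."""
    return sorted(cf for cf, count in structure_counts.items() if count > limit)

if __name__ == "__main__":
    # Counts from the sample report table above.
    sample = {"CF1": 8, "CF2": 9, "CF3": 8}
    print(crowded_coupling_facilities(sample))  # all three exceed the guideline
```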


CPExpert Release 18.1

(Issued April 2008)

Major enhancements with this update:

Provided support for z10 server

Provided analysis of HiperDispatch problems

Provided new reports to help analysis of DB2 buffer pool problems

Expanded the CPExpert email feature to the DASD Component

Provided additional analysis features for the WebSphere MQ Component


CPExpert Release 18.2

(Issued October 2008)

Major enhancements with this update:

Provided support for z/OS Version 1, Release 10

Provided additional analysis of z/OS performance problems (in WLM Component), including reduced CPU speed caused by cooling unit failure

Provided new reporting of rules based on History information kept by CPExpert (applies to all components except DB2 Component)

Added masking technique to select CICS regions (by region Group), DASD volumes (including SMS Storage Groups), and WebSphere MQ subsystems


CPExpert Release 19.1

(Issued April 2009)

Major enhancements with this update:

Enhanced WLM Component with analysis of more z/OS performance problems, including Enqueue Promoted Dispatching Priority analysis

Projected the amount of zAAP-eligible work that could be offloaded to a zAAP processor, if a zAAP processor were assigned to the LPAR

Provided more analysis of CICS temporary storage in CICS Component

Added Resource Enqueue analysis to DASD Component


CPExpert Release 19.2

(Issued October 2009)

Major enhancements with this update:

Provided support for z/OS Version 1, Release 11

Provided support for CICS/TS Release 4.1

Added analysis of Resource Enqueue contention between different levels of Goal Importance to WLM Component

Added analysis of CICS Event Processing to the CICS Component (applicable to CICS/TS 4.1)

Allowed users to specify narrative descriptions of individual DB2 buffer pools in CPExpert reports


CPExpert Release 20.1

(Issued April 2010)

Major enhancements with this update:

Enhanced WLM Component with analysis of SMF buffer specifications and other SMF performance constraints

Supported analysis of VSAM performance problems when analyzing a MICS performance data base but using MXG TYPE42DS and MXG TYPE64 files

Allowed selection of up to 20 unique DB2 subsystems when analyzing DB2 performance problems, and added logic to handle the case where an installation has the same DB2 subsystem name defined in multiple z/OS images


CPExpert Release 20.2

(Issued October 2010)

Major enhancements with this update:

Provided support for z/OS Version 1, Release 12

Provided support for the zEnterprise System (z196)

Enhanced WLM Component to provide analysis of dropped SMF records and analysis of SMF flood facility (available with z/OS V1R12)

Enhanced WLM Component to provide Management Overview of CPExpert findings, with web-enabled documentation links

Enhanced the WebSphere MQ Component to provide analysis of a non-indexed request/reply-to queue


Components – License fees (site license)

Component                   First Year   Additional year
WLM Component                    7,500             5,000
DB2 Component                    7,500             5,000
CICS Component (see note)        5,000             3,000
WMQ Component                    5,000             3,000
DASD Component                   3,000             1,500

Note: Fees shown for the CICS Component are for analyzing no more than 50 CICS regions.


Summary

The major objective is to share solutions and provide insight into new z/OS features.

CPExpert is updated every six months; support for new versions of z/OS has been available within 30 days after General Availability of the new z/OS release.

CPExpert is offered at a low cost (affordable by all z/OS shops).

45-day no-obligation trial is available (see license agreement for details).

Free no-obligation performance analysis is available


For more information, please contact:

Don Deese
Computer Management Sciences, Inc.
634 Lakeview Drive
Hartfield, VA 23071-3113

Phone: (804) 776-7109
Fax: (804) 776-7139
Email: [email protected]

Visit www.cpexpert.com for more information: to review sample output, to review documentation in SAS ODS “point-and-click” format, to download license agreements in fillable PDF form, and more.
