
An Expert System
designed to evaluate
IBM z/OS systems
©Copyright 1998, Computer Management Sciences, Inc., Alexandria, VA
www.cpexpert.com
Product Overview
• Helps analyze performance of z/OS systems.
• Written in SAS (only SAS/BASE is required).
• Runs as a batch job on mainframe (or on PC).
• Processes data in a standard performance
data base (either MXG, SAS/ITRM, or NeuMICS).
• Produces narrative reports showing results
from analysis!
• Product is updated every six months.
• 45-day trial is available (see license agreement
for details).
Components Delivered
• SRM Component *     March 1991
• TSO Component *     April 1991
• MVS Component *     June 1991
  * These legacy components apply only to Compatibility Mode
• DASD Component      October 1991
• CICS Component      May 1992
• WLM Component       April 1995
• DB2 Component       October 1999
• WMQ Component       June 2004
Product Documentation
Each component has an extensive User Manual,
available in hard copy, on CD, or in web-enabled format
• Describes the likely impact of each finding
• Discusses the performance issues associated
with each finding
• Suggests ways to improve performance and
describes alternative solutions
• Provides specific references to IBM or other
documents relating to the findings
• More than 4,000 pages for all components
WLM Component
• Checks for problems in service definition
• Identifies reasons performance goals were missed
• Analyzes general system problems:
• Coupling facility/XCF
• Paging subsystem
• System logger
• WLM-managed initiators
• Excessive CPU use by SYSTEM or SYSSTC
• IFA/zAAP, zIIP, and IOP/SAP processors
• PR/SM and LPAR problems
• Intelligent Resource Director (IRD) problems
WLM Component - sample report
RULE WLM103: SERVICE CLASS DID NOT ACHIEVE VELOCITY GOAL
DB2HIGH (Period 1): Service class did not achieve its velocity goal
during the measurement intervals shown below. The velocity goal was
50% execution velocity, with an importance level of 2. The '% USING'
and '%TOTAL DELAY' percentages are computed as a function of the average
address space ACTIVE time. The 'PRIMARY,SECONDARY CAUSES OF DELAY'
are computed as a function of the execution delay samples on the local
system.
                       ----------LOCAL SYSTEM----------
                           %  % TOTAL   EXEC  PERF  PLEX  PRIMARY,SECONDARY
MEASUREMENT INTERVAL   USING    DELAY  VELOC  INDX    PI  CAUSES OF DELAY
21:15-21:30,08SEP1998   16.6     83.4    17%  3.02  2.36  DASD DELAY(99%)
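The EXEC VELOC and PERF INDX figures in the WLM103 report can be reproduced from the USING and DELAY percentages with the standard WLM formulas. A minimal sketch, applied to the sample row above (small differences in the last digit come from rounding in the report):

```python
def execution_velocity(using_pct: float, delay_pct: float) -> float:
    """WLM execution velocity: share of active samples found Using
    (CPU or I/O) rather than Delayed, expressed as a percentage."""
    return 100.0 * using_pct / (using_pct + delay_pct)

def performance_index(goal_velocity: float, actual_velocity: float) -> float:
    """For a velocity goal, PI = goal / achieved; PI > 1 means the goal was missed."""
    return goal_velocity / actual_velocity

velocity = execution_velocity(16.6, 83.4)  # sample row: 16.6% using, 83.4% delay
pi = performance_index(50.0, velocity)     # the 50% velocity goal from the report
print(round(velocity), round(pi, 2))       # velocity rounds to 17; PI is about 3.0
```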
RULE WLM361: NON-PAGING DASD I/O ACTIVITY CAUSED SIGNIFICANT DELAYS
DB2HIGH (Period 1): A significant part of the delay to the service
class can be attributed to non-paging DASD I/O delay. The below data
shows intervals when non-paging DASD delay caused DB2HIGH to miss its
performance goal:
                       AVG DASD   AVG DASD  ---AVERAGE DASD I/O TIMES---
MEASUREMENT INTERVAL   I/O RATE  USING/SEC    RESP   WAIT   DISC   CONN
21:15-21:30,08SEP1998        31      1.405   0.010  0.003  0.004  0.002
WLM Component - sample report
RULE WLM601: TRANSPORT CLASS MAY NEED TO BE SPLIT
You should consider whether the DEFAULT transport class should be split.
A large percentage of the messages were too small, while a significant
percentage of messages were too large. Storage is wasted when buffers
are used by messages that are too small, while unnecessary overhead is
incurred when XCF must expand the buffers to fit a message. The CLASSLEN
parameter establishes the size of each message buffer, and the CLASSLEN
parameter was specified as 16,316 for this transport class.
This finding applies to the following RMF measurement intervals:
                       SENT     SMALL  MESSAGES  MESSAGES     TOTAL
MEASUREMENT INTERVAL     TO  MESSAGES  THAT FIT   TOO BIG  MESSAGES
10:00-10:30,26MAR1996   JA0     4,296         0        57     4,353
12:00-12:30,26MAR1996    Z0     2,653         6       762     3,421
12:30-13:00,26MAR1996    Z0     2,017         0       109     2,126
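The small/fit/too-big counts come from comparing each message length against the transport class buffer size. A hypothetical sketch of that classification (XCF's actual buffer accounting is more involved; the message lengths and the "small" cutoff below are illustrative, not taken from the report):

```python
def classify_message(msg_len: int, classlen: int, small_cutoff: int) -> str:
    """Classify a message against a transport class buffer.

    Messages over CLASSLEN force XCF to expand the buffer ("too big");
    messages well under it waste buffer storage ("small")."""
    if msg_len > classlen:
        return "too big"
    if msg_len < small_cutoff:
        return "small"
    return "fit"

# Illustrative message lengths only; CLASSLEN 16,316 is from the report above.
counts = {"small": 0, "fit": 0, "too big": 0}
for length in (500, 900, 16000, 20000, 300):
    counts[classify_message(length, classlen=16316, small_cutoff=1024)] += 1
print(counts)
```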
RULE WLM622: THE NUMBER OF OUTBOUND PATHS MAY NEED TO BE INCREASED
The PATH BUSY (when selected for transfer) was high relative to the
PATH AVAILABLE for the C605 path on System JB0, sending messages to the
C611 path on System JA0 in transport class DEFSMALL. This usually
means that you need to add more OUTBOUND paths to the transport class.
This finding applies to the following RMF measurement intervals:
                          TOTAL  PCT OUTBOUND   PCT PATH
MEASUREMENT INTERVAL   MESSAGES     PATH BUSY  AVAILABLE
12:00-12:30,26MAR1996      2562          21.1       78.9
WLM Component - sample report
RULE WLM705: STAGING DATA SETS NOT EFFICIENTLY USED FOR LOG STREAM
GRP1.CICS1T9A.DFHLOG: The bytes deleted after data was offloaded from
staging data sets to DASD log data sets was high relative to the bytes
deleted before data was offloaded. More than 10% of the bytes deleted
were deleted after being offloaded to DASD, with associated unnecessary
I/O operations. This indicates that the space allocated for the DASD
staging data sets was not efficiently used for the log stream data. If
this finding is consistently produced, you should consider increasing
the size of the DASD staging data set for log stream GRP1.CICS1T9A.DFHLOG.
This finding applies to the following SMF measurement intervals:
                         BYTES   BYTES DELETED  BYTES DELETED  PERCENT
MEASUREMENT INTERVAL   DELETED  BEFORE OFFLOAD  AFTER OFFLOAD    AFTER
14:45,03OCT2002           844K               0           844K    100.0
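The PERCENT AFTER column is the after-offload share of all bytes deleted; WLM705 fires when it exceeds the 10% guideline stated in the finding. A sketch (844K is taken as 844,000 bytes purely for illustration):

```python
def pct_deleted_after_offload(before: float, after: float) -> float:
    """Percent of deleted log-stream bytes that were deleted only after
    being offloaded to the DASD log data sets (i.e., wasted offload I/O)."""
    total = before + after
    return 100.0 * after / total if total else 0.0

pct = pct_deleted_after_offload(before=0, after=844_000)  # sample row above
print(pct)  # 100.0 -- well above the 10% threshold in the finding
```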
RULE WLM537: ZAAP-ELIGIBLE WORK HAD HIGH GOAL IMPORTANCE
Rule WLM530 or Rule WLM535 was produced for this system, indicating
that a relatively large amount of zAAP-eligible work was processed on a
central processor. One possible cause of this situation is that the
zAAP-eligible work was assigned a relatively high Goal Importance (the
Goal Importance was either Importance 1 or Importance 2). Please see
the discussion in the WLM Component User Manual for an explanation of
this issue.
DB2 Component
• Analyzes standard DB2 interval statistics
• Applies analysis from DB2 Administration Guide
and DB2 Performance Guide (with DB2 9.1)
• Analyzes DB2 Versions 3, 4, 5, 6, 7, 8, and 9
• Evaluates overall DB2 constraints, buffer pools,
EDM pool, RID list processing, Lock Manager,
Log Manager, DDF, and data sharing
• All analysis can be tailored to your site!
DB2 Component
Typical DB2 local buffer constraints
• There might be insufficient buffers for work files
• There were insufficient buffers for work files in merge passes
• Buffer pool was full
• Hiperpool read requests failed (pages stolen by system)
• Hiperpool write requests failed (expanded storage not available)
• Buffer pool page fault rate was high
• Data Management Threshold (DMTH) was reached
• DWQT and VDWQT might be too large
• DWQT, VDWQT, or VPSEQT might be too small
DB2 Component
Typical DB2 I/O prefetch constraints
• Sequential prefetch was disabled, buffer shortage
• Sequential prefetch was disabled, unavailable read engine
• Sequential prefetch not scheduled, prefetch quantity = 0
• Synchronous read I/O and sequential prefetch was high
• Dynamic sequential prefetch was high (before DB2 8.1)
• Synchronous read I/O was high
DB2 Component
Typical DB2 parallel processing constraints
• Parallel groups fell back to sequential mode
• Parallel groups reduced due to buffer shortage
• Prefetch quantity reduced to one-half of normal
• Prefetch quantity reduced to one-quarter of normal
• Prefetch I/O streams were denied, shortage of buffers
• Page requested for a parallel query was unavailable
DB2 Component
Typical DB2 EDM pool constraints
• Failures were caused by full EDM pool
• Low percent of DBDs found in EDM pool
• Low percent of CT Sections found in EDM pool
• Low percent of PT Sections found in EDM pool
• Size of EDM pool could be reduced
• Excessive Class 24 (EDM LRU) latch contention
DB2 Component
Typical DB2 Lock Manager constraints
• Work was suspended because of lock conflict
• Locks were escalated to shared mode
• Locks were escalated to exclusive mode
• Lock escalation was not effective
• Work was suspended for longer than time-out value
• Deadlocks were detected
DB2 Component
Typical DB2 Log Manager constraints
• Archive log read allocations exceeded guidance
• Archive log write allocations exceeded guidance
• Waits were caused by unavailable output log buffer
• Log reads satisfied from active log data set
• Log reads were satisfied from archive log data set
• Failed look-ahead tape mounts
DB2 Component
Typical DB2 Data Sharing constraints
• Group buffer pool is too small
• Incorrect directory entry/data entry ratio
• Directory reclaims resulting in cross-invalidations
• Castout processing occurring in “spurts”
• Excessive lock contention or false lock contention
• GBPCACHE ALL inappropriately specified
• GBPCACHE CHANGED inappropriately specified
• Conflicts between applications
DB2 Component - sample report
RULE DB2-208: VIRTUAL BUFFER POOL WAS FULL
Buffer Pool 2: A usable buffer could not be located in virtual Buffer
Pool 2, because the virtual buffer pool was full. This condition
should not normally occur, as there should be ample buffers. You
should consider using the -ALTER BUFFERPOOL command to increase the
virtual buffer pool size (VPSIZE) for the virtual buffer pool. This
situation occurred during the intervals shown below:
                          BUFFERS  NUMBER OF TIMES
MEASUREMENT INTERVAL    ALLOCATED    POOL WAS FULL
10:54-11:24, 15SEP1999        100               12
11:24-11:54, 15SEP1999        100               13
RULE DB2-216: BUFFER POOLS MIGHT BE TOO LARGE
Buffer Pool 1: The page fault rates for read and write I/O indicated
that the buffer pools might be too large for the available processor
storage. This situation occurred for Buffer Pool 1 during the intervals
shown below:
                          BUFFERS  PAGE-IN FOR  PAGE-IN FOR   PAGE
MEASUREMENT INTERVAL    ALLOCATED     READ I/O    WRITE I/O   RATE
11:15-11:45, 16SEP1999     25,000       36,904          195   41.2
11:45-12:15, 16SEP1999     25,000       30,892          563   35.0
12:45-13:15, 16SEP1999     25,000       23,890          170   26.7
DB2 Component - sample report
RULE DB2-601: COUPLING FACILITY READ REQUESTS COULD NOT COMPLETE
Group Buffer Pool 6: Coupling facility read requests could not be
completed because of a lack of coupling facility storage resources.
This situation occurred for Group Buffer Pool 6 during the intervals
shown below:
                        GROUP BUFFER POOL          TIMES CF READ
MEASUREMENT INTERVAL       ALLOCATED SIZE  REQUESTS NOT COMPLETE
11:01-11:31, 14OCT1999                38M                    130
RULE DB2-610: GBPCACHE(NO) OR GBPCACHE NONE MIGHT BE APPROPRIATE
Group Buffer Pool 4: This buffer pool had a very small amount of read
activity relative to write activity. Pages read were less than 1% of
the pages written. Since so few pages were read from this group buffer
pool, you should consider specifying GBPCACHE(NO) for the group buffer
pool or specifying GBPCACHE NONE for the page sets using the group
buffer pool. This situation occurred for Group Buffer Pool 4 during
the intervals shown below:
                        GROUP BUFFER POOL  PAGES    PAGES     READ
MEASUREMENT INTERVAL       ALLOCATED SIZE   READ  WRITTEN  PERCENT
10:34-11:04, 14OCT1999                38M     14   18,268    0.07%
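The READ PERCENT test behind DB2-610 can be sketched as follows. The 1% threshold is the one stated in the finding; CPExpert's actual guidance variables and exact arithmetic may differ:

```python
def low_read_activity(pages_read: int, pages_written: int,
                      threshold_pct: float = 1.0) -> bool:
    """True when pages read are a negligible fraction of pages written,
    suggesting GBPCACHE(NO) for the group buffer pool or GBPCACHE NONE
    for the page sets that use it."""
    if pages_written == 0:
        return False
    return 100.0 * pages_read / pages_written < threshold_pct

print(low_read_activity(14, 18_268))  # True -- sample row, under 0.1% read
```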
CICS Component
• Processes CICS Interval Statistics contained in
MXG Performance Data Base (standard SMF 110)
• Analyzes all releases of CICS (CICS/ESA,
CICS/TS for OS390, and CICS/TS for z/OS)
• Applies most analysis techniques contained in
IBM’s CICS Performance Guides
• Produces specific suggestions for improving
CICS performance
CICS Component
(Major areas analyzed)
• Virtual and real storage (MXT/AMXT/TCLASS)
• VSAM and File Control (NSR and LSR pools)
• Database management (DL/I, IMS, DB2)
• Journaling (System and User journals)
• Network and VTAM (RAPOOL, RAMAX)
• CICS Facilities (temp storage, transient data)
• ISC/IRC (MRO, LU6.1, LU6.2 modegroups)
• System logger
• Temporary Storage
• Coupling Facility Data Tables (CFDT)
• CICS-DB2 Interface
• Open TCB pools
• TCP/IP and SSL
CICS Component - sample report
RULE CIC269:
EXCESSIVE GETMAIN/FREEMAIN ACTIVITY FOR MRO SESSIONS
The IOAREALEN parameter specifications for the below connections might
cause excessive GETMAIN/FREEMAIN activity. The GETMAIN/FREEMAIN rate
was greater than 25 per second to the Storage Domain SMTP Subpool.
This high GETMAIN/FREEMAIN activity normally means that the IOAREALEN
parameter was set to the default (zero), or was set to a smaller value
than the TIOA (Terminal I/O Area) required to support the MRO traffic.
The information shown below describes the total GETMAIN and FREEMAIN
activity, the Connection ID, the percent of message activity to each
Connection ID, and TIOA size (and the percent of message activity at
that TIOA size for the Connection ID) for connections with the highest
total function shipping activity.
            GETMAIN+  CONN  PERCENT
APPLID      FREEMAIN    ID  ACTIVITY  TIOA--PCT  TIOA--PCT  TIOA--PCT
CICS1TAA     191,502  1AAA      34.2  624 96.2%  120  3.7%
CICS1TAA     191,502  1AAB      33.0  600 97.0%  264  2.9%
CICS1TAA     191,502  1AAC      32.7  600 97.1%  344  2.8%
CICS Component - sample report
RULE CIC267:
INSUFFICIENT SESSIONS MAY HAVE BEEN DEFINED
CPExpert believes that an insufficient number of sessions may have been
defined for the CICS DAL1 connection, or the application system could
have been issuing ALLOCATE requests too often. The number of ALLOCATE
requests returned was greater than the value specified for the ALLOCQ
guidance variable in USOURCE(CICGUIDE). CPExpert suggests you consider
increasing the number of sessions defined for the connection, or you
should increase the ALLOCQ guidance variable to cause CPExpert to signal
a potential problem only when you view the problem as serious. For APPC
modegroups, this finding applies only to generic ALLOCATE requests.
This finding applies to the following CICS statistics intervals:
STATISTICS                  ALLOCATE REQUESTS
COLLECTION TIME    APPLID   RETURNED TO USERS
10:00,26MAR1996  CICSDTL1                 335
11:00,26MAR1996  CICSDTL1                  12
12:00,26MAR1996  CICSDTL1                  27
DASD Component
• Processes SMF Type 70 (series) records to
automatically build a model of your I/O configuration.
• Identifies performance problems with devices
that have the most potential for improvement.
• PEND delays
• Disconnect delays
• Connect delays
• IOSQ delays
• Shared DASD conflicts
• Analyzes SMF Type 42(DS) and Type 64 to
identify VSAM performance problems.
DASD Component - sample report
RULE DAS100:
VOLUME WITH WORST OVERALL PERFORMANCE
VOLSER DB2327 (device 2A1F) had the worst overall performance during
the entire measurement period (10:00, 16FEB2001 to 11:00, 16FEB2001).
This volume had an overall average of 56.8 I/O operations per second,
was busy processing I/O for an average of 361% of the time, and had I/O
operations queued for an average of 1% of the time. Please note that
percentages greater than 100% and Average Per Second Delays greater
than 1 indicate that multiple I/O operations were concurrently delayed.
This can happen, for example, if multiple I/O operations were queued or
if multiple I/O operations were PENDing. The following summarizes
significant performance characteristics of VOLSER DB2327:
                        I/O  ------AVERAGE PER SECOND DELAYS------     MAJOR
MEASUREMENT INTERVAL   RATE    RESP   CONN   DISC   PEND   IOSQ      PROBLEM
10:00-10:30,16FEB2001  59.1   1.308  0.316  0.004  0.988  0.000    PEND TIME
10:30-11:00,16FEB2001  57.2   3.792  0.300  0.004  3.483  0.006    PEND TIME
11:00-11:30,16FEB2001  54.2   5.769  0.279  0.004  5.464  0.023    PEND TIME
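RMF device response time decomposes into its queuing and service components, which is how the report isolates PEND as the MAJOR PROBLEM. A quick check against the first sample row above:

```python
def resp_components(conn: float, disc: float, pend: float, iosq: float) -> float:
    """Average device response per second = CONN (transferring data)
    + DISC (device disconnected) + PEND (channel subsystem delay)
    + IOSQ (queued in the host)."""
    return conn + disc + pend + iosq

# First sample row for VOLSER DB2327 (10:00-10:30).
delays = {"CONN": 0.316, "DISC": 0.004, "PEND": 0.988, "IOSQ": 0.000}
resp = resp_components(**{k.lower(): v for k, v in delays.items()})
major = max(delays, key=delays.get)
print(round(resp, 3), major)  # 1.308 PEND -- matching RESP and MAJOR PROBLEM
```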
DASD Component - sample report
RULE DAS130:
PEND TIME WAS MAJOR CAUSE OF I/O DELAY.
A major cause of the I/O delay with VOLSER DB2327 was PEND time. The
average per-second PEND delay for I/O is shown below:

                        PEND      PEND     PEND    PEND   PEND  TOTAL
MEASUREMENT INTERVAL    CHAN  DIR PORT  CONTROL  DEVICE  OTHER   PEND
10:00-10:30,16FEB2001  0.492     0.000    0.000   0.000  0.495  0.988
10:30-11:00,16FEB2001  1.927     0.000    0.000   0.000  1.556  3.483
11:00-11:30,16FEB2001  2.840     0.000    0.000   0.000  2.624  5.464

RULE DAS160:
DISCONNECT TIME WAS MAJOR CAUSE OF I/O DELAY.
A major cause of the I/O delay with VOLSER DB26380 was DISCONNECT time.
DISC time for modern systems is a result of cache read miss operations,
potentially back-end staging delay for cache write operations,
peer-to-peer remote copy (PPRC) operations, and other miscellaneous
reasons.
                                       --PCT CACHE--   DASD  CACHE
                                        READ   WRITE     TO     TO
MEASUREMENT INTERVAL   READS  WRITES    HITS    HITS  CACHE   DASD  PPRC  BPCR  ICLR
8:30- 8:45,22OCT2001   14615     932    19.2   100.0  11825    903     0     0     0
8:45- 9:00,22OCT2001   14570     921    20.7   100.0  11567    907     0     0     0
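The DASD-to-cache column is consistent with the read-miss count implied by the read-hit percentage: each cache read miss forces a stage from the back-end DASD into cache. A rough check against the first sample row (stage counts and hit percentages are rounded independently, so the match is approximate):

```python
def estimated_read_misses(reads: int, read_hit_pct: float) -> float:
    """Reads that missed cache and had to be staged from DASD."""
    return reads * (1.0 - read_hit_pct / 100.0)

# First sample row: 14,615 reads at a 19.2% read-hit ratio.
est = estimated_read_misses(14_615, 19.2)
print(round(est))  # roughly 11,809 -- close to the 11,825 DASD-to-cache stages
```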
DASD Component - sample report
RULE DAS300:
PERHAPS SHARED DASD CONFLICTS CAUSED PERFORMANCE PROBLEMS
Access conflicts caused by sharing VOLSER DB2700 between systems
might have caused performance problems for the device during the
measurement intervals shown below. Conflicting systems had the
indicated I/O rate, average CONN time per second, average DISC time
per second, average PEND time per second, and average RESERVE time
to the device. Even moderate CONN, DISC, or RESERVE can cause delays
to shared devices.
                        I/O    MAJOR   OTHER  --------OTHER SYSTEM DATA--------
MEASUREMENT INTERVAL   RATE  PROBLEM  SYSTEM  I/O RATE   CONN   DISC   PEND   RESV
8:30- 8:45,22OCT2001   31.3  QUEUING     SY1      35.0  0.041  0.001  0.455  0.000
                                         SY2      88.2  0.100  0.003  0.714  0.000
                                         SY3     109.0  0.123  0.003  0.723  0.000
                                       TOTAL     232.2  0.264  0.006  1.892  0.000
8:45- 9:00,22OCT2001   25.7  QUEUING     SY1      46.4  0.054  0.001  0.565  0.000
                                         SY2      98.2  0.112  0.003  0.836  0.000
                                         SY3     119.0  0.136  0.003  0.846  0.000
                                       TOTAL     263.5  0.303  0.007  2.247  0.000
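The TOTAL rows in the DAS300 report are sums across the other sharing systems. Verifying the I/O rate and PEND totals for the first interval (figures taken from the report):

```python
# Per-system I/O rate and average PEND per second for 8:30-8:45.
io_rate = {"SY1": 35.0, "SY2": 88.2, "SY3": 109.0}
pend = {"SY1": 0.455, "SY2": 0.714, "SY3": 0.723}

total_rate = round(sum(io_rate.values()), 1)
total_pend = round(sum(pend.values()), 3)
print(total_rate, total_pend)  # 232.2 1.892 -- the reported TOTAL row
```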
DASD Component - sample report
RULE DAS607: VSAM DATA SET IS CLOSE TO MAXIMUM NUMBER OF EXTENTS
VOLSER: RLS003. More than 225 extents were allocated for the VSAM data
sets listed below. The VSAM data sets are approaching the maximum
number of extents allowed. The table below shows the number of extents
and the primary and secondary space allocation:
                                                        TOTAL    EXTENTS  ---ALLOCATIONS---
SMF TIME STAMP   JOB NAME  VSAM DATA SET               EXTENTS  THIS OPEN  PRIMARY  SECONDARY
10:30,11MAR2002  CICS2ABA  RLSADSW.VF01D.DATAENDB.DATA     229          4   30 CYL      1 CYL

RULE DAS625:
NSR WAS USED, BUT LARGE PERCENT OF ACCESS WAS DIRECT
VOLSER: MVS902. Non-Shared resources (NSR) was specified as the
buffering technique for the below VSAM data sets, but more than 75%
of the I/O activity was direct access. NSR is not designed for direct
access, and many of the advantages of NSR are not available for direct
access. You should consider Local Shared Resources (LSR) for the below
VSAM data sets (perhaps using System Managed Buffers to facilitate the
use of LSR). The I/O RATE is for the time the data set was open. The
SMF TIME STAMP and JOB NAME are from the last record for the data set.
                                                      I/O      OPEN  -ACCESS TYPE (PCT)-
SMF TIME STAMP   JOB NAME  VSAM DATA SET             RATE  DURATION  SEQUENTIAL   DIRECT
13:19,19SEP2002  NRXX807   SDPDPA.PK.MVSP.RT.NDMGIX.DATA
                                                      8.4   0:07:08         0.0    100.0
13:19,19SEP2002  NRXX807   SDPDPA.PR.MVSP.RT.NDMGIXD.DATA
                                                     11.2   0:06:42         0.0    100.0
13:33,19SEP2002  TSJHM     SDPDPA.PR.MVSP.RT.NDMRQFDA.DATA
                                                      0.3   2:21:58         0.0    100.0
13:33,19SEP2002  TSJHM     SDPDPA.PR.MVSP.RT.NDMRQF.DATA
                                                      2.8   3:37:53         0.0    100.0
13:33,19SEP2002  TSJHM     SDPDPA.PK.MVSP.RT.NDMTCF.DATA
                                                     11.1   6:24:10         0.1     99.9
DASD Component
(Application Analysis)
• Requires simple modification to MXG or MICS
• Modification collects job step data while
processing SMF Type 30 (Interval) records
• Typically requires less than 10 cylinders
• Data is correlated with Type 74 information
• CPExpert associates performance problems with
specific applications (jobs and job steps)
• CPExpert can perform “Loved one” analysis of
DASD performance problems
WMQ Component
Analyzes SMF Type 115 statistics, as processed
by MXG or NeuMICS and placed into a performance
data base.
• MQMLOG   - Log manager statistics
• MQMMSGDM - Message/data manager statistics
• MQMBUFER - Buffer Manager statistics
• MQMCFMGR - Coupling Facility Manager statistics
Type 115 records should be synchronized with
the SMF recording interval.
IBM says the overhead to collect statistics data is
negligible.
WMQ Component
Optionally analyzes SMF Type 116 accounting
data, as processed by MXG or NeuMICS and
placed into a performance data base.
• MQMACCTQ - Thread-level accounting data
• MQMQUEUE - Queue-level accounting data
Type 116 records should be synchronized with
the SMF recording interval.
IBM says the overhead to collect accounting data is
5-10%.
WebSphere MQ
Typical queue manager problems
• Assignment of queues to page sets
• Assignment of page sets to buffer pools
• Queue manager parameters
• Index characteristics of queues
• Characteristics of messages in queues
• Characteristics of MQ calls
CPExpert analysis uses SMF Type 116 records
WebSphere MQ
Typical buffer manager problems
• Buffer thresholds exceeded for pool
• Buffers assigned per pool (too few/too many)
• Message traffic
• Message characteristics
• Application design
CPExpert analysis uses SMF Type 115 records
WebSphere MQ
Typical log manager problems
• Log buffers assigned
• Active log use characteristics
• Archive log use characteristics
• Tasks backing out
• System paging of log buffers
• Excessive checkpoints taken
CPExpert analysis uses SMF Type 115 records
WebSphere MQ
Typical DB2-interface problems
• Thread delays
• DB2 server processing delays
• Server requests queued
• Server tasks experienced ABENDs
• Deadlocks in DB2
• Maximum request queue depth was too large
CPExpert analysis uses SMF Type 115 records
WebSphere MQ
Typical Shared queue problems
• Structure was full
• Large number of application structures defined
• MINSIZE is less than SIZE for CSQ.ADMIN
• SIZE is more than double MINSIZE
• ALLOWAUTOALT(YES) not specified
• FULLTHRESHOLD value might be incorrect
CPExpert analysis uses SMF Type 115 records
and Type 74 (Coupling Facility) records
WebSphere MQ – sample report
RULE WMQ100: MESSAGES WERE WRITTEN TO PAGE SET ZERO
More than 0 messages were written to Page Set Zero during the intervals
shown below. Messages should not be written to Page Set Zero, since
serious WebSphere MQ system problems could occur if Page Set Zero
should become full. This finding relates to queue
SYSTEM.COMMAND.INPUT
                        MESSAGES WRITTEN
STATISTICS INTERVAL     TO PAGE SET ZERO
13:16-14:45, 28AUG2003               624
RULE WMQ122: DEAD.LETTER QUEUE IS INAPPROPRIATE FOR PAGE SET ZERO
Buffer Pool 0. The DEAD.LETTER queue was assigned to Page Set Zero.
A dead-letter queue stores messages that cannot be routed to their
correct destinations. If the DEAD.LETTER queue grows large unexpectedly,
Page Set Zero can become full, and WebSphere MQ can enter a serious
stress condition. You should redefine the DEAD.LETTER queue to a page
set other than Page Set Zero. This finding relates to queue
SYSTEM.DEAD.LETTER.QUEUE
WebSphere MQ – sample report
RULE WMQ110: EXPYRINT VALUE IS OFF OR TOO SMALL
Buffer Pool 3. There were more than 25 expired messages skipped when
scanning a queue for a specific message. Processing expired messages
adds both CPU time and elapsed time to the message processing. With
WebSphere MQ 5.3, the EXPYRINT keyword was introduced to allow the queue
manager to automatically determine whether queues contained expired
messages and to eliminate expired messages at the interval specified
by the EXPYRINT value. This finding applies to queue:
DPS.REPLYTO.RCB.IVR04
                             GET    BROWSE  EXPIRED MESSAGES
STATISTICS INTERVAL     SPECIFIC  SPECIFIC         PROCESSED
13:41-13:41, 03JUL2003         0         0               313
RULE WMQ320: APPLICATIONS WERE SUSPENDED FOR LOG WRITE BUFFERS
Applications were suspended while in-storage log buffers were being
written to the active log. This finding normally means that too
few log buffers were assigned. However, the finding could mean
that there is an I/O configuration problem and the log buffer writes
to the active log are delayed for I/O reasons. This finding applies
to the following statistics intervals.
                            NUMBER OF SUSPENSIONS
STATISTICS INTERVAL     WAITING ON OUTPUT BUFFERS
14:19-14:44, 12SEP2003                        139
WebSphere MQ – sample report
RULE WMQ201: BUFFER POOL ENCOUNTERED SYNCHRONOUS (5%) THRESHOLD
Buffer Pool 0. This buffer pool encountered the Synchronous Write
threshold (less than 5% of the pages in the buffer pool were "stealable"
or more than 95% of the pages were on the Deferred Write queue). While
the Synchronous Page Writer is executing, updates to any page cause the
page to be written immediately to the page set (the page is not placed
on the Deferred Write Queue, but is written immediately to the page set
as a synchronous write operation). This situation harms performance of
applications, and is an indicator that the buffer pool is in danger of
encountering a Short on Storage condition.
                         BUFFERS      TIMES AT  IMMEDIATE
STATISTICS INTERVAL     ASSIGNED  5% THRESHOLD     WRITES
17:08-17:09, 07OCT2003     1,050            19         19
RULE WMQ205: HIGH I/O RATE TO PAGE SETS WITH SHORT-LIVED MESSAGES
Buffer Pool 0. This buffer pool had short-lived messages assigned.
The total I/O rate (read and write activity) to page sets for the
short-lived messages was more than 0.5 pages per second. Writing
pages to the page set and subsequently reading the pages from the
page set cause I/O overhead and delay to the application. This
finding applies to the following intervals:
                         BUFFERS    PAGES  PAGES   I/O RATE
STATISTICS INTERVAL     ASSIGNED  WRITTEN   READ  WITH DASD
11:32-11:32, 24JUL2006    50,000      101      0       50.5
WebSphere MQ – sample report
RULE WMQ300: ARCHIVE LOGS WERE USED FOR BACKOUT
WebSphere MQ applications issued log reads to the archive log
file for backout more than 0 times during the WebSphere MQ
statistics intervals shown below. Most log read requests should
come from the output buffer or the active log. Using archive
logs for backout purposes often indicates that either the active
log files were too small or long-running applications were backing
out work.
                        NUMBER OF LOG READS
STATISTICS INTERVAL        FROM ARCHIVE LOG
4:30- 5:00, 12SEP2003                   192
RULE WMQ611: LARGE NUMBER OF APPLICATION STRUCTURES WERE DEFINED
SMF TYPE74 (Structure) statistics showed that more than 5 application
structures were defined to a coupling facility. IBM suggests that you
should have as few application structures as possible. Having multiple
application structures in a coupling facility can degrade performance.
                          WEBSPHERE MQ
COUPLING FACILITY   STRUCTURES DEFINED
CF1                                  8
CF2                                  9
CF3                                  8
License fees
(Site license)
Components                 First Year   Additional year
WLM Component                   7,500             5,000
DB2 Component                   7,500             5,000
CICS Component (see note)       5,000             3,000
WMQ Component                   5,000             3,000
DASD Component                  3,000             1,500
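Multi-year cost under this schedule is simply the first-year fee plus the additional-year fee for each subsequent year. A worked example (figures from the table above):

```python
FEES = {  # component: (first year, each additional year), in dollars
    "WLM":  (7_500, 5_000),
    "DB2":  (7_500, 5_000),
    "CICS": (5_000, 3_000),
    "WMQ":  (5_000, 3_000),
    "DASD": (3_000, 1_500),
}

def cost(component: str, years: int) -> int:
    """Total site-license cost for the given number of years."""
    first, additional = FEES[component]
    return first + additional * (years - 1)

print(cost("WLM", 3))  # 7,500 + 2 x 5,000 = 17,500
```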
Note: Fees shown for the CICS Component are for analyzing no more than 50 CICS regions.
CPExpert Release 17.1
(Issued April 2007)
Major enhancements with this update:
• Provided email alert feature for serious problems
• Provided additional analysis of paging problems
• Provided additional analysis of XCF and coupling
facility problems
• Redesigned CICS Component to use wild-card
feature (with significant reduction in processing
time for large users of the CICS Component)
• Provided additional analysis of WebSphere MQ
performance problems
CPExpert Release 17.2
(Issued October 2007)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 9
• Provided additional analysis of z/OS performance
problems (in WLM Component), including “blocked
workload” analysis
• Provided support for CICS/TS Release 3.2 and
added significant additional analysis of CICS
performance problems
• Provided support for DB2 Version 9.1 and additional
analysis of DB2 performance problems
• Provided support for NeuMICS in WebSphere MQ
Summary
• The major objective is to share solutions and
provide insight into new z/OS features.
• CPExpert is updated every six months;
support for new versions of z/OS has been
available within 30 days after General
Availability of the new z/OS release.
• CPExpert is offered at a low cost (affordable
by all z/OS shops).
• 45-day trial is available (see license agreement for
details).
For more information, please contact
Don Deese
Computer Management Sciences, Inc.
634 Lakeview Drive
Hartfield, VA 23071-3113
Phone: (804) 776-7109
Fax:   (804) 776-7139
email: [email protected]
Visit www.cpexpert.com for more information, to
review sample output, to review documentation in
SAS ODS “point-and-click” format, to download
license agreements in .pdf “form” mode, etc.