
Washington Area Informix/DB2 User’s Group
July 26, 2006
DB2 Performance Tuning: A Practical Approach
By Jim Cleveland
[email protected]
Bluepoint Consulting Inc.
Introduction
What is our aim?
– Performance tuning is:
 Analyzing the system’s configuration
 Observing performance characteristics
 Formulating recommendations intended to enhance performance
 (Rinse, shampoo, repeat)
– A metrics-driven model involves:
 Gathering system performance data (at all levels)
 Focusing on a particular quantifiable performance measurement
 Adjusting performance parameters or modifying architectural characteristics
 Gauging the effect of those changes on overall performance and the selected measurement
Look familiar?
The Diagnostic Tools – DB2
DB2 Snapshot Monitors
– Internal counters set at global or session level using monitor switches
– Monitors collect cumulative statistics either from the initiation of a CLP session
(switches and stats are local to the session), since the time of the last restart (global
switches set and stats available to all users), or since the last “reset monitor”
command at either level
– Useful for collecting point-in-time information regarding performance metrics
with respect to overall db/dbm behavior. Also able to provide more specific
information on:
 Locking – at a given moment, lists all locks & types held by all applications
 Bufferpools – cumulative stats for memory, physical & logical I/Os, synch/asynch
 Sorts – provides detailed statistics regarding sortheap usage, overflows, # active, etc.
 Tablespaces – detailed I/O stats and other activity for a given tablespace
 UOW – displays status of an application’s Unit of Work at a point in time
 Dynamic SQL – shows contents of the package cache and related stats at a point in time
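As a worked example of using snapshot counters, the sketch below computes a bufferpool data hit ratio from two counter lines. The sample values are invented and the counter labels are assumptions; in practice the input would come from `db2 get snapshot for bufferpools on <db>`.

```shell
#!/bin/sh
# Hypothetical excerpt of snapshot output; in practice, pipe the output of
#   db2 get snapshot for bufferpools on <db>
snap='Buffer pool data logical reads              = 100000
Buffer pool data physical reads             = 4000'

# Hit ratio = (1 - physical reads / logical reads) * 100
hit=$(printf '%s\n' "$snap" | awk -F'= *' '
  /data logical reads/  { logical  = $2 }
  /data physical reads/ { physical = $2 }
  END { printf "%.2f", (1 - physical/logical) * 100 }')
echo "Bufferpool data hit ratio: ${hit}%"
```

The same pattern (grep a counter line, split on `=`) applies to any of the snapshot counters listed above.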
The Diagnostic Tools – DB2
DB2 Event Monitors
Event Monitors are the DB2 facility used to collect information regarding a specific event or chain
of events within the context of database operations.
– They are event driven versus continuously collected as with snapshots
– Must be explicitly created and activated via DB2 commands or API’s
– Are the best way to effectively diagnose and resolve deadlock issues from the
database perspective, since deadlocks require tracing a chain of events through
time
– Output may be directed to a DB2 table and the results then analyzed using SQL. For
example, output from a statement Event Monitor can be used to:
 identify and rank the most frequently executed or most time-consuming SQL
 track the sequence of statements leading up to a lock timeout or deadlock
 track connections to the database
 break down overall SQL statement execution time into sub-operations such as sort time
and user or system CPU time
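A minimal CLP sketch of the write-to-table workflow described above. The monitor name (`stmtmon`) and the ranking query’s column and table names are assumptions; check the generated STMT_* table in your release for the actual schema.

```shell
# Create a statement event monitor that writes to tables, then activate it
db2 "CREATE EVENT MONITOR stmtmon FOR STATEMENTS WRITE TO TABLE"
db2 "SET EVENT MONITOR stmtmon STATE 1"

# ... run the workload of interest ...

db2 "SET EVENT MONITOR stmtmon STATE 0"

# Rank the most frequently executed statements (table/column names illustrative)
db2 "SELECT SUBSTR(stmt_text, 1, 60) AS stmt, COUNT(*) AS execs
     FROM   stmt_stmtmon
     GROUP  BY SUBSTR(stmt_text, 1, 60)
     ORDER  BY execs DESC
     FETCH  FIRST 10 ROWS ONLY"
```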
The Diagnostic Tools - Unix
Concurrent Monitoring of OS Resource Usage is Imperative.
When collecting DB2 snapshot/event data it will be useful (necessary
even?) to have, at a minimum, time-stamped output from:
– iostat [interval in sec.] [# iterations] – provides detailed information, broken down by hdisk, for %
of time disk busy, read/write transfer rates and totals, xacts per sec. Particularly useful for
identifying heavily used (‘hot’) disks which may be the cause of I/O bottlenecks (i.e. – hdisks with
continuous activity > 40% deserve attention)
– vmstat [interval in sec.] [# iterations] – provides information regarding CPU activity (% for system,
user, and time waiting for I/O), available/free memory, and paging activity. Useful for
identifying a shortage of CPU resources – overall activity > 70% indicates there may be
inadequate ‘headroom’ for handling peak loads  DB2 processes may back up in the wait queue.
– Other useful OS performance monitoring/config commands:
 filemon – drill down to retrieve detailed I/O statistics on individual devices
 netstat – identify latency in transactions due to network data transfer time
 ps – provides capability to identify DB2 (or other) processes
 sar – system activity report; combines metrics from iostat, vmstat, and others
 lsps – describes paging space
 lscfg/smit – system hardware inventory command and configuration utility
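To illustrate the “hot disk” rule of thumb above, here is a self-contained sketch that flags hdisks over the 40%-busy threshold. The sample lines stand in for real `iostat` output, whose exact columns vary by platform, so the column positions here are assumptions.

```shell
#!/bin/sh
# Hypothetical iostat-style excerpt: device name, % of time busy
sample='hdisk0  62.1
hdisk1  12.0
hdisk2  45.7'

# Flag disks whose busy percentage exceeds the 40% rule of thumb
hot=$(printf '%s\n' "$sample" | awk '$2 > 40 { printf "%s%s", sep, $1; sep=" " }')
echo "Hot disks: $hot"
```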
The Diagnostic Tools – Unix/AIX
In the best of all possible worlds...get topas installed (or nmon):
Resort to bribery….
The DB2 Process Model
To understand the context in which performance issues arise we need to understand how the various
DB2 components function together as a whole. That is, how the DB2 process model works –
Identifying the Problem – Step I
Collect Preliminary Configuration and Performance Information
– Do a cursory configuration and architecture review (beforehand, if possible)
– Capturing db and dbm level snapshot stats over a ‘representative’ period of time provides a
good jumping-off/drill-down point for non-specific performance issues (i.e. – focus is not on a
single transaction or process).
– A simple shell script run from the Unix prompt or scheduled using crontab will suffice
to collect data over a specified period (substitute # of iterations for i1, interval length for i2):
For example:
#!/bin/ksh
date; x=0
db2 update monitor switches using lock on sort on bufferpool on uow on table on statement on
db2 reset monitor all
while [ $x -lt $i1 ]    # i1 determines number of iterations, total monitoring period
do
  date
  db2 get snapshot for dbm
  db2 get snapshot for db on xxxxxx
  sleep $i2             # i2 = interval in seconds; longer interval for longer monitoring period
  x=$((x+1))
done
As mentioned, cross-time-indexed OS-level stats should be collected concurrently – e.g., the same script augmented with iostat/vmstat.
Identifying the Problem – Step II
The question at this stage is: on what tasks and in what proportion is
DB2 spending its time?
To identify common performance issues, break down as follows:
– Locking – time waiting on resolution of lock & deadlock issues
– Sorts – time spent performing sorts, hash joins, (other?)
– Logging Activity – time for log writes, reads, and overhead
– System CPU – time executing system calls
– User CPU – time spent on SQL prep, overhead for locking and sorting activity
– General I/O – no single cause; likely a confluence of factors, e.g.
ts design, container mapping, bp configuration, db/dbm params, etc.
(Specific calculations for above available in Admin Guide: Performance)
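The exact formulas are in the Admin Guide: Performance; purely as an illustrative sketch, one such proportion (time waiting on locks vs. elapsed time) might be computed from two snapshot counters like so. The counter values here are invented.

```shell
#!/bin/sh
# Hypothetical snapshot counters (milliseconds, invented for illustration)
lock_wait_ms=4200
elapsed_ms=60000

# Proportion of elapsed time spent waiting on locks
lock_pct=$(awk -v w="$lock_wait_ms" -v e="$elapsed_ms" \
  'BEGIN { printf "%.1f", (w / e) * 100 }')
echo "Time waiting on locks: ${lock_pct}% of elapsed time"
```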
The Tuning Process – a Roadmap
(Flowchart, reconstructed as text:)
1. Collect db/dbm & OS stats.
2. Sort, Logging, Lock, or Prefetch problem?
– Yes: tune for the sort, logging, p-fetch, or lock problem. Remaining lock/
sort issues? List db2 apps., examine OS processes, finally: stack trace.
– No: Memory, CPU, or I/O problems?
 Excessive DB2 or OS system/user CPU  SQL analysis and/or application design consult
 Wait on I/O > 10%, or disk busy > 40%  perform I/O diagnosis & tuning
 Demand paging  perform memory accounting, adjust paging params, release overcommitted memory
Sort Issues
SQL containing GROUP BY, ORDER BY, or DISTINCT will result in a sort
if no index is available to order the result set. Also in prep for merge
join. To identify sort problems, look for the following:
– Sort overflows are nominally < 3% (Total Overflows/Total Sorts * 100)
– More than minimal post-threshold sorts or rejected piped sorts
– Sort heap high water mark > SHEAPTHRES (post-threshold sorts)
– The # of active sorts at a given time is greater than 2x (or 3 σ) baseline
– Any large or significant small hash join overflows, any hash loops
– Rows selected >> rows will generally cause ^ sorts  need for indexes
– > 5 sorts per statement (Total Sorts/Total Statements)  problem SQL
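The overflow rule of thumb above reduces to a one-line calculation; the counter values here are invented, standing in for Total sorts and Sort overflows from a db snapshot.

```shell
#!/bin/sh
# Hypothetical counters from `db2 get snapshot for db on <db>`
total_sorts=1200
sort_overflows=25

# Overflow rate = Total Overflows / Total Sorts * 100
overflow_pct=$(awk -v o="$sort_overflows" -v s="$total_sorts" \
  'BEGIN { printf "%.2f", (o / s) * 100 }')
echo "Sort overflow rate: ${overflow_pct}%  (rule of thumb: keep < 3%)"
```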
Adjusting Sort Parameters
Increasing Sort Heap/Threshold (remember this is treating the symptom!):
– may allow for larger in-memory sorts, preventing disk spills
– fewer sort passes/merge joins; may also cut down overall exec time
– same benefits for hash joins, dynamic bitmap index ANDing, & star joins
Caveats: large sort memory may cause excessive paging, an in-memory sort rather
than an index scan, and sort space may come at the expense of bufferpool space
Rule Of Thumb:
– Calculate average sort space = (sort_heap allocated/avg. # active sorts)  use as
baseline sortheap, then adjust to minimize overflows, pipe rejects, hash overflows
– SHEAPTHRES = max(avg # appls connected, avg. active sorts) * max(sortheap
among db’s in instance); adjust to minimize post-threshold sorts, hash overflows, etc.
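The rule of thumb above is simple arithmetic; a sketch with invented workload figures:

```shell
#!/bin/sh
# Invented workload figures for illustration
avg_appls=40         # avg # of applications connected
avg_active_sorts=12  # avg # of concurrently active sorts
max_sortheap=512     # largest SORTHEAP (4 KB pages) among db's in the instance

# SHEAPTHRES = max(avg # appls connected, avg active sorts) * max(sortheap)
n=$(( avg_appls > avg_active_sorts ? avg_appls : avg_active_sorts ))
sheapthres=$(( n * max_sortheap ))
echo "Suggested SHEAPTHRES baseline: $sheapthres (4 KB pages)"
```

The result is a starting point only; per the slide, adjust from there to minimize post-threshold sorts and hash overflows.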
Logging Issues
Transaction logging and overhead can represent a large part of I/O time
in OLTP/mixed environments. To identify logging problems look for:
– Hot (> 40% busy) hdisks where logs are located (use db2 snapshots for raw devices)
– Snapshots showing frequent use of secondary logs, large volume of
writes/reads to logs, log high water mark approached, ^ opens/closes
– Logging in a non-OLTP environment? Check db2diag.log for ROLLBACKs
Potential adjustments/remedies:
– Increase size & # of primary logs, SOFTMAX, MINCOMMIT, LOGBUFSZ
– Relocate logs on faster devices?
– Design options: compound SQL, group > 5 statements into a Stored Proc.
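A sketch of the corresponding CLP adjustments; the database name `mydb` and the values shown are placeholders to be sized for the workload, not recommendations.

```shell
# Increase size and number of primary logs; tune group commit and log buffering
db2 update db cfg for mydb using LOGFILSIZ 4096 LOGPRIMARY 10
db2 update db cfg for mydb using SOFTMAX 200 MINCOMMIT 2 LOGBUFSZ 128

# Relocate logs onto a faster/dedicated device (takes effect on next activation)
db2 update db cfg for mydb using NEWLOGPATH /fastdisk/db2logs
```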
Prefetch Wait Time – Refining the Cause
To determine why prefetch is not happening effectively, examine (for both
tablespaces and individual tables):
 Asynch Read Pct: Asynch Reads/Bufferpool Physical Reads
 Asynch Read Ratio: Asynch Pages Read/Logical Reads
 Asynch Pages Per Req (APPR): Asynch Pages Read/Asynch Requests
 Asynch Read Time: Asynch Read Time in ms/Overall Read Time in ms
 Number of unread prefetch pages
 Is the wait caused by an underlying I/O bandwidth limitation?
 Bufferpool configuration
The purpose of these calculations is to determine:
 Is prefetch taking place on the correct tables/tablespaces?
 In large enough chunks?
 In proper proportion to overall logical and synchronous I/O?
 Without being overwritten by other prefetchers?
Excessive Prefetch Time – Possible Remedies
Check db/dbm configuration and architecture:
– DB2_PARALLEL_IO=* regvar, SEQDETECT=Y, NUM_IOSERVERS (# of I/O servers)
– Physical parameters – page/extent/prefetch sizes, RAID stripe size
Prefetch size = n * [# devices in array] * extent size, where n ∈ 1, 2, 3…
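Plugging invented numbers into the formula above:

```shell
#!/bin/sh
# Invented layout for illustration
devices=4        # physical devices (hdisks) in the array
extent_size=32   # pages per extent
n=2              # multiplier, n in 1, 2, 3...

# Prefetch size = n * [# devices in array] * extent size
prefetch_size=$(( n * devices * extent_size ))
echo "PREFETCHSIZE = $prefetch_size pages"
```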
Incremental improvement may be achieved by:
– Implementing big-block I/O
– Adjusting the number and size of prefetch queues
– bp’s only need to be sized large enough to provide for effective prefetch
– Separate bp’s are needed for tables w/high ARRs, or contention will occur
– Table/jfs reorganization – the prefetch advantage is offset by fragmented
DB2/OS data; check contiguity at the OS level and w/DB2 REORGCHK
– Is I/O bandwidth the limiting factor? Adequate parallelism with existing
devices?  Solution: map to more containers or more/faster devices.
Detection & Classifying of Locking Issues
To detect and diagnose locking problems, look for:
 Snapshots show significant Time Spent Waiting on Locks
(compared to overall execution time)
 Snapshots show lock escalations
 Locklist High Water Mark > 50% of total space
 db2diag.log shows repeated SQL0911N (rc 1 or 2), meaning
lock timeouts or deadlocks were detected and resolved
 Look for ROLLBACKs in the diag log as well  deadlocks
Locking Problem Mitigation
Locking is fundamentally an application logic issue. That said, recourse
on the db side includes:
– Ensure LOCKTIMEOUT is changed from the default -1 to e.g. 10000
– Use LOCK TABLE for heavily updated tables
– Provide sufficient LOCKLIST space to prevent lock escalation
– Tune db parameters MAXLOCKS, LOCKTIMEOUT & DLCHKTIME
– Set registry variables LOCK_TO_RB, KEEPTABLELOCK, MAX_NON_TABLE_LOCKS,
DB2_EVALUNCOMMITTED, DB2_SKIPDELETED and DB2_SKIPINSERTED
– Type-2 indexes ^ concurrency if migrated from v7; use REORG to convert
Application development principles:
– Frequent COMMITs (use an event monitor or db2diag.log to track)
– Specify FOR FETCH ONLY cursors whenever possible
– Use the lowest isolation level for the entire application or a single statement
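A sketch of the db-side adjustments above as CLP commands; the database name `mydb` and the values are placeholders, not tuned recommendations.

```shell
# Replace the -1 (wait forever) default with a finite lock timeout
db2 update db cfg for mydb using LOCKTIMEOUT 30

# Give the lock list room to avoid escalations; tune escalation/deadlock knobs
db2 update db cfg for mydb using LOCKLIST 1024 MAXLOCKS 20 DLCHKTIME 10000

# Registry variables take effect after an instance restart
db2set DB2_EVALUNCOMMITTED=ON
db2set DB2_SKIPDELETED=ON
```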
System or User CPU
User CPU is often a result of waiting on locks or sorts. If that’s still the
case after the previous steps, an application design consult is required.
Barring that, the need is to identify the processes or particular SQL
statements consuming CPU cycles:
– Use ps -elf to identify processes at the OS level
– LIST APPLICATIONS SHOW DETAIL for connections within DB2
– Event monitors to isolate apps/SQL using extensive CPU resources
A stack trace, explained in the DB2 Problem Determination Guide, is
the ultimate means of identifying process (and time) flow within DB2.
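A self-contained sketch of the `ps`-based triage step: sample `ps`-style rows (invented, columns simplified to CPU-minutes, PID, command) are ranked to surface the busiest DB2 process. Real `ps -elf` output has more columns, so the field positions here are assumptions.

```shell
#!/bin/sh
# Hypothetical ps-style rows: accumulated CPU-minutes, PID, command
sample='12 4021 db2agent
 3 4105 db2pfchr
27 3998 db2agent
 1 4110 db2loggw'

# Pick the process with the most accumulated CPU
top=$(printf '%s\n' "$sample" | sort -rn | head -1 | awk '{ print $3 " (pid " $2 ")" }')
echo "Top CPU consumer: $top"
```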
Paging Issues –
AIX VMM allows the system to assign more memory to a process than
physically exists. A properly tuned DB2 system on a dedicated server should
incur NO paging whatsoever. Page faults/sec > 50, or > 10-20 pi/po  problem.
Paging means memory is overcommitted – common sources in 32-bit environments:
- Over-allocation of bufferpools or sortheap
- Database memory allocated based on avg. # applications
- Private memory allocated by a large number of connections
Remedies:
- Identify and release overcommitted memory (use db2mtrk/visualizer)
- Adjust AIX minperm and maxperm parameters
- Paging space on disk 2x physical memory
- Pin bufferpools in memory using DB2_PINNED_BP
- Release agent private memory using DB2 regvars MEMDISCLAIM/MAXFREE
- Remember that intra-parallelism and FCM take up 2 segments (i.e. – less for BPs)
Whatever is left is I/O, right?
Sorta-kinda-maybe. I/O efficiency can be addressed within the context of the
existing configuration. But later we’ll need to consider performance drivers
inherent in the design.
Checklist of common opportunities for improving I/O performance:
◊ Parallel I/O – registry variables, num_ioservers, adequate # of containers (hdisks)
◊ Turn off memory-mapped I/O (MMAP_R/W), but beware of i-node latching
◊ Consider DMS raw-device containers. DMS file containers are the worst of both worlds
◊ Examine behavior of HASH JOINs – may degrade performance w/o resources
◊ RUNSTATS and REORGs – automate! Don’t leave to the discretion of an overtaxed DBA!
◊ OLTP specifics – consider optimization 2/3, AVG_APPLS = 1, watch logging closely,
CHNGPGS_THRESH 20 – 30%, smaller pages  more efficient I/O, MINFREE and
PCTFREE to avoid block splitting/overflows
◊ DSS specifics – consider INTRA_PARALLEL, optimization > 5? AVG_APPLS = connections,
watch sorts closely, 32 K pages? (but remember only 255 rows per
page), DLCHKTIME ^, large result sets – RQRIOBLK to 64 K
Laundry List of I/O issues (continued)
ESS: stripe size is 64 K; stripe ts containers across LUNs; keep in mind
LUNs can be on the same rank (same array); do not duplicate RAID at the OS level
FAStT: stripe size configurable from 8 – 256 K, multiple RAID configs
Less prevalent, but no less debilitating, architectural/configuration issues:
– Page size correct for row size and access method?
– Unless there is a distinct advantage (synch/asynch I/O), fewer bufferpools
– Benchmark index/data separation – not always advantageous
– Pay attention to victim page steals – adjust # of cleaners/threshold
– CPUSPEED = -1; let the optimizer decide
– File open/close overhead  MAXFILOP/ulimit settings
AIX Rec’s - The good, the bad & the ugly
General settings, rules, suggestions, incantations:
– Use AIX 5.2 with the latest maintenance release if possible
– Use 64-bit DB2 with a 64-bit kernel (32-bit with 32-bit); recompile & rebind
SPs etc. when migration to 64-bit takes place; point to 64-bit libraries
– Steady-state load of the hardware config (CPU, memory, I/O) should use no
more than 70% of available resources to allow for peak loads/unseen overhead
– Number of licensed users and maxuproc set to max DB2 connections
– maxperm/minperm settings along with DB2 reg vars for memory disclaim
– maxmemory/minmemory to regulate AIX
– ulimits – maxprocs increased, unlimited for other params
– Make sure asynchronous I/O (aio) is enabled w/an adequate # of agents
– Paging space at least 2x physical memory
– If pinning memory using the regvar, vmtune tuning parameter ?
– Disabling MMAP reads/writes in DB2 means the jfs cache is bypassed…can provide significant I/O
improvement, but i-node latches  3 containers
– Make sure all processors in an SMP environment are enabled! Ask about WLM and DLPAR
settings.
– Check in smit for LP/PP ratio – is there disk mirroring on top of RAID-5?
– db2osconf – makes rec’s for OS kernel settings on HP-UX and Solaris
SQL Analysis & Tips
The Golden Rule for Join Performance:
– Columns used to join tables, or that are used in a GROUP BY, ORDER BY, or
DISTINCT clause, should be indexed to improve performance
Other useful SQL rules:
– The optimizer is all about ‘SARGABLE’ predicates (stage 1, indexable)
– For a join with N tables, there must be N-1 relationships defined to avoid a Cartesian
product – Corollary: establish higher selectivity early in the plan
– All SQL optimization efforts are marginalized with outdated statistics or fragmented data
– Use the OPTIMIZE FOR / FETCH FIRST n ROWS ONLY clauses if possible
– Test alternate optimization classes for problem SQL (i.e. – 5, 7 & 9)
– Use a singleton SELECT instead of a cursor – use routines to identify the node in EEE; make
cursors READ ONLY where possible  non-deletable  no exclusive locks;
consider the BLOCKING ALL option also
– Use UR or the lowest possible isolation level when possible to avoid waiting on locks
– Consider Global Temporary Tables – scanning a workfile is sometimes faster than a
subquery
– Run logic as a function instead of a procedure
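A sketch of testing alternate optimization classes from the CLP, per the tip above. `mydb` and `query.sql` are placeholders, and db2expln option spellings vary by release; check `db2expln -h` on your system.

```shell
# Try the same statement under different optimization classes and compare plans
for class in 5 7 9; do
    db2 connect to mydb
    db2 "SET CURRENT QUERY OPTIMIZATION = $class"
    db2expln -database mydb -statement "$(cat query.sql)" \
             -graph -output plan_opt$class.txt
done
```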