More Examples of Interpreting Wait Events To Boost System

Download Report

Transcript More Examples of Interpreting Wait Events To Boost System

More Examples of Interpreting
Wait Events To Boost System
Performance
Roger Schrag and Terry Sutton
Database Specialists, Inc.
www.dbspecialists.com
1
Session Objectives
 Briefly introduce wait events:
– Define wait events
– Discuss how to use the wait event interface
 Walk through five examples of how wait
event information was used to troubleshoot
production problems
2
“Wait Event” Defined
 We say an Oracle process is “busy” when it wants
CPU time.
 When an Oracle process is not busy, it is waiting
for something to happen.
 There are only so many things an Oracle process
could be waiting for, and the kernel developers at
Oracle have attached names to them all.
 These are wait events.
3
Wait Event Examples
 An Oracle process waiting for the client
application to submit a SQL statement waits
on a “SQL*Net message from client” event.
 An Oracle process waiting on another
session to release a row-level lock waits on
an “enqueue” event.
4
Wait Event Interface
 Each Oracle process identifies the event it is
waiting for each time a wait begins.
 The instance collects cumulative statistics
about events waited upon since instance
startup.
 You can access this information through v$
views and a wait event tracing facility.
 These make up the wait event interface.
5
Viewing Wait Events
http://dbrx.dbspecialists.com/pls/dbrx/view_report
6
Why Wait Event
Information Is Useful
 Wait events touch all areas of Oracle—from
I/O to latches to parallelism to network
traffic.
 Wait event data can be remarkably detailed.
“Waited 0.02 seconds to read 8 blocks from
file 42 starting at block 18042.”
 Analyzing wait event data will yield a path
toward a solution for almost any problem.
7
Important Wait Events
 There were 158 wait events in Oracle 8.0.
 There are 363 wait events in Oracle 9i Release 2
(9.2.0).
 Most come up infrequently or are rarely significant
for troubleshooting performance.
 Different wait events are significant in different
environments, depending on which Oracle
features have been deployed.
8
A Few Common Events
buffer busy waits
control file parallel write
control file sequential read
db file parallel read / write
db file scattered read
db file sequential read
direct path read / write
enqueue
free buffer waits
latch free
library cache load lock
library cache pin
log buffer space
log file parallel write
log file sequential read
log file switch completion
log file sync
undo segment extension
write complete waits
9
Idle Events
 Sometimes an Oracle process is not busy
simply because it has nothing to do.
 In this case the process will be waiting on an
event that we call an “idle event.”
 Idle events are usually not interesting from
the tuning and troubleshooting perspective.
10
Common Idle Events
client message
dispatcher timer
gcs for action
gcs remote message
ges remote message
i/o slave wait
jobq slave wait
lock manager wait for remote message
null event
parallel query dequeue
pipe get
PL/SQL lock timer
pmon timer
PX Deq Credit: need buffer
PX Deq Credit: send blkd
PX Deq: Execute Reply
PX Deq: Execution Msg
PX Deq: Signal ACK
PX Deq:Table Q Normal
PX Deque wait
PX Idle Wait
queue messages
rdbms ipc message
slave wait
smon timer
SQL*Net message from client
SQL*Net message to client
SQL*Net more data from client
virtual circuit status
wakeup time manager
11
Accounted for by the
Wait Event Interface
 Time spent waiting for something to do (idle
events)
 Time spent waiting for something to happen
so that work may continue (non-idle events)
12
Not Accounted for by the
Wait Event Interface
 Time spent using a CPU
 Time spent waiting for a CPU
 Time spent waiting for virtual memory to be
swapped back into physical memory
 Time spent on CPU-intensive activities:
– Logical reads
– Spinning while waiting for latches
– Statement parsing
13
Timed Statistics
The wait event interface will not collect timing
information unless timed statistics are enabled.
 Enable timed statistics dynamically at the instance
or session level:
ALTER SYSTEM SET timed_statistics = TRUE;
ALTER SESSION SET timed_statistics = TRUE;
 Enable timed statistics at instance startup by
setting the instance parameter:
timed_statistics = true
14
The Wait Event Interface
 Dynamic performance views
– v$system_event
– v$session_event
– v$event_name
– v$session_wait
 Wait event tracing
15
The v$system_event View
Shows one row for each wait event name, along
with cumulative statistics since instance startup.
Wait events that have not occurred at least once
since instance startup do not appear in this view.
Column Name
-------------------------EVENT
TOTAL_WAITS
TOTAL_TIMEOUTS
TIME_WAITED
AVERAGE_WAIT
TIME_WAITED_MICRO
Data Type
-----------VARCHAR2(64)
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
16
Columns In v$system_event
 EVENT: The name of a wait event
 TOTAL_WAITS: Total number of times a process has
waited for this event since instance startup
 TOTAL_TIMEOUTS: Total number of timeouts while
waiting for this event since instance startup
 TIME_WAITED: Total time waited for this wait event
by all processes since startup (in centiseconds)
 AVERAGE_WAIT: The average length of a wait for
this event since instance startup (in centiseconds)
 TIME_WAITED_MICRO: Same as TIME_WAITED
but in microseconds (Oracle 9i)
17
Sample v$system_event Query
SQL> SELECT event, time_waited
2 FROM
v$system_event
3 WHERE event IN ('smon timer',
4
'SQL*Net message from client',
5
'db file sequential read',
6
'log file parallel write');
EVENT
TIME_WAITED
--------------------------------- ----------log file parallel write
159692
db file sequential read
28657
smon timer
130673837
SQL*Net message from client
16528989
18
The v$session_event View
Shows one row for each wait event name within
each session, along with cumulative statistics
since session start.
Column Name
-------------------------SID
EVENT
TOTAL_WAITS
TOTAL_TIMEOUTS
TIME_WAITED
AVERAGE_WAIT
MAX_WAIT
TIME_WAITED_MICRO
Data Type
-----------NUMBER
VARCHAR2(64)
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
NUMBER
19
Columns In v$session_event
 SID: The ID of a session (from v$session)
 EVENT: The name of a wait event
 TOTAL_WAITS: Total number of times this session has
waited for this event
 TOTAL_TIMEOUTS: Total number of timeouts while this
session has waited for this event
 TIME_WAITED: Total time waited for this event by this
session (in centiseconds)
 AVERAGE_WAIT: The average length of a wait for this
event in this session (in centiseconds)
 MAX_WAIT: The maximum amount of time the session had
to wait for this event (in centiseconds)
20
Sample v$session_event Query
SQL> SELECT event, total_waits, time_waited_micro
2 FROM
v$session_event
3 WHERE SID =
4
(SELECT sid FROM v$session
5
WHERE audsid =
6
USERENV ('sessionid') );
EVENT
WAITS TIME_WAITED_MICRO
--------------------------- ----- ----------------db file sequential read
552
2409173
db file scattered read
41
315928
SQL*Net message to client
73
347
SQL*Net message from client
72
3397382712
21
Oracle 9i Bug #2429929
SQL> SELECT event, total_waits, time_waited_micro
2 FROM
v$session_event
3 WHERE SID + 1 =
4
(SELECT sid FROM v$session
5
WHERE audsid =
6
USERENV ('sessionid') );
EVENT
WAITS TIME_WAITED_MICRO
--------------------------- ----- ----------------db file sequential read
552
2409173
db file scattered read
41
315928
SQL*Net message to client
73
347
SQL*Net message from client
72
3397382712
22
The v$event_name View
Shows one row for each wait event name known
to the Oracle kernel, along with names of up to
three parameters associated with the wait event.
Column Name
-------------------------EVENT#
NAME
PARAMETER1
PARAMETER2
PARAMETER3
Data Type
-----------NUMBER
VARCHAR2(64)
VARCHAR2(64)
VARCHAR2(64)
VARCHAR2(64)
23
Columns In v$event_name
 EVENT#: An internal ID
 NAME: The name of a wait event
 PARAMETERn: The name of a parameter
associated with the wait event
24
Sample v$event_name Query
SQL> SELECT *
2 FROM
v$event_name
3 WHERE name = 'db file scattered read';
EVENT# NAME
---------- -----------------------------PARAMETER1
PARAMETER2
PARAMETER3
------------- ------------- ------------95 db file scattered read
file#
block#
blocks
25
The v$session_wait View
Shows one row for each session, providing detailed
information about the current or most recent wait event.
Column Name
-------------------------SID
SEQ#
EVENT
P1TEXT
P1
P1RAW
P2TEXT
P2
P2RAW
P3TEXT
P3
P3RAW
WAIT_TIME
SECONDS_IN_WAIT
STATE
Data Type
-----------NUMBER
NUMBER
VARCHAR2(64)
VARCHAR2(64)
NUMBER
RAW(4)
VARCHAR2(64)
NUMBER
RAW(4)
VARCHAR2(64)
NUMBER
RAW(4)
NUMBER
NUMBER
VARCHAR2(19)
26
Columns In v$session_wait
 SID: The ID of a session
 SEQ#: A number that increments by one on each
new wait
 STATE: An indicator of the session status:
– ‘WAITING’: The session is currently waiting, and details
of the wait event are provided.
– ‘WAITED KNOWN TIME’: The session is not waiting, but
information about the most recent wait is provided.
– ‘WAITED SHORT TIME’ or ‘WAITED UNKNOWN TIME’:
The session is not waiting, but partial information about
the most recent wait is provided.
27
Columns In v$session_wait (cont.)
 EVENT: The name of a wait event
 PnTEXT: The name of a parameter associated with the
wait event
 Pn: The value of the parameter in decimal form
 PnRAW: The value of the parameter in raw form
 WAIT_TIME: Length of most recent wait (in
centiseconds) if STATE = ‘WAITED KNOWN TIME’
 SECONDS_IN_WAIT: How long current wait has been
so far if STATE = ‘WAITING’
28
Sample v$session_wait Query
SQL> SELECT * FROM v$session_wait WHERE sid = 16;
SID SEQ# EVENT
---- ----- -----------------------------P1TEXT
P1 P1RAW
P2TEXT
P2 P2RAW
------ ---- -------- ------ ---- -------P3TEXT
P3 P3RAW
WAIT_TIME SECONDS_IN_WAIT
------ ---- -------- --------- --------------STATE
------------------16
303 db file scattered read
file#
17 00000011 block# 2721 00000AA1
blocks
8 00000008
-1
0
WAITED SHORT TIME
29
Tracing Wait Event Activity
Using the dbms_support package or setting
debug event 10046 enables SQL trace, and
can optionally include wait event information
and bind variable data in trace files as well.
Methods for setting debug events:
 ALTER SESSION SET events
 oradebug
 dbms_system.set_ev
30
Activating Wait Event Tracing
 dbms_support is missing from many releases of
Oracle 8i, but is available as a patch.
 dbms_support is not installed by default; run
dbmssupp.sql in ?/rdbms/admin to install it.
 dbms_system.set_ev is not supported by Oracle
Corporation because it lets you set any debug
event and some can put your database at risk.
 Tracing imposes serious system overhead, so
trace only what you need.
31
Debug Event 10046 Settings
ALTER SESSION SET events
'10046 trace name context forever, level N';
Value of N
Effect
1
Enables ordinary SQL trace
4
Enables SQL trace with bind variable values
included in trace file
8
Enables SQL trace with wait event information
included in trace file
12
Equivalent of level 4 and level 8 together
32
Sample Oracle 8i Trace Output
=====================
PARSING IN CURSOR #1 len=80 dep=0 uid=502 oct=3 lid=502
tim=2293771931 hv=2293373707 ad='511dca20'
SELECT /*+ FULL */ SUM (LENGTH(notes))
FROM
customer_calls
WHERE status = :x
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=2293771931
BINDS #1:
bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0
size=24 offset=0
bfp=09717724 bln=22 avl=02 flg=05
value=43
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=2293771931
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=675562835 p2=1 p3=0
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=923 p3=8
WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=931 p3=8
WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=939 p3=8
WAIT #1: nam='db file sequential read' ela= 0 p1=17 p2=947 p3=1
WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=1657 p3=8
33
Wait Event Tracing Enhancements
In Oracle 9i
 The dbms_support package is provided for
easier trace activation.
 Elapsed times in the trace file are shown in
microseconds instead of centiseconds.
 A “waits=yes” option has been added to
TKPROF to include wait event statistics in
the TKPROF report.
34
Using Wait Event Information
Five examples of how wait event information
was used to diagnose production problems
35
Example #1:
Buffer Busy Waits
A magazine publisher has a website that
displays content stored in a database. At
times the website would get bogged down—
response time would become poor and the
database server would become extremely
busy (near-zero idle time).
36
Viewing Wait Events
Statistics With Statspack
 Collect Statspack snapshots at regular
intervals.
 Statspack report shows top wait events for
entire instance during snapshot interval.
 Oracle 9i Statspack also shows CPU time
used during the interval.
37
Statspack Report Output
Snap Id
Snap Time
Sessions
------- ------------------ -------Begin Snap:
61 11-Dec-02 13:00:52
145
End Snap:
71 11-Dec-02 14:00:26
145
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - Top 5 Wait Events
~~~~~~~~~~~~~~~~~
Wait
% Total
Event
Waits Time (cs) Wt Time
------------------------ ---------- ---------- ------buffer busy waits
1,962,372 1,278,649
50.03
db file sequential read
1,336,870 1,050,878
41.12
db file scattered read
47,717
49,326
1.93
direct path write
8,070
40,574
1.59
latch free
38,220
31,012
1.21
38
What We See in the
Statspack Report
 Dominant wait events:
– buffer busy waits
– db file sequential read
 Over 23,000 seconds of wait time on these
two events in a one hour period (over 6
seconds of waiting per elapsed second)
39
Understanding the
Buffer Busy Waits Event
SQL> SELECT parameter1, parameter2, parameter3
2 FROM
v$event_name
3 WHERE name = 'buffer busy waits';
PARAMETER1
PARAMETER2
PARAMETER3
------------ ------------ -----------file#
block#
id
 file#: Data file containing the desired data block
 block#: Block within the data file that is desired
 id: Reason the buffer in the buffer cache is
busy (see Metalink bulletin #34405.1)
40
Finding Which Data Blocks Are
Experiencing Buffer Contention
SQL>
2
3
4
5
SID
--12
31
SELECT
sid, event, state, seconds_in_wait,
wait_time, p1, p2, p3
FROM
v$session_wait
WHERE
event = 'buffer busy waits'
ORDER BY sid;
EVENT
----------------buffer busy waits
buffer busy waits
STATE SEC TIME
P1
P2
P3
----- --- ---- ----- ----- ----WAITE
1
0
30 62157
130
WAITE
1
0
30 23558
130
41
Finding Which Data Blocks Are
Experiencing Buffer Contention
SQL>
2
3
4
5
SELECT
FROM
WHERE
AND
owner, segment_name, segment_type
dba_extents
file_id = &absolute_file_number
&block_number BETWEEN block_id
AND block_id + blocks -1;
Enter value for absolute_file_number: 30
Enter value for block_number: 62157
OWNER
SEGMENT_NAME
SEGMENT_TYPE
----------------- ------------------- -----------PRODMGR
SAMPLES
TABLE
42
Reason Codes from Metalink
Bulletin #34405.1
P3
Reason Code
110
We want the CURRENT block either shared or
exclusive but the Block is being read into
cache by another session, so we have to wait
until their read() is completed.
Block is being read by another session and no
other suitable block image was found, so we
wait until the read is completed.
During buffer lookup for a CURRENT copy of
a buffer we have found the buffer but someone
holds it in an incompatible mode so we have to
wait.
130
220
43
What We Have Learned So Far
 A buffer containing a data block of the
SAMPLES table is experiencing contention.
 The buffer in the buffer cache is busy
because another session is reading the
same data block from disk.
44
Understanding the DB File
Sequential Read Event
SQL> SELECT parameter1, parameter2, parameter3
2 FROM
v$event_name
3 WHERE name = 'db file sequential read';
PARAMETER1
PARAMETER2
PARAMETER3
------------ ------------ -----------file#
block#
blocks
 file#: Data file containing the desired data block
 block#: Block within the data file that is desired
 blocks: How many blocks are being read
(typically 1 for db file sequential read)
45
Finding Which Data Blocks
Are Being Read
SQL>
2
3
4
5
SID
--17
19
33
SELECT
sid, event, state, seconds_in_wait,
wait_time, p1, p2, p3
FROM
v$session_wait
WHERE
event = 'db file sequential read'
ORDER BY sid;
EVENT
----------------db file sequentia
db file sequentia
db file sequentia
STATE SEC TIME
P1
P2
P3
----- --- ---- ----- ----- ----WAITE
1
0
30 62042
1
WAITE
1
0
30 61731
1
WAITI
0
0
30 57292
1
46
Finding Which Data Blocks
Are Being Read
SQL>
2
3
4
5
SELECT
FROM
WHERE
AND
owner, segment_name, segment_type
dba_extents
file_id = &absolute_file_number
&block_number BETWEEN block_id
AND block_id + blocks -1;
Enter value for absolute_file_number: 30
Enter value for block_number: 62042
OWNER
SEGMENT_NAME
SEGMENT_TYPE
----------------- ------------------- -----------PRODMGR
SAMPLES
TABLE
47
The SAMPLES Table
 Contained a LONG column with very
large values
 Excessive row chaining
 Most queries did not retrieve the
LONG data
 Table assigned to KEEP pool, but too
large to fit entirely in memory
48
Long-Term Problem Resolution
 Convert the LONG column to a CLOB.
 Large CLOB data will be stored in a
separate LOB segment.
 Row chaining will be reduced or eliminated.
 The table segment will be much smaller and
more likely to fit in memory.
49
Short-Term Problem Resolution
 Added index on most columns of
SAMPLES table
– Allowed most queries to avoid table segment
 Enlarged KEEP pool
– Allowed index segment to fit in memory
50
Statspack Report Output
Snap Id
Snap Time
Sessions
------- ------------------ -------Begin Snap:
1192 20-Dec-02 13:00:49
102
End Snap:
1202 20-Dec-02 14:00:18
102
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - Top 5 Wait Events
~~~~~~~~~~~~~~~~~
Wait
% Total
Event
Waits Time (cs) Wt Time
------------------------ ---------- ---------- ------direct path write
6,467
13,545
30.61
log file sync
4,914
7,493
16.93
library cache pin
1,175
6,090
13.76
direct path read
5,488
3,428
7.75
latch free
14,528
2,931
6.62
51
What We See in the
Statspack Report Now
 No db file sequential read or buffer busy
waits.
– All data was already in the buffer cache.
 Physical reads reduced by over 90%.
 Total wait time on all non-idle events
reduced by over 98%.
– Before: 12,786.49 / 0.5003 = 25,557.65
– After: 135.45 / 0.3061 = 442.50
52
What We Learned from
Wait Event Information
 Large amounts of time were being spent waiting
on single block disk reads and buffer contention in
the buffer cache.
 Random samples showed the disk reads and
contention involved the SAMPLES table.
 The buffer contention was the result of multiple
sessions needing the same block from disk.
 Wait events pointed us directly to the problem
segment.
53
Example #2:
More Buffer Busy Waits,
Plus Latch Contention
A genetic research company stored their
data in Oracle. Applications running
concurrently on many workstations would
fetch raw data, process it, and put the data
back in the database. But throughput bogged
down as they added more workstations.
54
Activating Wait Event Tracing
 Added to application code on workstation
#30:
ALTER SESSION SET events
'10046 trace name context forever, level 8';
 Could have used dbms_support if it was
installed:
dbms_support.start_trace;
 Modified application code to exit after 500
iterations
55
TKPROF Wait Events
Reporting in Oracle 9i
tkprof prodgen_ora_16466.trc report_16466.prf waits=yes
56
TKPROF Report Output
UPDATE processing_stations
SET
status = 'ACTIVE', status_date = SYSDATE,
data_set_id_being_processed = :b1
WHERE station_id = 30
call
count
cpu
elapsed disk query current
rows
------- ------ ------- --------- ----- ----- ------- ----Parse
1
0.01
0.00
0
0
0
0
Execute
500
0.23
10.14
0 3616
1010
500
Fetch
0
0.00
0.00
0
0
0
0
------- ------ ------- --------- ----- ----- ------- ----total
501
0.24
10.14
0 3616
1010
500
Elapsed times include waiting on following events:
Event waited on
Times Max. Wait Total Waited
--------------------------- Waited ---------- -----------buffer busy waits
26
0.71
7.87
latch free
17
0.57
2.08
log file switch completion
3
0.09
0.20
57
What We See In the TKPROF
Report
 500 trivial updates took 10.14 seconds
 Most of that time was spent waiting
 Dominant wait events:
– buffer busy waits
– latch free
 CPU time plus wait time does not add up to
elapsed time due to round-off errors
58
Waits In the Trace File
WAIT
WAIT
WAIT
WAIT
WAIT
WAIT
WAIT
WAIT
WAIT
WAIT
#2:
#2:
#2:
#2:
#2:
#2:
#2:
#2:
#2:
#2:
nam='buffer
nam='buffer
nam='buffer
nam='buffer
nam='buffer
nam='buffer
nam='buffer
nam='buffer
nam='buffer
nam='buffer
busy
busy
busy
busy
busy
busy
busy
busy
busy
busy
waits'
waits'
waits'
waits'
waits'
waits'
waits'
waits'
waits'
waits'
ela=
ela=
ela=
ela=
ela=
ela=
ela=
ela=
ela=
ela=
527727 p1=18 p2=10 p3=220
498765 p1=18 p2=10 p3=220
137611 p1=18 p2=10 p3=220
124165 p1=18 p2=10 p3=220
5237 p1=18 p2=10 p3=220
264050 p1=18 p2=10 p3=220
270177 p1=18 p2=10 p3=220
330912 p1=18 p2=10 p3=220
156317 p1=18 p2=10 p3=220
710696 p1=18 p2=10 p3=220
Elapsed times are in microseconds in
Oracle 9i
59
Finding Which Data Blocks Are
Experiencing Buffer Contention
SQL>
2
3
4
5
SELECT
FROM
WHERE
AND
owner, segment_name, segment_type
dba_extents
file_id = &absolute_file_number
&block_number BETWEEN block_id
AND block_id + blocks -1;
Enter value for absolute_file_number: 18
Enter value for block_number: 10
OWNER
SEGMENT_NAME
SEGMENT_TYPE
----------------- ------------------- -----------GEN
PROCESSING_STATIONS TABLE
60
Reason Codes from Metalink
Bulletin #34405.1
P3
Reason Code
110
We want the CURRENT block either shared or
exclusive but the Block is being read into
cache by another session, so we have to wait
until their read() is completed.
Block is being read by another session and no
other suitable block image was found, so we
wait until the read is completed.
During buffer lookup for a CURRENT copy of
a buffer we have found the buffer but someone
holds it in an incompatible mode so we have to
wait.
130
220
61
What We Have Learned So Far
 A buffer containing a data block of the
PROCESSING_STATIONS table is
experiencing contention.
 The buffer in the buffer cache is busy
because another session has the buffer in
an incompatible mode.
 All 26 buffer busy waits totaling 7.87
seconds involved the same data block.
62
The PROCESSING_STATIONS
Table
SQL> SELECT SYSDATE - last_analyzed, blocks,
2
avg_row_len, avg_space, num_rows
3 FROM
user_tables
4 WHERE table_name = 'PROCESSING_STATIONS';
SYSDATEAVG_ AVG_
LAST_ANALYZED BLOCKS ROW_LEN SPACE NUM_ROWS
------------- ------ ------- ----- -------2.132118056
1
62 1686
100
63
Two Important Observations
 There were 100 workstations running the
processing application concurrently.
 The trace we ran on workstation #30
completed in just under one minute.
64
Lots of Updates!
 Workstation #30 updated the
PROCESSING_STATIONS table 500 times
in less than one minute.
 If all 100 workstations do similar things:
More than 50,000 updates to one data block
every minute by 100 concurrent sessions!
65
Why So Many Updates?
 Workstations use the PROCESSING_STATIONS
table to track which workstation is processing
which data set.
 Processing one data set takes between 0.1
second and 20 minutes.
 Workstations update the table frequently to keep
the timestamp current. This would be helpful in the
event of a workstation crash.
66
Long-Term Problem Resolution
 Modify the application to update the
PROCESSING_STATIONS table less
frequently—once per data set or once per
second for larger data sets:
– Will reduce updates by over 80%
– Buffer busy waits will disappear or dramatically
decrease
67
Short-Term Problem Resolution
 Rebuilt the PROCESSING_STATIONS table
with PCTFREE set to 99:
– Oracle reserved 99% of each data block for
future row expansion.
– Each row got its own data block.
– Each workstation session now updates a
separate data block.
68
The Rebuilt
PROCESSING_STATIONS Table
SQL> SELECT SYSDATE - last_analyzed, blocks,
2
avg_row_len, avg_space, num_rows
3 FROM
user_tables
4 WHERE table_name = 'PROCESSING_STATIONS';
SYSDATEAVG_ AVG_
LAST_ANALYZED BLOCKS ROW_LEN SPACE NUM_ROWS
------------- ------ ------- ----- -------.130868056
100
62 8014
100
69
TKPROF Report Output
UPDATE processing_stations
SET
status = 'ACTIVE', status_date = SYSDATE,
data_set_id_being_processed = :b1
WHERE station_id = 30
call
count
cpu
elapsed disk query current
rows
------- ------ ------- --------- ----- ----- ------- ----Parse
1
0.00
0.00
0
0
0
0
Execute
500
0.20
2.22
0
500
1009
500
Fetch
0
0.00
0.00
0
0
0
0
------- ------ ------- --------- ----- ----- ------- ----total
501
0.20
2.22
0
500
1009
500
Elapsed times include waiting on following events:
Event waited on
Times Max. Wait Total Waited
--------------------------- Waited ---------- -----------latch free
2
0.35
0.61
70
What We See in the TKPROF
Report Now
 500 updates took 2.22 seconds, down from 10.14
seconds
 No more buffer busy waits
 Waited 0.61 seconds on latches, down from 2.08
seconds
 CPU time was 0.20 seconds, down from 0.23
seconds
 1.41 seconds unaccounted for—likely a mix of
waiting for CPU and round-off error
71
Understanding the
Latch Free Event
SQL> SELECT parameter1, parameter2, parameter3
2 FROM
v$event_name
3 WHERE name = 'latch free';
PARAMETER1
PARAMETER2
PARAMETER3
------------ ------------ -----------address
number
tries
 address: Join to addr in v$latch
 number: Join to latch# in v$latchname
 tries: Number of times the session has waited
while trying to acquire the latch
72
Waits In the Trace File
WAIT
WAIT
WAIT
WAIT
#2:
#2:
#2:
#2:
nam='latch
nam='latch
nam='latch
nam='latch
free'
free'
free'
free'
ela=
ela=
ela=
ela=
47004
14629
20652
37737
p1=15113593728
p1=15113593728
p1=15113593728
p1=15113593728
p2=97
p2=97
p2=97
p2=97
p3=0
p3=1
p3=2
p3=3
Four consecutive waits for one acquisition of
the latch
73
Finding Which Latches Are
Experiencing Contention
SQL> SELECT latch#, name
2 FROM
v$latchname
3 WHERE latch# = &latch_number;
Enter value for latch_number: 97
LATCH# NAME
---------- -------------------97 cache buffers chains
74
What We Learned from
Wait Event Information
 Much time was spent waiting on latch contention
and buffer contention in the buffer cache.
 The buffer contention was all for one data block.
 The buffer contention was the result of multiple
sessions needing to update the same data block.
 The latch contention involved the latch that
protects the buffer cache chains data structure.
 Wait events pointed us directly to the hot buffer in
the buffer cache.
75
Example #3:
Log File Waits
A data warehouse loader application was
tuned in a test environment until it met user
acceptance. The production server was
larger and more powerful, but the data loads
actually took longer in production than in the
test environment.
76
Summarizing Wait Events
During A Period of Time
 v$system_event shows wait event totals since
instance startup.
 v$session_event shows wait event totals since the
beginning of a session.
 You can capture view contents at different points
in time and compute the delta in order to get wait
event information for a specific period of time.
 Statspack and many third-party tools can do this,
but a simple script has less overhead and can be
quicker to deploy.
77
Simple Script to See Wait Events
During a 30 Second Period
CREATE TABLE previous_events AS
SELECT SYSDATE timestamp, v$system_event.*
FROM
v$system_event;
EXECUTE dbms_lock.sleep (30);
SELECT
A.event,
A.total_waits
- NVL (B.total_waits, 0) total_waits,
A.time_waited
- NVL (B.time_waited, 0) time_waited
FROM
v$system_event A, previous_events B
WHERE
A.event NOT IN (list of idle events)
AND
B.event (+) = A.event
78
ORDER BY time_waited;
Wait Events During 30 Seconds
of Data Loading
EVENT
TOTAL_WAITS TIME_WAITED
---------------------------- ----------- ----------control file sequential read
61
1
latch free
2
1
db file sequential read
6
7
control file parallel write
41
31
log file single write
6
164
db file parallel write
13
220
enqueue
6
486
log buffer space
24
2007
log file sequential read
30
2655
log file switch completion
33
2883
log file parallel write
19
3561
log file sync
113
10249
79
What We See in the
Script Output
 Over 215 seconds spent waiting on logrelated events:
– Sessions waited 102 seconds for LGWR to
flush the log buffer to disk for a commit
– Sessions waited 48 seconds for LGWR to make
space for more redo
– LGWR waited 37 seconds for disk writes
– ARCH waited 26 seconds for disk reads
80
Investigate the Redo Log
 Check production online redo log location
for contention and slow hardware:
– All log files located on one disk
– Same disk held hot files for another database
 Compare to test environment:
– Log files located on a striped volume
– Minimal other activity on the volume
– Database in NOARCHIVELOG mode
81
Problem Resolution
 Sped up online redo log file performance:
– Moved log files to dedicated disks
– Spread log files over multiple disks
82
What We Learned from
Wait Event Information
 The disk I/O speed writing and reading the
online redo log files was the bottleneck
slowing down the data warehouse load.
 Wait events pointed us directly to the area
within Oracle that was holding up the works.
83
Example #4:
Direct Path I/O Waits
Analysts in a customer service unit were
satisfied with the response time when they
queried individual customer orders from their
data warehouse. However, queries involving
summarizations of multiple orders were
unacceptably slow.
84
Database Rx Wait Event Report
85
What We See in the
Database Rx Report
 Dominant wait events:
– direct path write
– db file scattered read
– direct path read
 Above account for 99% of non-idle event
wait time
 Insignificant db file sequential read waits
86
What We Have Learned So Far
 Large amount of direct path I/O activity
– Usually involves temporary segments
 Significant multi-block I/O reads
– Full table scans are common in a data
warehouse environment
 Insignificant single-block I/O reads
– Frequently accessed data blocks probably in
buffer cache
87
Understanding the
Direct Path I/O Events
SQL> SELECT name, parameter1, parameter2, parameter3
2 FROM
v$event_name
3 WHERE name LIKE 'direct path%';
NAME
----------------direct path read
direct path write
PARAMETER1
----------file number
file number
PARAMETER2
---------first dba
first dba
PARAMETER3
---------block cnt
block cnt
 file number: File containing data block
 first dba: First block within the file to be accessed
 block cnt: Number of blocks to be accessed
88
Finding Which Files Are
Being Accessed
SQL>
2
3
4
5
SID
--39
47
SELECT
sid, event, state, seconds_in_wait,
wait_time, p1, p2, p3
FROM
v$session_wait
WHERE
event = 'direct path write'
ORDER BY sid;
EVENT
----------------direct path write
direct path write
STATE SEC TIME
P1
P2
P3
----- --- ---- ----- ----- ----WAITI
0
0
201
65
7
WAITI
0
0
201 2248
7
89
Finding Which Files Are
Being Accessed
SQL> SELECT tablespace_name, file_id "AFN"
2 FROM
dba_data_files
3 WHERE file_id = 201;
no rows selected
SQL>
2
3
4
SELECT
FROM
WHERE
AND
tablespace_name, file_id + value "AFN"
dba_temp_files, v$parameter
name = 'db_files'
file_id + value = 201;
TABLESPACE_NAME
AFN
------------------------------ ---------TEMP
201
90
Problem Resolution
 Increased sort_area_size:
– Was set to default of 65536
– Increased to 10485760 (few concurrent
sessions)
 If that had not solved the problem:
– Tune application code to reduce sorting
– Check for other active files on disks holding
temp files
– Move temp files to a striped volume
91
What We Learned from
Wait Event Information
 Direct path I/O accounted for 75% of the non-idle
event wait time on the system.
 Multi-block reads accounted for 24% of the nonidle event wait time—not unusual in a data
warehouse environment.
 Random samples showed direct path I/O involved
the temporary tablespace.
 Wait events pointed us directly to the area within
Oracle that needed adjustment.
92
Logical vs. Physical Reads
93
Logical vs. Physical Reads
 During the Database Rx sample interval
there were more physical reads than logical
reads.
 Direct path reads count as physical reads
but not logical reads.
 Be careful how you compute your buffer
cache hit ratios—in this example you might
come up with a negative figure!
94
Example #5:
Database Link Wait Events
A company had five Oracle databases, one
per region. Due to human error, the same
customer transactions would sometimes get
loaded into multiple databases. A report was
built to identify these duplicates, but it took
30 minutes to run.
95
Isolating a Query and
Analyzing Its Wait Events
 Start a new database session in SQL*Plus or a
similar tool.
 Run the query.
 Monitor the session’s wait events and statistics
from another session:
– v$session_event
– v$sesstat
 This is a handy technique when you know which
statement is the bottleneck.
96
Query Output from
v$session_event
SQL>
2
3
4
SELECT
FROM
WHERE
ORDER BY
event, total_waits, time_waited, max_wait
v$session_event
sid = 47
event;
EVENT
TOTAL_WAITS TIME_WAITED
MAX_WAIT
--------------------------- ----------- ----------- ---------SQL*Net message from client
32
4435
2432
SQL*Net message from dblink
1525516
104919
31
SQL*Net message to client
33
0
0
SQL*Net message to dblink
1525516
466
9
db file sequential read
27199
8025
28
latch free
40
5
4
log file sync
1
2
2
97
Query Output from v$sesstat
SQL>
2
3
4
5
SELECT
FROM
WHERE
AND
AND
A.name, B.value
v$statname A, v$sesstat B
A.statistic# = 12
B.statistic# = A.statistic#
B.sid = 47;
NAME
VALUE
------------------------------ ---------CPU used by this session
67937
98
What We See In the v$ Data
 1.5 million waits on network roundtrips
through a database link: 1053 seconds
– Network latency
– Time for the remote database to respond to
each request
 27,000 waits for single-block disk reads: 80
seconds
99
The Query We Are Studying
SELECT
customer_id, batch_serial_number, batch_date,
load_date, batch_comment, control_total
FROM
customer_xfer_batches A
WHERE
exists
(SELECT 1
FROM
customer_xfer_batches@prdwest B
WHERE B.customer_id = A.customer_id
AND
B.batch_serial_number =
A.batch_serial_number)
ORDER BY customer_id, batch_serial_number;
100
The Query We Are Studying
Execution Plan
---------------------------------------------------------0
SELECT STATEMENT
1 0
FILTER
2 1
TABLE ACCESS (BY INDEX ROWID) OF 'CUSTOMER_XFER_BATCHES'
3 2
INDEX (FULL SCAN) OF 'CUST_XFER_BAT_PK' (UNIQUE)
4 1
REMOTE* PRDWEST
4 SERIAL_FROM_REMOTE SELECT "CUSTOMER_ID","BATCH_SERIAL_NUMBER"
FROM "CUSTOMER_XFER_BATCHES" "B"
WHERE "BATCH_SERIAL_NUMBER"=:1
AND "CUSTOMER_ID"=:2
101
CUSTOMER_XFER_BATCHES
SQL> SELECT blocks, num_rows
2 FROM
user_tables
3 WHERE table_name =
4
'CUSTOMER_XFER_BATCHES';
BLOCKS NUM_ROWS
------ -------21825 1526003
102
What We Have Learned So Far
 Oracle is doing a full scan of the index on
the local table and fetching each row one at
a time
– This does avoid a sort
– Very high price to pay to skip sorting a few rows
 Oracle is doing one remote query for each
row fetched from the local table
103
Problem Resolution - Part 1
SELECT
customer_id, batch_serial_number, batch_date,
load_date, batch_comment, control_total
FROM
customer_xfer_batches
WHERE
(customer_id, batch_serial_number) IN
(SELECT customer_id, batch_serial_number
FROM
customer_xfer_batches
INTERSECT
SELECT customer_id, batch_serial_number
FROM
customer_xfer_batches@prdwest)
ORDER BY customer_id, batch_serial_number;
104
Query Output from
v$session_event
SQL>
2
3
4
SELECT
FROM
WHERE
ORDER BY
event, total_waits, time_waited, max_wait
v$session_event
sid = 49
event;
EVENT
TOTAL_WAITS TIME_WAITED
MAX_WAIT
--------------------------- ----------- ----------- ---------SQL*Net message from client
46
3680
2481
SQL*Net message from dblink
24
31
18
SQL*Net message to client
47
0
0
SQL*Net message to dblink
24
0
0
SQL*Net more data from dbli
5978
1337
13
db file scattered read
3430
675
8
db file sequential read
182
60
2
direct path read
148
233
11
direct path write
920
3572
33
105
Query Output from v$sesstat
SQL>
2
3
4
5
SELECT
FROM
WHERE
AND
AND
A.name, B.value
v$statname A, v$sesstat B
A.statistic# = 12
B.statistic# = A.statistic#
B.sid = 49;
NAME
VALUE
------------------------------ ---------CPU used by this session
3227
106
What We See in the v$ Data Now
 24 network roundtrips through a database
link instead of 1.5 million: 14 seconds (down
from 1053)
– Fewer, larger network packets
– Fewer requests to remote database
 3,600 waits on mostly multi-block disk reads
instead of 27,000 waits on single-block disk
reads: 7 seconds (down from 80)
– Fewer multi-block reads instead of many single107
block reads
What We See in the v$ Data Now
 1100 waits on direct path I/O: 38 seconds
(new)
– Sorting to implement the INTERSECT operation
 32 seconds of CPU time (down from 679)
– Fewer logical reads and network roundtrips
Elapsed time: 92 seconds (down from
31 minutes)
108
Iterative Tuning
 Curing one bottleneck often reveals or
creates another, smaller bottleneck.
 Repeat the wait event evaluation process
after each change until performance goals
are met.
 In this situation, a 95% reduction in runtime
from 31 minutes to 92 seconds still did not
meet the performance goal.
109
What We Have So Far
 Rewritten query completes in 92 seconds:
– 32 CPU seconds
– 38 seconds of wait on direct path I/O
– 14 seconds of wait on network roundtrips
– 7 seconds of wait on multi-block and singleblock reads
110
Problem Resolution - Part 2
 Eliminating or speeding up direct path I/O
seems like the logical next step:
– sort_area_size set to 1 Mb
– Try dynamically changing it to 100 Mb?
111
Query Output from
v$session_event
SQL>
2
3
4
SELECT
FROM
WHERE
ORDER BY
event, total_waits, time_waited, max_wait
v$session_event
sid = 46
event;
EVENT
TOTAL_WAITS TIME_WAITED
MAX_WAIT
--------------------------- ----------- ----------- ---------SQL*Net message from client
47
442
287
SQL*Net message from dblink
25
25
14
SQL*Net message to client
48
0
0
SQL*Net message to dblink
25
0
0
SQL*Net more data from dbli
6050
1378
26
db file scattered read
3430
945
8
db file sequential read
191
59
1
log file sync
1
3
3
112
Query Output from v$sesstat
SQL>
2
3
4
5
SELECT
FROM
WHERE
AND
AND
A.name, B.value
v$statname A, v$sesstat B
A.statistic# = 12
B.statistic# = A.statistic#
B.sid = 46;
NAME
VALUE
------------------------------ ---------CPU used by this session
3296
113
What We See in the v$ Data Now
 Waits on network roundtrips through a
database link, multi-block reads, and singleblock reads unchanged
 CPU time used unchanged
 Direct path I/O waits eliminated completely
– Entire sort now performed in memory
Elapsed time: 55 seconds (down from 92)
114
What We Learned from
Wait Event Information
 A query ran slowly due to excessive network
roundtrips and single-block reads.
 After these problems were corrected, 40% of the
query execution time was devoted to sorting to
disk.
 Wait events showed us how Oracle was spending
its time while executing the query, helping us
improve the query’s performance in an iterative
fashion.
115
A Summary Of Wait Event
Techniques
 Using Statspack snapshots and reports to
analyze wait events at the instance level
 Polling v$session_wait to determine which
buffers or latches have contention
 Enabling wait event tracing in a session
 Using Oracle 9i TKPROF to tabulate waits
at the statement level within one session
116
A Summary Of Wait Event
Techniques (continued)
 Collecting wait event data for a session or
the entire instance at two different times and
computing the difference to find the wait
events during a specific period of time
 Ranking cumulative wait event data in order
to see which wait events account for the
most wait time
 Isolating a statement and analyzing its wait
events
117
Send Us Your Wait Event Puzzles
 We are always looking for interesting wait
event situations to learn from!
 If you are trying to diagnose a problem using
the wait event interface, feel free to email us
wait events data and a problem description.
 We’ll do our best to look over what you send
us and share our thoughts with you.
118
The White Paper
A companion white paper to this presentation
is available for free download from our
company’s website at:
www.dbspecialists.com/presentations.html
119
Resources from
Database Specialists
 The Specialist newsletter
– www.dbspecialists.com/specialist.html
 Database Rx®
– dbrx.dbspecialists.com/guest
• Provides secure, automated monitoring, alert
notification, and analysis of your Oracle
databases
120
In Conclusion
 The wait event interface gives you access to a
detailed accounting of how Oracle processes
spend their time.
 Wait events touch all aspects of the Oracle
database server.
 The wait event interface will not always give you
the answer to every performance problem, but it
will just about always give you insights that guide
you down the proper path to problem resolution.
121
Contact Information
Roger Schrag
[email protected]
Terry Sutton
[email protected]
Database Specialists, Inc.
388 Market Street, Suite 400
San Francisco, CA 94111
Tel: 415/344-0500
Web: www.dbspecialists.com
122