Interpreting Wait Events To Boost System Performance

Download Report

Transcript Interpreting Wait Events To Boost System Performance

Interpreting Wait Events To Boost System Performance Roger Schrag Database Specialists, Inc.

www.dbspecialists.com

1

Session Objectives

   Define wait events Discuss how to use the wait event interface Walk through four examples of how wait event information was used to troubleshoot production problems 2

“Wait Event” Defined

 We say an Oracle process is “busy” when it wants CPU time.

 When an Oracle process is not busy, it is waiting for something to happen.

 There are only so many things an Oracle process could be waiting for, and the kernel developers at Oracle have attached names to them all.

 These are wait events.

3

Wait Event Examples

 An Oracle process waiting for the client application to submit a SQL statement waits on a “SQL*Net message from client” event.

 An Oracle process waiting on another session to release a row-level lock waits on an “enqueue” event.

4

Wait Event Interface

 Each Oracle process identifies the event it is waiting for each time a wait begins.

 The instance collects cumulative statistics about events waited upon since instance startup.

 You can access this information through v$ views and tracing events.

 These make up the wait event interface.

5

Viewing Wait Events

http://dbrx.dbspecialists.com/pls/dbrx/view_report 6

  

Why Wait Event Information Is Useful

Wait events touch all areas of Oracle —from I/O to latches to parallelism to network traffic.

Wait event data can be remarkably detailed. “Waited 0.02 seconds to read 8 blocks from file 42 starting at block 18042.” Analyzing wait event data will yield a path toward a solution for almost any problem.

7

Important Wait Events

  There were 102 wait events in Oracle 7.3.

There are 217 wait events in Oracle 8i Release 3 (8.1.7).

 Most come up infrequently or are rarely significant for troubleshooting performance.

 Different wait events are significant in different environments, depending on which Oracle features have been deployed.

8

A Few Common Wait Events

enqueue library cache pin library cache load lock latch free buffer busy waits control file sequential read control file parallel write log buffer space log file sequential read log file parallel write log file sync db file scattered read db file sequential read db file parallel read db file parallel write direct path read / write

9

Idle Events

 Sometimes an Oracle process is not busy simply because it has nothing to do.

 In this case the process will be waiting on an event that we call an “idle event.”  Idle events are usually not interesting from the tuning and troubleshooting perspective.

10

Common Idle Events

client message dispatcher timer Null event smon timer PX Idle Wait pipe get PL/SQL lock timer pmon timer parallel query dequeue rdbms ipc message SQL*Net message from client SQL*Net message to client SQL*Net more data from client wakeup time manager virtual circuit status lock manager wait for remote message

11

Accounted For By The Wait Event Interface

 Time spent waiting for something to do (idle events)  Time spent waiting for something to happen so that work may continue (non-idle events) 12

Not Accounted For By The Wait Event Interface

 Time spent using a CPU   Time spent waiting for a CPU Time spent waiting for virtual memory to be swapped back into physical memory

Note that logical reads from the buffer cache do not appear in the wait event interface.

13

Timed Statistics

The wait event interface will not collect timing information unless timed statistics are enabled.

 Enable timed statistics dynamically at the instance or session level:

ALTER SYSTEM SET timed_statistics = TRUE; ALTER SESSION SET timed_statistics = TRUE;

 Enable timed statistics at instance startup by setting the instance parameter:

timed_statistics = true

14

The Wait Event Interface

 Dynamic performance views – v$system_event – v$session_event – v$event_name – v$session_wait  System event 10046 15

The v$system_event View

Shows one row for each wait event name, along with cumulative statistics since instance startup. Wait events that have not occurred at least once since instance startup do not appear in this view.

Column Name Data Type -------------------------- ----------- EVENT VARCHAR2(64) TOTAL_WAITS NUMBER TOTAL_TIMEOUTS NUMBER TIME_WAITED NUMBER AVERAGE_WAIT NUMBER

16

Columns In v$system_event

    

EVENT:

The name of a wait event

TOTAL_WAITS:

Total number of times a process has waited for this event since instance startup

TOTAL_TIMEOUTS:

Total number of timeouts while waiting for this event since instance startup

TIME_WAITED:

Total time waited for this wait event by all processes since instance startup (in centiseconds)

AVERAGE_WAIT:

The average length of a wait for this event since instance startup (in centiseconds) 17

Sample v$system_event Query

SQL> SELECT event, time_waited 2 FROM v$system_event 3 WHERE event IN ('smon timer', 4 'SQL*Net message from client', 5 'db file sequential read', 6 'log file parallel write'); EVENT TIME_WAITED --------------------------------- ---------- log file parallel write 159692 db file sequential read 28657 smon timer 130673837 SQL*Net message from client 16528989

18

The v$session_event View

Shows one row for each wait event name within each session, along with cumulative statistics since session start.

Column Name Data Type -------------------------- ----------- SID NUMBER EVENT VARCHAR2(64) TOTAL_WAITS NUMBER TOTAL_TIMEOUTS NUMBER TIME_WAITED NUMBER AVERAGE_WAIT NUMBER MAX_WAIT NUMBER

19

Columns In v$session_event

      

SID:

The ID of a session (from v$session)

EVENT:

The name of a wait event

TOTAL_WAITS:

Total number of times this session has waited for this event

TOTAL_TIMEOUTS:

Total number of timeouts while this session has waited for this event

TIME_WAITED:

Total time waited for this event by this session (in centiseconds)

AVERAGE_WAIT:

The average length of a wait for this event in this session (in centiseconds)

MAX_WAIT:

The maximum amount of time the session had to wait for this event (in centiseconds) 20

Sample v$session_event Query

SQL> SELECT event, total_waits, time_waited 2 FROM v$session_event 3 WHERE SID = 4 (SELECT sid FROM v$session 5 WHERE audsid = 6 USERENV ('sessionid') ); EVENT WAITS TIME_WAITED --------------------------- ----- ---------- db file sequential read 552 240 db file scattered read 41 31 SQL*Net message to client 73 0 SQL*Net message from client 72 339738

21

The v$event_name View

Shows one row for each wait event name known to the Oracle kernel, along with names of up to three parameters associated with the wait event.

Column Name Data Type -------------------------- ----------- EVENT# NUMBER NAME VARCHAR2(64) PARAMETER1 VARCHAR2(64) PARAMETER2 VARCHAR2(64) PARAMETER3 VARCHAR2(64)

22

Columns In v$event_name

  

EVENT#:

An internal ID

NAME:

The name of a wait event

PARAMETERn:

The name of a parameter associated with the wait event 23

Sample v$event_name Query

SQL> SELECT * 2 FROM v$event_name 3 WHERE name = 'db file scattered read'; EVENT# NAME ---------- ----------------------------- PARAMETER1 PARAMETER2 PARAMETER3 ------------- ------------- ------------ 95 db file scattered read file# block# blocks

24

The v$session_wait View

Shows one row for each session, providing detailed information about the current or most recent wait event.

Column Name Data Type -------------------------- ----------- SID NUMBER SEQ# NUMBER EVENT VARCHAR2(64) P1TEXT VARCHAR2(64) P1 NUMBER P1RAW RAW(4) P2TEXT VARCHAR2(64) P2 NUMBER P2RAW RAW(4) P3TEXT VARCHAR2(64) P3 NUMBER P3RAW RAW(4) WAIT_TIME NUMBER SECONDS_IN_WAIT NUMBER STATE VARCHAR2(19)

25

Columns In v$session_wait

  

SID:

The ID of a session

SEQ#:

A number that increments by one on each new wait

STATE:

– – – An indicator of the session status:

‘WAITING’:

The session is currently waiting, and details of the wait event are provided.

‘WAITED KNOWN TIME’:

The session is not waiting, but information about the most recent wait is provided.

‘WAITED SHORT TIME’ TIME’:

or

‘WAITED UNKNOWN

The session is not waiting, but partial information about the most recent wait is provided.

26

     

Columns In v$session_wait (continued)

EVENT:

The name of a wait event

PnTEXT:

The name of a parameter associated with the wait event

Pn:

The value of the parameter in decimal form

PnRAW:

The value of the parameter in raw form

WAIT_TIME:

Length of most recent wait (in centiseconds) if STATE = ‘WAITED KNOWN TIME’

SECONDS_IN_WAIT:

How long current wait has been so far if STATE = ‘WAITING’ 27

Sample v$session_wait Query

SQL> SELECT * FROM v$session_wait WHERE sid = 16; SID SEQ# EVENT ---- ----- ----------------------------- P1TEXT P1 P1RAW P2TEXT P2 P2RAW ------ ---- -------- ------ ---- ------- P3TEXT P3 P3RAW WAIT_TIME SECONDS_IN_WAIT ------ ---- -------- --------- -------------- STATE ------------------ 16 303 db file scattered read file# 17 00000011 block# 2721 00000AA1 blocks 8 00000008 -1 0 WAITED SHORT TIME

28

System Event 10046

Setting event 10046 enables SQL trace, and can optionally include wait event information and bind variable data in trace files as well.

   Methods for setting system events:  “event” instance parameter dbms_system.set_ev

oradebug

ALTER SESSION SET events

29

System Event 10046 Settings

ALTER SESSION SET events '10046 trace name context forever, level N’;

Value of N Effect 1 Enables ordinary SQL trace 4 8 Enables SQL trace with bind variable values included in trace file Enables SQL trace with wait event information included in trace file 12 Equivalent of level 4 and level 8 together 30

Sample Trace Output

===================== PARSING IN CURSOR #1 len=80 dep=0 uid=502 oct=3 lid=502 tim=2293771931 hv=2293373707 ad='511dca20' SELECT /*+ FULL */ SUM (LENGTH(notes)) FROM customer_calls WHERE status = :x END OF STMT PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=2293771931 BINDS #1: bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0 size=24 offset=0 bfp=09717724 bln=22 avl=02 flg=05 value=43 EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=2293771931 WAIT #1: nam='SQL*Net message to client' ela= 0 p1=675562835 p2=1 p3=0 WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=923 p3=8 WAIT #1: nam='db file scattered read' ela= 1 p1=17 p2=931 p3=8 WAIT #1: nam='db file scattered read' ela= 2 p1=17 p2=939 p3=8 WAIT #1: nam='db file sequential read' ela= 0 p1=17 p2=947 p3=1 WAIT #1: nam='db file scattered read' ela= 3 p1=17 p2=1657 p3=8

31

Using Wait Event Information

Four examples of how wait event information was used to diagnose production problems 32

Example #1: A Slow Web Page

A dynamic web page took several seconds to come up. Developers tracked the bottleneck down to one query. The execution plan showed that the query was using an index, so the developers thought there might be a “database problem.” 33

The Slow Query

SELECT COUNT (*) FROM customer_inquiries WHERE status_code = :b1 AND status_date > :b2; Execution Plan --------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE 1 0 SORT (AGGREGATE) 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'CUSTOMER_INQUIRIES' 3 2 INDEX (RANGE SCAN) OF 'CUSTOMER_INQUIRIES_N2' (NON-UNIQUE)

The CUSTOMER_INQUIRIES_N2 index was a concatenated index with status_code as its first column. The status_date column was not indexed.

34

Wait Events For One User’s Session

A query against v$session_event after the query ran in isolation yielded:

TOTAL TIME EVENT WAITS WAITED ------------------------------ ----- ----- db file scattered read 15 3 db file sequential read 6209 140 latch free 2 1 SQL*Net message to client 8 0 SQL*Net message from client 7 21285

35

The Path To Problem Resolution

  What we learned from wait event information: – The query performed a large number of index lookups. – 1.40 seconds were spent waiting on the index lookups, plus any CPU overhead.

Areas to research further: – – – Was the database server CPU starved? Was the index lookup selective? Idea: Modify the query to use a full table scan instead of the index.

36

Research Results

 The database server was CPU starved. The run queue length often exceeded twice the number of CPUs on the server.

 Using just the status_code column of the CUSTOMER_INQUIRIES_N2 index made for a very unselective index lookup. Over 90% of the rows in the table had a status code of 12.

 A full table scan against CUSTOMER_INQUIRIES appeared to run faster than using the index.

37

Problem Resolution

A query against v$session_event after the modified query ran in isolation yielded:

TOTAL TIME EVENT WAITS WAITED ------------------------------ ----- ----- db file scattered read 460 13 db file sequential read 3 1 latch free 1 0 SQL*Net message to client 10 0 SQL*Net message from client 9 18317

38

Analyzing The Results

 The rule of thumb that a full table scan is better than a scan of an unselective index is true.

 I/O systems can perform a few multi-block I/O requests much faster than many single-block I/O requests.

 Physical reads require a small amount of CPU time. Lack of available CPU can make an I/O intensive statement run even slower, although the wait event interface will not show this.

39

Example #2: Slow Batch Processing

An additional data feed program was added to the nightly batch processing job queue, and the overnight processing no longer finished before the morning deadline. More CPUs were added to the database server, but this did not improve processing speed significantly.

40

Summarizing Wait Events During A Period Of Time

 v$system_event shows wait event totals since instance startup.

 v$session_event shows wait event totals since the beginning of a session.

 You can capture view contents at different points in time and compute the delta in order to get wait event information for a specific period of time.

 Statspack and many third-party tools can do this for you.

41

Simple Script To See Wait Events During A 30 Second Time Period

CREATE TABLE previous_events AS SELECT SYSDATE timestamp, v$system_event.* FROM v$system_event; EXECUTE dbms_lock.sleep (30); SELECT A.event, A.total_waits

- NVL (B.total_waits, 0) total_waits, A.time_waited

- NVL (B.time_waited, 0) time_waited FROM v$system_event A, previous_events B WHERE B.event (+) = A.event

ORDER BY A.event;

42

Wait Events During 30 Seconds Of Batch Processing

EVENT TOTAL_WAITS TIME_WAITED ------------------------------ ----------- ---------- LGWR wait for redo copy 115 41 buffer busy waits 53 26 control file parallel write 45 44 db file scattered read 932 107 db file sequential read 76089 6726 direct path read 211 19 direct path write 212 15 enqueue 37 5646 free buffer waits 11 711 latch free 52 44 log buffer space 2 8 log file parallel write 4388 1047 log file sequential read 153 91 log file single write 2 6 log file switch completion 2 24 write complete waits 6 517

43

The Path To Problem Resolution

  What we learned from wait event information: – – There appeared to be significant lock contention. In 30 seconds of elapsed time, sessions spent over 56 seconds waiting for locks.

Areas to research further: – What type of locks are being waited on? Row-level locks? Table-level locks? Others?

– If the locks are table-level or row-level, then which database tables are experiencing contention? Which SQL statements are causing the contention?

44

Tracing Waits In A Session

The following commands were used to enable wait event tracing in the process with Oracle PID 13:

SQL> oradebug setorapid 13 Unix process pid: 19751, image: [email protected] (TNS V1-V3) SQL> oradebug session_event – > 10046 trace name context forever, level 8 Statement processed.

SQL>

45

Trace File Contents

EXEC #5:c=0,e=0,p=3,cr=2,cu=1,mis=0,r=1,dep=1,og=4,tim=2313020980 XCTEND rlbk=0, rd_only=0 WAIT #1: nam='write complete waits' ela= 11 p1=3 p2=2 p3=0 WAIT #4: nam='db file sequential read' ela= 4 p1=10 p2=12815 p3=1 WAIT #4: nam='db file sequential read' ela= 1 p1=10 p2=12865 p3=1 WAIT #4: nam='db file sequential read' ela= 5 p1=3 p2=858 p3=1 ===================== PARSING IN CURSOR #4 len=65 dep=1 uid=502 oct=6 lid=502 tim=2313021001 hv=417623354 ad='55855844' UPDATE CUSTOMER_CALLS SET ATTR_3 = :b1 WHERE CUSTOMER_CALL_ID=:b2 END OF STMT EXEC #4:c=1,e=10,p=3,cr=2,cu=3,mis=0,r=1,dep=1,og=4,tim=2313021001 WAIT #4: nam='db file sequential read' ela= 0 p1=10 p2=5789 p3=1 WAIT #4: nam='enqueue' ela= 307 p1=1415053318 p2=196705 p3=6744 WAIT #4: nam='enqueue' ela= 307 p1=1415053318 p2=196705 p3=6744 WAIT #4: nam='enqueue' ela= 53 p1=1415053318 p2=196705 p3=6744 WAIT #4: nam='db file sequential read' ela= 0 p1=10 p2=586 p3=1 WAIT #4: nam='db file sequential read' ela= 1 p1=3 p2=858 p3=1 EXEC #4:c=0,e=668,p=3,cr=5,cu=3,mis=0,r=1,dep=1,og=4,tim=2313021669

46

Understanding The enqueue Wait Event

SQL> SELECT parameter1,parameter2,parameter3 2 FROM v$event_name 3 WHERE name = 'enqueue'; PARAMETER1 PARAMETER2 PARAMETER3 ------------ ------------ ----------- name|mode id1 id2 SQL> SELECT CHR (1415053318/65536/256) || 2 CHR (MOD (1415053318/65536, 256)), 3 MOD (1415053318, 65536) lock_mode 4 FROM SYS.dual; CH LOCK_MODE -- --------- TX 6

47

Analyzing The Results

 Contention for exclusive locks on rows in the customer_calls table was responsible for substantial delays in processing.

 Looking at the row_wait_obj# and row_wait_row# columns in v$session would have identified the exact rows undergoing contention.

48

Problem Resolution

 Multiple programs were attempting to update the same rows in tables at the same time. Contention could be reduced by doing one or more of the following: – Running conflicting programs separately – Reducing lock scope – Reducing lock duration 49

Example #3: A Slow Client/Server Application

A client/server application was taking several seconds to bring up a certain screen. The delay was occurring during startup before the user had a chance to kick off a query. The only thing happening in the form on startup was some fetching of basic reference data. All of the SQL had been tuned and was known to run very quickly.

50

Manipulating timed_statistics

The timed_statistics parameter can be changed at any time at the session level with the following commands:

ALTER SESSION SET timed_statistics = TRUE; ALTER SESSION SET timed_statistics = FALSE;

Manipulate timed_statistics to collect wait event times during certain specific points of processing for debugging purposes.

51

Wait Events During Form Startup

Modifying the form to disable timed_statistics at the end of the form startup logic yielded the following information in v$session_event:

TOTAL TIME EVENT WAITS WAITED ------------------------------ ----- ----- SQL*Net message to client 18520 6 SQL*Net message from client 18519 1064

v$sesstat showed the following:

NAME VALUE ------------------------------ --------- session logical reads 9295 CPU used by this session 82 physical reads 0

52

The Path To Problem Resolution

  What we learned from wait event information: – There were over 18,000 network roundtrips during form startup, almost exactly two for every logical read.

– The Oracle process spent over 10 seconds waiting for activity from the client. Since timed statistics were disabled at the end of the form startup logic, this does not include time spent waiting on the end user.

Areas to research further: – How many rows of data does the form read from the database during the startup phase?

– – Does the form really need to fetch all of this data? Is the form fetching one row at a time or is it using Oracle’s array processing interface?

53

Research Results

 The form was fetching 9245 rows of reference data during startup.

 All of this data was necessary; none could be eliminated.

 All data was fetched one row at a time.

54

Problem Resolution

The startup logic of the form was modified to fetch 100 rows at a time. This yielded the following information in v$session_event:

TOTAL TIME EVENT WAITS WAITED ------------------------------ ----- ----- SQL*Net message to client 200 0 SQL*Net message from client 199 28

v$sesstat showed the following:

NAME VALUE ------------------------------ --------- session logical reads 135 CPU used by this session 3 physical reads 0

55

Analyzing The Results

 Fetching rows 100 at a time instead of one at a time dramatically reduced network roundtrips.

 Reducing network roundtrips reduced time spent waiting on the network.

 Fetching rows 100 at a time also significantly reduced the number of logical reads, and therefore the amount of CPU time required.

56

Example #4: A Floundering Database Server

The DBA group discovered that one of the database servers was completely overwhelmed. Connecting to the database took a few seconds, selecting from SYS.dual took more than a second. Everything on the system ran very slowly.

57

Longest Waits In v$system_event

EVENT TIME_WAITED ------------------------------ --------------- log file sync 326284 write complete waits 402284 control file parallel write 501697 db file scattered read 612671 db file sequential read 2459961 pmon timer 31839833 smon timer 31974216 db file parallel write 1353916234 rdbms ipc message 6579264389 latch free 8161581692 SQL*Net message from client 15517359160

58

The Path To Problem Resolution

 What we learned from wait event information: – – Most of the waits involved idle events or I/O events.

A large amount of time was spent waiting on latches.

 Areas to research further: – – How long has the instance been up?

Which latches are experiencing contention?

59

Research Results

  The instance had been up for about seven days.

The latch contention was in the shared pool and library cache, as evidenced by a query against v$latch_misses:

PARENT_NAME SUM(LONGHOLD_COUNT) ------------------------------ ------------------ enqueue hash chains 614 enqueues 637 Checkpoint queue latch 790 session allocation 1131 messages 1328 session idle bit 2106 latch wait list 5977 modify parameter values 6242 cache buffers chains 9876 row cache objects 38899 cache buffers lru chain 125352 shared pool 4041451 library cache 4423229

60

Further Research Results

  The shared pool was 400 Mb in size.

There were over 36,000 statements in the shared pool, almost all executed exactly once.

  The application was not using bind variables.

Modifying the application to use bind variables was not an option.

 Setting the cursor_sharing parameter to FORCE was also not an option.

61

Problem Resolution

Bigger is not always better!

Reducing the shared pool to 100 Mb provided plenty of space for sharable statements while reducing the effort required by Oracle to maintain the library cache LRU list. This reduced latch contention and boosted performance.

62

A Summary Of Wait Event Techniques

 Isolating a statement and analyzing its wait events  Collecting wait event data for a session or the entire instance at two different times and computing the difference to find the wait events during a specific period of time  Enabling wait event tracing in a session 63

A Summary Of Wait Event Techniques (continued)

 Enabling and disabling timed statistics dynamically to measure wait event times for a specific section of code Ranking cumulative wait event data in order to see which wait events account for the most wait time 64

In Conclusion

  Wait events touch all aspects of the Oracle database server.

 The wait event interface gives you access to a detailed accounting of how Oracle processes spend their time.

The wait event interface will not always give you the answer to every performance problem, but it will just about always give you insights that guide you down the proper path to problem resolution.

65

The White Paper

A companion white paper to this presentation is available for free download from my company’s website at: www.dbspecialists.com/present.html

66

Resources from Database Specialists

 

The Specialist

newsletter – www.dbspecialists.com/specialist.html

Database Rx ® – dbrx.dbspecialists.com/guest •

Provides secure, automated monitoring, alert notification, and analysis of your Oracle databases

67

Contact Information

Roger Schrag

Database Specialists, Inc.

388 Market Street, Suite 400 San Francisco, CA 94111 Tel: 415/344-0500 Email: [email protected]

Web: www.dbspecialists.com

68