Transcript Slide 1

Session S317114:
What Else Can I Do with System and Session
Performance Data
Thomas Kyte
http://asktom.oracle.com/
1
Agenda
• History
• Key Concepts
• AWR
– How it works
– What else you can do with it beyond the GUI
• ASH
– How it works
– What else you can do with it beyond the GUI
• Command Line Reporting
2
Historical
Perspective
3
Oracle Tuning Methods
• Prehistoric (v5)
– Debug code
• Dark Ages (v6)
– Counters/Ratios
– BSTAT/ESTAT
– SQL*Trace
• Renaissance (v7)
– Introduction of Wait Event instrumentation
– Move from counters to timers
– STATSPACK
• Modernity (v10)
–
–
–
–
DB Time Tuning – Comprehensive tuning using fundamental notion of time in database
Multiple scoping levels
Always on, non-intrusive
Built into infrastructure: instrumentation, ASH, AWR, ADDM, EM
4
Key Concepts
5
Database Time (DB Time)
• Total time in database calls by foreground sessions
• Includes CPU time, IO time and non-idle wait time
• DB Time <> response time
• New metric for Oracle performance analysis
Database time is total time spent by user processes
either actively working or actively waiting in a database
call.
6
Fundamental Concepts
Active Session = Session currently spending time in a database call
Database Time (DB Time) = Total time session spent in all database calls
Average Activity of the Session (% Activity) = The ratio of time active to
total wall-clock time
Browse
Read Reviews
Books
For One Book
= time spent in database
Add to Cart
Checkout
TIME
7
Multiple Sessions
DB Time = Sum of DB Time Over All Sessions
Avg. Active Sessions = Sum of Avg. Activity Over All Sessions
=
Total Database Time
Wall-Clock (Elapsed) Time
At time t we have 2 active sessions
User 1
User 2
User 3
User n
= time spent in database
t
TIME
8
Key Infrastructure
Components - AWR
9
Automatic Workload Repository (AWR)
• Built-in workload and performance statistics repository in the
database
• Automatically Captures Workload Data
– Every 60 minutes, or manually, saves data for 8 days by default
• Resides in SYSAUX tablespace
• Space requirements automatically managed
– Old data is automatically purged nightly based on retention interval
• Stores different classes of data:
–
–
–
–
BASE STATISTICS e.g., physical reads
SQL STATISTICS e.g., disk reads (per sql stmt)
METRICS e.g., physical reads / sec
ACTIVE SESSION HISTORY (ASH)
10
Automatic Workload Repository (AWR)
• Every N-Units of time, data is flushed from memory to disk (a
snapshot)
• You can generate reports that cover any range of time (n-units of
time at a time)
• We simply “subtract”
Shutdown/startup
T1
T2
T3
T4
You can report on:
T2-T1
T3-T2
T3-T1
T4-T3
T4-T2
T4-T1
select * from
dba_hist_snapshot;
11
How much and how often…
ops$tkyte%ORA11GR2> select snap_interval, retention from dba_hist_wr_control;
SNAP_INTERVAL
RETENTION
------------------- ------------------+00000 01:00:00.0
+00008 00:00:00.0
ops$tkyte%ORA11GR2> begin
2
dbms_workload_repository.modify_snapshot_settings
3
( interval => 20, retention => 14*24*60 );
4 end;
5 /
PL/SQL procedure successfully completed.
ops$tkyte%ORA11GR2> select snap_interval, retention from dba_hist_wr_control;
SNAP_INTERVAL
RETENTION
------------------- ------------------+00000 00:20:00.0
+00014 00:00:00.0
12
How do we access…
• Underlying tables (WRH$ tables in the SYS schema) utilize
partitioning (even if you don’t have it) when sensible
– In support of the data purging
• We use the DBA_HIST_* views
– Documented in the Server Reference Guide
ops$tkyte%ORA11GR2> select view_name from dba_views
2 where owner = 'SYS' and view_name like 'DBA\_HIST\_%' escape '\';
VIEW_NAME
-----------------------------DBA_HIST_ACTIVE_SESS_HISTORY
…
DBA_HIST_WR_CONTROL
108 rows selected.
13
They are snapshots, not movies…
• Remember – these are point in time snapshots
• DBA_HIST_SQLTEXT will not be 100% complete for example
– especially if you have a poorly written application
T1
T2
SQL Enters
SQL Leaves
Shared Pool
Shared Pool
T3
T4
14
Metrics (rates of change) everywhere…
• Many metrics are available
• Joinable to DBA_HIST_SYSMETRIC_SUMMARY (long duration)
• Or to DBA_HIST_SYSMETRIC_HISTORY (all recorded data, shorter term)
ops$tkyte%ORA11GR2> select group_name, metric_name, metric_unit from dba_hist_metric_name
2 order by group_name, metric_name;
GROUP_NAME
-------------------------------Event Class Metrics
Event Class Metrics
Event Class Metrics
Event Class Metrics
…
WLM PC Metrics
METRIC_NAME
--------------------------------------------Average Users Waiting Counts
Database Time Spent Waiting (%)
Total Time Waited
Total Time Waited (Foreground)
METRIC_UNIT
-----------------------Users
% (TimeWaited / DBTime)
CentiSeconds
CentiSeconds
User Calls Per Second
User Calls Per Second
278 rows selected.
15
Metrics (rates of change) everywhere…
• We’d like to see some metric
• For some period of time
• With the interval that metric is valid
for
– Numerically
– “Graphically”
BEGIN_TIME
-------------------14-sep-2010 01:00:27
14-sep-2010 01:59:56
14-sep-2010 02:59:27
14-sep-2010 04:00:54
14-sep-2010 05:00:19
14-sep-2010 05:59:49
14-sep-2010 06:59:06
14-sep-2010 07:58:38
14-sep-2010 09:00:00
14-sep-2010 10:00:55
14-sep-2010 10:59:46
14-sep-2010 12:00:52
14-sep-2010 12:29:24
14-sep-2010 15:00:34
14-sep-2010 15:59:15
14-sep-2010 17:00:21
14-sep-2010 17:59:53
14-sep-2010 19:01:09
14-sep-2010 20:00:43
14-sep-2010 20:59:58
14-sep-2010 21:59:07
14-sep-2010 22:58:44
22 rows selected.
AVERAGE
---------5.66541413
5.77505516
5.52065565
5.71057718
5.5809264
5.87360942
5.1428971
4.97469016
45.8761183
7.83963704
5.43147809
5.29527252
5.86617054
4.04194349
5.12650636
5.61648183
5.39218885
4.96873127
5.38695014
4.93064773
5.85997655
4.61709545
HG
-------------------------------**
**
**
**
**
**
*
*
*******************************
***
*
*
**
*
*
**
*
*
*
*
**
*
16
ASCII Art – who needs a GUI…
ops$tkyte%ORA11GR2> select begin_time, average,
2
rpad( '*', 1+(average-min_avg)/spread, '*' ) hg
3
from (
4 select begin_time, average, min(average) over () min_avg,
5
(max(average) over () - min(average) over ())/30 spread
6
from dba_hist_sysmetric_summary
7
where begin_time >= to_date( '14-sep-2010', 'dd-mon-yyyy' )
8
and begin_time < to_date( '14-sep-2010', 'dd-mon-yyyy' )+1
9
and metric_name = 'User Commits Per Sec'
10
)
11
order by begin_time
12 /
BEGIN_TIME
-------------------14-sep-2010 01:00:27
14-sep-2010 01:59:56
14-sep-2010 02:59:27
14-sep-2010 04:00:54
14-sep-2010 05:00:19
14-sep-2010 05:59:49
14-sep-2010 06:59:06
14-sep-2010 07:58:38
14-sep-2010 09:00:00
14-sep-2010 10:00:55
14-sep-2010 10:59:46
14-sep-2010 12:00:52
14-sep-2010 12:29:24
14-sep-2010 15:00:34
14-sep-2010 15:59:15
14-sep-2010 17:00:21
14-sep-2010 17:59:53
14-sep-2010 19:01:09
14-sep-2010 20:00:43
14-sep-2010 20:59:58
14-sep-2010 21:59:07
14-sep-2010 22:58:44
22 rows selected.
AVERAGE
---------5.66541413
5.77505516
5.52065565
5.71057718
5.5809264
5.87360942
5.1428971
4.97469016
45.8761183
7.83963704
5.43147809
5.29527252
5.86617054
4.04194349
5.12650636
5.61648183
5.39218885
4.96873127
5.38695014
4.93064773
5.85997655
4.61709545
HG
-------------------------------**
**
**
**
**
**
*
*
*******************************
***
*
*
**
*
*
**
*
*
*
*
**
*
17
System Statistics (cumulative counters) abound…
• DBA_HIST_SYS_TIME_MODEL
– The current statistic value from V$SYS_TIME_MODEL over time
– V$SYS_TIME_MODEL = point in time
– DBA_HIST_SYS_TIME_MODEL = over time
ops$tkyte%ORA11GR2> select stat_name, round(value/1000000,2) seconds
2
from v$sys_time_model
3
order by stat_name
4 /
STAT_NAME
SECONDS
---------------------------------------------------------------- ---------DB CPU
39.8
DB time
126.37
PL/SQL execution elapsed time
11.13
…
19 rows selected.
18
System Statistics abound…
• We’d like to see some reported statistic (say parse time)
• Over some period of time
• In an associated interval
BEGIN_INTERVAL_TIME
------------------------14-SEP-10 01.01.01.986 AM
14-SEP-10 02.00.47.933 AM
14-SEP-10 03.00.19.167 AM
…
14-SEP-10 08.01.21.975 PM
14-SEP-10 09.00.47.946 PM
14-SEP-10 10.00.45.535 PM
14-SEP-10 11.00.43.823 PM
LAST_TIME
DIFF_SECS
------------------------- ---------14-SEP-10 12.01.11.096 AM
2.57
14-SEP-10 01.01.01.986 AM
2.86
14-SEP-10 02.00.47.933 AM
2
14-SEP-10
14-SEP-10
14-SEP-10
14-SEP-10
07.01.32.238
08.01.21.975
09.00.47.946
10.00.45.535
PM
PM
PM
PM
3.28
63.1
49.35
4.08
23 rows selected.
19
System Statistics abound…
• DBA_HIST_SYS_TIME_MODEL
– We’ll need LAG() over () to make sense of it
BEGIN_INTERVAL_TIME
ops$tkyte%ORA11GR2> selectLAST_TIME
s.begin_interval_time,
DIFF_SECS
------------------------2
lag(s.begin_interval_time)
------------------------over(order by---------s.begin_interval_time) last_time,
14-SEP-10
3
01.01.01.986
round( (value-lag(m.value)
AM 14-SEP-10 12.01.11.096
over (order
AM by s.begin_interval_time))/1000000,
2.57
2) diff_secs
14-SEP-10
4
from
02.00.47.933
dba_hist_sys_time_model
AM 14-SEP-10 01.01.01.986
m , dba_hist_snapshot
AM
2.86
s
14-SEP-10
5
where
03.00.19.167
m.stat_nameAM= 14-SEP-10
'parse time
02.00.47.933
elapsed'
AM
2
… 6
and m.snap_id = s.snap_id
14-SEP-10
7
and
08.01.21.975
s.begin_interval_time
PM 14-SEP-10between
07.01.32.238
trunc(sysdate)-2
PM
3.28
and trunc(sysdate)-1
14-SEP-10
8
order
09.00.47.946
by s.begin_interval_time
PM 14-SEP-10 08.01.21.975 PM
63.1
14-SEP-10
9 /
10.00.45.535 PM 14-SEP-10 09.00.47.946 PM
49.35
14-SEP-10 11.00.43.823 PM 14-SEP-10 10.00.45.535 PM
4.08
23 rows selected.
20
And of course, ASCII Art still rocks…
• DBA_HIST_SYS_TIME_MODEL
ops$tkyte%ORA11GR2> select
begin_interval_time,
diff_secs,
BEGIN_INTERVAL_TIME
DIFF_SECS
HG
----------------------------------------------------------------2
rpad( '*',
1+(diff_secs-min_diff_secs)/spread,
'*' ) hg
… 3
from (
14-SEP-10 05.01.27.818 AM
3.73 **
4 select begin_interval_time,
14-SEP-10 06.01.02.893 AM
2.69 *
5
diff_secs,
14-SEP-10
07.00.37.968
AM
3.62 **
6
min(diff_secs)
over
14-SEP-10
08.00.39.143
AM
1.55
* () min_diff_secs,
14-SEP-10
09.00.20.581
AM
33.32
****************
7
(max(diff_secs)
over
()-min(diff_secs) over ())/30 spread
14-SEP-10
10.01.14.907
AM
11.55 *****
8
from
(
14-SEP-10 11.00.14.259 AM
1.8 *
9 select s.begin_interval_time,
14-SEP-10 12.01.31.688 PM
2.48 *
10
lag(s.begin_interval_time)
over(order by s.begin_interval_time) last_time,
14-SEP-10
02.19.26.383
PM
6.08 ***
11
round( PM
(value-lag(m.value)
over (order by s.begin_interval_time))/1000000, 2) diff_secs
14-SEP-10
03.01.24.127
13.92 *******
14-SEP-10
04.00.24.823
PM
1.94 *
12
from
dba_hist_sys_time_model
m , dba_hist_snapshot s
14-SEP-10
05.01.48.046
PM
*
13
where
m.stat_name
= 2.24
'parse
time elapsed'
14-SEP-10 06.01.53.858 PM
2.05 *
14
and m.snap_id = s.snap_id
14-SEP-10 07.01.32.238 PM
2.71 *
15
and s.begin_interval_time
between trunc(sysdate)-2 and trunc(sysdate)-1
14-SEP-10
08.01.21.975
PM
3.28 *
16
)
14-SEP-10
09.00.47.946
PM
63.1 ******************************
14-SEP-10
10.00.45.535
PM is not
49.35null
************************
17
where
diff_secs
14-SEP-10
11.00.43.823
PM
4.08 **
18
)
19
order by begin_interval_time
22 rows selected.
20 /
21
Key Infrastructure
Components - ASH
22
Goal: application level trace
• AWR – instance level
– Aggregated at too high a level of detail
– Captured over too long a period of time
– Although, it can be captured automatically without knowing there will
be a problem ahead of time
• SQL_TRACE
– 100% of the details
• That is the good news
• That is also the bad news
– Must be enabled manually
– You have to replicate the problem in order to diagnose the problem
23
Active Session History (ASH)
• Samples active sessions every one second into memory
(v$active_session_history) – always there
• Direct access to kernel structures
• One of ten samples flushed to AWR at every snapshot (but which one…)
• Data captured includes:
•
•
•
•
•
•
SID
SQL ID
Program, Module, Action
Wait event#
Object, File, Block
actual wait time (if captured while waiting)
Enables targeted performance analysis
of transient problems
24
Active Session History (ASH)
Query for
Khalid Husaini
Novels
Browse and
Read
Reviews
Add
item to
cart
Checkout
using
‘one-click’
DB Time
Time
SID
Module
SQL ID
State
Event
7:38:26
213
Book by author
qa324jffritcf
WAITING
7:42:35
213
Get review id
aferv5desfzs5
CPU
7:50:59
213
Add to cart
hk32pekfcbdfr
WAITING
buffer busy wait
7:52:33
213
One click
abngldf95f4de
WAITING
log file sync
db file sequential read
25
Similar to SQL_TRACE
• But remember – this is sampled data
• Sql_trace is 100% complete – everything is there
– Hence the somewhat onerous overhead
• In most cases ASH is better than good enough to identify
problems
• And it is always on
– No need to try to replicate a problem, we’ve already recorded it.
– Attempts to keep approximately 1 hour of activity in memory – ready
to go.
26
Gets fuzzier over time
• V$ACTIVE_SESSION_HISTORY – about every second of activity
• DBA_HIST_ACTIVE_SESS_HISTORY – every 10 seconds of activity
– On demand flush
– When ever in memory buffer (V$) is 2/3rds full
– Retained using AWR retention policies
Point in time:
Short term memory:
Long term memory:
V$SESSION
V$ACTIVE_SESSION_HISTORY
DBA_HIST_ACTIVE_SESS_HISTORY
V$SESSION_WAIT
SGA Circular
Every hour or
Buffer – sized
2/3rds full in SGA
By CPU_COUNT
27
What can you see with ASH on the command line
• Let’s look at sessions in the hour
– or minute or whatever
• That were waiting for user initiated IO to complete
– or whatever wait is interesting
• And report the SQL they were executing at that time
– If still available
– If unambiguous
• And show the SQL in order from the most IO waits to the least
– or by wait time or whatever
28
-- Top *anything* SQL really – replace User I/O with any wait class
select ash.sql_id,
(select distinct decode( count(distinct sql_text) over (),
1, substr(sql_text,1,80)||'...',
'more than one sql matched' )
from v$sql
where sql_id = ash.sql_id) sql,
count(*)
from v$active_session_history ash, v$event_name evt
where ash.sample_time > sysdate - 1/24
and ash.session_state = 'WAITING'
and ash.event_id = evt.event_id
and evt.wait_class = 'User I/O'
group by sql_id
order by count(*) desc;
29
What can you see with ASH on the command line
•
•
•
•
•
•
Let’s look at all available observations (no time frame limit)
By SQL_ID
Summing up time on the CPU
And time spent waiting for IO
And time spent waiting for things other than IO
And report the SQL text in descending order by “almost” dbtime
– We’ll report cpu+wait, but we don’t have the “wait on the cpu” in this
case
30
-- SQL by total CPU plus wait time, broken down by CPU, IO wait and non-IO wait
select sql_id,
(select distinct decode( count(distinct sql_text) over (),
1, substr(sql_text,1,80)||'...', 'more than one sql matched' )
from v$sql
where sql_id = ash.sql_id) sql,
cpu, non_io_wait, io_wait
from (
select ash.sql_id,
sum(case when ash.session_state = 'ON CPU' then 1 else 0 end ) cpu,
sum(case when ash.session_state='WAITING' and ash.event#<>ev.user_io then 1 else 0 end) non_io_wait,
sum(case when ash.session_state = 'WAITING' and ash.event# = ev.user_io then 1 else 0 end ) io_wait
from v$active_session_history ash,
(select event# user_io from v$event_name where wait_class = 'User I/O') ev
group by ash.sql_id
) ash
order by cpu+non_io_wait+io_wait desc;
31
What can you see with ASH on the command line
• Let’s look at the last hour
• And for each event
– Find the min/max sample times – the period of times we observed this
event
– Add up the wait time for that event
• And report the largest waited for things first
32
select ash.event,
min(sample_time) start_time,
max(sample_time)-min(sample_time) duration,
round(sum(ash.wait_time+ash.time_waited)/1000000,2) wait_secs
from v$active_session_history ash
where ash.sample_time between
systimestamp-numtodsinterval(1,'hour') and systimestamp
group by ash.event
order by wait_secs desc
33
What can you see with ASH on the command line
• Let’s look at the last hour
• And for each event
– Find the min/max sample times – the period of times we observed this
event
– Add up the wait time for that event
• And report the largest waited for things first
• By session
34
select ash.session_id,
au.username,
round(sum(ash.wait_time+ash.time_waited)/1000000,2) wait_secs
from v$active_session_history ash,
all_users au
where ash.sample_time >= systimestamp-numtodsinterval(1,'hour')
and ash.user_id = au.user_id
group by ash.session_id, au.username
order by wait_secs DESC
35
What can you see with ASH on the command line
• Let’s look at the last hour
• And for each event
– Find the min/max sample times – the period of times we observed this
event
– Add up the wait time for that event
• And report the largest waited for things first
• For a given session, by sql statement used in that session
36
select (select distinct decode( count(distinct sql_text) over (),
1, substr(sql_text,1,80)||'...', 'more than one sql matched' )
from v$sql
where sql_id = ash.sql_id) sql,
round(sum(ash.wait_time+ash.time_waited)/1000000,2) wait_secs
from v$active_session_history ash
where ash.session_id = 1065
group by ash.sql_id
order by wait_secs DESC
37
And so on…
• The possibilities are limitless
• Again, please remember
– Sampling
– More like digital than analog!
– Summing up V$ACTIVE_SESSION_HISTORY will never equal
tkprof numbers, never equal AWR numbers
– But – typically better than good enough
• Infinitely better than nothing – go ahead and try to trace the
application that wasn’t designed to be traced
38
Command Line
Reporting
39
AWR - Prepared scripts
• In $ORACLE_HOME/rdbms/admin
– awrrpt.sql, awrrpti.sql, generate report for pairs of snapshot ids.
awrrpti.sql allows you to pick a database, awrrpt.sql defaults to
the current database
– awrddrpt.sql, awrddrpi.sql, generates a comparison report
between two AWR reports. Can be for single database
(compare prod at 2pm to prod at 4pm) or for two different
databases (compare prod at 2pm to test at 2pm)
– awrextr.sql, awrload.sql, (must be connected as sys) –
datapump exports/imports AWR data so you can move it to
another system for analysis/maintaining a longer history.
40
AWR - Prepared scripts
• In $ORACLE_HOME/rdbms/admin
– awrgrpti.sql, awrgrpt.sql, generate RAC aware reports by
choosing specific instances to include – or to use all available
instances
– awrgdrpt.sql, awrgdrpi.sql, generate RAC aware time period
comparison reports – within a set of database instances or
across databases
– awrinfo.sql, (must be connected as sys) to review general AWR
information. Tablespace usage, Schema/option usage,
Segment usage, snapshot information, retention policies and so
on.
41
AWR - Prepared scripts
• In $ORACLE_HOME/rdbms/admin
– awrsqrpt.sql, awrsqrpi.sql, generates a report for a single SQL
statement over multiple snapshots. Report on plans used,
resources used
• IO’s
• Parses
• Executions
• CPU/Elapsed times
• And more
42
ASH - Prepared scripts
• In $ORACLE_HOME/rdbms/admin
– ashrpt.sql, ashrpti.sql, generates an ASH report … filter by
• Session
• SQL’s
• Wait Classes
• Database Services
• Modules/Actions/Client_ID
• PL/SQL entry point
– Shows major waits, top services/modules, top clients, SQL
activity (parsing, executions, etc), top SQL and more…
43
ASH - Prepared scripts
• In $ORACLE_HOME/rdbms/admin
– ashrpti.sql, generates an ASH report. The “I” version filters by
• Session
• SQL’s
• Wait Classes
• Database Services
• Modules/Actions/Client_ID
• PL/SQL entry point
– Shows major waits, top services/modules, top clients, SQL
activity (parsing, executions, etc), top SQL and more…
44
AWR – dbms_workload_repository package
• Your interface to managing the repository
• The scripts we just reviewed – a user interface on top of
this package (much like the OEM interface is)
• Meaning – you can make your own interface easily
– 14 pipelined functions can create any of the previous reports
45
AWR – dbms_workload_repository package
• Other administrative API’s
– ADD_COLORED_SQL – to trace a given SQL statement in
every snapshot regardless of whether it was considered “top sql”
or not.
– BASELINE API’s, to create and drop baseline information
– SNAPSHOT API’s to drop existing snapshots or modify how
often snapshots are taken/retained
46
In Conclusion
47
48