Presentation Guidelines
Download
Report
Transcript Presentation Guidelines
CSI ORACLE – INVESTIGATE PAST
PERFORMANCE PROBLEMS USING ASH, AWR,
ADDM, AND OPTIMIZER STATISTICS
Senior DBA and Supervisor of DBA Team at Green
Shield Canada
15 Years DBA experience
Published articles in ODTUG Technical Journal,
IOUG Best Practices Booklet
Presented at Collaborate and SEMOP
Common
Technology
Military
Government
Business
Science
Organization
Medical
Education
Total
22
20
15
2
4
*CRIME SCENE INVESTIGATION
23
2
31
4
123
Here is the story….
You are away and something goes wrong at work
You come back, tanned, relaxed, happy
Your manager is not so tanned, relaxed, happy
He is upset and frustrated
He tells you what went wrong
He wants you to tell him WHY??
That was 5 days ago and this is a live, active,
database
Welcome Baaaackkk!
Recently upgraded from 9i to 10g
Automatic Stats Gathering has been turned
off
DBMS_STATS runs daily and weekly
Tables are checked for Stale Statistics
Gather stats are run for most Stales
Date: Monday, July 19, 2007
Case #10203
Subscriber Coverage Lookup Web Page has
been killed
Cause is still under investigation
Some clues have been gathered but suspect
is unknown at this time
Clues that have been gathered:
◦ Problem appears with multiple users
◦ Multiple users accessing page spikes CPU
◦ Web Page does not have any recent changes
◦ Page times out during data retrieval from
database
◦ Other pages are not affected
◦ Approximate time of death is known
◦ Page is not timing out today!
These are the tools that you can use to
solve this crime:
Database Statistics
Active Session History (ASH)
Automatic Workload Respository (AWR)
Automatic Database Diagnostics Monitoring
(ADDM)
◦ Optimizer Statistics
◦
◦
◦
◦
Analysis, interpretation, explanation of load
and resources
Internally and Externally
Gathered at system, session, statement
levels
Clues to the crime
Most important are wait events, time model,
ASH, system and session
Wait events are clues or symptoms of
problems
Time Model Statistics show time spent in
database calls
A large number of System and Session
statistics are available too
ASH is what is recorded by the security
camera mounted in the database
The data captured by the ‘camera’ includes:
◦ SQL identifier of SQL statement
◦ Object number, file number, and block
number
◦ Wait event identifier and parameters
◦ Session identifier and session serial number
◦ Module and action name
◦ Client identifier of the session
◦ Service hash identifier
Gathers statistics every hour by default
Collected from memory, processed, and
stored
Turned on my default
Controlled by the statistics_level parameter
Parameter is set to typical by default
Object statistics
Time model statistics
Some of the system and session statistics
Highest load SQL statements as determined
by elapsed time and CPU time
Active Session History (ASH) statistics
Data is stored in SYSAUX tablespace
Space used determined by # of active
sessions, snapshot interval, retention period
These values can be modified:
EXECUTE
DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_
SETTINGS(
interval => 30,
retention => 20160);
Adjust the retention period if you need to
Make sure you have sufficient data to solve
your crimes!
Captured data viewed in reports
Between two points in time
Created in Oracle Enterprise Manager (OEM)
or
Created by running Oracle provided scripts
if you have the DBA role
Reports can be summary or detailed
Display general statistics, statistics for a
SQL statement, or performance and
configuration statistics
Can be HTML or Text
Example:
◦ @$ORACLE_HOME/rdbms/admin/awrrpt.sql
Analyzes data in the AWR
Searches for Performance Problems
Suggests solutions
Does this for every AWR snapshot
Stores results in the SYSAUX tablespace
Should be the first thing you look at
Report can be generated in OEM
Report can be created with addmrpt.sql
Need to know approximate time of crime
This determines begin and end snapshot
number
ADDM analyzes performance between the
two snapshot ids
Table statistics
◦ Number of rows
◦ Number of blocks
◦ Average row length
Column statistics
◦ Number of distinct values (NDV) in column
◦ Number of nulls in column
◦ Data distribution (histogram)
Index statistics
◦ Number of leaf blocks
◦ Levels
◦ Clustering factor
System statistics
◦ I/O performance and utilization
◦ CPU performance and utilization
By default statistics gathered automatically
Gather_stats_job
◦ Missing stats
◦ Stale stats
◦ Check the status of the job using:
select enabled from dba_scheduler_jobs
where job_name='GATHER_STATS_JOB';
Can be disabled and run manually
BEGIN
DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
END;
/
Older versions are saved when new stats are
gathered
Previous versions can be restored
Use dbms_stats.restore_table_stats
Query DBA_OPTSTAT_OPERATIONS to find
timestamp of previous stats
select operation,start_time from
DBA_OPTSTAT_OPERATIONS
order by start_time;
OPERATION
START_TIME
---------------------------------------- ----------------------------------gather_schema_stats
09-JUL-07 08.19.17.782334 AM -04:00
gather_schema_stats
09-JUL-07 08.42.39.300132 AM -04:00
gather_schema_stats
09-JUL-07 09.04.27.501769 AM -04:00
exec dbms_stats.restore_table_stats
('PROD',’ADDRESS’, '20-JUL-0707.08.43.917606 AM
-04:00');
Restores statistics from July 20, 2007 7:08
am for the ADDRESS table in the PROD
schema
Back to the crime scene…
So, what went wrong and caused the poor
performance of the web page?
Now we have the tools to solve the crime.
Step 1 – Review emails and vmails
◦ Don’t redo the work others have done
◦ Emails outline estimated time of crime,
suspected criminal, affected citizens
◦ Clues such as CPU usage, other activity at
the crime scene are noted
Step 2 – Review the Reports
◦ Produce the ADDM Report for the estimated
time of the crime
◦ SQL statement using significant DB Time
coming from victim Web Page
◦ Average CPU too high for a Web Page
◦ Review the AWR snapshots
◦ Same SQL statement is Top SQL in CPU,
Elapsed time and Gets
This must be the
criminal!!!
Step 3 – Look for a Motive
◦ Now we have a suspect (SQL Statement)
◦ And a weapon (over-consumption CPU)
◦ Need more data to figure out WHY?
Step 3 – Look for a Motive
◦ Checked out the ASH data
Used dates and times related to the crime
top SQL, wait events, top objects
◦ Review the Optimizer statistics
Created an explain plan
Looked for things that didn’t make sense
Looked at the number of rows being returned
We Know Our Data
Ah-ha! The Explain Plan looks
wrong!
Step 4 – Re-enactment of the Crime
◦ Just to be absolutely sure
◦ Create a mock crime scene
A copy-managed test database is perfect
Restored optmizer statistics from before the
crime
Re-run the SQL statement with old and new
statistics
Checked out the ASH, ADDM, AWR reports in
test
◦ Sure enough the old stats brought the
performance back
Now we have our
proof!
Now that you have charged the criminal,
what are you going to do about?
◦ Match the punishment and/or rehabilitation
to the crime
◦ Restoring the Optimizer Statistics will stop
the bleeding
◦ Decide what you can to to prevent the crime
in the future
Document the crime
Provide the background, steps leading to
the crime
Keep a record of all of the clues
Explain how the information lead to finding
the criminal
Outline the sentencing, punishment,
rehabilitation
Specify plans for preventing the crime in the
future
Communicate the case to others
Celebrate success!
Things happen and change every second in
an Oracle database
You cannot always be there
Make sure your security cameras are
working
Ensure the camera does not run out of
‘tape’
Have process in place to gather clues
Document the case when the crime is solved
Don’t have Oracle Performance Packs? Look
at this: http://www.perfvision.com/ash.php
Simulated ASH
Query the v$ tables
Now You Should Know:
What tools are available to investigate past
database performance issues.
What to look for in the ASH, AWR, ADDM and
optimizer data.
How to document your findings and prepare for
future investigations.