Optimizing Oracle RAC



Systematic Oracle performance tuning
Guy Harrison, Chief Architect, Database Solutions
March 2007
Copyright © 2006 Quest Software
Agenda
• A brief history of Oracle tuning
• Limitations of common approaches
• Systematically tuning by levels
– Application demand
– Database contention
– Reducing physical IO
– Tuning physical IO
The story so far…
• First, the earth cooled…
• Then, the dinosaurs came
• Then, Oracle 5 was released
• The Rules of Thumb:
– Buffer cache hit rate > 90%
– Every segment in a single extent
– Index everything
– Trial and error
• Limited Instrumentation
– Tkprof (6.0)
– V$sysstat
• No WWW
The performance tuning renaissance
• Oracle 7-8, 1992+
• An empirical, profiling-based approach
• Many champions, but notably:
– Anjo Kolk
– Cary Millsap
• An Oracle performance community emerges (Oak Table, etc)
• Yet Another Performance Profiling Methodology (YAPP):
Response Time = Service Time + Wait Time
YAPP and Wait interface based tuning
• Based primarily on the existence of the “wait interface” and related Oracle instrumentation
• When Oracle sessions wait for a resource (IO, lock, latch, buffer, etc) they record wait durations:
– V$SYSTEM_EVENT, V$SESSION_EVENT, V$SESSION_WAIT
– SQL trace files created with the ‘10046 event’
• The largest wait categories represent the greatest opportunities for optimization (see the query sketch below)
• Average wait times can reveal contention points
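As a minimal sketch of this approach, the following query (assuming Oracle 10g or later, where V$SYSTEM_EVENT carries the WAIT_CLASS column) ranks wait events by total time waited:

   -- Rank system-wide wait events by total time waited,
   -- excluding idle waits
   SELECT event,
          total_waits,
          ROUND(time_waited_micro / 1000000, 2) AS seconds_waited,
          ROUND(time_waited_micro / GREATEST(total_waits, 1) / 1000, 2) AS avg_wait_ms
     FROM v$system_event
    WHERE wait_class <> 'Idle'
    ORDER BY time_waited_micro DESC;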
The path to enlightenment
• Problems to overcome:
– Many waits were/are mysterious
• Exactly what does “PMON to cleanup pseudo-branches at svc stop time” mean?
– The message had yet to reach everyone in the community
• Many were still busy tweaking “hit ratios”
– Service time – especially CPU time – was hard to measure accurately
• Oracle’s internal CPU counters were notoriously inaccurate
• No breakdown within CPU time
• Tuning was less effective for CPU-bound systems
– Response time included components outside of Oracle, especially as client-server gave way to 3/N-tier systems
• No guarantee that tuning would result in end-user improvement
• In 10g, most of the technical issues have been resolved:
– Documentation and external resources (eg, the WWW)
– Time model
– Ms timings
– Extended V$SQL timings
Wait-based tuning (WBT) pitfalls: an example
• Single-block (e.g. indexed) I/Os represent 83% of DB elapsed time.
• The average time for an IO is about 10x expected.
• It would therefore be reasonable to assume there is insufficient IO bandwidth (e.g., distinct disk drives) to support the workload.
But before you go upgrading that disk array….
• A single missing index can cause huge increases in logical IO demand.
• This magnifies disk IO demand, internal contention (latches, etc) and CPU utilization.
• Furthermore, IO demands from a missing index can increase faster than hardware upgrades can be applied.
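Before buying hardware, it is worth ranking cached SQL by logical IO; a statement missing an index usually surfaces near the top. A minimal sketch, assuming 10g or later (where V$SQL exposes SQL_ID):

   -- Top 10 statements by logical IO (buffer gets); a missing index
   -- shows up here long before the disk array looks undersized
   SELECT *
     FROM (SELECT sql_id, executions, buffer_gets, disk_reads,
                  ROUND(buffer_gets / GREATEST(executions, 1)) AS gets_per_exec
             FROM v$sql
            ORDER BY buffer_gets DESC)
    WHERE ROWNUM <= 10;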
Making a distinction between symptoms and causes
• When faced with an obviously IO-bound database, it’s tempting to deal with the IO subsystem immediately.
• However, as often as not, IO problems are symptoms of problems arising at other levels.
– Typically SQL tuning or schema design
• Likewise, eliminating a contention point might actually increase physical IO demand
– Lock waits can reduce the demand on the IO subsystem
• Tuning SQL might increase transaction rates
– Which then results in latch contention
• You are best advised to ignore symptoms in the lower levels until you have optimized the performance of the higher levels.
Interactions between layers
[Diagram: the layers and the interfaces between them]
• Application: application DB access code (parse, execute, single fetch, array fetch); passes SQL/PLSQL down and receives rows/return codes back
• Oracle software (concurrency management): parse SQL, check security, acquire locks, latches, buffer space, etc; issues logical IO and receives data/return codes
• Memory (SGA and PGA): cache data blocks for re-use; sort data as required; create hash tables for hash joins; issues physical IO and receives data blocks
• IO subsystem: read/write data from disk devices via block reads/writes
What it means (key slide)
• Problems in one layer can be caused or cured by configuration in the higher layer.
• The logical steps in Oracle tuning are therefore:
1. Reduce application demand to its logical minimum by tuning SQL, optimizing physical design (partitioning, indexing) and tuning PL/SQL
2. Maximize concurrency by minimizing contention for locks, latches, buffers and so on in the Oracle code layer
3. Having normalized logical IO demand by the preceding steps, minimize the resulting physical IO by optimizing Oracle memory
4. Now that the physical IO demand is realistic, configure the IO subsystem to meet that demand by providing adequate bandwidth and evenly distributing the resulting load
Examining the time model
• High-end tuning tools offer big returns on investment, but basic Oracle instrumentation serves well for most stages.
• A lot of insight can be gained by looking at the time model combined with the wait interface.
Why it doesn’t all add up….
• Items in the time model are “nested”: some categories incorporate times from other categories
• Wait time contributes to DB time and background elapsed time
• Does parse time include CPU time, etc?

1) background elapsed time
   2) background cpu time
   2) background wait time
1) DB time
   2) DB CPU
   2) User wait time
   2) connection management call elapsed time
   2) sequence load elapsed time
   2) sql execute elapsed time
   2) parse time elapsed
      3) hard parse elapsed time
         4) hard parse (sharing criteria) elapsed time
            5) hard parse (bind mismatch) elapsed time
      3) failed parse elapsed time
         4) failed parse (out of shared memory) elapsed time
   2) PL/SQL execution elapsed time
   2) inbound PL/SQL rpc elapsed time
   2) PL/SQL compilation elapsed time
   2) Java execution elapsed time
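These categories can be examined directly (10g and later) in V$SYS_TIME_MODEL. A minimal sketch; because of the nesting above, the rows deliberately do not sum to total DB time:

   -- Time model statistics; VALUE is cumulative microseconds
   -- since instance startup, and nested categories overlap
   SELECT stat_name,
          ROUND(value / 1000000, 2) AS seconds
     FROM v$sys_time_model
    ORDER BY value DESC;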
Reduce the application demand
• If you can, tune the application code
– “The best optimized SQL is the SQL you didn’t do” (Anjo)
• Reduce the amount of logical IO generated by SQL statements
– This in turn reduces CPU and IO, which fluctuate less predictably
– Rewrite SQL or use stored outlines/profiles
– Index wisely
– Consider physical design changes
• Partitioning
• Denormalization
• Don’t forget other application demand factors (see the parse-statistics sketch below)
– Parse CPU consumption (which can also cause latch contention)
– PL/SQL execution time
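As a hedged illustration of tracking parse overhead, these V$SYSSTAT counters compare total and hard parses; a hard-parse count close to the total usually points at missing bind variables (the time values are in centiseconds):

   SELECT name, value
     FROM v$sysstat
    WHERE name IN ('parse count (total)', 'parse count (hard)',
                   'parse time cpu', 'parse time elapsed');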
Searching for specific plan steps
• Table and index scans might be amenable to better indexing
• We could also look for row counts relative to the object being scanned
• Procedurally, we can even look for nested table scans and other “anti-patterns”, even before they cause problems (see the sketch below)
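A minimal sketch of such a procedural search (10g and later): find cached statements whose plans contain full table scans, ranked by the logical IO they drive. The same pattern extends to other anti-patterns:

   -- Statements with FULL table scans in their cached plans
   SELECT p.sql_id, p.object_owner, p.object_name, s.buffer_gets
     FROM v$sql_plan p
     JOIN v$sql s
       ON s.sql_id = p.sql_id
      AND s.child_number = p.child_number
    WHERE p.operation = 'TABLE ACCESS'
      AND p.options = 'FULL'
    ORDER BY s.buffer_gets DESC;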
PL/SQL tuning workflow
Contention – the proverbial bottleneck
[Diagram: application demand for DB services meets contention for limited or serialized resources, which causes waits and/or queuing; as a result, the apparent demand at the lower layers is reduced]
Types of contention
• Locks
– Mostly application, but occasionally system (ST in legacy tablespaces)
• Latches
– Often side effect of excessive application demand
– Lack of bind variables
– But sometimes the final constraint on DB throughput
• Buffers
– Buffer cache, redo buffer, etc
– Hot blocks (buffer busy)
– Slow “lazy” writer processes (DBWR, LGWR, RVWR)
• Sequences
• Redo/Archive logs
– Contention between the LGWR and the DBWR or ARCH
• Shared servers/PQO servers
• Global Cache (RAC) contention
– Most significant for “hot block” types of contention (cbc latch, buffer busy)
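As one hedged example of quantifying latch contention (assuming 10g, where V$LATCH reports WAIT_TIME in microseconds), the latches with the most sleeps are the serialization points limiting throughput:

   -- Latches ranked by sleeps (the expensive outcome of a miss)
   SELECT name, gets, misses, sleeps,
          ROUND(wait_time / 1000000, 2) AS seconds_waited
     FROM v$latch
    WHERE sleeps > 0
    ORDER BY sleeps DESC;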
Locks
Latch contention & spin count
[Chart: CPU utilization, time waited on latch, and execution rate (each with a best-fit curve) plotted against spin count values from 0 to 20,000]
IO is increasingly expensive
[Chart: % change, on a log scale from 1 to 10,000, in IO rate, disk capacity, IO/GB, CPU, and IO/CPU]
Reducing physical IO
[Diagram: an Oracle session uses the buffer pools to access the data (USER) tablespace, and the PGA (Program Global Area) sort and hash areas to access the temporary tablespace]
• Memory is used to cache data and to perform sorting and hashing (ORDER/GROUP BY, joins).
• Oracle 10g manages allocations within the PGA/SGA well enough.
• Determining the trade-offs between these areas is the most important IO reduction method (see the work-area sketch below).
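A hedged way to see how the current PGA configuration is working is to check how often work areas (sorts, hash joins) complete in memory rather than spilling to the temporary tablespace:

   -- 'optimal' executions completed in memory; 'onepass' and
   -- 'multipass' executions spilled to temp, multiplying IO
   SELECT name, value
     FROM v$sysstat
    WHERE name LIKE 'workarea executions%';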
A good buffer cache hit ratio doesn’t help much
[Chart: elapsed time (microsec), broken down into DB CPU, User I/O, Direct I/O and Other, plotted against sort area size (KB)]
• A FTS can only generate so much IO
• A disk sort can generate many times the IO required to read the data from disk
• Some of this sort IO appears to be hidden from the wait interface
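As a minimal check for disk sorts, these long-standing V$SYSSTAT counters compare in-memory and on-disk sorts:

   -- Each 'sorts (disk)' spills to the temporary tablespace and can
   -- generate many times the IO of the underlying table read
   SELECT name, value
     FROM v$sysstat
    WHERE name IN ('sorts (memory)', 'sorts (disk)', 'sorts (rows)');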
Advisories
• Advisories exist that provide estimates of the workload impact if a memory area were a different size
– In essence, Oracle maintains LRU chains that are longer than the actual allocations
• Key advisories:
– V$PGA_TARGET_ADVICE
– V$DB_CACHE_ADVICE
– V$SHARED_POOL_ADVICE
– V$SGA_TARGET_ADVICE (10g)
• Unfortunately, Oracle suffered from “split-brain” during implementation:
– The PGA advisory reports in bytes read/written
– The DB cache advisory reports in IOs saved
– The shared pool advisory reports in parse time
• When comparing PGA and SGA advisories, you need to convert them to common units (wait time), as in the sketch below
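A minimal sketch of reading two of these advisories side by side (10g); converting to a common unit is then a matter of multiplying the estimated reads, or extra bytes read/written, by your observed average IO times:

   -- DB cache advisory: estimated physical reads at candidate sizes
   SELECT size_for_estimate AS cache_mb,
          size_factor,
          estd_physical_reads
     FROM v$db_cache_advice
    WHERE name = 'DEFAULT'
      AND advice_status = 'ON'
    ORDER BY size_for_estimate;

   -- PGA advisory: estimated extra bytes read/written at candidate
   -- PGA_AGGREGATE_TARGET values
   SELECT ROUND(pga_target_for_estimate / 1048576) AS pga_mb,
          estd_extra_bytes_rw,
          estd_pga_cache_hit_percentage
     FROM v$pga_target_advice
    ORDER BY pga_target_for_estimate;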
IO management
• The demand for physical IO should be about right now:
– You’ve reduced the application’s logical IO demand
– You’ve eliminated contention that might mask that demand
– You’ve minimized the amount of logical IO that turns into physical IO by effective memory management
• Now optimize the IO subsystem for the amount of physical IO you observe (see the per-file sketch below)
– Buy enough disks to meet IO demand as well as storage demand
• Remember also that sparsely-filled disks are more efficient
– Oracle’s SAME (Stripe and Mirror Everything) is a good default approach
– Separating log/FRA and datafile IO is arguably the only split you should have to make:
• Dedicated disks for sequential redo IO, OR
• A separate wide, fine-grained stripe for redo and FRA
– ASM makes this very easy, but be careful: it is intolerant of poor underlying configurations
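As a hedged sketch of verifying that the resulting load is evenly distributed, per-file IO counts and average read times can be pulled from V$FILESTAT (READTIM is in centiseconds, hence the factor of 10 to get milliseconds):

   -- Skewed counts or outlying read times point at hot or
   -- misconfigured devices
   SELECT f.name,
          s.phyrds, s.phywrts,
          ROUND(s.readtim * 10 / GREATEST(s.phyrds, 1), 1) AS avg_read_ms
     FROM v$filestat s
     JOIN v$datafile f ON f.file# = s.file#
    ORDER BY s.phyrds + s.phywrts DESC;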
ASM makes it very easy
• Only need to provision for total storage for all files and total IO demand
• Balancing within the diskgroup is (partially) automatic
• Can implement redundancy within the group
• DBA gets more control – sysadmin involvement is minimal
• BUT:
– Still a version 1.0 technology
– Simplistic algorithm does not work well if underlying disks are of different sizes/characteristics
– Hard to map segments to spindles, especially if implemented on top of a hardware storage appliance
– Auto-rebalance has not been observed
– The ASM instance is a vulnerability
ASM disk groups
Q&A