Explaining the Explain Plan Disclaimer • The goal of this session to provide you with a guide for reading SQL execution plans.

Download Report

Transcript Explaining the Explain Plan Disclaimer • The goal of this session to provide you with a guide for reading SQL execution plans.

1
<Insert Picture Here>
Explaining the Explain Plan
Disclaimer
• The goal of this session to provide you with a guide
for reading SQL execution plans and to help you
determine if that plan is what you should be expecting
• This session will not provide you with sudden
enlightenment making you an Optimizer expert or give
you the power to tune SQL statements with the flick of
your wrist!
Agenda
• What is an execution plan and how to generate one
• What is a good plan for the optimizer
• Understanding execution plans
•
•
•
•
•
•
Cardinality
Access paths
Join order
Join type
Partitioning pruning
Parallelism
• Execution plan examples
<Insert Picture Here>
What is an execution
plan and how to
generate one
What is an Execution plan?
• Execution plans show the detailed steps necessary to
execute a SQL statement
• These steps are expressed as a set of database
operators that consumes and produces rows
• The order of the operators and their implementation is
decided by the optimizer using a combination of query
transformations and physical optimization techniques
• The display is commonly shown in a tabular format,
but a plan is in fact tree-shaped
What is an Execution plan?
Query
SELECT prod_category,
avg(amount_sold)
FROM sales s, products p
WHERE p.prod_id = s.prod_id
GROUP BY prod_category;
Tabular representation of plan
----------------------------------------------------------Id Operation
Name
----------------------------------------------------------0 SELECT STATEMENT
1 HASH GROUP BY
2 HASH JOIN
3 TABLE ACCESS FULL
PRODUCTS
4 PARTITION RANGE ALL
5
TABLE ACCESS FULL SALES
----------------------------------------------------------
Tree-shaped representation of plan
GROUP BY
|
JOIN
_______|_______
|
TABLE ACCESS
PRODUCTS
|
TABLE ACCESS
SALES
How to get an Execution Plan
Two methods for looking at the execution plan
1.EXPLAIN PLAN command
• Displays an execution plan for a SQL statement without actually
executing the statement
2.V$SQL_PLAN
• A dictionary view introduced in Oracle 9i that shows the execution
plan for a SQL statement that has been compiled into a cursor in
the cursor cache
Use DBMS_XPLAN package to display plans
Under certain conditions the plan shown with EXPLAIN PLAN
can be different from the plan shown using V$SQL_PLAN
How to get an Execution Plan
Example 1 EXPLAIN PLAN command & dbms_xplan.display function
SQL> EXPLAIN PLAN FOR
SELECT prod_category, avg(amount_sold)
FROM sales s, products p
WHERE p.prod_id = s.prod_id
GROUP BY prod_category;
Explained
SQL> SELECT plan_table_output
FROM table(dbms_xplan.display('plan_table',null,'basic'));
-----------------------------------------Id Operation
Name
-----------------------------------------0 SELECT STATEMENT
1 HASH GROUP BY
2
HASH JOIN
3
TABLE ACCESS FULL
PRODUCTS
4
PARTITION RANGE ALL
5
TABLE ACCESS FULL
SALES
-------------------------------------------
How to get an Execution Plan
Example 2 Generate & display execution plan for the last SQL stmts
executed in a session
SQL>SELECT prod_category, avg(amount_sold)
FROM sales s, products p
WHERE p.prod_id = s.prod_id
GROUP BY prod_category;
no rows selected
SQL> SELECT plan_table_output
FROM table(dbms_xplan.display_cursor(null,null,'basic'));
-----------------------------------------Id Operation
Name
-----------------------------------------0 SELECT STATEMENT
1 HASH GROUP BY
2
HASH JOIN
3
TABLE ACCESS FULL
PRODUCTS
4
PARTITION RANGE ALL
5
TABLE ACCESS FULL
SALES
-------------------------------------------
How to get an Execution Plan
Example 3 Displaying execution plan for any other statement
from V$SQL_PLAN
1.Directly:
SQL> SELECT plan_table_output FROM
table(dbms_xplan.display_cursor('fnrtqw9c233tt',null,'basic'))
;
2.Indirectly:
SQL> SELECT plan_table_output
FROM v$sql s,
TABLE(dbms_xplan.display_cursor(s.sql_id,s.child_number,
'basic')) t
WHERE s.sql_text like 'select PROD_CATEGORY%';
Note More information on www.optimizermagic.blogspot.com
<Insert Picture Here>
What is a good plan
for the optimizer
What’s a Good Plan for the Optimizer?
The Optimizer has two different goals
• Serial execution: It’s all about cost
• The cheaper, the better
• Parallel execution: it’s all about performance
• The faster, the better
Two fundamental questions:
• What is cost?
• What is performance?
What is Cost?
•
•
•
•
A magically number the optimizer makes up?
Resources required to execute a SQL statement?
Result of complex calculations?
Estimate of how long it will take to execute a statement?
Actual Definition
• Cost represents units of work or resources used
• Optimizer uses CPU & memory usage plus IO as units of work
• Cost is an estimate of the amount of CPU and memory plus the
number of disk I/Os, used in performing an operation
Cost is an internal Oracle measurement
What is performance?
• Getting as many queries completed as possible?
• Getting fastest possible elapsed time using the fewest
resources?
• Getting the best concurrency rate?
Actual Definition
• Performance is fastest possible response time for query
• Goal is to complete the query as quickly as possible
• Optimizer does not focus on resources needed to execute the plan
<Insert Picture Here>
Understanding an
Execution Plan
SQL Execution Plan
When looking at a plan can you determine if the following is
correct?
• Cardinality
• Are the correct number of rows coming out of each object?
• Access paths
• Is the data being accessed in the best way? Scan? Index lookup?
• Join order
• Are tables being joined in the correct order to eliminate as much data as
early as possible?
• Join type
• Are the right join types being used?
• Partitioning pruning
• Did I get partition pruning? Is it eliminating enough data?
• Parallelism
Cardinality
What is it?
• Estimate of number rows that will be returned
• Cardinality for a single value predicate = num_rows total /
num_distinct total
• E.g. 100 rows total, 10 distinct values => cardinality=10 rows
• OR if histogram present num_rows * Density
Why should you care?
• Influences access method and Join Order
• If estimate is off it can have a huge impact on a plan
What causes Cardinality to be wrong?
• Data Skews
• Multiple single column predicates on a table
• A function wrapped where clause predicate
Cardinality or Selectivity
Cardinality the estimated # of rows returned
To determine correct cardinality using a simple SELECT COUNT(*) from each
table applying any WHERE Clause predicates belonging to that table
Data Skew
Cardinality = num_rows / num_distinct
• If there is a data skew the selectivity could be way off
• Create a histogram to correct the selectivity calculation
• Oracle automatically creates a histogram if it suspects a data skew
Be careful
• Histograms have an interesting side effects on statements with binds
• Less relevant for data warehousing
• Prior to 11g stmt with binds had only one plan – based on first literal value
• But presence of a histogram indicate skew unlikely one plan good for all
bind values
• In 11g multiple execution plans allowed for a single statement
Multiple Single Column Predicates
• Optimizer always assumes each additional predicate
increases the selectivity
•Selectivity of predicate 1 * selectivity of predicate 2 …etc
• But real data often shows correlations
•Job title influences salary, car model influences make
• How do you tell the Optimizer about the correlation?
• Extended Optimizer Statistics provides a mechanism to
collect statistics on a group of columns
•Full integration into existing statistics framework
•Automatically maintained with column statistics
•Instantaneous and transparent benefit for any migrated application
A function Wrapped Where Clause
Predicate
SELECT *
FROM customers
WHERE lower(country_id) = 'us';
• Applying a function to a column means the optimizer does not know how it
will effect the cardinality
• Most likely the optimizer will under-estimate the cardinality
• Creating extended statistics for this function allows the optimizer to get the
correct cardinality
exec dbms_stats.gather_table_stats(‘sh’,'customers', method_opt => 'for all columns size skewonly for columns(lower(country_id))');
Access Paths
How to get data out of the table
• The access path can be:
•
•
•
•
•
•
•
•
•
Full table scan
Table access by Rowid
Index unique scan
Index range scan (descending)
Index skip scan
Full index scan
Fast full index scan
Index joins
Bitmap indexes
Access Path
Look in Operation session to see
how obj is being accessed
If you know the wrong access method is being used check cardinality, join
order…
Access Path examples
A table countries contains 10K rows & has a primary key on
country_id – What plan would you expect for these queries?
Select country_id, name from countries
where country_id in ('AU','FR','IE‘);
Select country_id, name from countries
where country_id between 'AU' and 'IE';
Select country_id, name from countries where name='USA';
Join Type
• A Join retrieve data from more than one table
• Possible join types are
•
•
•
•
•
•
Nested Loops joins
Hash Joins
Partition Wise Joins
Sort Merge joins
Cartesian Joins
Outer Joins
Join Type Example 1
What Join type should be use for this Query?
SELECT e.name, e.salary, d.dept_name
FROM hr.employees e, hr.departments d
WHERE d.dept_name IN ('Marketing‘,'Sales')
AND e.department_id=d.department_id;
Employees has 107 rows
Departments has 27 rows
Foreign key relationship between Employees and Departments on dept_id
Join Type Example 2
What Join type should be use for this Query?
SELECT o.customer_id, l.unit_price * l.quantity
FROM oe.orders o ,oe.order_items l
WHERE l.order_id = o.order_id;
Orders has 105 rows
Order Items has 665 rows
Join Type Example 3
What Join type should be use for this Query?
SELECT o.order_id,0.order_date,e.name
FROM oe.orders o , hr.employees e;
Orders has 105 rows
Employees has 107 rows
Join Type Example 4
What Join type should be use for this Query?
SELECT d.department_id,e.emp_id
FROM hr.employees e FULL OUTER JOIN hr.departments d
ON e.department_id = d.department_id;
Employees has 107 rows
Departments has 27 rows
Foreign key relationship between Employees and Departments on dept_id
Join Type
Look in the Operation section to
check the right join type is used
If the wrong join type is used go back and check the stmt is written correctly
and the cardinality estimates are accurate
Join Orders
The order in which the tables are join in a multi table stmt
• Ideally start with the table that will eliminate the most rows
• Strongly effected by the access paths available
Some basic rules
• Joins that definitely results in at most one row always go first
• When outer joins are used the table with the outer join
operator must come after the other table in the predicate
• If view merging is not possible all tables in the view will be
joined before joining to the tables outside the view
Join order
1
2
3
Want to start with the table that
reduce the result set the most
If the join order is not correct, check the statistics, cardinality & access methods
Partition Pruning
Q: What was the total
sales for the weekend of
May 20 - 22 2008?
Sales Table
May 18th 2008
May 19th 2008
May 20th 2008
Select sum(sales_amount)
From SALES
May 21st 2008
Where sales_date between
to_date(‘05/20/2008’,’MM/DD/YYYY’)
May 22nd 2008
And
to_date(‘05/23/2008’,’MM/DD/YYYY’);
Only the 3
relevant
partitions are
accessed
May 23rd 2008
May 24th 2008
Partition Pruning
Pstart and Pstop list the partition
touched by the query
If you see the word ‘KEY’ listed it means the partitions touched will be decided
at Run Time
Bloom Filter
2. Bloom Filter create:
Consumer set creates a
hash table and a BIT
VECTOR. Bit vector
sets a bit for each row
that matches the search
conditions
DFO
Hash Join
Filter Create
1. Table scan:
Time table is
scanned and sent
Receive
7. Hash Join:
Consumers will complete the hash
join by probing into the hash table
from the time time to find actual
matching rows
Receive
4. Bloom Filter send:
BIT VECTOR is sent as
an additional filter
criteria to the scan of the
sales table
Set
DFO
6. Reduced row sent:
Only rows that have a
match in the bit vector
get sent to the
consumers
DFO
Send
Shared Bloom filter
Send
Test
Scan Time
5. Bloom Filter apply:
Join column is hashed
and compared to BIT
VECTOR
Filter Use
Scan Sales
3. Table Scan:
Sales table is scan
and rows are filtered
based on query
predicates
Parallelism
Goal is to execute all aspects of the plan in parallel
• Identify if one or more sets of parallel server processes are used
• Producers and Consumers
• Identify if any part of the plan is running serial
Parallelism
IN-OUT column shows which
step is run in parallel and if it is a
single parallel server set or not
If you see any lines beginning with the letter S you are running Serial check
DOP for each table & index used
Identifying Granules of Parallelism
during scans in the plan
• Data is Partitioned into Granules either
• block range
• Partition
• Each parallel server is allocated one or more granules
• The granule method is specified on line above the
scan in the operation section
Identifying Granules of Parallelism
during scans in the plan
Access Paths and how they are
parallelized
Access Paths
Parallelization method
Full table scan
Block Iterator
Table accessed by Rowid
Partition
Index unique scan
Partition
Index range scan (descending)
Partition
Index skip scan
Partition
Full index scan
Partition
Fast full index scan
Block Iterator
Bitmap indexes (in Star Transformation)
Block Iterator
Parallel Distribution
• Necessary when producers & consumers sets are used
• Producers must pass or distribute their data into
consumers
• Operator into which the rows flow decides the distribution
• Distribution can be local or across other nodes in RAC
• Five common types of redistribution
Parallel Distribution
• HASH
• Assumes one of the tables is hash partitioned
• Hash function applied to value of the join column
• Distribute to the consumer working on the corresponding hash partition
• Broadcast
• The size of one of the result sets is small
• Sends a copy of the data to all consumers
• Range
• Typically used for parallel sort operations
• Individual parallel servers work on data ranges
• QC doesn’t have to sort just present the parallel server results in the correct order
• Partitioning Key Distribution – PART (KEY)
• Assumes that the target table is partitioned
• Partitions of the target tables are mapped to the parallel servers
• Producers will map each scanned row to a consumer based on the partitioning column
• Round Robin
• Randomly but evenly distributes the data among the consumers
Parallel Distribution
Shows how the PQ servers
distribute rows between
each other
<Insert Picture Here>
Example of reading a
plan
Example SQL Statement and Block
Diagram
SELECT '(' || pcode || ')' || pcode_desc
AS PRODUCT, CNT
FROM (SELECT a.pcode, b.pcode_desc, count(a.pcode) CNT
FROM
BMG.t_acct_master_hd
a
,BMG.hogan_pcode_hd_ref b
,BMG.t_tran_detail_hd c
WHERE a.pcode = b.pcode
AND
a.acct_num=c.acct_num
AND
a.co_id=c.co_id
AND
c.asof_yyyymm=200102
AND
c.tran_amt <2000000000
GROUP BY a.pcode , b.pcode_desc
ORDER BY a.pcode , b.pcode_desc )
HOGAN_PCODE_HD_REF
PCODE
T_TRAN_DETAIL_HD
ACCT_NUM
CO_ID
T_ACCT_MASTER_HD
Multiple Terabytes
1 Gigabyte in size
Example Cont’d Execution plan
1. Check the rows returned is
approx correct
2. Are the cardinality
estimates correct?
3.Are the access method correct?
Means no stats gathered
strong indicate this won’t
be best possible plan
Example Cont’d Execution plan
1
2
3
4. Has partition
pruning happen?
5. Are the correct join methods used?
6. Is the join order correct? Is the table that
eliminates the most rows accessed first?
Example Cont’d Execution plan
7. Check all aspects of the plan are executing in
parallel
8. Check the distribution method and make sure we
are not broadcasting a large table?
Example Cont’d Execution plan - Solution
1. Only 1 row is actually returned
and the cost is 4 X lower now
2
1
3
2. Cardinalities
are correct and
with each join
number of rows
reduced
6. The join order has changed - PWJ
them join hash to look-up table
5. Join types are still
hash joins but now a
PWJ
3. Access methods
remains the same
4. Partition
pruning One
range
partition 4
hash
partitions
7. All aspects
of the plan are
executing in
parallel
8. Row
distribution
is now all
hash
Determining if you get the right plan
Query
SELECT quantity_sold
FROM sales s, customers c
WHERE s.cust_id =c.cust_id;ID
What do you expect the plan to look like for this statement?S NOT NULL)
Explanation
Join to customers is redundant as no columns are selected
Presence of primary –foreign key relationship means we can remove table
Q&A