SQL tuning and cardinality

Download Report

Transcript SQL tuning and cardinality

You’re Smarter than a
Database
Overcoming the optimizer’s bad
cardinality estimates
About me




Bobby Durrett
US Foodservice
[email protected]
Scripts in
http://www.geocities.com/bobbyandma
rielle/sqltuning.zip
What you know
What the database knows
Before SQL

Example - mainframe Datacom/DB COBOL


List index names
Write loops
read a from one index i1 where one.c=10
while more table one rows exist get next row
read b from two index i2 where two.a = one.a
while more table two rows exist get next row
print one.a,two.b
end while
end while
SQL

Tell what you want, not how to get it
select one.a,two.b
from
one,two
where
one.c=10 and one.a=two.a;
Pre-SQL versus SQL




Pre-SQL code very efficient – runs in
megabytes – VSE mainframe COBOL
Labor intensive
SQL can be inefficient – runs in
gigabytes (if you are lucky!)
Much more productive – do in minutes
what took hours before – create tables
What the database doesn’t
know



Optimizer has a limited set of statistics
that describe the data
It can miscalculate the number of rows
a query will return, its cardinality
A cardinality error can lead optimizer to
choose a slow way to run the SQL
Example plan/Cardinality
------------------------------------------------| Id | Operation
| Name | Rows | Cost
------------------------------------------------|
0 | SELECT STATEMENT |
|
10 |
3
|* 1 | TABLE ACCESS FULL| TEST1 |
10 |
3
------------------------------------------------


Plan = how Oracle will run your query
Rows = how many rows optimizer thinks that
step will return
Cost = estimate of time query will take, a
function of the number of rows
How to fix cardinality
problems


Find out if it really is a cardinality issue
Determine the reason it occurred



Single column
Multiple columns
Choose a strategy



Give the optimizer more information
Override optimizer decision
Change the application
Four examples


Four examples of how the optimizer
calculates cardinality
Full scripts and their outputs on portal,
pieces on slides – edited for simplicity
Step 1: Find out if it really is a
cardinality issue

Example 1

Data
select a,count(*) from test1 group by a;
A
COUNT(*)
---------- ---------1
10
 Query
select * from test1 where a=1;
Step 1: Find out if it really is a
cardinality issue

Get estimated cardinality from plan
------------------------------------------| Id | Operation
| Name | Rows |
------------------------------------------|
0 | SELECT STATEMENT |
|
10 |
|* 1 | TABLE ACCESS FULL| TEST1 |
10 |
------------------------------------------
Do query for actual number of rows
select count(*) from test1 where a=1;
Step 1: Find out if it really is a
cardinality issue

Plan is a tree – find cardinality and select
count(*) on part of query represented by that
part of plan.
join
table
join
table
table
Step 2: Understand the reason
for the wrong cardinality

Unequal distribution of data:

Within a single column



Last name
“Smith” or “Jones”
Among multiple columns –


Address
Zipcode and State
Step 2: Understand the reason
for the wrong cardinality

Example 2 - Unequal distribution of values in a
single column


1,000,000 rows with value 1
1 row with value 2
select a,count(*) from TEST2 group by a;
A
COUNT(*)
---------- ---------1
1000000
2
1
Step 2: Understand the reason
for the wrong cardinality

SQL statement – returns one row
select * from TEST2 where a=2;
Step 2: Understand the reason
for the wrong cardinality

Plan with wrong number of rows = 500,000

Full scan instead of range scan – 100 times slower
--------------------------------------------| Operation
| Name
| Rows |
--------------------------------------------| SELECT STATEMENT
|
|
500K|
| INDEX FAST FULL SCAN| TEST2INDEX |
500K|
---------------------------------------------
Step 2: Understand the reason
for the wrong cardinality

Column statistics – two distinct values
LOW
HIGH NUM_DISTINCT
---------- ---------- -----------1
2
2

Table statistic – total # of rows – 1,000,001
NUM_ROWS
---------1000001
Step 2: Understand the reason
for the wrong cardinality



Rows in plan = (rows in table)/
(distinct values of column)
500000=1000001/2
Optimizer knew that there were only
two values – assumed they had equal
number of rows
Step 2: Understand the reason
for the wrong cardinality

Example 3 - Combinations of column values
not equally distributed




1,000,000 rows with values 1,1
1,000,000 rows with values 2,2
1 row with value 1,2
~ Equal numbers of 1’s and 2’s in each column
A
B
COUNT(*)
---------- ---------- ---------1
1
1000000
1
2
1
2
2
1000000
Step 2: Understand the reason
for the wrong cardinality

SQL statement – retrieves one row
select sum(a+b)
from TEST3
where
a=1 and b=2;
Step 2: Understand the reason
for the wrong cardinality


Plan with wrong number of rows = 500,000
Inefficient full scan
---------------------------------------------| Operation
| Name
| Rows |
---------------------------------------------| SELECT STATEMENT
|
|
1 |
| SORT AGGREGATE
|
|
1 |
|
INDEX FAST FULL SCAN| TEST3INDEX |
500K|
----------------------------------------------
Step 2: Understand the reason
for the wrong cardinality

Column statistics
C
LOW
HIGH NUM_DISTINCT
- ---------- ---------- -----------A
1
2
2
B
1
2
2

Table statistic – total # of rows – 2,000,001
NUM_ROWS
---------2000001
Step 2: Understand the reason
for the wrong cardinality



Rows in plan = (rows in table)/
(distinct values A * distinct values B)
500000=2000001/(2 * 2)
Optimizer assumes all four
combinations (1,1),(1,2),(2,1),(2,2)
equally likely
Step 2: Understand the reason
for the wrong cardinality

How to tell which assumption is in play?

Select count(*) each column
select a,count(*) from TEST3 group by a;
select b,count(*) from TEST3 group by b;

count(*) each column combination
select a,b,count(*) from TEST3
group by a,b;
Step 3: Choose the best strategy
for fixing the cardinality problem

Giving the optimizer more information



Overriding optimizer decisions


Histograms
SQL Profiles
Hints
Changing the application
Try to use optimizer as much as possible to
minimize development work
Step 3: Choose the best strategy
for fixing the cardinality problem

Giving the optimizer more information –
using histograms



Works for unequal distribution within a
single column
A histogram records the distribution of
values within a column in up to 254
“buckets”
Works best on columns with fewer than
255 distinct values
Step 3: Choose the best strategy
for fixing the cardinality problem

Run gather_table_stats command to get
histograms on the column – 254 is max
number of buckets
method_opt=>'FOR ALL COLUMNS SIZE 254'
Step 3: Choose the best strategy
for fixing the cardinality problem

Plan for Example 2 with correct number of rows
with histogram
Uses range scan
----------------------------------------| Operation
| Name
| Rows |
----------------------------------------| SELECT STATEMENT |
|
1 |
| INDEX RANGE SCAN| TEST2INDEX |
1 |
----------------------------------------
Step 3: Choose the best strategy
for fixing the cardinality problem

Column statistics – two buckets
LOW
HIGH NUM_DISTINCT NUM_BUCKETS
---------- ---------- ------------ ----------1
2
2
2

Table statistic – unchanged
NUM_ROWS
---------1000001
Step 3: Choose the best strategy
for fixing the cardinality problem

Time without histograms (1 second):
Elapsed: 00:00:01.00

Time with histograms(1/100th second):
Elapsed: 00:00:00.01
Step 3: Choose the best strategy
for fixing the cardinality problem

Giving the optimizer more information – using
SQL Profiles


Works for unequal distribution among multiple
columns
Includes information about the relationship
between columns in the SQL – correlated columns
or predicates
Step 3: Choose the best strategy
for fixing the cardinality problem

SQL Tuning Advisor gathers statistics on the
columns
...DBMS_SQLTUNE.CREATE_TUNING_TASK(...
...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(...

Accept the SQL Profile it creates to use the
new statistics
...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...
Step 3: Choose the best strategy
for fixing the cardinality problem

Example 3 plan with correct number of rows = 1
using SQL profile
-------------------------------------------------| Operation
| Name
| Rows | Bytes |
-------------------------------------------------| SELECT STATEMENT |
|
1 |
6 |
| SORT AGGREGATE
|
|
1 |
6 |
|
INDEX RANGE SCAN| TEST3INDEX |
1 |
6 |
-------------------------------------------------|
Step 3: Choose the best strategy
for fixing the cardinality problem

Time without a profile (1 second):
Elapsed: 00:00:01.09

Time with a profile(1/100th second):
Elapsed: 00:00:00.01
Step 3: Choose the best strategy
for fixing the cardinality problem

Overriding optimizer decisions – using hints



Example 4 has unequal distribution of column
values across two tables – histograms and SQL
Profiles don’t work
Hint forces index range scan
Small amount of additional code – not like
Cobol on mainframe
Step 3: Choose the best strategy
for fixing the cardinality problem

Example 4 - SMALL table


MANY relates to 1 – there are many rows with
value 1
FEW relates to 2 – there are few with value 2
insert into SMALL values ('MANY',1);
insert into SMALL values ('FEW',2);
Step 3: Choose the best strategy
for fixing the cardinality problem

Example 4 - LARGE table:
1,000,000 rows with value 1
 1 row with value 2
NUM
COUNT(*)
---------- ---------1
1000000
2
1

Step 3: Choose the best strategy
for fixing the cardinality problem

SQL statement – returns one row
select B.NUM
from SMALL A,LARGE B
where
A.NUM=B.NUM and
A.NAME='FEW';
Step 3: Choose the best strategy
for fixing the cardinality problem

Plan with wrong number of rows = 125,000
---------------------------------------------| Operation
| Name
| Rows |
---------------------------------------------| SELECT STATEMENT
|
|
125K|
| HASH JOIN
|
|
125K|
|
TABLE ACCESS FULL
| SMALL
|
1 |
|
INDEX FAST FULL SCAN| LARGEINDEX | 1000K|
----------------------------------------------
Step 3: Choose the best strategy
for fixing the cardinality problem

Column statistics – two buckets on all
columns – using histograms
LOW
HIGH NUM_DISTINCT NUM_BUCKETS
---------- ---------- ------------ ----------1
2
2
2
LOW HIGH NUM_DISTINCT NUM_BUCKETS
---- ---- ------------ ----------FEW MANY
2
2
Step 3: Choose the best strategy
for fixing the cardinality problem

Table statistics – SMALL has 2 rows,
LARGE 1000001
NUM_ROWS
---------2
NUM_ROWS
---------1000001
Step 3: Choose the best strategy
for fixing the cardinality problem




125000=1000001/8
Optimizer appears to assume all eight
combinations of the three columns’
values are equally likely
Can’t verify formula – references don’t
include formula with histograms
Even worse without histograms –
cardinality is 500000
Step 3: Choose the best strategy
for fixing the cardinality problem

No SQL profile from SQL Tuning Advisor:
There are no recommendations to improve the
statement.

Neither histograms nor SQL profiles help
example 4
Step 3: Choose the best strategy
for fixing the cardinality problem

Statement with hints:


Use index
Don’t do full scan
select /*+ INDEX(B LARGEINDEX)
NO_INDEX_FFS(B LARGEINDEX)
B.NUM
from SMALL A,LARGE B
where
a.NUM=B.NUM and
A.NAME='FEW';
*/
Step 3: Choose the best strategy
for fixing the cardinality problem

Time without a hint (1 second):
Elapsed: 00:00:01.03

Time with a hint (1/100th second):
Elapsed: 00:00:00.01
Step 3: Choose the best strategy
for fixing the cardinality problem

Changing the application


Change your tables so that the optimizer gets your
SQL’s cardinality right
Requires more work designing tables, but keeps
productivity benefits of SQL
Step 3: Choose the best strategy
for fixing the cardinality problem

Example 4 – moved NAME column to LARGE table and
split table in two



One million (‘MANY’,1) rows in LARGEA
One (‘FEW’,2) row in LARGEB
Query:
select NUM
from (select * from largea
union
select * from largeb)
where
NAME='FEW';
Step 3: Choose the best strategy
for fixing the cardinality problem

Plan is just as efficient as with hint:


Number of rows = 2 (reality is 1)
Range Scan
-------------------------------------------------------------| Id | Operation
| Name
| Rows |
-------------------------------------------------------------|
0 | SELECT STATEMENT
|
|
2 |
|
1 | VIEW
|
|
2 |
|
2 |
SORT UNIQUE
|
|
2 |
|
3 |
UNION-ALL
|
|
|
|
4 |
TABLE ACCESS BY INDEX ROWID| LARGEA
|
1 |
|* 5 |
INDEX RANGE SCAN
| LARGEAINDEX |
1 |
|
6 |
TABLE ACCESS BY INDEX ROWID| LARGEB
|
1 |
|* 7 |
INDEX RANGE SCAN
| LARGEBINDEX |
1 |
--------------------------------------------------------------
Step 3: Choose the best strategy
for fixing the cardinality problem

Time without table change (1 second):
Elapsed: 00:00:01.03

Time with table change (1/100th second):
Elapsed: 00:00:00.01
Conclusion

SQL improves productivity, optimizer has limits


Identify cases where cardinality is wrong
Understand why the database got it wrong



One column
Multiple columns
Choose best strategy to fix



Give optimizer more info
Override optimizer’s choices
Redesign tables
References




Cost Based Optimizer Fundamentals, Jonathan Lewis
Metalink Note:212809.1, Limitations of the Oracle Cost Based
Optimizer
Metalink Note:68992.1, Predicate Selectivity
Histograms – Myths and Facts, Wolfgang Breitling
Select Journal, Volume 13, Number 3