15. Query Optimization   System Design Parsing  Examples  Modern Optimizers     EXPLAIN – the output of an optimizer Overview of internals Dynamic programming VP – view the internals • Examples 

Download Report

Transcript 15. Query Optimization   System Design Parsing  Examples  Modern Optimizers     EXPLAIN – the output of an optimizer Overview of internals Dynamic programming VP – view the internals • Examples 

15. Query Optimization


System Design
Parsing
 Examples

Modern Optimizers




EXPLAIN – the output of an optimizer
Overview of internals
Dynamic programming
VP – view the internals
• Examples
 Variants
• Top Down Optimization
• Optimizer Hints
 Unnesting Queries
11/6/2015
PSU’s CS 587
1
Learning Objectives
Explain EXPLAIN
 Explain VP’s output
 Explain each variant
 Flatten a nested query.

11/6/2015
PSU’s CS 587
2
Overview of Query Processing
Web Form
Applic. Front end
SQL interface
SQL
Security
Parser
Relational Algebra(RA)
Catalog
Optimizer
Executable Plan (RA+Algorithms)
Concurrency
Crash
Recovery
Plan Executor
Files, Indexes &
Access Methods
Database, Indexes
11/6/2015
PSU’s CS 587
3
Now we focus on the top of this
diagram
SQL Query
Relation Algebra Query
Parser
Query Optimizer
Relational Operator Algs.
Files and Access Methods
Buffer Management
Disk Space Management
DB
11/6/2015
PSU’s CS 587
4
Detail of the top
SQL Query
(SELECT …)
Query Parser
Relational Algebra Expression (Query Tree)
Query Optimizer
Plan
Generator
Plan Cost
Estimator
Catalog
Manager
Query Tree + Algorithms (Plan)
Plan Evaluator
11/6/2015
PSU’s CS 587
5
Parsing and Optimization

The Parser



Verifies that the SQL query is syntactically correct, that
the tables and attributes exist, and that the user has the
appropriate permissions.
Translates the SQL query into a simple query tree
(operators: relational algebra plus a few other ones)
The Optimizer:


Generates other, equivalent query trees
(Actually builds these trees bottom up)
For each query tree generated:



Selects algorithms for each operator (producing a query plan)
estimates the cost of the plan
Chooses the plan with lowest cost (of the plans considered,
which is not necessarily all possible plans)
11/6/2015
PSU’s CS 587
6
Schema for Examples

Download Postgres source from
 postgresql.org/download

Logical and physical schema is at
 src/test/bench/create.source

Simplified version is at
 www.cs.pdx.edu/587/create.bench

Log into the database at
 https://www.cat.pdx.edu/pgCS587/
 Click on PostgreSQL
 User CS587, Password 587dbms, see notes view

11/6/2015
Browse the tables, especially tenk1 and its attributes
unique1, unique2, stringu1
PSU’s CS 587
7
Here’s what the parser does
SQL Query:
SELECT
FROM
USING
WHERE
Relational Algebra Tree:
tenk1.unique1
tenk1 JOIN tenk2
unique2
tenk1.stringu1='xxx';
tenk1.unique1
 tenk1.stringu1='xxx'
⋈
Unique2=unique2
tenk1
11/6/2015
PSU’s CS 587
tenk2
8
Practice: Parse a Query
Describe the parser's output when the input is
SELECT stringu2
FROM tenk1 JOIN tenk2
USING unique1
WHERE tenk1.stringu1='abc';

11/6/2015
PSU’s CS 587
9
How Can We View the Optimizer?


Postgres calls its optimizer the Planner
Postgres' planner algorithm [668] is the same as all
modern DBMSs' optimizer algorithms
 Except SQL Server




We have good news and good news.
We can see both the planner's output AND its
internal processing.
Its output is available to anyone; its internal
processing is avaliable only to us and a few others.
The output is displayed by the EXPLAIN statement
 Every DBMS has a version of EXPLAIN (e.g., SHOW PLAN)
11/6/2015
PSU’s CS 587
10
Postgres’ EXPLAIN
Output for
EXPLAIN SELECT * FROM tenk1;

Seq Scan on tenk1 (cost=0.00.. 445.00 rows=10000 width=244)
Sequential
Scan



I/Os to get
first row*
I/Os to get
last row*
Rows
retrieved
Average Row
Width
These values are estimates from sampling.
Very useful when a query runs longer than expected.
All our examples are from
 www.postgresql.org/docs/8.3/interactive/using-explain.html
*Actually this includes CPU costs but we will call it I/O costs to simplify
11/6/2015
PSU’s CS 587
11
More EXPLAIN examples

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;





Seq Scan on tenk1 (cost=0.00..470.00 rows=7124 width=244)
Filter: (unique1 < 7000)
Cost is higher because of CPU cost for filtering
#rows is off because of estimation using histogram
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1;
 Index Scan using tenk1_unique1 on tenk1 (cost=0.00..8.27 rows=1 width=244)
 Index Cond: (unique1 < 1)
 Why is the cost so much less?

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 10;
 Bitmap Heap Scan on tenk1 (cost=4.34..42.58 rows=11 width=244)

Recheck Cond: (unique1 < 10)

-> Bitmap Index Scan on tenk1_unique1 (cost=0.00..4.33 rows=11 width=0)

Index Cond: (unique1 < 10)
 Shopping list optimization!
11/6/2015
PSU’s CS 587
12
The planner's internal processing

So far we've seen the planner's output: its opinion as
to what is the fastest plan. How does it reach that
conclusion?

Fortunately, our TA, Tom Raney, has just added a
patch to PostgreSQL (PG) that allows anyone to look
inside the planner.
 This was part of a Google Summer of Code project

One of the lead PG developers says “it’s like finding
Sasquatch”.

No other DBMS has this capability.

We’ll use Tom’s patch to view the planner's internals.
11/6/2015
PSU’s CS 587
13
Overview of DBMS Optimizers

Recall that "optimizing a query" consists of
these 4 tasks
1.
Generate all trees equivalent to the parser-generated
tree*
Assign algorithms to each node of each tree
2.
• A tree with algorithms is called a plan**.
3.
Calculate the cost of each generated plan
• Using the join cost formulas we learned in previous slides***
4.
Choose the cheapest plan
*A nice independent study project would be to write a visualizer for the
parser
**Use Raney's Visual Planner here to look at a plan
*** Statistics for calculating these costs are kept in the system catalog.
11/6/2015
PSU’s CS 587
14
Dynamic Programming



A no-brainer approach to these 4 tasks could take
forever. For medium-large queries there are millions
of plans and it can take a millisecond to compute
each plan cost, resulting in hours to optimize a
query.
This problem was solved in 1979 [668] by Patsy
Selinger's IBM team using Dynamic Programming.
The trick is to solve the problem bottom-up:



11/6/2015
First optimize all one-table subqueries
Then use those optimal plans to optimize all two-table
subqueries
Use those results to optimize all three-table subqueries, etc.
PSU’s CS 587
15
Consider A Query and its Parsed
Form
SELECT tenk1.unique1
FROM tenk1 JOIN tenk2 USING (unique2)
WHERE tenk1.unique1< 100;
tenk1.unique1
 tenk1.unique1<100
⋈
unique2=unique2
tenk1
11/6/2015
PSU’s CS 587
tenk2
16
What Will a Selinger-type
Optimizer Do?
1.
Optimize one table subqueries
• tenk1 WHERE unique1<100
• This is called "pushing selects"
• Then optimize tenk2
2.
Use the results of the previous steps to Optimize
two-table queries
• The entire query

11/6/2015
Let's use Raney's patch, the Visual Planner, to see
what PG's Planner does.
PSU’s CS 587
17
How to Set Up Your Visual Planner

Download, then unzip, in Windows or *NIX:



Read README.TXT, don't worry about details
Be sure your machine has a Java VM


http://www.java.com/en/download/index.jsp
Click on Visual_Planner.jar



cs.pdx.edu/~len/587/VP1.7.zip
If that does not work, use this at the command line:
java -jar Visual_Planner.jar
In the resulting window


File/Open
Navigate to the directory where you put VP1.7
•

11/6/2015
Navigating to C: may take a while
Choose plan1.pln
PSU’s CS 587
18
Windows in the Visual Planner





11/6/2015
The SQL window holds the (canned) query
The Plan Tree window holds the optimal plan for the
query (in this VP view).
The Statistics window holds statistics about the
highlighted node of the Plan Tree's plan
Click a Plan Tree node to see its statistics
Why is a nested loop the optimal plan?
PSU’s CS 587
19
Why Not?





11/6/2015
To see other plans, click on tenk1/tenk2 in the ROI
window.
Then shift-click the plan you want to see
Plans are in alphabetical order, then by total cost
Why isn't a merge join cheaper?
Why isn't a hash join cheaper?
PSU’s CS 587
20
Visualize Dynamic Programming

Recall the first steps of Dynamic Programming:
 Optimize tenk1 where unique1<100
 Optimize tenk2.



VP calls these the ROI* steps and they are displayed
in the ROI window of VP.
In the ROI window, click on the symbol next to tenk2
to see how the PG Planner optimized tenk2.
Note that blue plans are saved for later steps, red
plans are discarded.
*Postgres uses an internal data structure called RelOptInfo to hold the
relations currently being optimized
11/6/2015
PSU’s CS 587
21
Optimizing tenk2; Interesting Orders




The cheapest access path(plan)* is a sequential scan.
However, an index scan is also saved. Why? Because
the index scan has an order associated with it, and
the order is an interesting order.
The order is unique2, and unique2 is the joining
attribute for the later join.
It may be worth sacrificing some cost here to save the
cost of a sort later!
*plan = access path since tenk2 is a single table
11/6/2015
PSU’s CS 587
22
Optimizing tenk1

11/6/2015
Explain each of the planner's decisions in its
optimization of tenk1.
PSU’s CS 587
23
Variants on what we've discussed

SQLServer: Top down
 Graefe, McKenna, “The Cascades Framework for query
optimization”, DEBulletin, 1995.


11/6/2015
Hints
Rewrite optimization rules: unnesting
PSU’s CS 587
24
Top Down Optimization




Begin with original query
Consider subqueries, optimize them.
Depth first search
Example: A ⋈ B ⋈ C
 First optimize, say, (A ⋈ B) ⋈C.
 If its cost is less than B ⋈ C, need not calculate the cost of A
⋈ (B ⋈C).
 Memo structure used to keep track of optimized subqueries.
11/6/2015
PSU’s CS 587
25
Optimizer Hints

A hint tells the optimizer to ignore its algorithm in
part, for example
 Order the joins in a certain way
 Use a particular index
 Use a type of join for a pair of tables.

Oracle has over 120 possible hints
 www.dba-oracle.com/art_otn_cbo_p7.htm

SQL Server
 www.sql-server-performance.com/tips/hints_general_p1.aspx
11/6/2015
PSU’s CS 587
26
15.5 Nested Queries

No-brainer method for
executing nested queries
 Tuple iteration semantics
 For each outer tuple, evaluate
inner block
 Equivalent to simple nested loop
join


Optimizer optimizes inner
block, then outer block
Is there a better way?
11/6/2015
PSU’s CS 587
SELECT S.sid
FROM Sailors S
WHERE EXISTS
(SELECT *
FROM Reserves R
WHERE R.bid=103
AND R.sid=S.sid)
Nested block to optimize:
SELECT *
FROM Reserves R
WHERE R.bid=103
AND R.sid= outer value
27
Unnesting queries


Optimizer (proprietary
systems mostly) or you
(open source systems
mostly) unnest nested
queries.
If the query is unnested, the
optimizer can use bulk join
algorithms (merge, hash join)
and performance can be
much better.
11/6/2015
Equivalent non-nested query:
SELECT DISTINCT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
AND R.bid=103
PSU’s CS 587
28
Unnesting with COUNT



SELECT S.sname
FROM Sailors S
WHERE S.rank >
(SELECT COUNT(R.*)
FROM Reserves R
WHERE R.sid=S.sid)
Beware if there is a COUNT
in the subquery
The query may appear to
unnest into a join with a
GROUP BY.
But consider a sailor with a SELECT S.sname
high rank and no
FROM Sailors S, Reserves R
reservations
WHERE S.sid=R.sid
GROUP BY S.sid
HAVING S.rank > COUNT(R.*)
11/6/2015
PSU’s CS 587
29
Unnesting – The Count Bug [298]


The query may not unnest
into a join, but rather an
outer join.
Many queries are much
harder or impossible to
unnest!
SELECT S.sname
FROM Sailors S
WHERE S.rank >
(SELECT *
FROM Reserves R
WHERE R.sid=S.sid)
SELECT S.sname
FROM Sailors S NATURAL RIGHT OUTER JOIN Reserves R
GROUP BY S.sid
HAVING S.rank > COUNT(R.*)
11/6/2015
PSU’s CS 587
30