Relational Algebra Learning Outcomes

Download Report

Transcript Relational Algebra Learning Outcomes

Query Optimization
Dr. Karen C. Davis
Professor
School of Electronic and Computing Systems
School of Computing Sciences and Informatics
Outline
• overview of relational query optimization
• logical optimization
– algebraic equivalences
– transformation of trees
• physical optimization
– selection algorithms
– join algorithms
• cost-based optimization
• research example using relational algebra
Relational Query Optimization
relational algebra
query tree
SQL query
logical
physical
query optimizer
access plan
(executable)
Learning Outcomes
• translate basic SQL to RA query tree
• perform heuristic optimizations to tree
• use cost-based optimization to select algorithms
for tree operators to generate an execution plan
SQL is declarative
• describes what data, not how to retrieve it
select distinct …
from …
where …
• helpful for users, not necessarily good for
efficient execution
Relational Algebra is procedural
• specifies operators and the order of evaluation
• steps for query evaluation:
1. translate SQL to RA operators (query tree)
2. perform heuristic optimizations:
a. push RA select operators down the tree
b. convert select and cross product to join
c. others based on algebraic transformations
Relational Algebra Operators
name
select
project
cross
product
join
symbolically
evaluation
applies condition c to R
σcR
keeps a list (l) of attributes of R
πl R
RXS
all possible combinations of tuples
of R are appended with tuples
from S
R ⋈c S
πl (σc (R X S)), where l is a list of
attributes of R and S with duplicate
columns removed and c is a join
condition
SQL to RA
select distinct …
from …
where …
πl
πl
|
σc
|
X
three relations 
/ \
S
πl
σc
|
|
X
σc
 two relations
|
R
four relations ↓
πl
x
σc
|
X
/ \
X
/\
R T
S
/ \
X
/\
X T
/ \
R U
S
SQL to RA Tree Example
select A.x, A.y, B.z
from A, B
where A.a = B.z and A.x > 10
πA.x, A.y, B.z
|
σA.a = B.z and A.z > 10
evaluated bottom-up left to right;
intermediate values are passed up
the tree to the next operator
|
X
/ \
A
B
SQL to RA Tree Example
πlname
select lname
|
from employee, works_on, projects σ
pname = ‘Aquarius’ and
where pname = ‘Aquarius’ and
pnumber = pno and
essn = ssn and
pnumber = pno and
bdate = ‘1985-12-03’
essn = ssn and
|
bdate = ‘1985-12-03’
X
/ \
X projects
/ \
employee works_on
Simple Heuristic Optimization
πl
1. cascade selects (split them up)
|
σ c1
πl
|
|
σc1 and c2 and c3
|
X
σ c2
|
σc3
|
X
/ \
R
S
/ \
R
S
2. Push any single attribute selects down the
tree to be just above their relation
πl
πl
|
|
σ c1
σ c2
|
σ c2
|
σc3
|
X
/ \
R
S
|
X
/ \
σc 1 σc
|
R
|
S
3
3. Convert 2-attribute select and cross product
to join
efficient join
algorithms
πl
|
|
⋈c
/ \
σ c 1 σc
σ c2
2
|
X
/ \
σc 1 σ c
|
R
πl
|
S
3
smaller
intermediate
results
|
R
|
S
3
Practice problem: optimize RA tree
select P.pnumber, P.dnum, E.lname, E.bdate
from projects P, department D, employee E
where D.dnumber = P.dnum and // c1
D.mgrssn = E.ssn and
// c2
P.plocation = ‘Stafford’;
// c3
RA tree to RA expression
πl
|
⋈c
/ \
σc 1 σc
πl( σc R
1
2
|
R
|
S
3
⋈c2 σc S )
3
Other Operators in Relational Algebra
SQL:
(select pnumber from projects, department, employee
where dnum = dnumber and mgrssn = ssn
and lname = 'Smith‘)
union
(select pnumber from projects, works_on, employee
where pnumber = pno and essn = ssn
and lname = 'Smith');
RA:
π pnumber (σ lname = ‘Smith’ employee ⋈ssn=mgrssn department
⋈ dnumber = dnum projects)
⋃
π pnumber (σ lname = ‘Smith’ employee ⋈ssn=essn works_on
⋈ pnumber = pno projects)
Selection
Algorithms
•
•
•
•
•
•
•
linear search
binary search
primary index or hash for point query
primary index for range query
clustering index
secondary index
conjunctives
– individual index
– composite index or hash
– intersection of record pointers for multiple indexes
Join Algorithms
•
•
•
•
nested loop join
single-scan join
sort-merge join
hash join
example execution plan
sort-merge
using
indexes
http://docs.oracle.com/cd/E13085_01/doc/timesten.1121/e14261/query.htm
Multiple View Processing Plan (MVPP)
 view chromosome:
101100010100001
Q2
Q3
πO.orderkey,
πC.custkey, C.name,
πP.type,
O.shippriority
C.acctbal, N.name,
C.address, C.phone
L.extendedprice
Q1
(v9)
 index chromosome:
1100110
σ C.mktsegment =
σ O.orderdate = “1994-10-
σ L.shipdate = “1995-
“building”
and L.shipdate = “1995-
01”
09-01”
03-15”
 Fitness: sum of query
processing costs of
individual queries
using the views and
indexes selected
(v15)
(v12)
(v8)
(v14)
(v11)
⋈nationkey
(v10)
⋈orderkey
(v7)
⋈custkey
πname, address,
phone, acctbal,
nationkey, custkey,
mktsegment
⋈partkey
(v6)
(v1)
Customer (C)
(v13)
πorderkey,
πpartkey, orderkey,
orderdate, custkey,
shippriority
shipdate,
extendedprice
(v2)
Orders (O)
(v3)
Lineitem (L)
πnationkey,
name
(v4)
Nation (N)
πpartkey,
type(v5)
Part (P)
thesis defense of Sirisha Machiraju: Space Allocation for Materialized Views
and Indexes Using Genetic Algorithms, June 2002