14. Evaluating Relational Operators









Table Statistics
Operators and example schema
Selection
Projection
Equijoins
General Joins
Set Operators
Buffering
Intergalactic standard reference:
 G. Graefe, "Query Evaluation Techniques for Large Databases," ACM Computing Surveys, 25(2) (1993), pp. 73-170
11/6/2015
PSU’s CS 587
1
Learning Objectives
 In a typical major DBMS, what statistics are automatically collected, and when?
 Given collected statistics, estimate a predicate's output size/selectivity.
 For each relational operator (selection, projection, equijoins, general joins, set operators), describe the major algorithms, their optimizations, their pros and cons, and their costs.
Motivation/Review
 Assume Sailors has an index on age. Does the optimal plan for this query use the index?
SELECT *
FROM Sailors S
WHERE S.age < 31
 Moral: in order to choose the optimal plan we need to know the selectivity of the predicate/size of the output.
Collecting Statistics
 What are statistics?
 Table sizes, index sizes, ranges of values, etc.
 Where are statistics kept? In the system catalog.
 Why collect statistics?
 Statistics are needed to determine the selectivity of predicates and the sizes of operator outputs.
 Those quantities are needed to calculate the costs of plans, which the optimizer uses to find the optimal plan.
 How often are statistics collected?
 Typically when 10% of the data has been updated.
• Can be overridden manually: UPDATE STATISTICS, ANALYZE
 Typically done by sampling, if the table is large.
 Which tables/columns are monitored?
 Typically all tables/columns are monitored.
Sailors Example
 What statistics would you collect to find the number of rows satisfying
 Age < 31?
 Rank = 4?
 In Intro DB we assumed data was uniformly distributed.
 The 30 ages:
30, 30.2, 30.4, 30.6, 30.8, 30.9, 31, 32, 33, 34.5,
35, 36, 36.6, 37.8, 39, 39.5, 40, 41, 43, 44,
46, 48, 50, 51, 52, 54, 56, 58, 59, 60
Histograms
 Histograms are more accurate than the uniformity assumption.
 Equiwidth histogram:
  #vals:  9   3   3   3   2   1   2   2   2   3
  range: 30  33  36  39  42  45  48  51  54  57
 Estimate the number of rows/selectivity for the predicate "age < 31":
 Equidepth histogram:
  #vals:  3    3     3   3   3   3   3   3   3   3
  range: 30  30.4  30.9  33  36  39  41  46  51  56
 Estimate the number of rows/selectivity for the predicate "age < 31":
 Actual value: 6. Moral: equidepth histograms are more accurate.
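The two estimates above can be reproduced with a short sketch. This is illustrative code, not part of the slides: bucket boundaries are the ones shown above, values are assumed uniform within each bucket, and the last bucket is assumed to end at 60.

```python
# Estimate the number of rows satisfying "age < v" from a histogram.
# Buckets are (low, high, count); values are assumed uniform WITHIN a bucket.

def estimate_lt(buckets, v):
    total = 0.0
    for low, high, count in buckets:
        if v >= high:
            total += count                              # whole bucket qualifies
        elif v > low:
            total += count * (v - low) / (high - low)   # interpolate partial bucket
    return total

# Equiwidth histogram from the slide (width-3 buckets starting at 30):
equiwidth = [(30, 33, 9), (33, 36, 3), (36, 39, 3), (39, 42, 3), (42, 45, 2),
             (45, 48, 1), (48, 51, 2), (51, 54, 2), (54, 57, 2), (57, 60, 3)]

# Equidepth histogram from the slide (3 values per bucket):
bounds = [30, 30.4, 30.9, 33, 36, 39, 41, 46, 51, 56, 60]
equidepth = [(bounds[i], bounds[i + 1], 3) for i in range(10)]

print(estimate_lt(equiwidth, 31))   # 3.0 (true answer is 6)
print(estimate_lt(equidepth, 31))   # about 6.14 -- much closer to 6
```

The equidepth estimate is closer because the densest part of the data gets the narrowest buckets.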
Real DBMSs Store Histograms and More
 The next slide is the result of:
SELECT attname, n_distinct, most_common_vals,
       most_common_freqs, histogram_bounds
FROM pg_stats WHERE tablename = 'sailors';
 n_distinct: positive means the number of distinct values; negative means minus the ratio of distinct values to rows, so -1 means the column is unique.
 most_common_vals: NULL if no values are more common than others.
 histogram_bounds: an equidepth histogram over the values other than the most common.
Example Statistics: Postgres
EXPLAIN SELECT * FROM sailors WHERE age < 31;
                       QUERY PLAN
 Seq Scan on sailors  (cost=0.00..1.38 rows=6 width=21)
   Filter: (age < 31::double precision)
Compound Conditions
 How many rows satisfy:
 WHERE age < 31 AND rank = 4
• Assume statistical independence.
• The selectivity of AND is the product of the selectivities.
 WHERE age < 31 OR rank = 4
• The typical formula is s1 + s2 - s1*s2.
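A minimal sketch of the two rules (the selectivities used here are made up for illustration):

```python
# Selectivity of compound predicates under the independence assumption.
def and_selectivity(s1, s2):
    return s1 * s2            # P(A and B) = P(A) * P(B) if independent

def or_selectivity(s1, s2):
    return s1 + s2 - s1 * s2  # inclusion-exclusion, again assuming independence

s_age, s_rank = 0.2, 0.1      # hypothetical selectivities for age < 31, rank = 4
print(round(and_selectivity(s_age, s_rank), 4))  # 0.02
print(round(or_selectivity(s_age, s_rank), 4))   # 0.28
```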
Join Size Estimation
 Suppose
 there are m rows in R and n rows in S,
 R ⋈ S is a join on ri = sj, and
 ri is a foreign key that references sj.
 How many rows are there in R ⋈ S?
 Most real joins are foreign key joins.
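The key observation: if ri is a foreign key referencing the key sj, every R row matches exactly one S row, so |R ⋈ S| = |R|. A tiny illustration (the rows are made up):

```python
# Foreign-key join: each Reserves row matches exactly one Sailors row,
# so the join output has exactly as many rows as Reserves.
sailors = {22: "dustin", 31: "lubber", 58: "rusty"}      # sid -> sname; sid is the key
reserves = [(22, 101), (22, 102), (58, 103), (31, 101)]  # (sid, bid); sid is a FK

joined = [(sid, bid, sailors[sid]) for sid, bid in reserves]
print(len(joined), len(reserves))   # 4 4 -- output size equals |Reserves|
```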
Pages, Rows in a Table?
select relpages, reltuples
from pg_class where relname = 'sailors';
Assumptions
 General assumptions:
 All attributes are uniformly distributed.
 2 MB of buffer space, i.e. B = 256 8K pages.
• So we can sort up to 512 MB of data (64K pages) in two passes.
 All indexes are Alternative (2), height 2.
 Data entries are 10% the size of data records.
 Each I/O takes 10 ms.
 Reserves, Sailors, Boats:
 Sailors (sid: integer, sname: string, rank: integer, age: real); rank is 1 to 10.
• Clustered index on sid
• Nonclustered index on rank
 Reserves (sid: integer, bid: integer, day: dates, rname: string); bid is 1 to 100.
• Clustered index on (sid, bid, day)
• Nonclustered index on bid
 Boats (bid: integer, color: string)
• Clustered index on bid

           Bytes/Row   Rows/Page   Pages
Sailors        50          80        500
Reserves       40         100       1000
Boats          16         250         20
14. Evaluating Relational Operators
 Operators and example schema
 Selection
 One term
 Nonclustered index optimization
 Multiple terms
 Projection
 Equijoins
 General Joins
 Set Operators
 Buffering
14. Op.Eval.
14.1 One Table, One Term
SELECT *
FROM Reserves R
WHERE R.rname < 'C%'
• The predicate is of the form R.attr op value, where op is <.
• Assume 10% of rnames begin with A or B.
• Suppose there is a clustered B+ tree index and an unclustered B+ tree index, both on rname.
• What are the possible plans (algorithms)?
• Which plan will the optimizer choose?
• We'll compute the cost of each plan/algorithm.
14. Op.Eval.
What is the optimal plan?
 File Scan Plan Cost
 I/Os
 Seconds
 Nonclustered Index Scan Plan Cost
 Access first qualifying data entry
 Access first qualifying data item
 Access all qualifying data items
 Total Cost
 Clustered Index Scan Plan Cost
 Access first qualifying data entry
 Access first qualifying data item
 Access all qualifying data items
 Access all qualifying data entries
 Total Cost
 What is the optimal plan?
CLUSTERED vs. UNCLUSTERED
[Figure: in both cases the index file holds data entries and the data file holds data records. With a CLUSTERED index the data records are stored in the same order as the data entries; with an UNCLUSTERED index the data entries point to data records scattered arbitrarily through the data file.]
Costs for a one-term selection on R, with selectivity ρ
 File scan
 Number of pages in R = M
 Nonclustered index scan
 Worst case is at least the number of qualifying rows in R = (M·pR)·ρ.
 Clustered index scan
 Height of index + number of qualifying data-entry pages + number of qualifying data pages
 = 2 + 0.1Mρ + Mρ = 2 + 1.1Mρ
*The 0.1 comes from our assumption that data entries are 10% the length of data records, and should be adjusted accordingly.
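Plugging the running example into these formulas gives a feel for the numbers. A sketch (M = 1000 pages and pR = 100 rows/page are the Reserves figures from the assumptions slide; ρ = 0.1 is the assumed fraction of qualifying rnames):

```python
# I/O cost formulas for a one-term selection, per the slide.
def file_scan_cost(M):
    return M

def nonclustered_cost(M, pR, rho):
    return M * pR * rho                   # worst case: one I/O per qualifying row

def clustered_cost(M, rho):
    return 2 + 0.1 * M * rho + M * rho    # height + data-entry pages + data pages

M, pR, rho = 1000, 100, 0.1               # Reserves; 10% of rows qualify
print(file_scan_cost(M))                  # 1000
print(nonclustered_cost(M, pR, rho))      # 10000.0 -- far worse than a file scan
print(clustered_cost(M, rho))             # 112.0
```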
How do the access methods compare?
 File Scan: M
 Nonclustered Index Scan: at least pR·ρ·M
 Clustered Index Scan: 2 + 1.1·ρ·M
 Remember
 Clustered indexes are rare.
 pR is typically large.
 Conclusions
 Clustered Index Scan is optimal if ρ < 1/1.1.
 If there is no clustered index, File Scan is a big win unless ρ is very small.
 Let's look at ways to make a Nonclustered Index Scan even better.
14. Op.Eval.
Nonclustered Index Optimization
 Consider this algorithm for using a nonclustered index (the "shopping list" optimization):
1. Retrieve the RIDs in all the qualifying data entries. 2 + ρ·0.1·M
2. Sort those RIDs in order of page number. ???
3. Fetch data records using those RIDs in sorted order. This ensures that each data page is retrieved just once (though the number of such pages is likely to be higher than with clustering). At most M
• If ρ is small, the cost is much less.
 Cost? At most 2 + (1 + 0.1ρ)M + ???
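A sketch of why step 3 helps (illustrative only; RIDs are modeled as (page, slot) pairs):

```python
# "Shopping list" optimization: sort qualifying RIDs by page number so
# each data page is fetched at most once, in one sequential sweep.
def pages_in_fetch_order(rids):
    return sorted({page for page, _slot in rids})

rids = [(7, 2), (3, 1), (7, 5), (3, 4), (9, 0)]   # 5 qualifying rows
print(pages_in_fetch_order(rids))   # [3, 7, 9]: 3 page I/Os instead of 5
```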
14. Op.Eval.
A Second Nonclustered Index Optimization: The ??? Term
 What if the RIDs are too large to sort in memory?
1. Retrieve the RIDs in all the qualifying data entries.
2′. Make a bitmap, one bit for each page in the table.
3′. For each page in a qualifying RID, turn on the corresponding bit in the bitmap.
4′. Fetch each page whose bit is on.
5′. Scan every record in every fetched page to find which ones qualify.
 Why is step 5′ necessary?
 Cost: at most 2 + (1 + 0.1ρ)M
 If ρ is small, the cost is much less.
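The bitmap variant can be sketched the same way. Note that the bitmap records only pages, not slot numbers, which is exactly why step 5′ must re-check every record on each fetched page:

```python
# Bitmap variant: one bit per data page, so memory use is fixed no matter
# how many RIDs qualify. Slot numbers are discarded -- hence step 5'.
def pages_from_bitmap(rids, num_pages):
    bitmap = [False] * num_pages
    for page, _slot in rids:
        bitmap[page] = True
    return [p for p, bit in enumerate(bitmap) if bit]   # fetch in page order

rids = [(7, 2), (3, 1), (7, 5), (3, 4), (9, 0)]
print(pages_from_bitmap(rids, 12))   # [3, 7, 9]
```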
How do the access methods compare?
 File Scan: M
 Nonclustered Index Scan: at most 2 + (1 + 0.1ρ)M; if ρ is small it will be much less
 Clustered Index Scan: 2 + 1.1·ρ·M
 Conclusions
 Clustered Index Scan is optimal if ρ < 1/1.1.
 If there is no clustered index, File Scan is a slight win unless ρ is very small.
Access Paths, Matching
 An access path is a way of retrieving tuples:
 A full or partial scan
 An indexed access
 So an optimal plan, in this simple case (a one-term WHERE clause), is an optimal access path.
 A term matches an access path if that access path can be used to retrieve just the tuples satisfying the term.
 Example: bid > 50 matches what access path?
14. Op.Eval.
General Selections
SELECT attribute list FROM relation list
WHERE term1 AND ... AND termk
 k > 1
 First approach:
1. Choose a term. (Which one?)
2. Retrieve tuples using the optimal access path for that term.
3. Apply any remaining terms in memory.
 Second approach, applicable if we have two or more access paths matching terms and using Alternative (2) or (3):
1. Get the sets of RIDs of data records using each matching index.
2. Intersect these sets of RIDs.
3. Retrieve the records and apply any remaining terms.
 Which approach is cheaper?
14. Evaluating Relational Operators
 Operators and example schema
 Selection
 Projection
 What is the problem?
 Sort and hash approaches
 Equijoins
 General Joins
 Set Operators
 Buffering
14. Op.Eval.
14.3 Projection
SELECT DISTINCT R.sid, R.bid
FROM Reserves R
 The hard part of projection is duplicate elimination.
 Assume the size ratio is 0.50 and the duplicate ratio is 0.25.
 Simple sort-based projection:
1. Eliminate all attributes except sid, bid.
2. Sort the result by (sid, bid).
3. Scan that result, eliminating duplicates.
• Cost?
 Optimizations?
 Combine attribute elimination with the sort.
• Cost?
 Combine duplicate elimination with the sort.
• Best case cost?
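Steps 1-3 of the simple sort-based projection, sketched in Python (the sort here is in-memory; the slide's cost questions are about doing the same thing with an external sort, and the table data is made up):

```python
# Simple sort-based projection: drop attributes, sort, scan for duplicates.
def sort_based_projection(rows, keep):
    projected = [tuple(row[i] for i in keep) for row in rows]  # step 1
    projected.sort()                                           # step 2
    result = []
    for t in projected:                                        # step 3: after the
        if not result or result[-1] != t:                      # sort, dups are adjacent
            result.append(t)
    return result

reserves = [(22, 101, "d1"), (58, 103, "d2"), (22, 101, "d3"), (22, 102, "d4")]
print(sort_based_projection(reserves, keep=[0, 1]))
# [(22, 101), (22, 102), (58, 103)]
```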
14. Op.Eval.
Projection Based on Hashing
 Simple hash-based projection:
 Eliminate unwanted attributes.
 Hash the result into B-1 output buffers and partitions.
 For each partition: read it in, eliminate duplicates in memory, write the answer.
 Optimizations?
 Combine elimination of unwanted attributes with hashing.
 Eliminate duplicates as soon as possible.
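The hash-based steps above can be sketched as follows (illustrative; partitions live in memory here rather than in B-1 buffers, and the table data is made up). The point is that duplicates always hash to the same partition, so each partition can be deduplicated independently:

```python
# Hash-based projection: partition on one hash function, then eliminate
# duplicates within each partition in memory.
def hash_based_projection(rows, keep, n_partitions=3):
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        t = tuple(row[i] for i in keep)              # drop attributes while hashing
        partitions[hash(t) % n_partitions].append(t)
    result = []
    for part in partitions:                          # per-partition dedup; no
        result.extend(dict.fromkeys(part))           # cross-partition check needed
    return result

reserves = [(22, 101, "d1"), (58, 103, "d2"), (22, 101, "d3"), (22, 102, "d4")]
print(sorted(hash_based_projection(reserves, keep=[0, 1])))
# [(22, 101), (22, 102), (58, 103)]
```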
Discussion of Projection
 The sort-based approach is the standard: it handles skew better, and the result is sorted.
 If an index on the relation contains all wanted attributes in its search key, we can do an index-only scan.
 Apply the projection techniques to data entries (much smaller!).
 If an ordered (i.e., tree) index contains all wanted attributes as a prefix of its search key, we can do even better:
 Retrieve data entries in order (index-only scan), discard unwanted fields, and compare adjacent tuples to check for duplicates.
 Example: a single index on (bid, sid, day); its prefixes are (bid) and (bid, sid).
14. Evaluating Relational Operators
 Operators and example schema
 Selection
 Projection
 Equijoins
 Nested Loop Join
 Index Nested Loop Join
 Sort-Merge Join
 Hash Join
 Hybrid Hash Join
 General Joins
 Set Operators
 Buffering
14.4 Equijoin With One Join Column
 Example:
SELECT S.sname, R.bid
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND bid = 100
 [Plan tree: π sname over (σ bid=100 Reserves) ⋈sid=sid Sailors]
14. Op.Eval.
Nested Loops Join
foreach tuple r in R do
  foreach tuple s in S do
    if ri == sj then output <r, s>
 R is called the outer table, S the inner table.
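The pseudocode above, as runnable Python (tiny made-up tables; column indexes i and j stand in for ri and sj):

```python
def nested_loops_join(R, S, i, j):
    result = []
    for r in R:              # outer: scanned once
        for s in S:          # inner: rescanned for every outer tuple
            if r[i] == s[j]:
                result.append(r + s)
    return result

reserves = [(22, 101), (58, 103)]              # (sid, bid)
sailors = [(22, "dustin"), (31, "lubber")]     # (sid, sname)
print(nested_loops_join(reserves, sailors, 0, 0))
# [(22, 101, 22, 'dustin')]
```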
14. Op.Eval.
The Join Algorithms
 Every in-memory join algorithm is in some sense a special case of Nested Loops:
 Simple Nested Loops: scan the outer, scan the inner.
 Index Nested Loops: scan the outer, index scan the inner.
 Sort-Merge: inputs must be sorted on the joining attributes. Scan the outer in order, scan the matching interval of the inner.
 Hash Join: a special case of Index Nested Loops, where the index is constructed on the fly and the inputs are partitioned.
 Implications
 Every join algorithm scans its outer relation.
 Every join algorithm costs at least M.
Cost of Nested Loops Join
 (Scan of R) + (# rows in R) × (Scan of S)
 = M + (M·pR·ρ)·N
 where ρ is the selectivity of the selection on R.
 For simplicity we assume R is accessed with a file scan. In general R could be accessed with an index scan, and the first cost "M" would be replaced by the cost of the index scan.
 For our example, the cost is prohibitive:
 1000 + (1,000)·500 = 501,000 I/Os = 5,010 seconds ≈ 1.4 hours
 Nested loops is used only for small tables.
14. Op.Eval.
Index Nested Loops Join
 Valid only when S has an index on sj.
foreach tuple r in R do
  foreach tuple s in S where ri == sj   use the index here
    add <r, s> to result
 Cost: M + (M·pR·ρ) × (cost of finding matching S tuples)
 Again, the first cost M might be replaced by an index scan cost.
 Cost of finding matching S tuples = cost of probing the S index + cost of fetching the S tuples.
 For each R tuple, the cost of probing the S index is about 1.2 for a hash index, 2-4 for a B+ tree.
 The cost of then fetching the S tuples (assuming Alternative (2) or (3) for data entries) depends on clustering.
• Clustered index: 1 I/O (typical); unclustered: up to 1 I/O per matching S tuple.
14. Op.Eval.
Examples of Index Nested Loops
 In our example Sailors has a clustered index on sid, so we can apply INL.
 Cost is M + (M·pR·ρ)·3 = 1000 + (1000·100·0.01)·3 = 4000 I/Os = 40 seconds.
14. Op.Eval.
14.4.2 Sort-Merge Join
 Costs are shown after each step.
 Assume R and S can each be sorted in two passes, i.e. M, N < B².
 No-brainer implementation:
 Sort R: 4M
 Sort S: 4N
 Merge-join R and S: M + N
 Total cost: 5(M + N)
 Better implementation:
 Sort* R into runs: 2M
 Sort* S into runs: 2N
 Merge the runs of R and S and merge-join them: M + N
 Total cost: 3(M + N)
*Use the snowplow/replacement-sort optimization.
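The merge-join step can be sketched as follows. This is a simplification with made-up tables: a real implementation re-reads the matching interval of the inner from its buffer rather than indexing into an in-memory list.

```python
def sort_merge_join(R, S, i, j):
    R = sorted(R, key=lambda r: r[i])        # stand-ins for the external sorts
    S = sorted(S, key=lambda s: s[j])
    result, p, q = [], 0, 0
    while p < len(R) and q < len(S):
        if R[p][i] < S[q][j]:
            p += 1
        elif R[p][i] > S[q][j]:
            q += 1
        else:
            k = q                            # scan the matching interval of S
            while k < len(S) and S[k][j] == R[p][i]:
                result.append(R[p] + S[k])
                k += 1
            p += 1                           # next R tuple rescans the same interval
    return result

sailors = [(22, "dustin"), (31, "lubber"), (58, "rusty")]
reserves = [(31, 101), (58, 103), (31, 102)]
print(sort_merge_join(sailors, reserves, 0, 0))
# [(31, 'lubber', 31, 101), (31, 'lubber', 31, 102), (58, 'rusty', 58, 103)]
```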
14. Op.Eval.
Example of Sort-Merge Join

Sailors:
sid  sname   rating  age
22   dustin     7    45.0
28   yuppy      9    35.0
31   lubber     8    55.5
44   guppy      5    35.0
58   rusty     10    35.0

Reserves:
sid  bid  day       rname
28   103  12/4/96   guppy
28   103  11/3/96   yuppy
31   101  10/10/96  dustin
31   102  10/12/96  lubber
31   101  10/11/96  lubber
58   103  11/12/96  dustin
Sort-Merge Join Optimization
[Figure: the runs from R and the runs from S fill the buffers; a single pass merges the runs and merge-joins them at the same time, producing the join output.]
Hybrid SMJ Costs (R ⋈ S)
 If R and S are sorted, the cost is M + N.
 If only R is sorted, the cost is M + N + 2ρS·N - 2(B - ρS·N/B)
 (argument similar to the preceding slide).
 If neither R nor S is sorted, the cost is M + N + 2ρR·M + 2ρS·N - 2(B - ρR·M/B - ρS·N/B).
 The initial M and N costs might be replaced by index scan costs if appropriate.
Example: Hybrid SMJ
 [Plan: (σ bid=100 Reserves) ⋈ (σ rank>5 Sailors)]
 Assume this plan uses a clustered sid index scan of Reserves at a cost of 1102 and a file scan of Sailors at a cost of 500, with B = 256.
 The plan cost is then
 M + N + 2ρS·N - 2(B - ρS·N/B)
 = 1102 + 500 + 2·0.5·500 - 2(256 - 0.5·500/256) ≈ 1590
14.4.3 Hash-Join [677]
 Confusing because it combines two distinct ideas:
1. Partition R and S using a hash function.
• You should understand how and why this works. Read the text if you don't.
2. Join corresponding partitions using a version of INL with hash indexes built on the fly.
• You should understand why the hash function used here must differ from the hash function used for partitioning.
 The partition step could be followed by any join method in step 2. Hashing is used because:
 The tables are large (they needed partitioning), so NLJ is not a contender.
 The partitions were just created, so they have no indexes on them; classic INL is not possible.
 The partitions were just created, so they are not sorted; SMJ is not a big winner. There may be cases where a sort order is needed later, but this is not considered.
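Both ideas in one sketch (illustrative; a Python dict plays the role of the second hash function h2, and the data is made up):

```python
def hash_join(R, S, i, j, k=4):
    # Idea 1: partition both inputs with the same hash function h.
    parts_R = [[] for _ in range(k)]
    parts_S = [[] for _ in range(k)]
    for r in R:
        parts_R[hash(r[i]) % k].append(r)
    for s in S:
        parts_S[hash(s[j]) % k].append(s)
    # Idea 2: join each pair of matching partitions via an on-the-fly
    # hash index on the R-partition (a dict, standing in for h2).
    result = []
    for Rp, Sp in zip(parts_R, parts_S):
        index = {}
        for r in Rp:
            index.setdefault(r[i], []).append(r)
        for s in Sp:                       # probe with the S-partition
            for r in index.get(s[j], []):
                result.append(r + s)
    return result

R = [(16, "a"), (23, "b"), (84, "c")]
S = [(84, "x"), (17, "y"), (16, "z")]
print(sorted(hash_join(R, S, 0, 0)))
# [(16, 'a', 16, 'z'), (84, 'c', 84, 'x')]
```

Matching join values always land in the same partition pair, so probing within a partition finds every match.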
[Figure, phase 1: the original relation is read through an input buffer; a hash function h sends each tuple to one of B-1 output buffers, producing B-1 partitions of R and S on disk.
Figure, phase 2: an in-memory hash table is built on a partition Ri (k < B-1 pages) using a second hash function h2; the matching partition Si is read through an input buffer, probed with h2, and the join result is written through an output buffer.]
Hash-Join Example: Partition
 R = 16, 23, 17, 86, 84, 21, 61 is read through the input buffer and hashed into k partitions of R (values 0 mod k, 1 mod k, ...).
 S = 19, 17, 84, 25, 22 is partitioned the same way into k partitions of S.
Hash-Join Example: INL
 Partitions of R: 16, 86, 84 and 23, 17, 21, 61.
 Partitions of S: 84, 22 and 19, 17, 25, 43.
 For each pair: build a hash table on the R-partition (e.g. on 16, 86, 84), scan the matching partition of S, and write the join output.
Cost of Hash Join
 We only need one of M, N to be ≤ B²!
 Let it be M.
 Read R, write B partitions: 2M
• Each one is, on average, at most B pages.
• Some fudging is going on here.
 Read S, write B partitions: 2N
• We don't care about the size of these partitions.
 For each of the B partitions of R and S: M + N
• Build a hash table for the R-partition.
• Read each partition of R and the matching partition of S: M + N
 Total: 3(M + N)
 Can we do better if M << B²? Hybridize!
Hybrid Hash Join, M ≤ 2B: Partition
[Figure: R is read through the input buffer and split in two; partition 0 of R stays in memory while partition 1 of R is written to disk. S is split the same way into partition 0 and partition 1 of S; R1 ⋈ S1 is joined later from disk.]
Hybrid Hash Join, M ≤ 2B: INL
[Figure: partition 0 of R is loaded as an in-memory hash table; partition 0 of S is scanned against it, and R0 ⋈ S0 is written through the output buffer.]
Hybrid Hash Join: Partition
[Figure: in general, partitions 0..k-1 of R are written through output buffers to disk while partition k of R stays in memory; S is split into partitions 0..k-1 of S the same way, and Rk ⋈ Sk is computed on the fly.]
Hybrid Hash Join: INL, Partition i
[Figure: for each spilled partition i, partition i of R is loaded as an in-memory hash table, partition i of S is scanned against it, and Ri ⋈ Si is written through the output buffer.]
14. Op.Eval.
Fudging in Our Treatment of Hash-Join
 If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed.
 If the hash function does not partition uniformly, one or more R-partitions may not fit in memory.
 In this case we can apply the hash-join technique recursively to join that R-partition with its corresponding S-partition.
Sort-Merge vs. Hash Join
 When can we achieve the low cost of 3(M + N)?
 Hash: the smaller of M, N must be ≤ B².
 Sort-Merge: both M and N must be ≤ B².
 So Hash Join is preferable if the table sizes differ greatly.
 But:
 Sort-Merge Join is less sensitive to data skew.
 The result of Sort-Merge Join is ordered.
Hash Join Comments
 We can view the first phase of a hash join of R and S as divide and conquer: it partitions the join into subjoins, each of which can be done with one input smaller than memory.
 What to do if we don't know the size of R? Assume the case M ≤ B, then split off more and more of the hash table onto disk as it grows. Used in Postgres.
Equijoin Cost Summary
 Nested Loops: M + MN
 Index Nested Loops: M + M·pR·(probe: 1.2-4)·(fetch: 1+)
 Sort-Merge, Hash: 3(M + N)
 Nested Loops is best for small tables.
 The others have high overhead costs.
14. Op.Eval.
14.4.4 General Join Conditions
 Equalities over several attributes (e.g., R.sid = S.sid AND R.rname = S.sname):
 For Index NL, build an index on (sid, sname) (if S is the inner); or use existing indexes on sid or sname.
 For Sort-Merge and Hash Join, sort/partition on the combination of the two join columns.
 Not much different from one-equality equijoins.
Inequality Conditions
 Typical example: R.rname < S.sname
 Relatively rare. Does the example make sense?
 For Index NL, we need a (clustered!) B+ tree index to get any efficiency.
• Range probes on the inner; the number of matches is likely to be much higher than for equality joins.
 Hash Join and Sort-Merge Join are not applicable.
 With no index, Block NL is the best join method.
14. Op.Eval.
14.5 Set Operations
 INTERSECTION:
 INTERSECTION is a special case of join.
• How is it a special case?
 So all the usual join algorithms apply.
 CROSS PRODUCT:
 Also a special case of join.
• How?
 What join algorithm is best to compute CROSS PRODUCT?
• Will Sort-Merge or Hash help?
UNION and EXCEPT
 They are similar; we'll do UNION.
 The hard part is removing duplicates.
 Sort-based approach to union:
 Sort both relations (on a key).
 Merge the sorted relations, discarding duplicates.
 Alternative: merge the runs from Pass 0 for both relations.
 Hash-based approach to union:
 Partition R and S using a hash function h.
 For each S-partition, build an in-memory hash table, scan the corresponding R-partition, and add tuples to the table while discarding duplicates.
14. Op.Eval.
14.6 Aggregates Without Grouping
 Example:
 SELECT AVG(S.age) FROM Sailors S
 In general, this requires scanning the relation.
 What is the cost of this query?
 Given an index whose search key includes all attributes in the SELECT or WHERE clauses, we can do an index-only scan.
 If there is an index on age, what is the cost of this query?
14. Op.Eval.
Aggregates With Grouping
 Example:
 SELECT S.rating, AVG(S.age) FROM Sailors S GROUP BY S.rating
 What is an algorithm based on sorting?
 What is the cost of this query, assuming a 2-pass sort?
 Optimizations? Cost?
 What is an algorithm based on hashing?
 How much memory is needed?
 Cost?
 Conclusion: hash is best.
 If there is an appropriate index, use an index-only algorithm.
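A sketch of the hash-based approach with made-up rows. The point about memory: the in-memory table needs one entry per group (per rating), not per row, which is why hashing is attractive here.

```python
# Hash-based GROUP BY ... AVG: keep a running (sum, count) per group.
def hash_group_avg(rows, group_col, val_col):
    acc = {}                                  # one entry per group, not per row
    for row in rows:
        s, c = acc.get(row[group_col], (0.0, 0))
        acc[row[group_col]] = (s + row[val_col], c + 1)
    return {g: s / c for g, (s, c) in acc.items()}

sailors = [(7, 45.0), (9, 35.0), (8, 55.5), (7, 55.0)]   # (rating, age)
print(hash_group_avg(sailors, 0, 1))
# {7: 50.0, 9: 35.0, 8: 55.5}
```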
14. Op.Eval.
14.7 Impact of Buffering
 If several operations are executing concurrently, estimating the number of available buffer pages is guesswork.
 Repeated access patterns interact with the buffer replacement policy.
 e.g., the inner relation is scanned repeatedly in Simple Nested Loops Join. With enough buffer pages to hold the inner, the replacement policy does not matter. Otherwise, MRU is best and LRU is worst (sequential flooding).
 What replacement policy is best for:
• Block Nested Loops?
• Index Nested Loops?
• the Sort-Merge Join sort phase?
• the Sort-Merge Join merge phase?