Nested-Loop joins

Download Report

Transcript Nested-Loop joins



“one-and-a-half” pass method, since one
relation will be read just once.
Tuple-Based Nested-loop Join Algorithm:
FOR each tuple s in S DO
FOR each tuple r in R DO
IF r and s join to make a tuple t THEN
output t
Improvement to Take Advantage of Disk I/O Model
 Instead of retrieving tuples of R, T(S) times, load
memory with as many tuples of S as can fit, and
match tuples of R against all Stuples in memory.


Assume B(S) ≤ B(R), and B(S) > M
Read M-1 blocks of S into main memory and
compare to all of R, block by block
FOR each chunk of M-1 blocks of S DO
FOR each block b of R DO
FOR each tuple t of b DO
find the tuples of S in memory that join with t
output the join of t with each of these tuples

Number of disk I/O’s:
[B(S)/(M-1)]*(M-1 +B(R))
or
B(S) + [B(S)B(R)/(M-1)]
or approximately B(S)*B(R)/M

This special case of multi-pass algorithms is
sufficient for most of the relation sizes.
Main idea for unary operations on R
 Suppose B(R)  M (main memory size in blocks)

First pass:

Second pass: create final result
– Read M blocks of R into MM
– Sort the content of MM
– Write the sorted result (sublist/run) into M blocks on
disk.



In the second phase (merging) we don’t sort
but copy each tuple just once.
We can do that because the identical tuples
will appear “at the same time,” i.e. they will be
all the first ones at the buffers (for the sorted
sublists).
As usual, if one buffer gets empty we refill it.
•
•

•
•
•
•
•
•
For R(X,Y) ►◄S(Y,Z) with M buffers of memory:
Sort R on Y, sort S on Y
Merge phase
Use 2 input buffers: 1 for R, 1 for S.
Pick tuple t with smallest Y value in the buffer for R (or for S)
If t doesn’t match with the first tuple in the buffer for S, then just
remove t.
Otherwise, read all the tuples from R with the same Y value as t
and put them in the M-2 part of the memory.
When the input buffer for R is exhausted fill it again and again.
Then, read the tuples of S that match. For each one we produce
the join of it with all the tuples of R in the M-2 part of the memory.






Suppose we are not worried about many common
Y values
Create Y-sorted sublists of R and S
Bring first block of each sublist into a buffer
(assuming we have at most M sublists)
Find smallest Y-value from heads of buffers. Join
with other tuples in heads of buffers, use other
possible buffers, if there are “many” tuples with
the same Y values.
Disk I/O: 3*(B(R) + B(S))
Requirement: B(R) + B(S) ≤ M2
Operators
Approx. M required
Disk I/O
, 
Sqrt(B)
3B
, , -
Sqrt(B(R) + B(S))
3(B(R)+B(S))

Sqrt(max(B(R)+B(S)))
5(B(R)+B(S))

Sqrt(max(B(R)+B(S)))
3(B(R)+B(S))