Transcript ppt
B+-Trees: Search
If there are n search-key values in the file,
the path is no longer than log f/2(n) (worst
case).
1
External Sort-Merge
Sorting phase:
Sorts nB pages at a time
nB = # of main memory pages buffer
creates nR = b/nB initial sorted runs on disk
b = # of file blocks (pages) to be sorted
Sorting Cost = read b blocks + write b blocks = 2 b
2
External Sort-Merge
Merging phase:
The sorted runs are merged during one or more
passes.
The degree of merging (dM) is the number of
runs that can be merged in each pass.
dM = Min (nB-1, nR)
nP = (logdM(nR))
nP: number of passes.
In each pass,
One buffer block is needed to hold one block
from each of the runs being merged, and
One block is needed for containing one block of
the merged result.
3
External Sort-Merge
Degree of merging (dM)
# of runs that can be merged together in each pass =
min (nB - 1, nR)
Number of passes nP = (logdM(nR))
In our example
dM = 4 (four-way merging)
min (nB-1, nR) = min(5-1, 205) = 4
Number of passes nP = (logdM(nR)) = (log4(205)) = 4
First pass:
– 205 initial sorted runs would be merged into 52 sorted runs
Second pass:
– 52 sorted runs would be merged into 13
Third pass:
– 13 sorted runs would be merged into 4
Fourth pass:
– 4 sorted runs would be merged into 1
4
External Sort-Merge
External Sort-Merge: Cost Analysis
Disk accesses for initial run creation (sort phase) as well as
in each merge pass is 2b
reads every block once and writes it out once
Initial # of runs is nR = b/nB and # of runs decreases by a
factor of nB - 1 in each merge pass, then the total # of merge
passes is np = logdM(nR)
In general, the cost performance of Merge-Sort is
Cost = sort cost + merge cost
Cost =
2b
+ 2b * np
Cost =
2b
+ 2b * logdM nR
=
2b (logdM(nR) + 1)
5
Catalog Information
Attribute
d:
# of distinct values of an attribute
sl
(selectivity):
the ratio of the # of records satisfying the condition to
the total # of records in the file.
s
(selection cardinality) = sl * r
average # of records that will satisfy an equality
condition on the attribute
For a key attribute:
d = r,
sl = 1/r,
s=1
For a nonkey attribute:
assuming that d distinct values are uniformly
distributed among the records
the estimated sl = 1/d,
s = r/d
6
Using Selectivity and Cost Estimates in
Query Optimization
Examples of Cost Functions for SELECT
S1. Linear search (brute force) approach
CS1a = b;
For an equality condition on a key, CS1a = (b/2) if the
record is found; otherwise CS1a = b.
S2. Binary search:
CS2 = log2b + (s/bfr) –1
For an equality condition on a unique (key) attribute,
CS2 =log2b
S3. Using a primary index (S3a) or hash key (S3b) to
retrieve a single record
CS3a = x + 1; CS3b = 1 for static or linear hashing;
CS3b = 1 for extendible hashing;
7
Using Selectivity and Cost Estimates in
Query Optimization
Examples of Cost Functions for SELECT (contd.)
S4. Using an ordering index to retrieve multiple records:
For the comparison condition on a key field with an
ordering index, CS4 = x + (b/2)
S5. Using a clustering index to retrieve multiple records:
CS5 = x + ┌ (s/bfr) ┐
S6. Using a secondary (B+-tree) index:
For an equality comparison, CS6a = x + s;
For an comparison condition such as >, <, >=, or <=,
CS6a = x + (bI1/2) + (r/2)
8
Using Selectivity and Cost Estimates in
Query Optimization
Examples of Cost Functions for SELECT (contd.)
S7. Conjunctive selection:
Use either S1 or one of the methods S2 to S6 to solve.
For the latter case, use one condition to retrieve the
records and then check in the memory buffer whether
each retrieved record satisfies the remaining conditions
in the conjunction.
S8. Conjunctive selection using a composite index:
Same as S3a, S5 or S6a, depending on the type of
index.
9