B-Trees, Part 2
Hash-Based Indexes
R&G Chapter 10
Lecture 10
Administrivia
• The new Homework 3 now available
– Due 1 week from Sunday
– Homework 4 available the week after
• Midterm exams available here
Review
• Last time discussed File Organization
– Unordered heap files
– Sorted Files
– Clustered Trees
– Unclustered Trees
– Unclustered Hash Tables
• Indexes
– B-Trees – dynamic, good for changing data, range
queries
– Hash tables – fastest for equality queries, useless
for range queries
Review (2)
• For any index, 3 alternatives for data entries k*:
– Data record with key value k
– <k, rid of data record with search key value k>
– <k, list of rids of data records with search key k>
– Choice orthogonal to the indexing technique
Today:
• Indexes
– Composite Keys, Index-Only Plans
• B-Trees
– details of insertion and deletion
• Hash Indexes
– How to implement with changing data sets
Inserting a Data Entry into a B+ Tree
• Find correct leaf L.
• Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key.
• Insert index entry pointing to L2 into parent of L.
• This can happen recursively
– To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
• Splits “grow” tree; root split increases height.
– Tree growth: gets wider or one level taller at top.
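A rough sketch of the leaf-level case in Python (illustrative only: Leaf, CAPACITY, and the (key, rid) entry format are assumptions made for this example, and parent/index-node maintenance is reduced to returning the copied-up key):

import bisect

CAPACITY = 4  # illustrative fan-out; real pages hold many more entries

class Leaf:
    def __init__(self, entries=None):
        self.entries = entries or []           # sorted list of (key, rid) data entries

    def insert(self, key, rid):
        """Insert a data entry. Returns (new_leaf, copied_up_key) if the
        leaf split, or (None, None) if there was enough space."""
        bisect.insort(self.entries, (key, rid))
        if len(self.entries) <= CAPACITY:
            return None, None                    # enough space: done
        mid = len(self.entries) // 2             # redistribute entries evenly
        new_leaf = Leaf(self.entries[mid:])      # the new node L2
        self.entries = self.entries[:mid]
        return new_leaf, new_leaf.entries[0][0]  # copy up the middle key

The caller would insert the returned key (with a pointer to the new leaf) into the parent; an index-node split pushes its middle key up instead of copying it.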
Indexes with Composite Search Keys
• Composite Search Keys: Search on a combination of fields.
– Equality query: Every field value is equal to a constant value. E.g., wrt <sal,age> index:
• age=20 and sal=75
– Range query: Some field value is not a constant. E.g.:
• age=20; or age=20 and sal > 10
• Data entries in index sorted by search key to support range queries.
– Lexicographic order, or
– Spatial order.
Examples of composite key indexes using lexicographic order:

Data records sorted by name:
  name  age  sal
  bob   12   10
  cal   11   80
  joe   12   20
  sue   13   75

Data entries in each index:
  <age,sal>:  11,80   12,10   12,20   13,75
  <age>:      11   12   12   13
  <sal,age>:  10,12   20,12   75,13   80,11
  <sal>:      10   20   75   80
Composite Search Keys
• To retrieve Emp records with age=30 AND sal=4000,
an index on <age,sal> would be better than an index
on age or an index on sal.
– Choice of index key orthogonal to clustering etc.
• If condition is: 20<age<30 AND 3000<sal<5000:
– Clustered tree index on <age,sal> or <sal,age> is best.
• If condition is: age=30 AND 3000<sal<5000:
– Clustered <age,sal> index much better than <sal,age>
index!
• Composite indexes are larger, updated more often.
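To see why a clustered <age,sal> index beats <sal,age> for a condition like age=30 AND 3000<sal<5000, each index can be modeled as a sorted list of composite keys (a toy sketch only; the emps data and the use of Python lists and bisect are assumptions, not how a real B+ tree stores entries):

import bisect

# toy data entries: (age, sal) pairs for a handful of employees
emps = [(30, 3500), (30, 6000), (25, 4000), (41, 3200), (30, 4900)]

by_age_sal = sorted(emps)                        # models an <age,sal> index
by_sal_age = sorted((s, a) for a, s in emps)     # models a <sal,age> index

# age=30 AND 3000<sal<5000 on <age,sal>: matches form one contiguous range
lo = bisect.bisect_right(by_age_sal, (30, 3000))
hi = bisect.bisect_left(by_age_sal, (30, 5000))
print(by_age_sal[lo:hi])                         # only matching entries are scanned

# On <sal,age> the matches are scattered across the whole 3000..5000 sal range,
# so every entry in that range must be scanned and checked for age=30.
candidates = [e for e in by_sal_age if 3000 < e[0] < 5000]
print([(a, s) for s, a in candidates if a == 30])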
Index-Only Plans
• A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.
– <E.dno>: SELECT D.mgr FROM Dept D, Emp E WHERE D.dno=E.dno
– <E.dno,E.eid> (tree index!): SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE D.dno=E.dno
– <E.dno>: SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno
– <E.dno,E.sal> (tree index!): SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno
– <E.age,E.sal> or <E.sal,E.age> (tree index!): SELECT AVG(E.sal) FROM Emp E WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
Index-Only Plans (Contd.)
• Index-only plans are possible if the key is <dno,age> or we have a tree index with key <age,dno>
– Which is better?
– What if we consider the second query?
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age=30
GROUP BY E.dno
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>30
GROUP BY E.dno
B-Trees: Insertion and Deletion
• Insertion
– Find leaf where new record belongs
– If leaf is full, redistribute*
– If siblings too full, split, copy middle key up
– If index too full, redistribute*
– If index siblings full, split, push middle key up
• Deletion
– Find leaf where record exists, remove it
– If leaf is less than 50% full, redistribute*
– If siblings too empty, merge, remove key above
– If index node above too empty, redistribute*
– If index siblings too empty, merge, move above key down
B-Trees: For Homework 2
• Insertion
– Find leaf where new record belongs
– If leaf is full, redistribute*
– If siblings too full, split, copy middle key up
– If index too full, redistribute*
– If index siblings full, split, push middle key up
• Deletion
– Find leaf where record exists, remove it
– If leaf is less than 50% full, redistribute*
– If siblings too empty, merge, remove key above
– If index node above too empty, redistribute*
– If index siblings too empty, merge, move above key down
Note: if the redistribute/merge steps are not performed on deletion, nodes will often be < 50% full after deletion
B-Trees: For Homework 2 (cont)
• Splits
– When splitting nodes, choose the middle key
– If there is an even number of keys, choose the middle key
• Your code must handle duplicate keys
– We promise that there will never be more than 1
page of duplicate values.
– Thus, when splitting, if the middle key is identical
to the key to the left, you must find the closest
splittable key to the middle.
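One possible way to pick such a split point (a sketch only; the sorted-list-of-keys representation and the function name are assumptions, not part of the assignment's interface):

def choose_split_index(keys):
    # keys: the page's keys in sorted order. Return an index i such that
    # keys[i] != keys[i-1], choosing the splittable point closest to the middle.
    # Because duplicates never exceed one page, a splitting page (which holds
    # more than one page's worth of keys) always has such a point.
    mid = len(keys) // 2
    for offset in range(len(keys)):
        for i in (mid - offset, mid + offset):
            if 0 < i < len(keys) and keys[i] != keys[i - 1]:
                return i
    return None  # all keys identical: cannot happen under the guarantee above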
B-Tree Demo
Hashing
• Static and dynamic hashing techniques exist; the trade-offs depend on how the data changes over time
• Static Hashing
– Good if data never changes
• Extendible Hashing
– Uses directory to handle changing data
• Linear Hashing
– Avoids directory, usually faster
Static Hashing
• # primary pages fixed, allocated sequentially,
never de-allocated; overflow pages if needed.
• h(k) mod N = bucket to which data entry with
key k belongs. (N = # of buckets)
[Figure: a key is fed to hash function h; h(key) mod N selects one of the N primary bucket pages (0 to N-1), with overflow pages chained off full primary pages.]
Static Hashing (Contd.)
• Buckets contain data entries.
• Hash fn works on search key field of record r. Must
distribute values over range 0 ... N-1.
– h(key) = (a * key + b) usually works well.
– a and b are constants; lots known about how to tune h.
• Long overflow chains can develop and degrade
performance.
– Extendible and Linear Hashing: Dynamic techniques to
fix this problem.
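A minimal static-hashing sketch in Python (the constants, the page size, and the list-of-pages bucket representation are all illustrative assumptions):

N = 8              # number of primary bucket pages, fixed for the life of the file
PAGE_CAPACITY = 4  # data entries per page

# each bucket is a list of pages; pages after the first are overflow pages
buckets = [[[]] for _ in range(N)]

def h(key):
    return 31 * key + 7        # an (a * key + b)-style hash for integer keys

def insert(key, rid):
    pages = buckets[h(key) % N]
    if len(pages[-1]) == PAGE_CAPACITY:   # last page full: chain a new overflow page
        pages.append([])
    pages[-1].append((key, rid))

def search(key):
    pages = buckets[h(key) % N]
    return [e for page in pages for e in page if e[0] == key]

With a skewed key distribution, some buckets accumulate long overflow chains, which is exactly the problem the dynamic schemes below avoid.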
Extendible Hashing
• Situation: Bucket (primary page) becomes full.
• Why not re-organize file by doubling # of buckets?
– Reading and writing all pages is expensive!
• Idea: Use directory of pointers to buckets,
– double # of buckets by doubling the directory,
– splitting just the bucket that overflowed!
– Directory much smaller than file, doubling much cheaper.
– No overflow pages!
– Trick lies in how hash function is adjusted!
Extendible Hashing Details
• Need directory with pointer to each bucket
• Need hash function to incrementally double range
– can just use increasingly more LSBs of h(key)
• Must keep track of global “depth”
– how many times directory doubled so far
• Must keep track of local “depth”
– how many times each bucket has been split
Example
• Directory is array of size 4.
• To find bucket for r, take last `global depth’ # bits of h(r); we denote r by h(r).
– If h(r) = 5 = binary 101, it is in bucket pointed to by 01.
[Figure: global depth 2; directory entries 00, 01, 10, 11 point to Bucket A (4* 12* 32* 16*), Bucket B (1* 5* 21* 13*), Bucket C (10*), and Bucket D (15* 7* 19*), each with local depth 2.]
• Insert: If bucket is full, split it (allocate new page, re-distribute).
• If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing global depth with local depth for the split bucket.)
Insert h(r)=20 (Causes Doubling)
[Figure: 20 = binary 10100 hashes to directory entry 00, but Bucket A (4* 12* 32* 16*) is full and its local depth equals the global depth (2), so the directory doubles to eight entries 000–111 (global depth 3). Bucket A splits into Bucket A (32* 16*) and its `split image’ Bucket A2 (4* 12* 20*), both with local depth 3; Buckets B, C, D keep local depth 2.]
Now Insert h(r)=9 (Causes Split Only)
[Figure: 9 = binary 1001 hashes to directory entry 001. Bucket B (1* 5* 21* 13*) is full, but its local depth (2) is less than the global depth (3), so only the bucket splits: Bucket B keeps 1* 9* and its `split image’ Bucket B2 gets 5* 21* 13*, both now with local depth 3; the directory stays at eight entries, with entry 101 redirected to B2.]
Points to Note
• 20 = binary 10100. Last 2 bits (00) tell us r belongs in A or A2. Last 3 bits needed to tell which.
– Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to.
– Local depth of a bucket: # of bits used to determine whether an entry belongs to this bucket.
• When does bucket split cause directory doubling?
– If, before insert, local depth of bucket = global depth.
– Insert causes local depth to become > global depth; directory is doubled by copying it over and `fixing’ pointer to split image page.
– (Use of least significant bits enables efficient doubling via copying of directory!)
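A compact sketch of these rules in Python (illustrative only: Bucket, ExtendibleHash, PAGE_CAPACITY, and the use of Python's built-in hash in place of h are assumptions; a real implementation would also handle the all-duplicates case with overflow pages):

PAGE_CAPACITY = 4

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []                         # data entries (key, rid)

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]   # indexed by last global_depth bits of h(key)

    def _bucket(self, key):
        return self.directory[hash(key) & ((1 << self.global_depth) - 1)]

    def insert(self, key, rid):
        b = self._bucket(key)
        if len(b.entries) < PAGE_CAPACITY:
            b.entries.append((key, rid))
            return
        if b.local_depth == self.global_depth:    # split would outgrow the directory:
            self.directory = self.directory * 2   # double it by copying (LSB trick)
            self.global_depth += 1
        b.local_depth += 1                        # split bucket b using one more bit
        image = Bucket(b.local_depth)             # the `split image'
        bit = 1 << (b.local_depth - 1)
        old, b.entries = b.entries + [(key, rid)], []
        for k, r in old:
            (image if hash(k) & bit else b).entries.append((k, r))
        for i in range(len(self.directory)):      # redirect pointers to the split image
            if self.directory[i] is b and i & bit:
                self.directory[i] = image
        # NOTE: if every entry lands on one side (e.g. duplicates), a real
        # implementation would retry the insert or add an overflow page.

Equality search is just _bucket(key) followed by a scan of that bucket's entries, which is why it costs at most one or two disk I/Os.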
Directory Doubling
• Why use least significant bits in directory?
– Allows for doubling via copying!
[Figure: with global depth 2, directory entries 00, 01, 10, 11 point to buckets holding 4* 12* 32* 16*, 1* 5* 21* 13*, 10*, and 15* 7* 19*. Doubling to global depth 3 just copies the directory: entries 000–111, where entry i and entry i+4 initially point to the same bucket.]
Deletion
• Delete: If removal of data entry makes bucket empty, can be merged with `split image’. If each directory element points to same bucket as its split image, can halve directory.
[Figure: deleting 10* empties Bucket C (directory entry 10); it is merged with its split image (entry 00, holding 4* 12* 32* 16*), whose local depth drops back to 1. The directory itself is unchanged: entries 00 and 10 now point to the same bucket.]
Deletion (cont)
• Delete: If removal of data entry makes bucket empty, can be merged with `split image’. If each directory element points to same bucket as its split image, can halve directory.
[Figure: deleting 15* empties the remaining singleton bucket, which merges with its split image (local depth 1). Now every directory element points to the same bucket as its split image, so the directory is halved: global depth 1, entry 0 → 4* 12* 32* 16*, entry 1 → 1* 5* 21* 13*.]
Comments on Extendible Hashing
• If directory fits in memory, equality search answered with one disk access; else two.
– 100MB file, 100 bytes/rec, 4K pages contains 1,000,000 records (as data entries) and 25,000 directory elements; chances are high that directory will fit in memory.
– Directory grows in spurts, and, if the distribution of hash values is skewed, directory can grow large.
• Biggest problem:
– Multiple entries with same hash value cause problems!
– If bucket already full of same hash value, will keep doubling forever! So must use overflow buckets if dups.
Linear Hashing
• This is another dynamic hashing scheme, an alternative to Extendible Hashing.
• LH handles the problem of long overflow chains without using a directory, and handles duplicates.
• Idea: Use a family of hash functions h0, h1, h2, ...
– h_i(key) = h(key) mod (2^i * N); N = initial # buckets
– h is some hash function (range is not 0 to N-1)
– If N = 2^d0, for some d0, h_i consists of applying h and looking at the last d_i bits, where d_i = d0 + i.
– h_{i+1} doubles the range of h_i (similar to directory doubling)
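Read as code, the family is just (a sketch; Python's built-in hash stands in for h, and N is the assumed initial bucket count):

N = 4  # initial number of buckets, a power of 2

def h_i(i, key):
    # h_i(key) = h(key) mod (2^i * N); h_{i+1} doubles the range of h_i
    return hash(key) % (2 ** i * N)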
Linear Hashing Example
• Let’s start with N = 4 buckets
– Start at “round” 0, “next” 0; there are N * 2^round buckets
– Each time any bucket fills, split “next” bucket
– If 0 ≤ h_round(key) < Next, use h_round+1(key) instead
[Figure: Start: Next = 0; buckets hold 4* 12* 32* 16* | 1* 5* 21* 13* | 10* | 15*. Add 9: bucket 1 fills (9* goes to its overflow page), so bucket 0 (Next) is split with h_1: 32* 16* stay, 4* 12* move to the new bucket 4, and Next advances to 1. Add 20: h_0(20) = 0 < Next, so h_1(20) = 4 places 20* in bucket 4 (now 4* 12* 20*).]
Linear Hashing Example (cont)
• Overflow chains do exist, but eventually get split
• Instead of doubling, new buckets added one-at-a-time
[Figure: Add 6: 6* joins 10* in bucket 2. Add 17: another bucket fills, so the Next bucket (bucket 1, with its overflow chain) is split with h_1, its entries redistributed between bucket 1 and the new bucket 5, and Next advances to 2.]
Linear Hashing (Contd.)
• Directory avoided in LH by using overflow pages, and choosing bucket to split round-robin.
– Splitting proceeds in `rounds’. Round ends when all N_R initial (for round R) buckets are split. Buckets 0 to Next-1 have been split; Next to N_R yet to be split.
– Current round number also called Level.
– Search: To find bucket for data entry r, find h_round(r):
• If h_round(r) is in range `Next to N_R’, r belongs here.
• Else, r could be in bucket h_round(r) or bucket h_round(r) + N_R; must apply h_round+1(r) to find out.
Overview of LH File
• In the middle of a round:
– Bucket Next is the next bucket to be split.
– Buckets 0 to N_Level-1 existed at the beginning of this round: this is the range of h_Level.
– Buckets 0 to Next-1 have been split in this round: if h_Level(search key value) is in this range, must use h_Level+1(search key value) to decide if the entry is in a `split image’ bucket.
– `split image’ buckets: created (through splitting of other buckets) in this round.
Linear Hashing (Contd.)
• Insert: Find bucket by applying h_round / h_round+1:
– If bucket to insert into is full:
• Add overflow page and insert data entry.
• (Maybe) Split Next bucket and increment Next.
• Can choose any criterion to `trigger’ split.
• Since buckets are split round-robin, long overflow chains don’t develop!
• Doubling of directory in Extendible Hashing is similar; switching of hash functions is implicit in how the # of bits examined is increased.
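A sketch of search and insert under these rules (illustrative assumptions: LinearHash, N0, PAGE_CAPACITY, Python's built-in hash in place of h, overflow pages folded into one list per bucket, and "insert landed in a full bucket" as the split trigger):

N0 = 4             # initial number of buckets
PAGE_CAPACITY = 4  # data entries per primary page

class LinearHash:
    def __init__(self):
        self.round = 0
        self.next = 0                                   # next bucket to split
        self.buckets = [[] for _ in range(N0)]          # entries + overflow in one list

    def _n_round(self):
        return N0 * (2 ** self.round)                   # N_R: buckets at start of round

    def _addr(self, key):
        a = hash(key) % self._n_round()                 # h_round
        if a < self.next:                               # bucket already split this round:
            a = hash(key) % (2 * self._n_round())       # use h_{round+1}
        return a

    def search(self, key):
        return [e for e in self.buckets[self._addr(key)] if e[0] == key]

    def insert(self, key, rid):
        bucket = self.buckets[self._addr(key)]
        full = len(bucket) >= PAGE_CAPACITY             # would need an overflow page
        bucket.append((key, rid))
        if full:                                        # chosen split trigger
            self._split_next()

    def _split_next(self):
        old = self.buckets[self.next]
        self.buckets.append([])                         # `split image' = bucket Next + N_R
        keep, h_next = [], 2 * self._n_round()
        for k, r in old:
            (keep if hash(k) % h_next == self.next else self.buckets[-1]).append((k, r))
        self.buckets[self.next] = keep
        self.next += 1
        if self.next == self._n_round():                # all N_R buckets split: end of round
            self.round += 1
            self.next = 0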
Another Example of Linear Hashing
• On split, h_Level+1 is used to re-distribute entries.
[Figure: Round=0, N=4. Primary pages (addressed by h_0, the last 2 bits): 00: 32* 44* 36*, 01: 9* 25* 5*, 10: 14* 18* 10* 30*, 11: 31* 35* 7* 11*. (This info is for illustration only; it is the actual contents of the linear hashed file.) Inserting 43* (h_0 = 11) lands in a full bucket, so 43* goes to an overflow page of bucket 11 and bucket 0 (Next) is split with h_1: 32* stays in bucket 000, while 44* 36* move to the new bucket 100; Next advances to 1.]
Example: End of a Round
[Figure: Next = 3, the last unsplit bucket of Round 0. Inserting 50* (h_1 = 010) lands in a full bucket, so 50* goes to an overflow page and bucket 3 is split, ending the round: Round becomes 1, Next resets to 0, and every bucket is now addressed directly by h_1. Resulting primary pages: 000: 32*, 001: 9* 25*, 010: 66* 18* 10* 34* (overflow 50*), 011: 43* 35* 11*, 100: 44* 36*, 101: 5* 37* 29*, 110: 14* 30* 22*, 111: 31* 7*.]
LH Described as a Variant of EH
• The two schemes are actually quite similar:
– Begin with an EH index where directory has N elements.
– Use overflow pages, split buckets round-robin.
– First split is at bucket 0. (Imagine directory being doubled
at this point.) But elements <1,N+1>, <2,N+2>, ... are
the same. So, need only create directory element N, which
differs from 0, now.
• When bucket 1 splits, create directory element N+1, etc.
• So, directory can double gradually. Also, primary
bucket pages are created in order. If they are
allocated in sequence too (so that finding i’th is easy),
we actually don’t need a directory! Voila, LH.
Summary
• Hash-based indexes: best for equality searches,
cannot support range searches.
• Static Hashing can lead to long overflow chains.
• Extendible Hashing avoids overflow pages by splitting
a full bucket when a new data entry is to be added to
it. (Duplicates may require overflow pages.)
– Directory to keep track of buckets, doubles periodically.
– Can get large with skewed data; additional I/O if this does
not fit in main memory.
Summary (Contd.)
• Linear Hashing avoids directory by splitting buckets
round-robin, and using overflow pages.
– Overflow pages not likely to be long.
– Duplicates handled easily.
– Space utilization could be lower than Extendible Hashing,
since splits not concentrated on `dense’ data areas.
• Can tune criterion for triggering splits to trade-off
slightly longer chains for better space utilization.
• For hash-based indexes, a skewed data distribution is
one in which the hash values of data entries are not
uniformly distributed!