Data Structures and Algorithms for Information Processing
Lecture 6: Heaps, B-Trees, and B+Trees
90-723: Data Structures
and Algorithms for
Information Processing
Lecture 6: Heaps & B-Trees
Copyright © 1999, Carnegie Mellon. All Rights Reserved.
Homework Policy
• Late homework will normally be
penalized 10% per day late;
• Each student may turn in one late
homework with no penalty (up to
one week late)
Grading
• Homeworks (4-5): 50%
• Midterm Exam: 25%
• Final Exam: 25%
Today’s Topics
• Ways to Balance Trees
– Heaps & Priority Queues
– B-Trees
• Time Analysis of Trees
– Binary trees
– Heaps
– B-Trees
• See Chapter 10 in Main
• B+ trees
Binary Trees: Worst Case
Inserting nodes that are already
sorted leads to worst-case
behavior: d = (n - 1) = 5
How can we use the idea of
balanced trees to avoid this
kind of situation?
[Figure: degenerate binary tree in which nodes 1 through 6 form a single chain]
Balanced Trees
[Figure: example balanced binary trees]
Trees are “no deeper than they
have to be”
Complete binary trees minimize
depth by forcing each row to be
full before d is increased
Heaps are complete binary trees
which limit the depth to a minimum
for any given n nodes, independently
of the order of insertion. Heaps are not
search trees.
Main’s slides on Heaps
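To make the depth claim concrete, here is a minimal sketch of how the depth of a complete binary tree can be read off the usual array layout, where the parent of index i sits at (i - 1) / 2. The class and method names are hypothetical and not taken from the lecture or from Main's text.

```java
// Sketch only: depth of a complete binary tree stored in an array.
// CompleteTreeDepth, depthOfIndex and depth are illustrative names.
class CompleteTreeDepth {
    // Number of parent steps needed to get from index i up to the root (index 0).
    static int depthOfIndex(int i) {
        int depth = 0;
        while (i > 0) {
            i = (i - 1) / 2; // move to the parent
            depth++;
        }
        return depth;
    }

    // Depth of a complete binary tree with n >= 1 nodes:
    // the last node (index n - 1) is always on the deepest level.
    static int depth(int n) {
        return depthOfIndex(n - 1);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println("n = " + n + ", depth = " + depth(n));
        }
    }
}
```

With n = 6 the complete tree has depth 2, compared with depth 5 for a six-node chain, and the result depends only on n, not on the order in which elements arrived.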
B-Trees
• B-Trees are a type of search tree
• Further reduction in depth for a
given tree of n nodes
• Two adjustments:
– nodes have more than two children
– nodes hold more than a single
element
B-Trees
• Can be implemented as a set (no
duplicate elements) or as a bag
(duplicate elements allowed)
• This example focuses on the set
implementation
B-Trees
• Every B-Tree depends on a
positive constant, MINIMUM, which
determines how many elements
are held in a single node
• Rule 1: The root may have as few
as 0 or 1 elements; all other nodes
have at least MINIMUM elements
B-Trees
• Rule 2: The maximum number of
elements in a node is twice the
value of MINIMUM
• Rule 3: Elements in a node are
stored in a partially-filled array,
sorted from smallest (element 0)
to largest (final position used)
B-Trees
• Rule 4: The number of subtrees
below a non-leaf node is always
one more than the number of
elements in the node
B-Trees
• Rule 5: For any non-leaf node:
– The element at index I is greater than
all the elements in subtree number I
of the node
– An element at index I is less than all
the elements in subtree (I + 1) of the
node
B-Trees
[Figure: a node holding 93 and 107, with three subtrees below it]
• Subtree Number 0: each element is less
than 93.
• Subtree Number 1: each element is between
93 & 107.
• Subtree Number 2: each element is greater
than 107.
B-Trees
• Rule 6: Every leaf in a B-Tree has
the same depth
• The implication is that B-Trees are
always balanced.
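The six rules can be checked mechanically. The following is a rough sketch of such a checker for the set (no duplicates) case; BNode and BTreeRules are hypothetical names, and the node uses lists instead of the partially-filled arrays of Rule 3, purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration; not the lecture's IntBalancedSet.
class BNode {
    final List<Integer> data = new ArrayList<>();
    final List<BNode> children = new ArrayList<>();

    BNode(int... elements) {
        for (int e : elements) data.add(e);
    }

    // Attaches subtrees in left-to-right order and returns this node.
    BNode with(BNode... kids) {
        for (BNode k : kids) children.add(k);
        return this;
    }
}

class BTreeRules {
    // Returns the depth of the leaves below node if all rules hold, or -1 otherwise.
    // lo and hi are exclusive bounds inherited from ancestors (null = unbounded).
    static int check(BNode node, int minimum, boolean isRoot, Integer lo, Integer hi) {
        int count = node.data.size();
        if (!isRoot && count < minimum) return -1;           // Rule 1
        if (count > 2 * minimum) return -1;                  // Rule 2
        for (int i = 0; i < count; i++) {
            int e = node.data.get(i);
            if (i > 0 && node.data.get(i - 1) >= e) return -1; // Rule 3: sorted, no duplicates
            if (lo != null && e <= lo) return -1;            // Rule 5, via ancestor bounds
            if (hi != null && e >= hi) return -1;
        }
        if (node.children.isEmpty()) return 0;               // leaf
        if (node.children.size() != count + 1) return -1;    // Rule 4
        int leafDepth = -2;                                  // -2 = not yet known
        for (int i = 0; i <= count; i++) {
            Integer childLo = (i == 0) ? lo : node.data.get(i - 1);
            Integer childHi = (i == count) ? hi : node.data.get(i);
            int d = check(node.children.get(i), minimum, false, childLo, childHi);
            if (d < 0) return -1;
            if (leafDepth == -2) leafDepth = d;
            else if (leafDepth != d) return -1;              // Rule 6: same leaf depth
        }
        return leafDepth + 1;
    }

    static boolean isValid(BNode root, int minimum) {
        return check(root, minimum, true, null, null) >= 0;
    }

    public static void main(String[] args) {
        // A small tree with MINIMUM = 1, chosen only for illustration.
        BNode example = new BNode(6).with(
            new BNode(2, 4).with(new BNode(1), new BNode(3), new BNode(5)),
            new BNode(9).with(new BNode(7, 8), new BNode(10)));
        System.out.println(isValid(example, 1));
    }
}
```

Passing the allowed range down to each subtree is one way to enforce Rule 5 for every descendant rather than only for direct children.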
B-Tree Example
MINIMUM = 1
               [6]
       [2 and 4]       [9]
    [1]  [3]  [5]   [7 and 8]  [10]
NOTE: Every child of the root node
is also a B-Tree!
Set ADT with B-Trees
public class IntBalancedSet {
    // constants
    private static final int MINIMUM = 200;
    private static final int MAXIMUM = 2 * MINIMUM;

    // info about root node
    int dataCount;                      // number of elements in use in data
    int[] data = new int[MAXIMUM + 1];
    int childCount;                     // number of subtrees in use in subset

    // info about children
    IntBalancedSet[] subset = new IntBalancedSet[MAXIMUM + 2];
    …
}
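MINIMUM = 1
MAXIMUM = 2
[Figure: B-Tree with root 6, children "2 and 4" and 9, and leaves 1, 3, 5, "7 and 8", 10]
Instance variables of the root node:
dataCount: 1
data: [6, ?, ?]
childCount: 2
subset: [reference, reference, null, null]
(the references point to IntBalancedSet instances)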
Invariant for Set B-Tree
• The elements of the set are stored
in a B-Tree, satisfying the 6 rules
• The number of elements in the
root is stored in the instance
variable dataCount, and the
number of subtrees is stored in the
instance variable childCount.
Invariant for Set B-Tree
• The root’s elements are stored in
data[0] through
data[dataCount - 1] .
• If the root has subtrees, then
subset[0] through
subset[childCount - 1] are
references to those subtrees.
Searching a B-Tree
• Sets use the method contains to
find if an element is in the set:
– Set I equal to the first index where
data[I] >= target; if there is no
such index, set I = dataCount
– If I < dataCount and data[I] == target, return true;
else if (no children) return false;
else
return subset[I].contains(target);
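The search steps above can be sketched in Java as follows. BTreeSetSketch is a simplified, hypothetical stand-in for IntBalancedSet: it keeps the same four instance variables but sizes its arrays exactly, and its constructor and example() helper exist only to build a small test tree with MINIMUM = 1.

```java
// Sketch only: a cut-down node type with the same fields as the lecture's class.
class BTreeSetSketch {
    int dataCount;
    int[] data;
    int childCount;
    BTreeSetSketch[] subset;

    // Hypothetical convenience constructor: elements must already be sorted.
    BTreeSetSketch(int[] elements, BTreeSetSketch... children) {
        data = elements.clone();
        dataCount = elements.length;
        subset = children;
        childCount = children.length;
    }

    boolean contains(int target) {
        // Find the first index i with data[i] >= target, or dataCount if none exists.
        int i = 0;
        while (i < dataCount && data[i] < target) {
            i++;
        }
        if (i < dataCount && data[i] == target) {
            return true;                      // found in this node
        }
        if (childCount == 0) {
            return false;                     // leaf: nowhere left to look
        }
        return subset[i].contains(target);    // descend into subtree i
    }

    // Builds a tree with root 6, children {2, 4} and {9}, leaves 1, 3, 5, {7, 8}, 10.
    static BTreeSetSketch example() {
        BTreeSetSketch left = new BTreeSetSketch(new int[] {2, 4},
            new BTreeSetSketch(new int[] {1}),
            new BTreeSetSketch(new int[] {3}),
            new BTreeSetSketch(new int[] {5}));
        BTreeSetSketch right = new BTreeSetSketch(new int[] {9},
            new BTreeSetSketch(new int[] {7, 8}),
            new BTreeSetSketch(new int[] {10}));
        return new BTreeSetSketch(new int[] {6}, left, right);
    }

    public static void main(String[] args) {
        BTreeSetSketch set = example();
        System.out.println(set.contains(7));
        System.out.println(set.contains(11));
    }
}
```

The guard `i < dataCount` matters: when every element in the node is smaller than the target, index dataCount is past the elements in use and must not be compared.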
Sample Search
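contains(7);
[Figure: B-Tree with root 6, children "2 and 4" and 9, and leaves 1, 3, 5, "7 and 8", 10]
• At the root (6): 7 > 6, so
I = dataCount = 1
subset[1].contains(7);
• At node 9: 9 >= 7, so
I = 0; data[I] != 7
subset[0].contains(7);
• At node "7 and 8": 7 >= 7, so
I = 0; data[I] = 7!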
Add/Remove from B-Tree
• Complex two-pass operations
• pp. 500-512
• Covered on next slide set for 2-3
trees
Trees, Logs, Time Analysis
• Heaps and B-Trees are efficient
because d is kept small
• How can we relate the depth of a
tree and the worst-case time
required to search, add, and
remove an element?
Trees, Logs, Time Analysis
• The worst-case times for the
following operations are all
O(d):
– Adding an element to a binary search
tree, heap, or B-Tree
– Removing an element from a binary
search tree, heap or B-Tree
– Searching for a specified element in a
binary search tree or B-Tree
Trees, Logs, Time Analysis
• How can we relate the depth d to
the number of elements n?
• Example: binary trees
– d is no more than n - 1
– O(d) is therefore O(n - 1) = O(n)
(remember, we can ignore constants)
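The bound d = n - 1 is actually reached when keys arrive in sorted order. A small sketch, assuming a plain unbalanced binary search tree (the class and method names here are illustrative, not from the lecture):

```java
// Sketch only: depth of an unbalanced binary search tree after a sequence of inserts.
class BstDepthSketch {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Standard unbalanced BST insertion, done iteratively.
    static Node insert(Node root, int key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = fresh; break; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; break; }
                cur = cur.right;
            }
        }
        return root;
    }

    // Depth = number of edges on the longest root-to-leaf path; an empty tree is -1.
    static int depth(Node root) {
        if (root == null) return -1;
        return 1 + Math.max(depth(root.left), depth(root.right));
    }

    static int depthAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return depth(root);
    }

    public static void main(String[] args) {
        System.out.println(depthAfterInserting(1, 2, 3, 4, 5, 6)); // sorted input
        System.out.println(depthAfterInserting(4, 2, 6, 1, 3, 5)); // mixed input
    }
}
```

The same six keys give depth 5 when inserted in sorted order and depth 2 in the mixed order shown, which is why the worst case for a plain binary search tree is O(n).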
Time Analysis for Heaps
• Heaps
Level    Nodes to Fill
0        1
1        2
2        4
3        8
…        …
d        2^d
Time Analysis for Heaps
• Minimum nodes to reach depth d
in a heap:
$(1 + 2 + 4 + \dots + 2^{d-1}) + 1 = 2^d$
• The number of nodes in a heap is
at least $2^d$
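One way to see why the sum comes out to exactly a power of two is the usual doubling argument; the symbol S below is introduced here only for the derivation. Levels 0 through d - 1 must be full, and one more node is needed to start level d.

```latex
\begin{align*}
S      &= 1 + 2 + 4 + \dots + 2^{d-1} \\
2S     &= 2 + 4 + \dots + 2^{d-1} + 2^{d} \\
2S - S &= 2^{d} - 1 \\
S + 1  &= 2^{d}
\end{align*}
```

For example, with d = 3 the full levels hold 1 + 2 + 4 = 7 nodes, and the eighth node is the first one at depth 3.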
Review Base-2 Logarithms
• For any positive number x, the
base 2 logarithm of x is an
exponent r such that:
$2^r = x$
Review Base-2 Logarithms
$2^0 = 1$    $\log_2 1 = 0$
$2^1 = 2$    $\log_2 2 = 1$
$2^2 = 4$    $\log_2 4 = 2$
...
$2^d = 2^d$    $\log_2 2^d = d$
Worst-Case For Heaps
• In a heap the number of elements
n is at least 2^d
$\log_2 n \ge \log_2 2^d$
$\log_2 2^d = d$
$\log_2 n \ge d$
Worst-Case For Heaps
• Adding or removing an element in
a heap with n elements is O(d)
where d is the depth of the tree.
Because d is no more than log2(n),
the operations are O(log2(n)),
which is O(log(n)).
• (see discussion p. 516-520)
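As a rough check of the O(log(n)) claim, the sketch below counts the swaps made while adding to an array-backed heap. It assumes a max-heap with upward reheapification; the class and its methods are hypothetical and are not the code from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a max-heap whose add method reports how many swaps it performed.
class HeapAddSketch {
    private final List<Integer> items = new ArrayList<>();

    // Adds key at the end, then swaps it upward while it is larger than its parent.
    int add(int key) {
        items.add(key);
        int i = items.size() - 1;
        int swaps = 0;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (items.get(parent) >= items.get(i)) break; // heap order restored
            int tmp = items.get(parent);
            items.set(parent, items.get(i));
            items.set(i, tmp);
            i = parent;
            swaps++;
        }
        return swaps;
    }

    int size() {
        return items.size();
    }

    // floor(log2(n)) for n >= 1.
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    // Adds 1..n in increasing order, so each new key climbs all the way to the root,
    // and returns the largest swap count seen for any single add.
    static int worstSwaps(int n) {
        HeapAddSketch heap = new HeapAddSketch();
        int worst = 0;
        for (int key = 1; key <= n; key++) {
            worst = Math.max(worst, heap.add(key));
        }
        return worst;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("worst swaps = " + worstSwaps(n)
            + ", floor(log2(n)) = " + floorLog2(n));
    }
}
```

Increasing keys are the worst case for this max-heap because every new key is larger than everything already stored, yet no single add needs more swaps than the depth of the tree.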
Many Databases use B+ Trees
[Figure: B+ tree diagram, from Wikipedia]