B-tree and B+ tree

B-Tree
B-Trees
 A B-tree is a specialized multi-way tree designed especially for use
on disk
 In a B-tree each node may contain a large number of
keys. The number of subtrees of each node, then, may
also be large
 A B-tree is designed to branch out in this large number
of directions and to contain a lot of keys in each node
so that the height of the tree is relatively small
Definitions
 A B-tree of order m (the maximum number of children for each node) is
a tree which satisfies the following properties:
1. Every node has at most m children.
2. Every node (except root and leaves) has at least ceil(m / 2) children.
3. The root has at least two children if it is not a leaf node.
4. All leaves appear in the same level, and carry information.
5. A non-leaf node with k children contains k - 1 keys.
6. Each leaf node (other than the root node if it is a leaf) must contain at
least ceil(m / 2) - 1 keys
7. Keys and subtrees are arranged in the fashion of search tree
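The numeric limits in properties 1, 2, 5 and 6 all follow from the order m. A minimal sketch in Java; the class name, the field names, and the requirement m >= 3 are illustrative assumptions, not part of the slides:

```java
// Hypothetical helper: derives the per-node limits from the order m.
final class BTreeLimits {
    final int maxChildren;   // property 1: at most m children
    final int minChildren;   // property 2: at least ceil(m/2) children (non-root internal nodes)
    final int maxKeys;       // property 5: a node with k children holds k - 1 keys
    final int minKeys;       // property 6: at least ceil(m/2) - 1 keys (non-root nodes)

    BTreeLimits(int m) {
        // Assumption: orders below 3 are rejected because they do not give a useful tree.
        if (m < 3) throw new IllegalArgumentException("order must be at least 3");
        maxChildren = m;
        minChildren = (m + 1) / 2;   // integer form of ceil(m/2)
        maxKeys = m - 1;
        minKeys = minChildren - 1;
    }
}
```

For order 5 this gives at most 4 keys and at least 2 keys per non-root node.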
Example--A B-tree of order 5
B-Tree -- Search
 Search is performed in the typical manner, analogous to
that in a binary search tree. Starting at the root, the
tree is traversed top to bottom, choosing the child
pointer whose separation values lie on either side of
the value being searched for.
 Binary search is typically (but not necessarily) used
within nodes to find the separation values and child tree
of interest.
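The search described above can be sketched in Java as follows. The node layout (a sorted key list plus a child list that is empty for leaves) and all names are assumptions made for illustration; binary search inside a node is done with the standard Collections.binarySearch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative node layout (an assumption): sorted keys plus, for internal
// nodes, one more child than keys; leaves have an empty child list.
final class SearchNode {
    final List<Integer> keys = new ArrayList<>();
    final List<SearchNode> children = new ArrayList<>();

    boolean isLeaf() { return children.isEmpty(); }
}

final class BTreeSearch {
    // Returns true if key occurs anywhere in the subtree rooted at node.
    static boolean contains(SearchNode node, int key) {
        while (node != null) {
            // Binary search inside the node: a non-negative result is a hit,
            // a negative result encodes the insertion point as -(point) - 1.
            int pos = Collections.binarySearch(node.keys, key);
            if (pos >= 0) return true;
            if (node.isLeaf()) return false;
            // The insertion point is also the index of the child whose
            // separators lie on either side of the key.
            node = node.children.get(-pos - 1);
        }
        return false;
    }
}
```

Each loop iteration touches one node, so the number of node accesses is bounded by the height of the tree.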
B-Tree Insertion
 When inserting an item, first do a search for it in the B-tree. If the item
is not already in the B-tree, this unsuccessful search will end at a leaf.

If there is room in this leaf, just insert the new item here. Note that
this may require that some existing keys be moved one to the right to
make room for the new item.
 If instead this leaf node is full so that there is no room to add the new
item, then the node must be "split" with about half of the keys going into
a new node to the right of this one. The median (middle) key is moved
up into the parent node. (Of course, if that node has no room, then it
may have to be split as well.) Note that when adding to an internal
node, not only might we have to move some keys one position to the
right, but the associated pointers have to be moved right as well.
 If the root node is ever split, the median key moves up into a new root
node, thus causing the tree to increase in height by one.
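The insertion steps above can be sketched in Java as follows. This is one possible arrangement, not necessarily the one the slides had in mind: the new key is inserted first, and a node is split only once it holds one key too many. All class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of B-tree insertion with node splitting (names are illustrative).
final class BTreeInsert {
    static final class Node {
        final List<Character> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();   // empty for a leaf
    }

    // Result of splitting a node: the median key and the new right-hand node.
    static final class Split {
        final char median;
        final Node right;
        Split(char median, Node right) { this.median = median; this.right = right; }
    }

    final int order;          // maximum number of children per node
    Node root = new Node();

    BTreeInsert(int order) { this.order = order; }

    void insert(char key) {
        Split split = insert(root, key);
        if (split != null) {                 // the root itself was split:
            Node newRoot = new Node();       // the tree grows in height by one
            newRoot.keys.add(split.median);
            newRoot.children.add(root);
            newRoot.children.add(split.right);
            root = newRoot;
        }
    }

    private Split insert(Node node, char key) {
        int pos = Collections.binarySearch(node.keys, key);
        if (pos >= 0) return null;           // already present: nothing to do
        int index = -pos - 1;
        if (node.children.isEmpty()) {
            node.keys.add(index, key);       // shifts larger keys one to the right
        } else {
            Split childSplit = insert(node.children.get(index), key);
            if (childSplit == null) return null;
            node.keys.add(index, childSplit.median);          // keys and pointers
            node.children.add(index + 1, childSplit.right);   // both shift right
        }
        return node.keys.size() < order ? null : split(node);
    }

    // Splits an overfull node (it holds `order` keys) around its median key.
    private Split split(Node node) {
        int mid = node.keys.size() / 2;
        Node right = new Node();
        right.keys.addAll(node.keys.subList(mid + 1, node.keys.size()));
        char median = node.keys.get(mid);
        node.keys.subList(mid, node.keys.size()).clear();
        if (!node.children.isEmpty()) {
            right.children.addAll(node.children.subList(mid + 1, node.children.size()));
            node.children.subList(mid + 1, node.children.size()).clear();
        }
        return new Split(median, right);
    }
}
```

Returning the split result to the caller lets a split propagate upward without parent pointers; a split that reaches the top is the only way the tree gains a level.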
Insertion Example
 Insert the following letters into what is originally an
empty B-tree of order 5: C N G A H E K Q M F W L T Z D
P R X Y S
 Order 5 means that a node can have a maximum of 5
children and 4 keys. All nodes other than the root must
have a minimum of 2 keys.
Insertion Example -- continued
 The first 4 letters get inserted into the same node,
which then holds A C G N
 Insert H (no room in above node, split it into 2 nodes,
move median G up into a new root node)
Insertion Example -- continued
 Insert E, K, and Q
 Insert M (split the node, M is median, move up)
Insertion Example -- continued
 Insert F, W, L and T
 Insert Z (Split, move median T up)
Insertion Example -- continued
 Insert D (Split, move median D up), then insert P, R, X, Y
 Insert S (Split, move median Q up, Split, move median M
up)
B-Tree Deletion
 Locate and delete the item, then restructure the tree to
regain its invariants
 There are two special cases to consider when deleting
an element:
1. the element in an internal node may be a separator
for its child nodes
2. deleting an element may put its node under the minimum
number of elements and children
 Search for the value to delete
 If the value is in an internal node, choose a new
separator (either the largest element in the left subtree
or the smallest element in the right subtree), remove it
from the leaf node it is in, and replace the element to
be deleted with the new separator (the leaf that gave up
an element is then handled as in the leaf case below)
 If the value is in a leaf node, it can simply be deleted
from the node, perhaps leaving the node with too few
elements; so some additional changes to the tree will
be required
Additional changes -- Rebalancing after deletion
 If the right sibling has more than the minimum number of
elements
 Borrow one, adjust the separator
 Otherwise, if the left sibling has more than the minimum
number of elements
 Borrow one, adjust the separator
 If both immediate siblings have only the minimum number of
elements
* Create a new node with all the elements from the deficient node, all the
elements from one of its siblings, and the separator in the parent
between the two combined sibling nodes.
* Remove the separator from the parent, and replace the two children it
separated with the combined node.
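* If that brings the number of elements in the parent under the minimum,
repeat these steps with that deficient node, unless it is the root: the root
is allowed to hold fewer than the minimum number of elements. If the root is
left with no elements at all, the combined node becomes the new root.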
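The three rebalancing cases can be sketched in Java as follows. The node layout and every name here are illustrative assumptions. One deliberate simplification: instead of allocating a fresh node for the combine step, the sketch merges into the left-hand node of the pair, which produces the same key arrangement.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rebalancing step after a deletion (names are illustrative).
final class BTreeRebalance {
    static final class Node {
        final List<Character> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();   // empty for a leaf
    }

    // Repairs parent.children.get(i), which holds fewer than minKeys keys.
    // Returns true if the parent lost a key and may now be deficient itself.
    static boolean rebalance(Node parent, int i, int minKeys) {
        Node node = parent.children.get(i);
        Node right = i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        Node left = i > 0 ? parent.children.get(i - 1) : null;

        if (right != null && right.keys.size() > minKeys) {
            // Borrow from the right: separator moves down, right's smallest key moves up.
            node.keys.add(parent.keys.get(i));
            parent.keys.set(i, right.keys.remove(0));
            if (!right.children.isEmpty()) node.children.add(right.children.remove(0));
            return false;
        }
        if (left != null && left.keys.size() > minKeys) {
            // Borrow from the left: separator moves down, left's largest key moves up.
            node.keys.add(0, parent.keys.get(i - 1));
            parent.keys.set(i - 1, left.keys.remove(left.keys.size() - 1));
            if (!left.children.isEmpty()) {
                node.children.add(0, left.children.remove(left.children.size() - 1));
            }
            return false;
        }
        // Combine: left node + separator + right node collapse into one node.
        int sep = left != null ? i - 1 : i;      // index of the separator in the parent
        Node keep = parent.children.get(sep);
        Node gone = parent.children.remove(sep + 1);
        keep.keys.add(parent.keys.remove(sep));
        keep.keys.addAll(gone.keys);
        keep.children.addAll(gone.children);
        return true;
    }
}
```

A caller would invoke this again one level up whenever it returns true, and replace the root by its single remaining child if the root ends up with no keys.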
B-Tree Deletion Example
Delete H
Deletion Example -- Continued
 Delete T (internal node, select the smallest element
from the right subtree to replace T)
Deletion Example -- Continued
 Delete R (leaf node, needs rebalancing after the deletion)
 Borrow a key from the right sibling, adjust the separator: move W
down, combine with S, move X up to the parent
Deletion Example -- Continued
 Delete E (leaf node, needs rebalancing after the deletion)
 Left and right siblings have only the minimum number of keys
 Create a new node: combine the left sibling, the
separator from the parent, and the deficient node
Deletion Example -- Continued
 Continue rebalancing
 The sibling has only minimum keys
 Create a new node: combine the deficient node with the
separator from the parent, and the right sibling
2-3 B-trees, or simply 2-3 trees
Properties
• ternary tree - 3 or fewer children per node
• each node is either a 2-node or 3-node (subtree count)
• 2-nodes contain 1 value and 3-nodes contain 2 sorted values
• BST property holds for node content & left, mid, right subtrees
• all leaves have same level
B-Tree
 A B-tree is kept balanced by requiring that all external nodes are
at the same depth. This depth will increase slowly as elements
are added to the tree, but an increase in the overall depth is
infrequent, and results in all leaf nodes being one more node
further away from the root.
 B-trees have substantial advantages over alternative
implementations when node access times far exceed access
times within nodes. This usually occurs when most nodes are in
secondary storage such as hard drives. By maximizing the
number of child nodes within each internal node, the height of
the tree decreases, balancing occurs less often, and efficiency
increases. Usually this value is set such that each node takes up
a full disk block or an analogous size in secondary storage.
 2-3 B-trees: useful in main memory
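To make the height argument concrete, the sketch below computes the largest number of levels a B-tree of order m can have when it stores n keys. It relies only on the minimum-occupancy rules: with t = ceil(m / 2), a tree with L levels holds at least 2 * t^(L-1) - 1 keys. The class and method names are illustrative assumptions.

```java
// Worst-case number of levels for n keys in a B-tree of order m, derived
// from the minimum occupancy: the root holds at least 1 key and every
// other node at least ceil(m/2) - 1 keys.
final class BTreeHeight {
    // Assumes n is far below Long.MAX_VALUE so the arithmetic cannot overflow.
    static int maxLevels(long n, int m) {
        if (n < 1 || m < 3) throw new IllegalArgumentException("need n >= 1 and m >= 3");
        int t = (m + 1) / 2;          // minimum children of a non-root internal node
        long minKeys = 1;             // fewest keys a tree with `levels` levels can hold
        long nodesOnNextLevel = 2;    // fewest nodes on the level below the current last one
        int levels = 1;
        while (true) {
            long next = minKeys + nodesOnNextLevel * (t - 1);
            if (next > n) return levels;   // one more level would need more than n keys
            minKeys = next;
            nodesOnNextLevel *= t;
            levels++;
        }
    }
}
```

With one million keys, maxLevels(1_000_000L, 100) is 4, while maxLevels(1_000_000L, 3) is 19, which illustrates why a large fan-out pays off when every level costs a disk access.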
2-3 Tree Implementation
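public class TwoThreeTree<Content> {
    private boolean is2node;                  // true for a 2-node, false for a 3-node
    private Content smallContent;             // only value of a 2-node, smaller value of a 3-node
    private Content bigContent;               // larger value, used by 3-nodes only
    private TwoThreeTree<Content> left;
    private TwoThreeTree<Content> mid;        // middle subtree, used by 3-nodes only
    private TwoThreeTree<Content> right;
    private TwoThreeTree<Content> parent;
    ...
}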
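A lookup on a node shaped like the class above might look like the following sketch. The field names follow the slide; the class name TwoThreeNode, the two constructors, the Comparable bound, and the omission of the parent link are assumptions made to keep the example self-contained.

```java
// Lookup sketch for a 2-3 tree node (constructors and Comparable bound are assumptions).
final class TwoThreeNode<Content extends Comparable<Content>> {
    boolean is2node;
    Content smallContent;                      // only value of a 2-node, smaller value of a 3-node
    Content bigContent;                        // larger value, used by 3-nodes only
    TwoThreeNode<Content> left, mid, right;    // mid is used by 3-nodes only

    TwoThreeNode(Content only) { is2node = true; smallContent = only; }

    TwoThreeNode(Content small, Content big) {
        is2node = false;
        smallContent = small;
        bigContent = big;
    }

    // Walks down from this node using the search-tree ordering of the values.
    boolean contains(Content target) {
        TwoThreeNode<Content> node = this;
        while (node != null) {
            int c = target.compareTo(node.smallContent);
            if (c == 0) return true;
            if (c < 0) { node = node.left; continue; }
            if (node.is2node) { node = node.right; continue; }
            c = target.compareTo(node.bigContent);
            if (c == 0) return true;
            node = c < 0 ? node.mid : node.right;
        }
        return false;
    }
}
```

A 2-node behaves like a binary search tree node; a 3-node adds one extra comparison to choose between the middle and right subtrees.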
B+-Tree
Ways to improve a B-tree
• keep all values in the leaves
• form a linked list of leaf nodes
How do these modifications change the performance of
...a search?
...an insertion or removal?
B+ Tree
 The B+ tree is a variant of the B-tree in which all records are
stored at the leaf level of the tree; only keys are stored
in interior nodes.
 A B-tree can store both keys and records in its interior
nodes; in this sense, the B+ tree is a specialization of
the B-tree.
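Because every record sits in a leaf and the leaves form a linked list, a range query needs only one root-to-leaf search followed by a walk along the leaves. A minimal Java sketch of that walk; the leaf layout, the String record type, and all names are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a range scan over linked B+ tree leaves (names are illustrative).
final class BPlusLeaf {
    final List<Integer> keys = new ArrayList<>();     // sorted keys in this leaf
    final List<String> records = new ArrayList<>();   // records.get(i) belongs to keys.get(i)
    BPlusLeaf next;                                   // right neighbour, or null at the end

    // Collects the records with lo <= key <= hi, starting from the leaf that
    // an ordinary root-to-leaf search for lo would reach.
    static List<String> rangeScan(BPlusLeaf start, int lo, int hi) {
        List<String> out = new ArrayList<>();
        for (BPlusLeaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int i = 0; i < leaf.keys.size(); i++) {
                int key = leaf.keys.get(i);
                if (key > hi) return out;      // keys are sorted across leaves: stop early
                if (key >= lo) out.add(leaf.records.get(i));
            }
        }
        return out;
    }
}
```

In a plain B-tree the same query would require an in-order traversal that moves up and down between levels, since records are spread over interior nodes as well.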