Optimal Binary Search Tree
• We now focus on the construction of binary
search trees for a static set of identifiers,
on which only searches are performed.
• To find an optimal binary search tree for a
given static file, a cost measure must be
determined for search trees.
• It’s reasonable to use the level number of
a node as the cost.
Binary Search Tree
Example
[Figure: two binary search trees over the identifiers
do, for, if, return, while — the first needs 4 comparisons
in the worst case, the second only 3.]
Extended Binary Tree
Example
[Figure: the two search trees, (a) and (b), extended with
external (failure) nodes.]
External Path Length and
Internal Path Length
• External path length of a binary tree is
the sum over all external nodes of the
lengths of the paths from the root to
those nodes.
• Internal path length is the sum over all
internal nodes of the lengths of the paths
from the root to those nodes.
• Let the internal path length be I and
external path length E, then the binary
tree of (a) has I = 0+1+1+2+3 = 7, E =
2+2+4+4+3+2 = 17.
External Path Length and
Internal Path Length (Cont.)
• It can be shown that E = I + 2n.
• Binary trees with maximum E also have
maximum I.
• For all binary trees with n internal nodes,
– maximum I = Σ_{0 ≤ i ≤ n-1} i = n(n-1)/2 (skew tree)
– minimum I = Σ_{1 ≤ i ≤ n} ⌊log₂ i⌋ (complete binary tree)
Binary Search Tree
Containing A Symbol Table
• Let’s look at the problem of representing a
symbol table as a binary search tree. If a binary
search tree contains the identifiers a1, a2, …, an
with a1 < a2 < … < an, and the probability of
searching for each ai is pi, then the total cost of
any binary search tree is
    Σ_{1 ≤ i ≤ n} p_i · level(a_i)
when only successful searches are made.
Binary Search Tree
Containing A Symbol Table
• For unsuccessful searches, let us partition the identifiers
not in the binary search tree into n+1 classes Ei, 0 ≤ i ≤ n. If qi
is the probability that the identifier being sought is in Ei,
then the cost of the failure nodes is
    Σ_{0 ≤ i ≤ n} q_i · (level(failure node i) - 1)
• Therefore, the total cost of a binary search tree is
    Σ_{1 ≤ i ≤ n} p_i · level(a_i) + Σ_{0 ≤ i ≤ n} q_i · (level(failure node i) - 1)
• An optimal binary search tree for the identifier set a1, …, an
is one that minimizes the above cost over all possible
binary search trees for this identifier set. Since all searches
must terminate either successfully or unsuccessfully, we
have
    Σ_{1 ≤ i ≤ n} p_i + Σ_{0 ≤ i ≤ n} q_i = 1
Binary Search Tree With
Three Identifiers Example
[Figure: the five possible binary search trees (a)–(e)
over the identifiers do, if, while.]
Cost of Binary Search Tree
In The Example
• With equal probabilities, pi = qj = 1/7 for all i and j, we have
cost(tree a) = 15/7; cost(tree b) = 13/7
cost(tree c) = 15/7; cost(tree d) = 15/7
cost(tree e) = 15/7
Tree b is optimal.
• With p1=0.5, p2=0.1, p3=0.05, q0=0.15, q1=0.1, q2=0.05, and
q3=0.05 we have
cost(tree a) = 2.65; cost(tree b) = 1.9
cost(tree c) = 1.5; cost(tree d) = 2.05
cost(tree e) = 1.6
Tree c is optimal.
Determine Optimal Binary
Search Tree
• To determine the optimal binary search tree, it is
not practical to follow a brute force approach that
enumerates all trees, since its complexity is
O(n·4^n / n^(3/2)).
• Now let’s take another approach. Let Tij denote an optimal
binary search tree for ai+1, …, aj, i<j. Let cij be the cost of
the search tree Tij. Let rij be the root of Tij and let wij be
the weight of Tij, where
    w_ij = q_i + Σ_{k=i+1}^{j} (q_k + p_k)
• Therefore, by definition rii = 0, wii = qi, 0 ≤ i ≤ n. T0n is an
optimal binary search tree for a1, …, an. Its cost is
c0n, its weight w0n, and its root r0n.
Determine Optimal Binary
Search Tree (Cont.)
• If Tij is an optimal binary search tree for ai+1, …,
aj, and rij = k, then i < k ≤ j. Tij has two subtrees L
and R. L contains ai+1, …, ak-1, and R contains ak+1, …,
aj. So the cost cij of Tij is
    cij = pk + cost(L) + cost(R) + weight(L) + weight(R)
        = pk + ci,k-1 + ckj + wi,k-1 + wkj
        = wij + ci,k-1 + ckj
• Since Tij is optimal, we have
    wij + ci,k-1 + ckj = min_{i<l≤j} { wij + ci,l-1 + clj }
    ci,k-1 + ckj = min_{i<l≤j} { ci,l-1 + clj }
Example 10.2
• Let n=4, (a1, a2, a3, a4) = (do, if, return, while). Let (p1, p2,
p3, p4) = (3,3,1,1) and (q0, q1, q2, q3, q4) = (2,3,1,1,1). wii = qi,
cii = 0, and rii = 0, 0 ≤ i ≤ 4.
w01 = p1 + w00 + w11 = p1 +q1 +w00 = 8
c01 = w01 + min{c00 +c11} = 8
r01 = 1
w12 = p2 + w11 + w22 = p2 +q2 +w11 = 7
c12 = w12 + min{c11 +c22} = 7
r12 = 2
w23 = p3 + w22 + w33 = p3 +q3 +w22 = 3
c23 = w23 + min{c22 +c33} = 3
r23 = 3
w34 = p4 + w33 + w44 = p4 +q4 +w33 = 3
c34 = w34 + min{c33 +c44} = 3
r34 = 4
Example 10.2 Computation

j-i │ i=0                 i=1                 i=2                i=3                 i=4
────┼────────────────────────────────────────────────────────────────────────────────────────
 0  │ w00=2,c00=0,r00=0   w11=3,c11=0,r11=0   w22=1,c22=0,r22=0  w33=1,c33=0,r33=0   w44=1,c44=0,r44=0
 1  │ w01=8,c01=8,r01=1   w12=7,c12=7,r12=2   w23=3,c23=3,r23=3  w34=3,c34=3,r34=4
 2  │ w02=12,c02=19,r02=1 w13=9,c13=12,r13=2  w24=5,c24=8,r24=3
 3  │ w03=14,c03=25,r03=2 w14=11,c14=19,r14=2
 4  │ w04=16,c04=32,r04=2
Computation Complexity of
Optimal Binary Search Tree
• To evaluate the optimal binary search tree we need to
compute cij for j-i = 1, 2, …, n, in that order. When j-i = m,
there are n-m+1 cij's to compute.
• Each such cij can be computed in time O(m).
• The total time for all cij's with j-i = m is therefore
O(nm - m²). The total time to evaluate all the cij's and
rij's is
    Σ_{1 ≤ m ≤ n} (nm - m²) = O(n³)
• The computing complexity can be reduced to O(n2) by
limiting the search of the optimal l to the range of ri,j-1 ≤ l
≤ ri+1,j according to D. E. Knuth.
AVL Trees
• Dynamic tables may also be
maintained as binary search trees.
• Depending on the order in which symbols
are put into the table, the resulting
binary search trees differ, and so does
the average number of comparisons needed
to access a symbol.
Binary Search Tree for The
Months of The Year
Input Sequence: JAN, FEB, MAR, APR, MAY, JUNE, JULY, AUG,
SEPT, OCT, NOV, DEC
[Figure: binary search tree built from this input sequence.]
Max comparisons: 6
Average comparisons: 3.5
A Balanced Binary Search Tree
For The Months of The Year
Input Sequence: JULY, FEB, MAY, AUG, DEC, MAR, OCT, APR, JAN,
JUNE, SEPT, NOV
Max comparisons: 4
Average comparisons: 3.1
[Figure: balanced binary search tree built from this input
sequence.]
Degenerate Binary Search
Tree
Input Sequence: APR, AUG, DEC, FEB, JAN,
JULY, JUNE, MAR, MAY, NOV, OCT, SEPT
Max comparisons: 12
Average comparisons: 6.5
[Figure: the degenerate (right-skewed) tree built from this
sorted input sequence.]
Minimize The Search Time of Binary
Search Tree In Dynamic Situation
• From the above three examples, we know that the
average and maximum search time will be
minimized if the binary search tree is maintained
as a complete binary search tree at all times.
• However, to achieve this in a dynamic situation,
we have to pay a high price to restructure the
tree to be a complete binary tree all the time.
• In 1962, Adelson-Velskii and Landis introduced a
binary tree structure that is balanced with
respect to the heights of subtrees. As a result of
the balanced nature of this type of tree, dynamic
retrievals can be performed in O(log n) time if
the tree has n nodes. The resulting tree remains
height-balanced. This is called an AVL tree.
AVL Tree
• Definition: An empty tree is height-balanced. If T
is a nonempty binary tree with TL and TR as its left
and right subtrees respectively, then T is height-
balanced iff
(1) TL and TR are height-balanced, and
(2) |hL – hR| ≤ 1, where hL and hR are the heights of
TL and TR, respectively.
• Definition: The balance factor, BF(T), of a node T
in a binary tree is defined to be hL – hR, where hL
and hR, respectively, are the heights of the left and
right subtrees of T. For any node T in an AVL tree,
BF(T) = -1, 0, or 1.
Balanced Trees Obtained for
The Months of The Year
[Figure: (a) insert MARCH; (b) insert MAY; (c) insert
NOVEMBER — RR rotation; (d) insert AUGUST.]
Balanced Trees Obtained for
The Months of The Year (Cont.)
[Figure: (e) insert APRIL — LL rotation; (f) insert
JANUARY — LR rotation.]
Balanced Trees Obtained for
The Months of The Year (Cont.)
[Figure: (g) insert DECEMBER; (h) insert JULY.]
Balanced Trees Obtained for
The Months of The Year (Cont.)
[Figure: (i) insert FEBRUARY — RL rotation.]
Balanced Trees Obtained for
The Months of The Year (Cont.)
[Figure: (j) insert JUNE — LR rotation.]
Balanced Trees Obtained for
The Months of The Year (Cont.)
[Figure: (k) insert OCTOBER — RR rotation.]
Balanced Trees Obtained for
The Months of The Year (Cont.)
[Figure: (l) insert SEPTEMBER.]
Rebalancing Rotation of
Binary Search Tree
• LL: new node Y is inserted in the left subtree of
the left subtree of A
• LR: Y is inserted in the right subtree of the left
subtree of A
• RR: Y is inserted in the right subtree of the right
subtree of A
• RL: Y is inserted in the left subtree of the right
subtree of A.
• If a height-balanced binary tree becomes
unbalanced as a result of an insertion, then these
are the only four cases possible for rebalancing.
Rebalancing Rotation LL
[Figure: LL rotation. Before: A (BF +2) with left child B
(BF +1) and subtrees BL, BR, AR; the height of BL has
increased to h+1. After: B becomes the subtree root with
left subtree BL and right child A, whose subtrees are BR
and AR.]
Rebalancing Rotation RR
[Figure: RR rotation, the mirror image of LL. Before: A
(BF -2) with right child B (BF -1); the height of BR has
increased to h+1. After: B becomes the subtree root with
left child A.]
Rebalancing Rotation LR(a)
[Figure: LR(a) rotation — C is the newly inserted node
itself. C becomes the subtree root with left child B and
right child A, all with balance factor 0.]
Rebalancing Rotation LR(b)
[Figure: LR(b) rotation — the new node is inserted into CL,
the left subtree of C. After the rotation C is the subtree
root with left child B (subtrees BL, CL) and right child A
(subtrees CR, AR).]
Rebalancing Rotation LR(c)
[Figure: LR(c) rotation — the new node is inserted into CR,
the right subtree of C. The result is symmetric to LR(b).]
AVL Trees (Cont.)
• Once rebalancing has been carried
out on the subtree in question,
examining the remaining tree is
unnecessary.
• Insertion into an ordinary binary search
tree with n nodes can take O(n) time in
the worst case, but for an AVL tree the
insertion time is O(log n).
AVL Insertion Complexity
• Let Nh be the minimum number of nodes in a
height-balanced tree of height h. In the worst
case, the height of one of the subtrees will be
h-1 and that of the other h-2. Both subtrees
must also be height balanced. Nh = Nh-1 + Nh-2 + 1,
and N0 = 0, N1 = 1, and N2 = 2.
• The recursive definition for Nh is similar to that
for the Fibonacci numbers: Fn = Fn-1 + Fn-2, F0 = 0, F1 = 1.
• It can be shown that Nh = Fh+2 – 1. Since Fr ≈ φ^r/√5
with φ = (1+√5)/2, we have Nh ≈ φ^(h+2)/√5 – 1. So the
worst-case insertion time for a height-balanced tree with n
nodes is O(log n).
Probability of Each Type of
Rebalancing Rotation
• Research has shown that a random
insertion requires no rebalancing, a
rebalancing rotation of type LL or RR,
or a rebalancing rotation of type LR
or RL, with probabilities 0.5349,
0.2327, and 0.2324, respectively.
Comparison of Various
Structures
Operation        │ Sequential List │ Linked List │ AVL Tree
─────────────────┼─────────────────┼─────────────┼─────────
Search for x     │ O(log n)        │ O(n)        │ O(log n)
Search for kth   │ O(1)            │ O(k)        │ O(log n)
item
Delete x         │ O(n)            │ O(1)¹       │ O(log n)
Delete kth item  │ O(n - k)        │ O(k)        │ O(log n)
Insert x         │ O(n)            │ O(1)²       │ O(log n)
Output in order  │ O(n)            │ O(n)        │ O(n)

1. Doubly linked list and position of x known.
2. Position for insertion known.
2-3 Trees
• If search trees of degree greater than 2 are used, we'll have
simpler insertion and deletion algorithms than those of AVL trees.
The algorithms' complexity is still O(log n).
• Definition: A 2-3 tree is a search tree that either is empty or
satisfies the following properties:
(1) Each internal node is a 2-node or a 3-node. A 2-node has one
element; a 3-node has two elements.
(2) Let LeftChild and MiddleChild denote the children of a 2-node.
Let dataL be the element in this node, and let dataL.key be its
key. All elements in the 2-3 subtree with root LeftChild have
key less than dataL.key, whereas all elements in the 2-3 subtree
with root MiddleChild have key greater than dataL.key.
(3) Let LeftChild, MiddleChild, and RightChild denote the children
of a 3-node. Let dataL and dataR be the two elements in this
node. Then, dataL.key < dataR.key; all keys in the 2-3 subtree
with root LeftChild are less than dataL.key; all keys in the 2-3
subtree with root MiddleChild are less than dataR.key and
greater than dataL.key; and all keys in the 2-3 subtree with
root RightChild are greater than dataR.key.
(4) All external nodes are at the same level.
2-3 Tree Example
[Figure: a 2-3 tree with root A holding 40, left child B
holding 10 and 20, and middle child C holding 80.]
The Height of A 2-3 Tree
• As with leftist trees, external nodes are
introduced only to make it easier to
define and talk about 2-3 trees. External
nodes are not physically represented
inside a computer.
• The number of elements in a 2-3 tree
with height h is between 2^h - 1 and 3^h - 1.
Hence, the height of a 2-3 tree with n
elements is between ⌈log₃(n+1)⌉ and ⌈log₂(n+1)⌉.
2-3 Tree Data Structure
template<class KeyType> class Two3;  // forward declaration

template<class KeyType>
class Two3Node {
friend class Two3<KeyType>;
private:
    Element<KeyType> dataL, dataR;
    Two3Node *LeftChild, *MiddleChild, *RightChild;
};

template<class KeyType>
class Two3 {
public:
    Two3(KeyType max, Two3Node<KeyType>* init = 0)
        : root(init), MAXKEY(max) {} // constructor
    Boolean Insert(const Element<KeyType>&);
    Boolean Delete(const Element<KeyType>&);
    Two3Node<KeyType>* Search(const Element<KeyType>&);
private:
    Two3Node<KeyType>* root;
    KeyType MAXKEY;
};
Searching A 2-3 Tree
• The search algorithm for binary search
tree can be easily extended to obtain the
search function of a 2-3 tree
(Two3::Search()).
• The search function calls a function
compare that compares a key x with the
keys in a given node p. It returns the value
1, 2, 3, or 4, depending on whether x is less
than the first key, between the first key
and the second key, greater than the
second key, or equal to one of the keys in
node p.
Searching Function of a 2-3
Tree
template <class KeyType>
Two3Node<KeyType>* Two3<KeyType>::Search(const Element<KeyType>& x)
// Search the 2-3 tree for an element x. If the element is not in the tree, then return 0.
// Otherwise, return a pointer to the node that contains this element.
{
    for (Two3Node<KeyType>* p = root; p;)
        switch (p->compare(x)) {
            case 1: p = p->LeftChild; break;
            case 2: p = p->MiddleChild; break;
            case 3: p = p->RightChild; break;
            case 4: return p; // x is one of the keys in p
        }
    return 0; // x is not in the tree
}
Insertion Into A 2-3 Tree
• First we use search function to search the 2-3
tree for the key that is to be inserted.
• If the key being searched is already in the tree,
then the insertion fails, as all keys in a 2-3 tree
are distinct. Otherwise, we will encounter a unique
leaf node U. The node U may be in two states:
– the node U only has one element: then the key can be
inserted in this node.
– the node U already contains two elements: A new node is
created. The newly created node will contain the
element with the largest key from among the two
elements initially in U and the element x. The element
with the smallest key will be in the original node, and the
element with the median key, together with a pointer to the
newly created node, will be inserted into the parent of U.
Insertion Into A 2-3 Tree
Example
[Figure: (a) after 70 is inserted, node C holds 70 and 80;
(b) after 30 is inserted, node B splits — B keeps 10, a new
node D holds 30, and the median 20 moves up into root A,
which now holds 20 and 40.]
Insertion of 60 Into Figure
10.15(b)
[Figure: inserting 60 splits C (70 80) into C (60) and
E (80); the median 70 moves up, splitting A (20 40) in turn.
A keeps 20 (children B with 10, D with 30), a new node F
holds 70 (children C, E), and the median 40 becomes the new
root G.]
Node Split
• From the above examples, we find
that each time an attempt is made to
add an element into a 3-node p, a new
node q is created. This is referred to
as a node split.
2-3 Tree Insertion Function
template <class KeyType>
Boolean Two3<KeyType>::Insert(const Element<KeyType>& y)
{
    Two3Node<KeyType>* p;
    Element<KeyType> x = y;
    if (x.key >= MAXKEY) return FALSE; // invalid key
    if (!root) { NewRoot(x, 0); return TRUE; }
    if (!(p = FindNode(x))) { InsertionError(); return FALSE; }
    for (Two3Node<KeyType>* a = 0;;)
        if (p->dataR.key == MAXKEY) { // p is a 2-node
            p->PutIn(x, a);
            return TRUE;
        }
        else { // p is a 3-node
            Two3Node<KeyType>* olda = a;
            a = new Two3Node<KeyType>;
            x = Split(p, x, olda, a);
            if (root == p) { // root has been split
                NewRoot(x, a);
                return TRUE;
            }
            else p = p->parent();
        }
}
Deletion From a 2-3 Tree
• If the element to be deleted is not in a leaf
node, the deletion can be transformed into a
deletion from a leaf node: the deleted element is
replaced by either the element with the largest
key in its left subtree or the element with the
smallest key in its right subtree.
• Now we can focus on deletion from a leaf
node.
Deletion From A 2-3 Tree
Example
[Figure: (a) initial 2-3 tree — root A (50 80) with children
B (10 20), C (60 70), D (90 95); (b) after 70 is deleted,
C holds 60; (c) after 90 is deleted, D holds 95.]
Deletion From A 2-3 Tree
Example (Cont.)
[Figure: (d) after 60 is deleted, a rotation gives A (20 80)
with children B (10), C (50), D (95); (e) after 95 is
deleted, a combine gives A (20) with children B (10),
C (50 80); (f) after 50 is deleted, C holds 80; (g) after 10
is deleted, a combine leaves the single node B (20 80) as
the root.]
Rotation and Combine
• As shown in the example, deletion may
invoke a rotation or a combine operation.
• For a rotation, there are three cases
– the leaf node p is the left child of its parent r.
– the leaf node p is the middle child of its parent
r.
– the leaf node p is the right child of its parent r.
Three Rotation Cases
[Figure: the three rotation cases, showing how an element of
the 3-node sibling q moves up into the parent r while an
element of r moves down into the empty leaf p: (a) p is the
left child of r; (b) p is the middle child of r; (c) p is
the right child of r.]
Steps in Deletion From a
Leaf Of a 2-3 Tree
• Step 1: Modify node p as necessary to reflect its status
after the desired element has been deleted.
• Step 2:
for (; p has zero elements && p != root; p = r) {
let r be the parent of p, and let q be the left or right sibling of p;
if (q is a 3-node) perform a rotation
else perform a combine;
}
• Step 3: If p has zero elements, then p must be the root.
The left child of p becomes the new root, and node p is
deleted.
Combine When p is the Left
Child of r
[Figure: combine when p is the left child of r — the element
x moves down from r and is combined with the element y of the
2-node sibling q into a single node; shown for (a) r a
2-node and (b) r a 3-node.]
2-3-4 Trees
Definition: A 2-3-4 tree is a search tree that either is empty
or satisfies the following properties:
(1) Each internal node is a 2-, 3-, or 4-node. A 2-node has one
element, a 3-node has two elements, and a 4-node has three
elements.
(2) Let LeftChild and LeftMidChild denote the children of a
2-node. Let dataL be the element in this node, and let
dataL.key be its key. All elements in the 2-3-4 subtree with
root LeftChild have key less than dataL.key, whereas all
elements in the 2-3-4 subtree with root LeftMidChild have
key greater than dataL.key.
(3) Let LeftChild, LeftMidChild, and RightMidChild denote the
children of a 3-node. Let dataL and dataM be the two
elements in this node. Then, dataL.key < dataM.key; all keys
in the 2-3-4 subtree with root LeftChild are less than
dataL.key; all keys in the 2-3-4 subtree with root
LeftMidChild are less than dataM.key and greater than
dataL.key; and all keys in the 2-3-4 subtree with root
RightMidChild are greater than dataM.key.
2-3-4 Trees (Cont.)
(4) Let LeftChild, LeftMidChild, RightMidChild, and
RightChild denote the children of a 4-node. Let
dataL, dataM, dataR be the three elements in this
node. Then, dataL.key < dataM.key < dataR.key; all
keys in the 2-3-4 subtree with root LeftChild are
less than dataL.key; all keys in the 2-3-4 subtree
with root LeftMidChild are less than dataM.key
and greater than dataL.key; all keys in the 2-3-4
subtree with root RightMidChild are greater
than dataM.key but less than dataR.key; and all
keys in the 2-3-4 subtree with root RightChild
are greater than dataR.key.
(5) All external nodes are at the same level.
2-3-4 Tree Example
[Figure: a 2-3-4 tree — root (50); left child (10) with
children (5 7 8) and (30 40); right child (70 80) with
children (60), (75), and (85 90 92).]
2-3-4 Trees (Cont.)
• Similar to the 2-3 tree, the height h of a 2-3-4 tree with n
elements is bounded between ⌈log₄(n+1)⌉ and ⌈log₂(n+1)⌉.
• A 2-3-4 tree has an advantage over 2-3 trees in that
insertion and deletion can be performed by a single
root-to-leaf pass rather than by a root-to-leaf pass
followed by a leaf-to-root pass.
• So the corresponding algorithms for 2-3-4 trees are
simpler than those for 2-3 trees.
• Furthermore, a 2-3-4 tree can be represented efficiently
as a binary tree (called a red-black tree), resulting in a
more efficient utilization of space.
Top-Down Insertion
• If the leaf node into which the element is to be inserted is
a 2-node or 3-node, then it’s easy. Simply insert the
element into the leaf node.
• If the leaf node into which the element is to be inserted is
a 4-node, then this node splits and a backward (leaf-to-root)
pass is initiated. This backward pass terminates when either
a 2-node or 3-node is encountered, or when the root is split.
• To avoid the backward pass, we split 4-nodes on the way
down the tree. As a result, the leaf node into which the
insertion is to be made is guaranteed to be a 2- or 3-node.
• There are three different cases to consider for a 4-node:
(1) It is the root of the 2-3-4 tree.
(2) Its parent is a 2-node.
(3) Its parent is a 3-node.
Transformation When the
4-Node Is The Root
[Figure: when the 4-node (x y z) is the root, it splits into
a new root (y) with children (x) and (z); the height
increases by one.]
Transformation When the 4-Node
is the Child of a 2-Node
[Figure: the 4-node (x y z) splits into (x) and (z); the
middle element y moves up into the 2-node parent w, making
it a 3-node (cases (a) and (b)).]
Transformation When the 4-Node
is the Left Child of a 3-Node
[Figure: the 4-node (v w x), the left child of the 3-node
(y z), splits into (v) and (x); the middle element w moves
up, making the parent the 4-node (w y z).]
Transformation When the 4-Node is
the Left Middle Child of a 3-Node
[Figure: the 4-node (w x y), the left middle child of the
3-node (v z), splits into (w) and (y); the middle element x
moves up, making the parent the 4-node (v x z).]
Transformation When the 4-Node is
the Right Middle Child of a 3-Node
[Figure: the 4-node (x y z), the right middle child of the
3-node (v w), splits into (x) and (z); the middle element y
moves up, making the parent the 4-node (v w y).]
Top-Down Deletion
• The deletion of an arbitrary element from a 2-3-4
tree may be reduced to the deletion of an
element that is in a leaf node. If the element to
be deleted is in a leaf that is a 3-node or 4-node,
then its deletion leaves behind a 2-node or a
3-node. No restructuring is required.
• To avoid a backward restructuring pass, we
ensure that at the time of deletion, the element
to be deleted is in a 3-node or a 4-node. This is
accomplished by restructuring the 2-3-4 tree
during the downward pass.
Top-Down Deletion (Cont.)
• Suppose the search is presently at node p and will move
next to node q. The following cases need to be considered:
(1) p is a leaf: The element to be deleted is either in p or not
in the tree.
(2) q is not a 2-node. In this case, the search moves to q, and
no restructuring is needed.
(3) q is a 2-node, and its nearest sibling, r, is also a 2-node.
– if p is a 2-node, p must be the root, and p, q, r are combined
by reversing the transformation that splits a root 4-node.
– if p is a 3-node or a 4-node, perform, in reverse, the 4-node
splitting transformation.
(4) q is a 2-node, and its nearest sibling, r, is a 3-node.
(5) q is a 2-node and its nearest sibling, r, is a 4-node.
Deletion Transformation When the
Nearest Sibling is a 3-Node
[Figure: deletion transformation when the nearest sibling r
is a 3-node — an element of r rotates up into p and an
element of p rotates down into the 2-node q: (a) q is the
left child of a 3-node; (b) q is the left child of a 4-node.]
Red-Black Trees
• A red-black tree is a binary tree representation of a 2-3-4
tree.
• The child pointers of a node in a red-black tree are of two
types: red and black.
– If the child pointer was present in the original 2-3-4 tree, it
is a black pointer.
– Otherwise, it is a red pointer.
• A node in a 2-3-4 tree is transformed into its red-black
representation as follows:
(1) A 2-node p is represented by the RedBlackNode q with both
its color data members black, data = dataL, q->LeftChild =
p->LeftChild, and q->RightChild = p->LeftMidChild.
(2) A 3-node p is represented by two RedBlackNodes connected
by a red pointer. There are two ways in which this may be
done.
(3) A 4-node is represented by three RedBlackNodes, one of
which is connected to the remaining two by red pointers.
Transforming a 3-Node into
Two RedBlackNodes
[Figure: a 3-node (x y) becomes either y with red left child
x, or x with red right child y; the subtrees a, b, c keep
their order.]
Transforming a 4-Node into
Three RedBlackNodes
[Figure: a 4-node (x y z) becomes y with red children x and
z; the subtrees a, b, c, d keep their order.]
Red-Black Trees (Cont.)
• One may verify that a red-black tree satisfies the following
properties:
(P1) It is a binary search tree.
(P2) Every root-to-external-node path has the same number of
black links.
(P3) No root-to-external-node path has two or more consecutive
red pointers.
• An alternate definition of a red-black tree is given in the
following:
(Q1) It is a binary search tree.
(Q2) The rank of each external node is 0.
(Q3) Every internal node that is the parent of an external node
has rank 1.
(Q4) For every node x that has a parent p(x), rank(x) ≤
rank(p(x)) ≤ rank(x) + 1.
(Q5) For every node x that has a grandparent gp(x), rank(x) <
rank(gp(x)).
Red-Black Trees (Cont.)
• Each node x of a 2-3-4 tree T is represented by a
collection of nodes in its corresponding red-black
tree. All nodes in this collection have a rank equal
to height(T) – level(x) +1.
• Each time there is a rank change in a path from
the root of the red-black tree, there is a level
change in the corresponding 2-3-4 tree.
• Black pointers go from a node of a certain rank to
one whose rank is one less.
• Red pointers connect two nodes of the same rank.
Lemma 10.1
Lemma 10.1: Every red-black tree RB with
n (internal) nodes satisfies the following:
(1) height(RB) ≤ 2·log₂(n+1)
(2) height(RB) ≤ 2·rank(RB)
(3) rank(RB) ≤ log₂(n+1)
Searching a Red-Black Tree
• Since a red-black tree is a binary
search tree, the search operation can
be done by following the same search
algorithm used in a binary search
tree.
Red-Black Tree Insertion
• An insertion to a red-black tree can
be done in two ways: top-down or
bottom-up.
• In a top-down insertion, a single
root-to-leaf pass is made over the
red-black tree.
• A bottom-up insertion makes both a
root-to-leaf and a leaf-to-root pass.
Top-Down Insertion
• We can detect a 4-node simply by looking for nodes q for
which both color data members are red. Such nodes,
together with their two children, form a 4-node.
• When such a 4-node q is detected, the following
transformations are needed:
(1) Change both the colors of q to black.
(2) If q is the left (right) child of its parent, then change the
left (right) color of its parent to red.
(3) If we now have two consecutive red pointers, then one is
from the grandparent, gp, of q to the parent, p, of q and the
other from p to q. Let the direction of the first of these be
X and that of the second be Y. Depending on XY = LL, LR, RL,
and RR, transformations are performed to remove violations.
Transformation for a Root
4-Node
[Figure: transformation for a root 4-node — the root's two
red pointers are changed to black.]
Transformation for a 4-Node That
is the Child of a 2-Node
[Figure: transformation for a 4-node that is the child of a
2-node — a color change only: the node's two red pointers
become black and the pointer from its parent becomes red.]
Transformation for a 4-Node That
is the Left Child of a 3-Node
[Figure: transformation for a 4-node that is the left child
of a 3-node: (a) an LL rotation; (b) a color change.]
Transformation for a 4-Node That is
the Left Middle Child of a 3-Node
[Figure: transformation for a 4-node that is the left middle
child of a 3-node: (a) an LR rotation; (b) an RL rotation.]
Bottom-Up Insertion
• In bottom-up insertion, the element to be
inserted is added as the appropriate child of the
node last encountered. A red pointer is used to
join the new node to its parent.
• However, this might violate the red-black tree
definition, since there might be two consecutive
red pointers on the path.
• To resolve this problem, we need to perform color
changes or rotations.
• Let s be the sibling of node q. The violation is
classified as an XYZ violation, where X=L if <p, q>
is a left pointer, and X=R otherwise; Y=L if <q, r>
is a left pointer, and Y=R otherwise; and Z=r if
s≠0 and <p, s> is a red pointer, and Z=b otherwise.
Bottom-Up Insertion (Cont.)
• The color changes potentially
propagate the violation up the tree
and may need to be reapplied several
times. Note that a color change
does not affect the number of black
pointers on a root-to-external-node path.
LLr and LRr Color Changes for
Bottom-Up Insertion
[Figure: (a) LLr color change; (b) LRr color change.]
LLb and LRb Rotations for
Bottom-Up Insertion
[Figure: (a) LLb rotation; (b) LRb rotation.]
Comparison of Top-Down
and Bottom-Up
• In comparing the top-down and bottom-up
insertion methods: in the top-down method,
O(log n) rotations may be performed, whereas
only one rotation is possible in the
bottom-up method.
• Both methods may perform O(log n) color
changes. However, the top-down method
can be used in pipelined mode to perform
several insertions in sequence; the
bottom-up method cannot be used this way.
Deletion from a Red-Black
Tree
• If the node to be deleted is the root, then the result is an
empty red-black tree.
• If the leaf node to be deleted has a red pointer from its
parent, then it can be deleted directly, because it is part
of a 3-node or 4-node.
• If the leaf node to be deleted has a black pointer, then the
leaf is a 2-node. Deletion from a 2-node requires a
backward restructuring pass. This is not desirable.
• To avoid deleting a 2-node, the insertion transformations
are used in the reverse direction to ensure that the search
for the element to be deleted moves down a red pointer.
• Since most of the insertion and deletion transformations
can be accomplished by color changes and require no pointer
changes or data shifts, these operations take less time
using red-black trees than when a 2-3-4 tree is
represented using nodes of type Two34Node.
Joining and Splitting Red-Black Trees
• For binary search trees we have the
following operations defined:
ThreeWayJoin, TwoWayJoin, and
Split. These operations can also be
performed on red-black trees in
logarithmic time.
Large Search Tree That
Does Not Fit in Memory
• The aforementioned balanced search trees (AVL
trees, 2-3 trees, 2-3-4 trees) work well only when
the table fits in internal memory.
• If the table is larger than the internal memory,
then a search may require O(h) disk accesses,
where h is the height of the tree.
• Since disk accesses take a significant amount
of time compared to internal memory accesses, it
is desirable to develop a structure that
minimizes the number of disk accesses.
M-Way Search Tree
Definition: An m-way search tree either is empty or
satisfies the following properties:
(1) The root has at most m subtrees and has the following
structure:
    n, A0, (K1, A1), (K2, A2), …, (Kn, An)
where the Ai, 0 ≤ i ≤ n < m, are pointers to subtrees, and
the Ki, 1 ≤ i ≤ n < m, are key values.
(2) Ki < Ki+1, 1 ≤ i < n.
(3) All key values in the subtree Ai are less than Ki+1 and
greater than Ki, 1 ≤ i < n.
(4) All key values in the subtree An are greater than Kn, and
those in A0 are less than K1.
(5) The subtrees Ai, 0 ≤ i ≤ n, are also m-way search trees.
Searching an m-Way
Search Tree
• Suppose we search an m-way search tree T
for the key value x. Assume T resides on a
disk. By searching the keys of the root, we
determine i such that Ki ≤ x < Ki+1.
– If x = Ki, the search is complete.
– If x ≠ Ki, x must be in a subtree Ai if x is in T.
– We then proceed to retrieve the root of the
subtree Ai and continue the search until we
find x or determine that x is not in T.
Searching an m-Way
Search Tree
• The maximum number of nodes in a tree of degree
m and height h is
    Σ_{0 ≤ i ≤ h-1} m^i = (m^h - 1)/(m - 1)
• Therefore, for an m-way search tree, the
maximum number of keys it has is m^h - 1.
• To achieve a performance close to that of the
best m-way search trees for a given number of
keys n, the search tree must be balanced.
B-Tree
Definition: A B-tree of order m is an m-way
search tree that either is empty or
satisfies the following properties:
(1) The root node has at least two children.
(2) All nodes other than the root node and
failure nodes have at least ⌈m/2⌉ children.
(3) All failure nodes are at the same level.
B-Tree (Cont.)
• Note that a 2-3 tree is a B-tree of order 3 and a 2-3-4 tree is
a B-tree of order 4.
• Also, all B-trees of order 2 are full binary trees.
• A B-tree of order m and height l has at most m^l − 1 keys.
• For a B-tree of order m and height l, the minimum number
of keys N in such a tree is N ≥ 2⌈m/2⌉^(l−1) − 1, l ≥ 1.
• If there are N key values in a B-tree of order m, then all
nonfailure nodes are at levels less than or equal to l, where
l ≤ log⌈m/2⌉((N+1)/2) + 1. The maximum number of accesses
that have to be made for a search is l.
• For example, in a B-tree of order m = 200, an index with N ≤
2×10^6 − 2 entries will have l ≤ 3.
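The level bound above is easy to evaluate; a small sketch (the function name is my own) that reproduces the slide's example:

```python
import math

def max_search_level(N, m):
    """Upper bound l <= log_{ceil(m/2)}((N + 1)/2) + 1 on the deepest
    nonfailure level of a B-tree of order m with N keys, and hence on
    the number of disk accesses made by one search."""
    return math.floor(math.log((N + 1) / 2, math.ceil(m / 2)) + 1)

# The slide's example: order m = 200, N <= 2*10^6 - 2 keys gives l <= 3.
print(max_search_level(2 * 10**6 - 2, 200))
```

With ⌈m/2⌉ = 100, the bound is log100(≈10^6) + 1 ≈ 3.9999, and since l is an integer, l ≤ 3.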
The Choice of m
• B-trees of high order are desirable since they
result in a reduction in the number of disk
accesses.
• If the index has N entries, then a B-tree of order
m = N+1 has only one level. But this is not
practical, since all N entries cannot fit in
internal memory.
• In selecting a reasonable choice for m, we need to
keep in mind that we are really interested in
minimizing the total amount of time needed to
search the B-tree for a value x. This time has two
components:
(1) the time for reading in the node from the disk
(2) the time needed to search this node for x.
The Choice of m (Cont.)
• Assume a node of a B-tree of order m is of a fixed size and
is large enough to accommodate n, A0, and m−1 triples (Ki, Ai,
Bi), 1 ≤ i < m.
• If each Ki is at most α characters long and each Ai and Bi is β
characters long, then the size of a node is about m(α+2β).
Then the time to access a node is
ts + tl + m(α+2β)tc = a + bm
where a = ts + tl = seek time + latency time,
b = (α+2β)tc, and tc = transmission time per character.
• If binary search is used to search each node of the B-tree,
then the internal processing time per node is c log2 m + d for
some constants c and d.
• The total processing time per node is τ = a + bm + c log2 m + d.
• The maximum search time is therefore
f · log2((N+1)/2) · { (a + d)/log2 m + bm/log2 m + c }
where f is some constant.
Figure 10.36: Values of (35 + 0.06m)/log2 m

m       Total maximum search time (sec)
2       35.12
4       17.62
8       11.83
16      8.99
32      7.38
64      6.47
128     6.10
256     6.30
512     7.30
1024    9.64
2048    14.35
4096    23.40
8192    40.50
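The table can be reproduced directly from the expression in its title, taking a = 35 and b = 0.06 as the constants implied by the figure (a minimal sketch; the function name is my own):

```python
import math

def search_time(m, a=35.0, b=0.06):
    """Approximate total search time (a + b*m) / log2(m): the per-node
    access cost a + b*m times the number of levels ~ logm of the key
    count, which is proportional to 1/log2(m)."""
    return (a + b * m) / math.log2(m)

# Reproduce Figure 10.36
for m in [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]:
    print(f"{m:5d}  {search_time(m):6.2f}")
```

The values fall, bottom out around m in the low hundreds, then rise again as the per-node transmission term bm starts to dominate.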
Figure 10.37: Plot of (35+0.06m)/log2 m
[Plot omitted. The curve stays near its minimum, between roughly 5.7
and 6.8 sec, for m in the approximate range 50 to 400.]
Insertion into a B-Tree
• Instead of using the 2-3-4 tree's top-down insertion, we generalize
the two-pass insertion algorithm for 2-3 trees, because top-down
insertion splits many nodes, and each time we change a node it has
to be written back to disk. This increases the number of disk
accesses.
• The insertion algorithm for B-trees of order m first performs a
search to determine the leaf node p into which the new key is to
be inserted.
– If the insertion of the new key into p results in p having m keys, the node
p is split.
– Otherwise, the new p is written to the disk, and the insertion is
complete.
• Assume that the h nodes read in during the top-down pass can be
saved in memory, so that they need not be retrieved from disk
during the bottom-up pass. Then the number of disk accesses for
an insertion is at most h (downward pass) + 2(h−1) (nonroot splits) +
3 (root split) = 3h+1.
• The average number of disk accesses is approximately h+1 for
large m.
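The two-pass insertion can be sketched in memory, ignoring the disk reads and writes the slide counts (a minimal sketch; class and method names are my own):

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list for a leaf

class BTree:
    """Order-m B-tree: insert descends to a leaf, then splits bottom-up."""
    def __init__(self, m):
        self.m = m
        self.root = Node()

    def insert(self, key):
        path = []                                  # nodes saved from the top-down pass
        node = self.root
        while node.children:                       # first pass: find the leaf p
            i = bisect.bisect_left(node.keys, key)
            path.append((node, i))
            node = node.children[i]
        bisect.insort(node.keys, key)
        while len(node.keys) == self.m:            # p has m keys: split it
            mid = self.m // 2
            left = Node(node.keys[:mid], node.children[:mid + 1])
            right = Node(node.keys[mid + 1:], node.children[mid + 1:])
            up = node.keys[mid]                    # middle key moves up
            if path:
                parent, i = path.pop()
                parent.keys.insert(i, up)
                parent.children[i:i + 1] = [left, right]
                node = parent                      # parent may now overflow too
            else:                                  # root split: tree grows a level
                self.root = Node([up], [left, right])
                break
```

Each iteration of the bottom-up loop corresponds to one split, matching the 3h+1 disk-access accounting above.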
Figure 10.38: B-Trees of Order 3
[Figure omitted. It shows three trees built by successive insertions:
(a) p = 1, s = 0; (b) p = 3, s = 1; (c) p = 4, s = 2, where p is the
number of nonfailure nodes in the final B-tree with N entries and
s is the number of splits.]
Deletion from a B-Tree
• The deletion algorithm for B-trees is also a
generalization of the deletion algorithm for 2-3
trees.
• First, we search for the key x to be deleted.
– If x is found in a node z that is not a leaf, then the
position occupied by x in z is filled by a key from a leaf
node of the B-tree.
– Suppose that x is the ith key in z (x = Ki). Then x may be
replaced by either the smallest key in the subtree Ai or
the largest in the subtree Ai−1. Since both of these keys are
in leaf nodes, the deletion of x from a nonleaf node is
transformed into a deletion from a leaf.
Deletion from a B-Tree
(Cont.)
• There are four possible cases when deleting from a leaf
node p.
– In the first case, p is also the root. If the root is left with at
least one key, the changed root is written back to disk.
Otherwise, the B-tree is empty following the deletion.
– In the second case, following the deletion, p has at least
⌈m/2⌉ − 1 keys. The modified leaf is written back to disk.
– In the third case, p has ⌈m/2⌉ − 2 keys, and its nearest sibling, q,
has at least ⌈m/2⌉ keys. (We check only one of p's nearest siblings.)
p is deficient, as it has one less than the minimum number of keys
required, while q has more keys than the minimum required. As in the
case of a 2-3 tree, a rotation is performed. In this rotation,
the number of keys in q decreases by one, and the number in p
increases by one.
– In the fourth case, p has ⌈m/2⌉ − 2 keys, and q has ⌈m/2⌉ − 1 keys.
p is deficient and q has the minimum number of keys permissible
for a nonroot node. Nodes p and q and the in-between key Ki in
their parent are combined to form a single node.
Figure 10.39: B-Tree of Order 5
[Figure omitted. The root holds n = 2 keys, 20 and 35; its children
hold {10, 15}, {25, 30}, and {40, 45, 50}.]
Splay Trees
• If we are interested only in amortized complexity rather than
worst-case complexity, simpler structures can be used for
search trees.
• By using splay trees, we can achieve O(log n) amortized time
per operation.
• A splay tree is a binary search tree in which each search,
insert, delete, and join operation is performed in the same
way as in an ordinary binary search tree, except that each
of these operations is followed by a splay.
• Before a split, a splay is performed. This makes the split
very easy to perform.
• A splay consists of a sequence of rotations.
Starting Node of Splay
Operation
• The start node for a splay is obtained as follows:
(1) search: The splay starts at the node containing the
element being sought.
(2) insert: The start node for the splay is the newly
inserted node.
(3) delete: The parent of the physically deleted node is
used as the start node for the splay. If this node is
the root, then no splay is done.
(4) ThreeWayJoin: No splay is done.
(5) split: Suppose that we are splitting with respect to
the key i and that key i is actually present in the tree.
We first perform a splay at the node that contains i
and then split the tree.
Splay Operation
• Splay rotations are performed along the path
from the start node to the root of the binary
search tree.
• Splay rotations are similar to those performed
for AVL trees and red-black trees.
• Let q be the node at which the splay is being
performed. The following steps define a splay:
(1) If q is null (0) or the root, then the splay terminates.
(2) If q has a parent p but no grandparent, then the
rotation of Figure 10.42 is performed, and the splay
terminates.
(3) If q has a parent, p, and a grandparent, gp, then the
rotation is classified as LL (p is the left child of gp,
and q is the left child of p), LR (p is the left child of gp,
and q is the right child of p), RR, or RL. The splay is repeated at
the new location of q.
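The three steps above can be written with a single rotate primitive; the LL/RR cases rotate the parent first, and the LR/RL cases rotate q twice (a minimal sketch; class and function names are my own):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate(q):
    """Rotate q above its parent, preserving the binary-search-tree order."""
    p, gp = q.parent, q.parent.parent
    if p.left is q:                      # q is a left child
        p.left = q.right
        if q.right: q.right.parent = p
        q.right = p
    else:                                # q is a right child
        p.right = q.left
        if q.left: q.left.parent = p
        q.left = p
    p.parent = q
    q.parent = gp
    if gp:
        if gp.left is p: gp.left = q
        else:            gp.right = q

def splay(q):
    """Move q to the root using zig (step 2) and LL/RR/LR/RL (step 3) steps."""
    while q.parent is not None:
        p, gp = q.parent, q.parent.parent
        if gp is None:
            rotate(q)                    # step 2: no grandparent
        elif (gp.left is p) == (p.left is q):
            rotate(p); rotate(q)         # LL or RR: rotate parent first
        else:
            rotate(q); rotate(q)         # LR or RL: rotate q twice
    return q
```

Rotating the parent first in the LL/RR case is what distinguishes a splay from repeatedly applying single rotations, and it is essential for the amortized bound discussed next.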
Splay Amortized Cost
• Note that all rotations move q up the tree and that following
a splay, q becomes the new root of the search tree.
• As a result, splitting the tree with respect to a key, i, is
done simply by performing a splay at i and then splitting at
the root.
• The analysis for splay trees will use a potential technique.
Let P0 be the initial potential of the search tree, and let Pi
be its potential following the ith operation in a sequence of
m operations. The amortized time for the ith operation is
defined as
(actual time for the ith operation) + Pi – Pi-1
So the actual time for the ith operation is
(amortized time for the ith operation) − Pi + Pi−1
Hence, the actual time needed to perform the m operations
in the sequence is
Σi (amortized time for the ith operation) + P0 − Pm
Figure 10.42: Rotation when q is Right Child and Has no Grandparent
[Figure omitted. The parent p, with right child q and subtrees a, b,
and c, becomes the right... child arrangement is reversed: q becomes
the root with p as its left child; a, b, and c are subtrees.]
Figure 10.43: RR and RL Rotations
[Figure omitted: (a) type RR, (b) type RL; a, b, c, and d are subtrees
of q, p, and gp.]
Figure 10.44: Rotations In A Splay Beginning At Shaded Node
[Figure omitted. Panels: (a) initial search tree, (b) after RR
rotation, (c) after LL rotation, (d) after LR rotation, (e) after RL
rotation.]
Upper Bound of Splay’s
Amortized Cost
• Let the size s(i) of the subtree with
root i be the total number of nodes in it.
• The rank r(i) of node i is equal
to ⌊log2 s(i)⌋.
• The potential of the tree is Σi r(i).
• Lemma 10.2: Consider a binary search
tree that has n elements/nodes. The
amortized cost of a splay operation that
begins at node q is at most 3(⌊log2 n⌋ − r(q)) + 1.
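The size, rank, and potential definitions are easy to compute directly; a small sketch (class and function names are my own) for experimenting with them:

```python
import math

class BST:
    """Minimal node: only the tree shape matters for size, rank, potential."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def subtree_sizes(node, s):
    """Fill s with s(i), the number of nodes in the subtree rooted at i."""
    if node is None:
        return 0
    s[id(node)] = 1 + subtree_sizes(node.left, s) + subtree_sizes(node.right, s)
    return s[id(node)]

def potential(root):
    """Sum over all nodes i of the rank r(i) = floor(log2 s(i))."""
    s = {}
    subtree_sizes(root, s)
    return sum(math.floor(math.log2(v)) for v in s.values())
```

A skewed tree has a larger potential than a balanced tree on the same number of nodes; the splay's amortized accounting spends that stored potential when it restructures long paths.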
Splay Tree Complexity
Theorem 10.1: The total time needed
for a sequence of m search, insert,
delete, join, and split operations
performed on a collection of initially
empty splay trees is O(m log n),
where n, n > 0, is the number of
inserts in the sequence.
Digital Search Trees
• A digital search tree is a binary tree in which
each node contains one element. The element-to-node
assignment is determined by the binary
representation of the element keys.
• Suppose we number the bits in the binary
representation of a key from left to right
beginning at one. Then bit one of 1000 is 1. All
keys in the left subtree of a node at level i have
bit i equal to 0, whereas those in the right subtree
of nodes at this level have bit i = 1.
Figure 10.45: Digital Search Trees
[Figure omitted: (a) the initial tree containing the keys 1000, 0010,
1001, 1100, 0000, and 0001; (b) the tree after 0011 is inserted.]
Digital Search Trees (Cont.)
• The digital search tree functions to search and insert are
quite similar to the corresponding functions for binary
search trees.
• During insert or search, the subtree to move to is
determined by a bit in the search key rather than by the
result of the comparison of the search key and the key in
the current node.
• Deleting an item in a leaf node is easy: simply remove
the node.
• To delete the key in a non-leaf node, the deleted item is
replaced by a value from any leaf in its subtree, and that
leaf is removed.
• Each of these operations can be performed in O(h) time,
where h is the height of the digital search tree.
• If each key in a digital search tree has KeySize bits, then
the height of the digital search tree is at most KeySize +1.
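Search and insert as just described can be sketched as follows; the code indexes bits from 0, whereas the text numbers them from 1 (a minimal sketch; class and function names are my own):

```python
class DSTNode:
    def __init__(self, key):
        self.key = key                   # a fixed-length bit string, e.g. "1000"
        self.left = self.right = None

def dst_search(root, key):
    """Bit i of the key (0-indexed here) picks the subtree at level i+1."""
    node, i = root, 0
    while node is not None:
        if node.key == key:
            return node
        node = node.left if key[i] == '0' else node.right
        i += 1
    return None

def dst_insert(root, key):
    """Descend by bits until an empty subtree is found; place key there."""
    if root is None:
        return DSTNode(key)
    node, i = root, 0
    while node.key != key:               # stop if key is already present
        if key[i] == '0':
            if node.left is None:
                node.left = DSTNode(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = DSTNode(key)
                return root
            node = node.right
        i += 1
    return root
```

Unlike a binary search tree, the branch taken at each level depends on a key bit, not on a full key comparison, so the height never exceeds KeySize + 1.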
Binary Tries
• When we are dealing with very long keys, the cost of a key
comparison is high.
• The number of comparisons can be reduced to one by using a related
structure called Patricia (Practical algorithm to retrieve
information coded in alphanumeric).
• Three steps to develop the structure:
– First, introduce a structure called a binary trie.
– Then, transform binary tries into compressed binary tries.
– Finally, from compressed binary tries we obtain Patricia.
• A binary trie is a binary tree that has two kinds of nodes: branch
nodes and element nodes.
– A branch node has two data members, LeftChild and RightChild, but no
data member.
– An element node has a single data member, data.
• Branch nodes are used to build a binary tree search structure
similar to that of a digital search tree.
Figure 10.46: Example of A Binary Trie
[Figure omitted; it stores the keys 0000, 0001, 0010, 1000, 1001,
and 1100.]
Compressed Binary Trie
• Observe that a successful search in a binary trie
always ends at an element node.
• Once this element node is reached, key
comparison is performed.
• Observe from Figure 10.46 that there are some
degree-one nodes in the tree. We can use
another data member, BitNumber, to eliminate all
degree-one branch nodes from the trie.
• The BitNumber gives the bit number of the key
that is to be used at this node.
• A binary trie that has been modified to contain no
branch nodes of degree one is called a
compressed binary trie.
Figure 10.47: Binary Trie of Figure 10.46 With Degree-One Nodes
Eliminated
[Figure omitted. Branch nodes are labeled with their BitNumbers
(1, 2, 3, 4, 4); the element nodes hold 0000, 0001, 0010, 1000,
1001, and 1100.]
Patricia
• Compressed binary tries may be represented using nodes of a
single type. The new nodes, called augmented branch nodes, are
single type. The new nodes, called augmented branch nodes, are
the original branch nodes augmented by the data member data.
The resulting structure is called Patricia and obtained from a
compressed binary trie in the following way:
(1)Replace each branch node by an augmented branch node.
(2) Eliminate the element nodes.
(3) Store the data previously in the element nodes in the data data
members of the augmented branch nodes. Since every nonempty
compressed binary trie has one less branch node than it has
element nodes, it is necessary to add one augmented branch node.
This node is called the head node. The remaining structure is the left
subtree of the head node. The head node has BitNumber equal to 0. The
assignment of data to the augmented branch node is done in such a
way that the BitNumber in the augmented branch node is less than
or equal to that in the parent of the element node that contained
this data.
(4) Replace the original pointers to element nodes by pointers to
the respective augmented branch nodes.
Figure 10.48: An Example of Patricia
[Figure omitted. The (BitNumber, data) pairs shown are (0, 1100),
(1, 0000), (2, 1001), (3, 0010), (4, 0001), and (4, 1000).]
Figure 10.49: Insertion Into Patricia
[Figure omitted. Panels: (a) 1000 inserted, (b) 0010 inserted,
(c) 1001 inserted, (d) 1100 inserted, (e) 0000 inserted,
(f) 0001 inserted.]
Analysis of Patricia
Insertion
• The complexity of Patricia insertion is O(h)
where h is the height of the Patricia.
• h can be as large as min{KeySize+1, n},
where KeySize is the number of bits in a
key and n is the number of elements.
• When the keys are uniformly distributed,
the height is O(log n).
Tries
• A trie is an index structure that is particularly useful when
key values are of varying size.
• It is a generalization of the binary trie.
• A trie is a tree of degree m ≥ 2 in which the branching at
any level is determined not by the entire key value, but by a
portion of it.
• A trie contains two types of nodes: element node, and
branch node.
– An element node has only a single data member, data.
– A branch node contains pointers to subtrees.
• Assuming each character is one of the 26 letters of the
alphabet, a branch node has 27 pointer data members. The
extra data member is used for the blank character
(denoted b), which is used to terminate all keys.
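A 27-way trie node of this kind can be sketched as below. This is a simplified version that samples one character per level all the way to the end of each key (plus the blank terminator), rather than stopping branching as soon as a key is distinguished, as Figure 10.50 does (names are my own):

```python
class TrieNode:
    def __init__(self):
        self.child = [None] * 27   # slot 0 for the blank b, slots 1-26 for 'a'-'z'
        self.data = None           # element information, set at the end of a key

def slot(ch):
    """Map the blank to pointer 0 and 'a'..'z' to pointers 1..26."""
    return 0 if ch == ' ' else ord(ch) - ord('a') + 1

def trie_insert(root, key):
    """Insert key (lowercase letters), sampling one character per level."""
    node = root
    for ch in key + ' ':           # terminate every key with the blank
        i = slot(ch)
        if node.child[i] is None:
            node.child[i] = TrieNode()
        node = node.child[i]
    node.data = key

def trie_search(root, key):
    node = root
    for ch in key + ' ':
        node = node.child[slot(ch)]
        if node is None:
            return None
    return node.data
```

The blank terminator is what lets a key such as "to" coexist with its extension "together": the shorter key ends at the blank pointer of the node reached after its last character.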
Figure 10.50: Trie Created Using Characters Of Key Value From Left To
Right, One At A Time
[Figure omitted. It stores the keys bluebird, bunting, cardinal,
chickadee, godwit, goshawk, gull, oriole, thrasher, thrush, and wren.]
Figure 10.51: Trie Showing Need For A Terminal Character
[Figure omitted. It stores the keys to and together, which share the
prefix "to".]
Sampling Strategies
• Given a set of key values to be represented in an index, the
number of levels in the trie will depend on the strategy or
key sampling technique used to determine the branching at
each level.
• The trie structure we just discussed had sample(x,i) = ith
character.
• We could choose different sample functions that will result
in different trie structure.
• Ideally, with a fixed set of keys, we should be able to find
the best trie structure, the one with the fewest levels.
In reality, however, it is very difficult to do so. If we
consider the dynamic situation, with keys being added and
deleted, it is even more difficult. In such cases, we wish to
optimize average performance.
• Without knowledge of future key values, the best
sampling function is probably a randomized sampling
function.
Sampling Strategies (Cont.)
• The sampling strategy is not limited to one
character at a time. In fact, multiple
characters can be used in one sampling.
• In some cases, we want to limit the
number of levels. We can achieve this by
allowing nodes to hold more than one key
value.
• If the maximum number of levels allowed
is l, then all key values that are synonyms
up to level l-1 are entered into the same
element node.
Figure 10.52: Trie Constructed For Data Of Figure 10.50 Sampling One
Character At A Time, From Right To Left
[Figure omitted.]
Figure 10.53: An Optimal Trie For The Data Of Figure 10.50 Sampling On
The First Level Done By The Fourth Character Of The Key Values
[Figure omitted.]
Figure 10.54: Trie Obtained For Data Of Figure 10.50 When Number Of
Levels Is Limited To 3; Keys Have Been Sampled From Left To Right, One
Character At A Time
[Figure omitted.]
Figure 10.55: Section of Trie of Figure 10.50 Showing Changes
Resulting from Inserting Bobwhite and Bluejay
[Figure omitted; it shows the new branch nodes needed to separate
bobwhite from bunting and bluejay from bluebird.]