Index Structures for Files

Download Report

Transcript Index Structures for Files

Index Structures for Files
• Indexes speed up the retrieval of records
under certain search conditions
• Indexes called secondary access paths do
not affect the placement of records
 They provide fast access for searches based on
the indexing field
• Some types of indexes work only in
conjunction with a certain file organization
Index Structures for Files
• Single level ordered indexes allow us to
search for a record by searching the index
file using binary search
• The index is typically defined on a single
field of the file called the indexing field
• There are several kinds of ordered indexes
 A primary index is specified on the ordering
key field of an ordered file of records
Index Structures for Files
 If the ordering field is not unique, a clustering
index can be used
 We can specify a secondary index on any nonordering field of the file
• A primary index is an ordered file whose
records have two fields
 Key field
 Pointer to a disk block
Index Structures for Files
 There exists one index entry for each block in the data
file
 Note that this only works for ordered files. Why?
• The first record in each block is called the anchor
record of the block or the block anchor
• A primary index is an example of a nondense
index since we don’t have a pointer to every
record in the data file
Index Structures for Files
• Insertion of records can be handled with an
unordered overflow file and periodic
maintenance
• Deletion of records can be handled with
deletion markers and periodic maintenance
• If the ordering field is not unique, we use a
clustering index
 It is common to reserve a block for each value
of the ordering field
Index Structures for Files
• We can also create a secondary index on
any non-ordering field of the data file
 Since the data file is not ordered on the index
field, we must have a pointer to each record
(not to blocks)
 This is an example of a dense index
 Larger than primary index file since it is dense
Index Structures for Files
• Note that the index file is an ordered file
• We can create a primary index on this file
with block anchoring to speed up access to
this file
• Repeat as necessary
• This leads to the idea of a multilevel index
as illustrate in figure 5.6
Index Structures for Files
• B-trees and B+ trees are specialized tree
structures
• A tree is formed of nodes
• Each node in the tree has one parent node
and zero or more child nodes - except for
the root node which has no parents
• A node that has no children is called a leaf
node
Index Structures for Files
• The level of the root node is zero and the
level of every other node is one more than
that of its parent
• A subtree of a node consists of the node
and all of its descendant nodes (its
children, grandchildren, etc.)
• Each node of a tree used as an indexing
structure can contain data and several
pointers
Index Structures for Files
• A search tree is a specialized type of tree
used to guide a search
 A search tree of order p is a tree with at most p1 search values and p pointers to sub-trees
 <P1,K1,P2,K2,…,Pq-1,Kq-1,Pq>
 Each value in the subtree pointed to by P1 is
less than K1 and each value in the subtree
pointed to by P2 is greater than K1 (true also for
the other subtrees
Index Structures for Files
• To find a value in a search tree search the
root node, and if the value is not found,
search in the appropriate subtree. If there is
no subtree (we are at a leaf node) the value
doesn’t exist in the tree
• Insertion and deletion in a search tree will
usually cause a search tree to be
unbalanced
Index Structures for Files
• Record deletion may leave some nodes
nearly empty, wasting space
• The B-tree, a constrained search tree,
solves these two problems
• A B-tree of order p has the following
structure for internal nodes
 <P1, <K1, Pr1>, P2, <K2, Pr2>, …,<Kq-1,Prq-1
>,Pq>
Index Structures for Files




Each Pi is a tree pointer
Each Pri is a data pointer
K1<K2<…<Kq-1
The values in the subtree are between the values of the
two neighboring key values
 Each node has at most p tree pointers
 Each node (except the root) has at least ceiling(p/2) tree
pointers
 A node with q tree pointers has q-1 search key field
values
Index Structures for Files
• If there are empty spaces in a node,
insertion is simple
• Otherwise, the node is split, and the middle
value is promoted (moved up to the parent
node). This in turn, may cause that node to
be split
• Deletion is (relatively!) easy if there are
more than p/2 values after the deletion
Index Structures for Files
• Replace the deleted value with the next
value if the node is non-leaf, then delete the
value
• If there are not enough values in the node,
combine with left or right neighbor and
redistribute
• If we don’t have extras in the right or left
neighbor, combine with the parent
Index Structures for Files
• B+ trees are a variation of B-trees
 Each data value occurs in a leaf node along
with a pointer to the associated record
 The leaf nodes are chained together to provide
fast sequential access to the data records