Structures to Manage External Storage



Preliminaries

• Multiway trees have nodes with more than two children. A multiway tree of order k has nodes with at most k children.

• 2-3-4 Trees
– For all non-leaf nodes:
  • Nodes with one data item have two pointers
  • Nodes with two data items have three pointers
  • Nodes with three data items have four pointers
– Children of pointer p have keys less than data item p.
– Children of the last pointer contain keys greater than the last data item.

• B-Trees (Balanced, Boeing, broad, bushy, or Bayer (for Rudolf Bayer)??)
– Each node contains links to as many children as can fit in a disk block.

Node Structures

• 2-3-4 tree

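typedef struct Nodelink {
    int numElems;               /* number of data items in use (1 to 3) */
    Item *items[3];             /* up to three data items */
    struct Nodelink *links[4];  /* up to four child pointers */
} Node;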

• B-Tree

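typedef struct Nodelink {
    Item items[K];                  /* K (the k above): a compile-time constant, the number of items that fit in a disk block */
    struct Nodelink *links[K + 1];  /* one more child link than data items */
} Node;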
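The pointer rules from the Preliminaries can be illustrated with a search routine over the 2-3-4 node layout. This is a minimal sketch, not code from the slides: it assumes Item is a plain int stored by value (the slides store Item pointers and leave Item undefined), and the function name search234 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Item is a plain int key; the slides leave Item unspecified. */
typedef int Item;

typedef struct Nodelink {
    int numElems;               /* number of items in use: 1..3 */
    Item items[3];              /* keys, in ascending order */
    struct Nodelink *links[4];  /* children; all NULL in a leaf */
} Node;

/* Returns the node that holds key, or NULL if the key is absent. */
const Node *search234(const Node *node, Item key)
{
    while (node != NULL) {
        assert(node->numElems >= 1 && node->numElems <= 3);
        int i = 0;
        /* Skip items smaller than key; i ends as the child pointer to follow. */
        while (i < node->numElems && key > node->items[i])
            i++;
        if (i < node->numElems && key == node->items[i])
            return node;
        node = node->links[i];  /* NULL when node is a leaf */
    }
    return NULL;
}
```

Each loop iteration touches one node, so the number of nodes visited is bounded by the height of the tree.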

2-3-4 Insertion Algorithm

Insert( node )
    If node is full Then
        Call splitNode, then continue with whichever node should now hold the key
    If key is found in node Then
        Return DuplicatesNotAllowed
    If this is a leaf node Then
        Insert the data item and Return
    Call Insert( appropriateChildPointer )

SplitNode
    Allocate a newNode and move the right data item (with its two child pointers) to it
    If parent exists Then
        Insert the middle data item into the parent node and point to newNode
    Else
        Allocate a new root containing the middle data item
        root's firstChildPointer points to node
        root's secondChildPointer points to newNode
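The top-down insertion above can be sketched in C as follows. This is an illustrative iterative version under stated assumptions, not the slides' implementation: Item is taken to be an int stored by value, the helper names (newNode, insertAt, insert234, inorder234) are hypothetical, duplicates are silently ignored, and allocated nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

typedef int Item;   /* assumption: plain int keys */

typedef struct Nodelink {
    int numElems;
    Item items[3];
    struct Nodelink *links[4];
} Node;

static Node *newNode(void)
{
    Node *n = calloc(1, sizeof *n);   /* zeroed: no items, all links NULL */
    assert(n != NULL);
    return n;
}

/* Insert key at position pos of a non-full node, with rightChild to its right. */
static void insertAt(Node *n, int pos, Item key, Node *rightChild)
{
    for (int i = n->numElems; i > pos; i--) {
        n->items[i] = n->items[i - 1];
        n->links[i + 1] = n->links[i];
    }
    n->items[pos] = key;
    n->links[pos + 1] = rightChild;
    n->numElems++;
}

/* Split a full node; parent is NULL when node is the root.
   Returns the node that now sits above it (the parent or a new root). */
static Node *splitNode(Node *parent, Node *node)
{
    Node *right = newNode();
    right->items[0] = node->items[2];      /* right item moves to the new node */
    right->links[0] = node->links[2];
    right->links[1] = node->links[3];
    right->numElems = 1;
    Item middle = node->items[1];          /* middle item is promoted */
    node->numElems = 1;
    node->links[2] = node->links[3] = NULL;
    if (parent == NULL) {                  /* splitting the root: grow a new root */
        parent = newNode();
        parent->links[0] = node;
    }
    int pos = 0;
    while (pos < parent->numElems && parent->items[pos] < middle)
        pos++;
    insertAt(parent, pos, middle, right);
    return parent;
}

/* Returns the (possibly new) root. */
Node *insert234(Node *root, Item key)
{
    if (root == NULL) {
        root = newNode();
        root->items[0] = key;
        root->numElems = 1;
        return root;
    }
    Node *parent = NULL, *node = root;
    for (;;) {
        if (node->numElems == 3) {         /* split full nodes on the way down */
            parent = splitNode(parent, node);
            if (node == root) root = parent;
            node = parent;                 /* re-examine from the node above */
        }
        int i = 0;
        while (i < node->numElems && key > node->items[i]) i++;
        if (i < node->numElems && key == node->items[i])
            return root;                   /* duplicates not allowed */
        if (node->links[0] == NULL) {      /* leaf: insert here */
            insertAt(node, i, key, NULL);
            return root;
        }
        parent = node;
        node = node->links[i];
    }
}

/* Writes keys in ascending order into out; returns the new write position. */
int inorder234(const Node *n, Item *out, int pos)
{
    if (n == NULL) return pos;
    for (int i = 0; i < n->numElems; i++) {
        pos = inorder234(n->links[i], out, pos);
        out[pos++] = n->items[i];
    }
    return inorder234(n->links[n->numElems], out, pos);
}
```

Because every full node is split before the search descends through it, the parent of a node being split always has room for the promoted item, so a split never cascades upward.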

2-3-4 Deletion Algorithm

• Find the node containing the item to delete. If it is not a leaf node, replace the item with its successor, and then remove the successor.

• Cases to consider when deleting an item from a 2-3-4 node:
1. If more than one item remains in the leaf node that contains the item to delete, simply remove it.
2. If the item to delete is the only one in the node:
  a. If there is a sibling with more than one entry, promote an item from the sibling and demote an item from the parent (possibly cascading) until the node holding the item to delete has a spare entry. Then delete the item in question.
  b. If all sibling nodes have only one entry, demote the parent item, merge it with the sibling, and then delete the current node.
3. If the parent node is now empty, recursively traverse up the tree, applying the above steps as needed. If the root node becomes empty, simply remove it from the tree.
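Two of the deletion cases above can be sketched for leaf nodes. This is a partial illustration under assumptions, not the full algorithm: Item is an int stored by value, only leaves are handled, only the right sibling is considered for borrowing, and the names removeFromLeaf and deleteByBorrowingRight are hypothetical. Merging and the upward recursion are not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef int Item;   /* assumption: plain int keys */

typedef struct Nodelink {
    int numElems;
    Item items[3];
    struct Nodelink *links[4];
} Node;

/* Case 1: remove key from a leaf that keeps at least one item afterwards.
   Returns 1 if the key was removed, 0 if it is absent or is the only item. */
int removeFromLeaf(Node *leaf, Item key)
{
    if (leaf->numElems < 2) return 0;
    for (int i = 0; i < leaf->numElems; i++) {
        if (leaf->items[i] == key) {
            for (int j = i + 1; j < leaf->numElems; j++)
                leaf->items[j - 1] = leaf->items[j];   /* close the gap */
            leaf->numElems--;
            return 1;
        }
    }
    return 0;
}

/* Case 2a for leaves: the leaf parent->links[sep] holds only the item being
   deleted, and its right sibling parent->links[sep + 1] has a spare item. */
void deleteByBorrowingRight(Node *parent, int sep)
{
    Node *node = parent->links[sep];
    Node *sib = parent->links[sep + 1];
    assert(node->links[0] == NULL && sib->links[0] == NULL);   /* leaves only */
    assert(node->numElems == 1 && sib->numElems > 1);
    node->items[0] = parent->items[sep];   /* demote parent item over the deleted one */
    parent->items[sep] = sib->items[0];    /* promote the sibling's smallest item */
    for (int i = 1; i < sib->numElems; i++)
        sib->items[i - 1] = sib->items[i];
    sib->numElems--;
}
```

For example, with parent item 11, a left leaf holding 08, and a right leaf holding 12 and 22, deleting 08 leaves the left leaf holding 11, the parent holding 12, and the right leaf holding 22.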

Visual Illustration of the 2-3-4-Delete

[Figure: example trees illustrating Case 1, Case 2, and Case 3.] The algorithm recursively works its way up the tree.

Characteristics of External Storage

• External storage is at least three orders of magnitude slower than memory.

• The extra overhead of searching through multiway tree nodes is more than compensated for, because less tree depth means fewer disk accesses.

• It is desirable to design the record sizes with disk block sizes in mind. Each disk read/write will be in multiples of its block size.

B-Tree Insertion Algorithm

• Differences from the 2-3-4 algorithm
– Node splitting is from the bottom up rather than the top down.
  • Advantage: The tree is kept more full.
  • Disadvantage: A traversal down the tree may be followed by a traversal back up if multiple splits are necessary.
– Half of the items go to the new node, half remain in the old node.
– The middle key is promoted to the next level up.
– Contraction occurs when a node and a sibling together have less than a full block of data items.

Note: Standard B-tree implementations require nodes (other than the root) to be at least half full.
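The split rule above (half stay, half move, middle key promoted) can be sketched on a node's key array. This is a simplified illustration: the BNode type, the tiny MAX_KEYS capacity, and the name splitBNode are assumptions made for the example, and child pointers are omitted, so it corresponds to splitting a leaf.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4   /* assumption: tiny capacity for illustration; real nodes fill a disk block */

typedef struct {
    int numKeys;
    int keys[MAX_KEYS + 1];   /* one spare slot holds the overflow before splitting */
} BNode;

/* Splits an overfull node: the lower half stays in node, the upper half moves
   to newNode, and the middle key is returned for promotion to the parent. */
int splitBNode(BNode *node, BNode *newNode)
{
    assert(node->numKeys == MAX_KEYS + 1);
    int mid = node->numKeys / 2;
    int promoted = node->keys[mid];
    newNode->numKeys = node->numKeys - mid - 1;
    memcpy(newNode->keys, &node->keys[mid + 1],
           (size_t)newNode->numKeys * sizeof(int));
    node->numKeys = mid;
    return promoted;
}
```

The caller would then insert the returned key into the parent, which may itself overflow and split; that is the upward pass the disadvantage above refers to.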

External Storage Optimizations

• It is more efficient to keep the index and data separate
– Separate indices allow for multi-keyed files

• Refinements exist to guarantee that no node is less than 2/3 full. Nodes are balanced over three siblings.

• Some implementations only have data pointers at the last level.

• A linked list of free disk blocks is often used to reclaim storage space after deletions.

• Efficiency: Assume a block contains 8096 bytes, each key is 24 bytes, the blocks are half full, and the pointers require 4 bytes. How many levels deep is the tree?
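The efficiency question can be worked through with a small calculation. The slide does not state how many records the file holds, so the record count below is a hypothetical parameter, and the function name btreeLevels is an assumption. The estimate is rough: it treats every node as holding the same number of key and pointer pairs.

```c
#include <assert.h>

/* Estimates the number of levels needed to index numRecords keys. */
int btreeLevels(long blockBytes, long keyBytes, long ptrBytes,
                double fillFactor, long long numRecords)
{
    assert(blockBytes > 0 && keyBytes + ptrBytes > 0 && numRecords > 0);
    long perFullBlock = blockBytes / (keyBytes + ptrBytes);  /* 8096 / 28 = 289 */
    long long fanout = (long long)(perFullBlock * fillFactor); /* half full: 144 */
    assert(fanout >= 2);
    int levels = 1;
    long long reach = fanout;     /* keys reachable with the current level count */
    while (reach < numRecords) {
        reach *= fanout;
        levels++;
    }
    return levels;
}
```

With the slide's figures, a full block holds 289 key and pointer pairs, so a half-full block holds about 144. Under this estimate, btreeLevels(8096, 24, 4, 0.5, 1000000) gives 3 levels for an assumed one million records, since 144 cubed is roughly three million.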

Other External Storage Algorithms

• Create binary tree in memory for the index
• Sorting external data with a type of merge sort
– On each pass
  • Read a large block from each piece of the file
  • Perform merge
  • Write back to second file
  • Keep reading blocks from each half until they run out.
– There will be log_k N merges, where k is the number of data elements that can fit in the memory blocks.
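The merge step at the heart of each pass can be sketched as follows. This is a simplified illustration: in-memory arrays stand in for the blocks read from each piece of the file, and the name mergeRuns is hypothetical. A real external sort would refill each input buffer from disk as it empties and flush the output buffer to the second file.

```c
#include <assert.h>
#include <stddef.h>

/* Merges sorted runs a[0..na) and b[0..nb) into out,
   which must have room for na + nb items. */
void mergeRuns(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (b[j] < a[i]) ? b[j++] : a[i++];  /* ties take from a, keeping the merge stable */
    while (i < na) out[k++] = a[i++];   /* drain whatever remains of a */
    while (j < nb) out[k++] = b[j++];   /* drain whatever remains of b */
}
```

Each item is read once and written once per pass, which is why the number of passes dominates the cost of an external sort.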