Today
• Review of Directory of Slots Block Organization
• Heap Files
• Program 1 Hints
• Ordered Files & Hash Files
• RAID
Directory of Slots Example
Heap Files
• Heap files are stored as unordered records
– the use of “heap” here is unrelated to the “free store” used for dynamic memory allocation
• The simplest organization
Heap File Example
block 1 of file
    Name             ID
    Jones, Jim       983
    Favor, Sue       123
    Mach, Chris      401
    Rodgers, Bill    616
    Paradise, Sal    231
block 2 of file
    Name             ID
    Smith, Mary      183
    Alm, Louis       14
    Link, Steve      93
    Patch, Linda     281
    Yost, Ned        819
block N of file
    Name             ID
    Ming, Yao        709
    Turing, Alan     933
Assuming N data blocks, R records per block, an I/O cost of D per block, and a “record processing time” of C per record, what are the costs of:
• inserting a record: 2D+C
• deleting a record given its RID (ignore reclaiming space): 2D+C
• scan: N(D+RC)
• search for “key” w/ equality selection: N(D+RC)/2
• search w/ range selection: N(D+RC)
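The heap file costs above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the same cost model (N data blocks, R records per block, D per block I/O, C per record processed); the struct name `HeapFileCosts` and its member names are hypothetical and not part of the course code.

```cpp
#include <cassert>

// Hypothetical helper for the slide's cost model.
// N = data blocks, R = records per block,
// D = cost of one block I/O, C = cost of processing one record.
struct HeapFileCosts {
    double N, R, D, C;

    // Read the target block, change one record, write the block back.
    double insert_record() const { return 2 * D + C; }
    double delete_by_rid() const { return 2 * D + C; }

    // Read every block and process every record in it.
    double scan() const {
        assert(N >= 0 && R >= 0);
        return N * (D + R * C);
    }

    // A key is unique, so on average half the file is read before a match.
    double equality_search_on_key() const { return scan() / 2; }

    // Qualifying records can be anywhere, so the whole file is read.
    double range_search() const { return scan(); }
};
```

For example, with the made-up values N = 1000, R = 100, D = 10 and C = 0.5, a scan costs 1000 × (10 + 100 × 0.5) = 60000 and an equality search on the key costs half of that.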
Prog 1 Hints
• check slot numbers to make sure they are valid
• sizeof(), memcpy(), memmove()
– man is your friend
• test patterns
• error reporting
– make sure you handle error cases correctly
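One reason memmove() is on the list next to memcpy(): when bytes are shifted within the same buffer, the source and destination ranges can overlap, and memcpy() has undefined behavior in that case. The sketch below illustrates the point with a hypothetical helper, `close_gap`, which is not part of the assignment's interface; it assumes records are packed at the end of a buffer, starting at `used_start`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: the used bytes occupy buf[used_start, end).
// Remove the gap_len bytes starting at gap_off by sliding the bytes in
// front of the gap toward the end of the buffer.
// Returns the new offset of the first used byte.
std::size_t close_gap(char* buf, std::size_t used_start,
                      std::size_t gap_off, std::size_t gap_len) {
    assert(gap_off >= used_start);
    // Source [used_start, gap_off) and destination
    // [used_start + gap_len, gap_off + gap_len) overlap whenever the
    // moved run is longer than the gap, so memmove is used, not memcpy.
    std::memmove(buf + used_start + gap_len,
                 buf + used_start,
                 gap_off - used_start);
    return used_start + gap_len;
}
```

With the buffer "....WXYZbbCC" and `used_start` = 4, closing the 2-byte gap "bb" at offset 8 moves "WXYZ" to offset 6, so the used region becomes "WXYZCC" starting at offset 6.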
class HFPage {
    struct slot_t {
        short offset;
        short length;    // equals EMPTY_SLOT if slot is not in use
    };

    static const int DPFIXED = sizeof(slot_t)
                             + 4 * sizeof(short)
                             + 3 * sizeof(PageId);

    short  slotCnt;      // number of slots in use
    short  usedPtr;      // offset of first used byte in data[]
    short  freeSpace;    // number of bytes free in data[]
    short  type;         // an arbitrary value used by subclasses as needed
    PageId prevPage;     // backward pointer to data page
    PageId nextPage;     // forward pointer to data page
    PageId curPage;      // page number of this page
    slot_t slot[1];      // first element of slot array.
    char   data[MAX_SPACE - DPFIXED];

    // methods ...
};
// **********************************************************
// page class constructor
void HFPage::init(PageId pageNo)
{
    nextPage = prevPage = INVALID_PAGE;
    slotCnt = 0;                // no slots in use
    curPage = pageNo;
    usedPtr = sizeof(data);     // offset of used space in data array
    freeSpace = sizeof(data) + sizeof(slot_t);  // amount of space available
                                                // (initially one unused slot)
}
• init()
• getNextPage(), setNextPage()
• getPrevPage(), setPrevPage()
• insertRecord(), deleteRecord()
• firstRecord(), nextRecord()
• getRecord(), returnRecord()
• available_space()
• empty()
int HFPage::available_space(void)
{
    // look for an empty slot. if one exists, then freeSpace
    // bytes are available to hold a record.
    int i;
    for (i = 0; i < slotCnt; i++) {
        if (slot[i].length == EMPTY_SLOT)
            return freeSpace;
    }

    // no empty slot exists. must reserve sizeof(slot_t) bytes
    // from freeSpace to hold new slot.
    return freeSpace - sizeof(slot_t);
}
Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct order.
– It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.
• A binary search can be used to search for a record on its ordering field value.
– This requires reading and searching about log2(N) of the N file blocks on average, an improvement over the roughly N/2 blocks read by a linear search.
• Reading the records in order of the ordering field is quite efficient.
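The binary search over blocks can be sketched as follows. This is an illustration only, assuming an in-memory stand-in for the file in which each inner vector is one non-empty "block" of integer keys and keys are sorted across the whole file; `Block`, `find_block`, and the `block_reads` counter are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an ordered file: one inner vector per block.
using Block = std::vector<int>;

// Returns the index of the block that contains key, or -1 if absent.
// block_reads is set to the number of blocks examined.
int find_block(const std::vector<Block>& file, int key, int& block_reads) {
    std::size_t lo = 0, hi = file.size();    // search window is [lo, hi)
    block_reads = 0;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        const Block& b = file[mid];
        assert(!b.empty());                  // assumption: no empty blocks
        ++block_reads;                       // one block "read" per probe
        if (key < b.front()) {
            hi = mid;                        // key is in an earlier block
        } else if (key > b.back()) {
            lo = mid + 1;                    // key is in a later block
        } else {
            // key lies within this block's range: search inside it
            for (int k : b) {
                if (k == key) return static_cast<int>(mid);
            }
            return -1;
        }
    }
    return -1;
}
```

With 8 blocks, a lookup examines at most 4 blocks, in line with the log2(N) estimate, whereas a linear search may read all 8.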
File of Ordered Records
Hashed Files
• Hashing for disk files is called External Hashing.
• The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
– Typically, a bucket corresponds to one (or a fixed number of) disk block.
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function.
• Search is very efficient on the hash key.
• Collisions occur when a new record hashes to a bucket that is already full.
– An overflow file is kept for storing such records.
– Overflow records that hash to each bucket can be linked together.
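A toy model of static external hashing with per-bucket overflow chains is sketched below. It assumes unsigned integer keys, h(K) = K mod M, and a fixed bucket capacity; the class `StaticHashFile` and its members are hypothetical, and everything is kept in memory rather than on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory model: M primary buckets of fixed capacity,
// plus one overflow chain per bucket for records that do not fit.
class StaticHashFile {
public:
    StaticHashFile(std::size_t m, std::size_t capacity)
        : capacity_(capacity), primary_(m), overflow_(m) {
        assert(m > 0 && capacity > 0);
    }

    // Assumed hashing function: h(K) = K mod M.
    std::size_t h(unsigned key) const { return key % primary_.size(); }

    void insert(unsigned key) {
        std::size_t i = h(key);
        if (primary_[i].size() < capacity_) {
            primary_[i].push_back(key);
        } else {
            overflow_[i].push_back(key);  // collision: bucket i is full
        }
    }

    // Look in the primary bucket first, then follow its overflow chain.
    bool contains(unsigned key) const {
        std::size_t i = h(key);
        for (unsigned k : primary_[i]) if (k == key) return true;
        for (unsigned k : overflow_[i]) if (k == key) return true;
        return false;
    }

    std::size_t overflow_count(std::size_t bucket) const {
        return overflow_[bucket].size();
    }

private:
    std::size_t capacity_;
    std::vector<std::vector<unsigned>> primary_;
    std::vector<std::vector<unsigned>> overflow_;
};
```

With M = 4 and a capacity of 2, inserting keys 1, 5 and 9 fills bucket 1 and sends 9 to that bucket's overflow chain; a search for 9 still only touches bucket 1 and its chain.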
Hashed Files (contd.)
• There are numerous methods for collision resolution, including the following:
– Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
– Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.
– Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
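Open addressing is the easiest of the three to show in a few lines. The sketch below uses linear probing and assumes non-negative integer keys, h(K) = K mod M, and -1 as the "empty" marker; `OpenAddressTable` and `kEmpty` are hypothetical names, and deletion is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // assumption: real keys are never negative

// Hypothetical open-addressing table with linear probing.
class OpenAddressTable {
public:
    explicit OpenAddressTable(std::size_t m) : slots_(m, kEmpty) {
        assert(m > 0);
    }

    // Returns the position where key ends up, or -1 if the table is full.
    long insert(int key) {
        assert(key >= 0);
        std::size_t m = slots_.size();
        std::size_t start = static_cast<std::size_t>(key) % m;
        for (std::size_t step = 0; step < m; ++step) {
            std::size_t pos = (start + step) % m;  // next position in order
            if (slots_[pos] == kEmpty || slots_[pos] == key) {
                slots_[pos] = key;
                return static_cast<long>(pos);
            }
        }
        return -1;
    }

    // Probes from the hash address until the key or an empty slot is found.
    long find(int key) const {
        assert(key >= 0);
        std::size_t m = slots_.size();
        std::size_t start = static_cast<std::size_t>(key) % m;
        for (std::size_t step = 0; step < m; ++step) {
            std::size_t pos = (start + step) % m;
            if (slots_[pos] == key) return static_cast<long>(pos);
            if (slots_[pos] == kEmpty) return -1;  // key cannot be further on
        }
        return -1;
    }

private:
    std::vector<int> slots_;
};
```

With M = 7, keys 10, 17 and 24 all hash to position 3, so they land in positions 3, 4 and 5 in insertion order.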
Hashed Files (contd.)
• To reduce overflow records, a hash file is typically kept 70-80% full.
• The hash function h should distribute the records uniformly among the buckets.
– Otherwise, search time will be increased because many overflow records will exist.
• Main disadvantages of static external hashing:
– Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.
– Ordered access on the hash key is quite inefficient (requires sorting the records).
Hashed Files - Overflow handling
Fill in This Table
            scan    equality search    range search    insert    delete
Heap
Sorted
Hashed