
Today

• Review of Directory of Slot Block Organizations

• Heap Files

• Program 1 Hints

• Ordered Files & Hash Files

• RAID

Directory of Slots Example

Heap Files

• Heap files are stored as unordered records

– the use of “heap” here is unrelated to the “free store” used for dynamic memory allocation

• The simplest organization

Heap File Example

block 1 of file

Name            ID
Jones, Jim      983
Favor, Sue      123
Mach, Chris     401
Rodgers, Bill   616
Paradise, Sal   231

block 2 of file

Name            ID
Smith, Mary     183
Alm, Louis      14
Link, Steve     93
Patch, Linda    281
Yost, Ned       819

block N of file

Name            ID
Ming, Yao       709
Turing, Alan    933

Assuming N data blocks, R records per block, I/O cost of D and “record processing time” of C, what are the costs of:

• inserting a record: 2D+C

• deleting a record given its RID (ignore reclaiming space): 2D+C

• scan: N(D+RC)

• search for “key” w/ equality selection: N(D+RC)/2

• search w/ range selection: N(D+RC)
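The heap file cost formulas above can be written down as a small calculator. This is only a sketch: the struct name HeapFileCosts and its method names are made up for illustration, and the units of D and C are whatever the caller chooses.

```cpp
// Heap file cost model, using the slide's symbols:
//   N = number of data blocks, R = records per block,
//   D = cost of one block I/O, C = cost of processing one record.
// HeapFileCosts is a hypothetical helper, not part of any course code.
struct HeapFileCosts {
    double N, R, D, C;

    // Read the target block, add the record, write the block back.
    double insert() const { return 2 * D + C; }

    // Given the RID: read the block, remove the record, write it back.
    double deleteByRid() const { return 2 * D + C; }

    // Read every block and process every record in it.
    double scan() const { return N * (D + R * C); }

    // Unique key that is present: on average half the file is examined.
    double equalitySearch() const { return N * (D + R * C) / 2; }

    // Matches may be anywhere in an unordered file, so read it all.
    double rangeSearch() const { return N * (D + R * C); }
};
```

For example, with N = 100, R = 10, D = 10 and C = 1 (arbitrary units), a scan costs 2000, an equality search 1000 on average, and an insert 21.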

Prog 1 Hints

• check slot numbers to make sure they are valid

• sizeof(), memcpy(), memmove()

– man is your friend

• test patterns

– make sure you handle error cases correctly

• error reporting

class HFPage {

    struct slot_t {
        short offset;
        short length;    // equals EMPTY_SLOT if slot is not in use
    };

    static const int DPFIXED = sizeof(slot_t) + 4 * sizeof(short) + 3 * sizeof(PageId);

    short   slotCnt;     // number of slots in use
    short   usedPtr;     // offset of first used byte in data[]
    short   freeSpace;   // number of bytes free in data[]
    short   type;        // an arbitrary value used by subclasses as needed

    PageId  prevPage;    // backward pointer to data page
    PageId  nextPage;    // forward pointer to data page
    PageId  curPage;     // page number of this page

    slot_t  slot[1];     // first element of slot array.

    char    data[MAX_SPACE - DPFIXED];

    // methods ...
};

// **********************************************************
// page class constructor

void HFPage::init(PageId pageNo)
{
    nextPage = prevPage = INVALID_PAGE;

    slotCnt = 0;                 // no slots in use
    curPage = pageNo;

    usedPtr = sizeof(data);      // offset of used space in data array

    freeSpace = sizeof(data) + sizeof(slot_t);   // amount of space available
                                                 // (initially one unused slot)
}

• init()

• getNextPage(), setNextPage()

• getPrevPage(), setPrevPage()

• insertRecord(), deleteRecord()

• firstRecord(), nextRecord()

• getRecord(), returnRecord()

• available_space()

• empty()

int HFPage::available_space(void)
{
    // look for an empty slot. if one exists, then freeSpace
    // bytes are available to hold a record.

    int i;
    for (i = 0; i < slotCnt; i++) {
        if (slot[i].length == EMPTY_SLOT)
            return freeSpace;
    }

    // no empty slot exists. must reserve sizeof(slot_t) bytes
    // from freeSpace to hold new slot.

    return freeSpace - sizeof(slot_t);
}
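To make the hints concrete (validate slot numbers, memcpy() for copying records, memmove() when source and destination overlap), here is a deliberately simplified slotted page. MiniPage and everything in it are hypothetical: it uses a fixed-size slot array and a tiny page, so it is not the HFPage layout or interface, only a sketch of the same ideas.

```cpp
#include <cstring>

// Hypothetical, simplified slotted page (NOT the HFPage interface).
// Records are packed at the high end of data[] and grow downward.
struct MiniPage {
    static constexpr int   PAGE_BYTES = 64;
    static constexpr int   MAX_SLOTS  = 8;
    static constexpr short EMPTY      = -1;   // marks an unused slot

    struct Slot { short offset; short length; };

    Slot  slot[MAX_SLOTS];
    short slotCnt;            // number of slots handed out so far
    short usedPtr;            // offset of first used byte in data[]
    char  data[PAGE_BYTES];

    void init() { slotCnt = 0; usedPtr = PAGE_BYTES; }

    // Hint 1: every slot number coming from a caller is checked.
    bool validSlot(int s) const {
        return s >= 0 && s < slotCnt && slot[s].length != EMPTY;
    }

    // Returns the slot number used, or -1 on error / no room.
    int insertRecord(const char* rec, int len) {
        if (rec == nullptr || len <= 0 || len > usedPtr) return -1;
        int s = 0;
        while (s < slotCnt && slot[s].length != EMPTY) ++s;  // reuse an empty slot
        if (s == MAX_SLOTS) return -1;
        usedPtr -= len;
        std::memcpy(data + usedPtr, rec, len);   // regions cannot overlap
        slot[s].offset = usedPtr;
        slot[s].length = static_cast<short>(len);
        if (s == slotCnt) ++slotCnt;
        return s;
    }

    bool getRecord(int s, char* out, int outCap, int& len) const {
        if (!validSlot(s) || slot[s].length > outCap) return false;
        len = slot[s].length;
        std::memcpy(out, data + slot[s].offset, len);
        return true;
    }

    // Deletes the record and compacts the data area.
    bool deleteRecord(int s) {
        if (!validSlot(s)) return false;
        short off = slot[s].offset, len = slot[s].length;
        // Records stored below 'off' slide up by 'len' bytes; the ranges
        // overlap, so memmove() is required here rather than memcpy().
        std::memmove(data + usedPtr + len, data + usedPtr, off - usedPtr);
        for (int i = 0; i < slotCnt; ++i)
            if (slot[i].length != EMPTY && slot[i].offset < off)
                slot[i].offset += len;
        usedPtr += len;
        slot[s].length = EMPTY;
        return true;
    }
};
```

A useful test pattern against a page like this: insert two records, delete the first, then check that the second still reads back intact and that deleting the same slot again (or an out-of-range slot) is reported as an error.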

Ordered Files

• Also called a sequential file.

• File records are kept sorted by the values of an ordering field.

• Insertion is expensive: records must be inserted in the correct order.

– It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file.

• A binary search can be used to search for a record on its ordering field value.

– This requires reading and searching about log2(b) of the file's b blocks on average, an improvement over linear search.

• Reading the records in order of the ordering field is quite efficient.
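A binary search over the blocks of an ordered file might look like the sketch below. The file is modeled in memory as a vector of blocks, each block a sorted vector of integer keys; Block, SearchResult and searchOrderedFile are illustrative names, and every block is assumed to be non-empty. Each loop iteration stands for one block read.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory stand-in for an ordered file: one inner vector
// per block, keys sorted within each block and across blocks.
// Assumption: no block is empty.
using Block = std::vector<int>;

struct SearchResult {
    bool found;
    int  blocksRead;   // number of simulated block reads
};

SearchResult searchOrderedFile(const std::vector<Block>& file, int key) {
    SearchResult r{false, 0};
    std::size_t lo = 0, hi = file.size();      // candidate blocks: [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        const Block& b = file[mid];            // simulates reading block mid
        ++r.blocksRead;
        if (key < b.front()) {
            hi = mid;                          // key can only be in an earlier block
        } else if (key > b.back()) {
            lo = mid + 1;                      // key can only be in a later block
        } else {
            for (int k : b)                    // key falls in this block's range
                if (k == key) r.found = true;
            break;
        }
    }
    return r;
}
```

With 8 blocks, a search reads at most 4 of them, where a linear scan could read all 8.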

File of Ordered Records

Hashed Files

• Hashing for disk files is called External Hashing.

• The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1.

– Typically, a bucket corresponds to one disk block (or a fixed number of disk blocks).

• One of the file fields is designated to be the hash key of the file.

• The record with hash key value K is stored in bucket i, where i=h(K), and h is the hashing function.

• Search is very efficient on the hash key.

• Collisions occur when a new record hashes to a bucket that is already full.

– An overflow file is kept for storing such records.

– Overflow records that hash to each bucket can be linked together.
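The bucket-plus-overflow scheme described above can be sketched as follows. Everything here is an assumption made for illustration: the class name StaticHashFile, unsigned integer keys, the hash function h(K) = K mod M, and in-memory vectors standing in for disk buckets and for the overflow records linked to each bucket. M must be greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Sketch of static external hashing: M fixed-capacity buckets, each with
// its own list of overflow records. Names and h(K) = K mod M are
// illustrative assumptions, not a specific system's design.
class StaticHashFile {
public:
    // Precondition: m > 0.
    StaticHashFile(std::size_t m, std::size_t bucketCapacity)
        : cap_(bucketCapacity), buckets_(m), overflow_(m) {}

    std::size_t h(unsigned key) const { return key % buckets_.size(); }

    void insert(unsigned key) {
        std::size_t i = h(key);
        if (buckets_[i].size() < cap_)
            buckets_[i].push_back(key);
        else
            overflow_[i].push_back(key);   // collision: bucket i is already full
    }

    // Search touches only bucket h(key) and that bucket's overflow records.
    bool contains(unsigned key) const {
        std::size_t i = h(key);
        for (unsigned k : buckets_[i])  if (k == key) return true;
        for (unsigned k : overflow_[i]) if (k == key) return true;
        return false;
    }

    std::size_t overflowCount(std::size_t i) const { return overflow_[i].size(); }

private:
    std::size_t cap_;
    std::vector<std::vector<unsigned>> buckets_;
    std::vector<std::vector<unsigned>> overflow_;
};
```

With M = 4 and two records per bucket, inserting keys 1, 5 and 9 sends all three to bucket 1, so the third lands in that bucket's overflow list.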

Hashed Files (contd.)

• There are numerous methods for collision resolution, including the following:

– Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.

– Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location.

– Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
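Open addressing, as described above, can be sketched with linear probing: start at the hash address and step through the following positions, wrapping around, until an empty one turns up. The class name OpenAddressTable, the hash function key mod table size, and the one-record-per-position layout are assumptions for this example; deletion is left out because it needs extra bookkeeping.

```cpp
#include <cstddef>
#include <vector>

// Linear-probing sketch of open addressing. Illustrative only:
// h(key) = key mod table size, one record per position, no deletion.
class OpenAddressTable {
public:
    explicit OpenAddressTable(std::size_t size)
        : used_(size, false), keys_(size, 0) {}

    // Returns the position where the key was stored, or -1 if the table is full.
    long insert(unsigned key) {
        const std::size_t n = keys_.size();
        for (std::size_t step = 0; step < n; ++step) {
            std::size_t pos = (key % n + step) % n;   // hash address, then successors
            if (!used_[pos]) {
                used_[pos] = true;
                keys_[pos] = key;
                return static_cast<long>(pos);
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty position ends the search.
    bool contains(unsigned key) const {
        const std::size_t n = keys_.size();
        for (std::size_t step = 0; step < n; ++step) {
            std::size_t pos = (key % n + step) % n;
            if (!used_[pos]) return false;
            if (keys_[pos] == key) return true;
        }
        return false;
    }

private:
    std::vector<bool>     used_;
    std::vector<unsigned> keys_;
};
```

In a table of size 5, keys 7, 12, 2 and 17 all hash to position 2; they end up in positions 2, 3, 4 and 0 respectively.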

Hashed Files (contd.)

• To reduce overflow records, a hash file is typically kept 70-80% full.

• The hash function h should distribute the records uniformly among the buckets

– Otherwise, search time will be increased because many overflow records will exist.

• Main disadvantages of static external hashing:

– Fixed number of buckets M is a problem if the number of records in the file grows or shrinks.

– Ordered access on the hash key is quite inefficient (requires sorting the records).
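As a rough worked example of the 70-80% guideline, the number of buckets M can be chosen from the expected record count. The function name bucketsForLoad and its parameters are hypothetical; it uses integer arithmetic and rounds up.

```cpp
#include <cstdint>

// Smallest bucket count M such that 'records' records fill a file of
// M buckets (each holding 'recordsPerBucket' records) to at most
// 'loadPercent' percent. Hypothetical helper for illustration.
std::uint64_t bucketsForLoad(std::uint64_t records,
                             std::uint64_t recordsPerBucket,
                             std::uint64_t loadPercent) {
    // Usable record slots per bucket, scaled by 100 to stay in integers.
    std::uint64_t usableTimes100 = recordsPerBucket * loadPercent;
    if (usableTimes100 == 0) return 0;               // degenerate input
    // Ceiling division: round M up so the target load is not exceeded.
    return (records * 100 + usableTimes100 - 1) / usableTimes100;
}
```

For 10,000 records at 10 records per bucket, this gives M = 1250 at 80% full and M = 1429 at 70% full. Because M is fixed in static hashing, the estimate has to hold up as the file grows or shrinks.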

Hashed Files - Overflow handling

Fill in This Table

                    Heap    Sorted    Hashed
scan
equality search
range search
insert
delete