Ch7.2-5, Separate Chaining, Open addressing, Rehashing


§3 Separate Chaining
---- keep a list of all keys that hash to the same value
struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;
struct ListNode {
    ElementType Element;
    Position Next;
};
typedef Position List;
/* List *TheLists will be an array of lists, allocated later */
/* The lists use headers (for simplicity), */
/* though this wastes space */
struct HashTbl {
    int TableSize;
    List *TheLists;
};

Create an empty table
HashTable InitializeTable( int TableSize )
{
    HashTable H;
    int i;
    if ( TableSize < MinTableSize ) {
        Error( "Table size too small" ); return NULL;
    }
    H = malloc( sizeof( struct HashTbl ) ); /* Allocate table */
    if ( H == NULL ) FatalError( "Out of space!!!" );
    H->TableSize = NextPrime( TableSize ); /* Better be prime */
    H->TheLists = malloc( sizeof( List ) * H->TableSize ); /* Array of lists */
    if ( H->TheLists == NULL ) FatalError( "Out of space!!!" );
    for( i = 0; i < H->TableSize; i++ ) { /* Allocate list headers */
        H->TheLists[ i ] = malloc( sizeof( struct ListNode ) ); /* Slow! */
        if ( H->TheLists[ i ] == NULL ) FatalError( "Out of space!!!" );
        else H->TheLists[ i ]->Next = NULL;
    }
    return H;
}
[Figure: H points to a structure holding TableSize and TheLists, the array of list headers.]

Find a key from a hash table
Position Find ( ElementType Key, HashTable H )
{
    Position P;
    List L;
    L = H->TheLists[ Hash( Key, H->TableSize ) ]; /* Hash is your hash function */
    P = L->Next;
    while( P != NULL && P->Element != Key ) /* Probably need strcmp */
        P = P->Next;
    return P;
}
Identical to the code to perform a Find for general lists -- List ADT

Insert a key into a hash table
void Insert ( ElementType Key, HashTable H )
{
    Position Pos, NewCell;
    List L;
    Pos = Find( Key, H );
    if ( Pos == NULL ) { /* Key is not found, then insert */
        NewCell = malloc( sizeof( struct ListNode ) );
        if ( NewCell == NULL ) FatalError( "Out of space!!!" );
        else {
            /* Again?! Find has already computed this hash value */
            L = H->TheLists[ Hash( Key, H->TableSize ) ];
            NewCell->Next = L->Next;
            NewCell->Element = Key; /* Probably need strcpy! */
            L->Next = NewCell;
        }
    }
}

Tip: Make the TableSize about as large as the number of keys expected (i.e. to make the loading density factor λ ≈ 1).
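The "Probably need strcmp" and "Probably need strcpy" remarks can be made concrete with a sketch for string keys. This is only one possible version under stated assumptions: keys are C strings, the hash is a simple shift-and-add function chosen for illustration (the slides leave Hash unspecified), the caller passes a prime table size, and allocation failure in a node simply aborts. Insert walks the chain itself, so the hash is computed once rather than twice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ListNode { char *Element; struct ListNode *Next; };
typedef struct ListNode *Position;
typedef Position List;
struct HashTbl { int TableSize; List *TheLists; };
typedef struct HashTbl *HashTable;

/* Assumed hash function: shift-and-add over the characters of the key. */
static unsigned int Hash(const char *Key, int TableSize)
{
    unsigned int HashVal = 0;
    while (*Key != '\0')
        HashVal = (HashVal << 5) + (unsigned char)*Key++;
    return HashVal % (unsigned int)TableSize;
}

HashTable InitializeTable(int TableSize)   /* caller is assumed to pass a prime */
{
    assert(TableSize > 0);
    HashTable H = malloc(sizeof *H);
    if (H == NULL) return NULL;
    H->TableSize = TableSize;
    H->TheLists = malloc(sizeof(List) * TableSize);
    if (H->TheLists == NULL) { free(H); return NULL; }
    for (int i = 0; i < TableSize; i++) {
        H->TheLists[i] = malloc(sizeof(struct ListNode));   /* header node */
        if (H->TheLists[i] == NULL) abort();
        H->TheLists[i]->Next = NULL;
    }
    return H;
}

Position Find(const char *Key, HashTable H)
{
    Position P = H->TheLists[Hash(Key, H->TableSize)]->Next;
    while (P != NULL && strcmp(P->Element, Key) != 0)   /* compare contents */
        P = P->Next;
    return P;
}

void Insert(const char *Key, HashTable H)
{
    List L = H->TheLists[Hash(Key, H->TableSize)];   /* hash computed once */
    Position P = L->Next;
    while (P != NULL && strcmp(P->Element, Key) != 0)
        P = P->Next;
    if (P != NULL) return;                           /* key already present */
    Position NewCell = malloc(sizeof *NewCell);
    if (NewCell == NULL) abort();
    NewCell->Element = malloc(strlen(Key) + 1);
    if (NewCell->Element == NULL) abort();
    strcpy(NewCell->Element, Key);                   /* copy, do not alias */
    NewCell->Next = L->Next;                         /* insert at the front */
    L->Next = NewCell;
}
```

Copying the key means the table stays valid even if the caller's buffer is later reused.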
§4 Open Addressing
---- find another empty cell to solve collision (avoiding pointers)
Algorithm: insert key into an array of hash table
{
    index = hash(key);
    initialize i = 0 ------ the counter of probing;
    while ( collision at index ) {
        /* f is the collision resolving function; f(0) = 0 */
        index = ( hash(key) + f(i) ) % TableSize;
        if ( table is full ) break;
        else i ++;
    }
    if ( table is full )
        ERROR ("No space left");
    else
        insert key at index;
}
Tip: Generally λ < 0.5.
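The insertion algorithm can be sketched in C with the collision resolving function passed in as a parameter. The assumptions here are not from the slides: keys are non-negative ints, hash(key) is key % TableSize, an empty cell holds the sentinel EMPTY, and "table is full" is approximated by giving up after TableSize probes. The names ProbeInsert, ProbeFn, LinearF and QuadraticF are hypothetical.

```c
#include <assert.h>

#define EMPTY (-1)   /* assumed sentinel for an unused cell */

typedef int (*ProbeFn)(int i);   /* collision resolving function, f(0) = 0 */

int LinearF(int i)    { return i; }
int QuadraticF(int i) { return i * i; }

/* Insert key into table[0..size-1]. Returns the index used, or -1 when no
   free cell is found within size probes (treated as "table is full"). */
int ProbeInsert(int table[], int size, int key, ProbeFn f)
{
    assert(size > 0 && key >= 0);
    int home = key % size;
    for (int i = 0; i < size; i++) {
        int index = (home + f(i) % size) % size;
        if (table[index] == EMPTY || table[index] == key) {
            table[index] = key;
            return index;
        }
    }
    return -1;
}
```

With a table of size 7, the keys 10, 17 and 24 all hash to cell 3; LinearF places them in cells 3, 4, 5, while QuadraticF places them in cells 3, 4, 0.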
1. Linear Probing
f ( i ) = i ; /* a linear function */
〖Example〗 Mapping n = 11 C library functions into a hash table ht[ ] with b = 26 buckets and s = 1.
Keys: acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime

bucket   x        search time
0        acos     1
1        atoi     2
2        char     1
3        define   1
4        exp      1
5        ceil     4
6        cos      5
7        float    3
8        atol     9
9        floor    5
10       ctime    9
……
25

Loading density λ = 11 / 26 = 0.42
Average search time = 41 / 11 = 3.73
Analysis of linear probing shows that the expected number of probes p is
$\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$ for insertions and unsuccessful searches
$\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$ for successful searches = 1.36
Although p is small, the worst case can be LARGE.
Linear probing causes primary clustering: any key that hashes into the cluster will add to the cluster after several attempts to resolve the collision.
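The search times in the example can be reproduced with a short C sketch. It assumes the hash function is the position of the key's first letter in the alphabet (the slide does not state it, but this choice matches every entry in the table) and that the keys are inserted in the order acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime. HashFirstLetter, LinearInsert and BuildExample are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define B 26   /* number of buckets, one slot each (s = 1) */

/* Assumed hash: index of the first letter, 'a' -> 0 ... 'z' -> 25. */
static int HashFirstLetter(const char *key) { return key[0] - 'a'; }

/* Insert key with linear probing. Returns the number of probes used, which
   is also the key's later successful search time; 0 means the table is full. */
int LinearInsert(const char *ht[B], const char *key)
{
    int index = HashFirstLetter(key);
    assert(index >= 0 && index < B);
    for (int probes = 1; probes <= B; probes++) {
        if (ht[index] == NULL) { ht[index] = key; return probes; }
        index = (index + 1) % B;   /* f(i) = i: try the next cell */
    }
    return 0;
}

/* Clear the table, insert the 11 keys, and return the total probe count. */
int BuildExample(const char *ht[B])
{
    static const char *keys[] = { "acos", "atoi", "char", "define", "exp",
                                  "ceil", "cos", "float", "atol", "floor",
                                  "ctime" };
    int total = 0;
    for (int i = 0; i < B; i++) ht[i] = NULL;
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
        total += LinearInsert(ht, keys[i]);
    return total;
}
```

Under these assumptions BuildExample returns 41, the numerator of the average search time 41 / 11, and atol ends up in bucket 8 after 9 probes even though it hashes to bucket 0.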
2. Quadratic Probing
f ( i ) = i^2 ; /* a quadratic function */
【Theorem】If quadratic probing is used, and the table size is prime, then a new element can always be inserted if the table is at least half empty.
Proof: Just prove that the first ⌊TableSize/2⌋ alternative locations are all distinct. That is, for any 0 < i ≠ j ≤ ⌊TableSize/2⌋, we have
$(h(x) + i^2) \bmod \text{TableSize} \neq (h(x) + j^2) \bmod \text{TableSize}$
Suppose: $h(x) + i^2 \equiv h(x) + j^2 \pmod{\text{TableSize}}$
then: $i^2 \equiv j^2 \pmod{\text{TableSize}}$
$(i + j)(i - j) \equiv 0 \pmod{\text{TableSize}}$
TableSize is prime, so either ( i + j ) or ( i - j ) is divisible by TableSize. But 0 < i + j < TableSize and 0 < | i - j | < TableSize, so neither can be. Contradiction!
For any x, it has ⌈TableSize/2⌉ distinct locations into which it can go. If at most ⌊TableSize/2⌋ positions are taken, then an empty spot can always be found.
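The distinctness claim in the proof can be checked numerically. The sketch below is only an illustration, not part of the proof: it marks the cells (home + i^2) % size for i = 0 .. size/2 and reports whether any cell repeats. FirstHalfDistinct is a hypothetical name, and the 1024-cell limit is an arbitrary choice to keep the example small.

```c
#include <assert.h>

/* Returns 1 if the probes (home + i*i) % size for i = 0 .. size/2 all land
   in different cells, 0 if some cell is visited twice. */
int FirstHalfDistinct(int size, int home)
{
    assert(size > 0 && size <= 1024 && home >= 0 && home < size);
    char seen[1024] = { 0 };
    for (int i = 0; i <= size / 2; i++) {
        int index = (home + i * i) % size;
        if (seen[index]) return 0;   /* repeated location */
        seen[index] = 1;
    }
    return 1;
}
```

For prime sizes such as 11 or 101 the function returns 1. For size 16 it returns 0, because i = 4 gives 16 % 16 = 0 and lands back on the home cell, which shows why the theorem requires a prime table size.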
Note: If the table size is a prime of the form 4k + 3, then the quadratic probing f(i) = ± i^2 can probe the entire table.
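The note on primes of the form 4k + 3 can be tried out with a small counting sketch. It visits home + i^2 and home - i^2 (mod size) for i = 0 .. size/2 and counts how many different cells are reached. CountProbedCells is a hypothetical name and the 1024-cell limit is arbitrary.

```c
#include <assert.h>

/* Count the distinct cells reached by the probe sequence
   home, home + 1, home - 1, home + 4, home - 4, ... (f(i) = +/- i*i). */
int CountProbedCells(int size, int home)
{
    assert(size > 0 && size <= 1024 && home >= 0 && home < size);
    char seen[1024] = { 0 };
    int count = 0;
    for (int i = 0; i <= size / 2; i++) {
        int sq = (i * i) % size;
        int plus = (home + sq) % size;
        int minus = (home - sq + size) % size;
        if (!seen[plus])  { seen[plus] = 1;  count++; }
        if (!seen[minus]) { seen[minus] = 1; count++; }
    }
    return count;
}
```

For size 11 (= 4*2 + 3) and size 19 (= 4*4 + 3) every cell is reached. For size 13 (= 4*3 + 1) only 7 of the 13 cells are reached, since the negated squares fall on the same cells as the squares.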
Read Figures 7.15 - 7.16 for detailed
representations and implementations of initialization.
Position Find ( ElementType Key, HashTable H )
{
    Position CurrentPos;
    int CollisionNum;
    CollisionNum = 0;
    CurrentPos = Hash( Key, H->TableSize );
    while( H->TheCells[ CurrentPos ].Info != Empty &&
           H->TheCells[ CurrentPos ].Element != Key ) {
        /* f(i) = f(i-1) + 2i - 1, where 2* is really a bit shift */
        CurrentPos += 2 * ++CollisionNum - 1;
        /* Faster than mod */
        if ( CurrentPos >= H->TableSize ) CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}
Question: What if these two conditions are switched?
Question: What is returned?
void Insert ( ElementType Key, HashTable H )
{
    Position Pos;
    Pos = Find( Key, H );
    if ( H->TheCells[ Pos ].Info != Legitimate ) { /* OK to insert here */
        H->TheCells[ Pos ].Info = Legitimate;
        H->TheCells[ Pos ].Element = Key; /* Probably need strcpy */
    }
}
Question: How to delete a key?
Note:
(1) Insertion will be seriously slowed down if there are too many deletions intermixed with insertions.
(2) Although primary clustering is solved, secondary clustering occurs: keys that hash to the same position will probe the same alternative cells.
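One common answer to the deletion question is lazy deletion: the cell is only marked as deleted, so probe sequences that pass through it still work. The sketch below is one possible version, not necessarily the one in the referenced figures. It assumes non-negative int keys, Hash = Key % TableSize, a prime TableSize, a third Info state named Deleted, and a table that always keeps at least half of its cells Empty (otherwise Find may not terminate). CreateTable, Delete and Contains are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

typedef int ElementType;   /* assumed: non-negative integer keys */
typedef int Position;      /* assumed: a cell index */
enum KindOfEntry { Legitimate, Empty, Deleted };
struct HashEntry { ElementType Element; enum KindOfEntry Info; };
struct HashTbl { int TableSize; struct HashEntry *TheCells; };
typedef struct HashTbl *HashTable;

HashTable CreateTable(int TableSize)   /* hypothetical; size assumed prime */
{
    assert(TableSize > 0);
    HashTable H = malloc(sizeof *H);
    if (H == NULL) return NULL;
    H->TableSize = TableSize;
    H->TheCells = malloc(sizeof *H->TheCells * TableSize);
    if (H->TheCells == NULL) { free(H); return NULL; }
    for (int i = 0; i < TableSize; i++) H->TheCells[i].Info = Empty;
    return H;
}

/* Quadratic probing: stops at an Empty cell or at the key itself.
   Deleted cells holding other keys are probed past, not reused. */
Position Find(ElementType Key, HashTable H)
{
    int CollisionNum = 0;
    Position CurrentPos = Key % H->TableSize;
    while (H->TheCells[CurrentPos].Info != Empty &&
           H->TheCells[CurrentPos].Element != Key) {
        CurrentPos += 2 * ++CollisionNum - 1;
        if (CurrentPos >= H->TableSize) CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}

void Insert(ElementType Key, HashTable H)
{
    Position Pos = Find(Key, H);
    if (H->TheCells[Pos].Info != Legitimate) {
        H->TheCells[Pos].Info = Legitimate;
        H->TheCells[Pos].Element = Key;
    }
}

void Delete(ElementType Key, HashTable H)
{
    Position Pos = Find(Key, H);
    if (H->TheCells[Pos].Info == Legitimate)
        H->TheCells[Pos].Info = Deleted;   /* mark only; keep the cell occupied */
}

int Contains(ElementType Key, HashTable H)
{
    return H->TheCells[Find(Key, H)].Info == Legitimate;
}
```

Because deleted cells still count as occupied during probing, many deletions lengthen probe sequences, which is the slowdown the first note warns about.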
3. Double Hashing
f ( i ) = i * hash2( x ); /* hash2( x ) is the 2nd hash function */
hash2( x ) ≠ 0; make sure that all cells can be probed.
Tip: hash2( x ) = R - ( x % R ) with R a prime smaller than TableSize, will work well.
Note:
(1) If double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy.
(2) Quadratic probing does not require the use of a second hash function and is thus likely to be simpler and faster in practice.
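Double hashing with the suggested hash2 can be sketched as follows. The assumptions are not from the slides: keys are non-negative ints, the first hash is x % TableSize, EMPTY marks an unused cell, and the search gives up after TableSize probes. Hash2 and DoubleHashInsert are hypothetical names.

```c
#include <assert.h>

#define EMPTY (-1)   /* assumed sentinel for an unused cell */

/* hash2(x) = R - (x % R): always in the range 1 .. R, so never 0. */
int Hash2(int x, int R) { return R - (x % R); }

/* Insert x using f(i) = i * hash2(x). Returns the index used, or -1 if no
   free cell is found within size probes. */
int DoubleHashInsert(int table[], int size, int R, int x)
{
    assert(size > 0 && R > 0 && R < size && x >= 0);
    int step = Hash2(x, R);
    int index = x % size;
    for (int i = 0; i < size; i++) {
        if (table[index] == EMPTY || table[index] == x) {
            table[index] = x;
            return index;
        }
        index = (index + step) % size;   /* next probe: one more multiple of hash2 */
    }
    return -1;
}
```

With TableSize = 11 and R = 7, the keys 3, 14 and 25 all hash to cell 3 but use steps 4, 7 and 3, so 14 and 25 go to cells 10 and 6 instead of following one shared probe sequence. When TableSize is prime, any step between 1 and R shares no factor with it, so the sequence eventually reaches every cell.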
§5 Rehashing
"Oh come on! Haven't we had enough hashing methods? Why do we need rehashing?"
"Because I enjoy giving you headaches … Just kidding. Say, which probing method do you like?"
"Practically speaking I would prefer to use quadratic hashing… What, anything wrong with it?"
"What will happen if the table is more than half full?"
"Uhhhh… insertion might fail."
"Then what can we do?"
- Build another table that is about twice as big;
- Scan down the entire original hash table for non-deleted elements;
- Use a new function to hash those elements into the new table.
If there are N keys in the table, then T (N) = O(N)
Question: When to rehash?
Answer:
(1) As soon as the table is half full
(2) When an insertion fails
(3) When the table reaches a certain load factor
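The three rehashing steps can be sketched in C as below. This is a simplified illustration rather than the textbook implementation: it assumes non-negative int keys, Key % TableSize as the hash (so the "new function" is the same formula with the new TableSize), linear probing to keep the example short, and a table that always has at least one Empty cell. IsPrime and the Deleted state are assumptions added for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

enum KindOfEntry { Legitimate, Empty, Deleted };
struct HashEntry { int Element; enum KindOfEntry Info; };
struct HashTbl { int TableSize; struct HashEntry *TheCells; };
typedef struct HashTbl *HashTable;

static int IsPrime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

static int NextPrime(int n) { while (!IsPrime(n)) n++; return n; }

HashTable InitializeTable(int TableSize)
{
    HashTable H = malloc(sizeof *H);
    if (H == NULL) return NULL;
    H->TableSize = NextPrime(TableSize);
    H->TheCells = malloc(sizeof *H->TheCells * H->TableSize);
    if (H->TheCells == NULL) { free(H); return NULL; }
    for (int i = 0; i < H->TableSize; i++) H->TheCells[i].Info = Empty;
    return H;
}

/* Linear probing insert; assumes at least one Empty cell remains. */
void Insert(int Key, HashTable H)
{
    int Pos = Key % H->TableSize;
    while (H->TheCells[Pos].Info != Empty && H->TheCells[Pos].Element != Key)
        Pos = (Pos + 1) % H->TableSize;
    H->TheCells[Pos].Element = Key;
    H->TheCells[Pos].Info = Legitimate;
}

/* Build a table about twice as big, scan the old cells, and re-insert the
   non-deleted elements. Returns the new table (or the old one if out of space). */
HashTable Rehash(HashTable H)
{
    assert(H != NULL);
    HashTable NewH = InitializeTable(2 * H->TableSize);
    if (NewH == NULL) return H;
    for (int i = 0; i < H->TableSize; i++)
        if (H->TheCells[i].Info == Legitimate)
            Insert(H->TheCells[i].Element, NewH);
    free(H->TheCells);
    free(H);
    return NewH;
}
```

Starting from a table of size 7, Rehash moves to size 17, the first prime at or above 14. Cells marked Deleted are dropped during the scan, so rehashing also clears out the leftovers of lazy deletion.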
Note: Usually there should have been N/2 insertions before rehash, so O(N) rehash only adds a constant cost to each insertion. However, in an interactive system, the unfortunate user whose insertion caused a rehash could see a slowdown.
Read Figure 7.23 for detailed implementation of rehashing.
Home work:
p.259 7.1 & 7.2
A test case for all kinds of hashing
Laboratory Project 3
Jumping the Queue
Due: Thursday, November 2nd, 2006 at 10:00pm
Detailed requirements can be downloaded from
http://10.71.45.99/list.asp?boardid=47
Courseware Download
Don't forget to sign your names and duties at the end of your report.