Hashing - Simpson College


Mark Allen Weiss: Data Structures and Algorithm Analysis in Java
Chapter 5: Hashing
Hash Tables
Lydia Sinapova, Simpson College
Hashing
• What is Hashing?
• Direct Access Tables
• Hash Tables
2
Hashing - basic idea
• A mapping between search keys and array indices that allows efficient searching in an array.
• Ideally, each element is found with a single operation.
3
Hashing example
Example:
• 1000 students, each with an identification number between 0 and 999: use an array of 1000 elements.
• If the key is instead each student's SSN, a 9-digit number, direct indexing needs an array with far more elements than there are students (up to 10^9 slots for 1000 records): a great waste of space.
4
The Approach
• Directly reference records in a table by using arithmetic operations on keys to map them onto table addresses.
• Hash function: a function that transforms the search key into a table address.
5
Direct-Address Tables
• The most elementary form of hashing.
• Assumption: a direct one-to-one correspondence between the keys and the numbers 0, 1, …, m-1, where m is not very large.
• Array A[m]. Each position (slot) in the array contains either a reference to a record or NULL.
• Cost: the size of the array is determined by the largest key. Not very useful if there are only a few keys.
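
The direct-address idea above can be sketched in Java as follows. This is only a minimal illustration: the class names `DirectAddressTable` and `StudentRecord` are hypothetical, and keys are assumed to be integers in the range 0 to m-1.

```java
// Minimal direct-address table sketch: the key itself is the array index.
// DirectAddressTable and StudentRecord are hypothetical names for illustration.
class DirectAddressTable {
    // Hypothetical record type: an integer key plus some payload.
    static class StudentRecord {
        final int id;
        final String name;
        StudentRecord(int id, String name) { this.id = id; this.name = name; }
    }

    private final StudentRecord[] slots;   // slots[k] is null when key k is absent

    DirectAddressTable(int m) {
        slots = new StudentRecord[m];      // Java initializes every slot to null
    }

    void insert(StudentRecord r) { slots[r.id] = r; }

    StudentRecord search(int key) { return slots[key]; }

    void delete(int key) { slots[key] = null; }

    public static void main(String[] args) {
        DirectAddressTable table = new DirectAddressTable(1000); // keys 0..999
        table.insert(new StudentRecord(42, "Ada"));
        System.out.println(table.search(42).name);   // prints "Ada"
        System.out.println(table.search(7) == null); // prints "true"
    }
}
```

Every operation is a single array access, but the array must have one slot per possible key, which is the cost noted above.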
7
Hash Functions
Transform the keys into numbers within a predetermined interval, to be used as indices in an array (table, hash table) that stores the records.
8
Hash Functions – Numerical Keys
Keys: numbers.
If M is the size of the array, then
h(key) = key % M.
This maps all non-negative keys into numbers within the interval [0 .. (M-1)].
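
A small Java sketch of this function is shown below. The class name `ModHash` is hypothetical. One assumption beyond the slide: Java's `%` operator returns a negative result for a negative key, so the sketch uses `Math.floorMod`, which always lands in [0 .. (M-1)] for a positive M and agrees with `key % M` for non-negative keys.

```java
// Sketch of h(key) = key % M for integer keys (ModHash is a hypothetical name).
class ModHash {
    // Maps any int key into [0, m-1]; identical to key % m when key >= 0.
    static int hash(int key, int m) {
        return Math.floorMod(key, m);
    }

    public static void main(String[] args) {
        System.out.println(hash(123456789, 1000)); // prints 789
        System.out.println(hash(1000, 997));       // prints 3
        System.out.println(-7 % 5);                // prints -2 (plain % can go negative)
        System.out.println(hash(-7, 5));           // prints 3
    }
}
```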
9
Hash Functions – Character Keys
Keys: strings of characters.
Treat the binary representation of a key as a number, and then apply the hash function.
10
How keys are treated as numbers
If each character is represented with m bits, then the string can be treated as a base-2^m number.
11
Example
A K E Y:
00001 01011 00101 11001 =
$1 \cdot 32^3 + 11 \cdot 32^2 + 5 \cdot 32^1 + 25 \cdot 32^0 = 44217$
Each letter is represented by its position in the alphabet. E.g., K is the 11th letter, and its representation is 01011 (11 in decimal).
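
The arithmetic in this example can be checked with a short Java sketch. The names `LetterKey`, `letterValue`, and `toNumber` are hypothetical, and the code assumes keys made only of the uppercase letters A to Z. It sums the powers of 32 directly, so it is suitable only for short keys (a `long` overflows after about 12 letters).

```java
// Treats an uppercase key as a base-32 number, 5 bits per letter.
// LetterKey, letterValue and toNumber are hypothetical names for illustration.
class LetterKey {
    // Maps 'A'..'Z' to 1..26, the letter's position in the alphabet.
    static int letterValue(char c) {
        return c - 'A' + 1;
    }

    // Sums letterValue * 32^position, starting from the last letter (32^0).
    static long toNumber(String key) {
        long value = 0;
        long power = 1;                       // 32^0 for the last letter
        for (int i = key.length() - 1; i >= 0; i--) {
            value += letterValue(key.charAt(i)) * power;
            power *= 32;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(letterValue('K')); // prints 11
        System.out.println(toNumber("AKEY")); // prints 44217
    }
}
```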
12
Long Keys
If the keys are very long, an overflow may occur.
A solution is to apply Horner's method when computing the hash function, so that the mod operation can be applied after every step.
13
Horner’s Method
$a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \dots + a_1 x^1 + a_0 x^0 = x(x(\dots x(x(a_n x + a_{n-1}) + a_{n-2}) + \dots) + a_1) + a_0$
$4x^5 + 2x^4 + 3x^3 + x^2 + 7x^1 + 9x^0 = x(x(x(x(4x + 2) + 3) + 1) + 7) + 9$
The polynomial can be computed by alternating the multiplication and addition operations.
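
As a rough illustration, Horner's method can be written as a single loop in Java. The class and method names (`Horner`, `evaluate`) are hypothetical, and the coefficients are assumed to be listed from the highest power down to the constant term.

```java
// Horner's method: evaluates a polynomial with one multiply and one add per term.
// Horner and evaluate are hypothetical names for illustration.
class Horner {
    // coeffs[0] is a_n (highest power); coeffs[coeffs.length - 1] is a_0.
    static long evaluate(long[] coeffs, long x) {
        long result = 0;
        for (long a : coeffs) {
            result = result * x + a;   // alternate multiplication and addition
        }
        return result;
    }

    public static void main(String[] args) {
        long[] p = {4, 2, 3, 1, 7, 9};       // 4x^5 + 2x^4 + 3x^3 + x^2 + 7x + 9
        System.out.println(evaluate(p, 2));  // prints 211
    }
}
```

No powers of x are computed explicitly, which is what later allows a mod operation to be slotted in after each step.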
14
Example
VERYLONGKEY

Letter:  V     E     R     Y     L     O     N     G     K     E     Y
Binary:  10110 00101 10010 11001 01100 01111 01110 00111 01011 00101 11001
Decimal: 22    5     18    25    12    15    14    7     11    5     25

$22 \cdot 32^{10} + 5 \cdot 32^9 + 18 \cdot 32^8 + 25 \cdot 32^7 + 12 \cdot 32^6 + 15 \cdot 32^5 + 14 \cdot 32^4 + 7 \cdot 32^3 + 11 \cdot 32^2 + 5 \cdot 32^1 + 25 \cdot 32^0$
15
Example (continued)
VERYLONGKEY
$22 \cdot 32^{10} + 5 \cdot 32^9 + 18 \cdot 32^8 + 25 \cdot 32^7 + 12 \cdot 32^6 + 15 \cdot 32^5 + 14 \cdot 32^4 + 7 \cdot 32^3 + 11 \cdot 32^2 + 5 \cdot 32^1 + 25 \cdot 32^0$
$= (((((((((22 \cdot 32 + 5) \cdot 32 + 18) \cdot 32 + 25) \cdot 32 + 12) \cdot 32 + 15) \cdot 32 + 14) \cdot 32 + 7) \cdot 32 + 11) \cdot 32 + 5) \cdot 32 + 25$
Compute the hash function by applying the mod operation at each step, thus avoiding overflow:
h0 = (22 * 32 + 5) % M
h1 = (32 * h0 + 18) % M
h2 = (32 * h1 + 25) % M
....
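
The following Java sketch illustrates why the step-by-step reduction is safe: it compares the stepwise hash with the exact value of the whole polynomial reduced once at the end. The names `LongKeyHash`, `hash`, and `hashExact` are hypothetical; keys are assumed to consist of the letters A to Z, and M is assumed small enough that `32 * h` fits in an `int`. `BigInteger` is used only as an arbitrary-precision reference.

```java
import java.math.BigInteger;

// Compares Horner's method with a mod at each step against the exact
// arbitrary-precision value reduced once. All names are hypothetical.
class LongKeyHash {
    static int letterValue(char c) {
        return c - 'A' + 1;                  // 'A'..'Z' -> 1..26
    }

    // Horner's method, reducing modulo m after every step so h stays below m.
    static int hash(String key, int m) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (32 * h + letterValue(key.charAt(i))) % m;
        }
        return h;
    }

    // Reference computation: build the full base-32 number, then reduce once.
    static int hashExact(String key, int m) {
        BigInteger value = BigInteger.ZERO;
        BigInteger base = BigInteger.valueOf(32);
        for (int i = 0; i < key.length(); i++) {
            value = value.multiply(base)
                         .add(BigInteger.valueOf(letterValue(key.charAt(i))));
        }
        return value.mod(BigInteger.valueOf(m)).intValue();
    }

    public static void main(String[] args) {
        int m = 101;                         // an arbitrary prime table size
        System.out.println(hash("VERYLONGKEY", m) == hashExact("VERYLONGKEY", m)); // prints "true"
    }
}
```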
16
Code
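// Hashes a character key into [0, tbl_size - 1] using Horner's method with base 32.
int hash32(char[] name, int tbl_size)
{
    int key_length = name.length;   // key_length must be declared before use
    int h = 0;
    for (int i = 0; i < key_length; i++)
        // Reduce modulo the table size at every step so h never grows large.
        h = (32 * h + name[i]) % tbl_size;
    return h;
}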
17
Hash Tables
• Index: an integer between 0 and M-1, generated by a hash function.
• Initially all slots are blank.
• A blank slot is marked by a sentinel value or by a special field in each slot.
18
Hash Tables
• Insert: use the hash function to generate an address for the key.
• Search for a key in the table: the same hash function is used.
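
A minimal Java sketch of these two operations is given below. It is an illustration under strong assumptions, not a complete hash table: the class name `SimpleHashTable` is hypothetical, keys are strings, `null` plays the role of the sentinel for a blank slot, and collisions are detected but not resolved (insert simply reports failure).

```java
// Minimal hash table sketch: insert and search use the same hash function.
// SimpleHashTable is a hypothetical name; collision resolution is omitted.
class SimpleHashTable {
    private final String[] slots;   // null is the sentinel marking a blank slot

    SimpleHashTable(int m) {
        slots = new String[m];      // initially all slots are blank (null)
    }

    // Horner-style string hash with a mod at each step.
    private int hash(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (32 * h + key.charAt(i)) % slots.length;
        }
        return h;
    }

    // Stores the key at its hashed address.
    // Returns false if that slot already holds a different key.
    boolean insert(String key) {
        int index = hash(key);
        if (slots[index] != null && !slots[index].equals(key)) {
            return false;           // collision: not handled in this sketch
        }
        slots[index] = key;
        return true;
    }

    // Recomputes the same address and checks what is stored there.
    boolean search(String key) {
        return key.equals(slots[hash(key)]);
    }

    public static void main(String[] args) {
        SimpleHashTable table = new SimpleHashTable(101);
        System.out.println(table.insert("AKEY"));  // prints "true" (table was empty)
        System.out.println(table.search("AKEY"));  // prints "true"
        System.out.println(table.search("OTHER")); // prints "false"
    }
}
```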
19
Size of the Table
Table size M is different from the number of records N.
Load factor: N/M.
M should be a prime number: with h(key) = key % M, a prime M helps distribute the keys evenly over the table.
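
One possible way to choose M is sketched below in Java: pick the smallest prime that keeps the load factor N/M at or below a chosen target. The names `TableSize`, `isPrime`, `nextPrime`, and `tableSizeFor` are hypothetical, and the target load factor of 0.5 in the usage example is an assumption for illustration, not a value from the slides.

```java
// Sketch: choose a prime table size M for N records and a target load factor.
// All names here are hypothetical; the 0.5 target is an assumed example value.
class TableSize {
    // Trial division; adequate for the table sizes used in examples like this.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Smallest prime greater than or equal to n.
    static int nextPrime(int n) {
        int candidate = Math.max(n, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }

    // Smallest prime M such that n / M does not exceed maxLoadFactor.
    static int tableSizeFor(int n, double maxLoadFactor) {
        int minSize = (int) Math.ceil(n / maxLoadFactor);
        return nextPrime(minSize);
    }

    public static void main(String[] args) {
        int n = 1000;
        int m = tableSizeFor(n, 0.5);
        System.out.println(m);                    // prints 2003
        System.out.println((double) n / m <= 0.5); // prints "true"
    }
}
```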
20