ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website:

ITEC 2620M
Introduction to Data Structures
Instructor: Prof. Z. Yang
Course Website:
http://people.math.yorku.ca/~zyang/itec2620m.htm
Office: TEL 3049
HASHING
Key Points of this Lecture
• Hash tables
• Hash functions
• Collision resolution and clustering
• Deletions
Indices vs. Keys
• Each key/record is associated with an array
slot
• We could map each key to a slot
– e.g. last name to apartment number
• We could then search either the array
(unsorted?) or a look-up table (sorted?)
• However, what if the look-up is actually a
calculated function?
– eliminate look-up!
Hash Functions
• A hash function h() converts a key
(integer, string, float, etc.) into a table
index
• Example
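The slide's own example did not survive in this transcript. As a stand-in, here is a minimal sketch of two simple hash functions, assuming a table of 11 slots. The names `hash_int`, `hash_str` and `TABLE_SIZE` are illustrative choices, not taken from the lecture.

```python
TABLE_SIZE = 11  # assumed table size for illustration


def hash_int(key: int, table_size: int = TABLE_SIZE) -> int:
    """Map an integer key to a table index using the remainder."""
    return key % table_size


def hash_str(key: str, table_size: int = TABLE_SIZE) -> int:
    """Map a string key to a table index by summing its character codes."""
    return sum(ord(ch) for ch in key) % table_size


print(hash_int(42))      # 42 % 11 -> 9
print(hash_str("Yang"))  # (89 + 97 + 110 + 103) % 11 -> 3
```

Summing character codes is easy to follow but sends every rearrangement of the same letters to the same index, so it is a teaching example rather than a recommended hash function.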
Hash Tables
• Records are stored in slots specified by
a hash function
• Look-up/store
– Convert key into a table index with hash
function h()
• h(key) = index
– Find record/empty slot starting at index =
h(key)
(use resolution policy if necessary)
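The look-up/store steps above can be sketched as a small class. This is only an illustration under stated assumptions: integer keys, h(key) = key mod table size, and a resolution policy of stepping to the next slot on a collision. The class name `HashTable` and its methods are hypothetical, and deletions are not handled.

```python
from typing import Any, Optional


class HashTable:
    """Fixed-size table; collisions are resolved by trying the next slot."""

    def __init__(self, size: int = 11) -> None:
        self.size = size
        self.slots = [None] * size  # each slot holds None or a (key, record) pair

    def h(self, key: int) -> int:
        # Assumed hash function: remainder of the key by the table size.
        return key % self.size

    def store(self, key: int, record: Any) -> int:
        """Store a record and return the index of the slot used."""
        home = self.h(key)
        for i in range(self.size):
            index = (home + i) % self.size  # assumed resolution policy
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                self.slots[index] = (key, record)
                return index
        raise RuntimeError("hash table is full")

    def lookup(self, key: int) -> Optional[Any]:
        """Return the record stored under key, or None if it is absent."""
        home = self.h(key)
        for i in range(self.size):
            index = (home + i) % self.size
            entry = self.slots[index]
            if entry is None:  # empty slot reached: the key was never stored
                return None
            if entry[0] == key:
                return entry[1]
        return None


table = HashTable()
table.store(15, "record A")  # 15 % 11 = 4, and slot 4 is free
table.store(26, "record B")  # 26 % 11 = 4 too, so slot 5 is used
print(table.lookup(26))      # record B
```

Both operations start at index = h(key) and follow the same path, which is why a look-up can stop as soon as it reaches an empty slot.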
Comments
• Hash function should evenly distribute keys
across table
– not easy given unspecified input data distribution
• Hash table should be about half full
– note: time-space tradeoff
• more space -> less time
(and already twice as much space as a sorted array)
– if half full, roughly a 50% chance of one collision
• 25% chance of two collisions
• etc...
• about 2 accesses on average (1 + 1/2 + 1/4 + ... = 2)
(approaches n as table fills)
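The average of 2 accesses can be checked with a short calculation. The sketch below assumes a simplified model in which every probe independently lands on a full slot with probability equal to the load factor; real probing policies only approximate this. The function names are illustrative.

```python
def expected_accesses(load_factor: float) -> float:
    """Average number of probes needed to reach an empty slot, assuming each
    probe independently hits a full slot with probability load_factor."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 1 / (1 - load_factor)


def partial_sum(load_factor: float, terms: int) -> float:
    """Sum of 1 + a + a^2 + ... over the given number of terms."""
    return sum(load_factor ** k for k in range(terms))


print(expected_accesses(0.5))  # half-full table -> 2.0
print(partial_sum(0.5, 20))    # 1 + 1/2 + 1/4 + ... closes in on the same value

n = 100
print(expected_accesses((n - 1) / n))  # one free slot left: close to n accesses
```

Under this model the cost is 1 / (1 - load factor), which matches both figures on the slide: 2 at half full, and about n when only one of n slots is still free.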
How to do better
• What to do with collisions?
– linear probing (“classic hashing”)
• if collision, search the following slots sequentially
• To eliminate clustering, we would like each
remaining slot to have equal probability
• Can’t use random – needs to be reproducible
• Pseudo-random probing (see text)
• Goal of random probing? --> cause divergence
– Probe sequences should not all follow same path
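To make the divergence idea concrete, the sketch below compares the slots visited by linear probing with those visited by a pseudo-random scheme. It assumes a table of 11 slots and builds the pseudo-random offsets from a seeded shuffle so that the same sequence is reproduced on every run. The seed value and all names are illustrative, and the textbook's version may differ in detail.

```python
import random

M = 11  # assumed table size


def linear_sequence(home: int, m: int = M) -> list:
    """Slots visited by linear probing, starting at the home slot."""
    return [(home + i) % m for i in range(m)]


def make_offsets(m: int = M, seed: int = 2620) -> list:
    """Fixed shuffle of the offsets 1..m-1; the seed makes it reproducible."""
    offsets = list(range(1, m))
    random.Random(seed).shuffle(offsets)
    return [0] + offsets  # probe 0 is always the home slot itself


OFFSETS = make_offsets()


def pseudo_random_sequence(home: int, m: int = M) -> list:
    """Slots visited when the ith probe is OFFSETS[i] slots from home."""
    return [(home + OFFSETS[i]) % m for i in range(m)]


# Linear probing: after its first collision, a key with home slot 2 follows
# exactly the path of keys with home slot 3, so the two groups pile up together.
print(linear_sequence(2))
print(linear_sequence(3))

# Pseudo-random probing: the offsets are the same on every call, but a key that
# is bumped onto slot 3 does not continue along the path of keys homed at 3.
print(pseudo_random_sequence(2))
print(pseudo_random_sequence(3))
```

Each pseudo-random sequence still visits every slot exactly once, because the offsets are a permutation of 0..M-1.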
Quadratic Probing
• Simple divergence method
• Linear probing – ith probe is i slots away
• Quadratic probing – ith probe is i^2 slots away
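The two probe rules can be compared side by side. This sketch assumes the common form of quadratic probing, in which the ith probe lands i^2 slots from the home position, and a table of 11 slots. The function names are illustrative.

```python
def linear_probe(home: int, i: int, m: int) -> int:
    """Slot tried on the ith probe under linear probing."""
    return (home + i) % m


def quadratic_probe(home: int, i: int, m: int) -> int:
    """Slot tried on the ith probe under quadratic probing."""
    return (home + i * i) % m


M = 11  # assumed table size
print([linear_probe(4, i, M) for i in range(5)])     # [4, 5, 6, 7, 8]
print([quadratic_probe(4, i, M) for i in range(5)])  # [4, 5, 8, 2, 9]
```

The quadratic sequence jumps away from the home slot quickly, but it does not necessarily visit every slot in the table.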
Secondary Clustering
• If multiple keys are hashed to the same
index/home position, quadratic probing
still follows the same path each time
– This is secondary clustering
• Use a second hash function to determine the
probe sequence (double hashing)
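A minimal sketch of this idea follows, assuming integer keys and a prime table size of 11. The two functions `h1` and `h2` are illustrative choices rather than the ones used in the lecture; the only requirement shown is that the step size from `h2` is never zero.

```python
M = 11  # assumed table size (prime, so every step size reaches all slots)


def h1(key: int) -> int:
    """Home position of the key."""
    return key % M


def h2(key: int) -> int:
    """Step size between probes, always in the range 1..M-1."""
    return 1 + (key % (M - 1))


def probe_sequence(key: int, probes: int = 5) -> list:
    """First few slots tried for key: home, home + step, home + 2*step, ..."""
    return [(h1(key) + i * h2(key)) % M for i in range(probes)]


# Keys 15 and 26 share home slot 4 but take different steps, so their paths split.
print(probe_sequence(15))  # [4, 10, 5, 0, 6]
print(probe_sequence(26))  # [4, 0, 7, 3, 10]
```

With quadratic probing both keys would have followed the same path from slot 4; here the key itself decides the step, which is what removes secondary clustering.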