ITEC 2620M Introduction to Data Structures
Instructor: Prof. Z. Yang
Course Website: http://people.math.yorku.ca/~zyang/itec2620m.htm
Office: TEL 3049

HASHING

Key Points of this Lecture
• Hash tables
• Hash functions
• Collision resolution and clustering
• Deletions

Indices vs. Keys
• Each key/record is associated with an array slot
• We could map each key to a slot
  – e.g. last name to apartment number
• We could then search either the array (unsorted?) or a look-up table (sorted?)
• However, what if the look-up is actually a calculated function?
  – eliminate look-up!

Hash Functions
• A hash function h() converts a key (integer, string, float, etc.) into a table index
• Example

Hash Tables
• Records are stored in slots specified by a hash function
• Look-up/store
  – Convert key into a table index with hash function h()
    • h(key) = index
  – Find record/empty slot starting at index = h(key) (use resolution policy if necessary)

Comments
• Hash function should evenly distribute keys across table
  – not easy given unspecified input data distribution
• Hash table should be about half full
  – note: time-space tradeoff
    • more space -> less time (and already twice as much space as a sorted array)
  – if half full, 50% chance of one collision
    • 25% chance of two collisions
    • etc...
    • 2 accesses on average (approaches n as table fills)

How to do better
• What to do with collisions?
  – linear probing (“classic hashing”)
    • if collision, search slots sequentially
• To eliminate clustering, we would like each remaining slot to have equal probability of being filled next
• Can’t use random
  – needs to be reproducible
• Pseudo-random probing (see text)
• Goal of random probing?
  --> cause divergence
  – Probe sequences should not all follow the same path

Quadratic Probing
• Simple divergence method
• Linear probing
  – ith probe is i slots away
• Quadratic probing
  – ith probe is i^2 slots away

Secondary Clustering
• If multiple keys are hashed to the same index/home position, quadratic probing still follows the same path each time
  – This is secondary clustering
• Use second hash function to determine probe sequence
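
The probing policies above can be compared side by side in a short sketch. The slides contain no code, so everything here is an illustrative assumption: the choice of Python, the names `HashTable`, `linear_probe`, `quadratic_probe` and `double_hash_probe`, the hash function h(key) = key mod table size, and the particular second hash function. Deletions are not handled.

```python
from typing import Callable, Iterator, List, Optional

# A probe function returns the offset of the i-th probe from the home position.
# Arguments: key, probe number i, table size m. All names are hypothetical.
ProbeFn = Callable[[int, int, int], int]


def linear_probe(key: int, i: int, m: int) -> int:
    # Linear probing: the i-th probe is i slots away from home.
    return i


def quadratic_probe(key: int, i: int, m: int) -> int:
    # Quadratic probing: the i-th probe is i^2 slots away from home.
    return i * i


def double_hash_probe(key: int, i: int, m: int) -> int:
    # Double hashing: a second hash function (assumed here to be
    # 1 + key mod (m - 1)) sets the step size, so keys that share a
    # home position can still follow different probe paths.
    return i * (1 + key % (m - 1))


class HashTable:
    """Open-addressing table of integer keys (no deletion support)."""

    def __init__(self, size: int = 11, probe: ProbeFn = linear_probe) -> None:
        self.size = size
        self.slots: List[Optional[int]] = [None] * size
        self.probe = probe

    def h(self, key: int) -> int:
        # Assumed hash function: key modulo table size.
        return key % self.size

    def probe_sequence(self, key: int) -> Iterator[int]:
        # Yield the slots visited for this key, starting at the home position.
        home = self.h(key)
        for i in range(self.size):
            yield (home + self.probe(key, i, self.size)) % self.size

    def insert(self, key: int) -> int:
        # Store the key in the first empty slot on its probe sequence
        # and return that slot's index.
        for index in self.probe_sequence(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        # Quadratic probing does not visit every slot, so this can be
        # raised even when the table still has empty slots.
        raise RuntimeError("no free slot found on the probe sequence")

    def contains(self, key: int) -> bool:
        # Follow the same probe sequence; an empty slot means the key is absent.
        for index in self.probe_sequence(key):
            if self.slots[index] is None:
                return False
            if self.slots[index] == key:
                return True
        return False
```

With a table of size 11, the keys 0, 11 and 22 all have home position 0. Under this sketch they land in slots 0, 1, 2 with linear probing, in slots 0, 1, 4 with quadratic probing, and in slots 0, 2, 3 with double hashing. The quadratic probe sequences for 11 and 22 are identical (secondary clustering), while the double-hashing sequences differ.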