Multiple Choice Hash Tables with Moves on Deletes and Inserts

Adam Kirsch
Michael Mitzenmacher
Hashing: Modern Perspective
• For many situations (e.g., hardware for
routers), multiple choice hash tables are state-of-the-art.
– Each item gets d possible hash locations and is placed
in one.
• Moving items among choices (e.g., cuckoo
hashing) greatly improves space utilization.
– Only cost: may take many moves per insert.
Previously
• Schemes that move at most 1 item per
insertion.
– Limit cost of cuckoo hashing.
• Schemes that batch move operations in a
queue.
– Amortize cost of cuckoo hashing.
• Using content addressable memories (CAMs)
to reduce chance of overflow.
– Small CAMs yield big gains.
Contributions
• Consider potential of moving items on
deletions.
– Focus on one move per deletion/insertion.
• Examine alternative approach using
weaker hashing from [KTC, Peacock
Hashing].
– Analyze limits of performance.
Multilevel Hash Table [BK90]
• Use a multilevel hash table (MHT)
– Can store n elements with d = log log n +
O(1) levels in O(n) space with high
probability
– Example with d = 4 hash functions
[Figure: MHT with levels 1 to 4 and an item x]
• Skew: more elements placed
by early hash functions
(double exponential decay)
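A minimal sketch of the standard MHT insertion described above may help make the structure concrete. It is not the authors' implementation: the class name `MultilevelHashTable`, the halving of level sizes, the salted use of Python's built-in `hash` in place of independent hash functions, and the overflow list `stash` are all assumptions made for illustration.

```python
import random


class MultilevelHashTable:
    """Illustrative MHT: one hash function per level, one item per cell."""

    def __init__(self, top_size, levels, seed=0):
        rng = random.Random(seed)
        # Assumption: each level is half the size of the level above it.
        self.sizes = [max(1, top_size >> i) for i in range(levels)]
        self.tables = [[None] * s for s in self.sizes]
        # Assumption: a salted built-in hash stands in for d independent hashes.
        self.salts = [rng.getrandbits(64) for _ in range(levels)]
        # Assumption: a plain list collects items that fit at no level.
        self.stash = []

    def slot(self, level, item):
        return hash((self.salts[level], item)) % self.sizes[level]

    def insert(self, item):
        """Place item at the first level whose cell is free; return that level."""
        for level in range(len(self.tables)):
            j = self.slot(level, item)
            if self.tables[level][j] is None:
                self.tables[level][j] = item
                return level
        self.stash.append(item)
        return None

    def lookup(self, item):
        for level in range(len(self.tables)):
            if self.tables[level][self.slot(level, item)] == item:
                return True
        return item in self.stash


table = MultilevelHashTable(top_size=1024, levels=4)
for key in range(1000):
    table.insert(key)
# Items per level; earlier levels tend to hold far more items than later ones.
print([sum(c is not None for c in t) for t in table.tables], len(table.stash))
```

Because an item only reaches level i after colliding at every earlier level, the per-level counts printed at the end should show the skew toward early hash functions.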
Second Chance (SC) Scheme
• Standard MHT fills from top down
– elements cascade from table to table.
– We try to slow cascade at every step.
[Figure: standard MHT insertion of an item x]
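The slides do not spell out the exact Second Chance rule, so the sketch below shows only one plausible reading of "slow the cascade at every step": before a new item drops to the next level, try to push the current occupant of its cell down one level instead. The function names, the level-salted built-in `hash`, and the `stash` list are hypothetical.

```python
def make_tables(top_size, levels):
    # Assumption: levels halve in size from the top down.
    return [[None] * max(1, top_size >> i) for i in range(levels)]


def slot(tables, level, item):
    # Assumption: a level-salted built-in hash stands in for the level's hash.
    return hash((level, item)) % len(tables[level])


def insert_second_chance(tables, stash, item):
    """Insert item with at most one move; return the number of moves (0 or 1)."""
    last = len(tables) - 1
    for level in range(len(tables)):
        j = slot(tables, level, item)
        if tables[level][j] is None:
            tables[level][j] = item
            return 0
        if level < last:
            # Second chance: can the occupant move down one level instead?
            occupant = tables[level][j]
            k = slot(tables, level + 1, occupant)
            if tables[level + 1][k] is None:
                tables[level + 1][k] = occupant
                tables[level][j] = item
                return 1
    stash.append(item)  # nothing worked; overflow (e.g., to a small CAM)
    return 0


tables, stash = make_tables(64, 4), []
moves = sum(insert_second_chance(tables, stash, key) for key in range(60))
print("moves:", moves, "overflow:", len(stash))
```

Under this reading, every insertion performs at most one move, which matches the "one move per insertion" budget stated in the contributions.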
CAMs
• Last few collisions hard to stop.
– Can waste lots of space on few items.
• Solution: content addressable memory.
– CAMs are fully associative.
– Hold small numbers of items.
Moves on Deletions
• Harder to manage.
• What item to move up?
[Figure: MHT with levels 1 to 4 and an item x]
Hint-Based Approach
• Each cell stores a hint for where an item to
move up on delete is held.
• Hints can be kept fairly small.
– About log n bits.
• Various hint approaches possible.
– We found “replace hint on any collision” works
well.
– May depend on item lifetime distribution, etc.
– One move, recursive move variations.
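As a rough illustration of the one-move hint variant, the sketch below interprets "replace hint on any collision" as follows: whenever an inserted item collides at a cell and ends up deeper in the table, that cell's hint is overwritten with the item's final location. On delete, the freed cell follows its hint and pulls the item up if the hint is still valid. The validity check, the dictionary layout, and all function names are assumptions, not details from the slides.

```python
def make_mht(top_size, levels):
    sizes = [max(1, top_size >> i) for i in range(levels)]
    return {"cells": [[None] * s for s in sizes],
            "hints": [[None] * s for s in sizes],  # (level, slot) or None
            "stash": []}


def slot(mht, level, item):
    # Assumption: a level-salted built-in hash stands in for the level's hash.
    return hash((level, item)) % len(mht["cells"][level])


def insert(mht, item):
    collided = []
    for level, cells in enumerate(mht["cells"]):
        j = slot(mht, level, item)
        if cells[j] is None:
            cells[j] = item
            # Replace the hint at every cell this item collided with.
            for cl, cj in collided:
                mht["hints"][cl][cj] = (level, j)
            return
        collided.append((level, j))
    mht["stash"].append(item)  # assumption: hints never point into the stash


def delete(mht, item):
    for level, cells in enumerate(mht["cells"]):
        j = slot(mht, level, item)
        if cells[j] == item:
            cells[j] = None
            hint, mht["hints"][level][j] = mht["hints"][level][j], None
            if hint is not None:
                hl, hj = hint
                cand = mht["cells"][hl][hj]
                # Hints can go stale, so verify before moving the item up.
                if cand is not None and slot(mht, level, cand) == j:
                    mht["cells"][hl][hj] = None
                    cells[j] = cand  # the single move allowed per deletion
            return True
    if item in mht["stash"]:
        mht["stash"].remove(item)
        return True
    return False
```

A hint here is a (level, slot) pair, which is consistent with the slides' estimate of about log n bits per cell. The recursive variant would repeat the same step at the cell the moved item just vacated.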
Simulation Data
• No current method of analysis for hints.
– Use simulations. 10,000 trials per data point.
• MHT levels decreasing in size by factor of 2.
Plus small CAM.
• With n items, top level has size n.
– Space usage just above 50%.
• Load table to n elements, alternate
inserts/deletes for 2^18 steps.
– Exponentially distributed lifetimes.
• Goal: how many hash functions needed?
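The "just above 50%" figure follows from the geometry of the table: with a top level of size n and each further level half as large, the total number of cells stays below 2n. The short calculation below makes this explicit; the helper names are hypothetical and the small CAM is left out of the count.

```python
def level_sizes(n, levels):
    """Cells per level when the top level has n cells and each level halves."""
    return [n >> i for i in range(levels)]


def load(n, levels):
    """Fraction of MHT cells in use when n items are stored (CAM not counted)."""
    return n / sum(level_sizes(n, levels))


for d in (4, 6, 7, 13):
    print(d, sum(level_sizes(32768, d)), round(load(32768, d), 4))
```

Fewer levels mean fewer total cells, so reducing the number of hash functions pushes the load further above 50%.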
Simulation Results
Scheme             Items   Hash Functions   Average Stash   Max Stash
No moves           32768   13               4.225           31
Second Chance      32768   6                0.001           2
Hint, 1 Move       32768   7                0.013           3
Hint, Moves        32768   6                0.246           7
Hint, 1 Move + SC  32768   4                4.678           18
Hint, Moves + SC   32768   4                0.911           9
Lessons from Simulations
• No moves very weak.
• Second Chance (move on insert) more
powerful than hint-based move on
delete.
• But the two combine well.
– Four hash functions: better than 50% load,
small CAM.
Alternative: Weak Hashes
• To avoid hints, overflow at each bucket
splits to two buckets at next level.
– Each bucket receives from four buckets.
• Less spreading of items, but know
where to look on deletes.
• Conjecture: loss of randomness implies
weak performance.
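To make the idea concrete, here is a small sketch of one possible weak-hash layout. The slides fix only the fan-out (each bucket overflows to two buckets at the next level, and each bucket receives from four); the specific child mapping `j % half` and `(j + 1) % half`, the single-item buckets, the salted built-in `hash`, and the function names are all assumptions chosen to satisfy those constraints with halving level sizes.

```python
def path(item, top_size, levels):
    """Bucket index of item at every level under the assumed layout."""
    size = top_size
    j = hash((0, item)) % size         # only the top level uses a full hash
    out = [j]
    for level in range(1, levels):
        half = size // 2
        bit = hash((level, item)) & 1  # one bit picks one of the two children
        j = (j + bit) % half           # children of j: j % half, (j + 1) % half
        out.append(j)
        size = half
    return out


def insert(tables, stash, item):
    for level, j in enumerate(path(item, len(tables[0]), len(tables))):
        if tables[level][j] is None:
            tables[level][j] = item
            return
    stash.append(item)


def delete(tables, stash, item):
    top, levels = len(tables[0]), len(tables)
    for level, j in enumerate(path(item, top, levels)):
        if tables[level][j] == item:
            tables[level][j] = None
            if level + 1 < levels:
                half = len(tables[level + 1])
                # No hint needed: overflow from (level, j) can only sit in
                # one of its two children at the next level.
                for k in {j % half, (j + 1) % half}:
                    z = tables[level + 1][k]
                    if z is not None and path(z, top, levels)[level] == j:
                        tables[level + 1][k] = None
                        tables[level][j] = z
                        break
            return True
    if item in stash:
        stash.remove(item)
        return True
    return False


tables = [[None] * (64 >> i) for i in range(4)]  # sizes 64, 32, 16, 8
stash = []
for key in range(60):
    insert(tables, stash, key)
delete(tables, stash, 7)
```

The trade-off the slide points to is visible here: below the top level an item has only one bit of fresh randomness per level, so items spread less, but a deletion needs to inspect only two buckets to find a candidate to move up.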
Picturing Weak Hashes
Two Idealized Schemes
• Each bucket holds random item, splits rest.
• Each bucket counts items passed to bucket A
and bucket B at next level, greedily holds item
from bucket with larger count.
• Assume invariants kept over
insertions/deletions at all times.
• Can be analyzed recursively level by level.
– Get distribution of bucket loads at each level.
– Obtain average case performance.
Results
Scheme          Items   Hash Functions   Average Stash
Random          32768   6                2.470
Greedy          32768   6                0.182
Second Chance   32768   6                0.001
Conclusions
• Weak hashes, based on buckets, much less
effective than hints.
– Even under optimistic assumptions.
• One move approaches effective.
– Move on insert/delete complement each other.
• Need methods for analysis.
– Challenging dependencies; hard to get exact
numbers.