
Some Open Questions Related to
Cuckoo Hashing
Michael Mitzenmacher
Harvard University
The Beginnings
Cuckoo Hashing
• Basic scheme: each element gets two possible locations (uniformly at random).
• To insert x, check both locations for x. If one is empty, insert.
• If both are full, x kicks out an old element y. Then y moves to its other location.
• If that location is full, y kicks out z, and so on, until an empty slot is found. (A minimal sketch follows below.)
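A minimal Python sketch of this basic scheme. The two-sub-table layout, the seeded stand-in hash functions, and the max_kicks cutoff are illustrative choices, not details from the talk:

```python
import random

class CuckooHashTable:
    """Minimal 2-choice cuckoo hash table: two sub-tables, one element per bucket."""

    def __init__(self, size_per_table, max_kicks=100):
        self.size = size_per_table
        self.max_kicks = max_kicks                        # give up after this many moves
        self.tables = [[None] * size_per_table, [None] * size_per_table]
        self.seeds = (random.random(), random.random())   # stand-ins for two hash functions

    def _slot(self, which, x):
        return hash((self.seeds[which], x)) % self.size

    def lookup(self, x):
        return any(self.tables[t][self._slot(t, x)] == x for t in (0, 1))

    def insert(self, x):
        if self.lookup(x):
            return True
        t = 0                                             # start with the first sub-table
        for _ in range(self.max_kicks):
            i = self._slot(t, x)
            if self.tables[t][i] is None:                 # empty slot: done
                self.tables[t][i] = x
                return True
            # Occupied: kick out the old element and re-place it via its other sub-table.
            x, self.tables[t][i] = self.tables[t][i], x
            t = 1 - t
        return False                                      # failure: in theory, rehash everything
```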
Cuckoo Hashing Examples
[Series of table diagrams: elements A–E are placed, then inserting F and G forces existing elements to be kicked to their alternate locations until every element has a slot.]
Why Do We Care
About Cuckoo Hashing?
• Hash tables are a fundamental data structure.
• Multiple-choice hashing yields tables with
– High memory utilization.
– Constant time look-ups.
– Simplicity – easily coded, parallelized.
• Cuckoo hashing expands on this, combining
multiple choices with ability to move
elements.
• Practical potential, and theoretically
interesting!
Good Properties of Cuckoo Hashing
• Worst case constant lookup time.
• High memory utilizations possible.
• Simple to build, design.
Cuckoo Hashing Failures
• Bad case 1: inserted element runs into
cycles.
• Bad case 2: inserted element has very long
path before insertion completes.
– Could be on a long cycle.
• Bad cases occur with very small probability
when load is sufficiently low.
• Theoretical solution: re-hash everything if
a failure occurs.
Various Representations
[Diagrams of equivalent representations: elements mapped to buckets, drawn in several ways.]
Basic Performance
• For 2 choices and load less than 50%, n elements give a failure rate of Θ(1/n); maximum insert time is O(log n).
• Related to random graph representation.
– Each element is an edge, buckets are vertices.
– Edge corresponds to two random choices of an
element.
– Small load implies small acyclic or unicyclic
components, of size at most O(log n).
Natural Extensions
• More than 2 choices per element.
– Very different : hypergraphs instead of graphs.
– D. Fotakis, R. Pagh, P. Sanders, and P. Spirakis.
– Space efficient hash tables with worst case
constant access time.
• More than 1 element per bucket.
– M. Dietzfelbinger and C. Weidling.
– Balanced allocation and dictionaries with tightly
packed constant size bins.
Variations
• Online : Elements inserted and deleted as
you go.
– Constant expected time + logarithmic (or
polylogarithmic) time with high probability per
element.
• Offline : All elements available at start.
– Becomes a maximum matching problem.
– No real moving of elements -- equivalent to
offline version of multiple-choice hashing of
Azar, Broder, Karlin, and Upfal.
Open Question 1:
Random Walk Cuckoo Hashing
• More than 2 choices is important.
– Much higher memory utilizations.
– 3 choices : 90+% in experiments.
– 4 choices : about 97%.
• Analysis [FPSS] : Use breadth first search
on bipartite graph to find an augmenting
path.
– Not practical for many implementations.
Random Walk Cuckoo Hashing
• When it is time to kick something out,
choose one randomly.
• Small state, effective.
• Intuition : if a fraction p of the buckets are empty, the random walk “should” find an empty bucket with probability p at each step.
– Clearly not exactly right, but a nice intuition.
– Suggests logarithmic time to find an empty slot (see the sketch below).
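A possible rendering of random-walk insertion in Python, continuing the earlier sketch. The hash_fns list and the max_steps cutoff are illustrative assumptions:

```python
import random

def random_walk_insert(table, hash_fns, x, max_steps=500):
    """Random-walk cuckoo insertion with d = len(hash_fns) choices and one
    element per bucket.  `table` is a list and each hash function maps a key
    to an index; both are illustrative stand-ins.  Returns True on success."""
    for _ in range(max_steps):
        slots = [h(x) for h in hash_fns]
        empty = [i for i in slots if table[i] is None]
        if empty:
            table[random.choice(empty)] = x
            return True
        # All d choices occupied: kick out a uniformly random victim and
        # continue the walk with the evicted element.
        victim = random.choice(slots)
        x, table[victim] = table[victim], x
    return False   # give up: rehash in theory, or (see later slides) use a stash
```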
The Open Question
• Find tight bounds on the performance of random walk cuckoo hashing in the online setting, for d ≥ 3 choices (and possibly more than one element per bucket).
Recent Progress
• Polylogarithmic bounds on insertion time for a large number of choices: RANDOM 09, Frieze, Melsted, Mitzenmacher.
• Two step argument:
– Most buckets have an augmenting path of
length O(log log n) to an empty bucket. (Reach
empty bucket with inverse logarithmic
probability.)
– Expansion arguments then show that such a bucket is found within O(log n) steps with high probability.
The Open Question
• Find tight bounds on the performance of random walk cuckoo hashing in the online setting, for d ≥ 3 choices (and possibly more than one element per bucket).
• Is logarithmic insertion time the right answer?
Lower bounds?
• Better understanding of graph structure with 3 or
more choices.
Open Question 2:
Thresholds
• How much load can cuckoo hashing handle
before collisions overwhelm it?
• There appear to be asymptotic thresholds.
– Fine below the threshold, disaster after.
– Useful for designs for real systems.
• The case for 2 choices, 1 element per bucket
well understood.
• Less so for other cases.
The Open Question
• Tight thresholds for cuckoo hashing
schemes, and corresponding efficient
algorithms.
What’s Known
• 2 choices, 1 element per bucket well understood.
• For 2 choices, more than 1 element per bucket:
– Corresponds to orientability problems on random graphs : orient edges so that no more than k point to each vertex.
– Offline thresholds known.
– Online (provable) thresholds weak.
• For more than 2 choices:
– Harder orientability problems.
– Online (provable) thresholds weak.
– Very close lower/upper bounds for offline setting.
New Result
• Dietzfelbinger, Goerdt, Mitzenmacher,
Montanari, Pagh have tight bounds on
offline thresholds, more than 2 choices, 1
item per bucket.
• Extension to more than 1 item per bucket
still open.
• Writeup (hopefully) coming soon…
What Was Known (Example)
• Case of 3 choices.
– Upper bound on load of 0.9183. [Batu Berenbrink
Cooper]
• Uses differential-equation based analysis of orientability
threshold.
– Lower bound of 0.8894 (offline). [Dietzfelbinger Pagh]
• Random maximum matching problem.
• Use random matrices with 3 ones per column to design
dictionary schemes. Bound corresponds to full-rank threshold
of such matrices.
• Upper bound is tight, using a better bound on the full-rank threshold.
The Open Question
• Tight thresholds for cuckoo hashing
schemes, and corresponding efficient
algorithms.
• Offline bounds for more than 2 choices.
• Offline bounds for more than 2 choices and
more than 1 item per bucket.
• Online bounds generally.
– Specific case of d = 3 especially interesting.
Open Question 3:
Stashes
• A failure is declared whenever one element
can’t be placed.
• Is that really necessary?
• What if we could keep one element
unplaced? Or eight? Or O(log n)?
• Goal : Reduce the failure probability.
• Second goal : Reduce moves per insert.
The Open Question
• What is the value of some extra space to
stash problematic elements?
Motivation : CAMs
• CAM = content addressable memory
– Fully associative lookup.
– Usually expensive, so must be kept small.
– Hardware solution, or a dedicated cache line in
software.
• Not usually considered in theoretical work,
but very useful in practice.
• Can we bridge this gap?
– What can CAMs do for us?
A CAM-Stash
• Use a CAM to stash away elements that would cause
failure.
– ESA 2008, Kirsch, Mitzenmacher, Wieder.
• Intuition: if failures were independent, the probability that s elements cause failures would be Θ(1/n^s).
– Failures are not independent, but nearly so.
– A stash holding a constant number of elements greatly reduces the failure probability (see the sketch below).
– Implemented as a hardware CAM or cache line.
• Lookup requires also looking at stash.
– But generally empty.
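One way the stash idea might look in code, extending the random-walk sketch from earlier. The stash_limit and other parameters are illustrative, and a real system would use a hardware CAM or a cache line rather than a Python list:

```python
import random

def insert_with_stash(table, hash_fns, stash, x, max_steps=500, stash_limit=4):
    """Random-walk insertion (as in the earlier sketch) that falls back to a
    small stash instead of declaring failure.  All parameters are illustrative."""
    for _ in range(max_steps):
        slots = [h(x) for h in hash_fns]
        for i in slots:
            if table[i] is None:
                table[i] = x
                return True
        i = random.choice(slots)            # evict a random occupant and keep walking
        x, table[i] = table[i], x
    if len(stash) < stash_limit:            # walk too long: stash the leftover element
        stash.append(x)
        return True
    return False                            # stash full: genuine failure

def lookup_with_stash(table, hash_fns, stash, x):
    # Every lookup also scans the (tiny, usually empty) stash.
    return any(table[h(x)] == x for h in hash_fns) or x in stash
```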
Analysis Method
• Treat cells as vertices, elements as edges in a bipartite graph.
• Count the components that have excess edges; those excess edges must be placed in the stash.
• Random graph analysis bounds the number of excess edges (see the sketch below).
Example: 6 vertices, 7 edges – 1 edge must go into the stash.
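A small union-find sketch of this counting argument for the 2-choice, one-element-per-bucket case; the function name and input format are assumptions made for illustration:

```python
def stash_size_needed(num_buckets, element_choices):
    """Given each element's two bucket choices (the endpoints of its edge),
    count how many elements must go into the stash: a component with v
    vertices and e edges can hold at most v elements, so it contributes
    max(0, e - v).  Union-find sketch for 2 choices, 1 element per bucket."""
    parent = list(range(num_buckets))
    edges = [0] * num_buckets               # edges per component (stored at the root)
    verts = [1] * num_buckets               # vertices per component (stored at the root)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for u, v in element_choices:
        ru, rv = find(u), find(v)
        if ru == rv:
            edges[ru] += 1
        else:
            parent[rv] = ru
            edges[ru] += edges[rv] + 1
            verts[ru] += verts[rv]

    roots = {find(i) for i in range(num_buckets)}
    return sum(max(0, edges[r] - verts[r]) for r in roots)
```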
A Simple Experiment
• 10,000 elements, table of size 24,000, 2 choices per element, 10^7 trials.

Stash size needed    Trials
0                    9989861
1                    10040
2                    97
3                    2
4                    0
Generalizations
• Can similarly generalize known results for
cuckoo hashing with more than 2 choices,
more than 1 element per bucket.
• Stash of size s reduces failure exponent
linearly in s.
• Intuition: random graph analysis exposes
“bottleneck” in cuckoo hashing. Stashes
relieve the bottleneck.
CAM to Improve Insertion Time
• Lots of moves per insert in the worst case.
– The average is constant.
– But the maximum is Ω(log n) with non-trivial (inverse-polynomial) probability.
• May want bounded number of memory
accesses per insert.
• Empirical study by Kirsch/Mitzenmacher.
A CAM-Queue
• Insertion is a sequence of suboperations.
– Of the form “Move x to position H_j(x).”
• Use the CAM as a queue for pending suboperations.
• Perform suboperations from the queue as available.
– Move attempt = 1 lookup/write.
– A suboperation may cause another suboperation to go on the queue.
• Lookup: check the hash table and the CAM-queue.
• De-amortization
– Use the queue to turn worst-case performance into average-case performance (see the sketch below).
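A simplified sketch of queue-based de-amortization for 2 choices. This is not the exact Kirsch–Mitzenmacher or Arbitman–Naor–Segev scheme; the random eviction rule, the toy h1/h2, and moves_per_step are illustrative assumptions:

```python
import random
from collections import deque

def deamortized_step(table, h1, h2, queue, new_element=None, moves_per_step=2):
    """One time step of a (simplified) de-amortized 2-choice cuckoo insertion.
    Pending "move x" suboperations wait in `queue`, standing in for the CAM;
    each step performs only a bounded number of them, so a long insertion's
    work is spread over later operations."""
    if new_element is not None:
        queue.append(new_element)
    for _ in range(moves_per_step):
        if not queue:
            return
        x = queue.popleft()
        a, b = h1(x), h2(x)
        if table[a] is None:
            table[a] = x
        elif table[b] is None:
            table[b] = x
        else:
            # Both choices occupied: displace a random occupant and queue a
            # suboperation to re-place it later.
            victim = random.choice((a, b))
            x, table[victim] = table[victim], x
            queue.append(x)

def lookup(table, h1, h2, queue, x):
    # Lookups check the hash table and the pending CAM-queue.
    return table[h1(x)] == x or table[h2(x)] == x or x in queue

# Example setup with toy hash functions (illustrative only):
table = [None] * 16
queue = deque()
h1 = lambda x: hash(("a", x)) % 16
h2 = lambda x: hash(("b", x)) % 16
deamortized_step(table, h1, h2, queue, new_element="item1")
```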
Queue Sizes
• Need CAM sized to overflow with negligible
probability.
– Maximum queue size much bigger than average.
– Experiments suggest queues of size in the small tens suffice in practice, with 4+ suboperations per insert.
• Recent work by Arbitman, Naor, Segev gives
provable bounds for logarithmic-sized queue for
2-choice cuckoo hashing, up to 50% loads.
– Analysis open for more than 2 choices.
The Open Question
• What is the value of some extra space to
stash problematic elements?
• Can these uses of stashes be similarly useful
for other data structures?
• Is there a general theory telling us the value
of constant/logarithmic/linear sized stashes?
Open Question 4:
Randomness
• Analysis always easier when assuming hash
functions are perfectly random.
• But perfect hash functions are unrealistic.
• What about real hash functions on real data?
The Open Question
• How much randomness is needed for
cuckoo hashing to be effective?
Universal Hash Families
• Defined by Carter/Wegman.
• A family of hash functions of the form H : [N] → [M] is k-wise independent if, when H is chosen randomly from the family, for any distinct x_1, x_2, …, x_k and any a_1, a_2, …, a_k,
Pr(H(x_1) = a_1, H(x_2) = a_2, …, H(x_k) = a_k) = 1/M^k.
• The family is k-wise universal if
Pr(H(x_1) = H(x_2) = … = H(x_k)) ≤ 1/M^(k-1).
(A sketch of one such family follows.)
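For concreteness, a sketch of one standard way to realize such a family: a random polynomial over a prime field. The parameters below are illustrative; note that the final reduction mod m introduces a small bias, so the family is only approximately k-wise independent over [m]:

```python
import random

def make_kwise_hash(k, prime, m):
    """One function from the classic polynomial hash family: a random
    degree-(k-1) polynomial over Z_p, reduced mod m.  Over [p] the family is
    exactly k-wise independent; the final `mod m` adds a small bias unless m
    divides p.  `prime` must exceed the universe size."""
    coeffs = [random.randrange(prime) for _ in range(k)]

    def h(x):
        value = 0
        for c in coeffs:                     # Horner's rule
            value = (value * x + c) % prime
        return value % m

    return h

# Example: two (roughly) 5-wise independent functions for 2**20 buckets,
# using the Mersenne prime 2**61 - 1.
h1 = make_kwise_hash(5, (1 << 61) - 1, 1 << 20)
h2 = make_kwise_hash(5, (1 << 61) - 1, 1 << 20)
```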

Recent Results
• For 2 choices, O(log n)-wise independence is
sufficient; [PR] show hash functions of Siegel
suffice.
• Queueing result of [ANS] uses new technology of
Braverman to show polylogarithmic-wise
independence suffices.
• Cohen and Kane show 5-independence not
enough; also show 1 O(log n)-wise independent
and 1 pairwise independent hash function suffice.
Another Approach : Random Data
• Previous analysis for worst-case data. What
about random data?
• Analysis usually trivial if data is
independently, uniformly chosen over large
universe.
– Then all hashes appear “perfectly random”.
• Not a good model for real data.
• Need an intermediate model between worst case and average case. [Mitzenmacher Vadhan]
A Model for Data
• Based on models of semi-random sources.
• Data is a finite stream, modeled by a
sequence of random variables X1,X2,…XT.
• Range of each variable is [N].
• Each stream element has some entropy,
conditioned on values of previous elements.
– Correlations possible.
– But each new element has some
unpredictability.
Applications
• Potentially, wherever hashing is used:
– Bloom Filters
– Power of Two Choices
– Linear Probing
– Cuckoo Hashing
– Many Others…
Intuition
• If each element has some entropy, extract that entropy to hash each element to a near-uniform location.
• Extractors should provide near-uniform behavior.
Notions of Entropy
• Max probability : mp(X) = max_x Pr[X = x]
– Min-entropy : H_∞(X) = log(1/mp(X))
– Block source with max probability p per block : mp(X_i | X_1 = x_1, …, X_{i-1} = x_{i-1}) ≤ p
• Collision probability : cp(X) = Σ_x (Pr[X = x])^2
– Rényi entropy : H_2(X) = log(1/cp(X))
– Block source with collision probability p per block : cp(X_i | X_1 = x_1, …, X_{i-1} = x_{i-1}) ≤ p
• These “entropies” are within a factor of 2 of each other.
• We use collision probability/Rényi entropy (see the short sketch below).
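A tiny Python sketch of these two quantities, evaluated on a made-up distribution; the example numbers are illustrative only:

```python
import math

def collision_probability(dist):
    """cp(X) = sum_x Pr[X = x]^2, for a distribution given as {value: prob}."""
    return sum(p * p for p in dist.values())

def renyi_entropy(dist):
    """H_2(X) = log2(1 / cp(X)), in bits."""
    return math.log2(1.0 / collision_probability(dist))

# A slightly biased 8-value source: cp = 0.3**2 + 7 * 0.1**2 = 0.16,
# so H_2 ≈ 2.64 bits (versus 3 bits if it were uniform).
dist = {i: (0.3 if i == 0 else 0.1) for i in range(8)}
```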
Leftover Hash Lemma
• A “classical” result (from 1989) [ILL].
• Intuitive statement: if H : [N] → [M] is chosen from a pairwise independent hash family, and X is a random variable with small collision probability, then H(X) will be close to uniform.
Leftover Hash Lemma
• Specific statements for the current setting, for 2-universal hash families.
• Let H : [N] → [M] be a random hash function from a 2-universal hash family. If cp(X) ≤ 1/K, then (H, H(X)) is (1/2)√(M/K)-close to (H, U[M]).
– Equivalently, if X has Rényi entropy at least log M + 2 log(1/ε), then (H, H(X)) is ε-close to uniform.
• Let H : [N] → [M] be a random hash function from a 2-universal hash family. Given a block source with collision probability at most 1/K per block, (H, H(X_1), …, H(X_T)) is (T/2)√(M/K)-close to (H, U[M]^T).
– Equivalently, if each block has Rényi entropy at least log M + 2 log(T/ε), then (H, H(X_1), …, H(X_T)) is ε-close to uniform. (A worked numeric instance follows.)
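A worked numeric instance of the block-source bound; the numbers are chosen here purely for illustration and are not from the talk:

```latex
% T = 10^6 hashed items, M = 2^{20} buckets, target distance \varepsilon = 1/100:
\log_2 M + 2\log_2(T/\varepsilon)
   = 20 + 2\log_2\!\left(10^{8}\right)
   \approx 20 + 53.2
   \approx 73.2 \text{ bits of R\'enyi entropy needed per block.}
```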
Further Improvements
• Additional improvements over Leftover
Hash Lemma in paper [MV08].
• Chung and Vadhan [CV08] further improve
analysis.
• Dietzfelbinger and Schellbach show you have to be careful : pairwise independence is not enough even for random data sets from a small universe. [DS09]
– Not enough entropy when data large compared
to universe.
The Open Question
• How much randomness is needed for
cuckoo hashing to be effective?
• Tighten bound on independence needed in
worst case, and provide efficient hash
function families.
• What better results are possible with
reasonable assumptions on the data?
Open Question 5:
Parallel Architectures
• Multicores, Graphics Processing Units (GPUs), and other parallel architectures are possibly the next wave.
• Multiple-choice hashing and cuckoo
hashing seem naturally parallelizable.
• Theory and practice?
The Open Question
• Design and analyze efficient schemes for
constructing and maintaining hash tables in
modern parallel architectures.
Related Work
• Plenty on parallel hashing/load balancing
schemes.
– PRAM emulation, related work in the 1990s.
• Technical improvements of last decade
suggest more is possible.
• In work with Amenta et al., we designed a new implementation for GPUs based on cuckoo hashing.
– To appear in SIGGRAPH 09.
– New theory, practical implementations possible?
The Open Question
• Design and analyze efficient schemes for
constructing and maintaining hash tables in
modern parallel architectures.
• How can cuckoo hashing be helpful?
• Practical implementations, with strong
theoretical backing?
Open Questions
• Tight bounds on insertion times for random walk
cuckoo hashing for d > 2 choices.
• Tight bounds on load capacity thresholds for
cuckoo hashing for d > 2 choices (and more than
one element per bucket).
• Stashes : where to use them, and a general
framework for them?
• Randomness: how much is really needed in the
worst case? On suitably random data?
• Parallelizable instantiations of cuckoo hashing?
• Real-world applications for cuckoo hashing.
• Your question here…
Thanks
Much thanks to
Martin Dietzfelbinger
and
Rasmus Pagh
for comments, suggestions, references.
Thanks to my
co-authors
for the results.
THANK YOU.