Transcript talk

Practical LFU implementation for
Web Caching
George Karakostas
Telcordia
Dimitrios N. Serpanos
University of Patras
A simple caching environment
Basic assumptions
1. The number of all Web pages N is known. (only order of magnitude matters)
2. The system is closed. (yeah, right…but we won’t care)
3. The requests for Web pages follow Zipf’s Law. (plenty of experimental evidence)
4. The requests are statistically independent. (very strong assumption - counterintuitive(?))
Zipf-like distributions
Let U {Pi : Pi is the i th most popular page, 1 i  N}
Then Pr[Pi is requested]   , where   1
i
H
N
More generally:
Pr[ Pi is requested]  
i
where  is a constant between 0.6-0.9, depending on
the particular request stream.
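The Zipf-like popularities above can be computed directly. This is a minimal sketch; the function name `zipf_probabilities` is an illustrative assumption, and the normalizing constant Ω is computed as the reciprocal of the sum of the unnormalized weights.

```python
def zipf_probabilities(n: int, alpha: float = 1.0) -> list[float]:
    """Return [Pr[P_1], ..., Pr[P_n]], where Pr[P_i] = Omega / i**alpha.

    Omega = 1 / sum_{j=1..n} j**(-alpha) normalizes the distribution; for
    alpha = 1 it equals 1 / H_n, the reciprocal of the n-th harmonic number.
    (Hypothetical helper for illustration, not code from the talk.)
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    weights = [1.0 / i ** alpha for i in range(1, n + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


if __name__ == "__main__":
    probs = zipf_probabilities(1000, alpha=0.8)
    print(f"Pr[P_1] = {probs[0]:.4f}, Pr[P_1000] = {probs[-1]:.6f}")
    print(f"sum = {sum(probs):.6f}")  # 1.000000 up to floating-point rounding
```

With α = 0 every page is equally likely; as α grows, probability mass concentrates on the first few ranks.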
Popularities according to Zipf
[Figure: page popularities according to Zipf’s law, where α = 1]
Our motivation
• Serpanos & Wolf prove analytically the
optimality of Perfect-LFU under assumptions 3
and 4.
• Breslau et al. studied the implications of
assumptions 3 and 4. They give evidence for a Zipf-like
distribution of page requests, and for the
optimality of Perfect-LFU as a cache
replacement policy.
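As a worked illustration (our own arithmetic, not from the talk), assume requests satisfy assumptions 3 and 4 and write Ω for the Zipf normalizing constant. The cache that always holds the C most popular pages, which is what Perfect-LFU converges to in the long run, then has hit rate (0.5772 is the Euler-Mascheroni constant):

```latex
\text{HIT-RATE}_{\text{P-LFU}}(C)
  = \sum_{i=1}^{C} \frac{\Omega}{i^{\alpha}}
  = \frac{\sum_{i=1}^{C} i^{-\alpha}}{\sum_{j=1}^{N} j^{-\alpha}},
\qquad\text{and for } \alpha = 1:\quad
\frac{H_C}{H_N} \approx \frac{\ln C + 0.5772}{\ln N + 0.5772}.
```

For example, with N = 10^6 pages and C = 10^3, this gives roughly 7.49 / 14.39 ≈ 0.52: caching 0.1% of the pages serves about half of the requests.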
But, if so...
Why don’t people use Perfect-LFU?
Answer: Because it is ‘Perfect’ (i.e., impractical).
Perfect-LFU needs to store statistics for all the pages
requested from the beginning of cache operation. Hence
the resources (time/space) needed are of order N.
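The cost described above is visible in a direct implementation. The sketch below is a hypothetical `PerfectLFU` class (not the authors' code): its counter table is never pruned, so it ends up with one entry per distinct page ever requested, up to N. Recomputing the top C on every request is deliberately simple rather than efficient.

```python
from collections import Counter


class PerfectLFU:
    """Perfect-LFU sketch: keeps a request count for every page ever seen.

    The counter table is never pruned, so it grows to one entry per
    distinct page requested since start-up (up to N entries).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.counts: Counter[str] = Counter()  # all-time counts, never pruned
        self.cache: set[str] = set()

    def request(self, page: str) -> bool:
        """Record one request; return True if it was a cache hit."""
        hit = page in self.cache
        self.counts[page] += 1
        # Cache the C pages with the highest all-time request counts.
        self.cache = {p for p, _ in self.counts.most_common(self.capacity)}
        return hit


if __name__ == "__main__":
    lfu = PerfectLFU(capacity=2)
    hits = [lfu.request(p) for p in ["a", "b", "a", "c", "a", "b"]]
    print(hits, "tracked pages:", len(lfu.counts))
```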
Our contribution: we show that, under assumptions 1-4,
we can efficiently approximate the Perfect-LFU hit rate
to within a factor of (1 − ε), for any constant ε > 0.
Chernoff bounds
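Theorem [Chernoff]: The sum of R i.i.d. random variables X_1, ..., X_R,
each distributed as X and taking values in {0, 1},
is close to its expected value with very high probability: for 0 < ε < 1,
$\Pr\left[\sum_{i=1}^{R} X_i \le (1-\varepsilon)\, R\, E[X]\right] \le e^{-\varepsilon^2 R\, E[X]/2}$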
Observation 1: Under our assumptions, the number of
requests for a page in a random trace is close to its expected
value, i.e. proportional to its popularity.
Observation 2: With a small R we can distinguish the most
popular objects.
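Both observations can be checked empirically. This sketch draws R independent requests (assumption 4) from a Zipf-like distribution (assumption 3) and compares each top page's observed count with its expectation, next to the lower-tail bound e^{-ε² R p / 2} for a page of popularity p. All function names and parameter values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def zipf_probabilities(n, alpha):
    """Pr[P_i] = Omega / i**alpha, normalized over ranks 1..n."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def chernoff_lower_tail(r, p, eps):
    """Bound on Pr[sum of r i.i.d. Bernoulli(p) <= (1 - eps) * r * p]."""
    return math.exp(-eps ** 2 * r * p / 2)


def sample_counts(probs, r, seed=0):
    """Draw r independent requests and count how often each rank appears."""
    rng = random.Random(seed)
    ranks = rng.choices(range(1, len(probs) + 1), weights=probs, k=r)
    return Counter(ranks)


if __name__ == "__main__":
    n, alpha, r, eps = 10_000, 0.8, 20_000, 0.2
    probs = zipf_probabilities(n, alpha)
    counts = sample_counts(probs, r)
    for rank in (1, 2, 5, 10):
        p = probs[rank - 1]
        print(f"rank {rank:>2}: observed {counts[rank]:>5}, "
              f"expected {r * p:7.1f}, "
              f"Pr[count <= (1-eps)*expected] <= {chernoff_lower_tail(r, p, eps):.2e}")
```

Here the indicator "request i is for page P" plays the role of X_i, so E[X] is the page's popularity; the bound tightens as R or the popularity grows, which is why a modest R already separates the most popular pages.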
Window-LFU
• Simple variation of Perfect-LFU.
• Instead of keeping statistics for all pages, keep them only for
a sample of the request stream (called the window) of size
$|W| = \mathrm{poly}(C)\, N^{1-\alpha} \ln(1/\varepsilon)$
where C is the cache size and ε is the error
parameter.
• Cache the C most frequent pages in the sample.
Theorem: Under our assumptions,
$\text{HIT-RATE}_{\text{W-LFU}}(C) \ge (1-\varepsilon)\, \text{HIT-RATE}_{\text{P-LFU}}(C)$
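The Window-LFU rule and the quantity in the theorem can be compared numerically. In this sketch (hypothetical names `window_lfu_cache` and `hit_rate`; parameter values chosen arbitrarily), the window is an independent sample of requests, the cache holds its C most frequent pages, and the hit rate of a fixed cache under independent requests is the total popularity of the cached pages. Perfect-LFU is approximated by its long-run behavior, caching the C truly most popular pages.

```python
import random
from collections import Counter


def zipf_probabilities(n, alpha):
    """Pr[P_i] = Omega / i**alpha, normalized over ranks 1..n."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def window_lfu_cache(window, capacity):
    """Window-LFU: cache the `capacity` most frequent pages in the window."""
    return {page for page, _ in Counter(window).most_common(capacity)}


def hit_rate(cache, probs):
    """Hit rate of a fixed cache under independent requests.

    Pages are identified by 1-based rank, so page i has probability probs[i - 1].
    """
    return sum(probs[page - 1] for page in cache)


if __name__ == "__main__":
    n, alpha, capacity, window_size = 100_000, 0.8, 100, 50_000
    probs = zipf_probabilities(n, alpha)
    rng = random.Random(1)
    window = rng.choices(range(1, n + 1), weights=probs, k=window_size)

    w_lfu = hit_rate(window_lfu_cache(window, capacity), probs)
    # Long-run Perfect-LFU under assumptions 3 and 4: the C most popular pages.
    p_lfu = hit_rate(set(range(1, capacity + 1)), probs)
    print(f"Window-LFU {w_lfu:.4f}  Perfect-LFU {p_lfu:.4f}  "
          f"ratio {w_lfu / p_lfu:.4f}")
```

The printed ratio plays the role of (1 − ε) in the theorem; enlarging the window should push it toward 1.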
Window placement
Observation: Under our assumptions, any sample of size
|W|, wherever it is placed in the stream, achieves the Perfect-LFU hit rate to within a factor of (1 − ε).
[Figure: the request stream, with a new request arriving at the CACHE]
Locality
Two different types of locality phenomena:
• Temporal
• Popularity
Our window will be the |W| most recent requests
to take advantage of temporal locality as well.
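Keeping the window as the |W| most recent requests amounts to a sliding window with incrementally maintained counts. This is a minimal sketch under that reading (the class name `SlidingWindowLFU` is hypothetical): each request is checked against the current cache, then the oldest request leaves the window and the new one enters, so at most |W| counters exist at any time instead of up to N.

```python
from collections import Counter, deque


class SlidingWindowLFU:
    """Window-LFU over the |W| most recent requests (illustrative sketch).

    Counts cover only requests inside the window, so memory is O(|W|).
    """

    def __init__(self, capacity: int, window_size: int) -> None:
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.capacity = capacity
        self.window: deque = deque(maxlen=window_size)
        self.counts: Counter = Counter()

    def cached_pages(self) -> set:
        """The C most frequent pages among the |W| most recent requests."""
        return {p for p, _ in self.counts.most_common(self.capacity)}

    def request(self, page) -> bool:
        """Serve one request (True on a hit), then slide the window forward."""
        hit = page in self.cached_pages()
        if len(self.window) == self.window.maxlen:
            oldest = self.window[0]  # append() below will evict this entry
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]  # drop pages that left the window
        self.window.append(page)
        self.counts[page] += 1
        return hit


if __name__ == "__main__":
    cache = SlidingWindowLFU(capacity=1, window_size=3)
    print([cache.request(p) for p in ["a", "a", "b", "b", "b", "a"]])
```

In the usage example the cache switches from "a" to "b" once "b" dominates the recent requests, even though both pages have the same total count; this is how the recent window captures temporal locality.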
Simulation results
[Figures: simulation results (gnuplot plots)]
Conclusions / Open problems
• Window-LFU is an efficient implementation of LFU
• It takes advantage of the different types of locality to
achieve better performance in practice than Perfect-LFU.
• How can we determine the window size dynamically?
(a simple doubling heuristic performs very well)
• How can we detect that the Zipf-like distribution
parameters (N,α) have changed?