The Cache Location Problem
Overview
• TERCs Vs. Proxies
• Stability
• Cache location
Proxy Web Caching is Good
• Saves network bandwidth
• Reduces delay
• Reduces server’s load
• But it is not perfect:
– not everybody uses it (configuration)
– may become a bottleneck and increase delay
– increases delay for requests the cache cannot satisfy
Transparent En-Route Caches (TERCs)
• Caches are located along routes from clients to servers, and are transparent to both server and client
• Requests are intercepted by the TERC on their way to the server, and are either
– answered by the cache if the information exists, or
– otherwise forwarded to the server
• Advantages:
– No configuration required! No management!
– No change required in current network infrastructure
– Can be deployed independently within an ISP subnetwork
TERCs (-)
• Must be on the route from client to server:
– sensitive to route changes
– hierarchies are much harder to implement
• Needs to intercept traffic:
– implementation problem
– more complex
– can TERCs work at line speed?
• Depends on routing stability and flow stability
Where should TERCs be placed?
Route Stability
• Published results indicate that routing is stable (Paxson, Labovitz)
• We need stability only during the connection lifetime (~1 min.):
– [KRS00] measurements to more than 13,000 destinations show that >93% of connections were stable
– real numbers are probably higher
• TCP route caching
• equivalent of IP addresses
Stability of Flows
• We built the flow tree from servers:
• Data from Bell-Labs servers (www.belllabs.com, www.multimedia.belllabs.com)
– Nov. 97 - Jan. 98
– ~14000 different hosts, 1 GB, ~200k cacheable requests (per week)
• From log files to results:
– extract unique hosts
– run traceroute for each host
– obtain the routing tree (or is it a DAG?)
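The last step above can be sketched in a few lines of Python. This is only an illustration under assumptions: each traceroute result is taken to be a list of hop names ordered from the server toward the client host, and the names `build_routing_tree` and `is_tree` are hypothetical, not from the slides. A node that ends up with more than one parent means the structure is a DAG rather than a tree.

```python
from collections import defaultdict

def build_routing_tree(paths):
    """Merge traceroute paths (server -> client hop lists) into a parent map.

    Returns a dict mapping each node to the set of nodes seen directly
    upstream of it (toward the server).
    """
    parents = defaultdict(set)
    for hops in paths:
        # Each consecutive pair of hops is one edge of the routing structure.
        for upstream, downstream in zip(hops, hops[1:]):
            parents[downstream].add(upstream)
    return dict(parents)

def is_tree(parents):
    """True if every node has exactly one upstream neighbour."""
    return all(len(ups) == 1 for ups in parents.values())

# Hypothetical traceroute results for three client hosts.
paths = [
    ["server", "r1", "hostA"],
    ["server", "r1", "hostB"],
    ["server", "r2", "hostB"],  # hostB is reached over two different routes
]
parents = build_routing_tree(paths)
print(is_tree(parents))
```

With the sample data, `hostB` has two upstream routers, so the merged structure is a DAG and not a tree.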
Stability - Visual
Client return rate between days

day   0111  0112  0113  0114  0115  0116  0117  1130  1201  1202  1203  1204  1205
0112  4.35
0113  4     6.93
0114  3.78  6.06  7.48
0115  3.69  5.66  6.1   7.33
0116  3.55  5.34  6.12  6.48  7.41
0117  3.73  3.58  4.26  4.07  4.3   5.38
1130  3.25  2.77  3.28  3.03  2.77  3.13  3.36
1201  3.12  4.4   4.58  4.21  3.71  4.21  2.99  4.32
1202  3.21  3.85  4.25  4.23  4.02  4.56  3.14  4.08  7
1203  2.96  3.86  4.16  4.28  4.25  4.12  2.86  4.15  6.34  6.88
1204  3.01  3.87  4.34  4.34  3.98  4.1   2.88  3.42  6.06  5.89  7.01
1205  2.79  4.02  4.25  4.25  4.2   4.36  3.18  3.49  4.97  5.35  5.58  7.15
1206  3.36  3.33  2.96  3.15  2.88  3.25  3.46  4.23  3.58  3.94  3.48  3.95  4.82
Stability (3)
• The relative flow in the tree is stable in time, although the client population changes significantly
• Routing is stable for the lifetime of the connection
• Placing caches based on past traffic yields good results
How Fixed is the Hit Ratio?
How Fixed is the Hit Ratio?(2)
Where Should the TERCs be Placed?
The Model
• Wide area network
• Requests are represented by a set of demands (of client i from server j)
• Goal: minimize average delay = minimize total flow
• The hit ratio (P) abstracts cache behavior
– most hits are due to a small number of popular pages
– full dependency: the same pages are cached everywhere
• But part of the flow can come from proxies
=> each flow is associated with a hit ratio $p_{i,j}$
The General k-cache Location Problem
• Instance:
– an undirected graph $G=(V,E)$
– a set of demands $F=\{f_{i,j}\}$
– a set of hit ratios $P=\{p_{i,j}\}$
– k, the number of caches
• Solution: K, a subset of V of size k
• Objective: minimizing total flow
\[
\min_{K} \sum_{i,j} \min_{v \in K \cup \{j\}} f_{i,j} \left[ p_{i,j}\, d(i,v) + (1-p_{i,j}) \bigl( d(i,v) + d(v,j) \bigr) \right]
\]
The k-TERC Location Problem
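• Instance:
– an undirected graph $G=(V,E)$
– a set of demands $F=\{f_{i,j}\}$
– a set of hit ratios $P=\{p_{i,j}\}$
– k, the number of caches
• Solution: K, a subset of V of size k
• Objective: minimizing total flow, where v is restricted to the path from j to i
\[
\min_{K} \sum_{i,j} \min_{\substack{v \in K \cup \{j\} \\ v \text{ on the path from } j \text{ to } i}} f_{i,j} \left[ p_{i,j}\, d(i,v) + (1-p_{i,j}) \bigl( d(i,v) + d(v,j) \bigr) \right]
\]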
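To make the objective concrete, here is a minimal Python sketch that evaluates the total flow for one given cache set K. It rests on assumptions that are not in the slides: every hop has length 1, and each demand is supplied with its explicit route from client i to server j. The name `terc_total_flow` and the input layout are illustrative only.

```python
def terc_total_flow(demands, caches):
    """Total flow for a cache set, following the k-TERC objective.

    demands: list of (flow, hit_ratio, route), where route lists the nodes
             from the client (first) to the server (last); unit-length hops.
    caches:  set of nodes holding a cache (the set K).
    """
    total = 0.0
    for flow, hit_ratio, route in demands:
        server = route[-1]
        d_ij = len(route) - 1  # client-to-server distance in hops
        # Candidate v: any cache on the route, or the server itself.
        best = min(
            hit_ratio * d_iv + (1 - hit_ratio) * (d_iv + (d_ij - d_iv))
            for d_iv, v in enumerate(route)
            if v in caches or v == server
        )
        total += flow * best
    return total

# Two hypothetical clients behind the same router r, one server s.
demands = [
    (10, 0.5, ["c1", "r", "s"]),
    (4, 0.5, ["c2", "r", "s"]),
]
print(terc_total_flow(demands, set()))   # no caches
print(terc_total_flow(demands, {"r"}))   # one cache at the shared router
```

Because v lies on the route, d(i,v) + d(v,j) is just the client-to-server distance, so the minimum is always reached at the cache closest to the client.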
Remarks
• A generalization of the p-median problem (in the p-median problem we want to minimize the total cost of serving a set of demands from at most p centers)
• In the k-TERC location problem:
– it is enough to solve the problem for fixed p ($p_{i,j} = p$)
– the optimal set K does not depend on p
– (not true in general)
• The k-TERC location problem is a special case of the general k-cache location problem (p=1/n)
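The independence of $p_{s,c}$
\[
\begin{aligned}
&\min \sum_{s,c} \min_{v_k \in K} f_{s,c} \left[ p_{s,c}\, d(v_c,v_k) + (1-p_{s,c}) \bigl( d(v_c,v_k) + d(v_k,v_s) \bigr) \right] \\
={}& \min \sum_{s,c} \min_{v_k \in K} f_{s,c} \left[ d(v_c,v_k) + (1-p_{s,c})\, d(v_k,v_s) \right] \\
={}& \min \sum_{s,c} \min_{v_k \in K} \Bigl[ \underbrace{f_{s,c}\, d(v_c,v_s)}_{\text{constant}} - f_{s,c}\, p_{s,c}\, d(v_k,v_s) \Bigr] \qquad \text{(TERC)}
\end{aligned}
\]
The last step uses the TERC property: $v_k$ lies on the route, so $d(v_c,v_k) + d(v_k,v_s) = d(v_c,v_s)$.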
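A small brute-force check in Python can illustrate the claim that, with a fixed hit ratio p, the optimal cache set does not change with p. Everything in the example is assumed for illustration: the toy topology, the flows, unit-length hops, and the function names. The cost per demand is written in the final form above, f·(d(v_c,v_s) − p·d(v_k,v_s)), with v_k the first cache met on the way from the client to the server.

```python
from itertools import combinations

def total_flow(routes, caches, p):
    """Sum of f * (d(c,s) - p * d(v_k,s)) over all demands (unit hops)."""
    total = 0.0
    for flow, route in routes:
        d_cs = len(route) - 1
        # Distance from the first cache on the route to the server; 0 if none.
        saved = next((d_cs - i for i, v in enumerate(route) if v in caches), 0)
        total += flow * (d_cs - p * saved)
    return total

def best_caches(routes, candidates, k, p):
    """Exhaustively pick the k-subset of candidates with minimum total flow."""
    return min(combinations(candidates, k),
               key=lambda K: total_flow(routes, set(K), p))

# Hypothetical tree: server s, router a below it, routers b and c below a.
routes = [
    (5, ["c1", "b", "a", "s"]),
    (3, ["c2", "b", "a", "s"]),
    (4, ["c3", "c", "a", "s"]),
    (2, ["c4", "a", "s"]),
]
routers = ["a", "b", "c"]
for p in (0.25, 0.75):
    print(p, best_caches(routes, routers, 1, p), best_caches(routes, routers, 2, p))
```

For both values of p the search returns the same sets, since p only scales the saved flow and leaves the ranking of cache sets unchanged.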
Hardness Results
              line    tree      general graph
one server    Poly.   Poly.     NP-hard
m servers     Poly.   NP-hard   NP-hard
Placement on a line
• Topology: a line of n nodes, numbered 0, 1, 2, ..., n-1
• Every node may be a server, a client, or both.
• $FR(i)$: the flow demand on the segment (i-1,i)
• FR can be easily computed from the input.
• $FC(i,l_o,l_i)$: the flow on the segment (i-1,i) when the closest caches to i are in $l_o$ and $l_i$.
• FC can be computed from the input with p=1.
• Note: $FR(i) = FC(i,n-1,0)$
Placement on a line
• $C(j,l_o,l_i,k)$: the overall flow in segment [0,j] when k caches are located optimally inside the segment, and the closest caches to j are in $l_o$ and $l_i$.
The dynamic Program
• Base case (j=1), for $1 \le l_i \le n-1$:
\[
C(1, l_i, 1, 1) = FC(1, 1, 0), \qquad C(1, l_i, 0, 0) = FC(1, l_i, 0)
\]
• For j>1:
\[
C(j, l_o, l_i, k') = \min \bigl\{\, C(j-1, j, l_i, k'-1) + FC(j, j, l_i),\;\; C(j-1, l_o, l_i, k') + FC(j, l_o, l_i) \,\bigr\}
\]
The Algorithm
1. Compute $C(1,l_i,1,1)$ and $C(1,l_i,0,0)$ for $1 \le l_i \le n-1$
2. For each j>1 compute $C(j,l_o,l_i,k')$ for all $0 \le k' \le k$ and $0 \le l_i \le j \le l_o \le n-1$
Complexity: $O(n^3 k)$
Optimizing for a single server
• The routes from the server to all clients form a tree (actually a DAG)
• We’ll use dynamic programming to find the optimal cache locations
The Greedy Algorithm
• Optimal algorithm using bottom-up dynamic programming:
– not trivial
– complexity $O(n k^2 h)$
• Greedy:
– repeat k times {find the best cache location}
– complexity $O(n k)$
• How bad can it be?
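The greedy loop can be sketched in Python as follows. This is a naive illustration, not the O(n k) procedure from the slide: it re-evaluates the whole cost for every candidate, and it assumes unit-length hops, a single fixed hit ratio p, and explicit client-to-server routes. The toy instance and all names are hypothetical; it is chosen so that greedy ends up worse than the best pair of caches.

```python
def total_flow(routes, caches, p):
    """Sum of f * (d(c,s) - p * d(v_k,s)), v_k = first cache on the route."""
    total = 0.0
    for flow, route in routes:
        d_cs = len(route) - 1
        saved = next((d_cs - i for i, v in enumerate(route) if v in caches), 0)
        total += flow * (d_cs - p * saved)
    return total

def greedy_placement(routes, candidates, k, p):
    """Repeat k times: add the single location that lowers total flow most."""
    chosen = []
    for _ in range(k):
        remaining = [v for v in candidates if v not in chosen]
        best = min(remaining,
                   key=lambda v: total_flow(routes, set(chosen) | {v}, p))
        chosen.append(best)
    return chosen

# Hypothetical tree: s - x - a, with routers b and c below a.
routes = [
    (5, ["c1", "b", "a", "x", "s"]),
    (5, ["c3", "c", "a", "x", "s"]),
]
greedy = greedy_placement(routes, ["a", "b", "c"], 2, 0.5)
print(greedy, total_flow(routes, set(greedy), 0.5))
print(["b", "c"], total_flow(routes, {"b", "c"}, 0.5))
```

Here the first greedy pick is the shared router `a`, which then blocks the better solution of caching at both `b` and `c`.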
Greedy Vs. Optimal
Dynamic Programming for Tree
• First we convert the tree to a binary tree
by adding dummy nodes.
• Sort all nodes in reverse BFS order: a node's descendants are numbered before the node itself.
• The children of node i are $i_L$ and $i_R$
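The two preprocessing steps can be sketched in Python as below. The representation (a dict from node to its list of children) and the function names are assumptions for illustration. The slides do not say how dummy nodes are weighted; a natural assumption is that edges into dummy nodes have length zero and dummy nodes carry no demand, so distances and flows are unchanged.

```python
from collections import deque

def binarize(children, root):
    """Return a child map in which every node has at most two children.

    A node with more than two children keeps its first child and hands the
    rest to a chain of dummy nodes.
    """
    result = {}
    counter = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = list(children.get(node, []))
        current = node
        while len(kids) > 2:
            dummy = ("dummy", counter)
            counter += 1
            result[current] = [kids.pop(0), dummy]
            current = dummy
        result[current] = kids
        queue.extend(children.get(node, []))
    return result

def reverse_bfs_order(children, root):
    """List the nodes so that all descendants come before their ancestor."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    order.reverse()
    return order

# Hypothetical routing tree: the server s has four children.
tree = {"s": ["a", "b", "c", "d"], "a": ["e"]}
binary = binarize(tree, "s")
order = reverse_bfs_order(binary, "s")
print(order)
```

Processing nodes in this order guarantees that the table entries of both children are available when a node is handled.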
Notations
$C(i,k',l)$ is the cost of a subtree rooted at i with k' optimally located caches, where the next cache up the tree is at distance l from i.
$F(i,k',l)$ is the sum of demands in the subtree rooted at i that do not pass through a cache in the solution $C(i,k',l)$.
The Dynamic Program
The DP Formula for C(i,k,l)
The cost if a cache is not placed at node i:
\[
\min_{0 \le k' \le k} \bigl\{ C(i_L,k',l+1) + C(i_R,k-k',l+1) + (l+1)\,[F(i_L,k',l+1) + F(i_R,k-k',l+1)] + l\, f_{s,i} \bigr\}
\]
The cost if a cache is placed at node i:
\[
\min_{0 \le k' \le k} \bigl\{ C(i_L,k',1) + C(i_R,k-1-k',1) + F(i_L,k',1) + F(i_R,k-1-k',1) \bigr\}
\]
Complexity:
$O(n \cdot h \cdot k)$ variables, $O(n \cdot h \cdot k^2)$ time complexity
Finer analysis yields $O(n \cdot h \cdot k)$ time complexity
The Server’s Point of View
Traffic Reduction
TERCs Vs. Edge Caches
The Server’s Point of View (2)
Popularity Stability