Document 7137238
Download
Report
Transcript Document 7137238
On the Placement of Web
Server Replicas
Lili Qiu, Microsoft Research
Venkata N. Padmanabhan, Microsoft Research
Geoffrey M. Voelker, UCSD
IEEE INFOCOM’2001, Anchorage, AK, April 2001
1
Outline
Overview
Related work
Our approach
Simulation methodology & results
Summary
2
Motivation
Growing interests in Web
server replicas
replica
replica
Internet
replica
replica
replica
Forms of Web server replicas
Clients
Content
Providers
Exponential growth in Web usage
Content providers want to offer
better service at lower cost
Solution: replication
Mirror sites
Content Distribution Networks
(CDNs)
CDN: a network of servers
Examples: Akamai, Digital Island
3
Placement of Web Server Replicas
Problem specification
Among a set of N potential sites, pick K sites as replicas
to minimize users’ latency or bandwidth usage
Internet
Clients
Content
Providers
4
Related Work
Placement of Web proxies [LGI+99]
Cache location [KRS00]
Placement of Internet instrumentation
[JJJ+00]
5
Our Approach
Model Internet as a graph
Parameterize the graph using measured inputs
# requests generated from each region
Distance between different regions
Map the placement problem onto a graph
optimization problem
Assumption:
Each client uses a single replica that is closest to it
Solve graph optimization problem
Using various approximation algorithms
6
Minimum K-median Problem
8
4
2
5
2
6
3
10
4
6
7
3
2
5
Given a complete graph
G=(V,E), d(j), c(i,j)
d(j): # requests
c(i,j): distance between node
i and j
8
Latency
or hop counts
or other metric to be
optimized
Find a subset V’ V with
|V’| = K s.t. it minimizes
vV minwV’ d(v)c(v,w)
NP-hard problem
7
Placement Algorithms
Tree based algorithm [LGG+99]
Random
Assume the underlying topologies are trees, and
model it as a dynamic programming problem
O(N3M2) for choosing M replicas among N potential
places
Pick the best among several random assignments
Hot spot
Place replicas near the clients that generate the
largest load
8
Placement Algorithms (Cont.)
Greedy algorithm
Calculate costs of assigning clients to replicas
Select replica with lowest cost
Adjust costs based upon assignment, repeat until
done
Super-Optimal algorithm
Lagrangian relaxation + subgradient method
9
Simulation Methodology
Network topology
Randomly generated topologies
Real Internet network topology
AS level topology obtained using BGP routing data
from a set of seven geographically dispersed BGP
peers
Web Workload
Real server traces
Using GT-ITM Internet topology generator
MSNBC, ClarkNet, NASA Kennedy Space Center
Performance Metric
Relative performance: costpractical/costsuper-optimal
10
Simulation Methodology
(Cont.)
Simulate a network of N
nodes (100 N 3000)
Cluster clients using network
aware clustering [KW00]
IP addresses with the same
address prefix belong to a
cluster
A small number of popular
clusters account for most
requests
Top 10, 100, 1000, 3000
clusters account for about
24%, 45%, 78%, and 94% of
the requests respectively
Pick the top N clusters
Map them to different nodes
11
Simulation Methodology
(Cont.)
Random trees
Random graphs
AS-level topologies
Sensitivity to the error in the input
12
Random Tree Topologies
Tree-based algorithm performs well as expected.
Greedy algorithm performs equally as well.
13
Random Graph Topologies
The greedy and hot-spot algorithms
out-perform the tree-based algorithm.
14
Large Random Graph Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
15
AS-level Internet Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
16
Effects of Imperfect Knowledge
about Input Data
Predicted workload (using moving window average)
Perfect topology information
Within 5% degradation when using predicted workload
17
Effects of Imperfect Knowledge
about Input Data (Cont.)
Predicted workload (using moving window average)
Noisy topology information
Perturb the distance between two nodes i and j by up to a
factor of 2
Within 15% degradation when using
predicted workload and noisy topology information
18
Summary
One of the first experimental studies on placement of
Web server replicas
Knowledge about client workload and topology is needed
for provisioning replicas
The greedy algorithm performs very well
Within a factor of 1.1 – 1.5 of the super-optimal
Insensitive to noise
The hot spot algorithm performs nearly as well
Stay within a factor of 2 of the super-optimal when the
salted error is a factor of 4
Within a factor of 1.6 – 2 of the super-optimal
Obtaining input data
Moving window average for load prediction
Using BGP router data to obtain topology information
19
Conclusion
Recommend using the greedy
algorithm for deciding the placement
of Web server replicas
20
Acknowledgement
Craig Labovitz
Yin Zhang
Ravi Kumar
21
Comments on greedy
algorithm performance
Worst-case performance: unbounded
Bad example
A full homogeneous binary tree with n=2i leaves
and n caches
0
optimal cost = 0
0
0
greedy cost = (n-1)*d
d
d
d
d
However, the worst-case scenario seems
unlikely to occur in real and random
topologies
22
Simulation Results in
Random Tree Topologies
23
Random Tree Topologies
Tree-based algorithm performs well as expected.
Greedy algorithm performs equally as well.
24
Random Graph Topologies
The greedy and hot-spot algorithms
out-perform the tree-based algorithm.
25
Large Random Graph Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
26
AS-level Internet Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
27
Simulation Results in
Real Internet Topologies
28
Obtaining Input Data
Workload
The number of requests generated by popular
client clusters
Stable
Placement algorithm can use moving window average
for predicting load with negligible impact on
performance
Network topology
Propagation delay
Hop count
AS hop count
Internet weather map
29
Placement of Web Server Replicas
Goal
replica
replica
Internet
replica
replica
replica
Minimum K-median
problem
Clients
Placing K replicas to
minimize users’ latency
or bandwidth usage
Content
Providers
Select K servers to
minimize the sum of
assignment costs
NP-hard problem
30