Document 7137238

Transcript Document 7137238

On the Placement of Web
Server Replicas
Lili Qiu, Microsoft Research
Venkata N. Padmanabhan, Microsoft Research
Geoffrey M. Voelker, UCSD
IEEE INFOCOM’2001, Anchorage, AK, April 2001
1
Outline





Overview
Related work
Our approach
Simulation methodology & results
Summary
2
Motivation

Growing interests in Web
server replicas

replica
replica

Internet
replica
replica
replica


Forms of Web server replicas

Clients
Content
Providers
Exponential growth in Web usage
Content providers want to offer
better service at lower cost
Solution: replication

Mirror sites
Content Distribution Networks
(CDNs)


CDN: a network of servers
Examples: Akamai, Digital Island
3
Placement of Web Server Replicas

Problem specification

Among a set of N potential sites, pick K sites as replicas
to minimize users’ latency or bandwidth usage
Internet
Clients
Content
Providers
4
Related Work



Placement of Web proxies [LGI+99]
Cache location [KRS00]
Placement of Internet instrumentation
[JJJ+00]
5
Our Approach


Model Internet as a graph
Parameterize the graph using measured inputs



# requests generated from each region
Distance between different regions
Map the placement problem onto a graph
optimization problem

Assumption:


Each client uses a single replica that is closest to it
Solve graph optimization problem

Using various approximation algorithms
6
Minimum K-median Problem

8
4
2
5
2
6
3
10
4
6

7

3
2
5
Given a complete graph
G=(V,E), d(j), c(i,j)
d(j): # requests
c(i,j): distance between node
i and j


8



Latency
or hop counts
or other metric to be
optimized
Find a subset V’ V with
|V’| = K s.t. it minimizes
vV minwV’ d(v)c(v,w)
NP-hard problem
7
Placement Algorithms

Tree based algorithm [LGG+99]



Random


Assume the underlying topologies are trees, and
model it as a dynamic programming problem
O(N3M2) for choosing M replicas among N potential
places
Pick the best among several random assignments
Hot spot

Place replicas near the clients that generate the
largest load
8
Placement Algorithms (Cont.)

Greedy algorithm




Calculate costs of assigning clients to replicas
Select replica with lowest cost
Adjust costs based upon assignment, repeat until
done
Super-Optimal algorithm

Lagrangian relaxation + subgradient method
9
Simulation Methodology

Network topology

Randomly generated topologies


Real Internet network topology


AS level topology obtained using BGP routing data
from a set of seven geographically dispersed BGP
peers
Web Workload

Real server traces


Using GT-ITM Internet topology generator
MSNBC, ClarkNet, NASA Kennedy Space Center
Performance Metric

Relative performance: costpractical/costsuper-optimal
10
Simulation Methodology
(Cont.)

Simulate a network of N
nodes (100  N  3000)

Cluster clients using network
aware clustering [KW00]


IP addresses with the same
address prefix belong to a
cluster
A small number of popular
clusters account for most
requests



Top 10, 100, 1000, 3000
clusters account for about
24%, 45%, 78%, and 94% of
the requests respectively
Pick the top N clusters
Map them to different nodes
11
Simulation Methodology
(Cont.)




Random trees
Random graphs
AS-level topologies
Sensitivity to the error in the input
12
Random Tree Topologies
Tree-based algorithm performs well as expected.
Greedy algorithm performs equally as well.
13
Random Graph Topologies
The greedy and hot-spot algorithms
out-perform the tree-based algorithm.
14
Large Random Graph Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
15
AS-level Internet Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
16
Effects of Imperfect Knowledge
about Input Data


Predicted workload (using moving window average)
Perfect topology information
Within 5% degradation when using predicted workload
17
Effects of Imperfect Knowledge
about Input Data (Cont.)


Predicted workload (using moving window average)
Noisy topology information

Perturb the distance between two nodes i and j by up to a
factor of 2
Within 15% degradation when using
predicted workload and noisy topology information
18
Summary



One of the first experimental studies on placement of
Web server replicas
Knowledge about client workload and topology is needed
for provisioning replicas
The greedy algorithm performs very well


Within a factor of 1.1 – 1.5 of the super-optimal
Insensitive to noise


The hot spot algorithm performs nearly as well


Stay within a factor of 2 of the super-optimal when the
salted error is a factor of 4
Within a factor of 1.6 – 2 of the super-optimal
Obtaining input data


Moving window average for load prediction
Using BGP router data to obtain topology information
19
Conclusion

Recommend using the greedy
algorithm for deciding the placement
of Web server replicas
20
Acknowledgement



Craig Labovitz
Yin Zhang
Ravi Kumar
21
Comments on greedy
algorithm performance


Worst-case performance: unbounded
Bad example

A full homogeneous binary tree with n=2i leaves
and n caches
0
optimal cost = 0
0
0
greedy cost = (n-1)*d
d

d
d
d
However, the worst-case scenario seems
unlikely to occur in real and random
topologies
22
Simulation Results in
Random Tree Topologies
23
Random Tree Topologies
Tree-based algorithm performs well as expected.
Greedy algorithm performs equally as well.
24
Random Graph Topologies
The greedy and hot-spot algorithms
out-perform the tree-based algorithm.
25
Large Random Graph Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
26
AS-level Internet Topologies
The greedy performs the best,
and the hot-spot performs nearly as well.
27
Simulation Results in
Real Internet Topologies
28
Obtaining Input Data

Workload


The number of requests generated by popular
client clusters
Stable


Placement algorithm can use moving window average
for predicting load with negligible impact on
performance
Network topology




Propagation delay
Hop count
AS hop count
Internet weather map
29
Placement of Web Server Replicas

Goal

replica
replica
Internet
replica
replica
replica

Minimum K-median
problem

Clients
Placing K replicas to
minimize users’ latency
or bandwidth usage
Content
Providers

Select K servers to
minimize the sum of
assignment costs
NP-hard problem
30

Document 7137238

Transcript Document 7137238

Directory