A Taxonomy and Survey of Content Delivery Networks

Meng-Huan Wu
2011/10/26
1
Outline
• Introduction
• Request-routing mechanisms
• Content selection and delivery
• Content routing and delivery
• Caching techniques
• Conclusion & Future work
• References
2
Introduction
• A CDN is a collection of network elements
arranged for more effective delivery of
content to end-users.
• Reduce network impact on the response time
of user requests.
• Mitigate flash crowds (also called the SlashDot effect)
3
The three key components of a CDN
architecture
• A content provider or customer is one who delegates
the URI name space of the Web objects to be
distributed. The origin server of the content provider
holds those objects.
• A CDN provider is a proprietary organization or
company that provides infrastructure facilities to
content providers in order to deliver content in a timely
and reliable manner.
• End-users or clients are the entities who access content
from the content provider’s website.
4
Servers
• Origin server: the server where the definitive
version of a resource resides.
• Replica server (or surrogate server): a server that
holds a replica of a resource but may act as an
authoritative reference for client responses.
5
Relationships
6
Abstract architecture of a Content
Delivery Network (CDN)
7
Request-routing in a CDN
environment
8
Content selection and delivery
9
Full-site content selection and
delivery
[Figure: Client, CDN Surrogate Server, and Origin Server; content shown: index.html with embedded image1.gif and image2.gif]
10
Partial site content selection and
delivery
[Figure: Client, CDN Surrogate Server, and Origin Server; content shown: index.html with embedded image1.gif and image2.gif]
11
Empirical-based approach
• In the empirical-based approach, the Web site
administrator empirically selects the content
to be replicated to the edge servers. Heuristics
are used in making such an empirical decision.
• The main drawback of this approach is the
uncertainty in choosing the right heuristics.
12
Popularity-based approach
• In the popularity-based approach, the most
popular objects are replicated to the
surrogates.
• This approach is time-consuming, and reliable
object request statistics are not guaranteed
because the popularity of each object varies
considerably.
• Moreover, such statistics are often not
available for newly introduced content.
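A minimal sketch of popularity-based selection, assuming the request statistics are available as a plain list of requested URLs. The function name `select_popular` and the sample log are hypothetical, not part of the surveyed work.

```python
from collections import Counter


def select_popular(request_log, k):
    """Return the k most requested URLs from a list of requested URLs."""
    counts = Counter(request_log)
    return [url for url, _ in counts.most_common(k)]


# Hypothetical request log; an object that never appears in it
# (for example newly introduced content) can never be selected.
log = ["/a.gif", "/b.gif", "/a.gif", "/c.html", "/a.gif", "/b.gif"]
to_replicate = select_popular(log, 2)
```

The sketch also makes the drawback concrete: the result is only as good as the statistics in the log.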
13
Cluster-based approach
• In the cluster-based approach, Web content is
grouped based on either correlation or access
frequency and is replicated in units of content
clusters.
14
Content routing and delivery
• If the local CDN server accepts a user’s
request but does not have the requested
content, it will perform content routing to
locate and then deliver the content to the user.
15
The steps the CDN takes to serve a
user’s request
• Step 1. Try to satisfy the user’s request using the
local CDN server.
• Step 2. If step 1 fails, try to satisfy the user’s
request using a CDN server inside the cluster
including the local CDN server.
• Step 3. If step 2 fails, try to satisfy the user’s
request using a CDN server inside a nearby cluster.
• Step 4. If step 3 fails, try to satisfy the user’s
request using the origin server.
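The four steps above can be sketched as a simple lookup cascade. This is an illustration only: each cache is modelled as a Python dict from URL to content, and the function name `serve` and the source labels are assumptions.

```python
def serve(url, local, cluster_peers, nearby_clusters, origin):
    """Return (source, content), trying the four steps in order."""
    # Step 1: the local CDN server.
    if url in local:
        return "local", local[url]
    # Step 2: the other CDN servers inside the same cluster.
    for peer in cluster_peers:
        if url in peer:
            return "same-cluster", peer[url]
    # Step 3: the CDN servers inside nearby clusters.
    for cluster in nearby_clusters:
        for server in cluster:
            if url in server:
                return "nearby-cluster", server[url]
    # Step 4: fall back to the origin server.
    return "origin", origin[url]
```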
16
17
Caching techniques
18
Query-based scheme
• The most straightforward scheme is the
query-based scheme, in which a CDN server
broadcasts a query for the requested content
to other CDN servers inside the same cluster if
it does not have the content.
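A rough sketch of the query-based scheme, assuming in-memory caches and counting one query per peer to show the cost of the broadcast. The class `CdnServer` and its methods are hypothetical names.

```python
class CdnServer:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # url -> content
        self.peers = []   # other CdnServer objects in the same cluster

    def has(self, url):
        return url in self.cache

    def lookup(self, url):
        """Return (content, queries_sent)."""
        if url in self.cache:
            return self.cache[url], 0
        # Local miss: the query is broadcast, so every peer is asked.
        holders = [p for p in self.peers if p.has(url)]
        queries_sent = len(self.peers)
        if holders:
            return holders[0].cache[url], queries_sent
        return None, queries_sent
```

Every local miss costs one query per peer, whether or not any peer holds the content.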
19
Digest-based scheme
• In order to avoid flooding queries, the digest-based
scheme was proposed. Each CDN server
maintains a content digest that includes the
content information of other CDN servers inside
the same cluster. Once a CDN server has cached or
deleted some content, it notifies the other CDN
servers to update their content digests.
• Hence, a CDN server knows where to locate the
content by checking its content digest.
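A minimal sketch of the digest-based scheme, assuming the digest is an exact in-memory map from URL to the names of the peers holding it. The class `DigestServer` and its methods are hypothetical.

```python
class DigestServer:
    def __init__(self, name):
        self.name = name
        self.cache = {}    # url -> content
        self.peers = []    # other DigestServer objects in the cluster
        self.digest = {}   # url -> set of peer names that hold it

    def store(self, url, content):
        self.cache[url] = content
        for peer in self.peers:          # notify peers of the new entry
            peer.digest.setdefault(url, set()).add(self.name)

    def evict(self, url):
        self.cache.pop(url, None)
        for peer in self.peers:          # notify peers of the deletion
            peer.digest.get(url, set()).discard(self.name)

    def locate(self, url):
        """Return the name of a server holding url, or None."""
        if url in self.cache:
            return self.name
        holders = self.digest.get(url)
        return min(holders) if holders else None
```

A lookup needs no query messages, but every store or evict sends one update to each peer.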
20
Directory-based scheme
• A centralized version of the digest-based scheme is the
directory-based scheme, in which a directory server
maintains the content information of the CDN servers
inside the cluster. A CDN server only needs to notify
the directory server when local updates occur, and
queries the directory server when there is a local miss.
• Compared to the digest-based scheme, the update
traffic is greatly reduced, but the directory server is a
potential bottleneck and a single point of failure because it
must handle the update and query messages from all the
cooperating CDN servers.
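A small sketch of the directory-based scheme, assuming a single in-memory directory. The class `Directory` and its message counter are hypothetical and only illustrate that all update and query traffic converges on one server.

```python
class Directory:
    def __init__(self):
        self.index = {}     # url -> set of server names holding it
        self.messages = 0   # every update and query lands here

    def update(self, server, url, present):
        """Record that `server` has cached (True) or deleted (False) url."""
        self.messages += 1
        holders = self.index.setdefault(url, set())
        if present:
            holders.add(server)
        else:
            holders.discard(server)

    def query(self, url):
        """Return the sorted names of servers that hold url."""
        self.messages += 1
        return sorted(self.index.get(url, set()))
```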
21
Hashing-based scheme
• A more efficient scheme is the hashing-based
scheme. The CDN servers inside a cluster
maintain the same hashing function. Each
content item is assigned to a designated CDN
server based on the content’s URL (or other
unique identification), unique IDs (e.g., IP
addresses) of the CDN servers, and the
hashing function. All requests for the same
content are redirected to the designated CDN
server for that content.
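One way to combine the content URL, the server IDs, and a shared hash function is highest-random-weight (rendezvous) hashing. The slide does not say which hashing method is used, so this choice, the function name `designated_server`, and the use of SHA-256 are assumptions.

```python
import hashlib


def designated_server(url, server_ids):
    """Pick the server whose hash of (server ID, URL) is largest."""
    def weight(server_id):
        digest = hashlib.sha256(f"{server_id}|{url}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(server_ids, key=weight)


# Hypothetical server IDs (IP addresses) for one cluster.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
target = designated_server("http://www.foo.com/sports/highlight.mpg", servers)
```

Because every server evaluates the same function over the same inputs, all of them agree on the designated server without exchanging messages.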
22
Semi-hashing-based scheme
• Under the semi-hashing-based scheme, a local
CDN server allocates a certain portion, P_local, of
its disk space to cache the most popular
content for its local users, and the remaining
portion to cooperate with other CDN servers
via a hashing function.
23
Cache update taxonomy
24
Periodic update
• The most common cache update method is the
periodic update. To ensure content consistency and
freshness, the content provider configures its origin
Web servers to provide instructions to caches about
what content is cacheable, how long different content
is to be considered fresh, when to check back with the
origin server for updated content, and so forth.
• With this approach, caches are updated at regular
intervals. But this approach suffers from significant
levels of unnecessary update traffic at each
interval.
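A toy sketch of the freshness check and of the traffic cost of periodic updates, assuming times are given in seconds. The function names and the numbers in the usage note are illustrative only.

```python
def is_fresh(fetched_at, ttl_seconds, now):
    """A cached copy is fresh while its age is below the configured lifetime."""
    return now - fetched_at < ttl_seconds


def refresh_count(duration_seconds, interval_seconds):
    """Number of periodic refreshes in a duration, whether or not content changed."""
    return duration_seconds // interval_seconds
```

For example, `refresh_count(3600, 300)` gives 12 refreshes per hour for one object, even if the object never changes in that hour.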
25
Update propagation
• Update propagation is triggered by a
change in content. It performs active content
pushing to the CDN cache servers. In this
mechanism, an updated version of a
document is delivered to all caches whenever
a change is made to the document at the
origin server.
• For frequently changing content, this
approach generates excess update traffic.
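A minimal sketch of update propagation, assuming each cache is a dict that the origin can write to directly. The class `PushingOrigin` and its push counter are hypothetical.

```python
class PushingOrigin:
    def __init__(self, caches):
        self.caches = caches   # list of dicts, one per CDN cache server
        self.docs = {}
        self.pushes = 0        # documents delivered to caches so far

    def publish(self, url, body):
        """Change a document and push the new version to every cache."""
        self.docs[url] = body
        for cache in self.caches:
            cache[url] = body
            self.pushes += 1
```

With three caches, five changes to one document cost fifteen pushes, which is the excess traffic noted above for frequently changing content.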
26
On-demand update
• On-demand update is a cache update mechanism
in which the latest copy of a document is
propagated to the surrogate cache server based
on a prior request for that content. This approach
follows an assume-nothing structure: content is
not updated unless it is requested.
• The disadvantage of this approach is the back and
forth traffic between the cache and origin server
in order to ensure that the delivered content is
the latest.
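A rough sketch of on-demand update, assuming each document carries a version number and that every request triggers one check against the origin. The function `serve_on_demand` and the `(version, body)` layout are assumptions.

```python
def serve_on_demand(url, cache, origin):
    """cache and origin map url -> (version, body). Return (body, refetched)."""
    # Every request goes back to the origin to learn the latest version;
    # this is the back-and-forth traffic the approach pays for.
    version, body = origin[url]
    cached = cache.get(url)
    if cached is not None and cached[0] == version:
        return cached[1], False
    # The cached copy is missing or stale: store the latest copy.
    cache[url] = (version, body)
    return body, True
```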
27
Invalidation
• Another cache update approach is invalidation, in
which an invalidation message is sent to all surrogate
caches when a document is changed at the origin
server. The surrogate caches are blocked from
accessing the document while it is being changed.
Each cache must later fetch an updated version of the
document individually.
• The drawback of this approach is that it does not make
full use of the distribution network for content delivery,
and belated fetching of content by the caches may make
it inefficient to maintain consistency among cached
content.
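A minimal sketch of invalidation, assuming each surrogate cache is a dict and an invalidation message simply drops the cached entry. The names `InvalidatingOrigin` and `fetch` are hypothetical.

```python
class InvalidatingOrigin:
    def __init__(self, caches):
        self.caches = caches   # list of dicts, one per surrogate cache
        self.docs = {}

    def publish(self, url, body):
        """Change a document and send an invalidation to every cache."""
        self.docs[url] = body
        for cache in self.caches:
            cache.pop(url, None)   # the stale copy is dropped, not replaced


def fetch(url, cache, origin):
    """Each cache fetches the updated version individually, on its next miss."""
    if url not in cache:
        cache[url] = origin.docs[url]
    return cache[url]
```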
28
Taxonomy of request-routing
mechanisms
29
DNS based Request-Routing
[Figure: the Client (140.124.180.1) sends a DNS query for www.cnn.com via its local DNS server to Akamai DNS and receives the DNS response 145.155.10.15. Labels inside the Akamai CDN: california.cnn.akamai.com, delaware.cnn.akamai.com, Surrogate 145.155.10.15, Surrogate 58.15.100.152]
30
DNS based Request-Routing
[Figure: DNS query and DNS response between the Client (140.124.180.1), the local DNS server, Akamai DNS, and the Surrogates of the Akamai CDN]
31
URL rewriting
http://www.foo.com/sports/highlight.mpg
=> http://www.cdn.com/www.foo.com/sports/highlight.mpg
[Figure: client, origin server, CDN’s authoritative DNS server, and CDN server near client]
1. HTTP request to the origin server for www.foo.com/sports/highlight.mpg => www.cdn.com/www.foo.com/sports/highlight.mpg
2. DNS query for www.cdn.com to the CDN’s authoritative DNS server
3. HTTP request for www.cdn.com/www.foo.com/sports/highlight.mpg to the CDN server near the client
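The rewriting shown on this slide can be sketched with the standard library, assuming the CDN host name is simply prepended to the original host and path. The function name `rewrite_to_cdn` is hypothetical; the default host is taken from the slide's example.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_to_cdn(url, cdn_host="www.cdn.com"):
    """Move the original host into the path and point the URL at the CDN host."""
    parts = urlsplit(url)
    new_path = "/" + parts.netloc + parts.path
    return urlunsplit((parts.scheme, cdn_host, new_path, parts.query, parts.fragment))


rewritten = rewrite_to_cdn("http://www.foo.com/sports/highlight.mpg")
```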
32
Content outsourcing
• Cooperative push-based:
– This approach is based on the pre-fetching of content to
the surrogates.
• Non-cooperative pull-based:
– In this approach, client requests are directed to their
closest surrogate servers.
• Cooperative pull-based:
– The cooperative pull-based approach differs from the
non-cooperative approach in the sense that surrogate servers
cooperate with each other to get the requested content in
case of cache miss.
33
Conclusion & Future work
• Conclusion
– CDNs offer fast and reliable applications and
services
– Reduce network impact on the response time
– Enhance QoE
• Future work
– Find a better approach to content placement
34
References
[1] A. K. Pathan and R. Buyya, “A Taxonomy and
Survey of Content Delivery Networks,” Technical
Report, Univ. of Melbourne, 2007.
[2] J. Ni and D. H. K. Tsang, “Large-Scale
Cooperative Caching and Application-Level
Multicast in Multimedia Content Delivery
Networks,” IEEE Communications Magazine, vol. 43,
no. 5, pp. 98-105, May 2005.
35
Q&A
36