INTRODUCTION - Wayne State University

ENHANCING THE WEB’S INFRASTRUCTURE:
FROM CACHING TO REPLICATION
ECE 7995
Presented By:
Pooja Swami
and
Usha Parashetti
1
Contents
• Introduction
• Issues to be addressed
• Caching on the Web
• Issues with Web Caching Design
• Performance study of Web Caching
• Deficiencies of Web Caching
• Caching goes Replication
• Implementation of Replication
• Web location and information service (WLIS)
• Performance study of CgR
• Conclusion
2
Introduction
The enormous success of the Internet has increased the number of
people using the WWW.
• The number of WWW users has grown exponentially.
• Increased use of the WWW has created problems within the Internet:
  • Increased latency over the Web due to insufficient bandwidth
  • Network congestion due to increased traffic
  • Servers overloaded with requests due to increased traffic
Solutions to these problems
• Introduce caching on the Web
  • Using caching proxies
• Introduce replication
  • Increase the availability of data by creating mirrored sites
• Combine the benefits provided by both caching and replication
  • Use an active caching scheme such as CgR
3
Issues to be addressed to enhance the Web’s Infrastructure
To preserve the usability of the WWW, the following issues need
to be addressed at the server level:
• Document retrieval latency times must be decreased
• Document availability must be increased, perhaps by distributing
documents among several servers
• The amount of data transferred must be reduced
• Network access must be redistributed to avoid peak hours
4
Caching on the web
• Caching on the web is implemented using caching proxies.
• A proxy acts as a mediator between the user’s machine and the outside
world.
• From the user’s point of view, the proxy acts as a Web server: each
request is sent to and answered by the proxy.
• From the server’s point of view, the proxy acts like a client: it forwards
requests to the originating server.
• Data stored is shared among several users, so there is an increased
probability of data being accessed more than once.
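The proxy's two roles described above can be sketched in a few lines of Python. This is only an illustration, not the presenters' implementation: the class `CachingProxy`, the callback `fetch_from_origin`, and the example URL are hypothetical names, and a real proxy would speak HTTP rather than call a function.

```python
from typing import Callable, Dict, List


class CachingProxy:
    """Toy proxy: answers users like a server, queries origins like a client."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical stand-in for an HTTP fetch
        self._cache: Dict[str, bytes] = {}   # one store shared by all users
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:               # served without contacting the origin
            self.hits += 1
            return self._cache[url]
        self.misses += 1
        body = self._fetch(url)              # the proxy acts as a client here
        self._cache[url] = body
        return body


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    """Simulated originating server that records every request it receives."""
    origin_calls.append(url)
    return f"<html>{url}</html>".encode()


proxy = CachingProxy(fake_origin)
proxy.get("http://example.org/a")  # first user: miss, forwarded to the origin
proxy.get("http://example.org/a")  # second user: hit, answered by the proxy
```

Because the cache is shared, the second user's request never reaches the originating server.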
5
Current Issues in Web caching design
Designing a replacement policy is more complex on the Web
• If a cache is full when it receives a request to store a large document, would
it be more sensible to replace a single large document than several smaller ones?
• The best strategy for maximizing caching benefits needs to be considered.
Determining the pattern of document access and the time required to
reload data is more complicated on the Web
• Loading time depends on the origin of the document
• Data transferred over international links typically takes longer to retrieve than
information from servers in the same country
Caching strategies face the problem of document staleness on the Web
• When a cached document changes on the originating server, caching proxies are
not aware of the change.
• Further requests satisfied from the cache will deliver out-of-date information
• This causes the problem of cache coherency
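The replacement-policy question raised on this slide (one large document versus several smaller ones) can be made concrete with a small comparison. The sketch below is an assumption-laden illustration: the two policies, the function names, and the sample documents are hypothetical and are not taken from the study.

```python
from collections import OrderedDict
from typing import List


def evict_lru(cache: "OrderedDict[str, int]", needed: int, capacity: int) -> List[str]:
    """Evict least-recently-used documents until `needed` bytes fit."""
    evicted = []
    used = sum(cache.values())
    while cache and used + needed > capacity:
        url, size = cache.popitem(last=False)  # oldest entry first
        used -= size
        evicted.append(url)
    return evicted


def evict_largest(cache: "OrderedDict[str, int]", needed: int, capacity: int) -> List[str]:
    """Evict the largest documents first until `needed` bytes fit."""
    evicted = []
    used = sum(cache.values())
    while cache and used + needed > capacity:
        url = max(cache, key=cache.get)        # biggest document in the cache
        used -= cache.pop(url)
        evicted.append(url)
    return evicted


def sample_cache() -> "OrderedDict[str, int]":
    # URL -> size in bytes, ordered from least to most recently used
    return OrderedDict(
        [("/a.html", 10), ("/b.html", 10), ("/c.html", 10), ("/big.pdf", 70)]
    )


# A full 100-byte cache must make room for a new 30-byte document.
lru_victims = evict_lru(sample_cache(), needed=30, capacity=100)
size_victims = evict_largest(sample_cache(), needed=30, capacity=100)
```

With these sample values, the LRU policy gives up three small pages while the size-based policy gives up the single large document. Which choice is better depends on how likely each document is to be requested again.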
6
Current Issues in Web caching design (contd..)
Web caching proxies can employ a time-to-live (TTL) value to estimate
document staleness
TTL implementation procedure
• A date of last modification is included in every reply from a Web server
• A TTL timing window based on that date is associated with each
document put in the cache
• When a document is requested, the proxy checks the timing window.
• A request occurring within the TTL time frame is served directly from the
cache, the assumption being that the document is still current
• A request occurring after the TTL has expired causes a conditional
reload to be performed
• The originating server answers either with the new document or with a special
reply indicating unchanged data
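The TTL procedure above can be sketched as follows. The slides do not say how the window is derived from the last-modification date, so the rule used here (TTL equals a fixed fraction of the document's age, `TTL_FACTOR = 0.1`) is an assumption, as are the names `TtlCache` and `fake_fetch`. In HTTP the conditional reload is typically an `If-Modified-Since` request answered with `304 Not Modified`; the sketch models that reply as `None`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

TTL_FACTOR = 0.1  # assumption: TTL is 10% of the document's age at fetch time


@dataclass
class Entry:
    body: bytes
    last_modified: float  # seconds, taken from the server's reply
    expires_at: float     # end of the TTL window


# Hypothetical origin interface: (url, if_modified_since) -> (body, last_modified).
# body is None when the server reports "unchanged".
ConditionalFetch = Callable[[str, Optional[float]], Tuple[Optional[bytes], float]]


class TtlCache:
    def __init__(self, fetch: ConditionalFetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Entry] = {}

    def get(self, url: str, now: float) -> bytes:
        entry = self._entries.get(url)
        if entry is not None and now < entry.expires_at:
            return entry.body                      # inside the TTL window
        since = entry.last_modified if entry is not None else None
        body, last_modified = self._fetch(url, since)  # (conditional) reload
        if body is None and entry is not None:
            body = entry.body                      # unchanged: keep the cached copy
        expires_at = now + TTL_FACTOR * max(now - last_modified, 0.0)
        self._entries[url] = Entry(body, last_modified, expires_at)
        return body


origin = {"body": b"v1", "last_modified": 1000.0, "requests": 0}


def fake_fetch(url: str, since: Optional[float]) -> Tuple[Optional[bytes], float]:
    """Simulated originating server with a single document."""
    origin["requests"] += 1
    if since is not None and origin["last_modified"] <= since:
        return None, origin["last_modified"]
    return origin["body"], origin["last_modified"]


cache = TtlCache(fake_fetch)
first = cache.get("/doc", now=2000.0)   # miss: fetched, window lasts until 2100
second = cache.get("/doc", now=2050.0)  # inside the window: served from cache
origin.update(body=b"v2", last_modified=2060.0)
stale = cache.get("/doc", now=2090.0)   # still inside the window: old copy served
third = cache.get("/doc", now=2200.0)   # window expired: conditional reload
```

The `stale` request shows the coherency problem: the document changed at time 2060, but the proxy keeps serving the old version until the window expires.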
7
Performance Study of Web Caching
Locality of reference and the web
• A very small subset of pages was frequently accessed, while most
documents were accessed relatively seldom.
• Averaging the cache hit rates showed that 14% of all cached pages
were responsible for 42% of the data served directly from the cache,
while occupying only 7% of the overall disk space.
• The study results indicate that locality of reference exists on the Web.
8
Performance Study of Web Caching (contd..)
Performance gains were observed in terms of cache hit rate, byte hit
rate, and transferred data.
• Theoretically achievable results were calculated to be:
  • Cache hit rate: 56.5%
  • Byte hit rate: 40.6%
  • Transferred data: 3,650,950,731 bytes
• Practical results obtained were:
  • Cache hit rate: 21.3%
  • Byte hit rate: 16.6%
  • Transferred data: 4,992,987,253 bytes
Study results indicate that only 40% of the theoretically possible hit rate is
achieved
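The two metrics and the "40%" figure can be checked with a short calculation. The definitions used below are the usual ones (cache hit rate counts requests, byte hit rate counts bytes). The toy log is hypothetical, and the rates are the ones reported on this slide.

```python
from typing import Iterable, Tuple


def hit_rates(log: Iterable[Tuple[int, bool]]) -> Tuple[float, float]:
    """log holds (size_in_bytes, served_from_cache) pairs, one per request."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, from_cache in log:
        requests += 1
        total_bytes += size
        if from_cache:
            hits += 1
            hit_bytes += size
    return hits / requests, hit_bytes / total_bytes


# Hypothetical log: small pages hit often, large downloads miss.
toy_log = [(1_000, True), (1_000, True), (1_000, False), (7_000, False)]
toy_hit_rate, toy_byte_hit_rate = hit_rates(toy_log)

# Rates reported on the slide, in percent.
theoretical = {"cache_hit_rate": 56.5, "byte_hit_rate": 40.6}
practical = {"cache_hit_rate": 21.3, "byte_hit_rate": 16.6}

achieved = {name: practical[name] / theoretical[name] for name in theoretical}
for name, fraction in achieved.items():
    print(f"{name}: {fraction:.1%} of the theoretical value")
```

The ratios come out at roughly 37.7% for the cache hit rate and 40.9% for the byte hit rate, which matches the slide's rounded figure of 40%. The toy log also shows why the byte hit rate can sit well below the cache hit rate when the largest documents are the ones that miss.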
9
Deficiencies of Web Caching
Caching alone cannot provide a complete solution for improving the
Web infrastructure
• Document retrieval latency times
• Document availability
  • It is not possible to check whether the requested document is current when the
originating server is down
• Reduction in data transfer
  • Caching provides only a partial solution to this problem
  • The performance study indicates that 69.9% of the pages were retrieved only once
  • Caching cannot reduce the amount of data transferred for documents retrieved only
once
• Redistribution of network access
  • Caching cannot solve this problem
  • Document loading and staleness checks are made in the critical path at the time of
request
10
Caching goes Replication (CgR)
• The concept of CgR is to combine caching and replication to
achieve the goals mentioned earlier.
• The basic idea is that of an active caching scheme in which servers
can decide which documents should be held where.
• It simply transforms the caching servers into replicate servers.
11
• When the current Web is described with CgR, caches become active
replicates for certain URL namespaces of some of the servers whose data
they previously only cached.
• A primary server has to know only the set of its direct replicates. Level 1
replicate servers can in turn have replicates (RS4 and RS5) for which they
act as a primary server.
• The conversion of caches to Replicate Servers (RS) and of normal
WWW servers to Primary Servers (PS) is the central concept of CgR.
• The selection of which caching servers to convert to replicate servers can be
done manually or automatically, based on appropriate heuristics.
12
How replication is implemented
• Servers initiate propagation by sending a normal HTTP GET request
bearing a specific notification to their replica sites.
• CgR-enhanced replicate servers interpret this notification as a
command to request the data to be replicated.
• The basic issue is how the client can select replicate servers.
This can be done with a CgR-enhanced client-side proxy (CP).
• The CP permits all these actions to be performed transparently, without
modifying the clients or their interface to the Web. Users only have to
choose this proxy as their gateway to the Internet.
13
How replication is implemented (contd..)
• The CP can now send HTTP requests directly to replicate servers as well as
conventional servers.
• The CP may switch to a different RS at any time for load balancing.
• The CP offers only a basic means by which clients can address a group
of servers.
• What is still needed is a way to propagate information about the
replicate servers that exist and what data they hold.
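How a CP might address a group of servers and switch between them can be sketched as below. The slides do not describe the selection rule, so the round-robin rotation and the longest-prefix match on URL namespaces are assumptions. The class name, method names, and host names are hypothetical.

```python
import itertools
from typing import Dict, Iterator, List


class ClientSideProxy:
    """Toy CP: maps URL namespaces to server groups and rotates within a group."""

    def __init__(self) -> None:
        self._rotation: Dict[str, Iterator[str]] = {}

    def register_group(self, namespace: str, servers: List[str]) -> None:
        # Round-robin over the group is an assumed load-balancing rule.
        self._rotation[namespace] = itertools.cycle(servers)

    def choose_server(self, url: str, default_host: str) -> str:
        matches = [ns for ns in self._rotation if url.startswith(ns)]
        if not matches:
            return default_host          # conventional, non-replicated server
        best = max(matches, key=len)     # most specific namespace wins
        return next(self._rotation[best])


cp = ClientSideProxy()
cp.register_group(
    "http://ps.example/docs/",
    ["ps.example", "rs1.example", "rs2.example"],
)
picks = [
    cp.choose_server("http://ps.example/docs/a.html", "ps.example")
    for _ in range(4)
]
other = cp.choose_server("http://other.example/x", "other.example")
```

The group is registered by hand here. This is exactly the gap the last bullet points out: the CP still needs some way to learn which replicate servers exist and which namespaces they hold.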
14
Web location and information service (WLIS)
• WLIS keeps track of which URL namespaces are replicated and which
servers belong to logical groups of primary server and replicate servers.
• A natural place to implement the WLIS service is the client-side proxy,
but it can also be included in the primary server or offered by separate
WLIS servers.
• This raises the question of how WLIS information is created, since
it cannot be entered manually into a system that is meant to be highly
scalable.
15
• The figure shows how the distributed WLIS database is set up.
• Assuming that initially no WLIS information is available, the CP forwards
requests directly to the PS (Step 1).
• The PS knows about its first-level replicate servers (RS1-RS3) and includes
this information in its reply.
• The client receives this initial WLIS information and stores it for later use.
The next request can be redirected to one of the RS indicated in the
previous WLIS response.
• If the decision is made to query RS1, it will in turn reply with the requested
document and a list of its own replicates (RS4 and RS5).
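The discovery steps above can be simulated in a few lines. The server labels follow the slide (PS, RS1-RS3, RS4 and RS5). The function `serve`, the class `WlisClientProxy`, and the way replicate lists are attached to a reply are hypothetical simplifications of what would be an HTTP exchange.

```python
from typing import Dict, List, Optional, Tuple

# Topology as described on the slide: server -> its direct replicates.
REPLICATES: Dict[str, List[str]] = {
    "PS": ["RS1", "RS2", "RS3"],
    "RS1": ["RS4", "RS5"],
}


def serve(server: str, url: str) -> Tuple[str, List[str]]:
    """Stand-in for a reply: the document plus the server's WLIS information."""
    return f"{url} from {server}", REPLICATES.get(server, [])


class WlisClientProxy:
    def __init__(self, primary: str) -> None:
        self.primary = primary
        self.known: List[str] = []  # replicate servers learned so far

    def request(self, url: str, server: Optional[str] = None) -> str:
        target = server or self.primary   # no WLIS data yet: ask the PS
        document, replicates = serve(target, url)
        for rs in replicates:             # store WLIS information for later use
            if rs not in self.known:
                self.known.append(rs)
        return document


cp = WlisClientProxy("PS")
doc1 = cp.request("/index.html")                      # Step 1: goes to the PS
after_first = list(cp.known)
doc2 = cp.request("/index.html", server=cp.known[0])  # redirected to RS1
after_second = list(cp.known)
```

After the first request the CP knows RS1-RS3, and after querying RS1 it also knows RS4 and RS5. The WLIS database is thus built up from ordinary replies, with nothing entered manually.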
16
Performance study of CgR
Cache hit rate:
Cache hit rate for a CgR-enhanced RS with different quotas of its cache assigned to hold
replicated data.
For small cache sizes, reserving cache space for replicates decreased overall
performance
17
Transmission times for document requests:
Small cache sizes again correspond to lower cache hit rates.
When the cache sizes are increased to 500 and 800, transmission times are reduced by about
1.5 percent.
18
Conclusion
• Caching and replication have each proved beneficial in many areas of
computing, but the advantages of combining these two
approaches are manifold.
• Together they not only help to reduce latency but also remedy the
extreme variations in network bandwidth.
• Additionally, they provide a more fault-tolerant and evenly balanced
system.
19
Thank You
20