High Performance Web Server

NTUIM
R89725018 Chen Pei-wen
R89725013 Cheng Pei-chun
Outline
– Introduction
– Load Balancing
– Content-based Switching
– Implementation Architecture
– Conclusion
– References

Introduction
Performance and high availability are critical for web sites that receive a large number of requests.
The QoS a web server provides to end users depends on:
– Network-transfer speed
– Server-response time
Introduction (cont’d)
Network-transfer speed is mainly a matter of Internet-link bandwidth.
Server-response time depends upon available resources:
– Single server
– Multiple servers
Introduction (cont’d)

Single server
– Web-server software
• Well-designed server process
• Adjust web-server software
– Operating system
• Specific operating system
• Adjust operating-system parameters
– Hardware
• Install more RAM
• Replace the CPU with a faster one
• Use faster SCSI controllers and disks
Introduction (cont’d)

Multiple Servers
– Improve performance by increasing the number of Web servers.
– This means distributing the traffic across a cluster of back-end Web servers.
– Load balancing is needed.
Load Balancing

Goal
– Balance the traffic across the available servers
– The distribution mechanism is completely transparent to the end user
Load Balancing (cont’d)

Benefits
– Improve reliability (fault tolerance)
• If you are using a single server and it fails, the site goes
down with it.
• This is especially bad for e-commerce and financial sites
which lose money if they are out of service.
• With a load balanced group of servers, loss of a single
server will only slightly affect overall site performance
and the site will not go down.
Load Balancing (cont’d)
– Improve performance
• Load balancing allows multiple servers to be available to handle a larger number of incoming client requests.
– Lower cost
• With load balancing providing fault tolerance to the entire
site, the reliability of each individual server is less critical.
• We can use lower-cost servers without compromising
overall reliability.
Load Balancing (cont’d)
– Improve scalability and flexibility
• With one server, all you can do if your traffic increases is upgrade that server or buy a bigger one.
• With load-balanced groups of servers, you can gracefully add more servers to the server farm.
– Improve maintainability
• The flexibility of a load balanced group of servers allows
you to remove individual servers from service for repair
or upgrade without affecting the overall availability of the
site.
Load Balancing (cont’d)

Load balancing algorithms (RFC 2391)
1. Round-robin
• The simplest scheme: a host is selected on a round-robin basis, without regard to the load on the host.
2. Least load first (session count)
• The host with the fewest sessions bound to it is selected to service a new session.
• Each session is assumed to consume as many resources as any other session.
Load Balancing (cont’d)
3. Least traffic first (byte or packet count)
• System load is measured by tracking the packet or byte count directed to or from each member host over a period of time.
4. Least weighted load
• Weights are assigned to sessions, based on estimates of the likely resource consumption of each session type.
• Weights are assigned to hosts, based on resource availability.
5. Fastest response
• Member hosts are pinged periodically, and the response time is measured to determine how busy the hosts really are.
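The five selection policies can be compared side by side in a few lines of Python. This is only an illustrative sketch: the `Host` record, its field names, and the sample numbers are assumptions made for the example, and the weighted policy here uses host weights only (it does not model per-session weights).

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Host:
    name: str
    sessions: int = 0      # sessions currently bound to this host
    bytes_seen: int = 0    # traffic counted over the current measurement window
    weight: float = 1.0    # relative capacity (hypothetical units)
    rtt_ms: float = 0.0    # last measured probe response time


def round_robin(hosts):
    """Return an iterator that hands out hosts in fixed rotation, ignoring load."""
    return cycle(hosts)


def least_sessions(hosts):
    """Least load first: every session is assumed to cost the same."""
    return min(hosts, key=lambda h: h.sessions)


def least_traffic(hosts):
    """Least traffic first: compare byte counts over the window."""
    return min(hosts, key=lambda h: h.bytes_seen)


def least_weighted_load(hosts):
    """Normalize the session count by host capacity before comparing."""
    return min(hosts, key=lambda h: h.sessions / h.weight)


def fastest_response(hosts):
    """Pick the host that answered the last probe most quickly."""
    return min(hosts, key=lambda h: h.rtt_ms)


hosts = [
    Host("www1", sessions=4, bytes_seen=900, weight=1.0, rtt_ms=12.0),
    Host("www2", sessions=6, bytes_seen=300, weight=4.0, rtt_ms=30.0),
    Host("www3", sessions=5, bytes_seen=500, weight=1.0, rtt_ms=8.0),
]
rotation = round_robin(hosts)
rr_order = [next(rotation).name for _ in range(4)]
```

With these sample values the policies disagree: least sessions picks www1, least traffic and least weighted load pick www2, and fastest response picks www3, which is why the choice of algorithm matters.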
Load Balancing - RR DNS

Round-Robin DNS Approach
– Allows a single domain name to be associated with several IP addresses
• Example: using CNAME (canonical name) resource records
www.foo.dom.  IN CNAME www1.foo.dom.
              IN CNAME www2.foo.dom.
              IN CNAME www3.foo.dom.
              IN CNAME www4.foo.dom.
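The rotation a round-robin name server performs can be sketched as follows. The class name `RoundRobinZone` is hypothetical and the model ignores TTLs and real DNS wire format; it only shows how successive queries for one name walk through the record list. Round-robin DNS is also commonly set up with several A records for the same name rather than several CNAMEs.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of a name server that rotates its record list on every query."""

    def __init__(self, records):
        self._records = deque(records)

    def resolve(self):
        answer = self._records[0]
        self._records.rotate(-1)  # move the record just served to the back
        return answer


zone = RoundRobinZone(
    ["www1.foo.dom", "www2.foo.dom", "www3.foo.dom", "www4.foo.dom"]
)
answers = [zone.resolve() for _ in range(6)]
```

Six consecutive queries return www1 through www4 and then wrap around to www1 and www2.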
Load Balancing - RR DNS

Operation
– The first step a browser takes to retrieve a URL is to resolve the host name to its IP address
– It asks a name resolver, which calls a nearby DNS server, which then iterates over the distributed DNS server hierarchy on the Internet until it reaches the Round-Robin DNS server, which finally returns an IP address
– The browser takes the IP address and creates a connection with the assigned server
Load Balancing - RR DNS

Advantages of Round-Robin DNS
– The concept is simple
– It requires no additional hardware

Drawbacks of Round-Robin DNS
– DNS is unaware of the status of the web servers
– All servers are assumed to be equally capable of offering all services
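Load Balancing - RR DNS
– The caching of DNS data can cause load imbalances
• In practice, DNS servers at any point in the DNS hierarchy cache resolved data, both to decrease resolver traffic and to speed up resolution.
• Clients answered from a cache keep receiving the same address until the cached entry expires, so they bypass the round-robin rotation.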
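[Figure: DNS resolution path. A user process sends a query to a name resolver (with cache), which queries a name server (with cache and database), which queries the RR DNS server (with cache and database); responses travel back along the same path.]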
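The effect of caching on the distribution can be shown with a small simulation. Everything here is an assumption made for illustration: the class names, the 60-second TTL, the private addresses, and the model of 100 requests arriving one per second through a single caching resolver.

```python
from collections import Counter, deque


class RoundRobinDNS:
    """Authoritative server that rotates through its addresses."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        address = self._addresses[0]
        self._addresses.rotate(-1)
        return address


class CachingResolver:
    """Intermediate name server that reuses an answer until its TTL expires."""

    def __init__(self, upstream, ttl):
        self._upstream = upstream
        self._ttl = ttl
        self._cached = None
        self._expires_at = 0

    def resolve(self, now):
        if self._cached is None or now >= self._expires_at:
            self._cached = self._upstream.resolve()
            self._expires_at = now + self._ttl
        return self._cached


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]

# 100 lookups answered directly by the round-robin server.
direct = RoundRobinDNS(servers)
uncached_hits = Counter(direct.resolve() for _ in range(100))

# The same 100 lookups, one per second, answered through a cache with a 60 s TTL.
resolver = CachingResolver(RoundRobinDNS(servers), ttl=60)
cached_hits = Counter(resolver.resolve(now=t) for t in range(100))
```

Without the cache each server receives 25 lookups. With the cache, the first server receives 60, the second 40, and the other two none, even though the authoritative server still rotates correctly.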
Load Balancing - L4 Switch

Layer-4 Switch Approach
– These switches sit between the Internet connection and the server farm
Load Balancing - L4 Switch

Operation
– The switch recognizes when a client is requesting a new
session by identifying the TCP SYN packet
– The request is forwarded to the best available server based
on the configured load balancing algorithm
– The switch maintains a session-server binding table that
associates each active session with the real server to which
it is assigned
Load Balancing - L4 Switch
– It performs address substitution so that the real server will
transparently receive packets for that session
– Likewise, the switch intercepts packets traveling from the
real server to the client and performs the reverse address
substitution
– The switch recognizes when the session is terminated by
identifying the TCP FIN packet
– Then it removes the session-server binding from its
binding table
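The binding-table behavior described in these steps can be sketched in Python. This is a simplified model, not a real switch: the `Packet` and `L4Switch` names, the virtual IP, and the addresses are hypothetical, server selection is plain round-robin, and the binding is dropped on the first client FIN rather than after a full TCP teardown.

```python
from dataclasses import dataclass, replace

VIP = "192.0.2.10"  # hypothetical virtual IP advertised for the site


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    flags: frozenset = frozenset()


class L4Switch:
    def __init__(self, vip, servers):
        self.vip = vip
        self.servers = list(servers)
        self.bindings = {}  # (client IP, client port) -> real server
        self._turn = 0

    def _pick_server(self):
        # Round-robin stands in for the configured load balancing algorithm.
        server = self.servers[self._turn % len(self.servers)]
        self._turn += 1
        return server

    def from_client(self, pkt):
        key = (pkt.src, pkt.sport)
        if "SYN" in pkt.flags and key not in self.bindings:
            self.bindings[key] = self._pick_server()  # new session
        server = self.bindings.get(key)
        if server is None:
            return None  # no binding for this packet: drop it
        if "FIN" in pkt.flags:
            del self.bindings[key]  # session over (simplified teardown)
        return replace(pkt, dst=server)  # VIP -> real server address

    def from_server(self, pkt):
        return replace(pkt, src=self.vip)  # real server address -> VIP


switch = L4Switch(VIP, ["10.0.0.1", "10.0.0.2"])
client = ("198.51.100.7", 40000)
syn = Packet(*client, VIP, 80, frozenset({"SYN"}))
to_server = switch.from_client(syn)
reply = switch.from_server(Packet(to_server.dst, 80, *client))
fin = Packet(*client, VIP, 80, frozenset({"FIN"}))
switch.from_client(fin)
```

After the FIN, a further packet from the same client address and port finds no binding and is dropped in this model.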
Load Balancing - L4 Switch

Advantages of Layer-4 Switch
– Good load balancing can be achieved
– No problem with the caching of DNS data
– Sophisticated algorithms can be used
– Aware of web-server failures

Limitation of Layer-4 Switch
– It has no concept of what content is being requested
Load Balancing - L4 Switch
– All content must be replicated on every server
– Cache hit rate may be low
Load Balancing

Benefits of Content Awareness
Content-based Switching

Content-based Switching
– Intelligently load balances traffic across delivery
nodes, dynamically directing specific content
requests to the best site and server at that
moment.
– Based on content availability, application
availability and server load.
– Adds protection against flash crowds and ensures
transaction continuity for e-commerce applications.
– Enables advanced personalization and
prioritization for important content and customers.
Content-based Switching (cont’d)

Benefits
– Increased performance due to improved hit rates in the back-end’s main memory caches.
– Increased secondary storage scalability due to the ability to partition the server’s database over the different back-end nodes.
– The ability to employ back-end nodes that are specialized for certain types of requests.
Content-based Switching (cont’d)

Products
– ArrowPoint’s
• Content Smart™ Web Switches
• WebNS™
– Foundry Networks’
• ServerIron™ Traffic Management system
• Internet IronWare™
– Nortel Networks’
• Accelar Load Balancing Server Switch
Content-based Switching (cont’d)

Switch Architecture
Content-based Switching (cont’d)

Idea
– User makes a content request by typing a URL into a browser.
– The Web switch with the virtual IP of the requested URL intercepts the request.
– The switch spoofs the TCP connection back to the client and examines packet URLs.
– Simultaneously, the Web switch examines the HTTP header and URL and compares them to the current content rules to select the best server or cache to satisfy the request.
– A flow is created between the switch port and the optimal server and "snaps" together with the flow from the client to the switch.
– A control block is created in the ASIC, and all subsequent packets are forwarded without intervention by the switch controllers.
Content-based Switching (cont’d)

Variant Switching Schemes
– URL Switching
Directs HTTP requests to a group of servers using information contained in the URL string.
• Gives greater control over website deployment: different web content can be placed on different servers
• Eliminates unnecessary duplication of all content across all load-balanced servers
• Ex: different file types, different requests
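URL switching amounts to matching the request path against an ordered list of content rules. The sketch below assumes hypothetical rules and group names ("dynamic", "images", "static") chosen only for illustration; a real switch would load its rules from configuration.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE_TYPES = {".gif", ".jpg", ".png"}

# Ordered content rules: the first predicate that matches the path wins.
# The rules and group names are illustrative assumptions.
RULES = [
    (lambda path: path.startswith("/cgi-bin/"), "dynamic"),
    (lambda path: posixpath.splitext(path)[1].lower() in IMAGE_TYPES, "images"),
]
DEFAULT_GROUP = "static"


def select_group(url):
    """Return the server group that should handle the given request URL."""
    path = urlsplit(url).path
    for matches, group in RULES:
        if matches(path):
            return group
    return DEFAULT_GROUP


image_group = select_group("http://www.foo.dom/images/logo.GIF")
cgi_group = select_group("http://www.foo.dom/cgi-bin/search?q=switch")
page_group = select_group("/index.html")
```

Because each group serves only its own content type, images need to exist only on the "images" servers and scripts only on the "dynamic" servers.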
Content-based Switching (cont’d)

Variant Switching Schemes
– Cookie Switching
Directs HTTP requests to a server group based on information embedded in a cookie in the HTTP header.
• The cookie specifies which server group should handle the request.
• Ensures that a particular server group always handles requests from a particular client, even across sessions.
• Guarantees a persistent end-user experience.
• Ex: personalized web pages, prioritized service
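A cookie-based rule can be sketched with the standard-library cookie parser. The cookie name `server_group` and the group names are assumptions made for this example; the source does not say which cookie a given product inspects.

```python
from http.cookies import SimpleCookie

KNOWN_GROUPS = {"gold", "standard"}  # hypothetical server groups
DEFAULT_GROUP = "standard"


def select_group_by_cookie(cookie_header):
    """Pick a server group from the Cookie request header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("server_group")  # hypothetical cookie name
    if morsel is not None and morsel.value in KNOWN_GROUPS:
        return morsel.value
    return DEFAULT_GROUP


returning_customer = select_group_by_cookie("lang=en; server_group=gold")
new_visitor = select_group_by_cookie("lang=en")
unknown_group = select_group_by_cookie("server_group=platinum")
```

Requests without the cookie, or with a value the switch does not recognize, fall back to the default group.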
Content-based Switching (cont’d)

Variant Switching Schemes
– SSL Session ID Switching
All the SSL connections between a client and a server must reach the same host.
• Ensures that all the traffic for an SSL transaction with a given SSL ID always goes to the same server.
• Key feature for commerce and financial web sites
• Ex: prevent shopping cart loss, access control, prevent source address overload
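Session ID persistence can be modeled as a table keyed by the SSL session ID. In this sketch the class and method names are hypothetical; it assumes the switch learns a session ID when the server assigns one in its hello message, and that a client resuming a session presents that ID in its own hello, while a new client presents an empty ID.

```python
class SslSessionSwitch:
    """Toy model: keep every connection of one SSL session on one server."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._bindings = {}  # session ID -> server
        self._turn = 0

    def route(self, client_session_id):
        """Choose a server for a client hello; an empty ID means a new session."""
        server = None
        if client_session_id:
            server = self._bindings.get(client_session_id)
        if server is None:
            # New or unknown session: fall back to round-robin selection.
            server = self._servers[self._turn % len(self._servers)]
            self._turn += 1
        return server

    def learn(self, server_session_id, server):
        """Record the session ID that the chosen server handed out."""
        self._bindings[server_session_id] = server


switch = SslSessionSwitch(["10.0.0.1", "10.0.0.2"])
first = switch.route(b"")            # new session goes to the first server
switch.learn(b"\x01\x02", first)     # server assigned this session ID
other = switch.route(b"")            # a different new client
resumed = switch.route(b"\x01\x02")  # same client, new TCP connection
```

The resumed connection lands on the server that holds the session state, so a shopping cart kept on that server is not lost.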
Content-based Switching (cont’d)

Load balancer evaluation criteria
– Plans for Layer 3 or Layer 4 switching
– Number of servers and planned growth
– Type of content to be balanced
– Number of server sites to be balanced
– Sophistication of balancing algorithms
– Degree of fault tolerance required
– Interfaces and port density
– Support requirements
Implementation Architecture
Design, Implementation and Performance of a
Content-Based Switch
Infocom 2000
{George Apostolopoulos, David Aubespin, Vinod Peris,
Prashant Pradhan, Debanjan Saha}
Implementation Architecture (cont’d)

Switch Architecture
– The Layer 5 system consists of a switch core to which a number of custom-built intelligent port controllers are attached.
– Layer 5 functions, such as the parsing of HTTP protocol messages and URL-based routing, are performed by the processor.
Implementation Architecture (cont’d)

Switch Architecture
– Port controllers identify the packets that need to be handled by the processor and forward them to the processor.
– This division of work lets the switch achieve very high speed while delivering sophisticated Layer 5 functionality.
Implementation Architecture (cont’d)

Operation Blueprint
– Phase 1: the switch intercepts the TCP connection setup request from the client and responds by establishing a connection to the client.
– Phase 2: after the routing decision is made, it sets up a second connection to the appropriate server node.
– Phase 3: it splices the two TCP connections.
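Splicing two TCP connections comes down to rewriting sequence and acknowledgment numbers by fixed offsets. The sketch below is one way to express that arithmetic, not the implementation from the cited paper: the `Splice` class and its field names are hypothetical, and it assumes the switch has relayed to the server exactly the bytes it received from the client before splicing, and has sent the client nothing beyond the handshake.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


@dataclass
class Splice:
    client_isn: int            # initial sequence number chosen by the client
    switch_isn_to_client: int  # ISN the switch used when answering the client
    switch_isn_to_server: int  # ISN the switch used when opening to the server
    server_isn: int            # ISN chosen by the server

    def client_to_server(self, seq, ack):
        """Translate a client packet into the server-side connection's numbering."""
        seq_out = (seq + self.switch_isn_to_server - self.client_isn) % SEQ_SPACE
        ack_out = (ack + self.server_isn - self.switch_isn_to_client) % SEQ_SPACE
        return seq_out, ack_out

    def server_to_client(self, seq, ack):
        """Translate a server packet into the client-side connection's numbering."""
        seq_out = (seq + self.switch_isn_to_client - self.server_isn) % SEQ_SPACE
        ack_out = (ack + self.client_isn - self.switch_isn_to_server) % SEQ_SPACE
        return seq_out, ack_out


splice = Splice(
    client_isn=1000,
    switch_isn_to_client=5000,
    switch_isn_to_server=7000,
    server_isn=9000,
)
# The client has sent a 100-byte request and acknowledges the switch's SYN-ACK.
to_server = splice.client_to_server(seq=1101, ack=5001)
# The server's first data segment acknowledges the relayed request.
to_client = splice.server_to_client(seq=9001, ack=7101)
```

Once the offsets are installed, each packet needs only two additions, which is the kind of work that can be pushed down to a port controller.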
Implementation Architecture (cont’d)

Processing at Port Controllers
Implementation Architecture (cont’d)

Processing at CPU
– The CPU acts as the end-point for the TCP connections to the client and the server until they are spliced.
– It splices the connections by sending the appropriate control messages to the port controllers.
– Handling of TCP options deserves special attention.
• Reject all TCP options
• Enumerate the minimum set of options supported by all nodes
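The second approach, the minimum set of options supported by all nodes, is a set intersection. The sketch below assumes the switch knows each back-end node's supported options in advance; the node names, option labels, and function names are illustrative only.

```python
def common_options(options_by_node):
    """Return the options that every back-end node supports."""
    option_sets = [set(options) for options in options_by_node.values()]
    return set.intersection(*option_sets) if option_sets else set()


def options_for_client(client_offer, options_by_node):
    """Accept only client-offered options that any chosen server could honor."""
    return set(client_offer) & common_options(options_by_node)


# Hypothetical option support per node.
nodes = {
    "www1": {"MSS", "SACK", "WSCALE", "TIMESTAMP"},
    "www2": {"MSS", "SACK"},
    "www3": {"MSS", "SACK", "TIMESTAMP"},
}
safe = common_options(nodes)
accepted = options_for_client({"MSS", "WSCALE", "SACK"}, nodes)
```

The switch has to answer the client before it knows which server will be chosen, so it can only agree to options that every candidate server supports; here the window-scale offer is declined.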
Implementation Architecture (cont’d)

URL lookup
– To be able to dispatch HTTP requests based on URLs, the L5 system has to know the mapping from each URL to the web server on which the page resides.
– A hash function is used, and the default size of all hash buckets is set to 256.
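A URL-to-server table with fixed-size hash buckets might look like the sketch below. Only the bucket size of 256 comes from the slide; the number of buckets, the use of CRC-32 as the hash function, and the class name `UrlTable` are assumptions made for the example.

```python
import zlib

BUCKET_CAPACITY = 256  # bucket size given on the slide
NUM_BUCKETS = 1024     # assumption: the slide does not state the bucket count


class UrlTable:
    """Maps a URL to the web server on which the page resides."""

    def __init__(self):
        self._buckets = [[] for _ in range(NUM_BUCKETS)]

    def _bucket(self, url):
        # CRC-32 stands in for whatever hash function the real system uses.
        return self._buckets[zlib.crc32(url.encode("utf-8")) % NUM_BUCKETS]

    def add(self, url, server):
        bucket = self._bucket(url)
        for index, (known_url, _) in enumerate(bucket):
            if known_url == url:
                bucket[index] = (url, server)  # update an existing mapping
                return
        if len(bucket) >= BUCKET_CAPACITY:
            raise OverflowError("hash bucket is full")
        bucket.append((url, server))

    def lookup(self, url):
        for known_url, server in self._bucket(url):
            if known_url == url:
                return server
        return None  # unknown URL: the caller applies a default policy


table = UrlTable()
table.add("/index.html", "www1")
table.add("/images/logo.gif", "www2")
table.add("/index.html", "www3")  # the page moved to another server
home_server = table.lookup("/index.html")
missing = table.lookup("/no-such-page.html")
```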
Conclusion

– The concept of content-based switching is easy to understand, but implementing it well takes effort.
– Content-based switching can be used to provide service differentiation based on user profiles.
– Not only load balancing but also persistence pays off in e-commerce.
References

– RFC 2391 (LSNAT)
– Web Techniques: Load Balancing Your Web Sites
http://www.webtechniques.com/archives/1998/05/engelschall/
– HydraWEB: Load Balancing
http://www.hydraweb.com/load_balancing/index.asp
– Techniques for Designing High-Performance Web Sites
http://www.research.ibm.com/people/i/iyengar/ieeeic/ieeeic.html
– Locality-Aware Request Distribution in Cluster-based Network Servers. In Architectural Support for Programming Languages and Operating Systems, 1998
– Uyless Black, TCP/IP & Related Protocols, 2nd edition, Chapter 4: The Domain Name System
References (cont’d)

– Foundry Products Application Notes
http://www.foundrynet.com/appnotes.html
– Alteon WebSystems Web Switching White Paper
http://www.alteonwebsystems.com/products/whitepapers/index.asp
– George Apostolopoulos, David Aubespin, Vinod Peris, Prashant Pradhan, Debanjan Saha. Design, Implementation and Performance of a Content-Based Switch. Infocom 2000
– Cisco CDNs
http://www.cisco.com/warp/public/779/largeent/learn/technologies/content_networking/