
Introduction to Linux-based Virtual Server and Content Switch C. Edward Chow Department of Computer Science University of Colorado at Colorado Springs [email protected]

The ppt file of this tutorial is available at http://cs.uccs.edu/~chow/pub/conf/pdcat/tutorial.ppt

Part of this work sponsored by CCL/ITRI 7/9/2001 Edward Chow Content Switch 1

Outline of the Talk

• Overview of Content Delivery Networks
• Linux-based Virtual Server
• Linux-based Content Switching

Content Delivery Network (CDN)

[Figure: clients reach a single host server across many ISP networks (@Home, PSINet, Sprint, QWest, UUnet, Mind Spring, Gloobix). Huge request volume leads to slow response and server crashes.]

Content Delivery Problems

http://www.akamai.com


Use Client Cache/ Client Side Cache Server

[Figure: client caches and a client-side cache server sit between the clients and the ISP networks (QWest, PSINet, Mind Spring, @Home, UUnet, Sprint, Gloobix). The host server receives fewer requests and the clients get a fast response.]

Use Mirror Sites

Mirror selection needs improvement: guide the choice of mirror server with server load and network bandwidth measurements.

[Figure: mirror sites are placed in the ISP networks (PSINet, @Home, Sprint, UUnet, QWest, Mind Spring, Gloobix). Clients get a fast response from a nearby mirror site and the host server receives fewer requests.]

Edge Network Cache Servers

[Figure: client caches, client-side cache servers, edge network cache servers inside the ISP networks (PSINet, QWest, Mind Spring, @Home, UUnet, Sprint, Gloobix), and mirror sites together give clients a fast response and leave the host server with fewer requests.]

Content Delivery Problem

• Cache Location Problem: Where to put cache servers?

• How many are needed?

• When/where/how to push/deliver the content?

• How about dynamic content?


Akamai Edge Delivery Service

Date       # of Edge Servers   # of Networks   # of Countries
11/2000    6000                335             54
6/2001     9700                650             56

• Peering Bottleneck Problem: access traffic is evenly spread over 7400+ networks (no one over 5%; most << 1%) → Need to put edge servers in many networks.
• 11/2000, 4 billion bits/day for 2800 sites.
• Source: http://www.akamai.com

Caching Dynamic Content at Web Proxies

• Active Cache Project [PeiCao 98], Univ. Wisconsin
  – Cache Java applets to be executed at proxies
  – Choice of passing the request to the server, delivering a cached copy, or generating the content dynamically.
• Edge Side Include (ESI):
  – XML tags specify ESI fragments in a web page.
  – Each ESI fragment can have different cache/

Edge Side Include Example http://www.esi.org/

onerror="continue" />

> click here - - >


Solution to First Mile Problem

• First Mile Problem: huge requests at the web site of the CDN
• High Bandwidth Connection
• Caching
  – End System Cache
    • Client Cache
    • Client Site Proxy Cache Server
    • Mirror Site Caches
  – Cache Servers in Internet
    • Hierarchical Cache Servers, e.g., Squid/Harvest/Adaptive Web
    • Edge Servers of Akamai
• Faster Server/Server Farm (Server Side Caching+Cluster)
  • Layer4 Load balancer+Real Servers
  • Content Switch+Real Servers
  • Distributed Packet Rewrite

Web Server Cluster

Load balancer can run at
• Application level: Reverse Proxy
• Kernel level: Linux Virtual Server

Load balancer can distribute requests based on
• Layer 3-4 info: fixed fields/fast hash
• Layer 3-7 info: variable-length fields/slow parsing

[Figure: a load balancer or content switch in front of the real servers.]
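The "fixed fields/fast hash" case can be illustrated with a minimal sketch. It assumes the balancer hashes the connection 4-tuple and takes the result modulo the number of real servers; the server labels, the CRC32 hash, and the function name are illustrative choices, not taken from LVS.

```python
import ipaddress
import zlib

# Hypothetical labels for the real servers behind the balancer.
REAL_SERVERS = ["RIP1", "RIP2", "RIP3"]


def pick_server(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """Map a connection 4-tuple to one real server with a fixed-size hash."""
    # Layer 3-4 fields have fixed lengths, so the key is always 12 bytes.
    key = (
        ipaddress.IPv4Address(src_ip).packed
        + src_port.to_bytes(2, "big")
        + ipaddress.IPv4Address(dst_ip).packed
        + dst_port.to_bytes(2, "big")
    )
    # CRC32 is only a stand-in for whatever hash a real balancer uses.
    return REAL_SERVERS[zlib.crc32(key) % len(REAL_SERVERS)]
```

Because the key has a fixed length and needs no parsing, the cost per packet is constant, which is the contrast with Layer 3-7 switching drawn above.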

Comparison of Load Balancers

• Reverse Proxy runs as an application process and requires more memory/packet copying.
• Linux Virtual Server runs in kernel → no memory

Name                                    Type   Level         Layer Info
Reverse Proxy/Apache/Tomcat/Servlet     SW     Application   3-7
Linux Virtual Server                    SW     Kernel        3-4
Linux Content Switch                    SW     Kernel        3-7
Layer4 Switch (narrow def.)             HW     Embedded OS   3-4
HW Content/Web Switch                   HW     Embedded OS   3-7

Linux Virtual Server (LVS)

• “Virtual server is a highly scalable and highly available server built on a cluster of real servers. The architecture of the cluster is transparent to end users, and the users see only a single virtual server” with a Virtual IP address (VIP).
• http://www.linuxvirtualserver.org/

[Figure: a client (CIP) reaches the VIP on the Load Balancer/Director (a Linux box) over the Internet; the director connects over a WAN/LAN to Real Server1 (RIP1), Real Server2 (RIP2), and Real Server3 (RIP3).]

CIP: Client IP Address
VIP: Virtual IP Address
RIP: Real Server IP Address

LVS-NAT Configuration (Network Address Translation)

• All return traffic goes through the Director → Slow
• Modify IP addr/port #/checksum at the Director
• Director and real servers on the same LAN
• No modification needed on real servers
• Port remapping: real web server can run on 8080

[Figure: Client (CIP) → Internet → Director (VIP) → switch → Real Server1 (RIP1), Real Server2 (RIP2), Real Server3 (RIP3).]

LVS-NAT Configuration Step 2. Director routes Pkt

• Based on CIP, source port #, VIP and dst port #, the director selects one of the real servers.
• Change the dst IP addr or port # of the pkt.

[Figure: 1. The client's request (CIP → VIP) reaches the Director. 2. Scheduling/rewrite packet: the Director applies the LVS routing and scheduling rules set with the ipvsadm cmd and rewrites the packet to CIP → RIP1 for Real Server1.]

LVS-NAT Configuration Step 3. Real Server Replies

• Real server retrieves response.
• All real servers set their default gateway to the Director, like any other NAT or IP masquerade setup.
• Packet will be sent back to the Director.

[Figure: as in step 2, plus 3. Real Server1 processes the request and sends the reply (RIP1 → CIP) back through the switch to the Director.]

LVS-NAT Configuration Step 4. Director rewrites reply

• Director changes the source IP addr. (RIP1) of the reply pkt to VIP.
• Modify port # if needed.
• Modify the checksum; send back pkt.

[Figure: as in step 3, plus 4. Rewrite reply: the Director turns the reply RIP1 → CIP into VIP → CIP and sends it to the client.]
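The request and reply rewrites of LVS-NAT can be sketched in a few lines. This is a simplified model, not the kernel code: the `Packet` class, the `nat_table` dictionary, and the fixed choice of one real server are assumptions for illustration, the addresses are borrowed from the setup commands in this talk, and checksum handling is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


VIP, VPORT = "202.103.106.5", 80      # virtual service
RIP1, RPORT = "172.16.0.3", 8000      # real server chosen by the scheduler (fixed here)

# Hypothetical connection table: (CIP, client port) -> (RIP, real port)
nat_table = {}


def rewrite_request(pkt: Packet) -> Packet:
    """Step 2: change dst VIP:port to RIP:port and remember the mapping."""
    nat_table[(pkt.src, pkt.sport)] = (RIP1, RPORT)
    return replace(pkt, dst=RIP1, dport=RPORT)


def rewrite_reply(pkt: Packet) -> Packet:
    """Step 4: change src RIP:port back to VIP:port so the client sees only the VIP."""
    if nat_table.get((pkt.dst, pkt.dport)) != (pkt.src, pkt.sport):
        raise LookupError("no NAT entry for this reply")
    return replace(pkt, src=VIP, sport=VPORT)
```

A real director also updates the IP and TCP checksums after each rewrite, as the bullets above note.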

LVS-NAT Configuration (Network Address Translation)

• All return traffic goes through the Director → Slow
• Modify IP addr/port #/checksum at the Director.
• Director and real servers on the same LAN

[Figure: the complete sequence: 1. request (CIP → VIP), 2. scheduling/rewrite packet (CIP → RIP1), 3. process request, 4. rewrite reply (VIP → CIP), 5. receive reply.]

LVS-NAT Setup Commands

# Make the director forward the masquerading packets
echo 1 > /proc/sys/net/ipv4/ip_forward
ipchains -A forward -j MASQ -s 172.16.0.0/24 -d 0.0.0.0/0

# Add virtual services and link a scheduler to each
ipvsadm -A -t 202.103.106.5:80 -s wlc    # Weighted Least-Connection scheduling
ipvsadm -A -t 202.103.106.5:21 -s wrr    # Weighted Round Robin scheduling

# Add real servers and select forwarding method and weight
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.2:80 -m
ipvsadm -a -t 202.103.106.5:80 -R 172.16.0.3:8000 -m -w 2
ipvsadm -a -t 202.103.106.5:21 -R 172.16.0.2:21 -m
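The two schedulers named in the commands can be approximated as follows. This is a simplified sketch, not the LVS implementation: it assumes weighted least-connection picks the server with the smallest connections-to-weight ratio, and it models weighted round robin by repeating each server `weight` times per cycle. The function names and data layout are hypothetical.

```python
from itertools import cycle


def wlc_pick(servers):
    """servers: dict name -> (active_connections, weight). Lowest ratio wins."""
    candidates = {n: cw for n, cw in servers.items() if cw[1] > 0}
    return min(candidates, key=lambda n: candidates[n][0] / candidates[n][1])


def wrr_sequence(weights):
    """Endless iterator in which each server appears `weight` times per cycle."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return cycle(expanded)
```

With the weights from the commands above (1 for 172.16.0.2:80, 2 for 172.16.0.3:8000), the second server should receive roughly twice as many connections under either policy.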

LVS-Tunnel Configuration (IP Tunneling)

• Real servers need to handle IP-over-IP packets.
• Real servers can be geographically separated, and return traffic goes through different routes.
• Security implication!

[Figure: 1. The client's request (CIP → VIP) reaches the load balancer (Linux box). 2. Scheduling: the load balancer puts the packet in an IP tunnel (outer addresses RIP0 → RIP2, inner packet CIP → VIP). 3. Real Server2 processes the request. 4. The client receives the reply (VIP → CIP).]

LVS-Tunnel Setup Commands

# The load balancer (LinuxDirector), kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -i

# The real server 1, kernel 2.2.14
echo 1 > /proc/sys/net/ipv4/ip_forward
# insert it if it is compiled as module
insmod ipip
ifconfig tunl0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev tunl0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden

LVS-DR Configuration

(Direct Routing)

• Real servers need to configure a non-ARP alias interface with the virtual IP address, and that interface must share the same physical segment with the load balancer.
• Only the Director’s interface replies to ARP requests for the VIP.
• Director only rewrites the server MAC address; IP packet not changed → Fast!

[Figure: 1. The client's request (CIP → VIP) arrives through the router/switch with MAC addresses GMAC → VMAC. 2. Scheduling/rewrite packet: the Director forwards the same IP packet with MAC addresses VMAC → RMAC3 to Real Server3. Real Server1, Real Server2, and Real Server3 have MAC addresses RMAC1, RMAC2, and RMAC3.]

GMAC: Gateway MAC address

LVS-DR Configuration Step 3. Process Request

• Real server processes the request and returns the reply.
• Reply goes directly through the switch/router, not the Director.

[Figure: 1. request (CIP → VIP, MAC GMAC → VMAC). 2. Scheduling/rewrite packet at the Linux Director (MAC VMAC → RMAC3). 3. Real Server3 processes the request and replies VIP → CIP with MAC addresses RMAC3 → GMAC through the switch. 4. The client receives the reply.]

GMAC: Gateway MAC address


LVS-DR Setup Commands

# The load balancer (LinuxDirector), kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:23 -s wlc
ipvsadm -a -t 172.26.20.110:23 -r 172.26.20.112 -g

# The real server 1, 172.26.20.112, kernel 2.2.14 or later
echo 1 > /proc/sys/net/ipv4/ip_forward
ifconfig lo:0 172.26.20.110 netmask 255.255.255.255 broadcast 172.26.20.110 up
route add -host 172.26.20.110 dev lo:0
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
echo 1 > /proc/sys/net/ipv4/conf/lo/hidden

Persistence Handling in LVS

• Sticky connections. Examples:
  – FTP control (port 21), data (port 20). For passive FTP, the server tells the client the port that it listens on, and the client initiates the data connection to that port. For LVS/TUN and LVS/DR, LinuxDirector sees only the client-to-server half of the connection, so it is impossible for LinuxDirector to get the port from the packet that goes to the client directly.
  – SSL session: port 443 for secure Web servers and port 465 for secure mail servers; the key for the connection must be chosen/exchanged.
• Persistent port solution:
  – When a client first accesses the service, LinuxDirector creates a template between the given client and the selected server, then creates an entry for the connection in the hash table.
  – The template expires after a configurable time, but it won't expire until all its connections expire.
  – Connections for any port from the client are sent to the same server until the template expires.
  – The timeout of persistent templates can be configured by users; the default is 300 seconds.
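A persistence template can be modeled as a per-client entry with an expiry time. The sketch below is a simplification under stated assumptions: the class and method names are hypothetical, the template is keyed by client IP only, each new connection restarts the timer, and the rule that a template outlives its open connections is not modeled.

```python
PERSIST_TIMEOUT = 300  # seconds; the default quoted above


class PersistenceTable:
    def __init__(self, scheduler, timeout=PERSIST_TIMEOUT):
        self.scheduler = scheduler   # callable that returns a real server
        self.timeout = timeout
        self.templates = {}          # client IP -> (server, expiry time)

    def route(self, client_ip, now):
        """Return the server for a new connection from client_ip at time `now`."""
        entry = self.templates.get(client_ip)
        if entry is not None and now < entry[1]:
            server = entry[0]            # template still valid: stick to it
        else:
            server = self.scheduler()    # first access or expired: schedule anew
        # Assumption: every new connection restarts the template timer.
        self.templates[client_ip] = (server, now + self.timeout)
        return server
```

Because the lookup ignores the destination port, an FTP data connection or a follow-up SSL connection from the same client lands on the same real server as the first one.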

HA-LVS Configuration (High Availability)

[Figure: the client (CIP) reaches the Linux Director over the Internet; a Backup Director exchanges heartbeats with the Linux Director; MON monitors Real Server1, Real Server2, and Real Server3.]

1. When the Backup Director detects a Linux Director failure through the heartbeat protocol, it “graciously negotiates” the take-over of the VIP → provides fault tolerance.
2. Monitor the server processes running on the real servers → route requests to server processes that are alive; initiate restart/repair.

Performance of LVS-based Systems

“We ran a very simple LVS-DR arrangement with one PII-400 (2.2.14 kernel) directing about 20,000 HTTP requests/second to a bank of about 20 Web servers answering with tiny identical dummy responses for a few minutes. Worked just fine.”
Jerry Glomph Black, Director, Internet & Technical Operations, RealNetworks

“I had basically (1024) four class-Cs of virtual servers which were loadbalanced through a LinuxDirector (two, actually -- I used redundant directors) onto four real servers which each had the four different class Cs aliased on them.”
Ted Pavlic

LVS Usage Survey 2/15/2001 Lorn Key

Clusters                  20           1             2                2                 2
Directors Per Cluster     2            2             2                2                 2
Total Real Servers        170          12            4                15                6
Routing Methods           DR/NAT       DR            NAT              DR                NAT
Schedule Methods          RR/WLC       WRR           LC               WLC               WLC
Types of Real Servers     RH6.2        Linux         Win, Linux       Linux, Solaris    RH
Service Offered           WWW          WWW/other     WWW, DB          WWW, SMTP         WWW, Custom
File System Replication   rsync        rsync         Coda, NFS        rsync             custom
Monitoring Software       Heartbeat,   Nanny/Pulse   Heartbeat, Mon   Nanny, Pulse      Heartbeat
                          ldirectord

C. Edward Chow Department of Computer Science University of Colorado at Colorado Springs Sponsored by Computer Comm. Lab/ITRI

Content Switch Topics

• What is a Content Switch?
• What Services It Can Provide
• Content Switch Example
• Related Technologies
• Content Switch Architecture and Basic Operations
• TCP Delay Binding and Related Improvement
• Content Switch Rule and Conflict Detection
• Conclusion

Content Switch (CS)

• Route packets based on high-layer (Layer 5/7) headers and content.
• Examples:
  – Direct Web traffic based on patterns of
    • URLs, cookies: URL Switching
    • XML tag values: Web Switching
  – Can route incoming email based on email address; connect POP/IMAP based on login
• Web switches and the Intel XML Director/accelerator are special cases of content switch.

What Services It Can Provide

• Enabling premium services for e-commerce, ISP, and Web hosting providers
• Load balancing and highly available server clusters: Web, E-commerce, Email, Computing, File, SAN
• Policy-based networking, differential/QoS services
• Firewall, strengthening DoS protection, cache/firewall load-balancing
• ‘Flash-crowd’ management
• Email spam protection, virus detection/removal
• Applet authentication/filtering

F5 VRM Solution

[Figure: three sites (Site I newyork.domain.com, Site II losangeles.domain.com, Site III tokyo.domain.com) with routers, BIG-IP units, and server arrays, plus 3-DNS and GLOBAL-SITE; a webmaster, a user at london.domain.com, and the user's local DNS are also shown.]

Intel Netstructure XML Director 7280

• Example of Rule: Server1: create */order.asp & //Amount[Value >= 10000]

Phobos In-Switch

• Only load balancing switch in a PCI card form factor
• Plugs directly into any server PCI slot
• Supports up to 8,192 servers, ensuring availability and maximum performance
• Six different algorithms are available for optimum performance: Round Robin, Weighted Percentage, Least Connections, Fastest Response Time, Adaptive and Fixed.
• Provides site failover to other servers for high-availability of the web
• U.S. Retail $1995.00

E-Commerce Example: 1. Client

Client submits via HTTP POST (or SOAP) the following purchase in XML:

CCL 111222333
309121544 IBM Thinkpad T21 5000 10 50000
309121538 Intel wireless LAN PC Card 200 10 2000
52000

E-Commerce Example: 2. Content Switch

• Content switch receives the packet.
• Recognizes that it is an HTTP POST request from the HTTP request line
    POST /purchase.cgi HTTP/1.1
• Recognizes that it is an XML document from the meta header
    content-type: TEXT/XML
• Parses the XML content.
• Extracts the values 52000 and CCL of the tag sequences purchase/totalAmount and purchase/customerName.
• Rule 1 is matched and the packet is routed to one of the highSpeedServers.

Rule 1: if (xml.purchase/totalAmount > 5000) routeTo(highSpeedServers);
Rule 2: if (xml.purchase/customerName == CCL) routeTo(specialCustomerServers);
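The extract-then-match flow can be tried out with Python's standard XML parser. This is only an illustration under assumptions: the sample document keeps just the two tags the rules need, the `extract` and `route` helpers and the "default" fallback are hypothetical, and a real content switch does this work in the kernel on packet payloads.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the purchase document; only these two tags are known from the rules.
PURCHASE_XML = """<purchase>
  <customerName>CCL</customerName>
  <totalAmount>52000</totalAmount>
</purchase>"""


def extract(xml_text, paths):
    """Return {tag sequence: text value or None} for sequences like 'purchase/totalAmount'."""
    root = ET.fromstring(xml_text)
    values = {}
    for path in paths:
        top, _, rest = path.partition("/")
        node = root.find(rest) if root.tag == top and rest else None
        values[path] = node.text if node is not None else None
    return values


def route(xml_text):
    """Apply Rule 1, then Rule 2, in strict priority order."""
    v = extract(xml_text, ["purchase/totalAmount", "purchase/customerName"])
    total = v["purchase/totalAmount"]
    if total is not None and int(total) > 5000:       # Rule 1
        return "highSpeedServers"
    if v["purchase/customerName"] == "CCL":           # Rule 2
        return "specialCustomerServers"
    return "default"                                  # hypothetical fallback
```

For the purchase above, the total of 52000 exceeds 5000, so Rule 1 fires before Rule 2 is ever checked.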

No Free Lunch: Penalty of Having Content Switch

                           Layer 4 Switching     Layer 7 Switching
packet header extraction   fixed short fields    varying length long fields
switch rule matching       hash table look up    pattern matching

→ Increased packet processing time.

• For XML Director/Accelerator, it needs to parse the XML document and match tag sequences.
→ 1-3? order of processing time

Size of XML Document (Bytes)   XML Content Extract Time (ms)
600                            14
7000                           21
67104                          53

Related Technologies

• Application level solution: Proxy server; Apache/Tomcat/Servlet; Microsoft NLB
• Kernel level layer 4 load balancing solution: http://www.linuxvirtualserver.org/
  – Joseph Mark’s presentation
  – LVS-NAT (Network Address Translation) web page
  – LVS-IP Tunnel web page
  – LVS-DR (Direct Routing) web page
• Hardware solution: Cisco 11000, F5 (Big IP), Alteon Web Systems, Foundry Networks (ServerIron). Excellent information in: Foundry ServerIron Installation and Configuration Guide, May 2000.
• Routing table lookup: Longest prefix (Gupta/McKeown)

Basic Operations of Content Switching

CS: Content Switching

[Figure: incoming packets pass through packet classification, header content extraction, the CS rule matching algorithm (using the CS rules from the CS rule editor), and packet routing/load balancing (using network path info and server load status), which forwards the packets to the servers.]

Content Switch Architecture

Apostolopoulos Infocom 2000

Content Switch Architecture

Case A: Controller finds there is an entry in its hash table and routes the request to the “sticky connection” outgoing port.

Case B:
Step 1. Controller finds there is no entry in the hash table and routes the request to the content switch processor.
Step 2. CS processor
  a. Extracts content/matches CS rules
  b. Routes the request
  c. Sets up sequence # modification on the server-side port (pkt modification info)
Step 3. At the server-side port, return pkts are modified (sequence #/IP addr/checksum) and routed back to the client.

[Figure: client, controller with hash table, content switch processor with CS rules, and Real Server1.]
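The split between Case A and Case B amounts to a fast path and a slow path around a connection table. The sketch below is illustrative only: `connection_table`, `match_cs_rules`, and the image-routing rule inside it are hypothetical stand-ins for the controller's hash table and the CS processor.

```python
# Hypothetical hash table: (client IP, client port) -> real server
connection_table = {}


def match_cs_rules(request_line):
    """Stand-in for the CS processor; assumed rule: .jpg requests go to server2."""
    url = request_line.split()[1]
    return "server2" if url.endswith(".jpg") else "server1"


def dispatch(client, request_line=None):
    """Return (server, path taken) for a packet from `client`."""
    server = connection_table.get(client)
    if server is not None:
        return server, "fast path"           # Case A: entry found, no parsing
    server = match_cs_rules(request_line)    # Case B: extract content, match rules
    connection_table[client] = server        # later packets take Case A
    return server, "slow path"
```

Only the first packet of a connection pays for content extraction and rule matching; the rest are forwarded after a single table lookup.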

Efficient Software Architecture

• Tasks: millions of packets with thousands of rules to match and load balancing algorithms to run.
• How to assign tasks to the (network) processors and threads?
  – Packet extraction (understand header formats, XML parsing)
  – Content switching rule matching
  – Packet routing (load balancing, bandwidth control)
• How much packet processing should controllers do?
• What can a controller do?
• A typical parallel processing problem?

TCP Delay Binding (Splicing)

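[Figure: message sequence among client, content switch, and server.]

step 1    client → content switch: SYN(CSEQ)
step 2    content switch → client: SYN(DSEQ), ACK(CSEQ+1)
step 3    client → content switch: ACK(DSEQ+1)
step 4    client → content switch: DATA(CSEQ+1), ACK(DSEQ+1)
step 5    content switch → server: SYN(CSEQ)
step 6    server → content switch: SYN(SSEQ), ACK(CSEQ+1)
step 7    content switch → server: ACK(SSEQ+1)
step 8    content switch → server: DATA(CSEQ+1), ACK(SSEQ+1)
step 9    server → content switch: DATA(SSEQ+1), ACK(CSEQ+lenR+1); forwarded to the client as DATA(DSEQ+1), ACK(CSEQ+lenR+1)
step 10   client → content switch: ACK(DSEQ+lenD+1)
step 11   content switch → server: ACK(SSEQ+lenD+1)

2nd request: DATA(?), ACK(?)

lenR: size of http request.
lenD: size of return document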
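Once both connections exist, the content switch has to translate between the sequence number it chose for the client (DSEQ) and the one the server chose (SSEQ). A minimal sketch of that translation follows; the `Splice` class and its method names are hypothetical, and it assumes the client's own sequence numbers (CSEQ) are reused unchanged toward the server, as the SYN(CSEQ) messages in the diagram suggest.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


class Splice:
    """Translate sequence numbers between the client-side and server-side connections."""

    def __init__(self, dseq, sseq):
        # Constant offset between the server's numbering and the switch's numbering.
        self.delta = (sseq - dseq) % MOD

    def to_client(self, server_seq):
        """Sequence number of a server packet, as the client must see it."""
        return (server_seq - self.delta) % MOD

    def to_server(self, client_ack):
        """ACK number from the client, as the server must see it."""
        return (client_ack + self.delta) % MOD
```

For example, a reply carrying DATA(SSEQ+1) leaves the switch as DATA(DSEQ+1), and the client's ACK(DSEQ+lenD+1) reaches the server as ACK(SSEQ+lenD+1).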

Improve Content Switching

• Set up CS-Real Server connections ahead of time (Persistent HTTP Connections). NetScale → Reduce TCP 3-way handshake time
• Pre-allocate Server Scheme (guess the real server based on the TCP SYN)
• Sequence # modification on every return pkt → Need to recompute checksum also.
• Filter Scheme (offload sequence # modification/rule matching to real servers).
• Buffering/Pipeline (aggregate) Requests
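The checksum does not have to be recomputed over the whole packet after a sequence number changes. The sketch below uses the incremental update from RFC 1624, HC' = ~(~HC + ~m + m'), on 16-bit words; the talk does not say which method LCS uses, so this is an illustration of one option, and the function names are hypothetical.

```python
def fold16(total: int) -> int:
    """Fold carries back into the low 16 bits (one's complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(words):
    """Internet checksum over a list of 16-bit words."""
    return ~fold16(sum(words)) & 0xFFFF


def incremental_update(old_checksum, old_word, new_word):
    """RFC 1624 update after one 16-bit word changes from old_word to new_word."""
    total = (~old_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold16(total) & 0xFFFF
```

A 32-bit sequence number spans two 16-bit words, so the update is applied once per word.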

Pre-Allocate Server Scheme

[Figure: message sequence among client, content switch, and pre-allocated server. Each message appears on both sides of the content switch with the same sequence numbers.]

step 1   SYN(CSEQ)
step 2   SYN(SSEQ), ACK(CSEQ+1)
step 3   ACK(SSEQ+1)
step 4   DATA(CSEQ+1), ACK(SSEQ+1)
step 5   DATA(SSEQ+1), ACK(CSEQ+lenR+1)
step 6   ACK(SSEQ+lenD+1)

• Guess routing decision based on IP/Port #/History
• Advantage:
  – Faster than TCP delay binding.
  – Possible direct route between client and server
  – Reduce session processing overhead

Degenerated to TCP Delayed Binding If Guess is Wrong

[Figure: message sequence among client, content switch, pre-allocated server, and right server.]

steps 1-4     As in the pre-allocate scheme: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1).
steps 5-6     The pre-allocated server sent HTTP 404 in DATA(SSEQ+1); FIN(CSEQ+lenR+1).
steps 7-10    Connection to the right server: SYN(CSEQ); SYN(RSEQ)/ACK(CSEQ+1); ACK(RSEQ+1); DATA(CSEQ+1)/ACK(RSEQ+1).
steps 11-12   The right server's DATA(RSEQ+1)/ACK(CSEQ+lenR+1) reaches the client as DATA(SSEQ+1)/ACK(CSEQ+lenR+1); the client's ACK(SSEQ+lenD+1) reaches the right server as ACK(RSEQ+lenD+1).

Sequence # conversion is needed for the right server now.

Filter Process Scheme

[Figure: message sequence among client, content switch, and server; a filter process runs on the server.]

steps 1-4    Client and content switch: SYN(CSEQ); SYN(DSEQ)/ACK(CSEQ+1); ACK(DSEQ+1); DATA(CSEQ+1)/ACK(DSEQ+1).
step 5       The content switch migrates (Data, CSEQ, DSEQ) to the filter process on the server.
steps 5b-8   Filter process and server: SYN(CSEQ); SYN(SSEQ)/ACK(CSEQ+1); ACK(SSEQ+1); DATA(CSEQ+1)/ACK(SSEQ+1).
step 9       The server's DATA(SSEQ+1)/ACK(CSEQ+lenR+1) reaches the client as DATA(DSEQ+1)/ACK(CSEQ+lenR+1).
step 10      The client's ACK(DSEQ+lenD+1) reaches the server as ACK(SSEQ+lenD+1).

Pre-allocate performance plot

[Figure: plot of response time vs. document size (bytes) for four series.]

Series 1: Basic scheme with no rule matching module inserted, i.e., using default IPVS.
Series 2: Basic scheme with the rule matching module inserted.
Series 3: Pre-allocate scheme with all hits, i.e., where all pre-allocate guesses were correct.
Series 4: Pre-allocate scheme with all misses, i.e., where all pre-allocate guesses were wrong.

Handling multiple requests in a Keep-Alive connection

• Determine when a new request arrives
  – Verify that the previous request has been completely received
  – Request data size is > 0
• Key assumption: the client sends only one outstanding request at a time, i.e., requests are not pipelined
• Reuse connections
  – Store each connection's control information in a hash table keyed by real server address, once it is established.

Quiz

• Web server keeps the TCP connection alive, expecting the browser to return for images and in-line media files.

• How many keep-alive connections are setup on IE5 and Netscape 4.7 for web page with many .jpg/.gif images?

• Can these image requests be pipelined from client browser to web server?


Multiple HTTP Requests from One TCP Connection NAT approach

[Figure: the client requests Index.htm and cs.jpg over one TCP connection to the content switch, which sits in front of server1, server2, ..., server9.]

• A keep-alive TCP connection may include multiple HTTP “GET” requests.
• Content Switch examines each “GET” request and makes a new routing decision.
• Content Switch establishes another connection with a different server based on the routing decision.
• Those HTTP responses from different servers need to be interleaved and seen by the user as if from the same server.
• Solutions: in-order delivery (buffer requirement); out-of-order delivery (seq # tracking)?
• Problems: should we throw away earlier html requests if we receive later requests?

Multiple HTTP Requests from One TCP Connection

[Figure: client, content switch, and server1, server2, ..., server9.]

• Can servers return documents directly to the client in the keep-alive session case?
• Can an equivalent of VS-Tunnel or VS-DR be implemented using Content Switch?

Content Switch Rule Survey

Survey shows that existing switches support
• rules in basic (condition action) or (action condition) form
• some define the condition as a class, then specify the action in a separate statement or command
• simple single conditional term
• command line interface (to facilitate incremental update?)
• Actions can include reject, forward, put in queue (for bandwidth control, scheduling)

Content Switch Rule Design

• Rule syntax generic enough to support all intended features.
• Use simple C if-statement syntax
    rule: if (condition) { action }
  – Easy to read
  – Allows optimization using the C compiler
• Condition consists of multiple terms of
  – variable relational_operator value, e.g.
      xml.purchase/totalAmount > 50000
      smtp.to == “[email protected]”
      cookie.name == “servlet1”
      bitmatch(64, 8, 0xff) == 64   # above means TTL=64; idea from netfilter universal filter
  – suffix(variable, string), e.g. suffix(url, “gif”)
  – regex(variable, pattern), e.g. regex(url, “/purchase”)
• Action consists of reject, forward(server | queue), loadBalance(serverGroup, loadBalancingAlgorithm)
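The three condition helpers can be prototyped outside the kernel to check what a rule would match. The sketch below is an assumption-laden illustration: in the rule syntax bitmatch takes (bit offset, bit length, mask) and reads the current packet implicitly, whereas here the packet bytes are passed as an explicit first argument.

```python
import re


def suffix(variable: str, string: str) -> bool:
    """True if the variable ends with the given string."""
    return variable.endswith(string)


def regex(variable: str, pattern: str) -> bool:
    """True if the pattern occurs anywhere in the variable."""
    return re.search(pattern, variable) is not None


def bitmatch(packet: bytes, bit_offset: int, bit_length: int, mask: int) -> int:
    """Read bit_length bits starting at bit_offset from the packet and AND them with mask."""
    total = int.from_bytes(packet, "big")
    shift = len(packet) * 8 - bit_offset - bit_length
    field = (total >> shift) & ((1 << bit_length) - 1)
    return field & mask
```

In an IPv4 header the TTL is the byte at offset 8, i.e. bit offset 64, which is why bitmatch(64, 8, 0xff) == 64 tests for TTL=64.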

Efficient CS Rule Matching

• Brute force, strict priority: Rules are executed in sequential manner.

• Efficient Rule Matching Method: – Organize Rules so that rules can be skipped based on existing content types.

– Utilize compiler optimization technique.

Simple CS Rule Editor GUI

Conflict Detection on Content Switching Rules

• Detect conflicts among rules or rule sets.
• Absolute conflict type:
    r1: if (xml.purchase/customerName == “CCL”) {routeTo(r1)}
    r2: if (xml.purchase/customerName == “CCL”) {routeTo(r2)}
• Potential conflict type:
    r1: if (xml.purchase/totalAmount > 5000) {routeTo(quickServers)}
    r2: if (xml.purchase/totalAmount > 20000) {routeTo(superServers)}
• Algorithm: build a tree for rules with the same variable; check operator and value to see whether they are the same or lead to a potential conflict; compare actions to decide conflict type or duplication.
• Developed a conflict detection algorithm for rules with multiple-term conditions. Can be applied to policy-based rule conflict detection.
• Editor can build these trees while a user enters rules and warn about conflicts right away.
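The two conflict types can be demonstrated with a pairwise check on single-term rules. This is a simplified sketch and not the tree-based algorithm described above: rules are hypothetical (variable, operator, value, action) tuples, numeric values are assumed to be integers, and string values are compared with == only.

```python
import operator

OPS = {"==": operator.eq, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le}


def overlap(op1, v1, op2, v2):
    """True if some value satisfies both terms."""
    if isinstance(v1, str) or isinstance(v2, str):
        return op1 == op2 == "==" and v1 == v2
    # For integer comparisons it is enough to test points around both bounds.
    candidates = {v + d for v in (v1, v2) for d in (-1, 0, 1)}
    return any(OPS[op1](x, v1) and OPS[op2](x, v2) for x in candidates)


def classify(r1, r2):
    """Classify a pair of rules given as (variable, operator, value, action)."""
    var1, op1, v1, act1 = r1
    var2, op2, v2, act2 = r2
    if var1 != var2 or not overlap(op1, v1, op2, v2):
        return "no conflict"
    same_condition = (op1, v1) == (op2, v2)
    if act1 == act2:
        return "duplicate" if same_condition else "no conflict"
    return "absolute conflict" if same_condition else "potential conflict"
```

Applied to the examples above, the two customerName rules are an absolute conflict, and the two totalAmount rules are a potential conflict because any amount above 20000 satisfies both.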

XML Tag Value Extraction

• An xmlContentExtract() function was built to extract the tag values of a list of unique tag sequences.
• It is based on Clark Cooper’s expat 1.0 XML parser.
• Its arguments include the pointer to an XML document, the pointer to the array of strings (unique XML tag sequences; we follow the XSL selector syntax), and the number of sequences.
• It returns a list of structure nodes, each with the tag sequence, its attribute, and its value.
• Currently, it supports one attribute, and each tag sequence needs to be unique.

Status of UCCS ACSD Project

• A Linux-based LVS content switch called LCS was developed.
• Sponsored by CCL/ITRI.
• Based on Linux-2.2.16-3, current release LCS02.
• ip_forward.c, ip_masq.c, ip_vs.c are modified to implement basic TCP delay binding.
• ip_cs.c is added for most of the content switching functions, with http header extraction and xml content extraction.
• A simple Java-based ruleEdit program was created for rule editing and conflict detection.
• A rule translation program converts the rule set into a Linux kernel module and allows dynamic replacement of rules without restarting the system.
• LCS is being ported to the Intel IXP 1200 network processor.

LCS Demo

• We set up viva.uccs.edu as a content switch, and wait and ace as two real servers.
• URL Switching demo:
    http://viva.uccs.edu/~lcs1/   route to ace.uccs.edu
    http://viva.uccs.edu/~lcs2/   route to wait.uccs.edu
• XML Web Switching (E-commerce applications): http://archie.uccs.edu/~acsd/lcs/xmldemo.html
    When the 2nd subtotal tag >= 50000, route to ace.
    When the 2nd subtotal tag < 50000, route to wait.
• Let us know if you have problems accessing them. My students may be working on LCS extensions.

LCS Rule Example

/* XML switching: rule_fields[1] holds the value of the 2nd subtotal tag */
R4:  if (atoi(rule_fields[1].value) >= 50000) {
         return route_to("ace", NON_STICKY, saddr);
     }
R5:  if ((atoi(rule_fields[1].value) > 0) && (atoi(rule_fields[1].value) < 50000)) {
         IP_RULE_MSG("server=wait\n");
         return route_to("wait", NON_STICKY, saddr);
     }
/* URL switching: route on a substring of the requested URL */
R10: if (strstr(url, "lcs1") != NULL) {
         IP_RULE_MSG("server=ace\n");
         return route_to("ace", NON_STICKY, saddr);
     }
R11: if (strstr(url, "lcs2") != NULL) {
         IP_RULE_MSG("server=wait\n");
         return route_to("wait", NON_STICKY, saddr);
     }

Related Load Balancing Research Results

• Modified Apache status module to report
  – Total bytes to be transferred by child processes
  – Average document transfer speed
• Modified LB-DNS to receive server status and bandwidth probing results.
• LB-DNS returns the IP address of the best server based on a weight contributed by both server load and bandwidth.
• Modified WebStone benchmark to test the performance of load balancing web server clusters.

Load balancing Systems

[Figure: Modified Web Server 1 through Modified Web Server n report to a statistics gathering daemon. Bandwidth probe results, server delay, and server ranking are kept in /tmp/StatFile and used by the LBA (modified DNS), which answers requests for Web pages.]

Connection Rate: LBA vs. Round-Robin

Server connection rate for 4 servers (round robin was only run once)

Update for LBA, per sec   Load balancing system   Round-robin
1                         418.2                   327.6
2                         656.6                   327.6
3                         907.9                   327.6
4                         420                     327.6
5                         636.7                   327.6
6                         322.6                   327.6
7                         711.6                   327.6
8                         420.5                   327.6
9                         638.3                   327.6
10                        670.6                   327.6
11                        683.4                   327.6
12                        899                     327.6

Conclusion

• Content Delivery Network improves internet content retrieval.
• LVS provides a low-cost layer 4 switching service for clusters.
• Linux Content Switch with generic rules can be easily configured for a wide variety of value-added services:
  – Premium services
  – Load balancing/highly available server farm
  – Firewall
  – Bandwidth control/traffic shaping
• Requires efficient SW/HW architecture and rule matching algorithms to reduce processing overhead.
• Content rule design/conflict detection are important and challenging.
• TCP delay binding can be improved.

References

• http://www.linuxvirtualserver.org/
• http://www.akamai.com/
• http://cs.uccs.edu/~chow/pub/contentsw/talk/contentswitching.ppt
• [Aron2000] Mohit Aron, “Differential and predictable QoS in web server systems,” Ph.D. dissertation, Rice University, Oct. 2000.
• [Zhang97] Lixia Zhang, Sally Floyd, and Van Jacobson, “Adaptive Web Caching,” April 25, 1997. http://www-nrg.ee.lbl.gov/floyd/web.html
• [Esi2001] Edge Side Includes, http://www.esi.org/
• [Chow2001a] C. Edward Chow and Indira Semwal, “Web Load Balancing Through More Accurate Server Report,” Proceeding of PDCAT 2001, Taipei, Taiwan.
• [Chow2001b] C. Edward Chow, Ganesh Godavari, and Jianhua Xie, “Content Switch Rules and their Conflict Detection,” Proceeding of PDCAT 2001, Taipei, Taiwan.
• [Chow2001c] C. Edward Chow and Weihong Wang, “The Design and Implementation of Linux LVS based Content Switch,” Proceeding of PDCAT 2001, Taipei, Taiwan.
• [Aversa2000] Luis Aversa and Azer Bestavros, “Load Balancing a Cluster of Web Servers: Using Distributed Packet Rewriting,” Proceedings of IPCCC 2000.
• [Cao98] Pei Cao, Jin Zhang and Kevin Beach, “Active Cache: Caching Dynamic Contents on the Web,” http://www.cs.wisc.edu/~cao/papers/active-cache.ps