Proxy Lab Recitation I
Monday Nov 20, 2006
Outline
• What is an HTTP proxy?
• HTTP Tutorial
– HTTP Request
– HTTP Response
• Sequential vs. concurrent proxies
• Caching
What is a proxy?
[Figure: Client (browser) <-> Proxy <-> Server (www.google.com)]
• Why a proxy?
– Access control (allowed websites)
– Filtering (viruses, for example)
– Caching (many people request the same CNN page)
Brief HTTP Tutorial
• Hypertext Transfer Protocol
– Protocol spoken between a browser and a web-server
• From browser to web-server: REQUEST
– GET http://www.google.com/ HTTP/1.0
• From web-server to browser: RESPONSE
– HTTP/1.0 200 OK
– Other stuff (headers, then the body)…
HTTP Request
[Figure labels on the first request line: request type, host, path, version]
GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
Host: csapp.cs.cmu.edu
User-Agent: Mozilla/5.0 ...
Accept: text/xml,application/xml ...
Accept-Language: en-us,en;q=0.5 ...
Accept-Encoding: gzip,deflate ...
An empty line terminates an HTTP request.
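A minimal C sketch of splitting the request line into its parts (request type, host, path, version). It assumes the browser sends an absolute URI of the form http://host[:port]/path, as in the example above, and defaults the port to 80. The name parse_request_line and the buffer sizes are hypothetical, and a real proxy would need stricter error handling.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "GET http://host[:port]/path HTTP/1.x" into its
   parts. Buffer sizes are assumptions. Returns 0 on success, -1 otherwise. */
int parse_request_line(const char *line, char method[16], char host[256],
                       int *port, char path[1024], char version[16])
{
    char uri[2048];
    /* Request type, URI and version are separated by whitespace. */
    if (sscanf(line, "%15s %2047s %15s", method, uri, version) != 3)
        return -1;

    const char *prefix = "http://";
    size_t plen = strlen(prefix);
    if (strncmp(uri, prefix, plen) != 0)
        return -1;                      /* only absolute http:// URIs */

    /* Host runs from after "http://" up to the first '/' (or the end). */
    const char *hoststart = uri + plen;
    const char *slash = strchr(hoststart, '/');
    size_t hostlen = slash ? (size_t)(slash - hoststart) : strlen(hoststart);
    if (hostlen == 0 || hostlen >= 256)
        return -1;
    memcpy(host, hoststart, hostlen);
    host[hostlen] = '\0';

    /* Optional ":port" suffix; HTTP defaults to port 80. */
    *port = 80;
    char *colon = strchr(host, ':');
    if (colon != NULL) {
        *port = atoi(colon + 1);
        *colon = '\0';
    }

    /* Path defaults to "/" when the URI has none. */
    const char *p = slash ? slash : "/";
    if (strlen(p) >= 1024)
        return -1;
    strcpy(path, p);
    return 0;
}
```

For the example request, this yields method "GET", host "csapp.cs.cmu.edu", port 80, path "/simple.html" and version "HTTP/1.1".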
The Host header is optional in HTTP/1.0 (it is required in HTTP/1.1), but we
recommend always including it.
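If the client omitted Host, the proxy can add one built from the host it parsed out of the request line. This sketch assumes the header block is held in a NUL-terminated string with "\r\n" line endings; has_host_header and host_header_if_missing are hypothetical names. Header names are case-insensitive, hence strncasecmp().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Returns 1 if the header block (lines ending in "\r\n", terminated by an
   empty line) contains a Host header, 0 otherwise. */
int has_host_header(const char *headers)
{
    const char *line = headers;
    while (*line != '\0' && strncmp(line, "\r\n", 2) != 0) {
        if (strncasecmp(line, "Host:", 5) == 0)
            return 1;
        const char *next = strstr(line, "\r\n");
        if (next == NULL)
            break;
        line = next + 2;                /* advance to the next header line */
    }
    return 0;
}

/* Writes "Host: <host>\r\n" into out if the headers lack one (empty string
   otherwise). Returns bytes written, or -1 if out is too small. */
int host_header_if_missing(const char *headers, const char *host,
                           char *out, size_t outsz)
{
    if (has_host_header(headers)) {
        if (outsz > 0)
            out[0] = '\0';
        return 0;
    }
    int n = snprintf(out, outsz, "Host: %s\r\n", host);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```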
The User-Agent header identifies the browser type. Some websites
use it to decide what to send, and may reject you if you claim to
use MyWeirdBrowser. So the proxy must send this and all other
headers through…
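One way to honor "send all headers through" is to copy the header block byte-for-byte rather than rebuilding it. This sketch assumes the whole request has already been read into a NUL-terminated string; copy_headers_through is a hypothetical helper, and how the request line itself is forwarded is deliberately left out.

```c
#include <string.h>
#include <sys/types.h>

/* Copy every header line of a client request (everything after the request
   line, up to and including the empty line that ends the headers) into out,
   unchanged. Returns bytes copied, or -1 if the headers are incomplete or
   do not fit in out. */
ssize_t copy_headers_through(const char *request, char *out, size_t outsz)
{
    const char *first_eol = strstr(request, "\r\n");
    if (first_eol == NULL)
        return -1;
    const char *headers = first_eol + 2;      /* skip the request line */

    const char *end;
    if (strncmp(headers, "\r\n", 2) == 0) {
        end = headers + 2;                    /* no headers at all */
    } else {
        const char *blank = strstr(headers, "\r\n\r\n");
        if (blank == NULL)
            return -1;                        /* terminator not seen yet */
        end = blank + 4;
    }

    size_t len = (size_t)(end - headers);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, headers, len);                /* Host, User-Agent, ... verbatim */
    out[len] = '\0';
    return (ssize_t)len;
}
```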
HTTP Response
HTTP/1.1 200 OK
Date: Mon, 20 Nov 2006 03:34:17 GMT
Server: Apache/1.3.19 (Unix) …
Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT
Content-Length: 129
Connection: Keep-Alive
Content-Type: text/html
The status line indicates whether the request succeeded (e.g., 200 OK), was redirected (3xx codes), failed, etc.
The complete response should be transparently sent back to the
client by the proxy.
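The proxy forwards the response unchanged, but reading the status code can still help, for example if you decide to cache only successful responses (an assumption, not something the slides require). A hedged sketch; the helper names are hypothetical.

```c
#include <stdio.h>

/* Extract the numeric status code from a status line such as
   "HTTP/1.1 200 OK". Returns the code, or -1 if it cannot be parsed. */
int parse_status_code(const char *status_line)
{
    int major, minor, code;
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* 2xx: the request succeeded. */
int status_is_success(int code)  { return code >= 200 && code < 300; }

/* 3xx: a redirect; the Location header usually says where to go next. */
int status_is_redirect(int code) { return code >= 300 && code < 400; }
```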
The Content-Length header gives the number of bytes in the response body.
Not all web-servers send it. DO NOT RELY ON IT!
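Since Content-Length may be missing, a simple strategy is to relay bytes until the server closes the connection. This sketch uses plain read()/write() as a stand-in for the RIO routines, and assumes the server does close the connection when done (e.g., because the proxy sent an HTTP/1.0 request); on a kept-alive connection, read() would block instead of returning 0. relay_until_eof and write_all are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Relay the server's response to the client without trusting
   Content-Length: copy until the server closes its end (read returns 0).
   Bytes are never treated as strings, so NULs in the body are safe.
   Returns total bytes relayed, or -1 on error. */
long relay_until_eof(int serverfd, int clientfd)
{
    char buf[8192];
    long total = 0;
    for (;;) {
        ssize_t r = read(serverfd, buf, sizeof buf);
        if (r == 0)
            break;                      /* EOF: response complete */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(clientfd, buf, (size_t)r) < 0)
            return -1;
        total += r;
    }
    return total;
}
```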
Concurrent Proxy
• Need to handle multiple requests simultaneously
– From different clients
– From the same client
• E.g., each individual image in an HTML document needs to be requested separately
• Serving requests sequentially decreases throughput
– The server (here, the proxy) is waiting for I/O most of the time
– This time can be used to start serving other clients
– Multiple outstanding requests
Concurrent Proxy
• Use threads for making proxy concurrent
– Create one thread for each new client request
– The thread finishes and exits after serving the client request
– Use pthread library
• pthread_create(), pthread_detach(), etc.
• Concurrency can also be added with select() (I/O multiplexing)
– Much more difficult to get right
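A minimal sketch of thread-per-request with pthread_create() and pthread_detach(). The connected descriptor is passed in its own malloc'd slot, so the main thread can accept the next connection without racing the new thread for the value. handle_client is a hypothetical placeholder for the real request-forwarding code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical placeholder for the real work: read the request from connfd,
   forward it to the server, and relay the response back. */
static void handle_client(int connfd)
{
    const char msg[] = "served\n";
    ssize_t w = write(connfd, msg, sizeof msg - 1);
    (void)w;                            /* error handling omitted in sketch */
}

/* Thread body: one thread per client request. */
static void *client_thread(void *arg)
{
    int connfd = *(int *)arg;
    free(arg);                          /* this thread owns the heap slot */
    pthread_detach(pthread_self());     /* resources reclaimed on exit, no join */
    handle_client(connfd);
    close(connfd);
    return NULL;                        /* thread exits after serving */
}

/* Start a detached thread for connfd. Returns 0 on success, -1 on failure
   (in which case the caller still owns connfd). */
int spawn_client_thread(int connfd)
{
    int *slot = malloc(sizeof *slot);   /* not &connfd: avoids a race */
    if (slot == NULL)
        return -1;
    *slot = connfd;
    pthread_t tid;
    if (pthread_create(&tid, NULL, client_thread, slot) != 0) {
        free(slot);
        return -1;
    }
    return 0;
}
```

In an accept loop, each descriptor returned by accept() would be handed to spawn_client_thread(), and the main thread would go straight back to accept().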
Caching Proxy
• Most geeks visit http://slashdot.org/ every 2 minutes
– Why fetch the same content again and again?
– (If it doesn’t change frequently)
• The proxy can cache responses
– Serve directly out of its cache
– Reduces latency and network load
Caching: Implementation Issues
• Use the GET URL (host/path) to locate the appropriate cache entry
• THREAD SAFETY
– A single cache is accessed by multiple threads
– Easy to create bugs: thread 1 is reading an entry while thread 2 is deleting the same entry
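One common way to make the shared cache safe is a readers-writer lock: lookups take the lock in shared mode, insertions and evictions take it exclusively. In this sketch, a lookup copies the entry out while holding the read lock, so thread 2 cannot free an entry that thread 1 is still reading. The fixed slot count, the round-robin eviction and all names are hypothetical; a real cache would likely need a size limit and a better replacement policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16                  /* hypothetical fixed capacity */

typedef struct {
    char  *url;                         /* key: host/path from the GET line */
    char  *data;                        /* response bytes (may contain NULs) */
    size_t len;
} cache_entry;

static cache_entry cache[CACHE_SLOTS];
static int next_victim = 0;             /* round-robin replacement */
static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

/* On a hit, return a private copy of the response (caller frees) and set
   *len. Returns NULL on a miss or if memory runs out. */
char *cache_lookup(const char *url, size_t *len)
{
    char *copy = NULL;
    pthread_rwlock_rdlock(&cache_lock);         /* many readers at once */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].url != NULL && strcmp(cache[i].url, url) == 0) {
            copy = malloc(cache[i].len ? cache[i].len : 1);
            if (copy != NULL) {
                memcpy(copy, cache[i].data, cache[i].len);
                *len = cache[i].len;
            }
            break;
        }
    }
    pthread_rwlock_unlock(&cache_lock);
    return copy;
}

/* Insert or overwrite an entry; writers get exclusive access. */
int cache_insert(const char *url, const char *data, size_t len)
{
    size_t ulen = strlen(url) + 1;
    char *u = malloc(ulen);
    char *d = malloc(len ? len : 1);
    if (u == NULL || d == NULL) {
        free(u);
        free(d);
        return -1;
    }
    memcpy(u, url, ulen);
    memcpy(d, data, len);

    pthread_rwlock_wrlock(&cache_lock);
    int slot = -1;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].url != NULL && strcmp(cache[i].url, url) == 0) {
            slot = i;                           /* same URL: overwrite */
            break;
        }
    if (slot < 0) {
        slot = next_victim;
        next_victim = (next_victim + 1) % CACHE_SLOTS;
    }
    free(cache[slot].url);                      /* evict old contents */
    free(cache[slot].data);
    cache[slot].url = u;
    cache[slot].data = d;
    cache[slot].len = len;
    pthread_rwlock_unlock(&cache_lock);
    return 0;
}
```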
General advice
• Use RIO routines
– rio_readnb, rio_readlineb
– Be very careful when you are reading line-by-line (HTTP request), versus just a stream of bytes (HTTP response)
• When to use strcpy() vs. memcpy()
• gethostbyname(), inet_ntoa() are not threadsafe!
• Path: sequential first, then add concurrency, then add caching
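A sketch of why memcpy() matters: response bodies such as images can contain NUL bytes, so they must be handled as (pointer, length) pairs. strcpy() is fine for NUL-terminated text such as a header line read with rio_readlineb, but on binary data it stops at the first NUL. The bytebuf type and bytebuf_append are hypothetical, e.g. for accumulating a response before caching it.

```c
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer; the length is tracked explicitly because the data
   may contain NUL bytes. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} bytebuf;

/* Append n bytes with memcpy. strcpy/strcat would stop at the first NUL
   and silently truncate binary content. Returns 0, or -1 on allocation
   failure. */
int bytebuf_append(bytebuf *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < b->len + n)
            newcap *= 2;
        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

Note that strlen() on the result would report only the bytes before the first NUL, which is exactly the truncation strcpy() would cause.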