Proxy Lab Recitation I
Monday Nov 20, 2006
Outline
• What is an HTTP proxy?
• HTTP Tutorial
– HTTP Request
– HTTP Response
• Sequential vs. concurrent proxies
• Caching
What is a proxy?
Diagram: Client (browser) <-> Proxy <-> Server (www.google.com)
• Why a proxy?
– Access control (allowed websites)
– Filtering (viruses, for example)
– Caching (multiple people request CNN)
Brief HTTP Tutorial
• Hyper-Text Transfer Protocol
– Protocol spoken between a browser and a web-server
• From browser to web-server: REQUEST
– GET http://www.google.com/ HTTP/1.0
• From web-server to browser: RESPONSE
– HTTP/1.0 200 OK
– Other stuff…
HTTP Request
Request line: request type, host, path, version
GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
Host: csapp.cs.cmu.edu
User-Agent: Mozilla/5.0 ...
Accept: text/xml,application/xml ...
Accept-Language: en-us,en;q=0.5 ...
Accept-Encoding: gzip,deflate ...
An empty line terminates an HTTP request
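The request line above can be split into its four parts with a few string calls. The following is only a minimal sketch: `parse_request_line` and `FIELD_MAX` are hypothetical names, not part of the lab handout, and the sketch assumes the browser sends an absolute `http://` URL as in the example. Any port number (such as `host:8080`) is left inside the host field.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 256   /* assumed size of each caller-supplied buffer */

/* Hypothetical helper: splits a proxy-style request line such as
 *   "GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1"
 * into method, host, path, and version.
 * Each output buffer must hold at least FIELD_MAX bytes.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_request_line(const char *line, char *method, char *host,
                       char *path, char *version)
{
    char url[1024];

    /* Three whitespace-separated tokens: method, absolute URL, version. */
    if (sscanf(line, "%255s %1023s %255s", method, url, version) != 3)
        return -1;

    /* This sketch only understands absolute "http://" URLs. */
    const char *prefix = "http://";
    if (strncmp(url, prefix, strlen(prefix)) != 0)
        return -1;

    /* The host runs from the end of the prefix to the first '/'. */
    const char *host_start = url + strlen(prefix);
    const char *slash = strchr(host_start, '/');
    size_t host_len = slash ? (size_t)(slash - host_start)
                            : strlen(host_start);
    if (host_len == 0 || host_len >= FIELD_MAX)
        return -1;
    memcpy(host, host_start, host_len);
    host[host_len] = '\0';

    /* A URL with no path component means the root path "/". */
    if (slash == NULL) {
        strcpy(path, "/");
    } else {
        if (strlen(slash) >= FIELD_MAX)
            return -1;
        strcpy(path, slash);
    }
    return 0;
}
```

For the request shown above, this would yield the method `GET`, the host `csapp.cs.cmu.edu`, the path `/simple.html`, and the version `HTTP/1.1`.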
HTTP Request
GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
Host: csapp.cs.cmu.edu
User-Agent: Mozilla/5.0 ...
Accept: text/xml,application/xml ...
Accept-Language: en-us,en;q=0.5 ...
Accept-Encoding: gzip,deflate ...
The Host header is optional in HTTP/1.0 (it is required in HTTP/1.1), but we
recommend that it always be included
HTTP Request
GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
Host: csapp.cs.cmu.edu
User-Agent: Mozilla/5.0 ...
Accept: text/xml,application/xml ...
Accept-Language: en-us,en;q=0.5 ...
Accept-Encoding: gzip,deflate ...
The User-Agent header identifies the browser type. Some websites
use it to determine what to send, and may reject you if you say
you use MyWeirdBrowser. The proxy must send this and all other
headers through…
HTTP Response
Status
HTTP/1.1 200 OK
Date: Mon, 20 Nov 2006 03:34:17 GMT
Server: Apache/1.3.19 (Unix) …
Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT
Content-Length: 129
Connection: Keep-Alive
Content-Type: text/html
Status indicates whether it was successful or not, if it is a “redirect”, etc.
The complete response should be transparently sent back to the
client by the proxy.
HTTP Response
HTTP/1.1 200 OK
Date: Mon, 20 Nov 2006 03:34:17 GMT
Server: Apache/1.3.19 (Unix) …
Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT
Content-Length: 129
Connection: Keep-Alive
Content-Type: text/html
The Content-Length field identifies how many bytes there are in the response body.
Not sent by all web-servers. DO NOT RELY ON IT!
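One way to avoid depending on Content-Length is to keep copying bytes until the server closes the connection. The sketch below assumes exactly that: the server signals the end of the response by closing its side (HTTP/1.0-style behavior), so `read()` eventually returns 0. The function name `relay_until_eof` is hypothetical, and plain `read()`/`write()` are used here instead of the RIO routines to keep the example self-contained.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copies bytes from from_fd to to_fd until read()
 * reports end-of-file, without ever looking at Content-Length.
 * Returns the total number of bytes relayed, or -1 on error. */
ssize_t relay_until_eof(int from_fd, int to_fd)
{
    char buf[4096];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof buf);
        if (n == 0)
            return total;          /* peer closed: response is complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may accept fewer bytes than requested, so loop. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(to_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Because the loop treats the data as raw bytes, it also works for binary responses such as images.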
Concurrent Proxy
• Need to handle multiple requests simultaneously
– From different clients
– From the same client
• E.g., each individual image in an HTML document needs to be
requested separately
• Serving requests sequentially decreases
throughput
– Server is waiting for I/O most of the time
– This time can be used to start serving other clients
– Multiple outstanding requests
Concurrent Proxy
• Use threads to make the proxy concurrent
– Create one thread for each new client request
– The thread finishes and exits after serving the client
request
– Use pthread library
• pthread_create(), pthread_detach(), etc.
• Can use select() as well for adding concurrency
– Much more difficult to get right
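The thread-per-request approach can be sketched as follows. This is an illustration, not the lab's reference solution: `struct conn_arg`, `spawn_handler`, and `demo_handler` are hypothetical names, and `demo_handler` merely stands in for real request handling. Only `pthread_create()`, `pthread_detach()`, and `pthread_self()` come from the pthread library.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-connection argument block. */
struct conn_arg {
    int connfd;
    void (*handler)(int connfd);
};

/* Thread routine: detach, serve one connection, clean up, exit. */
static void *conn_thread(void *vargp)
{
    struct conn_arg arg = *(struct conn_arg *)vargp;
    free(vargp);                     /* heap copy avoids racing with the caller */
    pthread_detach(pthread_self());  /* nobody joins; resources freed on exit */
    arg.handler(arg.connfd);
    close(arg.connfd);
    return NULL;
}

/* Starts one thread for one accepted connection.
 * Returns 0 on success, -1 on failure. */
int spawn_handler(int connfd, void (*handler)(int))
{
    struct conn_arg *arg = malloc(sizeof *arg);
    if (arg == NULL)
        return -1;
    arg->connfd = connfd;
    arg->handler = handler;

    pthread_t tid;
    if (pthread_create(&tid, NULL, conn_thread, arg) != 0) {
        free(arg);
        return -1;
    }
    return 0;
}

/* Stand-in for real request handling: sends a fixed reply. */
void demo_handler(int connfd)
{
    const char reply[] = "hello";
    if (write(connfd, reply, sizeof reply - 1) < 0) {
        /* a real handler would report the error */
    }
}
```

In a proxy, the main loop would call something like `spawn_handler(connfd, handler)` right after each `accept()`. Passing the descriptor through a freshly allocated block, rather than the address of a loop variable, keeps the next `accept()` from overwriting it before the thread reads it.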
Caching Proxy
• Most geeks visit http://slashdot.org/ every
2 minutes
– Why fetch the same content again and again?
– (If it doesn’t change frequently)
• The proxy can cache responses
– Serve directly out of its cache
– Reduces latency, network-load
Caching: Implementation Issues
• Use the GET URL (host/path) to locate the
appropriate cache entry
• THREAD SAFETY
– A single cache is accessed by multiple
threads
– Easy to create bugs: thread 1 is reading an
entry, while thread 2 is deleting the same
entry
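A simple way to avoid the read-while-delete bug is to guard the whole cache with one mutex and hand readers a private copy of the entry. The sketch below assumes a tiny fixed-size cache with round-robin replacement; `cache_put`, `cache_get`, and `CACHE_SLOTS` are hypothetical names, and a real proxy would likely want size limits and a better eviction policy.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16   /* assumed capacity for this sketch */

struct cache_entry {
    char  *url;    /* key: host/path from the GET request */
    char  *data;   /* cached response bytes (may contain NULs) */
    size_t len;
};

static struct cache_entry cache[CACHE_SLOTS];
static size_t next_slot;                       /* round-robin victim */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stores a private copy of data under url. Returns 0, or -1 on failure. */
int cache_put(const char *url, const void *data, size_t len)
{
    /* Allocate and copy outside the lock to keep the critical section short. */
    char *u = malloc(strlen(url) + 1);
    char *d = malloc(len ? len : 1);
    if (u == NULL || d == NULL) {
        free(u);
        free(d);
        return -1;
    }
    strcpy(u, url);
    memcpy(d, data, len);

    pthread_mutex_lock(&cache_lock);
    struct cache_entry *e = NULL;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {   /* reuse slot for same URL */
        if (cache[i].url && strcmp(cache[i].url, url) == 0) {
            e = &cache[i];
            break;
        }
    }
    if (e == NULL) {                             /* otherwise evict a slot */
        e = &cache[next_slot];
        next_slot = (next_slot + 1) % CACHE_SLOTS;
    }
    free(e->url);
    free(e->data);
    e->url = u;
    e->data = d;
    e->len = len;
    pthread_mutex_unlock(&cache_lock);
    return 0;
}

/* Returns a malloc'd copy of the cached response (caller frees) and sets
 * *len_out, or returns NULL on a miss or allocation failure. */
void *cache_get(const char *url, size_t *len_out)
{
    void *copy = NULL;

    pthread_mutex_lock(&cache_lock);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].url && strcmp(cache[i].url, url) == 0) {
            copy = malloc(cache[i].len ? cache[i].len : 1);
            if (copy != NULL) {
                memcpy(copy, cache[i].data, cache[i].len);
                *len_out = cache[i].len;
            }
            break;
        }
    }
    pthread_mutex_unlock(&cache_lock);
    return copy;
}
```

Returning a copy costs an extra `memcpy()`, but it means another thread can evict the entry immediately after the lock is released without invalidating what the reader holds.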
General advice
• Use RIO routines
– rio_readnb, rio_readlineb
– Be very careful when you are reading line-by-line
(HTTP request), versus just a stream of bytes (HTTP
response)
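• When to use strcpy() vs. memcpy()
– strcpy() stops at the first NUL byte; use memcpy() with an explicit length for binary response data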
• gethostbyname(), inet_ntoa() are not threadsafe!
• Development path: sequential proxy first, then add concurrency, then caching
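The strcpy() vs. memcpy() point is easy to see with a buffer that contains a NUL byte, as binary responses (images, for example) often do. The two helpers below are hypothetical and exist only to make the difference visible; both assume the destination buffer is large enough.

```c
#include <string.h>

/* Copies src as a C string. strcpy() stops at the first NUL byte, so
 * anything after it is silently dropped. Returns the bytes copied,
 * not counting the terminator. */
size_t copy_as_string(char *dst, const char *src)
{
    strcpy(dst, src);
    return strlen(dst);
}

/* Copies exactly len bytes, NUL bytes included. Returns len. */
size_t copy_as_bytes(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
    return len;
}
```

With a 9-byte buffer such as `{'G','I','F','8','9','a','\0','\x01','\x02'}`, `copy_as_string` copies only the 6 bytes before the NUL, while `copy_as_bytes` with `len` set to 9 copies all of them. Text headers read line by line are safe to treat as strings; response bodies are not.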