
Hypertext Transfer Protocol

temporary location of course "Net Technologies": http://dims.karelia.ru/~alexmou/

• HTTP/1.1 Authors
– Roy Fielding (UCI)
– Jim Gettys - Editor (Digital ISBU / W3C)
– Jeff Mogul (Digital / WRL)
– Henrik Frystyk Nielsen (W3C)
– Tim Berners-Lee (W3C)

Tim Berners-Lee

• IETF HTTP Working Group
– Larry Masinter - Working Group Chair

Thanks to Jim Gettys, Digital Equipment Corporation, 1996 and James Marshall, 1997.

Rev. 1.05 / 14.01.2007

Petrozavodsk State University, Alex Moschevikin, 2003 NET TECHNOLOGIES

HTTP and OSI RM

TCP/IP stack: HTTP, TCP, IP, Physical

OSI/RM:
Layer 7  APPLICATION
Layer 6  PRESENTATION
Layer 5  SESSION
Layer 4  TRANSPORT
Layer 3  NETWORK
Layer 2  DATA LINK
Layer 1  PHYSICAL


What is HTTP?

• Application-level protocol for distributed, collaborative, hypermedia information systems.
• Transaction-oriented client/server protocol.
• HTTP uses TCP as its transport.
• Text-based commands and directives (not binary).
• HTTP (original version) was a "stateless" protocol; each transaction was treated independently. A typical implementation creates a new TCP connection between client and server for each transaction and then terminates the connection as soon as the transaction completes.
• Flexible in the formats it can handle.


History of HTTP WWW=HTTP

• HTTP/0.9, 1990. Graphical user interface with hyperlinks to other information (text, graphics, sound, video, etc.); browsing starts at a "homepage".
• 1994: population explosion on the net, with many countries providing access.
• HTTP/1.0 (RFC 1945, May 1996): the protocol was improved by allowing messages to be in a MIME-like format, containing metainformation about the data transferred and modifiers on the request/response semantics.
• However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts.
• HTTP/1.1 (RFC 2068, Jan. 1997; RFC 2616, June 1999).


Screenshot of the first version of Netscape Navigator, 1994


URI, URL, URN, difference

A Uniform Resource Locator (URL) identifies an Internet resource by its location and access method, without the fragment part (for example, the # anchor in HTML), and can be written as a single line of text. More than 30 URI (URL) schemes are registered with IANA.

A Uniform Resource Name (URN) identifies an Internet resource by a location-independent name rather than by an access scheme, and can be written as a single line of text ("urn:isbn:n-nn-nnnnnn-n").

A Uniform Resource Identifier (URI) is the general term that covers both URLs and URNs.

URI
http://www.gleaners.org/faq.html#Q04 (#Q04 is not sent to the HTTP server)

URL
http://www.gleaners.org/faq.html

URN
urn:ietf:rfc:2141
urn:ietf:std:50
urn:ietf:id:ietf-urn-ietf-06
urn:ietf:mtg:41-urn
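The remark above that #Q04 is not sent to the HTTP server can be illustrated with a minimal Python sketch. It uses the standard urllib.parse.urldefrag function; the helper name request_target is hypothetical and not part of the course material.

```python
from urllib.parse import urldefrag

def request_target(uri):
    """Return (url_without_fragment, fragment) for a URI.

    The fragment stays on the client side; only the URL part is
    used to build the HTTP request.
    """
    url, fragment = urldefrag(uri)
    return url, fragment

url, fragment = request_target("http://www.gleaners.org/faq.html#Q04")
print(url)       # http://www.gleaners.org/faq.html
print(fragment)  # Q04
```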


URI - Uniform Resource Identifier

• Identifies the resource on the host machine and the access method for that resource.
• General form is:
http://www.gde-to.tut:80/~brewery
https://www.gde-to.tam/
ftp://anonymous:[email protected]/films/a.avi
• Parts
– Scheme or protocol
– User name
– :Password
– DNS name of the host
– TCP port
– Path to and name of resource (index.html)
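As a rough illustration of the parts listed above, the following Python sketch splits a URI with the standard urllib.parse.urlsplit function. The helper name uri_parts and the dictionary keys are assumptions chosen for this example.

```python
from urllib.parse import urlsplit

def uri_parts(uri):
    """Split a URI into scheme, user name, password, host, port and path.

    Parts that are absent from the URI are reported as None.
    """
    p = urlsplit(uri)
    return {
        "scheme": p.scheme,
        "username": p.username,
        "password": p.password,
        "host": p.hostname,
        "port": p.port,
        "path": p.path,
    }

parts = uri_parts("http://www.gde-to.tut:80/~brewery")
print(parts["scheme"], parts["host"], parts["port"], parts["path"])
# http www.gde-to.tut 80 /~brewery
```

When the port is omitted, urlsplit reports None and the client falls back to the default port of the scheme (80 for http).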


Understanding HTTP

Direct connection: User Agent -> HTTP request -> Web server.
Connection through intermediate systems: there is no end-to-end TCP connection between the User Agent and the origin server.
Proxies save resources by caching.


3 types of intermediate systems

• Traditional HTTP proxy (firewall + proxy, different versions of HTTP)
• Gateway, which stands in for the origin server (non-HTTP request following auth. HTTP request)
• Tunnel (performs no operations on HTTP requests and responses)


Structure of HTTP transactions

Like most network protocols, HTTP uses the client-server model:
1. An HTTP client opens a connection and sends a request message to an HTTP server.
2. The server then returns a response message with a status code, usually containing the resource that was requested.


Format of request or response

Both kinds of messages (request and response) consist of:
• an initial line,
• zero or more header lines,
• a blank line (i.e. a CRLF by itself), and
• an optional message body (e.g. a file, or query data, or query output).

Put another way, the format of an HTTP message is:

[initial line, different for request and response]
Header1: value1
Header2: value2
Header3: value3
[blank line here]
[optional message body, e.g. file contents or query data]

Initial lines and headers should end in CRLF (0D 0A).
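The message layout described above (initial line, header lines, blank line, optional body) can be sketched in Python as follows. The function name parse_message is hypothetical, and the sketch makes simplifying assumptions: strict CRLF line endings, and a repeated header simply overwrites the earlier value.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (initial_line, headers, body)."""
    # The first blank line (CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body

message = (b"HTTP/1.0 200 OK\r\n"
           b"Content-Type: text/plain\r\n"
           b"Content-Length: 5\r\n"
           b"\r\n"
           b"hello")
print(parse_message(message))
```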


Request methods

Method    Description
OPTIONS   A request for information about the options available.
GET       A request to retrieve information.
HEAD      Like a GET except that the server's response must not include an entity body; all of the header fields in the response are the same as if the entity body were present. This enables a client to get information about a resource without transferring the entity body.
POST      A request to accept the attached entity as a new subordinate to the identified URL.
PUT, DELETE, etc.


Initial request line

A request line has three parts, separated by spaces:
• a method name,
• the local path of the requested resource,
• and the version of HTTP being used.

A typical request line is:

GET /path/to/file/index.html HTTP/1.0

Notes:
• Method names are always uppercase.
• The path is the part of the URL after the host name, also called the request URI.
• The HTTP version always takes the form "HTTP/x.x", uppercase.
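A minimal Python sketch of splitting a request line into its three parts is given below. The function name parse_request_line and the checks it performs are assumptions made for illustration, not a complete validator.

```python
def parse_request_line(line: str):
    """Split a request line into (method, path, version)."""
    method, path, version = line.split(" ")
    if not method.isupper():
        raise ValueError("method names are always uppercase")
    if not version.startswith("HTTP/"):
        raise ValueError("version must have the form HTTP/x.x")
    return method, path, version

print(parse_request_line("GET /path/to/file/index.html HTTP/1.0"))
# ('GET', '/path/to/file/index.html', 'HTTP/1.0')
```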


Initial response line

The initial response line, called the status line, also has three parts separated by spaces: • the HTTP version, • a response status code that gives the result of the request, and • an English reason phrase describing the status code.

Typical status lines are:

HTTP/1.0 200 OK

or

HTTP/1.0 404 Not Found

Notes: The status code is a three-digit integer, and the first digit identifies the general category of response:
• 1xx indicates an informational message only
• 2xx indicates success of some kind
• 3xx redirects the client to another URL
• 4xx indicates an error on the client's part
• 5xx indicates an error on the server's part
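The mapping from the first digit to a category can be sketched in Python as shown here. The names CATEGORIES and parse_status_line, and the short category labels, are hypothetical and chosen only for this example.

```python
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def parse_status_line(line: str):
    """Split a status line into (version, code, reason, category)."""
    # The reason phrase may itself contain spaces, so split at most twice.
    version, code, reason = line.split(" ", 2)
    code = int(code)
    return version, code, reason, CATEGORIES.get(code // 100, "unknown")

print(parse_status_line("HTTP/1.0 404 Not Found"))
# ('HTTP/1.0', 404, 'Not Found', 'client error')
```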


The most common status codes

The most common status codes are:

200 OK -- the request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.

404 Not Found -- the requested resource doesn't exist.

301 Moved Permanently, 302 Moved Temporarily, 303 See Other (HTTP 1.1 only) -- the resource has moved to another URL (given by the Location: response header), and should be automatically retrieved by the client. This is often used by a CGI script to redirect the browser to an existing file.

500 Internal Server Error -- an unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.


Header lines

One line per header in the form "Header-Name: value", ending with CRLF (RFC 822 format).

HTTP 1.0 defines 16 headers, though none are required. HTTP 1.1 defines 46 headers, and one (Host:) is required in requests.

Host: dfe3300.karelia.ru

From: [email protected]

User-agent: my_software/3.0Gold

Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT

If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular,
• The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.
• The Content-Length: header gives the number of bytes in the body.


Sample HTTP exchange (1)

To retrieve the file from the URL http://www.my_server.com/path/file.html

1. Open a socket to the host www.my_server.com, port 80 (use the default port of 80 because none is specified in the URL).

2. Then, send something like the following through the socket:

GET /path/file.html HTTP/1.1

Host: www.my_server.com

From: me@my_mail.ru

User-Agent: my_soft/3.0

[blank line here]

Sample HTTP exchange (2)

The server should respond with something like the following, sent back through the same socket:

HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

Happy New Millennium!

(more file contents) . . .

After sending the response, the server closes the socket.
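The exchange above can be sketched in Python with a plain TCP socket. This is only an illustration: the function names build_get_request and fetch are hypothetical, and the "Connection: close" header is an assumption added so that the server closes the socket after the response and the read loop terminates.

```python
import socket

def build_get_request(host, path, user_agent="my_soft/3.0"):
    """Build the bytes of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Connection: close",  # assumption: ask the server to close afterwards
        "",                   # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, path, port=80):
    """Send the request and read the raw response until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_get_request(host, path))
        received = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            received.append(data)
    return b"".join(received)

print(build_get_request("www.my_server.com", "/path/file.html").decode("ascii"))
```

Calling fetch requires a reachable server, so only the request builder is exercised here.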


GET and POST (HTTP/1.0)

GET:

GET /path/script.cgi?home=Cosby&favorite+flavor=flies HTTP/1.0

User-Agent: my_soft/1.0

[blank line here]

POST:

POST /path/script.cgi HTTP/1.0

User-Agent: my_soft/1.0

Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies

URL-encoding

HTML form data is usually URL-encoded to package it in a GET or POST submission (RFC 2396).

1. Convert all "unsafe" characters in the names and values to "%xx", where "xx" is the ASCII value of the character, in hex. "Unsafe" characters include =, &, %, +, non-printable characters, and any others you want to encode. For simplicity, you might encode all non-alphanumeric characters.
2. Change all spaces to plusses.
3. String the names and values together with = and &, like
name1=value1&name2=value2&name3=value3
4. This string is your message body for POST submissions, or the query string for GET submissions.

For example, if a form (in an HTML document) has a field called "Number" that's set to "B52", and a field called "Text" that's set to "You & me", the URL-encoded form data would be

Number=B52&Text=You+%26+me

with a length of 26.
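The encoding steps can be checked with a short Python sketch that uses the standard urllib.parse.urlencode and parse_qs functions on the form fields from the example above.

```python
from urllib.parse import urlencode, parse_qs

form = {"Number": "B52", "Text": "You & me"}

# urlencode escapes unsafe characters as %xx and turns spaces into plusses.
body = urlencode(form)
print(body)       # Number=B52&Text=You+%26+me
print(len(body))  # 26, the value for Content-Length in a POST

# parse_qs reverses the encoding on the receiving side.
print(parse_qs(body))
```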


Features of HTTP/1.1

• Superset of HTTP 1.0.
• Improvements:
– Faster response, by allowing multiple transactions to take place over a single persistent connection.
– Faster response and great bandwidth savings, by adding cache support.
– Faster response for dynamically-generated pages, by supporting chunked encoding, which allows a response to be sent before its total length is known.
– Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address (virtual hosts).


HTTP/1.1 clients

To comply with HTTP 1.1, clients must:
• include the Host: header in each request;
• accept responses with chunked data;
• either support persistent connections, or include the "Connection: close" header with each request;
• handle the "100 Continue" response.


Chunked Transfer-encoding

If a server wants to start sending a response before knowing its total length (like with long script output), it might use the simple chunked transfer-encoding, which breaks the complete response into smaller chunks and sends them in series.

A chunked message body contains a series of chunks, followed by a line with "0" (zero), followed by optional footers (just like headers), and a blank line.

Each chunk consists of two parts:
• a line with the size of the chunk data, in hex, possibly followed by a semicolon and extra parameters you can ignore (none are currently standard), and ending with CRLF.
• the data itself, followed by CRLF.


Chunked encoding example

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

1a; ignore-stuff-here
abcdefghijklmnopqrstuvwxyz
10
1234567890abcdef
0
some-footer: some-value
another-footer: another-value
[blank line here]

Note the blank line after the last footer. The length of the text data is 42 bytes (1a + 10, in hex), and the data itself is abcdefghijklmnopqrstuvwxyz1234567890abcdef. The footers should be treated like headers, as if they were at the top of the response.

No chunked encoding:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 42
some-footer: some-value
another-footer: another-value

abcdefghijklmnopqrstuvwxyz1234567890abcdef
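A minimal Python sketch of decoding a chunked body follows. The function name decode_chunked is hypothetical; the sketch assumes the whole body is already in memory and uses strict CRLF line endings, which a production client could not rely on.

```python
def decode_chunked(body: bytes):
    """Decode a chunked message body; return (data, footers)."""
    data = b""
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line is hex, optionally followed by ";" and parameters.
        size_field = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # the "0" line marks the end of the chunks
        data += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    footers = {}
    for line in body[pos:].split(b"\r\n"):
        if not line:
            break  # blank line after the last footer
        name, _, value = line.decode("iso-8859-1").partition(":")
        footers[name.strip().lower()] = value.strip()
    return data, footers

raw = (b"1a; ignore-stuff-here\r\n"
       b"abcdefghijklmnopqrstuvwxyz\r\n"
       b"10\r\n"
       b"1234567890abcdef\r\n"
       b"0\r\n"
       b"some-footer: some-value\r\n"
       b"another-footer: another-value\r\n"
       b"\r\n")
data, footers = decode_chunked(raw)
print(len(data), data)  # 42 b'abcdefghijklmnopqrstuvwxyz1234567890abcdef'
```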


Persistent connections

Problem:

In practice, most Web pages consist of several files on the same server.

In HTTP 1.0 and before, TCP connections are closed after each request and response, so each resource to be retrieved requires its own connection. Opening and closing TCP connections takes a substantial amount of CPU time, bandwidth, and memory.

Solution:

Much can be saved by allowing several requests and responses to be sent through a single persistent connection. Persistent connections are the default in HTTP 1.1, so nothing special is required to use them. Just open a connection and send several requests in series (called pipelining), and read the responses in the same order as the requests were sent. If a client includes the "Connection: close" header in the request, then the connection will be closed after the corresponding response.


The "100 Continue" response

On slow channels, the server might respond with an interim "100 Continue" response. This means the server has received the first part of the request.

HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/plain
Content-Length: 42
some-footer: some-value

abcdefghijklmnopqrstuvwxyz1234567890abcdef

To handle this, a simple HTTP 1.1 client might read one response from the socket; if the status code is 100, discard the first response and read the next one instead.
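The approach just described can be sketched in Python as follows. The function name read_final_status is hypothetical, and the stream is assumed to be a binary file-like object, for example one obtained from socket.makefile("rb"); an in-memory stream stands in for the socket here.

```python
import io

def read_final_status(stream):
    """Read status lines until one that is not "100 Continue" appears.

    Returns the final status line and leaves the stream positioned at
    the header lines of that final response.
    """
    while True:
        status_line = stream.readline().decode("iso-8859-1").rstrip("\r\n")
        code = int(status_line.split(" ", 2)[1])
        if code != 100:
            return status_line
        # Interim response: discard its header lines up to the blank line.
        while stream.readline() not in (b"\r\n", b"\n", b""):
            pass

stream = io.BytesIO(
    b"HTTP/1.1 100 Continue\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 2\r\n"
    b"\r\n"
    b"ok"
)
print(read_final_status(stream))  # HTTP/1.1 200 OK
```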


Web traffic compression

There are several methods of web traffic compression (gzip, deflate, compress, etc.).

The client asks the HTTP server to use one of the supported compression algorithms, and the server may send the requested document in compressed form. Decompression can begin as soon as the first bytes of the HTTP response are received (it is not necessary to receive the whole document).

GET / HTTP/1.1

host: www.google.com

Accept-Encoding: gzip, deflate, compress
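The point that decompression can start before the whole document has arrived can be sketched in Python with the standard gzip and zlib modules. The function name gunzip_stream, the sample data and the 64-byte piece size are assumptions made for this illustration; the pieces stand in for data arriving from the network.

```python
import gzip
import zlib

def gunzip_stream(pieces):
    """Decompress a gzip body piece by piece, yielding data as it becomes available."""
    # 16 + MAX_WBITS tells zlib to expect the gzip container format.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for piece in pieces:
        out = decoder.decompress(piece)
        if out:
            yield out
    tail = decoder.flush()
    if tail:
        yield tail

original = b"<html>" + b"Happy New Millennium! " * 200 + b"</html>"
compressed = gzip.compress(original)

# Simulate the body arriving in small pieces.
pieces = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
restored = b"".join(gunzip_stream(pieces))
print(len(original), len(compressed), restored == original)
```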

If-Modified-Since

To avoid sending resources that don't need to be sent, thus saving bandwidth, HTTP 1.1 defines the If-Modified-Since: and If-Unmodified-Since: request headers. The former says "only send the resource if it has changed since this date"; the latter says the opposite. Clients aren't required to use them, but HTTP 1.1 servers are required to honor requests that do use them. Unfortunately, due to earlier HTTP versions, the date value may be in any of three possible formats (the first is the preferred one):

If-Modified-Since: Fri, 31 Dec 1999 23:59:59 GMT
If-Modified-Since: Friday, 31-Dec-99 23:59:59 GMT
If-Modified-Since: Fri Dec 31 23:59:59 1999
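A server that honors these headers has to accept all three date formats. The following Python sketch tries each format in turn with datetime.strptime; the names HTTP_DATE_FORMATS, parse_http_date and is_modified_since are hypothetical, and the default (English) locale is assumed for day and month names.

```python
from datetime import datetime

HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123 form (preferred)
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850 form
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime() form
)

def parse_http_date(value):
    """Parse an HTTP date in any of the three formats (result is in GMT)."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized HTTP date: {value!r}")

def is_modified_since(last_modified, if_modified_since):
    """True if the resource changed after the date the client sent."""
    return parse_http_date(last_modified) > parse_http_date(if_modified_since)

for text in ("Fri, 31 Dec 1999 23:59:59 GMT",
             "Friday, 31-Dec-99 23:59:59 GMT",
             "Fri Dec 31 23:59:59 1999"):
    print(parse_http_date(text))
```

When is_modified_since returns False, a server would typically answer with "304 Not Modified" and no body.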

Caching documents

• Server side (for example, in case of dynamically generated pages)
• Client side (in local files on hard disk and memory)
• Intermediate HTTP proxies
• Not all transactions can be cached, and a client or server can dictate that a certain transaction may be cached only for a given time limit


Caching in HTTP

SERVER
Includes the Date: and Expires: headers, or the max-age directive (server-specified expiration times and validators), in the HTTP response.

PROXIES and CLIENTS
How do they know when to discard a cached document, or whether to store it at all?

Cache-Control: max-age=0
Cache-Control: no-cache
Cache-Control: must-revalidate
Pragma: no-cache (HTTP/1.0)
…

Caching in HTTP

The Web publisher (programmer) instructs the HTTP server to include the corresponding header in the HTTP response (SERVER -> CLIENT).

The CLIENT will not send a new request to the SERVER until the time the document has been stored (in seconds) exceeds max-age.

Problem of dynamic content of Web-sites.
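The max-age rule can be sketched in Python as below. The function names max_age and is_fresh are hypothetical, and the sketch simplifies real cache behaviour: it treats no-cache as a lifetime of zero seconds and ignores Expires:, must-revalidate and other directives.

```python
def max_age(cache_control: str):
    """Return the allowed storing time in seconds, or None if unspecified."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-cache" in directives:
        return 0  # simplification: never reuse without asking the server
    for directive in directives:
        name, _, value = directive.partition("=")
        if name.strip() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None

def is_fresh(stored_seconds: int, cache_control: str) -> bool:
    """True if a cached copy may be used without a new request to the server."""
    limit = max_age(cache_control)
    return limit is not None and stored_seconds < limit

print(is_fresh(30, "max-age=60"))  # True: reuse the cached copy
print(is_fresh(90, "max-age=60"))  # False: ask the server again
```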


Overall scheme

. . .

. . .

HTML-parser Mouse click on "Vote"

GET /cgi-bin/vote.pl?N=test&S=Vote HTTP/1.1

Host: www.server.ru

User-Agent: Mozilla/IE 6.0

[blank line]

CLIENT HTTP-agent HTTP request to Web-server


Overall scheme (continued)

STDIN (POST)

N=test&S=Vote

OR $ENV (GET) Web-SERVER Common Gateway Interface

#!/usr/local/bin/perl
# Read the URL-encoded form data: from STDIN for POST, from QUERY_STRING for GET.
if ($ENV{'REQUEST_METHOD'} eq "POST") {
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} elsif ($ENV{'REQUEST_METHOD'} eq "GET") {
    $buffer = $ENV{'QUERY_STRING'};
}
. . .

Perl script vote.pl

Reply to Web-server through CGI
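For comparison, the same CGI input handling can be sketched in Python. This is not part of vote.pl; the function name read_form_data is hypothetical, and the environment and input stream are passed in as parameters so the logic can be tried without a web server. In a real CGI script they would be os.environ and sys.stdin.

```python
import io
from urllib.parse import parse_qs

def read_form_data(environ, stdin):
    """Return the URL-encoded form data handed over by the web server."""
    method = environ.get("REQUEST_METHOD", "")
    if method == "POST":
        # POST: the data arrives on standard input, CONTENT_LENGTH bytes long.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    if method == "GET":
        # GET: the data is the query string, passed in the environment.
        return environ.get("QUERY_STRING", "")
    return ""

get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "N=test&S=Vote"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "13"}

print(parse_qs(read_form_data(get_env, io.StringIO(""))))
print(parse_qs(read_form_data(post_env, io.StringIO("N=test&S=Vote"))))
```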


Overall scheme (continued)

Content-type: text/html

Web-SERVER

Thank you . . .

HTML-document to SERVER HTTP agent

HTTP/1.1 200 OK
Cache-control: no-cache
Content-Length: 1354
Content-type: text/html

SERVER's HTTP-agent

. . .

Overall scheme (continued)

. . .

CLIENT HTTP-agent HTML-viewer

Thank you . . .


Secure HTTP (HTTPS)

HTTPS has the same functionality as HTTP, but encrypts the data transferred between client and server (HTTP over TLS, RFC 2818).

HTTPS uses TCP port 443 by default.

When a connection to the secure port is established, the following happens automatically:
• The client authenticates the server using the server's digital certificate.
• The client and server negotiate which cipher suite (set of security algorithms) to use and generate session keys for encrypting and decrypting data.
• The client and server establish a secure encrypted connection.

A separate protocol, Secure HTTP (S-HTTP, RFC 2660), has its own headers in the request/response and may, for example, encapsulate an HTTP request/response (next slide).


Secure HTTP (S-HTTP)

An appropriate HTTP server response would be:

HTTP/1.0 200 OK
Security-Scheme: S-HTTP/1.4
Content-Type: text/html

Congratulations, you've won. Click here to claim your prize

This HTTP response, encapsulated as an S-HTTP message becomes:

Secure * Secure-HTTP/1.4
Content-Type: message/http
Prearranged-Key-Info: des-ecb,697fa820df8a6e53,inband:1
Content-Privacy-Domain: CMS

MIAGCSqGSIb3DQEHBqCAMIACAQAwgAYJKoZIhvcNAQcBMBEGBSsOAwIHBAifqtdy
x6uIMYCCARgvFzJtOZBn773DtmXlx037ck3giqnV0WC0QAx5f+fesAiGaxMqWcir
r9XvT0nT0LgSQ/8tiLCDBEKdyCNgdcJAduy3D0r2sb5sNTT0TyL9uydG3w55vTnW
aPbCPCWLudArI1UHDZbnoJICrVehxG/sYX069M8v6VO8PsJS7//hh1yM+0nekzQ5
l1p0j7uWKu4W0csrlGqhLvEJanj6dQAGSTNCOoH3jzEXGQXntgesk8poFPfHdtj0
5RH4MuJRajDmoEjlrNcnGl/BdHAd2JaCo6uZWGcnGAgVJ/TVfSVSwN5nlCK87tXl
nL7DJwaPRYwxb3mnPKNq7ATiJPf5u162MbwxrddmiE7e3sST7naSN+GS0ateY5X7
AAAAAAAAAAA=