HTTP
Hypertext Transfer Protocol
Hypertext Transfer Protocol
(HTTP)
A communications protocol
Used to transfer or convey information on the World
Wide Web
Original purpose was to provide a way to publish and
retrieve HTML hypertext pages
Development of HTTP was coordinated by
W3C (World Wide Web Consortium)
IETF (Internet Engineering Task Force)
Culminating in the publication of a series of RFCs
Most notably RFC 2616 (June 1999)
Defines HTTP/1.1, the version of HTTP in common
use today
Hypertext Transfer Protocol
(HTTP)
HTTP is a request/response protocol between
clients and servers
Client makes an HTTP request
Referred to as the user agent
A web browser, spider, or other end-user tool
Server responds
Called the origin server
Stores or creates resources such as HTML files and
images
In between the user agent and origin server may
be several intermediaries
proxies, gateways, tunnels, etc.
Hypertext Transfer Protocol
(HTTP)
HTTP does not need to use TCP/IP or its
supporting layers
HTTP:
Can be implemented on top of any other protocol
on the Internet, or on other networks
Only presumes a reliable transport
Any protocol that provides such guarantees can be
used
Hypertext Transfer Protocol
(HTTP)
An HTTP client initiates a request by establishing a
Transmission Control Protocol (TCP) connection to a
particular port on a host
Port 80 by default
An HTTP server listening on that port waits for the client
to send a request message
Upon receiving the request, the server sends back
A status line
E.g. "HTTP/1.1 200 OK"
A message of its own
Body of which is perhaps the requested file, an error
message, or some other information
Hypertext Transfer Protocol
(HTTP)
Resources to be accessed by HTTP are
identified using
Uniform Resource Identifiers (URIs)
Or, more specifically, URLs
Using the http: or https: URI schemes
Request message
The request message consists of the
following:
Request line
E.g. GET /images/logo.gif HTTP/1.1
Requests the file logo.gif from the /images directory
Headers
E.g. Accept-Language: en
An empty line
An optional message body
Request message
The request line and headers must all end
with CRLF
A carriage return followed by a line feed
ASCII Code 13 followed by an ASCII Code 10
An empty line must consist of only CRLF and
no other whitespace
In the HTTP/1.1 protocol, all headers except
Host are optional
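The framing rules above can be sketched in a few lines of Python. This is only an illustration, not a full client: the helper name build_request is made up for this example, and the host www.example.com is borrowed from the sample later in this document.

```python
CRLF = "\r\n"  # carriage return (ASCII 13) followed by line feed (ASCII 10)


def build_request(method, path, host, headers=None, body=b""):
    """Serialize an HTTP/1.1 request message to bytes (hypothetical helper)."""
    # Request line first, then Host, the only header HTTP/1.1 requires.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; one bare CRLF marks the end of the headers.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


request = build_request("GET", "/images/logo.gif", "www.example.com",
                        {"Accept-Language": "en"})
print(request.decode("ascii"))
```

The optional message body, when present, simply follows the empty line.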
HTTP Methods
HTTP defines eight methods
Each indicates the desired action to be
performed on the identified resource
Sometimes referred to as "verbs"
Request Methods
HEAD
Asks for the response identical to the one that would
correspond to a GET request, but without the response body
Useful for retrieving meta-information written in response
headers, without having to transport the entire content
GET
Requests a representation of the specified resource
By far the most common method used on the Web today
Should not be used for operations that cause side-effects
Using it for actions in web applications is a common misuse
See 'safe methods' below
POST
Submits data to be processed (e.g. from an HTML form) to the
identified resource
The data is included in the body of the request
May result in the creation of a new resource or the updates of
existing resources or both
Request methods
PUT
Uploads a representation of the specified resource
DELETE
Deletes the specified resource
TRACE
Echoes back the received request
so a client can see what intermediate servers are adding
or changing in the request
OPTIONS
Returns the HTTP methods that the server supports
Can be used to check the functionality of a web server
CONNECT
Converts the request connection to a transparent TCP/IP
tunnel
Usually to facilitate SSL-encrypted communication
(HTTPS) through an unencrypted HTTP proxy
Request methods
HTTP servers are supposed to
implement at least:
GET and HEAD methods
OPTIONS method, whenever possible
Request methods
Safe methods
Some methods (e.g. HEAD or GET) are defined as safe, which means
they are intended only for information retrieval and should not change the
state of the server
In other words, they should not have side effects
Unsafe methods (such as POST, PUT and DELETE) should be displayed
to the user in a special way
Typically as buttons rather than links
Make the user aware of possible obligations
Such as a button that causes a financial transaction
Despite the required safety of GET requests they can cause changes on
the server
For example, a Web server may use the retrieval through a simple
hyperlink to initiate deletion of a domain database record, thus causing a
change of the server's state as a side-effect of a GET request
This is discouraged, because it can cause problems for Web caching, search
engines and other automated agents, which can make unintended changes on
the server
Another case is that a GET request may cause the server to create a
cache space
Request methods
Idempotent methods and Web Applications
Methods GET, HEAD, PUT and DELETE are defined to be idempotent
The RFC allows a user agent, such as a browser, to assume that any
idempotent request can be retried without informing the user
This is done to improve the user experience when connecting to
unresponsive or heavily-loaded web servers
However, note that the idempotence is not assured by the protocol or
web server
Multiple identical requests should have the same effect as a single request
Methods OPTIONS and TRACE, being safe, are inherently idempotent
It is perfectly possible to write a web application in which (e.g.) a database
insert or update is triggered by a GET request - this would be a very normal
example of what the spec refers to as "a change in server state"
This misuse of GET can combine with the retry behavior above to
produce erroneous transactions
For this reason GET should be avoided for anything transactional and used,
as intended, for document retrieval only
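A minimal sketch of how a user agent might encode the safe and idempotent categories described above. The set and function names are assumptions made for this example. CONNECT is left out of both sets because the text does not classify it.

```python
# Safe methods are meant for retrieval only; idempotent methods may be
# repeated with the same effect as a single request.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def may_retry_silently(method):
    """Return True if the request may be retried without informing the user."""
    return method.upper() in IDEMPOTENT_METHODS


for m in ("GET", "PUT", "DELETE", "POST"):
    print(m, may_retry_silently(m))
```

Note that this check trusts the method name alone. If an application changes state on GET, a silent retry repeats that change, which is exactly the problem described above.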
HTTP versions
HTTP has evolved into multiple, mostly
backwards-compatible protocol versions.
RFC 2145 describes the use of HTTP version
numbers
The client states the version it uses at the beginning of the
request, and the server uses the same or an
earlier version in the response
HTTP versions
HTTP/0.9
Deprecated
Supports only one command, GET, which does not specify the HTTP version
Does not support headers
Since this version does not support POST, the client can't pass much information to the
server
HTTP/1.0 (May 1996)
This is the first protocol revision to specify its version in communications
Still in wide use, especially by proxy servers
HTTP/1.1 (June 1999)
Current version; persistent connections enabled by default and works well with proxies.
Supports request pipelining
Allows multiple requests to be sent at the same time
Allows the server to prepare for the workload and potentially transfer the requested
resources more quickly to the client
HTTP/1.2
The initial 1995 working drafts of "PEP - an Extension Mechanism for HTTP" were prepared by the W3C and submitted to the IETF
The document proposed the Protocol Extension Protocol, abbreviated PEP
PEP was originally intended to become a distinguishing feature of HTTP/1.2
In later PEP working drafts, however, the reference to HTTP/1.2 was removed
The experimental RFC 2774, HTTP Extension Framework, largely subsumed PEP.
It was published in February 2000
Status codes
In HTTP/1.0 and since, the first line of the HTTP
response is called the status line
Includes a
1. Numeric status code (such as "404")
2. Textual reason phrase (such as "Not Found")
The way the user agent handles the response
primarily depends on
the code
the response headers
Custom status codes can be used
If the user agent encounters a code it does not
recognize, it can use the first digit of the code to
determine the general class of the response.
Status codes
Standard reason phrases are only recommendations
Can be replaced with "local equivalents" at the web
developer's discretion
If the status code indicated a problem
The user agent might display the reason phrase to the user
to provide further information about the nature of the problem
The standard also allows the user agent to attempt to
interpret the reason phrase
This might be unwise since the standard explicitly specifies
that
Status codes are machine-readable
Reason phrases are human-readable.
Status Codes
1xx Informational
2xx Success
3xx Redirection
4xx Client Error
5xx Server Error
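As a rough illustration of the status line and the five classes listed above, the following sketch splits a status line and classifies a code by its first digit. The function names are invented for this example. 449 stands in for a code the client might not recognize.

```python
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_status_line(line):
    """Split a status line into (version, numeric code, reason phrase)."""
    version, code, *reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def status_class(code):
    """Classify a status code, known or custom, by its first digit."""
    return STATUS_CLASSES.get(str(code)[0], "Unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason, "->", status_class(code))
print(449, "->", status_class(449))
```

Only the numeric code drives the classification. The reason phrase is carried along for display.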
1xx Informational
Request received, continuing process.
This class of status code indicates a
provisional response
Consists only of the Status-Line and optional
headers
Terminated by an empty line
Since HTTP/1.0 did not define any 1xx status
codes, servers MUST NOT send a 1xx
response to an HTTP/1.0 client except under
experimental conditions
1xx Informational
100 Continue
The server has received the request headers
The client should proceed to send the request body
in the case of a request for which a body needs to be sent
for example, a POST request
If the request body is large, sending it to a server when a
request has already been rejected based upon inappropriate
headers is inefficient
To have a server check if the request could be accepted based
on the request's headers alone, a client must
Send Expect: 100-continue as a header in its initial request
(see RFC 2616 §14.20: Expect header)
Check if a 100 Continue status code is received in response before
continuing
or
Receive 417 Expectation Failed and not continue
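The client side of this exchange could be sketched as follows. Both helper names are hypothetical, and real clients also need a timeout in case the server never answers, which this sketch leaves out.

```python
def expect_continue_headers(host, content_length):
    """Headers for an initial request that asks the server to vet it first."""
    return {
        "Host": host,
        "Content-Length": str(content_length),
        "Expect": "100-continue",
    }


def should_send_body(interim_status):
    """Decide what to do once the server has answered the headers."""
    if interim_status == 100:   # 100 Continue: go ahead and send the body
        return True
    if interim_status == 417:   # 417 Expectation Failed: do not continue
        return False
    # Any other status is treated here as a final answer to the request.
    return False


headers = expect_continue_headers("www.example.com", 1048576)
print(headers["Expect"], should_send_body(100), should_send_body(417))
```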
101 Switching Protocols
102 Processing (WebDAV)
2xx Success
The action was successfully received,
understood, and accepted
This class of status code indicates that the
client's request was successfully received,
understood, and accepted
2xx Success
200 OK
Standard response for successful HTTP requests.
201 Created
Request has been fulfilled and resulted in a new resource being created
202 Accepted
Request has been accepted for processing
Request might or might not eventually be acted upon
The processing has not been completed
It might be disallowed when processing actually takes place
203 Non-Authoritative Information (since HTTP/1.1)
204 No Content
205 Reset Content
206 Partial Content
Notice that a file has been partially downloaded.
Used by tools like wget to enable resuming of interrupted downloads, or split a
download into multiple simultaneous streams.
207 Multi-Status (WebDAV)
The message body that follows is an XML message and can contain a number of
separate response codes, depending on how many sub-requests were made.
3xx Redirection
The client must take additional action to complete the
request
This class of status code indicates that further action
needs to be taken by the user agent in order to fulfill
the request
The action required MAY be carried out by the user
agent without interaction with the user if and only if the
method used in the second request is GET or HEAD
A user agent SHOULD NOT automatically redirect a
request more than 5 times, since such redirections
usually indicate an infinite loop
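The two rules above (automatic redirection only for GET or HEAD, and no more than 5 hops) might be combined in a user agent roughly like this. The function name and the constant are assumptions made for illustration.

```python
MAX_REDIRECTS = 5  # limit suggested in the text above


def may_follow_automatically(status, next_method, redirects_so_far):
    """Return True if a redirect may be followed without asking the user."""
    if not 300 <= status <= 399:
        return False                      # not a redirection response
    if redirects_so_far >= MAX_REDIRECTS:
        return False                      # probably an infinite loop
    return next_method.upper() in ("GET", "HEAD")


print(may_follow_automatically(301, "GET", 0))   # allowed
print(may_follow_automatically(307, "POST", 0))  # needs user confirmation
print(may_follow_automatically(302, "GET", 5))   # too many hops
```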
3xx Redirection
300 Multiple Choices
Indicates multiple options for the URI that the client may follow.
Can be used to present different format options for video, list files with different extensions, or word sense disambiguation.
301 Moved Permanently
This and all future requests should be directed to the given URI.
302 Found
Most popular redirect code, but also an example of industrial practice contradicting the standard.
HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was
"Moved Temporarily"), but popular browsers implemented it as a 303 See Other.
Therefore, HTTP/1.1 added status codes 303 and 307 to disambiguate between the two behaviors.
However, the majority of Web applications and frameworks still use the 302 status code as if it were the 303.
303 See Other (since HTTP/1.1)
The response to the request can be found under another URI using a GET method.
304 Not Modified
Indicates the request URL has not been modified since last requested.
Typically, the HTTP client provides a header like the If-Modified-Since header to provide a time with which to compare
Utilizing this saves bandwidth and reprocessing on both the server and client.
305 Use Proxy (since HTTP/1.1)
Many HTTP clients (such as Mozilla and Internet Explorer) don't correctly handle responses with this status code,
primarily for security reasons
306 Switch Proxy
No longer used.
307 Temporary Redirect (since HTTP/1.1)
In this case, the request should be repeated with another URI, but future requests can still be directed to the original
URI.
In contrast to 303, the request method should not be changed when reissuing the original request.
For instance, a POST request must be repeated using another POST request
4xx Client Error
The request contains bad syntax or cannot be fulfilled
The 4xx class of status code is intended for cases in which
the client seems to have erred
Except when responding to a HEAD request, the server
SHOULD include an entity containing an explanation of
the error situation, and whether it is a temporary or
permanent condition
These status codes are applicable to any request method
User agents SHOULD display any included entity to the
user
4xx Client Error
400 Bad Request
The request contains bad syntax or cannot be fulfilled.
401 Unauthorized
Similar to 403 Forbidden, specifically for use when authentication is possible but has
failed or not yet been provided
402 Payment Required
Original intention was that this code might be used as part of some form of digital cash or
micropayment scheme
Has not happened, and this code has never been used
403 Forbidden
Request was a legal request, but the server is refusing to respond to it
Unlike a 401 Unauthorized response, authenticating will make no difference
404 Not Found
405 Method Not Allowed
Request made to a URL using a request method not supported by that URL
Using GET on a form which requires data to be presented via POST
Using PUT on a read-only resource
406 Not Acceptable
407 Proxy Authentication Required
408 Request Timeout
409 Conflict
4xx Client Error
410 Gone
Indicates that the resource requested is no longer
available and will not be available again
Should be used when a resource has been
intentionally removed
In practice, a 404 Not Found is often issued instead
411 Length Required
412 Precondition Failed
413 Request Entity Too Large
414 Request-URI Too Long
415 Unsupported Media Type
416 Requested Range Not Satisfiable
417 Expectation Failed
4xx Client Error
422 Unprocessable Entity (WebDAV)
Request was well-formed but was unable to be followed due to
semantic errors
423 Locked (WebDAV)
The resource that is being accessed is locked
424 Failed Dependency (WebDAV)
The request failed due to failure of a previous request (e.g. a
PROPPATCH).
425 Unordered Collection
Defined in drafts of WebDav Advanced Collections
Not present in "Web Distributed Authoring and Versioning (WebDAV)
Ordered Collections Protocol"
426 Upgrade Required
The client should switch to TLS/1.0.
449 Retry With
A Microsoft extension: The request should be retried after doing the
appropriate action.
5xx Server Error
The server failed to fulfill an apparently valid request
Response status codes beginning with the digit "5" indicate
cases in which the server is aware that it has erred or is
incapable of performing the request
Except when responding to a HEAD request, the server
SHOULD include an entity containing an explanation of
the error situation, and whether it is a temporary or
permanent condition
User agents SHOULD display any included entity to the
user
These response codes are applicable to any request method
5xx Server Error
500 Internal Server Error
501 Not Implemented
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout
505 HTTP Version Not Supported
506 Variant Also Negotiates
507 Insufficient Storage (WebDAV)
509 Bandwidth Limit Exceeded
Not an official HTTP status code
Still used by many servers
510 Not Extended (RFC 2774)
Persistent connections
In HTTP/0.9 and 1.0, the connection is closed after a single
request/response pair.
In HTTP/1.1 a keep-alive-mechanism was introduced, where a
connection could be reused for more than one request.
Such persistent connections reduce lag perceptibly, because
the client does not need to re-negotiate the TCP connection
after the first request has been sent.
Version 1.1 of the protocol also introduced:
Chunked transfer encoding to allow content on persistent
connections to be streamed, rather than buffered
HTTP pipelining, which allows clients to send some types of
requests before the previous response has been received,
further reducing lag
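To make chunked transfer encoding more concrete, here is a small sketch of its framing: each chunk is sent as its size in hexadecimal, a CRLF, the data, and another CRLF, and a chunk of size 0 ends the body. The function names are invented for this example, and the decoder ignores trailers.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings using chunked transfer encoding."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would signal the end of the body
            out += f"{len(chunk):X}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk of size 0, then the final empty line


def decode_chunked(data):
    """Reassemble a chunked body (skips chunk extensions, ignores trailers)."""
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)
print(decode_chunked(wire))
```

Because each chunk carries its own length, the sender does not need to know the total size up front, which is what allows content to be streamed over a persistent connection.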
HTTP session state
HTTP can occasionally pose problems for Web developers and
applications since HTTP is stateless
The advantage of a stateless protocol is that hosts do not
need to retain information about users between requests
This forces the use of alternative methods for maintaining
users' state
E.g. when a host would like to customize content for a user who
has visited before
One common method for solving this problem involves
sending and requesting cookies
Other methods include
Server side sessions
Hidden variables
When current page is a form
URL encoded parameters
Such as /index.php?userid=3
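Two of the approaches above, cookies and URL-encoded parameters, can be sketched with Python's standard library. The cookie name userid mirrors the /index.php?userid=3 example. Everything else here is an assumption for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Server side: remember the user by asking the client to store a cookie.
outgoing = SimpleCookie()
outgoing["userid"] = "3"
set_cookie_header = outgoing.output(header="Set-Cookie:")

# Later request: the client sends the cookie back in a Cookie header.
incoming = SimpleCookie()
incoming.load("userid=3")
userid_from_cookie = incoming["userid"].value

# Alternative: carry the same state in a URL-encoded parameter.
query = urlsplit("/index.php?userid=3").query
userid_from_url = parse_qs(query)["userid"][0]

print(set_cookie_header)
print(userid_from_cookie, userid_from_url)
```

Either way the state travels with each request, so the server itself can remain stateless between requests.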
Secure HTTP
There are currently two methods of
establishing a secure HTTP connection:
The https URI scheme
The HTTP 1.1 Upgrade header
Introduced by RFC 2817
Browser support for the Upgrade header is
nearly non-existent
The https URI scheme is still the dominant
method of establishing a secure HTTP
connection
Secure HTTP
https URI scheme
A URI scheme syntactically identical to the http:
scheme used for normal HTTP connections
Signals the browser to use an added encryption
layer of SSL/TLS to protect the traffic
SSL – Secure Sockets Layer
TLS – Transport Layer Security
SSL is especially suited for HTTP since it can
provide some protection even if only one side of
the communication is authenticated
In the case of HTTP transactions over the
Internet, typically, only the server side is
authenticated
Secure HTTP
HTTP 1.1 Upgrade header
HTTP 1.1 introduced support for the Upgrade header.
In the exchange
The client begins by making a clear-text request, which is later
upgraded to TLS
Either the client or the server may request (or demand) that the
connection be upgraded
The most common usage is a clear-text request by the client followed
by a server demand to upgrade the connection, which looks like this:
Client:
GET /encrypted-area HTTP/1.1
Host: www.example.com
Server:
HTTP/1.1 426 Upgrade Required
Upgrade: TLS/1.0, HTTP/1.1
Connection: Upgrade
The server returns a 426 status-code because 400 level codes
indicate a client failure
Correctly alerts legacy clients that the failure was client-related
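A client could recognize the server's demand in the exchange above with a check along these lines. This is a hedged sketch: the function name is made up, and the headers are assumed to be already parsed into a dictionary.

```python
def requested_upgrade(status, headers):
    """Return the protocols a 426 response asks for, or [] if none."""
    connection = headers.get("Connection", "").lower()
    if status != 426 or "upgrade" not in connection:
        return []
    return [p.strip() for p in headers.get("Upgrade", "").split(",") if p.strip()]


protocols = requested_upgrade(
    426, {"Upgrade": "TLS/1.0, HTTP/1.1", "Connection": "Upgrade"}
)
print(protocols)
```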
Secure HTTP
Benefits of using this method for establishing a secure
connection are:
Removes messy and problematic redirection and URL
rewriting on the server side
Allows virtual hosting (single IP, multiple domain-names)
of secured websites
Reduces user confusion by providing a single way to
access a particular resource
A weakness with this method is:
Requirement for secure HTTP cannot be specified in the
URI
In practice, the (untrusted) server will thus be
responsible for enabling secure HTTP, not the (trusted)
client
Sample
Following is a sample conversation between
an HTTP client and an HTTP server running
on www.example.com, port 80
Sample Client Request
GET /index.html HTTP/1.1
Host: www.example.com
Client request
Followed by a blank line
Request ends with a double newline
In the form of a carriage return followed by a line
feed
The "Host" header
Distinguishes between various DNS names
sharing a single IP address
Allows name-based virtual hosting.
Optional in HTTP/1.0, mandatory in HTTP/1.1
Sample Server Response
HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Apache/1.3.27 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Accept-Ranges: bytes
Content-Length: 438
Connection: close
Content-Type: text/html; charset=UTF-8
Server response
Followed by a blank line and text of the requested page
ETag (entity tag) header is used to determine if a cached version of the requested resource is identical to the current
version on the server.
Content-Type specifies the Internet media type of the data conveyed by the HTTP message
Content-Length indicates its length in bytes.
Accept-Ranges: bytes
The webserver publishes its ability to respond to requests for certain byte ranges of the document by
setting this header
This is useful if the connection was interrupted before the data was completely transferred to the client
Connection: close
States that the webserver will close the TCP connection immediately after the transfer of this
response.
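As an illustration of how a client might read the headers discussed above, the sketch below parses a shortened copy of the sample response. The helper name parse_head is an assumption, and folded (multi-line) headers are not handled.

```python
SAMPLE_HEAD = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 23 May 2005 22:38:34 GMT\r\n"
    "Accept-Ranges: bytes\r\n"
    "Content-Length: 438\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
)


def parse_head(text):
    """Return (status_line, headers) with header names lower-cased."""
    head = text.split("\r\n\r\n", 1)[0]      # everything before the blank line
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


status_line, headers = parse_head(SAMPLE_HEAD)
body_length = int(headers["content-length"])
closes_connection = headers.get("connection", "").lower() == "close"
supports_ranges = headers.get("accept-ranges") == "bytes"
print(status_line, body_length, closes_connection, supports_ranges)
```

Header names are lower-cased because HTTP header names are case-insensitive. Splitting at the first colon keeps the colons inside the Date value intact.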