Transcript WEB - Gadjah Mada University
HTTP WEB Risanuri Hidayat, Ir., M.Sc.
World Wide Web
T. Berners Lee, R. Fielding, H. Frystyk: “Hypertext Transfer Protocol HTTP/1.0”, RFC 1945, 1996.
• Naming scheme for resources: URL, URN, URI
• Multimedia documents: MIME encoding (RFC)
• Transfer protocol: HTTP/1.0, HTTP/1.1
• Implemented over TCP/IP
• Integrated with the Internet infrastructure: DNS, SMTP
History
• Hypertext systems: no network access protocol
• Gopher, WAIS: no hyperlinks
• WWW @ CERN (Tim Berners-Lee, 1990)
• HTTP/0.9 (1992)
Internet Applications
Application              Application layer protocol          Underlying transport protocol
e-mail                   smtp [RFC 821]                      TCP
remote terminal access   telnet [RFC 854]                    TCP
Web                      http [RFC 2068]                     TCP
file transfer            ftp [RFC 959]                       TCP
streaming multimedia     proprietary (e.g., RealNetworks)    TCP or UDP
remote file server       NFS                                 TCP or UDP
Internet telephony       proprietary (e.g., Vocaltec)        typically UDP
What is HTTP
HTTP stands for Hypertext Transfer Protocol. It is the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they are HTML files, image files, query results, or anything else. Usually, HTTP takes place through TCP/IP sockets (other possibilities are ignored here).

A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.

HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL.
HTTP
Request: method, URL or pathname, HTTP version, headers, message body
Example: GET //www.dcs.qmw.ac.uk/index.html HTTP/1.1
Reply: HTTP version, status code, reason, headers, message body
Example: HTTP/1.1 200 OK, followed by the headers and the resource data as the message body
• Resource := MIME-encoded data
• Content negotiation
• Authentication
Methods:
• GET, HEAD, POST
• PUT, DELETE, TRACE, OPTIONS, CONNECT
URL
[Figure: resolving a URL to a file on a Web server]
URL: http://www.cdk3.net:8888/WebExamples/earth.html
DNS lookup -> Resource ID (IP number, port number, pathname): 55.55.55.55, 8888, WebExamples/earth.html
Network address: 2:60:8c:2:b0:5a
Socket -> Web server -> file
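The first step in the figure, splitting a URL into the host name that DNS resolves, the port number and the pathname, can be sketched with Python's standard `urllib.parse` module. The helper name `resource_id` is hypothetical; the fallback to port 80 reflects HTTP's default port. The DNS lookup itself (for example `socket.gethostbyname(host)`) is left out so the sketch runs offline.

```python
from urllib.parse import urlsplit


def resource_id(url: str) -> tuple[str, int, str]:
    """Split an http:// URL into (host, port, pathname).

    The host is what a DNS lookup would turn into an IP number; when the
    URL names no port, HTTP's default port 80 is assumed.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url!r}")
    host = parts.hostname or ""
    port = parts.port or 80
    path = parts.path or "/"
    return host, port, path


print(resource_id("http://www.cdk3.net:8888/WebExamples/earth.html"))
# ('www.cdk3.net', 8888, '/WebExamples/earth.html')
print(resource_id("http://www.somehost.com/path/file.html"))
# ('www.somehost.com', 80, '/path/file.html')
```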
HTTP Transactions
HTTP uses the client-server model: an HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection (the default behavior in HTTP/1.0). HTTP is a stateless protocol: the server maintains no connection information between transactions.
HTTP Protocol
http: hypertext transfer protocol
• WWW’s application layer protocol
• client/server model
• client: browser that requests, receives, “displays” WWW objects
• server: WWW server sends objects in response to requests
• http1.0: RFC 1945
• http1.1: RFC 2068
[Figure: a PC running Explorer and a SUN running Netscape Navigator exchange requests and responses with a server running the Apache Web server]
HTTP Protocol
http: TCP transport service:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• http messages (application-layer protocol messages) exchanged between browser (http client) and WWW server (http server)
• TCP connection closed
http is “stateless”:
• server maintains no information about past client requests
Protocols that maintain “state” are complex!
• past history (state) must be maintained
• if server/client crashes, their views of “state” may be inconsistent and must be reconciled
HTTP Protocol
The formats of the request and response messages are similar and English-oriented. Both kinds of messages consist of:
• an initial line,
• zero or more header lines,
• a blank line (i.e. a CRLF by itself), and
• an optional message body (e.g. a file, or query data, or query output).
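The shared message layout can be sketched as a small Python function. The name `build_message` is hypothetical, and headers are passed as a dict for simplicity, which assumes no header name is repeated.

```python
def build_message(initial_line: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble an HTTP message: initial line, header lines, blank line, body.

    Every line ends with CRLF; the blank line (a CRLF by itself) separates
    the header lines from the optional message body.
    """
    lines = [initial_line]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_message("GET /path/to/file/index.html HTTP/1.0", {"User-Agent": "HTTPTool/1.0"})
print(request)
# b'GET /path/to/file/index.html HTTP/1.0\r\nUser-Agent: HTTPTool/1.0\r\n\r\n'
```

The same function builds a response by passing a status line as `initial_line` and the resource data as `body`.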
Request
Initial Request Line
A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:
GET /path/to/file/index.html HTTP/1.0
GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD (more on those later). Method names are always uppercase. The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general). The HTTP version always takes the form "HTTP/x.x", uppercase.
HTTP Request Header Format
Two types of messages: request, response
http request message: ASCII (human-readable format)
• request line (GET, POST, HEAD commands)
• header lines
GET /somedir/page.html HTTP/1.1
Connection: close
User-agent: Mozilla/4.0
Accept: text/html, image/gif, image/jpeg
Accept-language: en
[blank line]
An extra carriage return + line feed (a blank line) marks the end of the header lines, and therefore the end of a request that has no body.
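Reading such a request on the server side can be sketched in Python: split off the head at the blank line, check the three parts of the request line, then split each header line at its first colon. Both function names are hypothetical; this sketch ignores obsolete header line folding, and a repeated header simply overwrites the earlier value.

```python
import re

# "HTTP/x.x": uppercase HTTP, a slash, then major.minor digits.
VERSION_RE = re.compile(r"HTTP/\d+\.\d+")


def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a request line into (method, request URI, HTTP version)."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 space-separated parts: {line!r}")
    method, uri, version = parts
    if not method.isupper():
        raise ValueError(f"method names are uppercase: {method!r}")
    if not VERSION_RE.fullmatch(version):
        raise ValueError(f"bad HTTP version: {version!r}")
    return method, uri, version


def parse_request_head(raw: str) -> tuple[tuple[str, str, str], dict[str, str]]:
    """Parse everything up to the blank line: request line plus header lines.

    Header names are case-insensitive, so they are stored lowercased.
    """
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return parse_request_line(request_line), headers


raw = (
    "GET /somedir/page.html HTTP/1.1\r\n"
    "Connection: close\r\n"
    "User-agent: Mozilla/4.0\r\n"
    "Accept: text/html, image/gif,image/jpeg\r\n"
    "Accept-language:en\r\n"
    "\r\n"
)
request_line, headers = parse_request_head(raw)
print(request_line)  # ('GET', '/somedir/page.html', 'HTTP/1.1')
print(headers["accept-language"])  # en
```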
Response/Reply
Initial Response Line (Status Line)
The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:
HTTP/1.0 200 OK
HTTP/1.0 404 Not Found
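Because the reason phrase may itself contain spaces ("Not Found"), a parser should split the status line at most twice. A minimal Python sketch, with the hypothetical name `parse_status_line`:

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (HTTP version, status code, reason phrase)."""
    version, code, reason = line.split(" ", 2)  # at most two splits
    if not (code.isdigit() and len(code) == 3):
        raise ValueError(f"status code must be 3 digits: {code!r}")
    return version, int(code), reason


print(parse_status_line("HTTP/1.0 404 Not Found"))  # ('HTTP/1.0', 404, 'Not Found')
```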
HTTP Reply Header Format
status line (protocol, status code, status phrase)
header lines
HTTP/1.1 200 OK
Connection: close
Date: Thu, 06 Aug 1998 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 …...
Content-Length: 6821
Content-Type: text/html
[blank line]
data data data data data ...
(data, e.g., the requested html file)
HTTP Reply Status Code
• 200 OK: request succeeded, requested object later in this message
• 301 Moved Permanently: requested object moved, new location specified later in this message (Location:)
• 400 Bad Request: request message not understood by server
• 404 Not Found: requested document not found on this server
• 505 HTTP Version Not Supported
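Python's standard library already carries the standard code-to-phrase table in `http.HTTPStatus`. The sketch below looks up the codes from the list and adds the response class given by the first digit; the helper `describe` and the `CLASSES` table are illustrative, not part of any library.

```python
from http import HTTPStatus

# Response classes given by the first digit of an HTTP/1.1 status code.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def describe(code: int) -> str:
    """Return 'code phrase (class)' using the standard HTTPStatus table."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 400, 404, 505):
    print(describe(code))
# 200 OK (success)
# 301 Moved Permanently (redirection)
# 400 Bad Request (client error)
# 404 Not Found (client error)
# 505 HTTP Version Not Supported (server error)
```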
Sample HTTP Exchange
To retrieve the file at the URL http://www.somehost.com/path/file.html, first open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:
GET /path/file.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
[blank line here]
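The client side of this exchange (open a socket to port 80, send the request, then read until the server closes the connection) can be sketched with Python's `socket` module. The function name `http_get` is hypothetical; the `From:` header is omitted, and a `Host:` header is added as an assumption because HTTP/1.0 does not require it but virtual-hosted servers expect it.

```python
import socket


def http_get(host: str, path: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Send an HTTP/1.0 GET and return the raw response bytes.

    With HTTP/1.0 the server closes the connection after the response,
    so the client simply reads until recv() returns b"" (end of stream).
    """
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"            # assumption: not required by HTTP/1.0
        "User-Agent: HTTPTool/1.0\r\n"
        "\r\n"                         # blank line ends the request head
    )
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:              # server closed the socket
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `http_get("www.somehost.com", "/path/file.html")` would return the status line, headers, blank line and file contents as a single bytes object.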
Sample HTTP Exchange
The server should respond with something like the following, sent back through the same socket:
HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354
[blank line here]
Happy New Millennium!
(more file contents)
. . .
After sending the response, the server closes the socket.
User-server interaction: authentication
Authentication goal: control access to server documents
• stateless: the client must present authorization in each request
• authorization: typically name, password
• the client sends an Authorization: header line in the request
• if no authorization is presented, the server refuses access and sends a WWW-Authenticate: header line in the response
[Figure: client/server timeline]
client -> server: usual http request msg
server -> client: 401: authorization req. + WWW-Authenticate:
client -> server: usual http request msg + Authorization: line
server -> client: usual http response msg
client -> server: usual http request msg + Authorization: line
server -> client: usual http response msg
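The slide does not name an authentication scheme. As one assumed example, the HTTP Basic scheme carries "name, password" in the Authorization header as base64("name:password"). The helper names and the in-memory `users` table below are hypothetical; a real server would store password hashes rather than plain passwords.

```python
import base64
from typing import Dict, Optional


def basic_authorization(name: str, password: str) -> str:
    """Client side: build an Authorization header value for HTTP Basic.

    base64 is an encoding, not encryption: anyone who sees the request
    can recover the password.
    """
    token = base64.b64encode(f"{name}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def check_authorization(header: Optional[str], users: Dict[str, str]) -> bool:
    """Server side: True only if the header names a known user and password."""
    if not header or not header.startswith("Basic "):
        return False  # the server would answer 401 with WWW-Authenticate
    decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    name, _, password = decoded.partition(":")
    return users.get(name) == password


print(basic_authorization("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Because HTTP is stateless, the client repeats this header on every request, as in the timeline above.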
User-server interaction: cookies
• Server sends a “cookie” to the client in a response: Set-Cookie: #
• Client presents the cookie in later requests: Cookie: #
• Server matches the presented cookie with server-stored cookies, used for authentication and for remembering user preferences and previous choices
[Figure: client/server timeline]
client -> server: usual http request msg
server -> client: usual http response + Set-Cookie: #
client -> server: usual http request msg + Cookie: # (server: cookie-specific action)
server -> client: usual http response msg
client -> server: usual http request msg + Cookie: # (server: cookie-specific action)
server -> client: usual http response msg
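The server's side of this interaction can be sketched with Python's `http.cookies.SimpleCookie`. The cookie name `id`, the random ID format and the in-memory `sessions` store are all hypothetical choices for illustration.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

# Hypothetical server-side store: cookie value -> remembered user state.
sessions: Dict[str, dict] = {}


def issue_cookie() -> str:
    """First response: create an ID, remember it, return a Set-Cookie line."""
    cookie_id = secrets.token_hex(8)
    sessions[cookie_id] = {"visits": 0}
    cookie = SimpleCookie()
    cookie["id"] = cookie_id
    return cookie.output()  # "Set-Cookie: id=<16 hex digits>"


def lookup(cookie_header: str) -> Optional[dict]:
    """Later request: match the presented Cookie value with stored cookies."""
    cookie = SimpleCookie(cookie_header)
    if "id" not in cookie:
        return None
    return sessions.get(cookie["id"].value)


set_cookie_line = issue_cookie()
# The client echoes the name=value part back in its Cookie: header.
state = lookup(set_cookie_line.split(": ", 1)[1])
state["visits"] += 1  # cookie-specific action
```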
User-server interaction: conditional GET
Goal: don't send the object if the client has an up-to-date stored (cached) version
• client: specify the date of the cached copy in the http request: If-modified-since: (date)
• server: the response contains no object if the cached copy is up-to-date: HTTP/1.0 304 Not Modified
[Figure: client/server timeline]
client -> server: http request msg + If-modified-since: (date)
server -> client: http response HTTP/1.0 304 Not Modified (object not modified)
client -> server: http request msg + If-modified-since: (date)
server -> client: http response HTTP/1.1 200 OK … (object modified)
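The server's decision can be sketched in Python using the standard `email.utils` date helpers, which read and write the HTTP date format. The function name `conditional_get_status` is hypothetical, the example date is taken from the header samples later in these notes, and malformed If-modified-since values are not handled.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified must be a timezone-aware UTC datetime. HTTP dates have
    one-second resolution, so microseconds are dropped before comparing.
    """
    if if_modified_since:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified.replace(microsecond=0) <= cached_copy_date:
            return 304  # Not Modified: the response carries no object
    return 200  # OK: the full object follows in the response


page_modified = datetime(1999, 6, 3, 20, 16, 34, tzinfo=timezone.utc)
print("Last-Modified:", format_datetime(page_modified, usegmt=True))
# Last-Modified: Thu, 03 Jun 1999 20:16:34 GMT
print(conditional_get_status(page_modified, "Fri, 31 Dec 1999 23:59:59 GMT"))  # 304
```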
Message format: multimedia extensions
MIME: Multipurpose Internet Mail Extensions, RFC 2045, 2046
• additional lines in the message header declare the MIME content type
Example message:
From: [email protected]
To: [email protected]
Subject: Picture of yummy crepe.
MIME-Version: 1.0 (MIME version)
Content-Transfer-Encoding: base64 (method used to encode data)
Content-Type: image/jpeg (multimedia data type, subtype, parameter declaration)
[blank line]
base64 encoded data .....
.........................
......base64 encoded data (encoded data)
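A message like this can be built with Python's standard `email.message.EmailMessage`. The sketch uses a few placeholder bytes instead of a real JPEG file and omits the From/To addresses, which are not recoverable here.

```python
from email.message import EmailMessage

# Placeholder bytes standing in for a real JPEG file (assumption).
jpeg_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16

msg = EmailMessage()
msg["Subject"] = "Picture of yummy crepe."
# For bytes content, set_content() defaults to base64 transfer encoding and
# writes the Content-Type and Content-Transfer-Encoding header lines.
msg.set_content(jpeg_bytes, maintype="image", subtype="jpeg")

print(msg.as_string())
```

`msg.get_content()` decodes the base64 body back into the original bytes, which is the step a mail reader or browser performs before displaying the image.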
MIME types
• Text: example subtypes: plain, html
• Video: example subtypes: mpeg, quicktime
• Image: example subtypes: jpeg, gif
• Audio: example subtypes: basic (8-bit mu-law encoded), 32kadpcm (32 kbps coding)
• Application: other data that must be processed by the reader before it is “viewable”; example subtypes: msword, octet-stream
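A Web server typically chooses the Content-Type type/subtype from the file name. Python's standard `mimetypes` module performs that lookup; the fallback to `application/octet-stream` for unknown data and the helper name `content_type_for` are assumptions of this sketch.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a MIME type from the file extension; unknown data is octet-stream."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


for name in ("earth.html", "notes.txt", "photo.jpg", "logo.gif", "data.unknownext123"):
    print(f"{name:20} -> {content_type_for(name)}")
```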
HTTP Headers (samples)
Client-side:
• User-Agent: Mozilla/4.0
• Accept: text/html, image/*
• Accept-Language, Accept-Charset
Server-side:
• Content-Type: text/html
• Content-Encoding
Expires, Last-Modified, If-Modified-Since: absolute time stamps (1-second resolution), e.g. Thu, 03 Jun 1999 20:16:34 GMT
Mean bytes per header: 300 (requests), 160 (responses), and all of it requires parsing!
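How a server might test a Content-Type against a client's Accept list, including wildcard ranges such as `image/*`, is sketched below. The function `accepts` is hypothetical and deliberately simplified: quality values (`;q=0.5`) and other parameters are ignored.

```python
def accepts(accept_header: str, content_type: str) -> bool:
    """Return True if content_type matches a media range in accept_header.

    Simplified sketch: wildcard ranges such as "image/*" and "*/*" are
    honoured, but quality values and other parameters are ignored.
    """
    ctype, _, csub = content_type.split(";", 1)[0].strip().lower().partition("/")
    for media_range in accept_header.split(","):
        rtype, _, rsub = media_range.split(";", 1)[0].strip().lower().partition("/")
        if rtype == "*" or (rtype == ctype and rsub in ("*", csub)):
            return True
    return False


print(accepts("text/html, image/*", "image/gif"))           # True
print(accepts("text/html, image/*", "application/msword"))  # False
```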
HTTP/1.1 Improvements
• Bandwidth (B/W) optimization
• Persistent connections
• Pipelining: does not block waiting for previous responses
• End-of-message mechanism
• Content-Range: access only a specified “range” of a resource
• Explicit cache control (Cache-Control)
• Digest authentication, Content-MD5 integrity check
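Range access can be sketched from the server side: the client sends a `Range: bytes=first-last` header, and the server answers 206 Partial Content with a `Content-Range` header, or 416 if the range lies outside the resource. The helper `serve_range` is hypothetical and handles only the single `bytes=first-last` form (and `bytes=first-`), not suffix or multiple ranges.

```python
import re
from typing import Dict, Optional, Tuple

# Only the simple "bytes=first-last" / "bytes=first-" form is handled here.
RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")


def serve_range(body: bytes, range_header: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with an optional Range header."""
    match = RANGE_RE.fullmatch(range_header or "")
    if not match:
        return 200, {"Content-Length": str(len(body))}, body  # whole resource
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else len(body) - 1
    last = min(last, len(body) - 1)
    if first > last:
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    part = body[first:last + 1]
    headers = {
        "Content-Range": f"bytes {first}-{last}/{len(body)}",
        "Content-Length": str(len(part)),
    }
    return 206, headers, part  # 206 Partial Content


status, headers, part = serve_range(bytes(range(100)), "bytes=10-19")
print(status, headers["Content-Range"])  # 206 bytes 10-19/100
```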
Web Caches (proxy server)
Goal: satisfy the client request without involving the origin server
• user sets browser: WWW accesses via web cache
• client sends all http requests to web cache
• if the object is at the web cache, the web cache immediately returns the object in an http response
• else the web cache requests the object from the origin server, then returns the http response to the client
[Figure: clients send requests to a proxy server, which contacts the origin servers]
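The proxy's decision rule can be sketched as a small Python class. `fetch_from_origin` is a hypothetical callable standing in for a real HTTP request to the origin server; expiry, Cache-Control and size limits are ignored in this toy version.

```python
from typing import Callable, Dict


class WebCache:
    """Toy proxy cache: serve from memory if possible, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1          # object at web cache: return it immediately
            return self._store[url]
        self.misses += 1            # otherwise request it from the origin server
        obj = self._fetch(url)
        self._store[url] = obj      # keep a copy for later clients
        return obj
```

Counting hits and misses makes it easy to measure the hit ratio discussed on the following slides.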
Why WWW Caching?
Assume: cache is “close” to client (e.g., in same network)
• smaller response time: cache is “closer” to the client
• decreased traffic to distant servers: the link out of the institutional/local ISP network is often the bottleneck
[Figure: origin servers on the public Internet; an institutional network (10 Mbps LAN) with an institutional cache, connected to the public Internet by a 1.5 Mbps access link]
Web caching (in)effectiveness
Observed hit ratios below 50%, and even lower byte-weighted ratios!
Possible remedies?
• Prefetching
• Delta-encoding
• HTML macros
• Duplicate suppression (digest-based)
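The difference between the two ratios can be sketched in Python. The request log below is hypothetical: if small objects tend to hit while large ones miss, the byte-weighted ratio falls well below the request hit ratio. `hit_ratios` is an illustrative helper and assumes a non-empty log.

```python
def hit_ratios(log: list[tuple[bool, int]]) -> tuple[float, float]:
    """Return (request hit ratio, byte-weighted hit ratio) for a cache log.

    Each log entry is (was_hit, response_size_in_bytes).
    """
    total_requests = len(log)
    total_bytes = sum(size for _, size in log)
    hit_requests = sum(1 for hit, _ in log if hit)
    hit_bytes = sum(size for hit, size in log if hit)
    return hit_requests / total_requests, hit_bytes / total_bytes


# Hypothetical log: two small objects hit, one large object misses.
log = [(True, 1_000), (True, 1_000), (False, 8_000)]
print(hit_ratios(log))  # (0.6666666666666666, 0.2)
```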
HTTP status & perspective
J. C. Mogul, “What’s wrong with HTTP (and why it doesn’t matter)”, Proc. USENIX Technical Conference, 1999
• Definitely not optimal
• Probably adequate
• It works well enough
• It’s not the only game in town (two-way initiation of operations, real-time, deferred delivery)
• Revising it again would be too hard (the HTTP/1.0 -> HTTP/1.1 evolution took 4+ years!)