WEB - Gadjah Mada University



HTTP WEB Risanuri Hidayat, Ir., M.Sc.

World Wide Web

T. Berners-Lee, R. Fielding, H. Frystyk: “Hypertext Transfer Protocol -- HTTP/1.0”, RFC 1945, 1996.

• Naming scheme for resources: URL, URN, URI
• Multimedia documents: MIME encoding (RFC)
• Transfer protocol: HTTP/1.0, HTTP/1.1
• Implemented over TCP/IP
• Integrated with Internet infrastructure: DNS, SMTP

History

• Hypertext systems: no network access protocol
• Gopher, WAIS: no hyperlinks
• WWW @ CERN (Tim Berners-Lee, 1990)
• HTTP/0.9 (1992)

Internet Applications

Application               Application layer protocol          Underlying transport protocol
e-mail                    smtp [RFC 821]                      TCP
remote terminal access    telnet [RFC 854]                    TCP
Web                       http [RFC 2068]                     TCP
file transfer             ftp [RFC 959]                       TCP
streaming multimedia      proprietary (e.g. RealNetworks)     TCP or UDP
remote file server        NFS                                 TCP or UDP
Internet telephony        proprietary (e.g., Vocaltec)        typically UDP

What is HTTP

HTTP stands for Hypertext Transfer Protocol. It's the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they're HTML files, image files, query results, or anything else. Usually, HTTP takes place through TCP/IP sockets (and this tutorial ignores other possibilities).

A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.

HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL.

HTTP

Request message (fields: method, URL or pathname, HTTP version, headers, message body):
  GET //www.dcs.qmw.ac.uk/index.html HTTP/1.1

Reply message (fields: HTTP version, status code, reason, headers, message body):
  HTTP/1.1 200 OK, followed by headers and the resource data

• Resource := MIME-encoded data
• Content negotiation
• Authentication
• Methods: GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT

URL

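URL: http://www.cdk3.net:8888/WebExamples/earth.html

A DNS lookup on the host name yields the resource ID (IP number, port number, pathname):
  55.55.55.55, 8888, WebExamples/earth.html

At the Web server (network address 2:60:8c:2:b0:5a), the port number identifies the socket and the pathname identifies the file.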
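The decomposition of a URL into host, port number, and pathname can be sketched in Python with the standard `urllib.parse` module. This is only an illustration: the helper name `split_url` and the `DEFAULT_PORTS` table are assumptions made for this example, and the DNS lookup is mentioned in a comment but not performed.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only plain http URLs are handled.
DEFAULT_PORTS = {"http": 80}

def split_url(url):
    """Return (host, port, pathname) for an http URL."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL names none.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path or "/"
    return parts.hostname, port, path

host, port, path = split_url("http://www.cdk3.net:8888/WebExamples/earth.html")
# A real client would now resolve `host` to an IP number,
# e.g. with socket.gethostbyname(host), before opening a socket.
print(host, port, path)
```

For a URL without an explicit port, such as http://www.somehost.com/path/file.html, the same helper falls back to port 80.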

HTTP Transactions

HTTP uses the client-server model: An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection (making HTTP a stateless protocol, i.e. not maintaining any connection information between transactions).

HTTP Protocol

http: hypertext transfer protocol
• WWW’s application layer protocol
• client/server model
  • client: browser that requests, receives, “displays” WWW objects
  • server: WWW server sends objects in response to requests
• http1.0: RFC 1945
• http1.1: RFC 2068

Figure: a PC running Explorer and a Sun running Netscape Navigator as clients, and a server running the Apache Web server.

HTTP Protocol

http: TCP transport service:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• http messages (application-layer protocol messages) exchanged between browser (http client) and WWW server (http server)
• TCP connection closed

http is “stateless”
• server maintains no information about past client requests

Protocols that maintain “state” are complex!
• past history (state) must be maintained
• if server/client crashes, their views of “state” may be inconsistent, must be reconciled

HTTP Protocol

The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:
• an initial line,
• zero or more header lines,
• a blank line (i.e. a CRLF by itself), and
• an optional message body (e.g. a file, or query data, or query output).
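The four-part layout above can be made concrete with a small Python sketch that splits a raw message at the first blank line. The function name `split_message` and the sample bytes are assumptions chosen for illustration; real parsers also handle details such as repeated or folded headers, which this sketch ignores.

```python
def split_message(raw: bytes):
    """Split an HTTP message into (initial line, headers, body)."""
    # The blank line (a CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body

sample = (b"HTTP/1.0 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello")
print(split_message(sample))
```

The same function works for requests and responses, since both share this layout.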

Request

Initial Request Line

A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:

  GET /path/to/file/index.html HTTP/1.0

• GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD -- more on those later. Method names are always uppercase.
• The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).
• The HTTP version always takes the form "HTTP/x.x", uppercase.

HTTP Request Header Format

Two types of messages: request, response

http request message: ASCII (human-readable format)
• request line (GET, POST, HEAD commands)
• header lines
• extra carriage return, line feed indicates end of message

  GET /somedir/page.html HTTP/1.1
  Connection: close
  User-agent: Mozilla/4.0
  Accept: text/html, image/gif, image/jpeg
  Accept-language: en
  (extra carriage return, line feed)
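As a rough sketch of the request format, the Python below assembles a request message from a request line and header lines, then parses the request line back into its three parts. The function names `build_request` and `parse_request_line` are assumptions for this example. Note that the slide's example omits the Host header, which HTTP/1.1 servers expect in practice.

```python
import re

def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble a request: request line, header lines, then an empty line."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF; the extra CRLF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def parse_request_line(line):
    """Split 'METHOD path HTTP/x.x' into its three space-separated parts."""
    method, path, version = line.split(" ")
    if not method.isupper():
        raise ValueError("method names are uppercase")
    if re.fullmatch(r"HTTP/\d+\.\d+", version) is None:
        raise ValueError("version must look like HTTP/x.x")
    return method, path, version

request = build_request("GET", "/somedir/page.html", {
    "Connection": "close",
    "User-agent": "Mozilla/4.0",
    "Accept": "text/html, image/gif, image/jpeg",
    "Accept-language": "en",
})
first_line = request.decode("ascii").split("\r\n")[0]
print(parse_request_line(first_line))
```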


Response/Reply

Initial Response Line (Status Line).

The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:

  HTTP/1.0 200 OK

or

  HTTP/1.0 404 Not Found
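A status line can be taken apart with a few lines of Python. This is a minimal sketch; the name `parse_status_line` is an assumption for illustration. The split is limited to two cuts because the reason phrase may itself contain spaces.

```python
def parse_status_line(line):
    """Split 'HTTP/x.x CODE reason phrase' into (version, code, reason)."""
    # maxsplit=2 keeps a multi-word reason phrase such as "Not Found" intact.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason

print(parse_status_line("HTTP/1.0 200 OK"))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```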

HTTP Reply Header Format

• status line (protocol, status code, status phrase)
• header lines
• data, e.g., requested html file

  HTTP/1.1 200 OK
  Connection: close
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 …...
  Content-Length: 6821
  Content-Type: text/html

  data data data data data ...

HTTP Reply Status Code

• 200 OK: request succeeded, requested object later in this message
• 301 Moved Permanently: requested object moved, new location specified later in this message (Location:)
• 400 Bad Request: request message not understood by server
• 404 Not Found: requested document not found on this server
• 505 HTTP Version Not Supported
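Python's standard library carries the same code-to-phrase mapping in `http.HTTPStatus`, which gives a quick way to look up the reason phrase for a code. The helper name `reason_phrase` is an assumption for this sketch.

```python
from http import HTTPStatus

def reason_phrase(code):
    """Look up the standard reason phrase for a status code."""
    return HTTPStatus(code).phrase

# The five codes listed on the slide.
for code in (200, 301, 400, 404, 505):
    print(code, reason_phrase(code))
```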

Sample HTTP Exchange

To retrieve the file at the URL http://www.somehost.com/path/file.html first open a socket to the host www.somehost.com, port 80 (use the default port of 80 because none is specified in the URL). Then, send something like the following through the socket:

  GET /path/file.html HTTP/1.0
  From: [email protected]
  User-Agent: HTTPTool/1.0
  [blank line here]

Sample HTTP Exchange

The server should respond with something like the following, sent back through the same socket:

  HTTP/1.0 200 OK
  Date: Fri, 31 Dec 1999 23:59:59 GMT
  Content-Type: text/html
  Content-Length: 1354

  Happy New Millennium!
  (more file contents)
    .
    .
    .

After sending the response, the server closes the socket.
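The exchange can be tried end to end with a sketch like the one below. To keep it self-contained, it does not contact www.somehost.com; instead it starts a throwaway server on the local machine using Python's `http.server` and talks to it over a raw socket. The `Handler` class, the `fetch` helper, and the short `BODY` are assumptions made for this illustration, and the server adds its own Server and Date headers.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"Happy New Millennium!\n"  # stand-in for the file contents

class Handler(BaseHTTPRequestHandler):
    """Tiny HTTP/1.0 server that answers every GET with BODY."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the example quiet

def fetch(host, port, path):
    """Open a socket, send a GET request, read until the server closes."""
    request = (f"GET {path} HTTP/1.0\r\n"
               "User-Agent: HTTPTool/1.0\r\n"
               "\r\n").encode("ascii")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("127.0.0.1", server.server_port, "/path/file.html")
server.shutdown()
server.server_close()
print(response.decode("iso-8859-1"))
```

Because the server closes the socket after responding, the client can simply read until `recv` returns no more data.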

User-server interaction: authentication

Authentication goal: control access to server documents
• stateless: client must present authorization in each request
• authorization: typically name, password
  • Authorization: line in request header
  • if no authorization presented, server refuses access, sends a WWW-Authenticate: header line in response

Exchange between client and server, in time order:
1. client: usual http request msg
2. server: 401: authorization req., with WWW-Authenticate:
3. client: usual http request msg + Authorization: line
4. server: usual http response msg
5. client: usual http request msg + Authorization: line
6. server: usual http response msg
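One common way to carry a name and password is the Basic scheme, where the Authorization value is "Basic" followed by the base64 encoding of `name:password`. The slides do not say which scheme they have in mind, so treating it as Basic is an assumption here, as are the helper names, the realm string, and the sample credentials.

```python
import base64

def basic_credentials(name, password):
    """Build the value of an Authorization header (Basic scheme)."""
    raw = f"{name}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

def check_request(headers, expected):
    """Server side: refuse with 401 unless the expected credentials are sent."""
    if headers.get("Authorization") != expected:
        # The realm name is a hypothetical label for the protected area.
        return 401, {"WWW-Authenticate": 'Basic realm="documents"'}
    return 200, {}

expected = basic_credentials("alice", "secret")  # hypothetical user
print(check_request({}, expected))                           # first try: refused
print(check_request({"Authorization": expected}, expected))  # retry: accepted
```

Because the server keeps no state, the client has to repeat the Authorization line on every later request. Base64 is an encoding, not encryption, so this scheme by itself does not hide the password.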

User-server interaction: cookies

• Server sends “cookie” to client in response: Set-cookie: #
• Client presents cookie in later requests: cookie: #
• Server matches presented cookie with server-stored cookies
  • authentication
  • remembering user preferences, previous choices

Exchange between client and server:
1. client: usual http request msg
2. server: usual http response + Set-cookie: #
3. client: usual http request msg with cookie: #
4. server: cookie-specific action, then usual http response msg
5. client: usual http request msg with cookie: #
6. server: cookie-specific action, then usual http response msg

User-server interaction: conditional GET

Goal: don't send the object if the client has an up-to-date stored (cached) version
• client: specify date of cached copy in http request: If-modified-since:
• server: response contains no object if cached copy up-to-date: HTTP/1.0 304 Not Modified

Exchange when the object has not been modified:
1. client: http request msg with If-modified-since:
2. server: http response HTTP/1.0 304 Not Modified

Exchange when the object has been modified:
1. client: http request msg with If-modified-since:
2. server: http response HTTP/1.1 200 OK …
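The server's decision can be sketched as a single comparison between the object's modification time and the date the client presents. The function name `conditional_get` and the timestamps below are assumptions for illustration; the date parsing relies on `email.utils`, which understands the HTTP date format.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(request_headers, last_modified, body):
    """Return (status line, body) for a possibly conditional GET."""
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # Cached copy is still current: send no object.
        return "HTTP/1.0 304 Not Modified", b""
    return "HTTP/1.0 200 OK", body

# Hypothetical timestamps: when the object changed, and when it was cached.
modified = datetime(1998, 6, 22, 9, 0, 0, tzinfo=timezone.utc)
cached_at = format_datetime(
    datetime(1998, 8, 6, 12, 0, 15, tzinfo=timezone.utc), usegmt=True)

print(conditional_get({"If-Modified-Since": cached_at}, modified, b"<html>"))
print(conditional_get({}, modified, b"<html>"))
```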

Message format: multimedia extensions

MIME: Multipurpose Internet Mail Extensions, RFC 2045, 2046
• additional lines in msg header declare MIME content type

Example message, with the role of each MIME line in parentheses:

  From: [email protected]
  To: [email protected]
  Subject: Picture of yummy crepe.
  MIME-Version: 1.0                      (MIME version)
  Content-Transfer-Encoding: base64      (method used to encode data)
  Content-Type: image/jpeg               (multimedia data type, subtype, parameter declaration)

  base64 encoded data .....              (encoded data)
  .........................
  ......base64 encoded data .
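A message with these MIME headers can be produced with Python's `email.message.EmailMessage`, which fills in MIME-Version, Content-Type, and Content-Transfer-Encoding when binary content is attached. The addresses and the image bytes below are placeholders invented for this sketch, not values from the slide.

```python
from email.message import EmailMessage

# Placeholder bytes standing in for a real JPEG file.
image_bytes = b"\xff\xd8\xff\xe0 placeholder, not a real picture"

msg = EmailMessage()
msg["From"] = "alice@example.com"   # hypothetical sender
msg["To"] = "bob@example.com"       # hypothetical recipient
msg["Subject"] = "Picture of yummy crepe."
# Binary content is base64-encoded by default and labelled with its type.
msg.set_content(image_bytes, maintype="image", subtype="jpeg")

print(msg.as_string())
```

Calling `msg.get_content()` on the receiving side decodes the base64 data back into the original bytes.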

MIME types

• Text: example subtypes: plain, html
• Video: example subtypes: mpeg, quicktime
• Image: example subtypes: jpeg, gif
• Audio: example subtypes: basic (8-bit mu-law encoded), 32kadpcm (32 kbps coding)
• Application: other data that must be processed by reader before “viewable”; example subtypes: msword, octet-stream

HTTP Headers (samples)

• User-Agent: Mozilla/4.0
• Accept: (client-side): text/html, image/*
• Content-type: (server-side): text/html
• Expires, Last-Modified, If-Modified-Since: absolute time stamps (1-sec resolution), e.g. Thu, 03 Jun 1999 20:16:34 GMT
• Accept-Language, Accept-Charset
• Content-encoding

Mean number of bytes per header: 300 (requests), 160 (responses). Headers require parsing!

HTTP/1.1 Improvements

• Bandwidth optimization
  • persistent connections
  • pipelining: does not block waiting for previous responses
  • end-of-message mechanism
  • Content-range: access only a specified “range” of a resource
• Explicit cache control (Cache-control)
• Digest authentication (Content-MD5)

Web Caches (proxy server)

Goal: satisfy client request without involving origin server
• User sets browser: WWW accesses via web cache
• client sends all http requests to web cache
  • if object at web cache, web cache immediately returns object in http response
  • else requests object from origin server, then returns http response to client

Figure: clients, a proxy server, and origin servers.
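The hit-or-miss logic of a web cache can be sketched as a dictionary in front of a fetch function. The `WebCache` class, the `fake_origin` function, and the hit and miss counters are assumptions for this example; a real proxy would also deal with expiry and revalidation, which are left out here.

```python
class WebCache:
    """Minimal proxy-cache sketch: serve from the store, else ask the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Object at web cache: return it without involving the origin.
            self.hits += 1
            return self._store[url]
        # Otherwise request it from the origin server and keep a copy.
        self.misses += 1
        response = self._fetch(url)
        self._store[url] = response
        return response

def fake_origin(url):
    """Stand-in for a request to the origin server."""
    return f"object at {url}".encode("ascii")

cache = WebCache(fake_origin)
cache.get("http://www.somehost.com/path/file.html")  # miss, goes to origin
cache.get("http://www.somehost.com/path/file.html")  # hit, served from cache
print(cache.hits, cache.misses)
```

The ratio of hits to total requests is the hit ratio that determines how much the cache helps.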

Why WWW Caching?

Assume: cache is “close” to client (e.g., in same network)
• smaller response time: cache “closer” to client
• decrease traffic to distant servers
  • link out of institutional/local ISP network often bottleneck

Figure: an institutional network (10 Mbps LAN, institutional cache) connected by a 1.5 Mbps access link to the public Internet and the origin servers.

Web caching (in)effectiveness

Observed hit ratios below 50%
• even lower byte-weighted ratios!

Possible remedies?
• Prefetching
• Delta-encoding
• HTML macros
• Duplicate suppression (digest-based)

HTTP status & perspective

J. C. Mogul, “What’s wrong with HTTP (and why it doesn’t matter)”, Proc. USENIX Technical Conference, 1999
• Definitely not optimal
• Probably adequate
  • It works well enough
  • It’s not the only game in town
    • Two-way initiation of operations
    • Real-time
    • Deferred delivery
  • Revising it again would be too hard
    • HTTP/1.0 -> HTTP/1.1 evolution took 4+ years!