Bandwidth Bottlenecks

HTTP messages
Entities and Encoding
Herng-Yow Chen
1
Outline



The format and behavior of HTTP
message entities as HTTP containers
How HTTP describes the size of entity
bodies, and what HTTP requires in the
way of sizing
The entity headers used to describe the
format, alphabet, and language of content,
so clients can process it properly
2



Content encoding reversibly transforms the
data format so that it takes up less space
or is more secure
Transfer encoding modifies how HTTP
ships data to enhance the communication
of some kinds of data
Chunked encoding chops data into
multiple pieces to deliver content of
unknown length safely
3



An assortment of tags, labels, times, and
checksums helps clients get the latest
version of requested content
Ranges are useful for continuing aborted
downloads where they left off
Delta encoding extensions allow clients to
request just those parts of a web page
that have actually changed since a
previously viewed revision
4

Checksums of entity bodies are used to
detect changes in entity content as it
passes through proxies
5
Message is made up of header and body
HTTP/1.0 200 OK
Server: Netscape_Enterprise/3.6
Date: Sun, 17 Sep 2000 00:01:05 GMT
Content-type: text/plain        (entity header)
Content-length: 18              (entity header)

Hi! I'm a message!              (entity body)

Entity = entity headers + entity body
6
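HTTP/1.1 defines 10 entity headers (the last two listed, ETag and Cache-Control, are formally a response header and a general header, but they relate closely to entities)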






Content-Type
Content-Length
Content-Language
Content-Encoding
Content-Location
Content-Range






Content-MD5
Last-Modified
Expires
Allow
ETag
Cache-Control
7
Entity Bodies
8
Why is Content-Length important?


Detecting truncation
Problems caused by an incorrect Content-Length



On a persistent connection, Content-Length tells the receiver
where one entity body ends and the next message begins.
Chunked encoding is an alternative: it sends the data in
a series of chunks, each with a specified chunk size.
When content encoding is applied,
Content-Length refers to the encoded body, not the
length of the original, unencoded body.
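
The role of Content-Length on a persistent connection can be sketched in a few lines of Python. This is only an illustration, under the assumption that every message carries a Content-Length header and no transfer encoding; the function name `split_messages` is hypothetical and not part of any library.

```python
def split_messages(stream: bytes) -> list[tuple[bytes, bytes]]:
    """Split back-to-back HTTP messages into (header_block, body) pairs.

    Assumption: each message has a Content-Length header and is not chunked.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        # The header block ends at the first blank line (CRLF CRLF).
        header_end = stream.index(b"\r\n\r\n", pos)
        header_block = stream[pos:header_end]
        length = 0
        for line in header_block.split(b"\r\n")[1:]:  # skip the start line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_start = header_end + 4
        body = stream[body_start:body_start + length]
        if len(body) < length:
            # Fewer bytes than promised: the body was truncated.
            raise ValueError("truncated body")
        messages.append((header_block, body))
        pos = body_start + length  # the next message starts right here
    return messages


first = b"HTTP/1.1 200 OK\r\nContent-Length: 18\r\n\r\nHi! I'm a message!"
second = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
pairs = split_messages(first + second)
print([body for _, body in pairs])
```

Without the length, the receiver in this sketch would have no way to know that the second status line is a new message rather than more body data.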
9
Entity Digest



Content-MD5
Used to check message integrity
Can also be used as a key into a hash
table to quickly locate documents and
reduce duplicate storage of content.
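
As a rough sketch of both uses, the Python below computes a Content-MD5 value (the base64 encoding of the 128-bit MD5 digest of the body) and uses it as a dictionary key. The helper names `content_md5` and `store` are assumptions made for this example.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return a Content-MD5 header value: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


body = b"Hi! I'm a message!"
digest = content_md5(body)
print("Content-MD5:", digest)

# Integrity check: the receiver recomputes the digest and compares.
intact = content_md5(body) == digest
corrupted = content_md5(body + b"?") == digest

# Hypothetical duplicate-free store keyed by digest.
store: dict[str, bytes] = {}
for document in (body, body, b"another document"):
    store.setdefault(content_md5(document), document)
print(len(store))  # two distinct documents are kept
```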
10
Media type and Charset


Content-Type describes the media type of the original
entity body, before any content encoding.
It supports optional parameters that further
specify the content type.


Character Encodings for Text Media
Content-Type: text/html; charset=iso-8859-4
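
One way to pull the media type and the charset parameter out of such a header is Python's standard `email.message.Message` class, which understands MIME-style parameters. This is a sketch, not the only approach; the wrapper name `parse_content_type` is an assumption.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, charset) parsed from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


media_type, charset = parse_content_type("text/html; charset=iso-8859-4")
print(media_type, charset)

# The charset tells the client how to turn body bytes into characters.
text = b"Hi!".decode(charset)
```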
11
Common media types

Media type                       Description
text/html                        Entity body is an HTML document
text/plain                       Entity body is a document in plain text
image/gif                        Entity body is an image of type GIF
image/jpeg                       Entity body is an image of type JPEG
audio/x-wav                      Entity body contains WAV sound data
model/vrml                       Entity body is a three-dimensional VRML model
application/vnd.ms-powerpoint    Entity body is a Microsoft PowerPoint presentation
multipart/byteranges             Entity body has multiple parts, each containing a different range (in bytes) of the full document
message/http                     Entity body contains a complete HTTP message (see TRACE)
12
Multipart Media Types



MIME “multipart” email messages contain
multiple messages stuck together and sent as a
single, complex message.
Each component is self-contained, with its own
headers describing its contents; the different
components are concatenated together and
delimited by a string.
HTTP also supports multipart bodies; however,
they are generally used in only two cases: fill-in
form submissions and range responses carrying
pieces of a document.
13
Multipart Form Submissions

<form action="http://xxx/cgi"
      enctype="multipart/form-data"
      method="POST"> <P>
Your Name?
<INPUT type="text" name="submit-name"><br>
Your File to send?
<INPUT type="file" name="files"> <br>
<INPUT type="submit" value="send">
<INPUT type="reset">
</form>
14
If the user enters “John” and selects
the text file “hello.txt”
Content-Type: multipart/form-data; boundary=AaBo3x

--AaBo3x
Content-Disposition: form-data; name="submit-name"

John
--AaBo3x
Content-Disposition: form-data; name="files"; filename="hello.txt"
Content-Type: text/plain

… contents of hello.txt …
--AaBo3x--
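
The layout of such a body can be sketched with a small Python builder. It assumes the boundary string never occurs inside the part data; the function name `build_form_data` and the dictionary keys are hypothetical, and the file contents below are placeholder bytes.

```python
def build_form_data(boundary: str, parts: list) -> bytes:
    """Assemble a multipart/form-data body from a list of part dictionaries.

    Each part has "name" and "data", plus optional "filename" and "content_type".
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for part in parts:
        lines.append(delimiter)
        disposition = 'form-data; name="%s"' % part["name"]
        if "filename" in part:
            disposition += '; filename="%s"' % part["filename"]
        lines.append(b"Content-Disposition: " + disposition.encode("utf-8"))
        if "content_type" in part:
            lines.append(b"Content-Type: " + part["content_type"].encode("ascii"))
        lines.append(b"")  # blank line separates part headers from part body
        lines.append(part["data"])
    lines.append(delimiter + b"--")  # closing delimiter ends the body
    lines.append(b"")
    return b"\r\n".join(lines)


body = build_form_data("AaBo3x", [
    {"name": "submit-name", "data": b"John"},
    {"name": "files", "filename": "hello.txt",
     "content_type": "text/plain", "data": b"placeholder file contents"},
])
print(body.decode("ascii"))
```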
15
If the user selects the text file “hello.txt” and
a second file, the image “image.gif”
Content-Type: multipart/form-data; boundary=AaBo3x

--AaBo3x
Content-Disposition: form-data; name="submit-name"

John
--AaBo3x
Content-Disposition: form-data; name="files"
Content-Type: multipart/mixed; boundary=BbC04y

--BbC04y
Content-Disposition: file; filename="hello.txt"
Content-Type: text/plain

… contents of hello.txt …
--BbC04y
Content-Disposition: file; filename="image.gif"
Content-Type: image/gif
Content-Transfer-Encoding: binary

… contents of image.gif …
--BbC04y--
--AaBo3x--
16
Multipart Range Response
HTTP/1.0 206 Partial Content
Server: Microsoft-IIS/5.0
Content-Location: http://xxx/hello.txt
Content-Type: multipart/x-byteranges; boundary=--[abcdefghik…z]--

----[abcdefghik…z]--
Content-Type: text/plain
Content-Range: bytes 0-174/1441

…. Part I content …
----[abcdefghik…z]--
Content-Type: text/plain
Content-Range: bytes 1344-1440/1441

…. Part II content …
----[abcdefghik…z]----
17
Content-Encoding



HTTP applications sometimes want to
encode content before sending it, for example
to lessen the time it takes to transmit the
data.
Content-Type is the type of the original
format, before encoding
Content-Length is the length of the
encoded body
18
Content Encoding

[Figure: a gzip content encoder turns the original content into content-encoded content, and a gzip content decoder restores the original content.]

Original content
Content-Type: text/html
Content-Length: 17571

Content-encoded content
Content-Type: text/html
Content-Length: 5746
Content-encoding: gzip
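
The encoder and decoder steps can be tried out with Python's standard `gzip` module. The sketch below uses a made-up HTML body, so the sizes will differ from the figure; the point is only that Content-Type stays the same while Content-Length describes the encoded bytes.

```python
import gzip

# Made-up, repetitive HTML so that compression has something to work with.
original = b"<html><body>" + b"<p>hello, world</p>" * 500 + b"</body></html>"

# Sender side: apply the content encoding and describe the result.
encoded = gzip.compress(original)
headers = {
    "Content-Type": "text/html",             # type of the original format
    "Content-Encoding": "gzip",
    "Content-Length": str(len(encoded)),     # length of the encoded body
}

# Receiver side: undo the content encoding to recover the original entity.
decoded = gzip.decompress(encoded)
print(len(original), headers["Content-Length"], decoded == original)
```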
19
Content-encoding tokens

Content-encoding value    Description
gzip                      Using the GNU zip encoding (RFC 1952)
compress                  Using the UNIX file compression program
deflate                   Using zlib format (RFC 1950) for deflate compression (RFC 1951)
identity                  No encoding has been performed. When a Content-Encoding header is not present, this can be assumed.
20
Accept-Encoding Headers

Request message (client to server)
GET /logo.gif HTTP/1.1
Accept-encoding: gzip
[…]

Response message (server to client)
HTTP/1.1 200 OK
Content-type: image/gif
Content-encoding: gzip
[…]

[Figure: the server runs gzip on the image before sending it (…00101101…), and the client runs gunzip on what it receives.]

The server compresses the image with gzip to transport a smaller file over the thin
network connection between itself and the client. This saves network bandwidth
and reduces the amount of time that the client waits for the transfer. The client,
however, has to spend time decompressing the image once it has been served.
21
Clients can indicate preferred
encodings by attaching Q values
Accept-Encoding: compress, gzip
Accept-Encoding:
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity;q=0.5, *;q=0
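
How a server might act on these Q values can be sketched as follows. This is a simplified reading of the rules, not a complete implementation: it ignores malformed parameters, and it assumes that identity is acceptable at quality 1.0 unless the header says otherwise. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_accept_encoding(value: str) -> Dict[str, float]:
    """Map each listed encoding token to its Q value (default 1.0)."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        params = params.strip()
        q = float(params[2:]) if params.lower().startswith("q=") else 1.0
        prefs[token.strip().lower()] = q
    return prefs


def choose_encoding(value: str, supported: List[str]) -> Optional[str]:
    """Pick the supported encoding with the highest Q value, or None."""
    prefs = parse_accept_encoding(value)

    def quality(encoding: str) -> float:
        if encoding in prefs:
            return prefs[encoding]
        if "*" in prefs:
            return prefs["*"]
        # Assumption: identity is acceptable unless explicitly excluded.
        return 1.0 if encoding == "identity" else 0.0

    best = max(supported, key=quality)  # ties go to the earlier entry
    return best if quality(best) > 0 else None


server_supports = ["compress", "gzip", "identity"]
print(choose_encoding("compress;q=0.5, gzip;q=1.0", server_supports))
print(choose_encoding("gzip;q=1.0, identity;q=0.5, *;q=0", ["compress", "identity"]))
```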
22
Transfer Encoding


Content encodings transform the entity
content itself, to save space or for security reasons, and are tightly
associated with the content format.
In comparison, transfer encodings are
applied for architectural reasons and are
independent of the content format.
23
Content encoding vs. transfer encoding

Content-encoded response (normal header block, normal entity that is just encoded)
HTTP/1.0 200 OK
Content-encoding: gzip
Content-Type: text/html
[…]

[encoded message]

Transfer-encoded response (basic header, followed by encoded blocks)
HTTP/1.1 200 OK
Transfer-encoding: chunked

b
abcdefghijk
1
a
0

A content-encoded message just encodes the entity
section of the message. With transfer-encoded
messages, the encoding is a function of the entire
message, changing the structure of the message itself.
24
Transfer-Encoding Headers

TE


Used in the request header to tell the server
which extension transfer encodings are okay to
use.
Transfer-Encoding

Used in the response header to tell the
receiver (client) which transfer encoding has been
applied to the message.
25
Example

Request
GET /1.html HTTP/1.1
Host: www.csie.ncnu.edu.tw
User-Agent: Mozilla/4.61
TE: trailers, chunked

Response
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Server: Apache/3.0
26
Chunked Encoding
27
Chunked Encoding (continued)

Chunking and Persistent connection

Trailers in chunked messages

Combining Content and Transfer Encoding
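
Since the chunked-encoding slides are mostly figures, a minimal Python sketch of the wire format may help: each chunk is its size in hexadecimal, CRLF, the data, and CRLF, and a zero-size chunk ends the body. The sketch assumes no trailers, and the function names are hypothetical.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Encode a body as chunks of at most chunk_size bytes, without trailers."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += ("%x\r\n" % len(chunk)).encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size zero) plus the final blank line
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked data, ignoring chunk extensions."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)  # chunk sizes are hexadecimal
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = encode_chunked(b"abcdefghijka", chunk_size=11)
print(wire)
print(decode_chunked(wire))
```

Because the sender only needs to know the size of each chunk as it goes, this lets a server start sending dynamically generated content on a persistent connection before the total length is known.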
28
Combining Content and Transfer Encodings

[Figure: the original entity is first content-encoded with gzip, and the encoded body is then split into chunks by the chunked transfer encoding.]

Original content
Content-Type: text/html

After content encoding
Content-Type: text/html
Content-encoding: gzip

After transfer encoding (chunking)
Content-Type: text/html
Content-encoding: gzip
Transfer-encoding: chunked
29
Time-Varying Instance



Web objects usually are not static.
The same URL can, over time, point to
different versions of an object.
For example, the front page of a news
site such as CNN or the BBC.
30
Time-Varying Instances
31
Validators and Freshness



In the previous CNN example, the client got the
initial resource V1 and can cache this copy, but
for how long?
Once the document has “expired” at the client, it
must request a fresh copy from the server.
The client can send a “conditional request” that uses
a validator to tell the server which version it currently
has, asking for a copy to be sent only if its current
copy is no longer valid.
32
Cache-Control header directives

Directive         Message type
no-cache          Request
no-store          Request
max-age           Request
min-fresh         Request
no-transform      Request
only-if-cached    Request
public            Response
private           Response
33
Cache-Control header directives (continued)

Directive           Message type
no-cache            Response
no-store            Response
no-transform        Response
must-revalidate     Response
proxy-revalidate    Response
max-age             Response
s-maxage            Response
34
Conditional request types

Request type           Validator
If-Modified-Since      Last-Modified
If-Unmodified-Since    Last-Modified
If-Match               ETag
If-None-Match          ETag
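
How a server might evaluate the two most common conditional GET headers can be sketched as below. This is a simplification under stated assumptions: entity tags are compared as plain strings (no weak-validator handling), If-None-Match takes precedence when both headers are present, and the function name `evaluate_conditional_get` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def evaluate_conditional_get(request_headers: dict, etag: str,
                             last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Not modified if the resource is no newer than the client's date.
        if last_modified <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


current_etag = '"v2"'
modified = datetime(2000, 9, 17, 0, 1, 5, tzinfo=timezone.utc)

print(evaluate_conditional_get({"If-None-Match": '"v1"'}, current_etag, modified))
print(evaluate_conditional_get(
    {"If-Modified-Since": "Sun, 17 Sep 2000 00:01:05 GMT"}, current_etag, modified))
```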
35
Range Request


HTTP allows clients to request just
a part, or range, of a document.
Applications:
Requesting a region of interest (RoI)
Media indexing and access
Streaming applications
36
Range Requests

Request message (client to www.csie.ncnu.edu.tw)
GET /bigfile.html HTTP/1.1
[…]

Response message
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 65537
Accept-Ranges: bytes
[…]

Range request message (client to www.csie.ncnu.edu.tw)
GET /bigfile.html HTTP/1.1
Range: bytes=20224-
[…]

Range response message
HTTP/1.1 206 Partial Content
Content-Type: text/html
Content-Range: bytes 20224-65536/65537
Accept-Ranges: bytes
[…]

The client’s original request was
interrupted, but a second request
for the part of the message that
was not received allows the
client to resume from the point
of the interruption.
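
The server side of such a resumed download can be sketched in Python. The sketch handles only a single byte range (no multipart responses), falls back to a full 200 response when the header cannot be parsed, and uses a hypothetical function name, `serve_range`.

```python
import re


def serve_range(resource: bytes, range_header: str):
    """Return (status, headers, body) for a single-range request."""
    total = len(resource)
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if not match or match.group(1) == match.group(2) == "":
        # Unparseable Range header: ignore it and send the whole entity.
        return 200, {"Content-Length": str(total)}, resource

    first_s, last_s = match.groups()
    if first_s == "":
        # Suffix form "bytes=-N": the last N bytes of the entity.
        first, last = max(total - int(last_s), 0), total - 1
    else:
        first = int(first_s)
        last = min(int(last_s), total - 1) if last_s else total - 1

    if first > last or first >= total:
        return 416, {"Content-Range": "bytes */%d" % total}, b""

    part = resource[first:last + 1]
    headers = {
        "Content-Range": "bytes %d-%d/%d" % (first, last, total),
        "Content-Length": str(len(part)),
    }
    return 206, headers, part


resource = b"x" * 65537  # stand-in for /bigfile.html
status, headers, part = serve_range(resource, "bytes=20224-")
print(status, headers["Content-Range"], len(part))
```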
37
Delta Encoding


An extension to the HTTP protocol that
optimizes transfer by communicating
changes instead of entire objects.
RFC 3229 describes delta encoding.
38
Delta Encoding
39
Delta Encoding
40
Delta-encoding headers





ETag
If-None-Match
A-IM
IM
Delta-Base
41
IANA registered types of instance manipulations

Type        Description
vcdiff      Delta using the vcdiff algorithm
diffe       Delta using the Unix diff -e command
gdiff       Delta using the gdiff algorithm
gzip        Compression using the gzip algorithm
deflate     Compression using the deflate algorithm
range       Used in a server response to indicate that the response is partial content as the result of a range selection
identity    Used in a client request’s A-IM header to indicate that the client is willing to accept an identity instance manipulation
42
For More Information

http://www.ietf.org/rfc/rfc2616.txt
Hypertext Transfer Protocol -- HTTP/1.1

http://www.ietf.org/rfc/rfc3229.txt
Delta encoding in HTTP

http://www.ietf.org/rfc/rfc1864.txt
The Content-MD5 Header Field

http://www.ietf.org/rfc/rfc2045.txt
Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet
Message Bodies

http://www.ietf.org/rfc/rfc1521.txt
MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for
Specifying and Describing the Format of Internet Message Bodies

http://www.ietf.org/rfc/rfc3230.txt
Instance Digests in HTTP
43