1
HTTP Messages: Entities and Encoding
Herng-Yow Chen
2
Outline
The format and behavior of HTTP message entities as HTTP containers
How HTTP describes the size of entity bodies, and what HTTP requires in the way of sizing
The entity headers used to describe the format, alphabet, and language of content, so clients can process it properly
3
Reversible content encoding transforms data format to take up less space or be more secure
Transfer encoding modifies how HTTP ships data to enhance the communication of some kinds of data
Chunked encoding chops data into multiple pieces to deliver content of unknown length safely
4
The assortment of tags, labels, times, and checksums help clients get the latest version of requested content
Ranges are useful for continuing aborted downloads where they left off
Delta encoding extensions allow clients to request just those parts of a web page that have actually changed since a previously viewed revision
5
Checksums of entity bodies are used to detect changes in entity content as it passes through proxies
6
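A message is made up of headers and a body
HTTP/1.0 200 OK
Server: Netscape_Enterprise/3.6
Date: Sun, 17 Sep 2000 00:01:05 GMT
Content-type: text/plain
Content-length: 18

Hi! I'm a message!
Entity headers: the Content-type and Content-length lines
Entity body: "Hi! I'm a message!"
Entity: the entity headers together with the entity body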
7
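HTTP/1.1 defines 10 entity headers
Content-Type, Content-Length, Content-Language, Content-Encoding, Content-Location, Content-Range, Content-MD5, Last-Modified, Expires, Allow
ETag and Cache-Control are often listed alongside them, although RFC 2616 classifies ETag as a response header and Cache-Control as a general header.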
8
Entity Bodies
9
Why is Content-Length important?
Detecting truncation: without a length, the receiver cannot tell a complete body from a connection that closed early. An incorrect Content-Length causes problems of its own.
On a persistent connection, Content-Length tells the receiver where one entity body ends and the next message begins.
Chunked encoding is an alternative: the data is sent as a series of chunks, each with a specified chunk size.
When content encoding is applied, Content-Length refers to the length of the encoded body, not the length of the original, unencoded body.
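The framing role of Content-Length can be sketched in a few lines of Python. This is a minimal illustration, not a real HTTP parser: the function name `read_entity_body` and the in-memory stream are assumptions made for the example, and only the bodies (not the header blocks) are placed on the simulated connection.

```python
import io

def read_entity_body(stream, headers):
    """Read exactly Content-Length bytes; raise if the stream ends early."""
    length = int(headers["Content-Length"])
    body = stream.read(length)
    if len(body) < length:
        # Fewer bytes than advertised: the body was truncated.
        raise EOFError(f"truncated body: got {len(body)} of {length} bytes")
    return body

# Two bodies back to back, as on a persistent connection.
stream = io.BytesIO(b"Hi! I'm a message!second body")
first = read_entity_body(stream, {"Content-Length": "18"})
second = read_entity_body(stream, {"Content-Length": "11"})
```

Because the first read stops after exactly 18 bytes, the second read starts at the right place. With a wrong length, the boundary between the two messages would be lost.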
10
Entity Digest
Content-MD5
Is used to check message integrity.
Can also be used as a key into a hash table to quickly locate documents and reduce duplicate storage of content.
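Both uses of the digest can be sketched as follows. Per RFC 1864 the header value is the base64 encoding of the 128-bit MD5 digest of the body. The helper names and the in-memory `store` dictionary are hypothetical, chosen only for this example.

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Content-MD5 value: base64 of the raw MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def body_is_intact(body: bytes, header_value: str) -> bool:
    """Compare the received body against the Content-MD5 header value."""
    return content_md5(body) == header_value.strip()

# Digest as a hash-table key: identical bodies are stored only once.
store = {}

def store_document(body: bytes) -> str:
    key = content_md5(body)
    store.setdefault(key, body)
    return key
```

MD5 detects accidental corruption only. It is not a defense against deliberate tampering.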
11
Media Type and Charset
Content-Type refers to the original entity body type, before encoding.
Supports optional parameters to further specify the content type.
Character encodings for text media:
Content-Type: text/html; charset=iso-8859-4
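As a rough sketch, the media type and the charset parameter can be separated with Python's standard `email.message.Message` class, which understands MIME-style header parameters. The wrapper function `parse_content_type` is an assumption for this example.

```python
from email.message import Message

def parse_content_type(value: str):
    """Split a Content-Type value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

media_type, charset = parse_content_type("text/html; charset=iso-8859-4")

# The charset tells the client how to turn body bytes into characters.
text = b"\xe0".decode(charset)
```

The same byte decodes to different characters under different charsets, which is why the parameter matters for text media.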
12
Common media types
Media type                      Description
text/html                       Entity body is an HTML document
text/plain                      Entity body is a document in plain text
image/gif                       Entity body is an image of type GIF
image/jpeg                      Entity body is an image of type JPEG
audio/x-wav                     Entity body contains WAV sound data
model/vrml                      Entity body is a three-dimensional VRML model
application/vnd.ms-powerpoint   Entity body is a Microsoft PowerPoint presentation
multipart/byteranges            Entity body has multiple parts, each containing a different range (in bytes) of the full document
message/http                    Entity body contains a complete HTTP message (see TRACE)
13
Multipart Media Types
MIME “multipart” email messages contain multiple messages stuck together and sent as a single, complex message.
Each component is self-contained, with its own headers describing its contents; the different components are concatenated together and delimited by a string.
HTTP also supports multipart bodies; however, they are used in only two cases: fill-in form submissions and range responses carrying pieces of a document.
14
Multipart Form Submissions
<form action="http://xxx/cgi" enctype="multipart/form-data" method="POST">
<P> Your Name? <INPUT type="text" name="submit-name"><br>
Your File to send? <INPUT type="file" name="files"><br>
<INPUT type="submit" value="send"> <INPUT type="reset">
</form>
15
If the user enters "John" and selects the text file "hello.txt":
Content-Type: multipart/form-data; boundary=AaBo3x

--AaBo3x
Content-Disposition: form-data; name="submit-name"

John
--AaBo3x
Content-Disposition: form-data; name="files"; filename="hello.txt"
Content-Type: text/plain

… contents of hello.txt …
--AaBo3x--
16
If the user selects the text file "hello.txt" and a second file, the image "image.gif":
Content-Type: multipart/form-data; boundary=AaBo3x

--AaBo3x
Content-Disposition: form-data; name="submit-name"

John
--AaBo3x
Content-Disposition: form-data; name="files"
Content-Type: multipart/mixed; boundary=BbC04y

--BbC04y
Content-Disposition: file; filename="hello.txt"
Content-Type: text/plain

… contents of hello.txt …
--BbC04y
Content-Disposition: file; filename="image.gif"
Content-Type: image/gif
Content-Transfer-Encoding: binary

… contents of image.gif …
--BbC04y--
--AaBo3x--
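The structure of a multipart/form-data body (a delimiter line before each part, part headers, a blank line, the part content, and a closing delimiter with a trailing "--") can be sketched as below. The function `build_form_data` and its argument layout are assumptions for illustration. It handles the simple single-file case, not the nested multipart/mixed case.

```python
def build_form_data(fields, files, boundary):
    """Build a multipart/form-data body.

    fields: {name: text value}
    files:  {name: (filename, content_type, content bytes)}
    """
    delim = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [delim,
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"",
                  value.encode()]
    for name, (filename, ctype, content) in files.items():
        lines += [delim,
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"'.encode(),
                  f"Content-Type: {ctype}".encode(),
                  b"",
                  content]
    lines += [delim + b"--", b""]  # closing delimiter
    return b"\r\n".join(lines)

body = build_form_data(
    {"submit-name": "John"},
    {"files": ("hello.txt", "text/plain", b"hello")},
    "AaBo3x",
)
```

The boundary must not occur inside any part's content. Real clients pick a long random string for that reason.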
17
Multipart Range Response
HTTP/1.0 206 Partial Content
Server: Microsoft-IIS/5.0
Content-Location: http://xxx/hello.txt
Content-Type: multipart/x-byteranges; boundary=--[abcdefghik…z]--

----[abcdefghik…z]--
Content-Type: text/plain
Content-Range: bytes 0-174/1441

…. Part I content
----[abcdefghik…z]--
Content-Type: text/plain
Content-Range: bytes 1344-1440/1441

…. Part II content
----[abcdefghik…z]----
18
Content-Encoding
HTTP applications sometimes want to encode content before sending it, to help lessen the time it takes to transmit the data.
Content-Type is the type of the original format, before encoding
Content-Length is the length of the encoded body
19
Content Encoding
Original content: Content-Type: text/html, Content-Length: 17571
Content-encoded content: Content-Type: text/html, Content-Length: 5746, Content-Encoding: gzip
(Figure: a gzip content encoder produces the encoded content, and a gzip content decoder restores the original content.)
20
Content-encoding tokens
Content-encoding value   Description
gzip                     Using the GNU zip encoding (RFC 1952)
compress                 Using the UNIX file compression program
deflate                  Using zlib format (RFC 1950) for deflate compression (RFC 1951)
identity                 No encoding has been performed. When a Content-Encoding header is not present, this can be assumed.
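A minimal sketch of gzip content encoding with Python's standard `gzip` module shows how the headers change: Content-Type still names the original format, while Content-Length counts the encoded bytes. The function names and the dictionary-based headers are assumptions for this example.

```python
import gzip

def content_encode(body: bytes, content_type: str = "text/html"):
    """Apply gzip content encoding and describe the result in headers."""
    encoded = gzip.compress(body)
    headers = {
        "Content-Type": content_type,           # type of the ORIGINAL content
        "Content-Encoding": "gzip",
        "Content-Length": str(len(encoded)),    # length of the ENCODED body
    }
    return headers, encoded

def content_decode(headers, encoded: bytes) -> bytes:
    """Undo the content encoding; a missing header means identity."""
    if headers.get("Content-Encoding", "identity") == "gzip":
        return gzip.decompress(encoded)
    return encoded

original = b"<p>hello</p>" * 500
headers, encoded = content_encode(original)
restored = content_decode(headers, encoded)
```

The encoding is reversible, so `restored` equals `original` even though far fewer bytes crossed the wire.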
21
Accept-Encoding Headers
Request message (client to server):
GET /logo.gif HTTP/1.1
Accept-Encoding: gzip
[…]
Response message (server to client):
HTTP/1.1 200 OK
Content-Type: image/gif
Content-Encoding: gzip
[…]
The server compresses the image with gzip to transport a smaller file over the thin network connection between itself and the client. This saves network bandwidth and reduces the amount of time that the client waits for the transfer, though the client has to spend time decompressing (gunzip) the image once it is served.
22
A client can indicate preferred encodings by attaching Q values
Accept-Encoding: compress, gzip
Accept-Encoding:
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity;q=0.5, *;q=0
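How a server might act on these Q values can be sketched as follows. This is a simplified reading of the rules, not a complete implementation: a coding not listed takes the "*" value if present, identity is acceptable unless excluded, and a Q value of 0 means "not acceptable". The function names are hypothetical.

```python
def parse_accept_encoding(value: str) -> dict:
    """Return {coding: q} from an Accept-Encoding header value."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        q = 1.0  # default when no q parameter is given
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "q" and val:
                q = float(val)
        prefs[coding.strip().lower()] = q
    return prefs

def choose_encoding(value: str, supported: list):
    """Pick the supported coding the client prefers most, or None."""
    prefs = parse_accept_encoding(value)

    def q(coding):
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:
            return prefs["*"]
        return 1.0 if coding == "identity" else 0.0

    acceptable = [c for c in supported if q(c) > 0]
    return max(acceptable, key=q) if acceptable else None
```

For the last header line above, a server supporting only compress and identity would fall back to identity, because "*;q=0" rules out every coding not explicitly listed.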
23
Transfer Encoding
Content encodings encode the entity content itself, to save space or for security reasons, and are tightly associated with the content format.
In comparison, transfer encodings are applied for architectural reasons and are independent of the content format.
24
Content encoding vs. transfer encoding
Content-encoded response (normal header block, normal entity that is just encoded):
HTTP/1.0 200 OK
Content-Encoding: gzip
Content-Type: text/html
[…]
[encoded message]
Transfer-encoded response (basic header, encoded blocks):
HTTP/1.1 200 OK
Transfer-Encoding: chunked
10abcdefghijk1a
A content-encoded message just encodes the entity section of the message. With transfer-encoded messages, the encoding is a function of the entire message, changing the structure of the message itself.
25
Transfer-Encoding Headers
TE
Used in the request header to tell the server which extension transfer encodings are okay to use.
Transfer-Encoding
Used in the response header to tell the receiver (client) what encoding has been performed.
26
Example
GET /1.html HTTP/1.1
Host: www.csie.ncnu.edu.tw
User-Agent: Mozilla/4.61
TE: trailers, chunked

HTTP/1.1 200 OK
Transfer-Encoding: chunked
Server: Apache 3.0
27
Chunked Encoding
28
Chunked Encoding (continued)
Chunking and Persistent connection
Trailers in chunked messages
Combining Content and Transfer Encoding
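The chunked format itself is simple enough to sketch: each chunk is its size in hexadecimal, CRLF, the chunk data, CRLF, and a zero-size chunk ends the body. The function names and the small default chunk size are assumptions for this example. Trailers are not produced, and chunk extensions are ignored on decode.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Split body into chunks, each prefixed by its size in hex."""
    out = []
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out.append(f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n")
    out.append(b"0\r\n\r\n")  # last chunk (size 0), no trailers
    return b"".join(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked byte string."""
    body, pos = b"", 0
    while True:
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";")[0]  # drop chunk extensions
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

The sender never needs to know the total length up front, which is what makes chunking safe for content of unknown length on a persistent connection.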
29
Combining Content and Transfer Encodings
(Figure: the original entity body is first content-encoded, and the encoded body is then split into chunks by the transfer encoding.)
Original content: Content-Type: text/html
After content encoding: Content-Type: text/html, Content-Encoding: gzip
After transfer encoding (chunking): Content-Type: text/html, Content-Encoding: gzip, Transfer-Encoding: chunked
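The order of the two steps can be sketched as below: the sender applies the content encoding first and the transfer encoding second, and the receiver undoes them in reverse. All function names here are hypothetical, and the chunk helpers are a bare-bones version without trailers or chunk extensions.

```python
import gzip

def to_chunks(data: bytes, size: int) -> bytes:
    """Frame data as chunks: hex size, CRLF, data, CRLF; end with size 0."""
    pieces = []
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        pieces.append(f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n")
    return b"".join(pieces) + b"0\r\n\r\n"

def from_chunks(wire: bytes) -> bytes:
    """Remove chunk framing and return the reassembled data."""
    body, pos = b"", 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol].decode("ascii"), 16)
        if size == 0:
            return body
        body += wire[eol + 2:eol + 2 + size]
        pos = eol + 2 + size + 2

def send(body: bytes, chunk_size: int = 1024):
    encoded = gzip.compress(body)                 # 1. content encoding
    headers = {"Content-Type": "text/html",
               "Content-Encoding": "gzip",
               "Transfer-Encoding": "chunked"}    # no Content-Length needed
    return headers, to_chunks(encoded, chunk_size)  # 2. transfer encoding

def receive(headers, wire: bytes) -> bytes:
    data = from_chunks(wire) if headers.get("Transfer-Encoding") == "chunked" else wire
    return gzip.decompress(data) if headers.get("Content-Encoding") == "gzip" else data
```

The chunk boundaries fall wherever the framing puts them and have no relation to the structure of the gzip data, which is why the receiver must dechunk before decompressing.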
30
Time-Varying Instance
Web objects are usually not static. The same URL can, over time, point to different versions of an object.
An example is the website of any media company, such as CNN or the BBC.
31
Time-Varying Instances
32
Validators and Freshness
In the previous CNN example, the client got the initial resource V1 and can cache this copy, but for how long?
Once the document has "expired" at the client, the client must request a fresh copy from the server.
The client uses a "conditional request" to tell the server, by means of a validator, which version it currently has, and asks for a copy to be sent only if its current copy is no longer valid.
33
Cache-Control header directives
Directive        Message type
no-cache         Request
no-store         Request
max-age          Request
min-fresh        Request
no-transform     Request
only-if-cached   Request
public           Response
private          Response
34
Cache-Control header directives (continued)
Directive          Message type
no-cache           Response
no-store           Response
no-transform       Response
must-revalidate    Response
proxy-revalidate   Response
max-age            Response
s-maxage           Response
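As a rough sketch of how a private cache might read these directives, the code below splits a Cache-Control value into a dictionary and applies a deliberately simplified freshness rule based on max-age. The function names are assumptions. Quoted arguments containing commas, s-maxage, and the Expires header are not handled here.

```python
def parse_cache_control(value: str) -> dict:
    """Return {directive: argument or None} from a Cache-Control value."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives

def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """Simplified check: may a cached copy be reused without revalidation?"""
    d = parse_cache_control(cache_control)
    if "no-cache" in d or "no-store" in d:
        return False  # must revalidate (no-cache) or must not be stored at all
    max_age = d.get("max-age")
    return max_age is not None and age_seconds < int(max_age)
```

A copy that is no longer fresh is not necessarily useless: the cache can revalidate it with a conditional request instead of fetching the whole entity again.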
35
Conditional request types
Request type          Validator
If-Modified-Since     Last-Modified
If-Unmodified-Since   Last-Modified
If-Match              ETag
If-None-Match         ETag
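The server side of the two most common conditional requests can be sketched as below: if the validator sent by the client still matches the current instance, the server answers 304 Not Modified and sends no body. The function `evaluate_conditional` is hypothetical, and it compares entity tags as plain strings, so weak validators (the W/ prefix) are not treated specially.

```python
from email.utils import parsedate_to_datetime

def evaluate_conditional(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's copy is still valid, else 200.

    etag and last_modified describe the server's current instance;
    last_modified is an HTTP-date string.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        current = parsedate_to_datetime(last_modified)
        cached = parsedate_to_datetime(if_modified_since)
        if current <= cached:
            return 304
    return 200  # no validator matched: send the full, current entity
```

If-None-Match is checked first here because, when both headers are present, the entity tag is the more precise validator.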
36
Range Request
HTTP allows clients to actually request just part or a range of a document.
Applications:
Requesting an RoI (Region of Interest)
Media indexing and access
Streaming applications
37
Range Requests
Request message (client to www.csie.ncnu.edu.tw):
GET /bigfile.html HTTP/1.1
[…]
Response message (interrupted in transit):
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 65537
Accept-Ranges: bytes
[…]
Range request message:
GET /bigfile.html HTTP/1.1
Range: bytes=20224-
[…]
Range response message:
HTTP/1.1 206 Partial Content
Content-Type: text/html
Content-Range: bytes 20224-65536/65537
Accept-Ranges: bytes
[…]
The client's original request was interrupted, but a second request for the part of the message that was not received allows the client to resume from the point of the interruption.
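A sketch of the server side of a single-range request follows. It parses a "bytes=first-last" value (with an open end or a suffix form) and builds the matching Content-Range header. The function names are assumptions, multiple ranges are not handled, and an unsatisfiable range simply falls back to the full 200 response here, whereas a real server would answer 416.

```python
def parse_byte_range(value: str, total: int):
    """Parse one 'bytes=first-last' spec; return (first, last) inclusive or None."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        return None  # other units and multi-range requests are out of scope
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form 'bytes=-N': the last N bytes of the entity.
        if not last or int(last) == 0:
            return None
        start, end = max(total - int(last), 0), total - 1
    else:
        start = int(first)
        end = min(int(last), total - 1) if last else total - 1
    return (start, end) if start <= end else None

def range_response(body: bytes, range_header: str):
    """Return (status, headers, payload) for a request carrying a Range header."""
    rng = parse_byte_range(range_header, len(body))
    if rng is None:
        return 200, {"Content-Length": str(len(body))}, body
    first, last = rng
    part = body[first:last + 1]
    headers = {"Content-Range": f"bytes {first}-{last}/{len(body)}",
               "Content-Length": str(len(part))}
    return 206, headers, part
```

Byte positions are zero-based and inclusive, so the last valid position in a 65537-byte entity is 65536.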
38
Delta Encoding
An extension to the HTTP protocol that optimizes transfer by communicating changes instead of entire objects.
RFC 3229 describes delta encoding.
39
Delta Encoding
40
Delta Encoding
41
Delta-encoding headers
ETag
If-None-Match
A-IM
IM
Delta-Base
42
IANA registered types of instance manipulations
Type       Description
vcdiff     Delta using the vcdiff algorithm
diffe      Delta using the Unix diff -e command
gdiff      Delta using the gdiff algorithm
gzip       Compression using the gzip algorithm
deflate    Compression using the deflate algorithm
range      Used in a server response to indicate that the response is partial content as the result of a range selection
identity   Used in a client request's A-IM header to indicate that the client is willing to accept an identity instance manipulation
43
For More Information
http://www.ietf.org/rfc/rfc2616.txt Hypertext Transfer Protocol -- HTTP/1.1
http://www.ietf.org/rfc/rfc3229.txt Delta Encoding in HTTP
http://www.ietf.org/rfc/rfc1521.txt MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies
http://www.ietf.org/rfc/rfc2045.txt Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
http://www.ietf.org/rfc/rfc1864.txt The Content-MD5 Header Field
http://www.ietf.org/rfc/rfc3230.txt Instance Digests in HTTP