Web technologies: HTTP
HTTP
• HyperText Transfer Protocol
• Application-level protocol for the exchange of hypertext documents
• Standardizes
– Resource names (URL)
– Requests
– Responses
• Versions: HTTP/0.9, 1.0, 1.1
• Ref: Tim Berners-Lee, Request for Comments 1945, HTTP/1.0
– http://www.w3.org/Protocols/rfc1945/rfc1945
HTTP as a client-server system
• Client
– An application program that establishes connections for the purpose of sending requests
• Server
– An application program that accepts connections in order to service requests by sending back responses
• User agent
– The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end-user tools
• Origin server
– The server on which a given resource resides or is to be created
• Resource
– A network data object or service which can be identified by a URI
The HTTP browser
• Sends HTTP requests to a server
• Receives and interprets responses
• Visualizes resources
• Timeline: http://meyerweb.com/eric/browsers/timeline-structured.html
Browser features
• Version of the document description languages supported (HTML, CSS)
• Native programming language support (JavaScript)
• Extension mechanisms
– Plug-in interface
• Content viewers (e.g., Adobe Acrobat for PDF, Microsoft Silverlight, Apple QuickTime)
• Programming language interpreters (e.g., Java)
The HTTP server
• Functionality
– Network access with HTTP for handling requests
– Access to resources in secondary storage
– Delivery of HTTP responses
– Access control
– Server-side program execution
– Logging
– Monitoring and administration
– Virtual hosting
– URL mapping
– Connection to application servers
HTTP server vs application server
[Figure: Client, Web server, Application server, Database (with pooled connections)]
[Figure: example of application servers and applications]
HTTP limitations
• HTTP is stateless
– Every HTTP request-response cycle is independent
– No data are preserved between two connections of the same client or of different clients
– HTTP is thus sessionless
– HTTP 1.0 also closes the TCP connection between the client and the server host at each round trip (fixed in HTTP 1.1)
Application server features
• The application server can be stateful (e.g., a resident process)
• It can preserve the user’s session across multiple request-response cycles
• Can preserve session data
• Can handle shared resources (e.g., a pool of database connections)
• Can be optimized (multi-threading, multi-processing, multi-host distribution)
• Can be multi-protocol (e.g., CORBA IIOP, COM/DCOM)
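Session preservation across request-response cycles can be sketched as follows. This is a minimal in-memory illustration, not how any particular application server implements it; `SessionStore` and `get_or_create` are hypothetical names, and the session id is assumed to travel with each request (for example in a cookie).

```python
import uuid


class SessionStore:
    """Hypothetical in-memory session table kept by a resident server process."""

    def __init__(self):
        self._sessions = {}

    def get_or_create(self, session_id=None):
        # Unknown or missing id: start a new session with empty data.
        if session_id not in self._sessions:
            session_id = uuid.uuid4().hex
            self._sessions[session_id] = {}
        return session_id, self._sessions[session_id]


store = SessionStore()

# First request-response cycle: the client has no session id yet.
sid, data = store.get_or_create()
data["cart"] = ["book"]

# Second cycle: the client sends the id back, and the data is still there.
sid2, data2 = store.get_or_create(sid)
print(sid2 == sid, data2)
```

HTTP itself carries no memory between the two cycles; the state lives entirely in the server process.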
HTTP Proxy
• An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients.
• Main usage:
– Access control (inbound, outbound)
– Resource caching
HTTP Gateway
• A server which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
• Usage
– Protocol translators for access to resources stored on non-HTTP systems
Uniform Resource Locator (URL)
• Structured string
– http_URL = "http:" "//" host [ ":" port ] [ abs_path ]
– http://www.elet.polimi.it:8080/people/fraterna.html
• Protocol: http, but also ftp, file
• Host address:
– symbolic: www.elet.polimi.it
– numeric (IP): 131.175.21.1
• Can include port number (e.g. :8080)
• Path: directory sequence
• Resource name: file id
– If the resource is an HTML file, it can include an internal fragment address (e.g. fraterna.html#curriculum)
• More on the URL when introducing dynamic Web resources
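The URL structure above can be inspected with Python's standard `urllib.parse` module. A minimal sketch, using the slide's example URL with the fragment from the last bullet appended:

```python
from urllib.parse import urlsplit

# Example URL from the slide, plus the fragment address.
url = "http://www.elet.polimi.it:8080/people/fraterna.html#curriculum"

parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # symbolic host address
print(parts.port)      # port number
print(parts.path)      # directory sequence and resource name
print(parts.fragment)  # internal fragment address
```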
HTTP request
• full-request :- request-line *(general-header | request-header | entity-header) CRLF [entity-body]
• request-line :- method SP URL SP version CRLF
• method :- GET | POST | HEAD | others...
• Example of request-line: GET /pub/papers/pap101.html HTTP/1.0
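The request grammar can be made concrete by serializing a message by hand. The sketch below is illustrative only: `build_request` and `parse_request_line` are hypothetical helpers, the `User-Agent` value is made up, and no validation is attempted.

```python
CRLF = "\r\n"


def build_request(method, url, version="HTTP/1.0", headers=None, body=""):
    """Serialize: request-line, zero or more headers, CRLF, optional body."""
    request_line = f"{method} {url} {version}{CRLF}"
    header_lines = "".join(
        f"{name}: {value}{CRLF}" for name, value in (headers or {}).items()
    )
    return request_line + header_lines + CRLF + body


def parse_request_line(line):
    """Split 'method SP URL SP version CRLF' into its three parts."""
    method, url, version = line.rstrip(CRLF).split(" ")
    return method, url, version


message = build_request(
    "GET", "/pub/papers/pap101.html", headers={"User-Agent": "sketch/0.1"}
)
print(repr(message))
print(parse_request_line(message.split(CRLF)[0]))
```

The empty line (a bare CRLF) after the headers is what tells the receiver that the header section is over.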
HTTP Response
• full-response :- status-line *(general-header | response-header | entity-header) CRLF [entity-body]
• status-line :- version SP status SP message CRLF
• status: status codes: 1XX (informational), 2XX (success), 3XX (redirection), 4XX (client error), 5XX (server error)
• Example: HTTP/1.0 404 Not Found
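A status line can be split and classified by the first digit of its code. This is a small sketch under the grammar above; `parse_status_line` is a hypothetical helper, and the sample line is an assumed HTTP/1.0 response.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split 'version SP status SP message CRLF' and classify the status."""
    # The message may contain spaces, so split at most twice.
    version, status, message = line.rstrip("\r\n").split(" ", 2)
    code = int(status)
    return version, code, message, STATUS_CLASSES[code // 100]


result = parse_status_line("HTTP/1.0 404 Not Found\r\n")
print(result)
```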
Headers
entity-header = Allow | Content-Encoding | Content-Language | Content-Length | Content-Location | Content-MD5 | Content-Range | Content-Type | Expires | Last-Modified
general-header = Cache-Control | Connection | Date | Pragma | Trailer | Transfer-Encoding | Upgrade | Via | Warning
Headers
request-header = Accept | Accept-Charset | Accept-Encoding | Accept-Language | Authorization | Expect | From | Host | If-Match | If-Modified-Since | If-None-Match | If-Range | If-Unmodified-Since | Max-Forwards | Proxy-Authorization | Range | Referer | TE | User-Agent
response-header = Accept-Ranges | Age | ETag | Location | Proxy-Authenticate | Retry-After | Server | Vary | WWW-Authenticate
Quick reference to HTTP headers: http://www.cs.tut.fi/~jkorpela/http.html
Test for the headers sent by the browser: http://www.tipjar.com/cgi-bin/test
HTTP headers in a request (examples)
Field name | Description | Example
Accept | Content-Types that are acceptable | Accept: text/plain
Accept-Charset | Character sets that are acceptable | Accept-Charset: utf-8
Accept-Encoding | Acceptable encodings. See HTTP compression. | Accept-Encoding: gzip, deflate
Accept-Language | Acceptable human languages for response | Accept-Language: en-US
Authorization | Authentication credentials for HTTP authentication | Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Cache-Control | Used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain | Cache-Control: no-cache
Connection | What type of connection the user-agent would prefer | Connection: keep-alive
Cookie | An HTTP cookie previously sent by the server with Set-Cookie | Cookie: $Version=1; Skin=new;
Content-Length | The length of the request body in octets (8-bit bytes) | Content-Length: 348
Content-MD5 | A Base64-encoded binary MD5 sum of the content of the request body | Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Type | The MIME type of the body of the request (used with POST and PUT requests) | Content-Type: application/x-www-form-urlencoded
Date | The date and time that the message was sent | Date: Tue, 15 Nov 1994 08:12:31 GMT
Expect | Indicates that particular server behaviors are required by the client | Expect: 100-continue
User-Agent | The user agent string of the user agent | User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0
...
HTTP headers in a response (examples)
Field name | Description | Example
Accept-Ranges | What partial content range types this server supports | Accept-Ranges: bytes
Age | The age the object has been in a proxy cache, in seconds | Age: 12
Cache-Control | Tells all caching mechanisms from server to client whether they may cache this object; max-age is measured in seconds | Cache-Control: max-age=3600
Connection | Options that are desired for the connection | Connection: close
Content-Encoding | The type of encoding used on the data. See HTTP compression. | Content-Encoding: gzip
Content-Language | The language the content is in | Content-Language: da
Content-Length | The length of the response body in octets (8-bit bytes) | Content-Length: 348
Content-Location | An alternate location for the returned data | Content-Location: /index.htm
Content-MD5 | A Base64-encoded binary MD5 sum of the content of the response | Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Range | Where in a full body message this partial message belongs | Content-Range: bytes 21010-47021/47022
Content-Type | The MIME type of this content | Content-Type: text/html; charset=utf-8
Date | The date and time that the message was sent | Date: Tue, 15 Nov 1994 08:12:31 GMT
Expires | Gives the date/time after which the response is considered stale | Expires: Thu, 01 Dec 1994 16:00:00 GMT
Last-Modified | The last modified date for the requested object, in RFC 2822 format | Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
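The Content-MD5 description ("Base64-encoded binary MD5 sum") is easy to get wrong by encoding the hexadecimal digest instead of the raw bytes. A minimal sketch with Python's standard library; `content_md5` and `verify` are hypothetical helper names and the body is made-up sample data:

```python
import base64
import hashlib


def content_md5(body):
    """Base64-encode the raw 16-byte MD5 digest (not its hex form)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify(body, header_value):
    """Receiver side: recompute the digest and compare with the header."""
    return content_md5(body) == header_value


body = b"Hello, world"
value = content_md5(body)
print(f"Content-MD5: {value}")
print(verify(body, value), verify(body + b"!", value))
```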
HTTP security
• Resources are grouped in domains at the server (called realms)
• Realms can be protected
• An HTTP request for a protected resource must provide an Authorization header
– Credentials are transmitted in clear, base64-encoded
• If the credentials are wrong, the server sends a response with status code 401 (Unauthorized) and a WWW-Authenticate header, which causes the dialog for entering credentials to appear
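"In clear, base64-encoded" means the credentials are only encoded, not encrypted: anyone who sees the header can reverse it. The sketch below shows both directions. The helper names are hypothetical; the user name and password are the ones that decode from the `Authorization` example in the request headers table.

```python
import base64


def basic_authorization(user, password):
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return f"Basic {token.decode('ascii')}"


def decode_basic(header_value):
    """Recover the credentials, as any eavesdropper could."""
    _scheme, token = header_value.split(" ", 1)
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_authorization("Aladdin", "open sesame")
print(header)
print(decode_basic(header))
```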
HTTP 1.1
• Calendar
– Jan 1997: HTTP/1.1 becomes Proposed Standard (RFC 2068)
– June 1999: improvements and updates published as RFC 2616
• Main innovations
– Tunnels
– Chunked encoding
– Multi-request connections
– Content negotiation
– Advanced cache management
– New methods (OPTIONS, PUT, DELETE, TRACE, CONNECT, extension-method)
Tunnels
• Tunnel = An intermediary program which is acting as a blind relay between two connections
• A tunnel is not a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. It does not change the messages
• Tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.
Chunked transfer encoding
Behavior
• A data transfer mechanism in which data is sent in blocks called "chunks"
• It uses the Transfer-Encoding header in place of the Content-Length header, so the sender does not need to know the length of the content before it starts transmitting a response to the receiver (useful for dynamically generated content)
• The size is sent before each chunk so that the receiver can tell when it has finished receiving data for that chunk
• Data transfer is terminated by a final chunk of length zero
Benefits
• Allows a server to maintain an HTTP persistent connection for dynamically generated content
• Allows the sender to send header fields after the message body, in cases where values cannot be known until the content has been produced (e.g., digital signature)
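The framing described above (hexadecimal size, CRLF, data, CRLF, ending with a zero-length chunk) can be sketched in a few lines. This is a simplified illustration: `encode_chunked` and `decode_chunked` are hypothetical helpers, chunk extensions are ignored, and trailing header fields after the last chunk are not parsed.

```python
def encode_chunked(blocks):
    """Frame each block as: size in hex, CRLF, data, CRLF."""
    out = b""
    for block in blocks:
        if block:  # an empty block would look like the terminating chunk
            out += f"{len(block):X}".encode("ascii") + b"\r\n" + block + b"\r\n"
    # A chunk of length zero terminates the transfer.
    return out + b"0\r\n\r\n"


def decode_chunked(data):
    """Reassemble the body by reading one size-prefixed chunk at a time."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";")[0]  # drop any chunk extension
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            return body
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF


encoded = encode_chunked([b"Wiki", b"pedia"])
print(encoded)
print(decode_chunked(encoded))
```

The sender never needs the total length: each block is framed as soon as it is produced.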
Persistent connection
Behavior
• HTTP 1.0 required opening a new connection for every single request/response pair
• The Connection: Keep-Alive header was used in HTTP 1.0 to avoid dropping the connection
• When the client sends another request, it uses the same connection. This continues until either the client or the server decides that the conversation is over, and one of them drops the connection
• In HTTP 1.1 all connections are persistent, unless otherwise specified
Benefits
• Less CPU and memory usage (because fewer connections are open simultaneously)
• Enables HTTP pipelining of requests and responses
• Reduced network congestion (fewer TCP connections)
• Reduced latency in subsequent requests (no handshaking)
• Errors can be reported without the penalty of closing the TCP connection
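The difference between the two versions comes down to the default: HTTP 1.0 closes unless asked to keep the connection alive, HTTP 1.1 keeps it open unless asked to close. A sketch of that decision follows. `is_persistent` is a hypothetical helper, and the headers are assumed to be a plain dict keyed by the canonical name `Connection`.

```python
def is_persistent(version, headers):
    """Decide whether the connection stays open after this message."""
    value = headers.get("Connection", "")
    tokens = {token.strip().lower() for token in value.split(",")}
    if version == "HTTP/1.1":
        # Persistent by default; "Connection: close" opts out.
        return "close" not in tokens
    # HTTP/1.0: closed by default; "Connection: Keep-Alive" opts in.
    return "keep-alive" in tokens


print(is_persistent("HTTP/1.0", {}))
print(is_persistent("HTTP/1.0", {"Connection": "Keep-Alive"}))
print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"Connection": "close"}))
```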
Content negotiation
Behavior
• Server driven: the request contains headers (e.g., Accept-Encoding) and the server picks the corresponding version (the client must include the headers in each request)
• Agent driven: the response contains the URIs of the alternative versions (Alternates) and the client chooses (requires 2 requests)
• Transparent: managed by the proxy cache
Benefits
• Makes it possible to serve different versions of a resource at the same URI, so that user agents can obtain the version that best fits their capabilities
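Server-driven negotiation can be sketched as: parse the client's preference header, order the values by their quality factor `q`, and return the first variant the server actually has. The example below uses `Accept-Language`. The function names and file names are hypothetical, and wildcards and language ranges are deliberately not handled.

```python
def parse_accept(header_value):
    """Return (value, q) pairs sorted by descending quality; q defaults to 1."""
    prefs = []
    for item in header_value.split(","):
        value, _, params = item.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name == "q":
                q = float(val)
        prefs.append((value.strip(), q))
    # sorted() is stable, so equal-q values keep the client's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(header_value, available):
    """Pick the best available variant, or None if nothing is acceptable."""
    for value, q in parse_accept(header_value):
        if q > 0 and value in available:
            return available[value]
    return None


# Hypothetical variants of one resource, all served at the same URI.
variants = {"en": "index.en.html", "it": "index.it.html"}
chosen = negotiate("da, it;q=0.8, en;q=0.5", variants)
print(chosen)
```

Danish is preferred but unavailable here, so the server falls back to the next-best match.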
Cache management
• Goal: minimize network traffic and bandwidth usage
• Mechanism: storing a duplicate of the resource in a location closer to the client and serving that in response to a request
• Semantic transparency:
– The client must be unaware of the cache
– A warning must be given to the client if the duplicate may be out of sync with the original resource
Cache operations
• Expiration
– The server can declare how long a resource remains valid (Cache-Control and Expires headers)
– Requires computing the age of a resource (in the Age header) in the presence of time zones, clock differences, and multiple responses
• Validation
– The cache can check the validity of the expired copy (e.g., based on Date and Last-Modified time, or on explicit entity tags, i.e., version control numbers)
– Requires conditional requests and validation headers
– May produce the Warning general-header, when the response contains a possibly stale entity
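The two operations can be sketched as two small checks: expiration compares the copy's age with its declared lifetime, and validation asks the origin server whether the entity tag still matches. This is a simplified illustration with hypothetical helper names. Only `max-age`, `Age`, and `If-None-Match` are modeled, and the entity tags are made up.

```python
def is_fresh(max_age, age):
    """Expiration: a copy is fresh while its age is below its lifetime (seconds)."""
    return age < max_age


def validate(request_headers, current_etag):
    """Validation: status an origin server would return for a conditional GET."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304  # Not Modified: the cache may keep serving its copy
    return 200  # the resource changed: a full response is sent


# Copy cached with "Cache-Control: max-age=3600" and currently "Age: 12".
print(is_fresh(3600, 12))

# Expired copy with entity tag "v1": revalidate against the origin server.
print(validate({"If-None-Match": '"v1"'}, '"v1"'))
print(validate({"If-None-Match": '"v1"'}, '"v2"'))
```

A 304 response carries no body, so a successful validation costs far less than a full transfer.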
References
• HTTP/1.0: Tim Berners-Lee, Request for Comments 1945, HTTP/1.0
• HTTP/1.1: Internet Draft <draft-ietf-http-v11-spec-rev-06> (November 18, 1998) http://www.w3.org/Protocols/History.html#HTTP11
• HTTP status codes: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
• HTTP intro: http://jmarshall.com/easy/http/
• Web info: http://www.webopedia.com