Introduction to HyperText Transfer Protocol

Upload: shwetha

Post on 17-Nov-2015

DESCRIPTION

Notes about hyper text transfer protocol

TRANSCRIPT

  • Introduction to HyperText Transfer Protocol

  • Introduction

    HTTP is a communications protocol for the transfer of information on intranets and the World Wide Web. It is an Application Layer protocol designed within the framework of the Internet Protocol Suite. Its original purpose was to provide a way to publish and retrieve hypertext pages over the Internet. HTTP/1.1 is the version of HTTP in common use.

    It is a request/response standard between a client and a server. The client that makes an HTTP request (a web browser, spider, or other end-user tool) is referred to as the user agent. The responding server, which stores or creates resources such as HTML files and images, is called the origin server. Between the user agent and the origin server there may be several intermediaries, such as proxies and gateways.

    HTTP is a stateless protocol.

  • HTML Forms

    Forms can be designed using the <form> tag.
    - NAME is used for later manipulation of the data by a scripting language.
    - ACTION indicates a program on the server that will be executed when the form is submitted. Mostly it will be an ASP or a CGI script.
    - METHOD indicates the way the form is submitted to the server. The popular options are GET and POST.

    Ex: ..(form elements go here)

  • HTTP Session

    Make the connection
    - The client opens a standard TCP connection to the server. Port 80 is the default. Where a port other than 80 is used, the port number is added to the URL: http://www.some.server.com:8080

    Request a document
    - The client asks for the document using a URL: GET /first.html HTTP/1.1
    - The document is assumed to be stored on the server, so the fully qualified name is not provided.

    Respond to a request
    - The server response begins with a response code, followed by other information.

    Close the connection
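
    The first two steps of a session depend on splitting a URL into the host to connect to, the port (80 when none is given), and the path to request. A minimal sketch using Python's standard urllib.parse; the function name connection_target is an illustrative assumption, not part of the original notes.

```python
from urllib.parse import urlsplit

def connection_target(url):
    """Return (host, port, path) for an http:// URL, defaulting to port 80."""
    parts = urlsplit(url)
    host = parts.hostname
    # No explicit port in the URL means the HTTP default, 80.
    port = parts.port if parts.port is not None else 80
    # An empty path means the root document.
    path = parts.path or "/"
    return host, port, path

print(connection_target("http://www.some.server.com:8080"))
# ('www.some.server.com', 8080, '/')
print(connection_target("http://www.some.server.com/first.html"))
# ('www.some.server.com', 80, '/first.html')
```

    Only the path goes into the request line; the host and port are used to open the TCP connection.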

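  • HTTP Messages

    The format of an HTTP message is:
    - an initial line (different for a request and a response)
    - zero or more header lines, such as
        Header1: value1
        Header2: value2
        Header3: value3
    - a blank line
    - an optional message body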
  • The request message

    A request message consists of:
    - A request line, which has three parts separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used.
        Ex: GET /path/to/file/index.html HTTP/1.0
        Ex: GET /images/logo.gif HTTP/1.1, which requests the file logo.gif from the /images directory
    - Headers, such as Accept-Language: en
    - An empty line
    - An optional message body

    The request line and headers must all end with CRLF (that is, a carriage return followed by a line feed). The empty line must consist of only CRLF and no other whitespace. In the HTTP/1.1 protocol, all headers except Host are optional.
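
    The layout of a request message can be sketched as a small helper that joins the request line and headers with CRLF and ends the header section with an empty line. The helper name build_request and the host www.somehost.com are illustrative assumptions.

```python
def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP request: request line, headers, empty line, body."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request(
    "GET",
    "/images/logo.gif",
    {"Host": "www.somehost.com", "Accept-Language": "en"},
)
print(request)
# b'GET /images/logo.gif HTTP/1.1\r\nHost: www.somehost.com\r\nAccept-Language: en\r\n\r\n'
```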

  • Initial Response Line

    The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:

        HTTP/1.0 200 OK
    or
        HTTP/1.0 404 Not Found
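
    Because the reason phrase can itself contain spaces, a parser should split the status line at most twice. A minimal sketch; the function name parse_status_line is an assumption, and a line without a reason phrase would raise ValueError here.

```python
def parse_status_line(line):
    """Split a status line into (version, status code, reason phrase)."""
    # maxsplit=2 keeps multi-word reason phrases such as "Not Found" intact.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

print(parse_status_line("HTTP/1.0 200 OK\r\n"))
# ('HTTP/1.0', 200, 'OK')
print(parse_status_line("HTTP/1.0 404 Not Found\r\n"))
# ('HTTP/1.0', 404, 'Not Found')
```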

  • Header Lines

    Header lines provide information about the request or response, or about the object sent in the message body. They are in the usual text header format: one line per header, of the form "Header-Name: value", ending with CRLF.
    - The header name is not case-sensitive.
    - Any number of spaces or tabs may be between the ":" and the value.
    - Header lines beginning with a space or tab are actually part of the previous header line, folded into multiple lines for easy reading.
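
    The three header rules (case-insensitive names, optional whitespace after the colon, folded continuation lines) can be sketched as follows. The function name parse_headers and the X-Note header are made up for illustration, and repeated headers simply overwrite each other in this simplified version.

```python
def parse_headers(lines):
    """Parse header lines (CRLF already stripped) into a dict with lowercase names."""
    headers = {}
    last = None
    for line in lines:
        # A line starting with space or tab continues the previous header.
        if line[:1] in (" ", "\t") and last is not None:
            headers[last] += " " + line.strip()
            continue
        name, _, value = line.partition(":")
        last = name.strip().lower()      # names are not case-sensitive
        headers[last] = value.strip()    # drop spaces or tabs around the value
    return headers

raw = [
    "Content-Type:text/html",
    "User-Agent:   HTTPTool/1.0",
    "X-Note: first part,",
    "\tsecond part",
]
print(parse_headers(raw))
# {'content-type': 'text/html', 'user-agent': 'HTTPTool/1.0', 'x-note': 'first part, second part'}
```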

  • Sample HTTP Exchange

    To retrieve the file at the URL http://www.somehost.com/path/file.html:
    (1) Open a socket to the host www.somehost.com.
    (2) Use port 80, the default, because no port is specified in the URL.
    (3) Send something like the following through the socket:

        GET /path/file.html HTTP/1.0
        From: [email protected]
        User-Agent: HTTPTool/1.0
        [blank line here]

  • Sample HTTP Exchange

    The server should respond with something like the following, sent back through the same socket:

        HTTP/1.0 200 OK
        Date: Fri, 31 Dec 1999 23:59:59 GMT
        Content-Type: text/html
        Content-Length: 1354

        Happy New Millennium!
        (more file contents)
          .
          .
          .

    After sending the response, the server closes the socket.
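
    The exchange can be tried without a network by wiring a client and a toy server together with socket.socketpair(). This is only a sketch of the HTTP/1.0 pattern (one request, one response, server closes); toy_server and fetch are hypothetical names, and the response body is shortened.

```python
import socket
import threading

REQUEST = (
    b"GET /path/file.html HTTP/1.0\r\n"
    b"User-Agent: HTTPTool/1.0\r\n"
    b"\r\n"
)

def toy_server(conn):
    """Read one request, send one response, then close (HTTP/1.0 style)."""
    received = b""
    while b"\r\n\r\n" not in received:   # wait for the blank line
        chunk = conn.recv(1024)
        if not chunk:
            break
        received += chunk
    body = b"Happy New Millennium!\n"
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    conn.sendall(head + body)
    conn.close()

def fetch(client):
    """Send the request and read until the server closes the socket."""
    client.sendall(REQUEST)
    response = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:                    # empty read: server closed the socket
            break
        response += chunk
    client.close()
    return response

client_sock, server_sock = socket.socketpair()
worker = threading.Thread(target=toy_server, args=(server_sock,))
worker.start()
response = fetch(client_sock)
worker.join()

head, _, body = response.partition(b"\r\n\r\n")
print(head.decode("ascii").splitlines()[0])   # HTTP/1.0 200 OK
print(body)                                   # b'Happy New Millennium!\n'
```

    Against a real server, the socket pair would be replaced by a TCP connection to the host on port 80.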

  • Common HTTP status codes

    1xx Informational
    - 100 Continue - The client should continue with its request.

    2xx Successful
    - 200 OK - The request was fulfilled successfully by the server and the response sent to the client.

    3xx Redirection
    - 302 Found - The server found what the client requested, but it is in a different location.
    - 304 Not Modified - The document requested has not been modified since the last time the client requested the same document.

    4xx Client Error
    - 404 Not Found - The server has not found anything matching the Request-URI.
    - 401 Unauthorized - The request requires user authentication.
    - 403 Forbidden - The server understood the request, but is refusing to fulfill it.

    5xx Server Error
    - 500 Internal Server Error - The server encountered an unexpected condition which prevented it from fulfilling the request.
    - 503 Service Unavailable - The server is currently unable to handle the request due to a temporary overloading or maintenance of the server.
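
    The first digit of a status code gives its class, so a client can react sensibly even to a code it does not recognise. A small sketch using the standard http.HTTPStatus enum for reason phrases; the describe helper is an illustrative assumption.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code):
    """Return 'code phrase [class]' for a numeric status code."""
    family = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                   # code not known to the standard library
        phrase = "(no standard phrase)"
    return f"{code} {phrase} [{family}]"

for code in (100, 200, 302, 404, 503):
    print(describe(code))
# 100 Continue [Informational]
# 200 OK [Successful]
# 302 Found [Redirection]
# 404 Not Found [Client Error]
# 503 Service Unavailable [Server Error]
```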

  • HTTP Versions

    The client states the version it uses at the beginning of the request. The server uses the same or an earlier version in the response.

    HTTP/0.9
    Deprecated. Supports only one command, GET, which does not specify the HTTP version. Does not support headers. Since this version does not support POST, the client cannot pass much information to the server.

    HTTP/1.0 (May 1996)
    This is the first protocol revision to specify its version in communications and is still in wide use, especially by proxy servers.

    HTTP/1.1
    Generally speaking, it is a superset of HTTP/1.0. Improvements include:
    - Faster response, by allowing multiple transactions to take place over a single persistent connection.
    - Faster response and great bandwidth savings, by adding cache support.
    - Faster response for dynamically-generated pages, by supporting chunked encoding, which allows a response to be sent before its total length is known.
    - Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address.

    HTTP/1.2
    The initial 1995 working drafts of the document "PEP - an Extension Mechanism for HTTP" (which proposed the Protocol Extension Protocol, abbreviated PEP) were prepared by the World Wide Web Consortium and submitted to the Internet Engineering Task Force.

  • HTTP Features

    (1) It is a stateless protocol.
    - Every request is a fresh request.
    - No session or sequence number field is retained for the next exchange.
    - E-commerce applications require a state management mechanism.
    - The cookie provides an HTTP state management mechanism.

  • HTTP Features

    (2) HTTP is a file-transfer-like protocol.
    - It is more efficient than FTP for a file transfer on the Internet.
    - There is no command-line overhead: a single request is enough to fetch a file. HTTP is simple.

    (3) HTTP is very light.
    - Its message format is small, which makes it fast compared with other existing protocols.

  • HTTP Features

    (4) It is flexible.
    - A broken connection does not affect later requests, because the protocol is stateless.

    (5) It follows an object-oriented style.
    - Methods are applied to objects identified by URL.

    (6) From HTTP/1.0 onwards
    - It supports MIME type file definition.
    - The HTTP-specific methods are GET, POST, HEAD, CONNECT, PUT, DELETE, TRACE, OPTIONS.

  • HTTP Features

    (7) Provision for user authentication exists.
    - HTTP/1.1: Digest Access Authentication avoids transmitting the password as plain text.

    (8) The Host header field supports virtual hosting.

  • HTTP Features

    (10) A status code is included in the response.
    (11) A resource provided by the server can be cached.
    (12) Byte range specification allows a large response to be retrieved in parts.
    (13) The client can select among various characteristics of a resource on retrieval.
    - The client sends the preferred language and encoding in the request headers when retrieving the resource, and the server can read them from its environment variables.

  • HTTP Request Methods

    HTTP defines eight methods (sometimes referred to as "verbs") indicating the desired action to be performed on the identified resource.

    HEAD
    Asks for a response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers and for checking the characteristics of a resource without downloading the entire content, thus saving bandwidth.

    GET
    Requests a representation of the specified resource. It is the most common method used on the Web today.

    POST
    Submits data to be processed (e.g. from an HTML form) to the identified resource, for example by a CGI script. The data is included in the body of the request. This may result in the creation of a new resource, the update of existing resources, or both.

  • HTTP Request Methods

    PUT
    Uploads a representation of the specified resource.

    DELETE
    Deletes the specified resource.

    TRACE
    Echoes back the received request, so that a client can see what intermediate servers are adding or changing in the request.

    OPTIONS
    Returns the HTTP methods that the server supports for the specified URI. This can be used to check the functionality of a web server by requesting '*' instead of a specific resource.

    CONNECT
    Converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy.

    HTTP servers are supposed to implement at least the GET and HEAD methods and, whenever possible, also the OPTIONS method.

  • GET vs POST

    GET
    - The form data is retrieved from the QUERY_STRING environment variable.
    - The form data is part of the URL and ends up in the server logs.
    - The query string shows up in the history list as well as in user bookmarks.
    - The length of the query string is limited in practice. A limit of 255 characters is often quoted for older software, although HTTP itself does not define one.
    - Login details with a password should therefore never be sent using the GET method.
    - As the data travels in the address bar (URL), spaces and some characters such as the ampersand (&) cannot be used as-is and must be encoded.

    POST
    - The data is not passed through the URL as a query string in the user's address bar. It is sent in the body of the request.
    - The form data is available on STDIN, which is a handle for the standard input stream.
    - The POST data is typically not recorded in the server logs.
    - The data is not kept in the history or bookmarks, so it is less exposed than with GET.
    - HTTP places no size limit on the data sent in the request body.
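
    The difference is easiest to see by sending the same form data both ways. In this sketch the form fields, the path /cgi-bin/form.cgi, and the host name are made-up values; urllib.parse.urlencode takes care of the space and ampersand restrictions.

```python
from urllib.parse import urlencode

form = {"name": "Tom & Jerry", "city": "New York"}
encoded = urlencode(form)        # spaces become '+', '&' becomes '%26'
print(encoded)                   # name=Tom+%26+Jerry&city=New+York

# GET: the data travels in the URL as a query string.
get_request = (
    f"GET /cgi-bin/form.cgi?{encoded} HTTP/1.1\r\n"
    "Host: www.somehost.com\r\n"
    "\r\n"
).encode("ascii")

# POST: the same data travels in the message body instead.
body = encoded.encode("ascii")
post_request = (
    "POST /cgi-bin/form.cgi HTTP/1.1\r\n"
    "Host: www.somehost.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii") + body

print(get_request.decode("ascii").splitlines()[0])
print(post_request.decode("ascii").splitlines()[0])
```

    With GET the encoded data is visible in the request line; with POST the request line carries only the path.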

  • HTTP Headers

    HTTP/1.0 defines 16 headers, though none are required. HTTP/1.1 defines 46 headers, and one (Host:) is required in requests. Headers include:
    - From: gives the email address of whoever is making the request, or running the program doing so.
    - User-Agent: identifies the program that is making the request, in the form "Program-name/x.xx", where x.xx is the (mostly) alphanumeric version of the program.
    - Server: is analogous to User-Agent:. It identifies the server software in the form "Program-name/x.xx". For example, one beta version of Apache's server returns "Server: Apache/1.2b3-dev".
    - Last-Modified: gives the modification date of the resource that is being returned. It is used in caching and other bandwidth-saving activities. Use Greenwich Mean Time, in the format
        Last-Modified: Fri, 31 Dec 1999 23:59:59 GMT
    - Host: gives the domain name of the server (for virtual hosting), mandatory since HTTP/1.1.
        Ex: Host: en.wikipedia.org

  • Safe Methods vs. Unsafe Methods

    Safe methods (e.g. HEAD, GET, OPTIONS, and TRACE)
    - They are intended only for information retrieval and should not change the state of the server.
    - They have no side effects.

    Unsafe methods (including POST, PUT and DELETE)
    - They should be displayed to the user in a special way, typically as buttons rather than links, thus making the user aware of possible obligations (such as a button that causes a financial transaction).

  • Idempotent Methods and Web Applications

    Idempotent means that multiple identical requests should have the same effect as a single request. Methods GET, HEAD, OPTIONS and TRACE, being safe, are inherently idempotent. Methods PUT and DELETE are also defined to be idempotent; POST is not.

    The RFC allows a user agent, such as a browser, to assume that any idempotent request can be retried without informing the user. This is done to improve the user experience when connecting to unresponsive or heavily-loaded web servers.

    However, note that idempotence is not enforced by the protocol or the web server; it depends on how the web application handles the request.
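
    A client's retry decision can be sketched as a lookup in two method sets. This follows the HTTP/1.1 specification's classification, in which PUT and DELETE are idempotent and POST is not; the set and function names are illustrative assumptions.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
# Safe methods are idempotent; PUT and DELETE are idempotent but not safe.
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}

def may_retry_silently(method):
    """True if a user agent may repeat the request without asking the user."""
    return method.upper() in IDEMPOTENT_METHODS

for method in ("GET", "PUT", "DELETE", "POST"):
    print(method, may_retry_silently(method))
# GET True
# PUT True
# DELETE True
# POST False
```

    The sets only describe what the methods promise; whether a given application keeps that promise is outside the protocol.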

  • The Message Body

    An HTTP message may have a body of data sent after the header lines.
    - In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there is an error.
    - In a request, this is where user-entered data or uploaded files are sent to the server.

    If an HTTP message includes a body, there are usually header lines in the message that describe the body. These include:
    - Content-Type: gives the MIME type of the data in the body, such as text/html or image/gif.
    - Content-Length: gives the number of bytes in the body.

  • HTTP1.1 Protocol

    To comply with HTTP/1.1, clients must
    - include the Host: header with each request
    - accept responses with chunked data
    - either support persistent connections, or include the "Connection: close" header with each request
    - handle the "100 Continue" response

    Servers must
    - support absolute URLs
    - include the Date: header in each response
    - handle requests with If-Modified-Since: or If-Unmodified-Since: headers
    - support at least the GET and HEAD methods
    - support HTTP/1.0 requests

  • (1) Host: Header

    Starting with HTTP/1.1, one server at one IP address can be multi-homed, i.e. the home of several Web domains. For example, "www.host1.com" and "www.host2.com" can live on the same server. Thus, every HTTP request must specify which host name (and possibly port) the request is intended for, with the Host: header. A complete HTTP/1.1 request might be

        GET /path/file.html HTTP/1.1
        Host: www.host1.com:80
        [blank line here]
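
    On the server side, the Host: value decides which site answers the request. A rough sketch: the VIRTUAL_HOSTS table and its document-root paths are entirely hypothetical, and IPv6 literals in the Host: value are not handled.

```python
# Hypothetical mapping from host name to document root (paths are made up).
VIRTUAL_HOSTS = {
    "www.host1.com": "/sites/host1",
    "www.host2.com": "/sites/host2",
}

def split_host_header(value, default_port=80):
    """Split 'name' or 'name:port' into (lowercase name, port)."""
    host, sep, port = value.strip().partition(":")
    return host.lower(), (int(port) if sep else default_port)

def document_root(host_header):
    """Pick the document root for a request, or None for an unknown host."""
    host, _port = split_host_header(host_header)
    return VIRTUAL_HOSTS.get(host)

print(split_host_header("www.host1.com:80"))   # ('www.host1.com', 80)
print(document_root("www.host2.com"))          # /sites/host2
print(document_root("www.other.com"))          # None
```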

  • (2) Chunked Transfer-Encoding

    If a server wants to start sending a response before knowing its total length (as with long script output), it can use the simple chunked transfer-encoding, which breaks the complete response into smaller chunks and sends them in series. You can identify such a response because it contains the "Transfer-Encoding: chunked" header. All HTTP/1.1 clients must be able to receive chunked messages.

    A chunked message body contains a series of chunks, followed by a line with "0" (zero), followed by optional footers (just like headers), and a blank line. Each chunk consists of two parts:
    - a line with the size of the chunk data in hex, possibly followed by a semicolon and extra parameters that can be ignored, ending with CRLF
    - the data itself, followed by CRLF

    Example

        HTTP/1.1 200 OK
        Date: Fri, 31 Dec 1999 23:59:59 GMT
        Content-Type: text/plain
        Transfer-Encoding: chunked

        1a;
        abcdefghijklmnopqrstuvwxyz
        10
        1234567890abcdef
        0
        [blank line here]

    The chunks can contain any binary data, and may be much larger than the examples here.
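
    Decoding a chunked body amounts to reading a hex size line, taking that many bytes, and repeating until a size of zero. A minimal sketch that works on a complete body already held in memory; the function name decode_chunked is an assumption, and footers are returned unparsed.

```python
def decode_chunked(data):
    """Decode a complete chunked body; return (body, raw footer bytes)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size is hex; anything after ';' is an extension and is ignored.
        size_field = data[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii").strip(), 16)
        pos = line_end + 2
        if size == 0:                # the "0" line ends the chunk series
            break
        body += data[pos:pos + size]
        pos += size + 2              # skip the chunk data and its trailing CRLF
    return body, data[pos:]

raw = (
    b"1a;\r\n"
    b"abcdefghijklmnopqrstuvwxyz\r\n"
    b"10\r\n"
    b"1234567890abcdef\r\n"
    b"0\r\n"
    b"\r\n"
)
body, footers = decode_chunked(raw)
print(body)        # b'abcdefghijklmnopqrstuvwxyz1234567890abcdef'
print(len(body))   # 42
```

    The sizes 1a and 10 are hexadecimal for 26 and 16, which is why the decoded body is 42 bytes long.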

  • (3) Persistent Connections

    In HTTP/0.9 and 1.0, the connection is closed after a single request/response pair. HTTP/1.1 introduced a keep-alive mechanism, where a connection can be reused for more than one request.

    Opening and closing TCP connections takes a substantial amount of CPU time, bandwidth, and memory. In practice, most Web pages consist of several files on the same server, so much can be saved by allowing several requests and responses to be sent through a single persistent connection.

    Persistent connections are the default in HTTP/1.1, so nothing special is required to use them. Just open a connection and send several requests in series (called pipelining), and read the responses in the same order as the requests were sent.

  • (4) The "100 Continue" Response

    During the course of an HTTP/1.1 client sending a request to a server, the server might respond with an interim "100 Continue" response. This means the server has received the first part of the request, and can be used to aid communication over slow links.

    The "100 Continue" response is structured like any HTTP response, i.e. it consists of a status line, optional headers, and a blank line. Unlike other responses, it is always followed by another complete, final response. So, further extending the last example, the full data that comes back from the server might consist of two responses in series, like

        HTTP/1.1 100 Continue

        HTTP/1.1 200 OK
        Date: Fri, 31 Dec 1999 23:59:59 GMT
        Content-Type: text/plain
        Content-Length: 42
        some-footer: some-value
        another-footer: another-value

        abcdefghijklmnopqrstuvwxyz1234567890abcdef

  • (5) Accepting Absolute URLs

    The Host: header is actually an interim solution to the problem of host identification. In future versions of HTTP, requests will use an absolute URL instead of a pathname, like

        GET http://www.somehost.com/path/file.html HTTP/1.2

    To enable this protocol transition, HTTP/1.1 servers must accept this form of request, even though HTTP/1.1 clients won't send it.

  • (6) Persistent Connections and the "Connection: close" Header

    - If an HTTP/1.1 client sends multiple requests through a single connection, the server should send responses back in the same order as the requests.
    - If a request includes the "Connection: close" header, that request is the final one for the connection and the server should close the connection after sending the response.
    - Also, the server should close an idle connection after some timeout period (can be anything; 10 seconds is fine).
    - If you don't want to support persistent connections, include the "Connection: close" header in the response.
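
    A server's decision to keep or close the connection after a response can be sketched as below. The function name is an assumption, headers are expected as a dict with lowercase names, and the unofficial HTTP/1.0 keep-alive extension is deliberately ignored.

```python
def server_should_close(version, headers):
    """Decide whether to close the connection after answering this request."""
    connection = headers.get("connection", "").lower()
    tokens = {token.strip() for token in connection.split(",")}
    if "close" in tokens:
        return True                  # the client marked this as its final request
    if version == "HTTP/1.1":
        return False                 # persistent connections are the default
    return True                      # HTTP/1.0 and earlier: one exchange per connection

print(server_should_close("HTTP/1.1", {}))                       # False
print(server_should_close("HTTP/1.1", {"connection": "close"}))  # True
print(server_should_close("HTTP/1.0", {}))                       # True
```

    An idle timeout would sit outside this function, in the loop that waits for the next request.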

  • (7) The Date: Header

    Caching is an important improvement in HTTP/1.1, and it can't work without timestamped responses. So, servers must timestamp every response with a Date: header containing the current time, in the form

        Ex: Date: Fri, 31 Dec 1999 23:59:59 GMT

    All responses except those with 100-level status (but including error responses) must include the Date: header. All time values in HTTP use Greenwich Mean Time.
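
    In Python, the standard library can already produce this date format: email.utils.formatdate with usegmt=True. A brief sketch; the helper name date_header is an assumption.

```python
import time
from email.utils import formatdate

def date_header(timestamp=None):
    """Return a Date: header line for the given Unix time (default: now)."""
    if timestamp is None:
        timestamp = time.time()
    # usegmt=True yields the 'GMT' suffix that HTTP expects.
    return "Date: " + formatdate(timestamp, usegmt=True)

# 946684799 is one second before 1 Jan 2000 00:00:00 GMT.
print(date_header(946684799))
# Date: Fri, 31 Dec 1999 23:59:59 GMT
```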

  • (8) If-Modified-Since: or If-Unmodified-Since: Headers

    To avoid sending resources that don't need to be sent, thus saving bandwidth, HTTP/1.1 defines the If-Modified-Since: and If-Unmodified-Since: request headers.
    - If-Modified-Since: only send the resource if it has changed since this date.
    - If-Unmodified-Since: the opposite, only send the resource if it has not changed since this date.

    Clients aren't required to use them, but HTTP/1.1 servers are required to honor requests that do use them. The date may arrive in any of three formats:

        If-Modified-Since: Fri, 31 Dec 1999 23:59:59 GMT
        If-Modified-Since: Friday, 31-Dec-99 23:59:59 GMT
        If-Modified-Since: Fri Dec 31 23:59:59 1999

    Although servers must accept all three date formats, HTTP/1.1 clients and servers must only generate the first kind.
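
    Accepting all three date formats can be sketched by trying each pattern in turn with datetime.strptime. The names below are illustrative assumptions, the returned datetimes are naive values understood as GMT, and the default (English) locale is assumed for day and month names.

```python
from datetime import datetime

HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",    # Fri, 31 Dec 1999 23:59:59 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",    # Friday, 31-Dec-99 23:59:59 GMT
    "%a %b %d %H:%M:%S %Y",         # Fri Dec 31 23:59:59 1999
)

def parse_http_date(text):
    """Parse any of the three accepted date formats into a datetime (GMT)."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue                 # try the next format
    raise ValueError(f"unrecognised HTTP date: {text!r}")

def not_modified(last_modified, if_modified_since):
    """True if the resource is unchanged since the date the client sent."""
    return last_modified <= parse_http_date(if_modified_since)

for text in (
    "Fri, 31 Dec 1999 23:59:59 GMT",
    "Friday, 31-Dec-99 23:59:59 GMT",
    "Fri Dec 31 23:59:59 1999",
):
    print(parse_http_date(text))
# Each line prints 1999-12-31 23:59:59
```

    When not_modified returns True, the server can answer with 304 Not Modified instead of sending the resource again.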

  • State information

    A browser can keep state information in:
    - Cookies
    - Hidden form fields
    - URL rewriting
    - Challenge/response

    Cookies
    - Cookies are passed at the HTTP layer.
    - The HTTP format is Set-Cookie: cookie-value
    - Cookies are sent from the server to the browser and returned from the browser to the server.
    - Cookies have a lifetime, a domain, and a flag to return on secure or non-secure channels.
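
    Python's standard http.cookies module can illustrate both directions of the cookie exchange. In this sketch the cookie name session, its value, the domain, and the one-hour lifetime are made-up values.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header sent to the browser.
outgoing = SimpleCookie()
outgoing["session"] = "abc123"
outgoing["session"]["domain"] = "www.somehost.com"   # the cookie's domain
outgoing["session"]["max-age"] = 3600                # lifetime in seconds
outgoing["session"]["secure"] = True                 # return on secure channels only
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Server side, next request: parse the Cookie header the browser returned.
incoming = SimpleCookie()
incoming.load("session=abc123")
print(incoming["session"].value)   # abc123
```

    The returned value lets the server link the new request to earlier ones, which HTTP alone would not do.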