
HTTP: HyperText Transfer Protocol

Zach Kokkeler, Scott Hansen, Mustafa Ashurex

OVERVIEW

• What Is HTTP?

• Creator

• Client Request

• Server Response

• Statelessness

HTTP ?

• We all use it every day; we type it into our browsers (http://), but do we really know what HTTP is?

• HyperText Transfer Protocol is a protocol, or way of communicating, that computers around the world use to convey information on the World Wide Web.

• It is important to know that the Web and the Internet are NOT the same thing. The Internet is merely the international network of connected computers. The Web and HTTP use the internet to transfer documents from one computer to another.

• Common users of HTTP include Firefox, Internet Explorer, Netscape, and many other programs.


HTTP ? (continued)

• In greater detail:

• HTTP is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

• Obtained from: http://www.faqs.org/rfcs/rfc2616.html

TIM BERNERS-LEE

• Born in London; Oxford graduate

• Created both HTTP and HTML in 1989 while at CERN in Geneva (the European physics lab)

• Wanted the Web to be an open-source, editable network.

• Director of the World Wide Web Consortium

• Headquarters in France, Japan, and the U.S. (MIT)

• Senior Researcher at MIT

“It is just as important to be able to edit the web as browse it.”

OSI APPLICATION LAYER

• The OSI, or Open System Interconnection, model defines a networking framework for implementing protocols in seven layers. Control is passed from one layer to the next, starting at the application layer in one station, proceeding to the bottom layer, over the channel to the next station and back up the hierarchy.

• The application layer provides the interface to the communications environment which is used by the application process. It is responsible for communicating application process parameters.

TCP

• HTTP generally operates over TCP connections, usually to port 80, though another port or any other reliable transport protocol can be used. After a successful connection, the client transmits a request message to the server, which sends a reply message back. HTTP messages are human-readable, and an HTTP server can be operated manually with a command such as telnet server 80.
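Below is a minimal sketch of that telnet-style exchange in Python: it opens a raw TCP connection on port 80 and sends a hand-written request. The host example.com is only a placeholder; any server reachable on port 80 would do.

```python
import socket

host = "example.com"   # placeholder host; any HTTP server reachable on port 80 works
request = (
    "GET / HTTP/1.1\r\n"
    f"Host: {host}\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection((host, 80)) as sock:
    sock.sendall(request.encode("ascii"))       # the request is plain, human-readable text
    response = b""
    while chunk := sock.recv(4096):             # read until the server closes the connection
        response += chunk

print(response.decode("iso-8859-1")[:600])      # status line, headers, start of the body
```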

DNS

• Short for Domain Name System (or Service/Server), an Internet service that resolves domain names into IP addresses. Because domain names are alphabetic, they are easier to remember; the Internet, however, is based on IP addresses. Every time you use a domain name, therefore, a DNS service must resolve the name into the corresponding IP address. For example, the domain name www.oregonstate.edu resolves to 128.193.4.112.

• The DNS system is, in fact, its own network. If one DNS server doesn't know how to translate a particular domain name, it asks another one, and so on, until the correct IP address is returned.
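A short sketch of that lookup step, using the standard library's resolver interface (which asks the system's configured DNS servers); the address printed today may differ from the one quoted on the slide.

```python
import socket

name = "www.oregonstate.edu"
address = socket.gethostbyname(name)   # DNS lookup; returned "128.193.4.112" at the time of the slides
print(f"{name} resolves to {address}")
```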

MIME

• Short for Multipurpose Internet Mail Extensions, a specification for formatting non-ASCII messages so that they can be sent over the Internet. Many e-mail clients now support MIME, which enables them to send and receive graphics, audio, and video files via the Internet mail system. In addition, MIME supports messages in character sets other than ASCII.

• In addition to e-mail applications, Web browsers also support various MIME types. This enables the browser to display or output files that are not in HTML format. MIME was defined in 1992 by the Internet Engineering Task Force (IETF). A new version, called S/MIME, supports encrypted transfers.
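As a quick illustration of MIME typing, the sketch below maps a few file names to the Content-type labels a browser or mail client would use to decide how to display them (the exact mappings depend on the local mimetypes tables).

```python
import mimetypes

for filename in ["index.html", "photo.jpeg", "song.mp3", "notes.txt"]:
    mime_type, encoding = mimetypes.guess_type(filename)   # e.g. index.html -> text/html
    print(f"{filename:12} -> {mime_type}")
```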

CLIENT / SERVER RELATIONSHIP

• HTTP is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server.

• The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible entity-body content.
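The sketch below runs one such request/response cycle with Python's http.client, so the method, URI, protocol version, status line, and header block are all visible. The host www.w3.org is borrowed from the sources slide; the exact status and headers returned depend on that server.

```python
import http.client

conn = http.client.HTTPConnection("www.w3.org", 80)
conn.request("GET", "/")                        # request line: method, URI, protocol version
resp = conn.getresponse()

print(resp.version, resp.status, resp.reason)   # the status line, e.g. 11 (HTTP/1.1) 200 OK
for name, value in resp.getheaders():           # the MIME-like header block
    print(f"{name}: {value}")
body = resp.read()                              # the entity body, if any
conn.close()
```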


CLIENT REQUEST

• GET: By far the most common method, used to request a specified URL.

• HEAD: Identical to GET, except that the page content is not returned; just the headers are. Useful for retrieving meta-information.

• POST: Similar to GET, except that a message body, typically containing key-value pairs from an HTML form submission, is included in the request.

• PUT: Used for uploading files to a specified URI.

• DELETE: Deletes a resource (files, etc.)

• TRACE: Echoes back the received request, so that a client can see what intermediate servers are adding or changing in the request.

• OPTIONS: Returns the HTTP methods that the server supports.
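A small sketch contrasting two of these methods: a HEAD request to the same URL returns the same headers as GET but no body. The host example.com is only a placeholder.

```python
import http.client

conn = http.client.HTTPConnection("example.com")

conn.request("HEAD", "/")
head_resp = conn.getresponse()
print("HEAD:", head_resp.status, "body bytes:", len(head_resp.read()))   # headers only, empty body

conn.request("GET", "/")
get_resp = conn.getresponse()
print("GET: ", get_resp.status, "body bytes:", len(get_resp.read()))     # headers plus the document

conn.close()
```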

HEADERS

• One of the server header fields will be Content-type:, which specifies a MIME type to describe how the document should be interpreted.

• If the document has moved, the server can specify its new location with a Location: field, allowing the client to retry the request using the new URL.

• The Authorization: and WWW-Authenticate: fields allow access controls to be placed on Web documents.

• The Referer: field allows the client to tell the server the URL of the document that triggered this request, permitting servers to trace clients through requests.
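The sketch below reads a few of these fields by hand. Because http.client does not follow redirects automatically, a 301/302 response exposes its Location: field, and the request supplies its own Referer:. The host and path are placeholders; whether a redirect actually occurs depends on the server.

```python
import http.client

conn = http.client.HTTPConnection("example.org")
conn.request("GET", "/old-page", headers={"Referer": "http://example.org/index.html"})
resp = conn.getresponse()

print("Status:      ", resp.status, resp.reason)
print("Content-type:", resp.getheader("Content-Type"))
if resp.status in (301, 302):
    print("Moved to:    ", resp.getheader("Location"))   # the client may retry with this URL
conn.close()
```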

SERVER RESPONSE

• Here is a sample of HTTP 1.1 exchange headers:

> GET / HTTP/1.1
>
< HTTP/1.1 200 OK
< Date: Wed, 18 Sep 2005 20:18:59 GMT
< Server: Apache/1.0.0
< Content-type: text/html
< Content-length: 1579
< Last-modified: Mon, 22 Jul 2005 22:23:34 GMT
<
< HTML document
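For the server side of such an exchange, here is a minimal sketch using the standard library's http.server: it answers GET requests with a status line, Content-type and Content-length headers, and a short HTML document. The port 8080 is an arbitrary choice; you can poke at it manually with telnet localhost 8080.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body><h1>Hello, HTTP</h1></body></html>"
        self.send_response(200)                               # writes the status line
        self.send_header("Content-type", "text/html")
        self.send_header("Content-length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                                # the HTML document

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), HelloHandler).serve_forever()
```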


STATELESS

• HTTP is stateless; i.e., it does not retain data between requests. To overcome this, programmers use a variety of methods to create state, or memory of acquired data.

• Client-side files such as cookies store recent information that may be transferred back to the server.

• Server-side databases attached to the server can also store data that needs to be collected at a future time.

• Another method is to pass parameters through the page itself, often in the URL or headers.
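A sketch of the cookie technique: since the server keeps no memory of earlier requests, it hands the client a Set-Cookie header and reads the Cookie header back on the next request. SimpleCookie is the standard library's parser for these header values; the session_id name and value here are made up for illustration.

```python
from http.cookies import SimpleCookie

# Server side: attach state to the response.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"                              # hypothetical session token
print("Set-Cookie:", outgoing["session_id"].OutputString())    # goes out in the server response

# Client side: the browser sends the cookie back with its next request.
incoming = SimpleCookie()
incoming.load("session_id=abc123")                             # value taken from the Cookie: request header
print("Recovered state:", incoming["session_id"].value)
```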

ROBOTS

• A robot is a type of HTTP client. Search engines look in the root of a domain for a file named robots.txt. The robots file tells these search engines which files they may download.

• Normal web browsers (Explorer/Firefox) are not robots because they are operated by a human and do not automatically retrieve referenced documents (other than inline images).

• Robots are often referred to as Web Wanderers, Web Crawlers, or Spiders.

ROBOTS

• Robots Exclusion Protocol is a method that allows Web site administrators to indicate to visiting robots which parts of their site should not be visited by the robot.

• There are two different ways to tell the visiting robot: through HTML META tags, and through a file called robots.txt in the root of the web server.

• The HTML tag can be used by web programmers who may not have root access to the server to disallow link harvesting or page indexing. It can be applied on a page-by-page basis by including a tag like either of the following:

ROBOTS (continued)

<META NAME="ROBOTS" CONTENT="NOINDEX">

• This tag is a request for the robot not to index the page.

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

• This tag is a request to not allow links to be harvested.
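As a sketch of how a well-behaved robot might honor these tags, the snippet below scans a page's HTML for a robots META tag and checks whether it forbids indexing; the sample page string is made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives += [d.strip().upper() for d in attrs.get("content", "").split(",")]

page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head><body>...</body></html>'
parser = RobotsMetaParser()
parser.feed(page)
print("May index page:  ", "NOINDEX" not in parser.directives)    # False for this sample page
print("May follow links:", "NOFOLLOW" not in parser.directives)   # True for this sample page
```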

ROBOTS.txt

• If you have administrator rights, you can write a structured text file to indicate to robots that certain parts of your server are off-limits to some or all of them. It is best explained with an example:

User-agent: webcrawler
Disallow:

User-agent: *
Disallow: /tmp
Disallow: /logs

• The first record specifies that the robot 'webcrawler' has nothing disallowed.

• The second indicates that all others should not visit URLs starting with /tmp or /logs.

• Note that '*' is a special token meaning "any other User-agent"; you cannot use wildcard patterns or regular expressions in either User-agent or Disallow lines.
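A short sketch of checking these rules with the standard library's robots.txt parser; the rules are loaded from a string here rather than fetched from a live server.

```python
import urllib.robotparser

rules = """\
User-agent: webcrawler
Disallow:

User-agent: *
Disallow: /tmp
Disallow: /logs
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("webcrawler", "/tmp/file.html"))   # True: nothing is disallowed for webcrawler
print(parser.can_fetch("otherbot", "/tmp/file.html"))     # False: /tmp is off-limits to everyone else
print(parser.can_fetch("otherbot", "/index.html"))        # True: /index.html is not disallowed
```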

SOURCES

• www.w3.org
• www.luc.edu
• www.dns.net
• en.wikipedia.org
• www.robotstxt.org
• www.webopedia.com
• www.searchengineworld.com