internet programming - vocational training councillwsua.vtc.edu.hk/~cim/cmc3524/note2004.pdf ·...

Internet Programming CMC3524

Dr. S.F. Wu (R323, x706, [email protected])
Continuous Assessment: 30% (Tests 10%, Assignment 10%, Lab 10%)
Examination: 70%
Reference book: Programming the World Wide Web, R.W. Sebesta, 2nd edition, Addison Wesley

Internet Programming Slide 1

Background

Origins
• ARPAnet - late 1960s and early 1970s
  • Network reliability
  • For ARPA-funded research organizations
• BITnet, CSnet - late 1970s and early 1980s
  • Email and file transfer for other institutions
• NSFnet - 1986
  • Originally for non-DOD funded places
  • Initially connected five supercomputer centers
  • By 1990, it had replaced ARPAnet for non-military uses
  • Soon became the network for all

Internet Programming Slide 2

What is the Internet?

A world-wide network of computer networks.

At the lowest level, since 1982, all connections use TCP/IP.

TCP/IP hides the differences among devices connected to the Internet.

Internet Protocol (IP) Addresses
• Every node has a unique numeric address
• Form: 32-bit binary number, usually written as four decimal numbers separated by dots (e.g. 191.57.126.0)
• New standard, IPv6, has 128 bits (1998)
• Organizations are assigned groups of IPs for their computers

Domain names
• Form: host-name.domain-names (e.g. www.microsoft.com)
• First domain is the smallest; last is the largest
• Last domain specifies the type of organization
• Fully qualified domain name - the host name and all of the domain names
• DNS servers - convert fully qualified domain names to IPs

Internet Programming Slide 3

World Wide Web

By the mid-1980s, several different protocols had been invented and were being used on the Internet, all with different user interfaces (e.g. telnet, ftp, usenet). They all run on top of TCP/IP. The WWW was designed as a possible solution to the proliferation of different protocols being used on the Internet.

Origins
• Tim Berners-Lee at CERN proposed the Web in 1989
• Purpose: to allow scientists to have access to many databases of scientific work through their own computers
• Document form: hypertext as HTML
• Browser and Web server
• HTTP

Internet Programming Slide 4

Web Programming

HTML
• Text file to describe the general form and layout of documents
• An HTML document is a mix of content and controls
• Controls have tags and their attributes
• Tags often delimit content and specify something about how the content should be arranged in the document
• Attributes provide additional information about the content of a tag
• Text editor (Notepad) or HTML editor (FrontPage)
• Cascading Style Sheets (CSS)

JavaScript
• A client-side HTML-embedded scripting language
• Only related to Java through syntax
• Dynamically typed and object-based rather than class-based object-oriented
• Provides a way to access elements of HTML documents and dynamically change them

Perl
• Provides server-side computation for HTML documents, through CGI
• Perl is good for CGI programming because:
  • Direct access to operating system functions

Internet Programming Slide 5

Perl (continued)
  • Powerful character string pattern-matching operations
  • Access to database systems
• Perl is highly platform independent, and has been ported to all common platforms
• Perl is not just for CGI

PHP
• A server-side scripting language
• An alternative to CGI
• Great for form processing and database access through the Web

Database Connectivity
• Dynamic contents
• RDBMS, SQL
• PHP-MySQL

Socket Programming
• The guts
• An Introduction to Sockets
• Java Connection-Oriented Classes
• Java Datagram Classes
• An HTTP Server Application

Internet Programming Slide 6

Operation of the Web

The Web software is designed around a distributed client-server architecture. A Web client (called a Web browser if it is intended for interactive use) is a program which can send requests for documents to any Web server. A Web server is a program that, upon receipt of a request, sends the document requested (or an error message if appropriate) back to the requesting client.

Using a distributed architecture means that a client program may be running on a completely separate machine from that of the server, possibly in another room or even in another country. Because the task of document storage is left to the server and the task of document presentation is left to the client, each program can concentrate on its own duties and progress independently of the other. Because servers usually operate only when documents are requested, they put a minimal amount of workload on the computers they run on.

http://members.iworld.net/joo/physics/overview/node7.html

Internet Programming Slide 7

Operation of the Web

Running a Web client, the user selects a hyperlink in a piece of hypertext connecting to another document, for example “The history of computers”. The Web client uses the address associated with that hyperlink to connect to the Web server at a specified network address and asks for the document associated with “The history of computers”. The server responds by sending the text and any other media within that text (pictures, sounds, or movies) to the client, which the client then renders for presentation on the user's screen.

The language that Web clients and servers use to communicate with each other is called the Hypertext Transfer Protocol (HTTP). All Web clients and servers must be able to speak HTTP in order to send and receive hypermedia documents. A popular Web server is Apache.

Internet Programming Slide 8

Open Systems Interconnection (OSI)

In the early 1980s, manufacturers began to standardize networking so that networks from different manufacturers could communicate.
• International Organization for Standardization (ISO)
• Institute of Electrical and Electronics Engineers (IEEE)

Open Systems Interconnection (OSI)
• A networking model developed by ISO to identify and standardize all the levels of communication needed in networking

A layered model:
• allows modular engineering
• reduces complexity
• accelerates evolution
• standardizes interfaces
• supports interoperable technology
• simplifies learning

Internet Programming Slide 9

OSI Model: Concept

Internet Programming Slide 10

OSI Model: The Seven Layers

Internet Programming Slide 11

OSI

Application Layer
• Interfaces with application software such as web browsers or web servers.

Presentation Layer
• Receives requests for files from the Application layer and presents the requests to the Session layer.
• Reformats, compresses, or encrypts data as necessary.

Session Layer
• Establishes and maintains a session between two networked stations or hosts.

Transport Layer
• Responsible for error checking.
• Requests a resend when the data is corrupted.
• Guarantees successful delivery of data.

Network Layer
• Divides a block of data into packets (datagrams) that are small enough to travel over a network.
• Responsible for routing (finding the best possible route by which to send the data packets over a group of networks).

Internet Programming Slide 12

Network Layer (continued)
• Reassembles the packets once they reach their destination.

Data Link Layer
• Disassembles packets of data into smaller units (frames) as needed to transport over the network.
• Reassembles the data at the other end.

Physical Layer
• Passes data packets onto the cabling media.

http://www.infocom.cqu.edu.au/Courses/win2001/COIT13147/Animations/1_ISO_7_layer_reference_model/
http://www.infocom.cqu.edu.au/Courses/win2001/COIT13147/Animations/2_Using_the_ISO_7_layer_reference_model/

Internet Programming Slide 13

Network Device and OSI

False. IPX belongs to the network layer; error recovery belongs to the transport layer.

Internet Programming Slide 14

OSI and TCP/IP

The focus in the TCP/IP world is on agreeing on a protocol standard which can be made to work in diverse heterogeneous networks. The focus in the OSI world has always been more on the standard than on the implementation of the standard.

Internet Programming Slide 15

TCP, UDP and IP

• TCP: connection-oriented
• UDP: connectionless
• How about IP?

Internet Programming Slide 16

TCP

TCP (Transmission Control Protocol) is a connection-based protocol that provides a reliable flow of data between two computers. When two applications want to communicate with each other reliably, they establish a connection and send data back and forth over that connection. This is analogous to making a telephone call. If you want to speak to Aunt Beatrice in Kentucky, a connection is established when you dial her phone number and she answers. You send data back and forth over the connection by speaking to one another over the phone lines. Like the phone company, TCP guarantees that data sent from one end of the connection actually gets to the other end and in the same order it was sent. Otherwise, an error is reported.

TCP provides a point-to-point channel for applications that require reliable communications. The Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Telnet are all examples of applications that require a reliable communication channel. The order in which the data is sent and received over the network is critical to the success of these applications. When HTTP is used to read from a URL, the data must be received in the order in which it was sent. Otherwise, you end up with a jumbled HTML file, a corrupt zip file, or some other invalid information.
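The connection-based exchange described above can be sketched with Python's standard socket module. This is only an illustrative sketch: the function names start_echo_server and echo_once are hypothetical, the server handles a single connection, and everything runs over the loopback address 127.0.0.1.

```python
import socket
import threading


def start_echo_server(host="127.0.0.1"):
    """Start a one-shot TCP echo server; return (thread, port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))            # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _addr = server.accept()     # the "call" is answered here
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:              # client finished sending
                    break
                conn.sendall(data)        # echo the bytes back unchanged
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, port


def echo_once(port, message, host="127.0.0.1"):
    """Connect, send message, and read back the complete echo."""
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)   # tell the server we are done
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:                 # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling start_echo_server() and then echo_once(port, b"hello") should return the same bytes in the same order, which is the guarantee TCP provides.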

Internet Programming Slide 17

UDP

UDP (User Datagram Protocol) is a protocol that sends independent packets of data, called datagrams, from one computer to another with no guarantees about arrival. Unlike TCP, UDP is not connection-based. Sending datagrams is much like sending a letter through the postal service: the order of delivery is not important and is not guaranteed, and each message is independent of any other.

For many applications, the guarantee of reliability is critical to the success of the transfer of information from one end of the connection to the other. However, other forms of communication don't require such strict standards. In fact, they may be slowed down by the extra overhead, or the reliable connection may invalidate the service altogether.

Many firewalls and routers have been configured not to allow UDP packets. If you're having trouble connecting to a service outside your firewall, or if clients are having trouble connecting to your service, ask your system administrator if UDP is permitted.
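For comparison with TCP, the following is a minimal sketch of connectionless communication using Python's socket module. The function name udp_demo is hypothetical, and the example runs over the loopback interface, where loss is unlikely; on a real network some datagrams might never arrive and the code would simply return fewer messages.

```python
import socket


def udp_demo(messages, host="127.0.0.1"):
    """Send each message as an independent datagram and collect what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind((host, 0))              # port 0: let the OS pick a free port
    receiver.settimeout(2.0)              # do not wait forever for lost datagrams
    port = receiver.getsockname()[1]

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    received = []
    try:
        for message in messages:
            # No connection is set up: every datagram carries its own address.
            sender.sendto(message, (host, port))
        for _ in messages:
            data, _addr = receiver.recvfrom(2048)
            received.append(data)
    except socket.timeout:
        pass                              # a datagram was lost; UDP reports nothing
    finally:
        sender.close()
        receiver.close()
    return received
```

Note that the sender never learns whether a datagram arrived; any acknowledgement scheme would have to be built by the application itself.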

Internet Programming Slide 18

Ports

Generally speaking, a computer has a single physical connection to the network. All data destined for a particular computer arrives through that connection. However, the data may be intended for different applications running on the computer. So how does the computer know to which application to forward the data? Through the use of ports.

Data transmitted over the Internet is accompanied by addressing information that identifies the computer and the port for which it is destined. The computer is identified by its 32-bit IP address, which IP uses to deliver data to the right computer on the network. Ports are identified by a 16-bit number, which TCP and UDP use to deliver the data to the right application.

In connection-based communication such as TCP, a server application binds a socket to a specific port number. This has the effect of registering the server with the system to receive all data destined for that port. A client can then meet with the server at the server's port.

Internet Programming Slide 19

Ports

In connectionless communication such as UDP, the datagram packet contains the port number of its destination and UDP routes the packet to the appropriate application.

Port numbers range from 0 to 65,535 because ports are represented by 16-bit numbers. The port numbers ranging from 0 to 1023 are restricted; they are reserved for use by well-known services such as HTTP (80) and FTP (21 for control, 20 for data) and other system services. These ports are called well-known ports. Your applications should not attempt to bind to them.
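The port ranges above can be checked with a few lines of Python. This is a small sketch; the names PORT_MAX, WELL_KNOWN_MAX and describe_port are hypothetical, and the label "other" simply means "outside the well-known range".

```python
PORT_MAX = 2**16 - 1        # largest value a 16-bit port number can hold
WELL_KNOWN_MAX = 1023       # upper end of the restricted, well-known range


def describe_port(port):
    """Classify a port number as 'well-known' or 'other'."""
    if not 0 <= port <= PORT_MAX:
        raise ValueError(f"port {port} does not fit in 16 bits")
    return "well-known" if port <= WELL_KNOWN_MAX else "other"
```

For instance, describe_port(80) falls in the well-known range, while a port such as 8080 does not and is therefore a reasonable choice for your own server.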

Internet Programming Slide 20

TCP and UDP Datagram

[Figure: TCP datagram]
[Figure: UDP datagram]

What can you conclude from these two datagrams?

Internet Programming Slide 21

IP Datagram

The datagram fields are clarified below:

VERS - the IP version number (currently binary 0100 (4), but can now also be version 6). All nodes must use the same version.
HLEN - the header length in 32-bit words. For example, a value of 6 means the header holds 6 x 32-bit words, i.e. 24 bytes. The minimum size is 5 words (20 bytes) and the maximum is 15 words (60 bytes).
Type of Service - how the datagram should be handled, e.g. delay, precedence, reliability, minimum cost, throughput etc. This TOS field is now used by Differentiated Services and is called the Diff Serv Code Point (DSCP).
Total Length - the number of octets that the IP datagram takes up including the header. The maximum size that an IP datagram can be is 65,535 octets.
Identification - a number assigned to a datagram and copied into each of its fragments, to help in the reassembly of fragmented datagrams.

Internet Programming Slide 22

IP Datagram

Flags - Bit 0 is reserved and is always 0. Bit 1 (Don't Fragment) indicates whether a datagram may be fragmented (0) or not (1). Bit 2 (More Fragments) tells the receiving unit whether more fragments are still to come (1) or this fragment is the last one in the datagram (0).
Frag Offset - the position of this fragment's data within the original datagram, in units of 8 octets (64 bits); it is used in the reassembly process. Fragmentation is needed because different sized Maximum Transmission Units (MTUs) are used throughout the Internet.
TTL - the time that the datagram is allowed to exist on the network. Each router that processes the packet decrements this by one. Once the value reaches 0, the packet is discarded.
Protocol - the protocol carried in the datagram: UDP uses the number 17, TCP uses 6, ICMP uses 1, EIGRP uses 88 and OSPF uses 89.
Header Checksum - error control for the header only.
IP Options - this field is for testing, debugging and security.
Padding - extra bits added when necessary so that the header ends on a 32-bit boundary.
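To make the field layout concrete, here is a hedged sketch that unpacks the fixed 20-byte part of an IPv4 header with Python's struct module. The function name parse_ipv4_header and the dictionary keys are hypothetical; options and padding are not decoded, and the checksum is returned without being verified. The source and destination addresses, which follow the checksum in the header, are included because every datagram carries them.

```python
import socket
import struct


def parse_ipv4_header(raw):
    """Decode the first 20 bytes of an IPv4 header into a dictionary."""
    (ver_hlen, tos, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    flags = flags_frag >> 13                  # top 3 bits of the 16-bit word
    return {
        "version": ver_hlen >> 4,             # VERS: high 4 bits
        "header_bytes": (ver_hlen & 0x0F) * 4,  # HLEN is counted in 32-bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identification": identification,
        "dont_fragment": bool(flags & 0b010),   # flag bit 1
        "more_fragments": bool(flags & 0b001),  # flag bit 2
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # units of 8 octets
        "ttl": ttl,
        "protocol": protocol,                 # e.g. 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }
```

A header whose first byte is 0x45 therefore decodes as version 4 with a 20-byte header, the most common case.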

Internet Programming Slide 23

IP Address

To be able to identify a host on the internet, each host is assigned an address, the IP address, or Internet Address. When the host is attached to more than one network, it is called multi-homed and it has one IP address for each network interface. The IP address consists of a pair of numbers:

IP address = <network number><host number>

IP addresses are 32-bit numbers usually represented in a dotted decimal form (as the decimal representation of four 8-bit values concatenated with dots). For example, 128.2.7.9 is an IP address with 128.2 being the network number and 7.9 being the host number. The rules used to divide an IP address into its network and host parts are explained below. The binary format of the IP address 128.2.7.9 is:

10000000 00000010 00000111 00001001

IP addresses are used by the IP protocol to uniquely identify a host on the internet. IP datagrams (the basic data packets exchanged between hosts) are transmitted by some physical network attached to the host, and each IP datagram contains a source IP address and a destination IP address.
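The conversion between dotted decimal form and the underlying 32-bit number can be sketched as follows. The helper names dotted_to_int and to_binary_octets are hypothetical.

```python
def dotted_to_int(address):
    """Convert a dotted decimal IPv4 address to its 32-bit integer value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted decimal IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet      # shift in one 8-bit value at a time
    return value


def to_binary_octets(address):
    """Render the address as four space-separated 8-bit binary groups."""
    value = dotted_to_int(address)
    return " ".join(f"{(value >> shift) & 0xFF:08b}" for shift in (24, 16, 8, 0))
```

Applied to 128.2.7.9, to_binary_octets reproduces the binary form given in the text.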

Internet Programming Slide 24

Classes of Address

The first few bits of the IP address specify how the rest of the address should be separated into its network and host part.

Class A addresses use 7 bits for the network number, giving 126 possible networks (we shall see below that out of every group of network and host numbers, two have a special meaning). The remaining 24 bits are used for the host number, so each network can have up to 2^24 - 2 (16,777,214) hosts.

Class B addresses use 14 bits for the network number and 16 bits for the host number, giving 16,382 networks each with a maximum of 65,534 hosts.

Class C addresses use 21 bits for the network number and 8 for the host number, giving 2,097,150 networks each with up to 254 hosts.

Class D addresses are reserved for multicasting, which is used to address groups of hosts in a limited area.

Class E addresses are reserved for future use.
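The class rules can be sketched in code. The leading-bit patterns used here (0 for class A, 10 for B, 110 for C, 1110 for D, 1111 for E) are the standard classful ones and are stated as an assumption, since the slide's figure is not reproduced; they translate into the first-octet ranges tested below. All function names are hypothetical.

```python
NETWORK_OCTETS = {"A": 1, "B": 2, "C": 3}   # octets holding the network number


def address_class(address):
    """Return the class letter (A to E) from the first octet of the address."""
    first = int(address.split(".")[0])
    if first < 128:      # leading bit 0
        return "A"
    if first < 192:      # leading bits 10
        return "B"
    if first < 224:      # leading bits 110
        return "C"
    if first < 240:      # leading bits 1110
        return "D"
    return "E"           # leading bits 1111


def split_network_host(address):
    """Split a class A, B or C address into (network number, host number)."""
    cls = address_class(address)
    if cls not in NETWORK_OCTETS:
        raise ValueError(f"class {cls} addresses have no network/host split")
    parts = address.split(".")
    n = NETWORK_OCTETS[cls]
    return ".".join(parts[:n]), ".".join(parts[n:])


def capacity(network_bits, host_bits):
    """Usable networks and hosts, excluding the two special values of each."""
    return 2**network_bits - 2, 2**host_bits - 2
```

With these helpers, 128.2.7.9 is a class B address with network number 128.2 and host number 7.9, and capacity(7, 24) gives the class A figures quoted above.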

Internet Programming Slide 25

IP Routing

On a LAN, every host sees every packet that is sent by every other host on that LAN. Normally, it will only do something with that packet if it is addressed to itself, or if the destination is a broadcast address.

A router is different. A router examines every packet, and compares the destination address with a table of addresses that it holds in memory. If it finds an exact match, it forwards the packet to an address associated with that entry in the table. This associated address may be the address of another network in a point-to-point link, or it may be the address of the next-hop router.

If the router doesn’t find a match, it runs through the table again, this time looking for a match on just the network ID part of the address. Again, if a match is found, the packet is sent on to the address associated with that entry. If a match still isn’t found, the router looks to see if a default next-hop address is present. If so, the packet is sent there. If no default address is present, the router sends a “host unreachable” or “network unreachable” message back to the sender. If you see this message, it usually indicates a router failure at some point in the network.

The difficult part of a router’s job is not how it routes packets, but how it builds up its table. In the simplest case, the router table is static: it is read in from a file at start-up. This is adequate for simple networks.
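The three-step lookup described above (exact match, then network match, then default) can be sketched as below. This is an illustration under simplifying assumptions: the table is split into two dictionaries, the network ID is taken with classful rules for class A, B and C addresses only, and all names and table entries are hypothetical.

```python
def classful_network(address):
    """Return the network ID part of a class A, B or C address."""
    parts = address.split(".")
    first = int(parts[0])
    octets = 1 if first < 128 else 2 if first < 192 else 3
    return ".".join(parts[:octets])


def next_hop(destination, host_routes, network_routes, default=None):
    """Pick the forwarding address for a destination, as a static router would."""
    if destination in host_routes:              # 1. exact match on the full address
        return host_routes[destination]
    network = classful_network(destination)
    if network in network_routes:               # 2. match on the network ID only
        return network_routes[network]
    if default is not None:                     # 3. fall back to the default next hop
        return default
    raise LookupError("network unreachable")    # 4. nothing matched
```

A real router would send an ICMP unreachable message in the last case; the exception here only stands in for that step.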

Internet Programming Slide 26

Traceroute

The traceroute utility can be used to trace the route from a source to a destination. It works by sending UDP datagrams to an invalid port address at the remote host. Using the default settings, three datagrams are sent first, each with a Time-To-Live (TTL) field value set to one. The TTL value of 1 causes the datagram to time out as soon as it hits the first router in the path. The router then responds with an ICMP message indicating that the time has expired. Another three UDP datagrams are sent with TTL set to 2, causing the second router to respond. This process continues until the packets eventually reach the destination, which replies with a port unreachable message because the port is invalid.

$ ./traceroute www.altavista.com
traceroute to www.altavista.com (209.73.164.91), 30 hops max, 38 byte packets
1 192.168.20.249 (192.168.20.249) 0.353 ms 0.294 ms 0.237 ms
2 fw17.vtc.edu.hk (192.168.16.176) 0.770 ms 0.881 ms 0.740 ms
3 cw7204.vtc.edu.hk (202.40.210.220) 1.331 ms 1.682 ms 1.694 ms
4 203.192.153.65 (203.192.153.65) 2.151 ms 2.420 ms 2.582 ms
5 ge0-1-0-998-1000M.ar1.HKG1.gblx.net (203.192.134.157) 2.143 ms 1.645 ms 1.668 ms
6 pos1-1-622M.cr1.HKG1.gblx.net (203.192.134.121) 2.126 ms 1.716 ms 1.529 ms
7 pos0-0-2488M.cr2.LAX1.gblx.net (67.17.68.170) 137.982 ms 137.494 ms 137.559 ms
8 so0-0-0-2488M.br1.LAX1.gblx.net (67.17.68.141) 137.499 ms 138.248 ms 146.431 ms
9 pos1-1.core2.LosAngeles1.Level3.net (209.0.227.61) 137.599 ms 137.483 ms 137.659 ms
10 so-4-1-0.bbr1.LosAngeles1.level3.net (209.247.10.197) 138.606 ms 138.111 ms 139.159 ms
11 so-1-0-0.mp2.SanJose1.level3.net (209.247.9.182) 145.910 ms 146.847 ms 146.898 ms
12 gige10-2.ipcolo3.SanJose1.Level3.net (64.159.2.169) 146.500 ms 146.571 ms 146.741 ms
13 unknown.Level3.net (64.152.64.6) 147.135 ms 151.027 ms 146.581 ms
14 10.28.2.9 (10.28.2.9) 146.413 ms 146.493 ms 146.381 ms
15 altavista.com (209.73.164.91) 146.850 ms 146.803 ms 147.090 ms
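The TTL mechanism behind traceroute can be illustrated with a small simulation. This sketch sends no real packets: the path is just a Python list (routers first, destination last), and the function names send_probe and traceroute are hypothetical. A real implementation needs raw sockets to read the ICMP replies.

```python
def send_probe(path, ttl):
    """Simulate one probe; return (responder, message) for the given TTL."""
    for router in path[:-1]:
        ttl -= 1                          # each router decrements the TTL
        if ttl == 0:
            return router, "time exceeded"
    # The probe survived every router and reached the destination,
    # which rejects the invalid port.
    return path[-1], "port unreachable"


def traceroute(path, max_hops=30):
    """Probe with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = send_probe(path, ttl)
        hops.append((ttl, responder, message))
        if message == "port unreachable":
            break
    return hops
```

Running traceroute on a three-element path yields one line per TTL value, ending with the destination, which mirrors the numbered hops in the output above.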

Internet Programming Slide 27

HTTP

HTTP stands for Hypertext Transfer Protocol. It's the network protocol used to deliver virtually all files and other data (collectively called resources) on the World Wide Web, whether they're HTML files, image files, query results, or anything else. Usually, HTTP takes place through TCP/IP sockets.

A browser is an HTTP client because it sends requests to an HTTP server (Web server), which then sends responses back to the client. The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.

HTTP is the network protocol of the Web. It is both simple and powerful. Knowing HTTP enables you to write Web browsers, Web servers, automatic page downloaders, link-checkers, and other useful tools.

HTTP is used to transmit resources, not just files. A resource is some chunk of information that can be identified by a URL (it's the R in URL). The most common kind of resource is a file, but a resource may also be a dynamically-generated query result, the output of a CGI script, a document that is available in several languages, or something else.

Internet Programming Slide 28

HTTP Transactions

An HTTP client opens a connection and sends a request message to an HTTP server; the server then returns a response message, usually containing the resource that was requested. After delivering the response, the server closes the connection (making HTTP a stateless protocol, i.e. not maintaining any connection information between transactions).

The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:
• an initial line,
• zero or more header lines,
• a blank line (i.e. a CRLF by itself), and
• an optional message body (e.g. a file, or query data, or query output).

Initial lines and headers should end in CRLF, though you should gracefully handle lines ending in just LF. (More exactly, CR and LF here mean ASCII values 13 and 10, even though some platforms may use different characters.)
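The four-part message layout can be sketched as a small parser. The function name parse_http_message is hypothetical, and the code is a simplification: header names are lower-cased for lookup, repeated headers overwrite one another, and a message using bare LF line endings is assumed not to contain a CRLF pair inside its body.

```python
def parse_http_message(raw):
    """Split raw bytes into (initial line, headers dict, body bytes)."""
    # The blank line separates the head from the body; accept CRLF or bare LF.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        head, separator, body = raw.partition(b"\n\n")
    lines = head.decode("ascii").splitlines()   # handles both line endings
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body
```

The body is left as bytes on purpose, since it may hold an image or other binary resource rather than text.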

Internet Programming Slide 29

Initial Request Line

The initial line is different for the request than for the response. A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:

GET /path/to/file/index.html HTTP/1.0

• GET is the most common HTTP method; it says "give me this resource". Other methods include POST and HEAD. Method names are always uppercase.
• The path is the part of the URL after the host name.
• The HTTP version always takes the form "HTTP/x.x", uppercase.
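The three-part request line lends itself to a short validation sketch. The pattern and the function name parse_request_line are hypothetical, and the regular expression is stricter than a tolerant server would need to be.

```python
import re

# Uppercase method, a path without spaces, then "HTTP/x.x" in uppercase.
REQUEST_LINE = re.compile(r"^([A-Z]+) (\S+) (HTTP/\d+\.\d+)$")


def parse_request_line(line):
    """Return (method, path, version) or raise ValueError if malformed."""
    match = REQUEST_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed request line: {line!r}")
    return match.groups()
```

The typical request line shown above parses into the method GET, the path /path/to/file/index.html and the version HTTP/1.0.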

Internet Programming Slide 30

The POST Method

A POST request is used to send data to the server to be processed in some way, like by a CGI script. A POST request is different from a GET request in the following ways:
• There's a block of data sent with the request, in the message body. There are usually extra headers to describe this message body, like Content-Type: and Content-Length:.
• The request URL is not a resource to retrieve; it's usually a program to handle the data you're sending.
• The HTTP response is normally program output, not a static file.

The most common use of POST is to submit HTML form data to CGI scripts. In this case, the Content-Type: header is usually application/x-www-form-urlencoded, and the Content-Length: header gives the length of the URL-encoded form data. The CGI script receives the message body through STDIN, and decodes it. Here's a typical form submission, using POST:

POST /path/script.cgi HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32

home=Cosby&favorite+flavor=flies

You can use a POST request to send whatever data you want, not just form submissions. Just make sure the sender and the receiving program agree on the format.
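The form submission above can be reproduced with Python's urllib.parse.urlencode, which performs the URL encoding. This is a sketch: the function name build_post_request is hypothetical, and the From: header is omitted.

```python
from urllib.parse import urlencode


def build_post_request(path, fields, user_agent="HTTPTool/1.0"):
    """Build the raw bytes of an HTTP/1.0 form submission."""
    body = urlencode(fields).encode("ascii")    # spaces become "+"
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"User-Agent: {user_agent}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"       # length of the encoded body
        "\r\n"                                   # blank line ends the headers
    )
    return head.encode("ascii") + body
```

Encoding the fields home=Cosby and "favorite flavor"=flies gives a 32-byte body, matching the Content-Length in the example.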

Internet Programming Slide 31

The Header Lines

Header lines provide information about the request or response, or about the object sent in the message body. HTTP 1.0 defines 16 headers, though none are required. HTTP 1.1 defines 46 headers, and one (Host:) is required in requests. For Net-politeness, consider including these headers in your requests:

The From: header gives the email address of whoever's making the request, or running the program doing so. (This must be user-configurable, for privacy concerns.)

The User-Agent: header identifies the program that's making the request, in the form "Program-name/x.xx", where x.xx is the (mostly) alphanumeric version of the program. For example, Netscape 3.0 sends the header "User-agent: Mozilla/3.0Gold".

These headers help webmasters troubleshoot problems. They also reveal information about the user. When you decide which headers to include, you must balance the webmasters' logging needs against your users' needs for privacy.

Internet Programming Slide 32

The Message Body

An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.

If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:
• The Content-Type: header gives the MIME-type of the data in the body, such as text/html or image/gif.
• The Content-Length: header gives the number of bytes in the body.

Internet Programming Slide 33

Initial Response Line (Status Line)

The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:

HTTP/1.0 200 OK

or

HTTP/1.0 404 Not Found

The most common status codes are:
• 200 OK - The request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body.
• 404 Not Found - The requested resource doesn't exist.
• 500 Server Error - An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise can't run correctly.
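A status line can be taken apart with a single split, as in this sketch. The function name parse_status_line is hypothetical; the reason phrase may contain spaces, so only the first two spaces are used as separators.

```python
def parse_status_line(line):
    """Return (version, status code as int, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)   # split on the first two spaces only
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason
```

A client would normally act on the numeric code and treat the reason phrase as text for humans.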

Internet Programming Slide 34

Sample HTTP Exchange

The client sends a request such as:

GET /path/file.html HTTP/1.0
From: [email protected]
User-Agent: HTTPTool/1.0
[blank line here]

The server should respond with something like the following:

HTTP/1.0 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/html
Content-Length: 1354

<html>
<body>
<h1>Happy New Millennium!</h1>
(more file contents)
  .
  .
  .
</body>
</html>

After sending the response, the server closes the connection.
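The whole exchange can be tried locally with a toy server and client built on Python's socket module. This is a minimal sketch, not a usable Web server: it answers exactly one request, ignores the requested path, always returns the same hypothetical PAGE, and listens on a free loopback port rather than port 80. All function names are assumptions made for the example.

```python
import socket
import threading

PAGE = b"<html><body><h1>Happy New Millennium!</h1></body></html>"


def serve_one_request(server):
    """Accept one connection, read the request head, reply, then close."""
    conn, _addr = server.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:       # read up to the blank line
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        head = (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(PAGE)}\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + PAGE)
    # Leaving the with-block closes the connection, as HTTP/1.0 expects.


def fetch(port, path="/path/file.html", host="127.0.0.1"):
    """Send a GET request and read the response until the server closes."""
    with socket.create_connection((host, port)) as client:
        request = f"GET {path} HTTP/1.0\r\nUser-Agent: HTTPTool/1.0\r\n\r\n"
        client.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:                        # connection closed by server
                break
            chunks.append(chunk)
    return b"".join(chunks)


def run_exchange():
    """Run the server in a thread and perform one request against it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))                # port 0: any free port
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_one_request, args=(server,), daemon=True)
    worker.start()
    try:
        return fetch(port)
    finally:
        worker.join(timeout=2)
        server.close()
```

The client detects the end of the response only because the server closes the connection, which is the behaviour the last sentence above describes.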