Introduction to HTTP
The Hypertext Transfer Protocol (HTTP) is an ‘application-layer’ protocol for the ‘client/server’ paradigm
Built on TCP/IP
• Application programmers will need to be aware that HTTP relies on TCP’s reliable, stream-oriented and connection-based transport-layer facilities when specifying the socket types, functions, and options
server: socket() → bind() → listen() → accept() → read() → write() → close()
client: socket() → connect() → write() → read() → close()
Sample Request line
“GET /home/web/cs336/syllabus.s09 HTTP/1.0\r\n”
command (one word - all capitals)
space
resource pathname (UNIX filename syntax)
space
protocol and version-number
carriage-return and line-feed
Sample Request header-lines
• The header-lines must be followed by an ‘empty’ line (carriage-return and line-feed)
“Connection: close\r\n”
“User-agent: Mozilla 4.0\r\n”
“Accept-language: en\r\n”
carriage-return and line-feed
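The request line, the header-lines and the terminating ‘empty’ line can be assembled into one message with a single snprintf() call. This is only a minimal sketch: the function name build_request() is hypothetical, and the header values are simply the sample ones shown above.

```c
#include <stdio.h>

/* Writes a complete HTTP/1.0 GET request for 'path' into buf.
   Returns the message length, or -1 if buf is too small. */
int build_request( char *buf, size_t size, const char *path )
{
    int len = snprintf( buf, size,
        "GET %s HTTP/1.0\r\n"          /* request line */
        "Connection: close\r\n"        /* header-lines */
        "User-agent: Mozilla 4.0\r\n"
        "Accept-language: en\r\n"
        "\r\n",                        /* the 'empty' line ends the headers */
        path );

    /* snprintf() reports the length it wanted, so detect truncation */
    return ( len < 0 || (size_t)len >= size ) ? -1 : len;
}
```

Calling build_request( buf, sizeof buf, "/home/web/cs336/syllabus.s09" ) would leave the sample request line, the three sample headers and the final carriage-return and line-feed in buf.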
Sample Response line
“HTTP/1.0 200 OK\r\n”
protocol and version-number
space
status code
space
response phrase
carriage-return and line-feed
Sample Response header-lines
• The header-lines must be followed by an ‘empty’ line (carriage-return and line-feed)
“Connection: close\r\n”
“Date: Tue, 15 March 2009\r\n”
“Server: Apache/1.3 (Unix)\r\n”
“Content-Type: text/html\r\n”
carriage-return and line-feed
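A client can take the response line apart with one sscanf() call. This is a sketch under stated assumptions: the struct status_line type and the parse_status_line() function are hypothetical names, and a response line with an empty response phrase is rejected for simplicity.

```c
#include <stdio.h>

struct status_line {
    int  major, minor;    /* protocol version, e.g. 1 and 0 */
    int  code;            /* status code, e.g. 200 */
    char phrase[ 64 ];    /* response phrase, e.g. "OK" */
};

/* Returns 0 on success, or -1 if 'line' does not look like a response line. */
int parse_status_line( const char *line, struct status_line *s )
{
    /* %63[^\r\n] collects the phrase up to the carriage-return/line-feed */
    int fields = sscanf( line, "HTTP/%d.%d %d %63[^\r\n]",
                         &s->major, &s->minor, &s->code, s->phrase );
    return ( fields == 4 ) ? 0 : -1;
}
```

Given the sample line “HTTP/1.0 200 OK\r\n”, the fields would hold 1, 0, 200 and "OK".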
Demo: ‘grabfile.cpp’
• We shall construct a simple HTTP client which will allow a user to obtain a named internet object by typing its URL (Uniform Resource Locator) on the command-line:
$ grabfile http://www.cs.usfca.edu/index.html
The URL concept
• URL means ‘Uniform Resource Locator’
• It’s a standard way of specifying any kind of information available on the Internet
• Four elements of a URL specification:
– Method (i.e., the protocol for object retrieval)
– Host (i.e., location hostname or IP-address)
– Port (i.e., port-number for contacting server)
– Path (i.e., pathname of the resource’s file)
The URL Format
method :// host : port / path
EXAMPLE: http://cs.usfca.edu:80/~cruse/cs336/syllabus.pdf
Note: The port-number is often omitted in cases where the ‘method’ is an internet protocol (like HTTP) which uses a ‘well-known port’
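When the port is omitted, a client has to supply the well-known port itself. One way to do that is a small lookup table, sketched here. The function name well_known_port() is hypothetical and the table is deliberately tiny; the three port-numbers are the standard assignments for these protocols.

```c
#include <string.h>

/* Returns the well-known TCP port for a URL method,
   or 0 if the method is not in this small table. */
int well_known_port( const char *method )
{
    static const struct { const char *method; int port; } table[] = {
        { "http", 80 }, { "https", 443 }, { "ftp", 21 },
    };
    size_t i;

    for ( i = 0; i < sizeof table / sizeof table[ 0 ]; i++ )
        if ( strcmp( method, table[ i ].method ) == 0 )
            return table[ i ].port;
    return 0;
}
```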
Application’s organization
Parse the URL entered on the command-line to determine the server’s hostname and port-number and the pathname to the desired file-object
Open a stream-oriented TCP internet socket and establish a connection with the server
Form the HTTP Request message and write it to the socket
Read from the socket to receive the HTTP Response message (and echo it to the display)
Close the socket to terminate the TCP connection
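The socket-related steps above might be sketched as follows. This is not the actual ‘grabfile.cpp’ source: open_connection() and exchange() are hypothetical helper names, getaddrinfo() is assumed as the way to resolve the hostname, and error handling is kept to a minimum.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() under strict -std= modes */
#endif
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a stream-oriented TCP socket and connect it to host:port.
   Returns the socket descriptor, or -1 on failure. */
int open_connection( const char *host, const char *port )
{
    struct addrinfo hints, *list, *ai;
    int sock = -1;

    memset( &hints, 0, sizeof hints );
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* stream-oriented, i.e. TCP */
    if ( getaddrinfo( host, port, &hints, &list ) != 0 ) return -1;

    /* try each address returned until one accepts the connection */
    for ( ai = list; ai != NULL; ai = ai->ai_next ) {
        sock = socket( ai->ai_family, ai->ai_socktype, ai->ai_protocol );
        if ( sock < 0 ) continue;
        if ( connect( sock, ai->ai_addr, ai->ai_addrlen ) == 0 ) break;
        close( sock );
        sock = -1;
    }
    freeaddrinfo( list );
    return sock;
}

/* Write the request to the socket, then read the response until the
   server closes the connection or 'resp' is full.  Returns the number
   of bytes stored in 'resp' (always NUL-terminated), or -1 on error. */
long exchange( int sock, const char *request, char *resp, size_t size )
{
    size_t len = strlen( request ), sent = 0, got = 0;
    ssize_t n = 0;

    if ( size == 0 ) return -1;
    while ( sent < len ) {              /* write() may send only part */
        n = write( sock, request + sent, len - sent );
        if ( n <= 0 ) return -1;
        sent += (size_t)n;
    }
    while ( got + 1 < size
            && ( n = read( sock, resp + got, size - 1 - got ) ) > 0 )
        got += (size_t)n;
    if ( n < 0 ) return -1;
    resp[ got ] = '\0';
    return (long)got;
}
```

A caller would obtain a descriptor from open_connection() using the hostname and port parsed from the URL, pass it to exchange() together with the request message, echo the response buffer to the display, and finally close() the descriptor to terminate the TCP connection.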
Parsing the URL
• The most challenging part of this program concerns the parsing of the command-line argument, allowing for some ‘degenerate’ cases and some malformed specifications
• Several standard string-functions from the UNIX runtime-library are put to good use, including ‘strlen()’, ‘strncpy()’, ‘strtok()’ and ‘strtok_r()’, plus ‘strspn()’ and ‘strcspn()’
‘strlen()’
• This function calculates the length of the null-terminated string whose address is supplied as the function-argument
size_t strlen( const char *s );
#include <stdio.h>
#include <string.h>
char message[ ] = "Hello";
int main( void ) {
    size_t len = strlen( message );
    printf( "'%s' has %zu characters\n", message, len );
}
OUTPUT: ‘Hello’ has 5 characters
‘strncpy()’
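• This function copies at most n characters from the ‘src’ string into the ‘dst’ string, so it provides a ‘safe’ way to copy from a string that might be too long to fit the destination; note that if ‘src’ has n or more characters the copy is not null-terminated, so the caller must add the terminator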
char *strncpy( char *dst,const char *src, size_t n );
int main( int argc, char *argv[] ) {
    char param[ 64 ];
    if ( argc == 1 ) { fprintf( stderr, " param? \n" ); exit(1); }
    strncpy( param, argv[ 1 ], 63 );  // source string has unknown length
    param[ 63 ] = '\0';               // strncpy() does not terminate a truncated copy
    …
}
‘strtok()’
• This function extracts tokens from a string, but after being called once, it remembers where it stopped in case the caller wants to extract more tokens from that string
char *strtok( char *s, const char *delim );
char sentence[ ] = "Hello, world!\n";
char *word1 = strtok( sentence, " ,!\n" );
char *word2 = strtok( NULL, " ,!\n" );
char *word3 = strtok( NULL, " ,!\n" );   // no tokens remain, so this is NULL
printf( " '%s' '%s' '%s' \n", word1, word2, word3 ? word3 : "<nul>" );
OUTPUT: ‘Hello’ ‘world’ ‘<nul>’
‘strtok_r()’
• This function is a ‘reentrant’ version of the ‘strtok()’ function: instead of keeping its position in hidden static storage, it places in the caller-supplied ‘saveptr’ the address of the character where a subsequent search for another token would begin
char *strtok_r( char *s, const char *delim, char **saveptr );
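char sentence[ ] = "Hello, world!\n";
char *word1, *word2, *word3, *saveptr;
word1 = strtok_r( sentence, " ,!\n", &saveptr );
word2 = strtok_r( NULL, " ,!\n", &saveptr );
word3 = strtok_r( NULL, " ,!\n", &saveptr );   // no tokens remain, so this is NULL
printf( " '%s' '%s' '%s' \n", word1, word2, word3 ? word3 : "<nul>" );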
OUTPUT: ‘Hello’ ‘world’ ‘<nul>’
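Reentrancy matters as soon as two token scans are interleaved, which ‘strtok()’ cannot handle because it has only one hidden position. The sketch below splits a block of header-lines into lines with one ‘saveptr’ and splits each line at its colon with another. The function name find_header() is hypothetical, and the name comparison is case-sensitive for simplicity even though HTTP header names are not.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose strtok_r() under strict -std= modes */
#endif
#include <stdio.h>
#include <string.h>

/* Searches a block of header-lines for the header called 'name' and
   copies its value into 'value'.  The block is modified in place.
   Returns 1 if the header was found, 0 otherwise. */
int find_header( char *block, const char *name, char *value, size_t size )
{
    char *save_line, *save_field;
    char *line = strtok_r( block, "\r\n", &save_line );

    while ( line != NULL ) {
        /* the inner scan keeps its position in save_field,
           so the outer scan over lines is not disturbed */
        char *field = strtok_r( line, ":", &save_field );
        /* an empty delimiter set yields the remainder of the line */
        char *rest = strtok_r( NULL, "", &save_field );

        if ( field != NULL && rest != NULL && strcmp( field, name ) == 0 ) {
            rest += strspn( rest, " \t" );   /* skip blanks after the colon */
            snprintf( value, size, "%s", rest );
            return 1;
        }
        line = strtok_r( NULL, "\r\n", &save_line );
    }
    return 0;
}
```

Applied to the sample response header-lines, a search for "Content-Type" would store "text/html" in the value buffer.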
‘strspn()’
• This function searches a string for a set of characters, and returns the length of the initial segment which consists entirely of characters that are in the ‘accept’ string
size_t strspn( const char *s, const char *accept );
char vowels[ ] = "aeiou";
char word[ ] = "eating";
int len = strspn( word, vowels );
printf( "'%s' has %d vowels before any consonant \n", word, len );
OUTPUT: ‘eating’ has 2 vowels before any consonant
‘strcspn()’
• This function searches a string for a set of characters, and returns the length of the initial segment which consists entirely of characters that are not in the ‘reject’ string
size_t strcspn( const char *s, const char *reject );
char vowels[ ] = "aeiou";
char word[ ] = "shout";
int len = strcspn( word, vowels );
printf( "'%s' has %d consonants before any vowel \n", word, len );
OUTPUT: ‘shout’ has 2 consonants before any vowel
Examples
• Here are a few examples of ‘malformed’ and ‘degenerate’ URL parameter-strings
http://:54321/index.html # no server hostname
http://yahoo.com:/index.html # missing port
http://usfca.edu:::54321/index.html # excess ‘:’s
www.sfmuni.com/index.html # no ‘method’
http://www.bart.gov/ # no pathname
www.sfsu.edu:80:57/index.html # extra chars
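One possible way to handle these cases with ‘strcspn()’, ‘strspn()’ and ‘strncpy()’ is sketched below. The policy is an assumption, not necessarily what ‘grabfile.cpp’ does: a missing method defaults to "http", a missing port to "80", a missing path to "/", while an empty hostname or extra characters after the port are rejected. The names struct url_parts, copy_span() and parse_url() are hypothetical.

```c
#include <string.h>

struct url_parts {
    char method[ 16 ];
    char host[ 64 ];
    char port[ 8 ];
    char path[ 128 ];
};

/* Copies at most size-1 characters of a span and always terminates it. */
static void copy_span( char *dst, size_t size, const char *src, size_t len )
{
    if ( len >= size ) len = size - 1;
    strncpy( dst, src, len );
    dst[ len ] = '\0';
}

/* Returns 0 on success, or -1 if the URL is malformed. */
int parse_url( const char *url, struct url_parts *u )
{
    const char *p = url;
    const char *sep = strstr( url, "://" );
    size_t n;

    /* method: the text before "://", or "http" when it is absent */
    if ( sep != NULL ) {
        copy_span( u->method, sizeof u->method, url, (size_t)( sep - url ) );
        p = sep + 3;
    } else
        copy_span( u->method, sizeof u->method, "http", 4 );

    /* host: everything up to the first ':' or '/' */
    n = strcspn( p, ":/" );
    if ( n == 0 ) return -1;                  /* no server hostname */
    copy_span( u->host, sizeof u->host, p, n );
    p += n;

    /* port: an optional ':' followed by digits only */
    copy_span( u->port, sizeof u->port, "80", 2 );
    if ( *p == ':' ) {
        ++p;
        n = strspn( p, "0123456789" );
        if ( p[ n ] != '\0' && p[ n ] != '/' )
            return -1;                        /* excess ':' or extra chars */
        if ( n > 0 ) copy_span( u->port, sizeof u->port, p, n );
        p += n;
    }

    /* path: the remainder, or "/" when nothing is left */
    if ( *p == '\0' ) p = "/";
    copy_span( u->path, sizeof u->path, p, strlen( p ) );
    return 0;
}
```

Under this policy the first, third and last examples above are rejected, and the other three are accepted with defaults filled in.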