How the WWW operates - some history and terminology Mark Levene (Follow the links to learn more!)

Upload: samuel-curtis

Post on 28-Mar-2015


How the WWW operates - some history and terminology

Mark Levene

(Follow the links to learn more!)


Bush 1945 – As We May Think

The memex is a desktop machine consisting of:

1) A user interface.

2) A repository of documents.

3) A search engine.

4) A linking mechanism.

5) Memex II can learn from its experience.


Quote from As We May Think

“The human mind … operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. … trails that are not frequently followed are prone to fade …. Yet the speed of action, the intricacy of trails, … is awe-inspiring beyond all else in nature.”

“A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

“There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.”


Nelson’s Hypertext

• A universal hypertext.

• Xanadu is a distributed network of documents (1960s).

• User interface: transpointing windows.

• Elaborate copyright mechanism.

• Superseded by the WWW.


Engelbart’s oN-Line System (NLS, 1968)

First working hypertext system, in which documents were linked together.


Tim Berners-Lee’s WWW

• CERN, 1990: first browser

• Web protocols:
  – URL
  – HTTP
  – HTML

• World Wide Web Consortium (W3C) founded in 1994


Mosaic – The Web Browser that changed history

• Released in 1993; developed by Marc Andreessen and Eric Bina at NCSA

• Netscape triggered the WWW boom throughout the 1990s

• Browser wars with Microsoft: Internet Explorer won (2003 statistics: IE 95.6%, Netscape 3.7%)


Difference between the internet and the web

• The internet is the physical computer network infrastructure on which the web is built.

• The World Wide Web (web) is a virtual network defined through the web protocols.

• The internet also supports other services, such as email, FTP and instant messaging.


Map of the Internet from 1998

Page 10: How the WWW operates - some history and terminology Mark Levene (Follow the links to learn more!)

Graph of Web pages related to www.dcs.bbk.ac.uk


IP Addresses

• Internet Protocol (IP) address: each machine connected to the Internet is identified by a unique 32-bit number (IPv4).

• My IP address is: 193.61.29.152 (run ipconfig.exe from the command prompt)

• IP addresses may be dynamic.

• IP addresses have corresponding Domain Name System (DNS) names.

• My DNS name is: dhcp34.dcs.bbk.ac.uk
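To make the "32-bit number" concrete, here is a minimal sketch that packs the four decimal parts of the example address into a single integer. The function name dotted_quad_to_int is an assumption made for illustration; Python's standard ipaddress module performs the same conversion and is used as a cross-check.

```python
import ipaddress


def dotted_quad_to_int(address: str) -> int:
    """Pack the four decimal octets of an IPv4 address into one 32-bit integer.

    Hypothetical helper written for this illustration.
    """
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # shift in one byte at a time
    return value


address = "193.61.29.152"  # the example address from the slide
as_int = dotted_quad_to_int(address)

# Show the number in decimal and as 32 binary digits.
print(as_int, format(as_int, "032b"))

# The standard library gives the same result.
assert as_int == int(ipaddress.IPv4Address(address))
```

Each of the four decimal parts is one byte, which is why no part can exceed 255.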


URLs – Uniform Resource Locators

• Address of an internet resource

• E.g. http://www.dcs.bbk.ac.uk/~mark
  – http is the protocol (others: ftp, mailto, file)
  – www.dcs.bbk.ac.uk is the domain name
  – /~mark is the path to the resource

• A query string follows a ? and passes parameters to a script (dynamic URL), e.g.
  – http://www.google.com/search?q=url
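The parts of a URL listed above can be pulled apart with Python's standard urllib.parse module. This is a small sketch, not the only way to do it; the helper name describe_url and the dictionary keys are assumptions chosen to match the slide's terminology.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,        # e.g. http
        "domain": parts.hostname,        # e.g. www.dcs.bbk.ac.uk
        "path": parts.path,              # e.g. /~mark
        "query": parse_qs(parts.query),  # parameters after the ?
    }


# The two example URLs from the slide.
print(describe_url("http://www.dcs.bbk.ac.uk/~mark"))
print(describe_url("http://www.google.com/search?q=url"))
```

For the second URL the query part is a mapping from parameter name to a list of values, because a parameter may legally appear more than once.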


HTTP – HyperText Transfer Protocol

• Protocol of messages exchanged by a user agent (client) and a web server.

• Most common request is GET:
  – GET URL (agent’s request)
  – HTTP/1.1 200 OK (server’s response)
  – Response headers (including the content type)
  – Blank line
  – Response data follows
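The message layout above can be sketched without touching the network: one function formats a GET request and another splits a raw response into status, headers and data. The function names and the sample response are assumptions written for illustration (the sample is hand-made, not captured from a real server), and real clients handle many cases this sketch ignores.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource, version
        f"Host: {host}",         # the Host header is required by HTTP/1.1
        "Connection: close",
        "",                      # blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line separates data
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Hand-written sample response, for illustration only.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)

print(build_get_request("www.dcs.bbk.ac.uk", "/~mark").decode("ascii"))
status, headers, body = parse_response(sample)
print(status, headers["content-type"], body)
```

The Content-Type header is what tells the browser how to display the data that follows the blank line.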


HTML – HyperText Markup Language

• I am assuming you all have some knowledge of HTML!

• The combination of the three components: URL, HTTP and HTML, defines the basic functionality of the web.


Server Log Files

• IP or DNS address of agent making request

• Timestamp, status, transfer volume

• Referrer URL (where the request was made from)

• Requested URL (from the HTTP request)

• User Agent (browser, OS)

• Other information such as authentication.


Cookies

• A cookie is a piece of text that a web site can store on the user's machine when the user is browsing the site.

• This information can be retrieved later by the web site, for example in order to identify a user returning to the site.

• Can be used for statistics, personalisation.

• Some security and privacy issues.
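As a rough sketch of the mechanism, a site stores a cookie by sending a Set-Cookie header, and the browser returns it in a Cookie header on later requests. The example uses Python's standard http.cookies module; the cookie names visitor_id and theme, and both helper functions, are assumptions made for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name: str, value: str) -> str:
    """Build the header a site would send to store a cookie (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"  # send the cookie back for every page on the site
    return cookie.output(header="Set-Cookie:")


def read_cookie_header(header_value: str) -> dict:
    """Parse the Cookie header value a returning browser sends back."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


# First visit: the site asks the browser to remember an identifier.
print(make_set_cookie_header("visitor_id", "abc123"))

# Later visit: the site reads the identifier back and recognises the user.
print(read_cookie_header("visitor_id=abc123; theme=dark"))
```

The site never reads the user's disk directly; it only sees what the browser chooses to send back in the Cookie header.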


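Tracking Users with Cookies Across Multiple Sites

[Diagram: browser, web site and banner ad server]

1) Browser to web site: HTTP request for web page.

2) Web site to browser: send web page (includes ad links).

3) Browser to banner ad server: HTTP request for ad, with cookie.

4) Banner ad server to browser: send ad and update cookie.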
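The exchange in the diagram can be imitated with a toy simulation: because the same ad server is contacted from pages on different sites, and the browser sends the ad server's cookie each time, the ad server can link the visits together. Everything here (the class names, the uid- identifiers, the example site URLs, banner.gif) is hypothetical and exists only to show the idea.

```python
import itertools


class AdServer:
    """Toy banner-ad server that recognises browsers by a cookie it issued."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.history = {}  # cookie id -> pages on which the ad was shown

    def serve_ad(self, cookie, referring_page):
        """Return (ad, cookie); issue a new cookie if the browser has none."""
        if cookie is None:
            cookie = f"uid-{next(self._ids)}"
        self.history.setdefault(cookie, []).append(referring_page)
        return "banner.gif", cookie


class Browser:
    """Toy browser that stores one cookie for the ad server's domain."""

    def __init__(self):
        self.ad_cookie = None

    def visit(self, page_url, ad_server):
        # Requesting and receiving the page itself is implied here.
        # The page embeds an ad link, so the browser contacts the ad
        # server, sending along any cookie that server set earlier.
        _ad, self.ad_cookie = ad_server.serve_ad(self.ad_cookie, page_url)


ads = AdServer()
browser = Browser()
browser.visit("http://news.example/", ads)
browser.visit("http://shop.example/", ads)

# One cookie id is now associated with pages from two different sites.
print(ads.history)
```

Neither web site needs to share data with the other; the link between the visits exists only in the ad server's records.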


W3C Extended Logging Definitions

Field              Identifier       Description
Date               date             The date on which the activity occurred
Time               time             The time at which the activity occurred
Client IP Address  c-ip             The IP address of the client that accessed your server
User Name          cs-username      The name of the authenticated user who accessed your server; anonymous users are represented by -
Service Name       s-sitename       The Internet service and instance number that was accessed by a client
Server Name        s-computername   The name of the server on which the log entry was generated
Server IP Address  s-ip             The IP address of the server on which the log entry was generated
Server Port        s-port           The port number the client is connected to
Method             cs-method        The action the client was trying to perform
URI Stem           cs-uri-stem      The resource accessed
URI Query          cs-uri-query     The query, if any, the client was trying to perform
Protocol Status    sc-status        The status of the action, in HTTP or FTP terms
Win32 Status       sc-win32-status  The status of the action, in terms used by Microsoft Windows
Bytes Sent         sc-bytes         The number of bytes sent by the server
Bytes Received     cs-bytes         The number of bytes received by the server
Time Taken         time-taken       The duration of time, in milliseconds, that the action consumed
Protocol Version   cs-version       The protocol (HTTP, FTP) version used by the client
Host               cs-host          The content of the host header
User Agent         cs(User-Agent)   The browser used on the client
Cookie             cs(Cookie)       The content of the cookie sent or received, if any
Referrer           cs(Referer)      The previous site visited by the user; this site provided a link to the current site

Prefixes:
cs = client-to-server actions
sc = server-to-client actions
s = server actions
c = client actions


Example of extended log entries format

Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent)

2003-01-07 08:58:12 193.133.103.63 DCSNT\gtuff01 193.61.29.180 80 GET /support/ - 302 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)

2003-01-07 08:58:19 193.133.103.63 - 193.61.29.180 80 GET /intranet/cs/ - 401 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)
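Since the fields in such an entry are separated by spaces (spaces inside a value, such as the user agent, are written as +), an entry can be split and matched against the field names. The sketch below assumes the field order shown on this slide; the FIELDS list and the parse_entry helper are written for illustration and are not part of any logging tool.

```python
# Field order as labelled on the slide (an assumption for this example;
# a real log file declares its own field order).
FIELDS = [
    "date", "time", "c-ip", "cs-username", "s-ip", "s-port",
    "cs-method", "cs-uri-stem", "cs-uri-query", "sc-status",
    "cs(User-Agent)",
]


def parse_entry(line: str, fields=FIELDS) -> dict:
    """Map one space-separated log entry onto its field names."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


# First entry from the slide (raw strings keep the backslash literal).
entry = parse_entry(
    r"2003-01-07 08:58:12 193.133.103.63 DCSNT\gtuff01 193.61.29.180 80 "
    r"GET /support/ - 302 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)"
)

print(entry["c-ip"], entry["cs-method"], entry["cs-uri-stem"], entry["sc-status"])

# A value of - means the field was empty; + stands for a space.
print(entry["cs(User-Agent)"].replace("+", " "))
```

In the second entry on the slide the user name is -, meaning the request was anonymous, and the 401 status shows that the server then asked for authentication.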


Another example of extended log entries format

Fields: date time c-ip cs-username cs-method cs-uri-stem cs-uri-query sc-status cs-host cs(User-Agent) cs(Cookie) cs(Referer)

2003-02-01 00:01:44 80.192.25.125 - GET /library/HM.js - 200 www.i-resign.com Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) i%2Dresign%2Dlogin=UID=3649008;+interstitial=not;+ASPSESSIONIDGQQGQYAO=OAMCCDGBODIOFHLAFHFAGKHD -

2003-02-01 00:02:19 62.255.0.5 - GET /uk/discussion/new_topic.asp t=331 200 www.i-resign.com Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - http://www.google.co.uk/search?q=i+hate+my+job&ie=UTF-8&oe=UTF-8&hl=en&meta=cr%3DcountryUK%7CcountryGB
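The referrer field is often the most revealing part of an entry: when the visitor arrived from a search engine, the referrer URL carries the search terms in its query string. The following sketch extracts them with Python's standard urllib.parse module; the helper name search_terms is an assumption, and it only looks at the q parameter used in the example above.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms(referrer: str):
    """Return the value of the q parameter of a referrer URL, or None.

    Hypothetical helper; other search engines may use other parameter names.
    """
    if referrer == "-":  # - means the request carried no referrer
        return None
    query = parse_qs(urlsplit(referrer).query)
    terms = query.get("q")
    return terms[0] if terms else None


# Referrer from the second log entry on the slide.
referrer = (
    "http://www.google.co.uk/search?q=i+hate+my+job&ie=UTF-8&oe=UTF-8"
    "&hl=en&meta=cr%3DcountryUK%7CcountryGB"
)

print(search_terms(referrer))
```

Here parse_qs turns each + back into a space, so the function returns the phrase the visitor typed into the search engine before clicking through to the site.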


Brief History of Search Engines

• Yahoo! (www.yahoo.com) – (1994-) directory service and search engine.

• Infoseek – (1994-2001) search engine.

• Inktomi – (1995-) search engine infrastructure, acquired by Yahoo! in 2003.

• AltaVista – (1995-) search engine, acquired by Overture in 2003.

• AlltheWeb – (1999-) search engine, acquired by Overture in 2003.

• Ask Jeeves (www.ask.com) – (1996-) Q&A and search engine, acquired by IAC/InterActiveCorp in 2005.

• Overture – (1997-) pay-per-click search engine, acquired by Yahoo! in 2003.

• Bing (www.bing.com) – (2009-) Microsoft's rebranded search engine; it was Live Search from 2006 and MSN Search before that.

• Google (www.google.com) – (1998-) search engine.