Experiments Toward Reverse Linking on the Web

Yeliz Yesilada, Darren Lunn and Simon Harper

Information Management Group

University of Manchester

Links and Browsing

• Links Allow Movement in Information Space

• Etymology of Browsing
  – To nibble at leaves, tender shoots, or other soft vegetation

• A User Is In Control of What to Read or Examine

Current Web Model

• Closed Hypermedia System

• Links Embedded Within the Document By The Author

• Outbound Uni-Directional Links

• Limits the User's Browsing Experience

[Diagram: pages A, B and C]

Bi-Directional Linking

• Used in Open Hypermedia Systems

• Users Can Travel in Both Directions

• Links Stored in a Separate Link Base

• Links Generated Dynamically

[Diagram: pages A and B, with "?" marking unknown links]

Existing Bi-Directional Web Linking

• Back Button
  – Uses the Browser Cache
  – User Only Knows About Pages Previously Visited

• Surfing The Web Backwards (Chakrabarti ’99)
  – Netscape Browser Extension
  – Web Server Extension

• Trackback
  – An Acknowledgement Between Sites that a Link Exists
  – Both Sites Need to Be Trackback Enabled

Our Approach

• Use Web Logs To Establish Who Links To Our Website

• Reduced Spam Threat as Users Must Click on a Link

• Links Available in Any Browser That Supports JavaScript

Architecture

[Diagram: Client-Side: Browser, Web Page +. Server-Side: WebServer, Log File, Log Processor, Pages.xml, Pages.html.]

1. User Clicks a Link to Request a Web Page

2. Server Records Request

3. Log Processor Parses Log to Create Linkbase

4. Linkbase Is Added to Page

5. Web Page Plus Reverse Links Sent to User

User Follows Link (1)

Server Creates Web Log (2)

• Web Server Logs HTTP Requests
  – Page Requested
  – Destination Client of the Requested Page

• Also Logs Additional Information
  – The Page Where the User Clicked the Link to Request Page
  – Client Platform

• W3C Extended Log File Format

Example Web Log

01: 130.88.199.206
02: -
03: -
04: [08/Aug/2007:18:30:39 +0000]
05: "GET /ht07/index.php HTTP/1.1"
06: 200
07: 3811
08: "http://markbernstein.org/"
09: "Mozilla/5.0 (Windows NT 5.1; en-GB;) Gecko/20061204 Firefox/2.0.0.1"
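The fields numbered above can be pulled out of a log line with a regular expression. This is a minimal sketch, assuming each entry sits on one line with the nine fields in the order shown; the pattern and the name `parse_log_line` are illustrative and not taken from the authors' log processor.

```python
import re

# Field order assumed from the example entry: host, identity, user, [time],
# "request", status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# The example entry from the slide, joined back into a single line.
example = (
    '130.88.199.206 - - [08/Aug/2007:18:30:39 +0000] '
    '"GET /ht07/index.php HTTP/1.1" 200 3811 '
    '"http://markbernstein.org/" '
    '"Mozilla/5.0 (Windows NT 5.1; en-GB;) Gecko/20061204 Firefox/2.0.0.1"'
)
fields = parse_log_line(example)
print(fields["path"], "<-", fields["referrer"])
```

Only the request path (field 05) and the referrer (field 08) are needed to build reverse links; the other fields are captured so that malformed lines are rejected rather than half-parsed.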

Linkbase Creation (3)

• Parse the Log File for Referrer / Get Request Pairs

• Create Simple XML File

• Each Webpage has a Corresponding XML Linkbase
  – index.php → index.xml

• Individual XML Linkbases Allow
  – Reduced Processing on the Server
  – Reduced Delay on the Client
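The steps above can be sketched as grouping referrer / request pairs by requested page and writing one small XML tree per page. This is a rough sketch under stated assumptions: the function names are hypothetical, and the slides do not say how link titles are obtained, so the `titles` mapping is supplied by hand here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import PurePosixPath


def linkbase_name(page_path):
    """Map a requested page to its linkbase file: /ht07/index.php -> /ht07/index.xml.

    Assumes the path ends in a file name; a bare directory path is not handled.
    """
    return str(PurePosixPath(page_path).with_suffix(".xml"))


def build_linkbases(pairs, titles):
    """Build one <linkbase> element per requested page.

    pairs  -- iterable of (referrer_url, requested_page) tuples from the log
    titles -- referrer_url -> display title (an assumption of this sketch)
    """
    referrers_by_page = defaultdict(list)
    for referrer, page in pairs:
        if referrer not in referrers_by_page[page]:  # skip repeated referrers
            referrers_by_page[page].append(referrer)

    linkbases = {}
    for page, referrers in referrers_by_page.items():
        root = ET.Element("linkbase")
        for url in referrers:
            link = ET.SubElement(root, "link")
            # Fall back to the URL itself when no title is known.
            ET.SubElement(link, "title").text = titles.get(url, url)
            ET.SubElement(link, "url").text = url
        linkbases[linkbase_name(page)] = root
    return linkbases


pairs = [
    ("http://markbernstein.org/", "/ht07/index.php"),
    ("http://dlib.org/groups.html", "/ht07/index.php"),
    ("http://markbernstein.org/", "/ht07/index.php"),
]
titles = {"http://markbernstein.org/": "Home page of Mark Bernstein"}
linkbases = build_linkbases(pairs, titles)
for name, root in linkbases.items():
    print(name, ET.tostring(root, encoding="unicode"))
```

Keeping one file per page means a page load only has to read the links for that page, which matches the reduced server processing and client delay noted above.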

Example Linkbase (index.xml)

<linkbase>
  <link>
    <title>Home page of Mark Bernstein</title>
    <url>http://markbernstein.org/</url>
  </link>
  <link>
    <title>HCI Conference and Workshops</title>
    <url>http://degraaff.org/hci/conference.html</url>
  </link>
  <link>
    <title>D-Lib Workshops and Conferences: 2007</title>
    <url>http://dlib.org/groups.html</url>
  </link>
  . . .
</linkbase>
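A linkbase in this shape can be read back into (title, URL) pairs and turned into an HTML list. In the system described here that work is done by JavaScript in the browser; the Python below is only a sketch of the same data flow, with hypothetical function names and a shortened copy of the example linkbase.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened copy of the example linkbase (first two links only).
LINKBASE_XML = """<linkbase>
  <link>
    <title>Home page of Mark Bernstein</title>
    <url>http://markbernstein.org/</url>
  </link>
  <link>
    <title>HCI Conference and Workshops</title>
    <url>http://degraaff.org/hci/conference.html</url>
  </link>
</linkbase>"""


def read_links(xml_text):
    """Return a list of (title, url) tuples, one per <link> element."""
    root = ET.fromstring(xml_text)
    return [
        (link.findtext("title", default=""), link.findtext("url", default=""))
        for link in root.iter("link")
    ]


def render_links(links):
    """Render the links as an HTML unordered list, escaping titles and URLs."""
    items = "".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in links
    )
    return f"<ul>{items}</ul>"


links = read_links(LINKBASE_XML)
print(render_links(links))
```

Escaping matters because titles and URLs originate from other sites and should not be trusted as markup.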

Links Added To The Page (4)

• Add JavaScript To Each Webpage

• Widely Supported By Most Browser Software

• When Page is Loaded, Look For Corresponding Linkbase

• Extract Links From Linkbase

• Add Links to Page

Displaying Links - Menu (5)

• As Part of the Menu

• Immediately Available For Use

• Menu Size Increases Significantly

Displaying Links - Breadcrumb (5)

• Breadcrumbs Act As Navigation Aids

• They Inform Users Where They Are Within a Website

• Reverse Links Recommend Common Paths To Get To The Current Page

• Add A “Recommender” Extension To The Breadcrumb Trail

Evaluation

• Technical Evaluation
  – In the Lab
  – Live on the Hypertext Website

• No User Evaluation
  – Previous Work Has Shown Reverse Linking Can Enhance Web Browsing [Chakrabarti ’99]

Issues To Address

• How Often Should The Log File Be Parsed?
  – Too frequent: may slow down the server
  – Too infrequent: links may be out of date
  – Monthly: anecdotally this seemed to work OK

• How Do We Manage The Link Box Size?
  – We only added links that occurred more than once
  – Could use time to keep only the most recently followed links
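Both ideas for limiting the link box can be sketched together: drop referrers seen only once, then order the rest by when they were last followed. The function name, the `max_links` cap and the timestamps below are illustrative assumptions, not values from the experiment.

```python
from collections import Counter
from datetime import datetime


def select_links(hits, min_count=2, max_links=None):
    """Choose which referrers to show for one page.

    hits      -- list of (referrer_url, timestamp) tuples for that page
    min_count -- keep referrers seen at least this many times
                 (2 means "occurred more than once")
    max_links -- optional cap on the number of links (hypothetical parameter)
    """
    counts = Counter(url for url, _ in hits)
    latest = {}
    for url, when in hits:
        if url not in latest or when > latest[url]:
            latest[url] = when  # remember the most recent follow

    kept = [url for url in counts if counts[url] >= min_count]
    kept.sort(key=lambda url: latest[url], reverse=True)  # newest first
    return kept if max_links is None else kept[:max_links]


# Made-up hits for illustration only.
hits = [
    ("http://markbernstein.org/", datetime(2007, 8, 8, 18, 30)),
    ("http://dlib.org/groups.html", datetime(2007, 8, 9, 9, 0)),
    ("http://markbernstein.org/", datetime(2007, 8, 10, 12, 0)),
    ("http://degraaff.org/hci/conference.html", datetime(2007, 8, 11, 8, 0)),
    ("http://dlib.org/groups.html", datetime(2007, 8, 12, 7, 0)),
]
print(select_links(hits))
print(select_links(hits, max_links=1))
```

Requiring more than one occurrence also supports the reduced spam threat mentioned earlier, since a single stray referrer never reaches the page.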

Issues To Address

• Can Fine-Grained Linking Be Achieved?
  – We link to the page
  – Is it possible to link to fragments, e.g. in blogs?

• How Do We Ensure Link Quality?
  – Some referrers were password protected
  – Some pages had been relocated, e.g. blogs
  – Some pages might be spam
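One possible check for the first two problems is to request each referrer before adding it and drop those that answer with an access-restricted or moved/missing status. This is a sketch only, not something the slides describe: `fetch_status` is a hypothetical callable that returns an HTTP status code for a URL, the choice of status codes is an assumption, and spam cannot be detected this way.

```python
# Status codes treated as a reason to drop a referrer (an assumption of this sketch).
REJECT_REASONS = {
    401: "access restricted",
    403: "access restricted",
    301: "relocated",
    404: "relocated",
    410: "relocated",
}


def check_links(urls, fetch_status):
    """Split referrer URLs into those to keep and those to drop.

    fetch_status -- hypothetical callable: url -> HTTP status code (int)
    Returns (kept, dropped), where dropped maps each rejected URL to a reason.
    """
    kept, dropped = [], {}
    for url in urls:
        status = fetch_status(url)
        if status in REJECT_REASONS:
            dropped[url] = REJECT_REASONS[status]
        else:
            kept.append(url)
    return kept, dropped


# Stand-in for real HTTP requests, using made-up URLs and responses.
fake_statuses = {
    "http://example.org/public.html": 200,
    "http://example.org/members/": 401,
    "http://example.org/old-blog-post": 404,
}
kept, dropped = check_links(list(fake_statuses), fake_statuses.get)
print(kept)
print(dropped)
```

Passing the fetcher in as a parameter keeps the filtering logic testable without network access.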

Conclusions

• Reverse Linking Is Possible Using Server Logs

• Our Technique is Platform Independent

• Enhances the User's Browsing Experience

• This Is A First Step - More Investigation Is Required

Questions

http://hcw.cs.manchester.ac.uk/
