Generating Intelligent Links to Web Pages by Mining Access Patterns of Individuals and the Community Benjamin Lambert Omid Fatemieh CS598CXZ Spring 2005



Generating Intelligent Links to Web Pages by Mining Access Patterns of

Individuals and the Community

Benjamin Lambert

Omid Fatemieh

CS598CXZ

Spring 2005


Outline

• Motivation

• The main idea of the project

• Accomplished tasks

• Remaining tasks

• Discussion


The Problem

• The problem we would like to solve:

– How can we best assist a person browsing the Web by providing links to the pages they are looking for?

• There are many reasons we might want to do this (e.g. pages hidden in a large Web site, broken links, seminar announcements, etc.)


Previous Work

• This problem has been studied extensively, with many different approaches.

• The two main approaches are:

– Modeling user behavior (Markov models, HMMs, etc.)

– Data mining for common browsing patterns

• Despite all the work that has been done, many other techniques have not been tried.


Markov Model Approaches

• These model a user just well enough to suggest which link on the current page to click next.

• This is not useful unless the page has many links (e.g. www.perl.com)


Data-Mining Approaches

• These are better able to find pages that are several links away from the current page.

– If requests for pages A, B, C, D, E frequently occur in sequence, we may consider adding a shortcut from A to E.
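
A minimal sketch of the shortcut idea, in Python. It counts how often a page is followed, within one session, by another page several clicks later, and proposes a shortcut once that happens in enough sessions. The function name `suggest_shortcuts`, the `min_gap` and `min_support` parameters, and the toy sessions are assumptions made for illustration; the slides do not specify a mining algorithm.

```python
from collections import Counter

def suggest_shortcuts(sessions, min_gap=3, min_support=2):
    """Return (source, target) pairs that occur at least min_gap clicks
    apart in at least min_support sessions (hypothetical thresholds)."""
    counts = Counter()
    for session in sessions:
        seen = set()  # count each pair at most once per session
        for i, source in enumerate(session):
            for target in session[i + min_gap:]:
                if source != target and (source, target) not in seen:
                    seen.add((source, target))
                    counts[(source, target)] += 1
    return sorted(pair for pair, n in counts.items() if n >= min_support)

# Toy sessions: the path A, B, C, D, E appears twice.
sessions = [
    ["A", "B", "C", "D", "E"],
    ["A", "B", "C", "D", "E"],
    ["A", "B", "F"],
]
shortcuts = suggest_shortcuts(sessions, min_gap=4, min_support=2)
print(shortcuts)
```

With `min_gap=4` only the pair four clicks apart qualifies, which corresponds to the A to E shortcut in the example.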


New Ideas for Solving This Problem

• Using recent activity to make recommendations.

• Using the contents of Web pages to make recommendations.

• Combining data mining and user modeling approaches.

• Using a machine learning approach


Data

• Data: Web server logs

– CS department Web logs from Dec 6, 2004, to Feb 28, 2005 (thanks to Chuck Thompson)

– NASA Kennedy Space Center Web logs collected over July and August 1995 (freely available online)

• The logs are long lists of Web page requests. Each request is represented by:

– The requester’s IP address

– The time and date requested

– The page requested

– Etc.
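
As an illustration of what one log record looks like once parsed, here is a small Python sketch. It assumes the logs use the Common Log Format, which the slides do not state; the host `host42` stands in for a sanitized requester ID, and the timestamp and byte count in the example line are made up. (The project itself used Perl scripts for this step.)

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
    }

# Hypothetical log line built around the request quoted on the cleaning slide.
example = ('host42 - - [06/Dec/2004:15:45:10 -0600] '
           '"GET /research/areas.php?area=proglang HTTP/1.1" 200 5120')
record = parse_line(example)
print(record["host"], record["time"], record["path"], record["status"])
```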


Data Cleaning

• First, for privacy reasons, the data had to be “sanitized”: the actual IP addresses were removed before we could have access to it.

• Requests for .gif, .jpg, .css, etc. files should be discarded.

– Looking only at the extension of the requested file is not enough, e.g. "GET /research/areas.php?area=proglang HTTP/1.1" does not end in an extension.

• Requests from crawlers (robots.txt)

• Unsuccessful GETs (code 200 only, not 404)

• Refreshes (consecutive requests for the same page)
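
The cleaning rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's Perl code: requests are taken to be `(host, path, status)` tuples in time order, the extension list is only the three extensions named on the slide, and treating any host that requests `/robots.txt` as a crawler is one possible reading of the "(robots.txt)" note.

```python
from urllib.parse import urlsplit

STATIC_EXTENSIONS = (".gif", ".jpg", ".css")  # the slide's list ends with "etc."

def is_page_request(path, status):
    """Keep only successful requests that are not for static files."""
    if status != 200:
        return False
    # Drop the query string first, so "/areas.php?area=x" is judged by "/areas.php".
    resource = urlsplit(path).path.lower()
    return not resource.endswith(STATIC_EXTENSIONS)

def clean(requests, crawler_hosts=frozenset()):
    """Filter (host, path, status) tuples given in time order."""
    last_path = {}  # host -> last kept path, used to drop refreshes
    kept = []
    for host, path, status in requests:
        if host in crawler_hosts or not is_page_request(path, status):
            continue
        if last_path.get(host) == path:
            continue  # consecutive request for the same page: a refresh
        last_path[host] = path
        kept.append((host, path, status))
    return kept

raw = [
    ("u1", "/index.html", 200),
    ("u1", "/logo.gif", 200),
    ("u1", "/index.html", 200),  # refresh; the image in between is ignored
    ("u1", "/research/areas.php?area=proglang", 200),
    ("u1", "/missing.html", 404),
    ("bot", "/robots.txt", 200),
    ("bot", "/index.html", 200),
]
# Assumption: a host that fetched /robots.txt is a crawler.
crawlers = {host for host, path, _ in raw if urlsplit(path).path == "/robots.txt"}
cleaned = clean(raw, crawlers)
print(cleaned)
```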


Recommendations by a First Order Markov Model

• We wrote Perl scripts to parse and store the clean data

• We implemented a recommending model using simple first order Markov Models

– This provides the user with the links most frequently clicked on the current page
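
A minimal Python sketch of such a first order Markov recommender: it counts page-to-page transitions in the training sessions and recommends the most frequent successors of the current page. The class name `FirstOrderMarkov`, its methods, and the page `/graduate/index.php` are hypothetical; the other paths come from the slides.

```python
from collections import Counter, defaultdict

class FirstOrderMarkov:
    def __init__(self):
        # transitions[current][next] = number of times next followed current
        self.transitions = defaultdict(Counter)

    def train(self, sessions):
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                self.transitions[current][nxt] += 1

    def recommend(self, page, k=1):
        """Return up to k pages most often requested right after `page`."""
        counts = self.transitions.get(page)
        if not counts:
            return []
        return [p for p, _ in counts.most_common(k)]

model = FirstOrderMarkov()
model.train([
    ["/", "/info/prospective.php", "/graduate/admissions.php"],
    ["/", "/info/prospective.php", "/graduate/admissions.php"],
    ["/", "/info/prospective.php", "/graduate/index.php"],
])
print(model.recommend("/info/prospective.php"))
```

Because the model only looks one step ahead, it can never recommend a page that is not directly reachable from the current one in the training data.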


Results for First Order Markov Model

• Evaluation was performed on the existing logs

• If the next click in a browsing session is the recommended page, it is a hit; otherwise it is a miss.

• Hit ratio when only one page is recommended:

– CS logs: approx. 500,000 testing records, hit ratio 18.7%

– NASA logs: approx. 2 million testing records for one month, hit ratio 30%

• Other researchers have performed evaluation similarly. In some cases, a hit is considered to be when any recommended page is browsed to.
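
The hit ratio described above can be computed as in this Python sketch. The `recommend` argument is any function taking a page and `k` and returning a list of pages; the lookup table and test sessions are toy data invented for illustration and are unrelated to the reported 18.7% and 30% figures.

```python
def hit_ratio(recommend, sessions, k=1):
    """Fraction of next clicks that appear among the k recommended pages."""
    hits = total = 0
    for session in sessions:
        for current, actual_next in zip(session, session[1:]):
            total += 1
            if actual_next in recommend(current, k):
                hits += 1
    return hits / total if total else 0.0

# Toy recommender: a fixed table of ranked suggestions per page.
table = {"A": ["B", "C"], "B": ["C"]}

def lookup(page, k):
    return table.get(page, [])[:k]

test_sessions = [["A", "B", "C"], ["A", "C"]]
ratio = hit_ratio(lookup, test_sessions)         # one recommendation per page
ratio_any = hit_ratio(lookup, test_sessions, 2)  # any of two recommendations counts
print(ratio, ratio_any)
```

Setting `k` above 1 gives the looser variant in which a hit is any recommended page being browsed to.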


Using Recent Activity

• Suppose there is an important event somewhere in the Siebel Center at 4pm.

– Many people might go to http://www.cs.uiuc.edu to find the location between 3:45 and 4:05!

– It would be good to automatically discover this and generate the link for users


Dynamic Markov Model

• To model such recent browsing activity, we need a more sophisticated model that more heavily weights recent browsing activity.

• To do this, we implemented an “online” recommending model using “dynamic first order Markov Models”

• We set a threshold t

– Only the requests within the past t minutes affect the model
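
One way to realize the threshold t is a sliding window over transitions, sketched here in Python. This is an assumed design, not necessarily how the project implemented it: timestamps are seconds, a transition expires once it is t minutes old, and the class name and the pages `/old.php` and `/seminar.php` are hypothetical.

```python
from collections import Counter, defaultdict, deque

class DynamicFirstOrderMarkov:
    def __init__(self, t_minutes):
        self.window = t_minutes * 60  # window length in seconds
        self.events = deque()         # (timestamp, current, next), in time order
        self.transitions = defaultdict(Counter)

    def _expire(self, now):
        # Remove transitions observed t minutes ago or earlier.
        while self.events and self.events[0][0] <= now - self.window:
            _, current, nxt = self.events.popleft()
            self.transitions[current][nxt] -= 1
            if self.transitions[current][nxt] == 0:
                del self.transitions[current][nxt]

    def observe(self, timestamp, current, nxt):
        self._expire(timestamp)
        self.events.append((timestamp, current, nxt))
        self.transitions[current][nxt] += 1

    def recommend(self, page, now, k=1):
        self._expire(now)
        return [p for p, _ in self.transitions[page].most_common(k)]

model = DynamicFirstOrderMarkov(t_minutes=20)
model.observe(0, "/", "/old.php")
model.observe(10, "/", "/old.php")
early = model.recommend("/", now=60)      # both old clicks still in the window
model.observe(3000, "/", "/seminar.php")  # 50 minutes later
late = model.recommend("/", now=3000)     # only the recent click remains
print(early, late)
```

A small t makes the model react quickly to bursts such as the seminar example, but it also leaves very few transitions to count.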


Dynamic Markov Model Results

• This is too simplistic to work.

• Most successful recommendations are for major browsing patterns that do not change over time:

– /info/prospective.php -> /graduate/admissions.php

• Accuracy decreases as t decreases

• We would need to recognize that the user is looking for ephemeral pages.


Using the Web Page Contents (To Do)

• Can we use the content of the previously browsed pages to recommend some links to the user?

– E.g., if the last 10 pages the user has browsed contain the word IR, recommend Prof. Zhai’s web page.

• Perhaps we can use a machine learning algorithm to cast this as a multi-class classification problem.


Hybrid approaches (To Do)

• How to combine user-modeling with pattern mining?

• How to best combine individual user patterns (personalizations) with collective patterns (recommender systems)?


Other Things To Do

• Incorporate pattern mining

• Experimentally evaluate new models and combinations

• Actual Implementation (CGI scripts and cookies)

• Higher order Markov Models


Other Paradigms for Making Recommendations (Future Work)

• Recommendations as:

– An AI planning problem?

– An optimization problem?

– Others?


Discussion

• Ideas about the model? 

• Other paradigms to consider?

• How can we incorporate content?

• Suggestions?


Thank You.