Generating Intelligent Links to Web Pages by Mining Access Patterns of
Individuals and the Community
Benjamin Lambert
Omid Fatemieh
CS598CXZ
Spring 2005
Outline
• Motivation
• The main idea of the project
• Accomplished tasks
• Remaining tasks
• Discussion
The Problem
• The problem we would like to solve is:
  – How can we best assist a person browsing the Web by providing links to the pages they are looking for?
• There are many reasons we might want to do this (e.g. pages hidden in a large Web site, broken links, seminar announcements, etc.)
Previous Work
• This problem has been studied extensively, and many approaches have been tried.
• The two main ways of solving this are:
  – Modeling user behavior (Markov models, HMMs, etc.)
  – Data mining for common browsing patterns
• Despite all the work that has been done, many other techniques have not been tried.
Markov Model Approaches
• These primarily model the user just well enough to suggest which link on the current page to click next.
• This is not useful unless there are many links on a page (e.g., www.perl.com).
Data-Mining Approaches
• These are better able to find pages that are several links away from the current page.
  – If we frequently see a sequence of requests for pages A, B, C, D, E, we may consider adding a shortcut from A to E.
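The shortcut idea above can be illustrated with a minimal sketch. It assumes the logs have already been split into per-user sessions (lists of page names); the function name `shortcut_candidates` and the parameters `min_support` and `min_hops` are hypothetical and not part of the original project.

```python
from collections import Counter

def shortcut_candidates(sessions, min_support=2, min_hops=2):
    """Return (start, end) page pairs that appear at least `min_hops`
    clicks apart in at least `min_support` sessions.
    Hypothetical helper: names and thresholds are illustrative only."""
    support = Counter()
    for pages in sessions:
        seen = set()
        for i, start in enumerate(pages):
            # Only consider targets that are min_hops or more clicks away.
            for end in pages[i + min_hops:]:
                if start != end:
                    seen.add((start, end))
        support.update(seen)  # count each pair at most once per session
    return {pair: n for pair, n in support.items() if n >= min_support}

# Toy sessions (made-up data): A..E is browsed twice in full.
sessions = [
    ["A", "B", "C", "D", "E"],
    ["A", "B", "C", "D", "E"],
    ["A", "B", "F"],
]
candidates = shortcut_candidates(sessions, min_support=2, min_hops=4)
print(candidates)
```

With `min_hops=4`, only pairs separated by four clicks qualify, so the A to E shortcut is the single candidate in this toy data.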
New Ideas for Solving This Problem
• Using recent activity to make recommendations.
• Using the contents of Web pages to make recommendations.
• Combining data mining and user modeling approaches.
• Using a machine learning approach.
Data
• Data: Web server logs
  – CS department Web logs from Dec 6, 2004, to Feb 28, 2005 (thanks to Chuck Thompson)
  – NASA Kennedy Space Center logs collected over July and August 1995 (freely available online)
• The logs are long lists of Web page requests. Each request is represented by:
  – The requester’s IP address
  – The time and date requested
  – The page requested
  – Etc.
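As a rough illustration of how one such request could be turned into a record, here is a sketch that assumes the logs follow the Common Log Format (the slides do not state the exact format). The regular expression, the `parse_line` helper, and the sample line with the sanitized host `user42` are assumptions made for this example.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec

# Made-up sample line; "user42" stands in for a sanitized IP address.
sample = ('user42 - - [06/Dec/2004:15:45:10 -0600] '
          '"GET /research/areas.php?area=proglang HTTP/1.1" 200 5120')
record = parse_line(sample)
print(record["host"], record["time"], record["path"], record["status"])
```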
Data Cleaning
• First, for privacy reasons, the data had to be “sanitized”: the actual IP addresses were removed before we could access it.
• Requests for .gif, .jpg, .css, etc. files should be discarded.
  – Looking only at the extension of the requested file is not enough, e.g., "GET /research/areas.php?area=proglang HTTP/1.1" does not end in a file extension.
• Requests from crawlers (robots.txt).
• Unsuccessful GETs (code 200 only, not 404).
• Refreshes (consecutive requests for the same page).
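The cleaning rules above might be sketched as follows. This is not the authors' Perl code; it assumes records are `(host, path, status)` tuples in time order, that a crawler can be recognized as any host that requests `/robots.txt`, and that the listed static extensions are the ones to drop. All names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed set of non-page resources; the slides only list .gif, .jpg, .css, "etc."
STATIC_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}

def is_page_request(path):
    """Strip the query string before checking the extension, so that
    '/research/areas.php?area=proglang' is judged by '.php'."""
    resource = urlsplit(path).path.lower()
    last = resource.rsplit("/", 1)[-1]
    dot = last.rfind(".")
    ext = last[dot:] if dot != -1 else ""
    return ext not in STATIC_EXTENSIONS

def clean(records):
    """records: list of (host, path, status) tuples in time order."""
    # Assumption: any host that fetches /robots.txt is treated as a crawler.
    crawlers = {h for h, p, _ in records if urlsplit(p).path == "/robots.txt"}
    last_page = {}
    kept = []
    for host, path, status in records:
        if host in crawlers or status != 200 or not is_page_request(path):
            continue
        if last_page.get(host) == path:
            continue  # refresh: same page requested again by the same host
        last_page[host] = path
        kept.append((host, path, status))
    return kept

records = [
    ("u1", "/index.html", 200),
    ("u1", "/logo.gif", 200),
    ("u1", "/index.html", 200),
    ("bot", "/robots.txt", 200),
    ("bot", "/index.html", 200),
    ("u2", "/missing.html", 404),
    ("u2", "/research/areas.php?area=proglang", 200),
]
kept = clean(records)
print(kept)
```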
Recommendations by a First Order Markov Model
• We wrote Perl scripts to parse and store the clean data.
• We implemented a recommending model using simple first order Markov Models.
  – This recommends the links on the current page that are clicked most frequently.
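A first order Markov recommender of this kind can be sketched in a few lines. The sketch below is in Python rather than the Perl used in the project, and the class name, method names, and the page `/graduate/index.php` are assumptions for illustration.

```python
from collections import Counter, defaultdict

class FirstOrderMarkov:
    """Counts page-to-next-page transitions and recommends the most frequent ones."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def train(self, sessions):
        for pages in sessions:
            # Each consecutive pair (current, next) is one observed transition.
            for current, nxt in zip(pages, pages[1:]):
                self.transitions[current][nxt] += 1

    def recommend(self, current, k=1):
        counts = self.transitions.get(current)
        if not counts:
            return []  # page never seen as a source during training
        return [page for page, _ in counts.most_common(k)]

# Toy sessions; /graduate/index.php is a made-up page for the example.
sessions = [
    ["/", "/info/prospective.php", "/graduate/admissions.php"],
    ["/", "/info/prospective.php", "/graduate/admissions.php"],
    ["/", "/info/prospective.php", "/graduate/index.php"],
]
model = FirstOrderMarkov()
model.train(sessions)
print(model.recommend("/info/prospective.php"))
```

Dividing each count by the row total would give the transition probabilities; for ranking recommendations the raw counts are enough.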
Results for First Order Markov Model
• Evaluation was performed on the existing logs.
• If the next click in a browsing session is the recommended page, it is a hit; otherwise it is a miss.
• Hit ratio when only one page is recommended:
  – CS logs:
    • Number of testing records: approx. 500,000
    • Hit ratio: 18.7%
  – NASA logs:
    • Number of testing records: approx. 2 million for one month
    • Hit ratio: 30%
• Other researchers have performed evaluation similarly. In some cases, a hit is counted when the user browses to any of the recommended pages.
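The hit/miss evaluation described above could be computed along these lines. The sketch assumes test sessions are lists of pages and that a recommender is any function `recommend(current, k)` returning up to `k` pages; `toy_recommend` and its table are made-up stand-ins, not the trained model from the slides.

```python
def hit_ratio(recommend, test_sessions, k=1):
    """Fraction of clicks whose actual next page is among the k recommendations."""
    hits = trials = 0
    for pages in test_sessions:
        for current, actual_next in zip(pages, pages[1:]):
            trials += 1
            if actual_next in recommend(current, k):
                hits += 1
    return hits / trials if trials else 0.0

def toy_recommend(current, k=1):
    # Hypothetical fixed recommendations, ordered by assumed popularity.
    table = {"A": ["B", "C"], "B": ["C"]}
    return table.get(current, [])[:k]

test_sessions = [["A", "B", "C"], ["A", "C"]]
ratio_top1 = hit_ratio(toy_recommend, test_sessions, k=1)
ratio_top2 = hit_ratio(toy_recommend, test_sessions, k=2)
print(ratio_top1, ratio_top2)
```

Setting `k=1` matches the single-recommendation setting reported on this slide, while `k>1` corresponds to the looser definition in which any recommended page counts as a hit.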
Using Recent Activity
• Suppose there is an important event somewhere in the Siebel Center at 4pm.
  – Many people might go to http://www.cs.uiuc.edu to find the location between 3:45 and 4:05!
  – It would be good to automatically discover this and generate the link for users.
Dynamic Markov Model
• To model such recent browsing activity, we need a more sophisticated model that more heavily weights recent browsing activity.
• To do this, we implemented an “online” recommending model using “dynamic first order Markov Models”.
• We set a threshold t
  – Only the requests within the past t minutes affect the model
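One way to realize the threshold t is a sliding window over observed transitions, as in the sketch below. It assumes timestamps are given in seconds and arrive in non-decreasing order; the class `DynamicMarkov`, its methods, and the page `/seminar.php` are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict, deque

class DynamicMarkov:
    """First order Markov counts restricted to the last `window_minutes` minutes."""

    def __init__(self, window_minutes):
        self.window = window_minutes * 60  # threshold t, converted to seconds
        self.events = deque()              # (timestamp, current, next), oldest first
        self.transitions = defaultdict(Counter)

    def _expire(self, now):
        # Drop transitions older than the window and undo their counts.
        while self.events and now - self.events[0][0] > self.window:
            _, current, nxt = self.events.popleft()
            self.transitions[current][nxt] -= 1
            if self.transitions[current][nxt] == 0:
                del self.transitions[current][nxt]

    def observe(self, timestamp, current, nxt):
        self._expire(timestamp)
        self.events.append((timestamp, current, nxt))
        self.transitions[current][nxt] += 1

    def recommend(self, now, current, k=1):
        self._expire(now)
        counts = self.transitions.get(current)
        return [p for p, _ in counts.most_common(k)] if counts else []

model = DynamicMarkov(window_minutes=20)
model.observe(0, "/", "/info/prospective.php")
model.observe(10, "/", "/info/prospective.php")
early = model.recommend(20, "/")
model.observe(3000, "/", "/seminar.php")  # made-up ephemeral page
late = model.recommend(3000, "/")
print(early, late)
```

In this toy run the older transitions fall out of the 20-minute window, so the recent click to the ephemeral page becomes the recommendation.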
Dynamic Markov Model Results
• This is too simplistic to work.
• Most successful recommendations are for major browsing patterns that do not change over time:
  – /info/prospective.php -> /graduate/admissions.php
• Accuracy decreases as t decreases.
• We would need to recognize that the user is looking for ephemeral pages.
Using the Web Page Contents (To Do)
• Can we use the content of the previously browsed pages to recommend some links to the user?
  – E.g., if the last 10 pages the user has browsed contain the word "IR", recommend Prof. Zhai’s web page.
• Perhaps we can use a machine learning algorithm to cast this as a multi-class classification problem.
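Since this part is still to do, the following is only one possible way to frame it: a small multinomial Naive Bayes classifier (chosen here purely for illustration, not taken from the slides) where each class is a page to recommend and the features are words from recently browsed pages. The class labels `zhai_home` and `proglang_area`, the training examples, and all names are made up.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesRecommender:
    """Multinomial Naive Bayes with add-one smoothing; classes are target pages."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        # examples: (words seen on recently browsed pages, page visited next)
        for words, target in examples:
            self.class_counts[target] += 1
            self.word_counts[target].update(words)
            self.vocab.update(words)

    def recommend(self, words):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for target, n in self.class_counts.items():
            denom = sum(self.word_counts[target].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior of the target page
            for w in words:
                if w in self.vocab:      # ignore words never seen in training
                    score += math.log((self.word_counts[target][w] + 1) / denom)
            if score > best_score:
                best, best_score = target, score
        return best

# Made-up training data with hypothetical page labels.
examples = [
    (["retrieval", "ir", "search", "ranking"], "zhai_home"),
    (["ir", "language", "models", "retrieval"], "zhai_home"),
    (["compiler", "types", "proglang"], "proglang_area"),
]
clf = NaiveBayesRecommender()
clf.train(examples)
print(clf.recommend(["ir", "retrieval", "evaluation"]))
```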
Hybrid Approaches (To Do)
• How to combine user-modeling with pattern mining?
• How to best combine individual user patterns (personalizations) with collective patterns (recommender systems)?
Other Things To Do
• Incorporate pattern mining
• Experimentally evaluate new models and combinations
• Actual Implementation (CGI scripts and cookies)
• Higher order Markov Models
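For the higher order Markov models item, a minimal sketch is shown below. It assumes sessions are lists of pages and conditions the next page on the previous `order` pages instead of only the current one; the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def train_higher_order(sessions, order=2):
    """Map each tuple of `order` consecutive pages to counts of the page that follows."""
    model = defaultdict(Counter)
    for pages in sessions:
        for i in range(order, len(pages)):
            model[tuple(pages[i - order:i])][pages[i]] += 1
    return model

def recommend_next(model, history, order=2):
    """Recommend the most frequent successor of the last `order` pages, if any."""
    counts = model.get(tuple(history[-order:]))
    return counts.most_common(1)[0][0] if counts else None

# Toy data: what follows B depends on the page visited before B.
sessions = [["A", "B", "C"], ["A", "B", "C"], ["D", "B", "E"]]
model2 = train_higher_order(sessions, order=2)
print(recommend_next(model2, ["A", "B"]), recommend_next(model2, ["D", "B"]))
```

A first order model would give the same recommendation after B in both cases; the longer context distinguishes them, at the cost of many more states and sparser counts.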
Other Paradigms for Making Recommendations (Future Work)
• Recommendations as:
  – An AI planning problem?
  – An optimization problem?
  – Others?
Discussion
• Ideas about the model?
• Other paradigms to consider?
• How can we incorporate content?
• Suggestions?
Thank You.