Topic-Sensitive PageRank Taher H. Haveliwala

Post on 22-Dec-2015

Topic-Sensitive PageRank

Taher H. Haveliwala

PageRank

Importance is propagated

A global ranking vector is pre-computed
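As a rough illustration of how importance is propagated and a global ranking vector is pre-computed, here is a minimal power-iteration sketch. The function name `pagerank`, the damping value 0.85, the fixed iteration count, and the handling of pages without out-links are all assumptions made for this example, not details taken from the slides.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration sketch of plain (unbiased) PageRank.

    out_links maps each page to the list of pages it links to.
    """
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every page receives the same share of the random-jump mass.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                # Importance is propagated evenly along out-links.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Assumption: a page with no out-links spreads its rank uniformly.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank
```

Because the vector depends only on the link graph, it can be computed once offline and reused for every query.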

Topic-Sensitive PageRank

Basic idea

For each topic, an importance score is computed for each page.

The composite score of a page is calculated by combining the page's per-topic scores according to the topics of the query.

Topic-Sensitive PageRank

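ODP-Biasing

The top-level categories of the Open Directory (16 topics) are used.

Let $T_j$ be the set of URLs in the ODP category $c_j$.

In computing the PageRank vector for topic $c_j$, we replace the uniform damping vector by the non-uniform vector $\vec{v}_j$, where $v_{ji} = 1/|T_j|$ if $i \in T_j$ and $v_{ji} = 0$ otherwise.

The resulting topic-specific PageRank vector will be referred to as $\vec{PR}(\alpha, \vec{v}_j)$.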
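The biasing step can be sketched as a small change to the usual power iteration: random jumps land only on pages in the topic's URL set. The function name `topic_sensitive_pagerank`, the jump probability `alpha=0.25`, the iteration count, and the treatment of pages without out-links are assumptions for illustration only.

```python
def topic_sensitive_pagerank(out_links, topic_pages, alpha=0.25, iterations=100):
    """Sketch of PageRank biased toward one topic.

    out_links maps each page to the pages it links to; topic_pages is the
    set of URLs listed under the topic (T_j on the slide). alpha is the
    probability of a random jump, and jumps land only on topic pages.
    """
    topic_set = set(topic_pages)
    if not topic_set:
        raise ValueError("topic_pages must not be empty")
    nodes = sorted(set(out_links)
                   | {v for vs in out_links.values() for v in vs}
                   | topic_set)
    # Non-uniform jump vector: 1/|T_j| on topic pages, 0 elsewhere.
    bias = {u: (1.0 / len(topic_set) if u in topic_set else 0.0) for u in nodes}
    rank = dict(bias)
    for _ in range(iterations):
        new_rank = {u: alpha * bias[u] for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = (1.0 - alpha) * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Assumption: rank on a page without out-links follows the bias vector.
                for v in nodes:
                    new_rank[v] += (1.0 - alpha) * rank[u] * bias[v]
        rank = new_rank
    return rank
```

Running this once per top-level category gives one pre-computed vector per topic; a page that cannot be reached from the topic's URL set ends up with a score of zero for that topic.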

Topic-Sensitive PageRank

We chose to make the topic prior $P(c_j)$ uniform
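To show where the uniform prior enters, the sketch below assumes the query-time scheme of the published Topic-Sensitive PageRank paper: topic probabilities for a query come from a unigram (naive Bayes) model, and the composite score of a page is the probability-weighted sum of its per-topic ranks. The function names, the data layout, and the small floor for unseen terms are hypothetical choices for this example.

```python
def topic_probabilities(query_terms, term_given_topic, floor=1e-9):
    """P(c_j | q) proportional to P(c_j) * product_i P(q_i | c_j).

    With a uniform prior P(c_j), the prior cancels in the normalization,
    so only the term likelihoods are multiplied here.
    """
    scores = {}
    for topic, term_probs in term_given_topic.items():
        score = 1.0
        for term in query_terms:
            # Assumption: unseen terms get a tiny floor instead of zero.
            score *= term_probs.get(term, floor)
        scores[topic] = score
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def composite_score(page, topic_probs, topic_rank):
    """Combine per-topic ranks of a page, weighted by the query's topic probabilities."""
    return sum(prob * topic_rank[topic].get(page, 0.0)
               for topic, prob in topic_probs.items())
```

For example, if a query term is four times as likely under one topic as under another, the first topic receives weight 0.8 and the second 0.2 when the per-topic ranks are combined.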

Experiment

Experimental Results

Similarity Measure for Induced Rankings

Overlap of two sets A and B (each of size k) = $|A \cap B| / k$, with k = 20

Kendall’s distance measure
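Both measures can be sketched in a few lines. The overlap function follows the definition above directly. The Kendall-style function is one possible reading: each list is extended with the items it is missing (placed after its own items, tied with each other), and the score is the fraction of item pairs whose relative order agrees in both extended lists. Counting tied pairs as disagreements, and the names `osim` and `ksim`, are assumptions for this example.

```python
from itertools import combinations


def osim(ranking_a, ranking_b):
    """Overlap of two top-k lists: |A intersect B| / k."""
    k = len(ranking_a)
    if k == 0 or k != len(ranking_b):
        raise ValueError("both rankings must have the same non-zero length")
    return len(set(ranking_a) & set(ranking_b)) / k


def ksim(ranking_a, ranking_b):
    """Fraction of item pairs ordered the same way in both rankings."""
    universe = list(set(ranking_a) | set(ranking_b))
    if len(universe) < 2:
        return 1.0

    def positions(ranking):
        index = {item: i for i, item in enumerate(ranking)}
        # Items missing from a ranking are placed after all of its items.
        return {item: index.get(item, len(ranking)) for item in universe}

    pos_a, pos_b = positions(ranking_a), positions(ranking_b)
    agree = 0
    for u, v in combinations(universe, 2):
        # Same sign means both rankings order u and v the same way;
        # a tie in either ranking counts as disagreement (assumption).
        if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) > 0:
            agree += 1
    total_pairs = len(universe) * (len(universe) - 1) // 2
    return agree / total_pairs
```

Identical lists score 1.0 on both measures, while two lists with the same items in reversed order score 1.0 on overlap but 0.0 on the Kendall-style measure.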

Experimental Results

Query-Sensitive Scoring

User Study

10 queries (randomly selected from our test set)

5 volunteers

For each query, the volunteer was shown 2 result rankings:

• 1. top 10 results ranked with the unbiased PageRank vector

• 2. top 10 results ranked with the topic-sensitive PageRank vector

Experimental Results

User Study (cont’d)

The volunteer was asked to

• 1. select all URLs which were “relevant” to the query

• 2. select the ranking list which is better

(They were not told anything about how either of the rankings was generated.)

Experimental Results

Context-Sensitive Scoring

Other issues

Search Context

• hierarchical directory

• users’ browsing patterns

• bookmarks

• email archives

Other issues

Flexibility: applies to any kind of context

Transparency: tune the classifier used on the search context, or adjust topic weights

Privacy: a client-side program could use the user context to generate the user profile locally

Efficiency: the query-time cost and the offline preprocessing cost are low

Automatic Identification of User Interest For Personalized Search

Feng Qiu, Junghoo Cho

User Preference Representation

Topic Preference Vector

T = [T(1),…,T(m)]

T(i) represents the user’s degree of interest in the ith topic

User Model

Topic-Driven Random Surfer Model

• The user browses the web in a two-step process.

• First, the user chooses a topic of interest t for the ensuing sequence of random walks with probability T(t).

• Then, with equal probability, she jumps to one of the pages on topic t.

• Starting from this page, the user then performs a random walk, such that at each step, with probability d, she randomly follows an out-link on the current page; with the remaining probability 1-d she gets bored, picks a new topic of interest for the next sequence of random walks based on T, and jumps to a page on the chosen topic.

• This process is repeated forever.
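The surfer model above can be simulated directly to estimate how often each page is visited. The function name, the data layout (one list of pages per topic), the default `d=0.85`, the step count, and the rule that a page without out-links always triggers a new topic jump are assumptions made for this sketch.

```python
import random


def simulate_topic_driven_surfer(out_links, topic_pages, T, d=0.85,
                                 steps=10000, seed=0):
    """Monte Carlo sketch of the topic-driven random surfer.

    out_links maps a page to its out-links, topic_pages[t] is the list of
    pages on topic t, and T[t] is the user's preference for topic t.
    Returns the fraction of steps spent on each visited page.
    """
    rng = random.Random(seed)
    topics = list(range(len(T)))

    def jump():
        # Choose a topic with probability T(t), then a page on it uniformly.
        t = rng.choices(topics, weights=T)[0]
        return rng.choice(topic_pages[t])

    visits = {}
    page = jump()
    for _ in range(steps):
        visits[page] = visits.get(page, 0) + 1
        targets = out_links.get(page, [])
        if targets and rng.random() < d:
            page = rng.choice(targets)   # follow a random out-link
        else:
            page = jump()                # bored (or stuck): pick a new topic
    return {p: count / steps for p, count in visits.items()}
```

With enough steps, the returned frequencies approximate the user's visit probability for each page under this model.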

User Model

Topic-Driven Searcher Model

• The user always visits web pages through a search engine in a two-step process.

• First, the user chooses a topic of interest t with probability T(t).

• Then the user goes to the search engine and issues a query on the chosen topic t.

• The search engine then returns pages ranked by TSPR_t(p), on which the user clicks.

User Model

Relationship between V and T

Under Topic-Driven Random Surfer Model

Under Topic-Driven Searcher Model
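For the random surfer case, the relationship can be worked out from the model itself. The sketch below takes V(p) to be the probability that the user visits page p, writes $\vec{v}_i$ for the jump vector that is uniform over the pages on topic i, and uses the fact that PageRank is linear in its jump vector. The symbols $\vec{v}_i$ and $\vec{v}_T$ are notation introduced here for illustration.

```latex
% Jump distribution of the topic-driven random surfer:
\vec{v}_T = \sum_{i=1}^{m} T(i)\,\vec{v}_i

% Linearity of PageRank in the jump vector then gives, for every page p:
V(p) = \sum_{i=1}^{m} T(i)\,\mathrm{TSPR}_i(p)
```

In words, the visit probability of a page is the preference-weighted mixture of its topic-sensitive PageRank scores.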

Learning Topic Preference Vector

Problem

Given V and TSPR_i, find the T that satisfies the relationship between V and T given by the user model

Learning Topic Preference Vector

Linear regression

Minimize the square-root error

Maximum likelihood estimator

V(p) = the probability that the user visits the page p
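As one way to make the maximum likelihood idea concrete, the sketch below assumes the linear visit model V(p) = Σ_i T(i)·TSPR_i(p) and treats each click as an independent draw from V. Under that assumption T is a vector of mixture weights, and a standard expectation-maximization update finds the weights that maximize the likelihood of the observed clicks. The choice of EM, the function name, and the data layout are assumptions for this example and not necessarily the estimator used by the authors.

```python
def estimate_topic_preferences(clicks, tspr, iterations=200):
    """EM sketch: maximum likelihood topic weights from a click history.

    clicks is a list of clicked pages; tspr[i] maps a page to TSPR_i(page).
    Assumes V(p) = sum_i T(i) * TSPR_i(p) and independent clicks.
    """
    m = len(tspr)
    T = [1.0 / m] * m
    for _ in range(iterations):
        totals = [0.0] * m
        used = 0
        for page in clicks:
            weights = [T[i] * tspr[i].get(page, 0.0) for i in range(m)]
            z = sum(weights)
            if z == 0.0:
                continue  # page not explained by any topic; skipped (assumption)
            used += 1
            for i in range(m):
                # Responsibility of topic i for this click.
                totals[i] += weights[i] / z
        if used == 0:
            break
        T = [total / used for total in totals]
    return T
```

When the topics cover disjoint sets of pages, the estimate reduces to the fraction of clicks that fall on each topic's pages.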

Ranking Search Results Using Topic Preference Vectors

Ranking of page p =

because

Evaluation Metrics

Accuracy of topic preference vector

T_e is our estimation based on the user’s click history

T is the user’s actual topic preference vector

Evaluation Metrics

Accuracy of personalized ranking

Kendall distance between two sorted lists of top-k pages:

• one based on the estimated personalized ranking scores

• one computed from the user’s true preference vector

Evaluation Metrics

Improvement in search quality

Average rank of relevant pages in the search result

S denotes the set of the pages the user u selected

R(p) is the ranking of the page p
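Read literally, the metric is the mean of R(p) over the pages in S, that is, (1/|S|) Σ_{p∈S} R(p). The sketch below computes it for one result list, assuming ranks start at 1 and that selected pages missing from the list are ignored; the function name and both conventions are assumptions.

```python
def average_rank(ranking, selected):
    """Mean 1-based rank R(p) of the selected pages S within a result list."""
    position = {page: i + 1 for i, page in enumerate(ranking)}
    # Assumption: selected pages that do not appear in the ranking are ignored.
    ranks = [position[p] for p in selected if p in position]
    if not ranks:
        raise ValueError("no selected page appears in the ranking")
    return sum(ranks) / len(ranks)
```

A lower value means the pages the user selected sit closer to the top of the list, so a personalized ranking improves on the baseline when its average rank is smaller.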

Experiments

User Study

10 subjects in the UCLA Computer Science Department

04/2004 – 10/2004 (6 months)

Queries to Google, results and clicked URLs

Average number of queries per subject = 255.6

Average number of clicks per query = 0.91

Experiments

Accuracy of Learning Method

Synthetic dataset generated by simulation based on our topic-driven searcher model

Generation of topic preference vector

• Randomly choose K topics and assign random weights to them. The weights of the other topics are set to zero. The vector is then normalized.

Generation of click history

• Use the generated topic preference vector to generate the clicks according to the visit probability distribution dictated by the topic-driven searcher model
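The two generation steps can be sketched as follows. Drawing each clicked page in proportion to its TSPR score within the chosen topic is a simplifying assumption made here; the exact click distribution of the searcher model is not spelled out in this transcript. Function names and the data layout are likewise hypothetical.

```python
import random


def random_topic_preferences(m, k, rng):
    """Pick k of m topics, give them random weights, zero the rest, normalize."""
    T = [0.0] * m
    for i in rng.sample(range(m), k):
        T[i] = 1.0 - rng.random()  # value in (0, 1], so a chosen topic is never zero
    total = sum(T)
    return [t / total for t in T]


def generate_clicks(T, tspr, n_clicks, rng):
    """Draw a synthetic click history from a topic preference vector.

    tspr[t] maps each page to its score for topic t.
    Assumption: within a topic, a page is clicked in proportion to its score.
    """
    topics = list(range(len(T)))
    clicks = []
    for _ in range(n_clicks):
        t = rng.choices(topics, weights=T)[0]
        pages = list(tspr[t])
        weights = [tspr[t][p] for p in pages]
        clicks.append(rng.choices(pages, weights=weights)[0])
    return clicks
```

Because the true preference vector is known for synthetic data, the estimate learned from the generated clicks can be compared against it directly.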

Experiments

Accuracy of estimated topic preference vector

Experiments

Accuracy of Personalized PageRank

Experiments

Quality of Personalized Search

Conclusion

Proposed a framework to investigate the problem of personalizing web search based on the user’s search history and TSPR

Conducted both theoretical and real-life experiments to evaluate the approach

Thank you