Predicting Short-Term Interests Using Activity-Based Search Context

CIKM’10

Advisor: Jia Ling, Koh

Speaker: Yu Cheng, Hsieh

Post on 20-Jan-2016

Outline

• Introduction

• Modeling Search Activity

• Study

• Conclusions

Introduction

• Satisfying searchers’ information needs requires a thorough understanding of their interests, which can be inferred from:

- search queries

- search engine result page (SERP) clicks

- post-SERP browsing behavior

• Construct interest models of the current query that incorporate its context, including:

- previous queries

- previous clicks on SERPs

• Evaluate the predictive effectiveness of these models using future actions

Modeling Search Activity

• Data

- The data set contained browser logs with both searching and browsing episodes.

- Log entries include a timestamp for each page view and the URL of the Web page visited.

- Only entries from the English-speaking United States locale were used.

- Search sessions on the Bing Web search engine were extracted.
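As a rough illustration of how sessions might be pulled out of timestamped log entries, the sketch below groups page views by inactivity gaps. The slides do not say how session boundaries were chosen, so the `timeout` value and the `(timestamp, url)` tuple layout are assumptions made only for this example.

```python
from datetime import datetime, timedelta

def split_sessions(entries, timeout=timedelta(minutes=30)):
    """Group (timestamp, url) log entries into sessions.

    A new session starts whenever the gap since the previous page view
    exceeds `timeout` (a hypothetical cutoff, not taken from the slides).
    """
    sessions = []
    previous = None
    for timestamp, url in sorted(entries):
        if previous is None or timestamp - previous > timeout:
            sessions.append([])  # open a new session
        sessions[-1].append((timestamp, url))
        previous = timestamp
    return sessions
```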

Modeling Search Activity

• ODP Labeling

- Context is represented as a distribution across categories in the ODP topical hierarchy.

- This provides a consistent topical representation of queries and page visits from which to build the models.

- ODP category labels can also reflect topical differences in the search results for a query or in a user’s interests.

- An automatic classification technique assigns ODP category labels to each page.

- The 219 categories at the top two levels of the ODP hierarchy were used (called L).

Modeling Search Activity

• ODP Labeling

- Strategy for labeling a page:

1. Begin with URLs present in the ODP.

2. Incrementally prune non-present URLs until a match is found or a miss is declared.

3. Check for an exact match first; for URLs with no match, use a logistic regression classifier.
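The pruning step of the labeling strategy can be sketched as a back-off lookup. This is only an illustration: `odp_index` is a hypothetical dictionary from "host/path" keys to category labels, and the classifier fallback is left out.

```python
from urllib.parse import urlsplit

def backoff_label(url, odp_index):
    """Return the ODP label for `url`, pruning trailing path segments
    until a match is found.

    `odp_index` is a hypothetical dict mapping "host/path" keys to labels.
    Returns None when a miss is declared.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    while True:
        key = "/".join([parts.netloc] + segments)
        if key in odp_index:
            return odp_index[key]
        if not segments:
            return None  # miss declared: even the bare host is absent
        segments.pop()  # prune the last path segment and retry
```

A caller would hand URLs that return `None` to the text-based classifier.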

Modeling Search Activity

• Sources and Source Combinations

- ODP labels are automatically assigned to the following sources:

1. Query: the top 10 search results for the query

2. SERPClick: the search results clicked by the user during the search session

3. NavTrail: Web pages that the user visits following a SERP click

Modeling Search Activity

• Model Definitions – Query Model (Q)

- For each query, the category labels for the top 10 search results were obtained.

- Probabilities are assigned to the categories in L by:

1. obtaining normalized click frequencies for each of the top 10 results from search-engine click log data

2. computing the distribution across all ODP category labels

- ODP categories in L that are not used to label any result are assigned prior probabilities.
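A minimal sketch of the query model, assuming each top result carries one category label and a click count. The `prior` value, the uniform fallback for queries without click data, and the final renormalization are assumptions for illustration; the slides do not give these details.

```python
def query_model(results, categories, prior=0.01):
    """Build a distribution over `categories` for one query.

    `results` is a list of (label, clicks) pairs for the top results;
    every label is assumed to be a member of `categories`.
    `prior` is a hypothetical mass given to categories left without mass.
    """
    total_clicks = sum(clicks for _, clicks in results)
    mass = {c: 0.0 for c in categories}
    for label, clicks in results:
        # Normalized click frequency; uniform if there is no click data.
        freq = clicks / total_clicks if total_clicks else 1.0 / len(results)
        mass[label] += freq
    # Categories that received no mass get the prior probability.
    for c in categories:
        if mass[c] == 0.0:
            mass[c] = prior
    norm = sum(mass.values())
    return {c: m / norm for c, m in mass.items()}
```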

Modeling Search Activity

• Model Definitions – Context Model (X)

- The context model is constructed from the actions that precede the current query in the session:

1. Queries

2. Web pages visited through a SERP click

3. Web pages visited on the navigational trail following a SERP click
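One way to combine the preceding actions into a single context model is a weighted average of their category distributions. The sketch below assumes each action has already been labeled with a distribution; the exponential recency weighting and the `decay` parameter are hypothetical choices, not something stated in the slides.

```python
import math

def context_model(action_models, decay=1.0):
    """Combine per-action category distributions into one context model.

    `action_models` is ordered from most recent to oldest; each item is a
    dict mapping category -> probability. The weight exp(-decay * j) for
    the j-th most recent action is a hypothetical weighting scheme.
    """
    combined = {}
    for j, model in enumerate(action_models):
        weight = math.exp(-decay * j)
        for category, prob in model.items():
            combined[category] = combined.get(category, 0.0) + weight * prob
    norm = sum(combined.values())
    return {c: v / norm for c, v in combined.items()} if norm else {}
```

Setting `decay=0.0` gives every preceding action equal weight.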

Modeling Search Activity

• Model Definitions – Intent Model (I)

Modeling Search Activity

• Relevance Model or Ground Truth (R)

- The relevance model contains the actions that occur after the current query in the session.

Study

• Learning Optimal Context Weights

- Steps:

1. Identify the optimal context weight (w) for each query on a held-out training set.

2. Create features for the query and the context that could be useful in predicting w.

Study

• Learning Optimal Context Weights

- To create a training set, the query, context, and relevance models were used to compute the optimal context weight per query by minimizing the regularized cross-entropy for each query independently.

Study

- The objective includes a regularizer that penalizes deviations from w = 0.5.
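The per-query optimization can be sketched as a grid search over w. Several details here are assumptions for illustration: the intent model is taken to be the linear mix (1 - w)·Q + w·X, the regularizer is taken to be quadratic with a hypothetical strength `lam`, and the grid resolution is arbitrary.

```python
import math

def intent_model(query, context, w):
    """Mix the query and context models; w is the weight on the context."""
    return {c: (1 - w) * query[c] + w * context[c] for c in query}

def cross_entropy(relevance, predicted, eps=1e-12):
    """Cross-entropy of `predicted` against the relevance distribution."""
    return -sum(r * math.log(predicted[c] + eps) for c, r in relevance.items())

def optimal_context_weight(query, context, relevance, lam=0.1, steps=101):
    """Grid-search w in [0, 1] minimizing cross-entropy + lam * (w - 0.5)**2."""
    best_w, best_loss = 0.5, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        loss = cross_entropy(relevance, intent_model(query, context, w))
        loss += lam * (w - 0.5) ** 2  # penalize deviations from w = 0.5
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

All three models are assumed to be dictionaries over the same set of categories.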

Study

• Generating Features of Query and Context

- Features are divided into three classes:

1. Query class: captures characteristics of the current query and the query model.

2. Context class: captures aspects of the pre-query interaction behavior as well as features of the context model itself.

3. QueryContext class: captures aspects of how the query model and context model compare.

- These features were generated for each session in the set and used to train a predictive model.

Study

• Generating Features of Query and Context

- Query class

Study

• Generating Features of Query and Context

- Context class

Study

• Generating Features of Query and Context

- QueryContext class

Study

• Predicting the Optimal Context Weight

- 60% of the queries were used for training, 20% for validation, and 20% for testing.

- 10-fold cross-validation was performed to improve the reliability of the results.

- The folds were constructed by splitting on sessions, so that all queries in a session are used for either training, validation, or testing.
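Session-level splitting can be sketched by assigning folds to sessions rather than to individual queries. The identifiers, the seed, and the round-robin assignment below are assumptions for illustration only.

```python
import random

def session_folds(query_sessions, n_folds=10, seed=0):
    """Assign each query to a fold by its session, so sessions are never split.

    `query_sessions` maps a query id to its session id (hypothetical ids).
    Returns a dict mapping query id -> fold index in [0, n_folds).
    """
    sessions = sorted(set(query_sessions.values()))
    random.Random(seed).shuffle(sessions)  # deterministic shuffle
    fold_of_session = {s: i % n_folds for i, s in enumerate(sessions)}
    return {q: fold_of_session[s] for q, s in query_sessions.items()}
```

One possible reading of the 60/20/20 split is that, in each round, six folds are used for training, two for validation, and two for testing; the slides do not spell this out.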

Study

• Predicting the Optimal Context Weight

- The most performant features relate to the information divergence between the query model and the context model.
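A divergence feature of this kind could be computed as below. The slides do not say which divergence measure was used, so the Jensen-Shannon divergence (built on KL divergence) is a stand-in chosen for illustration; the function names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the categories of p; eps guards against log(0)."""
    return sum(pv * math.log((pv + eps) / (q.get(c, 0.0) + eps))
               for c, pv in p.items() if pv > 0)

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence between two category distributions."""
    cats = set(p) | set(q)
    # Midpoint distribution between p and q.
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cats}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

A value near zero means the query and context models agree on topic; larger values suggest the query has drifted away from the preceding activity.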

Study

• Predicting the Optimal Context Weight

Study

• Varying Context and Relevance Information

Conclusions

• A study investigating the effectiveness of activity-based context in predicting users’ search interests.

• Explored the value of modeling the current query, its context, and their combination, using different sources.

• Intent models developed from many sources perform best overall.

• Developed techniques to learn the optimal combinations.