sigir 2012 - explicit relevance models in intent-oriented information retrieval diversification

Explicit Relevance Models in Intent-Aware IR Diversification. Saúl Vargas, Pablo Castells and David Vallet, Universidad Autónoma de Madrid (IR Group @ UAM, http://ir.ii.uam.es). 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), Portland, OR, 13 August 2012.

DESCRIPTION

The intent-oriented search diversification methods developed in the field so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components, in particular redundancy assessment, are expressed in terms of the probability of observing documents, rather than the probability that the documents are relevant. This has sometimes been described as a view that considers the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms which explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing competitive or better performance than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state-of-the-art algorithms into a single version. The relevance-based approach opens alternative possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task or the evaluation metrics, as we show empirically.

TRANSCRIPT

Page 1: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Explicit Relevance Models in Intent-Aware IR Diversification

35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)

Saúl Vargas, Pablo Castells and David Vallet

Universidad Autónoma de Madrid, IR Group @ UAM

http://ir.ii.uam.es

Portland, OR, 13 August 2012

Page 2: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Outline

Context: IR diversification formulation and algorithms

Proposed approach: relevance-based reformulation of diversification algorithms

Experiments

Adjustable tolerance to redundancy

Conclusion

Page 3: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Brief recap

Appliance

Golf

Chemical element

Nutrition / Health

Mining / Metallurgy

Page 4: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Brief recap

Appliance

Golf

Chemical element

Nutrition / Health

Mining / Metallurgy

Diversity as a means to address uncertainty in user queries

– The same query may have different intents or aspects in the underlying information need

Revision of document relevance independence

– Marginal utility of additional relevant documents decreases fast

Trade diminishing marginal utility for increased intent coverage

– Thus maximize the number of users who obtain at least some useful document

Page 5: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

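IR diversification – Problem statement

Given a query $q$ on a collection, find a subset $S$ of the collection of given size maximizing:

$p(\text{some } d \in S \text{ relevant} \mid q)$

Agrawal 2009, Santos 2010, Chen 2006, …

The problem is NP-hard, so it is solved by greedy approximation. With $R - S$ the remaining baseline ranking, scored by $p(d \mid q)$, and $S$ the diversified ranking built so far, each step selects

$\arg\max_{d \in R - S} \varphi(d, S \mid q)$

where

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$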

Page 6: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Explicit query aspects $z$

State of the art aspect-based approaches
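As a rough illustration of the two objective functions on this slide, the sketch below evaluates both for a candidate document given the set already selected. The dictionary layout (`p_z_d[d][z]`, `p_d_qz[z][d]`), the function names, and the toy probabilities are assumptions made for the example; they do not come from the slides or from the original IA-Select or xQuAD code.

```python
from math import prod
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def ia_select_score(d: str, S: List[str], p_d_q: Dict[str, float],
                    p_z_q: Dict[str, float], p_z_d: Table) -> float:
    """Sum over z of p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        p_z_q[z] * p_z_d[d].get(z, 0.0) * p_d_q[d]
        * prod(1.0 - p_z_d[s].get(z, 0.0) * p_d_q[s] for s in S)
        for z in p_z_q
    )


def xquad_score(d: str, S: List[str], lam: float, p_d_q: Dict[str, float],
                p_z_q: Dict[str, float], p_d_qz: Table) -> float:
    """(1 - lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    novelty = sum(
        p_z_q[z] * p_d_qz[z].get(d, 0.0)
        * prod(1.0 - p_d_qz[z].get(s, 0.0) for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_d_q[d] + lam * novelty


# Hypothetical toy data: two aspects, three documents.
p_z_q = {"golf": 0.5, "metal": 0.5}
p_d_q = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
p_z_d = {"d1": {"golf": 1.0}, "d2": {"golf": 1.0}, "d3": {"metal": 1.0}}
p_d_qz = {"golf": {"d1": 0.6, "d2": 0.4}, "metal": {"d3": 1.0}}

S = ["d1"]  # d1 (a golf document) has already been selected
ia_d2 = ia_select_score("d2", S, p_d_q, p_z_q, p_z_d)
ia_d3 = ia_select_score("d3", S, p_d_q, p_z_q, p_z_d)
xq_d2 = xquad_score("d2", S, 1.0, p_d_q, p_z_q, p_d_qz)
xq_d3 = xquad_score("d3", S, 1.0, p_d_q, p_z_q, p_d_qz)
```

With `d1` already selected, both schemes score the metal document `d3` above the second golf document `d2`, even though `d2` has the higher baseline score.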

Page 7: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Query aspect coverage

State of the art aspect-based approaches

Page 8: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Document “relevance” for query aspect

State of the art aspect-based approaches

Page 9: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Instantiations of objective function

Redundancy penalization

State of the art aspect-based approaches

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Page 10: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Mixture with baseline

State of the art aspect-based approaches

$\lambda$: degree of diversification

Page 11: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Probability to observe documents

$\varphi(d, S \mid q) \propto p(d \text{ is } \textbf{relevant} \wedge \text{no } d' \in S \text{ is } \textbf{relevant} \mid q)$

Page 12: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Relevance-based instantiation of objective function

IA-Select scheme – relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

xQuAD scheme – relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Probability of relevance

Our proposal

$\varphi(d, S \mid q) \propto p(d \text{ is } \textbf{relevant} \wedge \text{no } d' \in S \text{ is } \textbf{relevant} \mid q)$

Page 13: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Relevance-based instantiation of objective function

IA-Select scheme – relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

xQuAD scheme – relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

More literal interpretation of initial problem statement

$\varphi(d, S \mid q) \propto p(d \text{ is } \textbf{relevant} \wedge \text{no } d' \in S \text{ is } \textbf{relevant} \mid q)$

Page 14: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity – Relevance-based instantiation of objective function

IA-Select scheme – relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

xQuAD scheme – relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Equivalent for $\lambda = 1$

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
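The equivalence for $\lambda = 1$ can be checked numerically with a small sketch. The function names, the layout `p_r[z][d]` standing for $p(r \mid d, q, z)$, and all probability values are assumptions made for illustration only.

```python
from math import prod
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def rel_ia_select(d: str, S: List[str], p_z_q: Dict[str, float], p_r: Table) -> float:
    """Relevance-based IA-Select; p_r[z][d] stands for p(r | d, q, z)."""
    return sum(
        p_z_q[z] * p_r[z].get(d, 0.0)
        * prod(1.0 - p_r[z].get(s, 0.0) for s in S)
        for z in p_z_q
    )


def rel_xquad(d: str, S: List[str], lam: float, p_r_dq: Dict[str, float],
              p_z_q: Dict[str, float], p_r: Table) -> float:
    """Relevance-based xQuAD: baseline relevance mixed with the aspect term."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_ia_select(d, S, p_z_q, p_r)


# Hypothetical toy values.
p_z_q = {"a": 0.7, "b": 0.3}
p_r = {"a": {"d1": 0.9, "d2": 0.8}, "b": {"d2": 0.2, "d3": 0.6}}
p_r_dq = {"d1": 0.63, "d2": 0.62, "d3": 0.18}
S = ["d1"]

# With lam = 1 the baseline term vanishes and both schemes coincide.
same = all(
    rel_xquad(d, S, 1.0, p_r_dq, p_z_q, p_r) == rel_ia_select(d, S, p_z_q, p_r)
    for d in p_r_dq
)
print(same)
```

Writing the xQuAD variant as a thin wrapper around the IA-Select variant makes the unification explicit: the only difference is the mixture with the baseline relevance term.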

Page 15: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance distribution vs. document distribution

[Figure: relevance distribution and document distribution plotted over documents $d$, with values between 0 and 1]

$\sum_d p(r \mid d, q, z) = E[\text{nr relevant docs}] \geq 1$

$\sum_d p(d \mid q, z) = 1$

$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Different potential behavior, e.g. stronger redundancy penalization

$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$ – The difference does matter (in this context)

Potential rank equivalences do not apply here
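A small numeric illustration of why the two distributions can behave differently in the redundancy product. The values are hypothetical: four documents that are all about the same aspect, with a document distribution that must sum to 1 and relevance probabilities that need not.

```python
from math import prod
from typing import Dict, List

# Hypothetical values for a single aspect z.
p_d = {"d1": 0.25, "d2": 0.25, "d3": 0.25, "d4": 0.25}  # p(d | q, z), sums to 1
p_r = {"d1": 0.8, "d2": 0.8, "d3": 0.8, "d4": 0.8}      # p(r | d, q, z), sums to 3.2


def remaining_weight(probs: Dict[str, float], S: List[str]) -> float:
    """Redundancy factor prod_{d' in S} (1 - probs[d']) left for aspect z."""
    return prod(1.0 - probs[s] for s in S)


S = ["d1", "d2"]  # two documents on this aspect already selected
doc_based = remaining_weight(p_d, S)  # (1 - 0.25)^2
rel_based = remaining_weight(p_r, S)  # (1 - 0.8)^2
print(doc_based, rel_based)
```

Because the document distribution spreads a total mass of 1 over all documents, each factor stays close to 1 and the aspect is penalized slowly. Relevance probabilities are not constrained in this way, so the same aspect is discounted much more strongly after two selections.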

Page 16: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate $p(r \mid d, q, z)$

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 17: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Aspect-based relevance model

Estimate $p(r \mid d, q, z)$

Cannot use odds, logs, constant removal… or any other rank-preserving step (we need the specific values)

[Diagram relating the quantities $p(r \mid d, q, z)$, $p(r \mid d, q)$, $p(z \mid d)$, $p(z \mid q)$, $p(d \mid q)$ and $p(z)$]

$p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)

$p(r \mid d, q)$: positional relevance $p(r \mid \mathrm{rank}(d, q))$

Estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations:

• $z$ as document classes (e.g. ODP)

• $z$ as subqueries (e.g. reformulations)

Then derive the other two parameters

Page 18: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Positional relevance distribution estimate

$p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$

[Figure: $p(r \mid k)$ on a logarithmic axis (1E-05 to 1E+00) against rank position $k$ (0 to 200). Curves: pLSA and Lemur (precision estimates), AOL (click log statistics).]
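One plausible way to obtain a precision-based estimate of $p(r \mid k)$ from relevance judgments is sketched below. It assumes that the estimate at position $k$ is the fraction of training queries whose $k$-th ranked document is judged relevant; the slides do not spell out the exact estimator, and the function name and toy data are hypothetical.

```python
from typing import List, Set


def positional_relevance(
    rankings: List[List[str]],
    relevant: List[Set[str]],
    depth: int,
) -> List[float]:
    """Estimate p(r | k) for k = 1..depth from judged training rankings.

    rankings[i] is the baseline ranking for training query i and
    relevant[i] is the set of documents judged relevant for that query.
    """
    estimates = []
    for k in range(depth):
        # Only queries whose ranking is long enough contribute at position k.
        hits = [r[k] in rel for r, rel in zip(rankings, relevant) if len(r) > k]
        estimates.append(sum(hits) / len(hits) if hits else 0.0)
    return estimates


# Hypothetical training data: two queries, three ranked documents each.
rankings = [["a", "b", "c"], ["d", "e", "f"]]
relevant = [{"a", "c"}, {"d"}]
p_r_k = positional_relevance(rankings, relevant, depth=3)
print(p_r_k)
```

The resulting list is indexed by rank position and can then stand in for $p(r \mid d, q)$ by looking up the position of $d$ in the baseline ranking for $q$.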

Page 19: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate $p(r \mid d, q, z)$

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 20: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments – Search diversity

Collection: ClueWeb09 category B (50M documents)

Query/subtopic set: TREC 2009/10 diversity task (100 queries)

Baseline ranking: Lemur Indri search engine (Web service)

Diversified top n: 100

Query aspect space:

a) ODP categories level 4 (~7K categories)

b) TREC subtopics (oracle for reference)

Specific parameter estimates:

• $p(z \mid q)$: uniform

• $p(z \mid d)$ for ODP categories: semi-supervised text classification by Textwise

• $p(z \mid d)$ for TREC subtopics: Indri search system run on $z$ as if a query

• $p(r \mid k)$: i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation); ii. click statistics from AOL log (thus different IR system)

Page 21: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

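Experiments – Search diversity on TREC

[Figure: ERR-IA as a function of λ for the xQuAD scheme, with one panel for ODP categories and one for TREC subtopics, comparing diversification based on $p(d \mid q, z)$ with diversification based on $p(r \mid d, q, z)$; $p(r \mid k)$ estimated from qrels.]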

Page 22: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments – Search diversity on TREC

Aspect space         Diversifier                   λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
(baseline)           Lemur                         -     0.2587      0.1630      0.2396       0.4636
a) ODP categories    IA-Select                     -     0.2651      0.1681      0.2423       0.4483
                     xQuAD                         0.9   0.2675      0.1656      0.2451       0.4864
                     Rel-based xQuAD, i. Qrels     0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
                     Rel-based xQuAD, ii. Clicks   0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
b) TREC subtopics    IA-Select                     -     0.3541      0.2346      0.3213       0.5787
                     xQuAD                         1.0   0.3445      0.2241      0.3127       0.5704
                     Rel-based xQuAD, i. Qrels     1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
                     Rel-based xQuAD, ii. Clicks   1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

λ set by “informally” maximizing ERR-IA in 0.1 steps for each diversifier

Best value in bold green on the original slide

▲ ▼: $p < 0.05$

Page 23: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments – Recommendation diversity

Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):

• Queries: users

• Documents: items (movies, music artists)

• Subtopics: item features (genres, tags)

• Relevance judgments: test ratings from data split

Dataset 1: MovieLens 1M

• Collection: 6K users, 4K movies, 1M ratings

• Subtopic set: 10 movie genres

Dataset 2: Last.fm crawl

• Collection: 1K users, 175K artists, 20M playcounts

• Subtopic set: 120K social tags on artists by Last.fm users

Baseline rankings:

a) pLSA

b) Popularity-based recommendation

Diversified top n: 100

Specific parameter estimates:

• $p(z \mid q)$: uniform

• $p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)

• $p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users

Page 24: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments – Recommendation diversity on MovieLens and Last.fm

[Figure: ERR-IA as a function of λ on MovieLens 1M and Last.fm, for the pLSA recommender and for recommendation by item popularity, comparing diversification based on $p(d \mid q, z)$ with diversification based on $p(r \mid d, q, z)$.]

Page 25: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate $p(r \mid d, q, z)$

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 26: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Adjustable tolerance to redundancy

Generalization of relevance-based diversification scheme

Formally support adjustable redundancy penalization

Approach: generalize relevance to browsing model

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathit{stop}_S \mid q) = \cdots$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \bigl(1 - p(r \mid d', z, q)\, p(\mathit{stop} \mid r)\bigr)$

Adjustable redundancy tolerance parameter $p(\mathit{stop} \mid r) \in [0, 1]$

– High $p(\mathit{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches

– In this view, original formulations would implicitly assume $p(\mathit{stop} \mid r) = 1$, i.e. a single relevant document is sought

Tolerance to redundancy
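The effect of the redundancy tolerance parameter can be sketched by adding it to the relevance-based objective. As before, this is an illustrative sketch under assumed names and toy values (`rel_xquad_stop`, `p_r[z][d]` for $p(r \mid d, z, q)$), not the authors' code.

```python
from math import prod
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def rel_xquad_stop(d: str, S: List[str], lam: float, p_stop: float,
                   p_r_dq: Dict[str, float], p_z_q: Dict[str, float],
                   p_r: Table) -> float:
    """Relevance-based xQuAD with redundancy tolerance p_stop = p(stop | r)."""
    novelty = sum(
        p_z_q[z] * p_r[z].get(d, 0.0)
        # Each selected document only discounts the aspect if the user stops
        # after finding it relevant, which happens with probability p_stop.
        * prod(1.0 - p_r[z].get(s, 0.0) * p_stop for s in S)
        for z in p_z_q
    )
    return (1.0 - lam) * p_r_dq[d] + lam * novelty


# Hypothetical toy values: one aspect, d1 already selected.
p_z_q = {"z": 1.0}
p_r = {"z": {"d1": 0.8, "d2": 0.5}}
p_r_dq = {"d1": 0.8, "d2": 0.5}
S = ["d1"]

full_penalty = rel_xquad_stop("d2", S, 1.0, 1.0, p_r_dq, p_z_q, p_r)  # ~0.1
half_penalty = rel_xquad_stop("d2", S, 1.0, 0.5, p_r_dq, p_z_q, p_r)  # ~0.3
no_penalty = rel_xquad_stop("d2", S, 1.0, 0.0, p_r_dq, p_z_q, p_r)    # 0.5
```

Setting `p_stop` to 1 recovers the relevance-based formulation without the parameter, while 0 removes the redundancy penalty altogether, which would suit high-recall searches.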

Page 27: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Empirical observation: $p(\mathit{stop} \mid r)$ vs. α in α-nDCG

Adjustable tolerance to redundancy

[Figure: α-nDCG for combinations of $p(\mathit{stop} \mid r)$ and α, both ranging from 0 to 1, with one panel for the search task (Lemur on TREC / subtopics) and one for the recommendation task (pLSA on MovieLens / genres). For each column, the best and the worst α-nDCG value are marked.]

Page 28: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Conclusion

Alternative, relevance-based formulation of greedy aspect-based diversification

– Unifies two previous aspect-based algorithms

– More literal expression of formal problem statement (and metrics?)

$p(r \mid d, q, z)$ vs. $p(d \mid q, z)$

– Literal value estimates needed (rather than rank-equivalent approximations)

– Estimate based on positional relevance (relevance or click data needed)

Seems to perform well empirically

– Light requirements on relevance or click data for training positional relevance

– Improvement trend, but needs to be tested under further optimizations

Formal support for redundancy tolerance adjustment