SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrieval Diversification
DESCRIPTION
Intent-oriented search diversification methods developed so far tend to build on generative views of the retrieval system to be diversified. Core algorithm components, in particular redundancy assessment, are expressed in terms of the probability of observing documents rather than the probability that the documents are relevant. This has sometimes been described as a view that considers the selection of a single document in the underlying task model. In this paper we propose an alternative formulation of aspect-based diversification algorithms that explicitly includes a formal relevance model. We develop means for the effective computation of the new formulation, and we test the resulting algorithm empirically. We report experiments on search and recommendation tasks showing competitive or better performance than the original diversification algorithms. The relevance-based formulation has further interesting properties, such as unifying two well-known state-of-the-art algorithms into a single version. It also opens alternative possibilities for further formal connections and developments as natural extensions of the framework. We illustrate this by modeling tolerance to redundancy as an explicit configurable parameter, which can be set to better suit the characteristics of the IR task or the evaluation metrics, as we show empirically.
TRANSCRIPT
IRGIR Group @ UAM
Explicit Relevance Models in Intent-Aware IR Diversification 35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Portland, OR, 13 August 2012
http://ir.ii.uam.es
Explicit Relevance Models in Intent-Aware IR Diversification
35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Saúl Vargas, Pablo Castells and David Vallet, Universidad Autónoma de Madrid
http://ir.ii.uam.es
Portland, OR, 13 August 2012
Outline
Context: IR diversification formulation and algorithms
Proposed approach: relevance-based reformulation of diversification algorithms
Experiments
Adjustable tolerance to redundancy
Conclusion
IR diversity - Brief recap
Example query intents (image labels): Appliance; Golf; Chemical element; Nutrition / Health; Mining / Metallurgy
Diversity as a means to address uncertainty in user queries
- The same query may have different intents or aspects in the information need underneath
Revision of document relevance independence
- Marginal utility of additional relevant documents decreases fast
Trade diminishing marginal utility for increased intent coverage
- Thus maximize the number of users who obtain at least some useful document
IR diversification - Problem statement
Given a query $q$ on a collection, find a document set $S$ of given size maximizing:
$p(\text{some } d \in S \text{ relevant} \mid q)$
(Agrawal 2009, Santos 2010, Chen 2006, ...)
The problem is NP-hard, so a greedy approximation is used:
$\arg\max_{d \in R \setminus S} f(d; S, q)$
$f(d; S, q) \approx p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
$R$: baseline ranking $p(d \mid q)$
$S$: diversified ranking
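The greedy approximation above can be sketched as a short re-ranking loop. This is a minimal illustration rather than the authors' implementation: the function name `greedy_diversify`, the toy scores and the toy objective are all assumptions made for the example, and the objective is passed in as a callable so that any of the schemes discussed in the talk could be plugged in.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diversify(
    baseline: Sequence[Hashable],
    objective: Callable[[Hashable, List[Hashable]], float],
    size: int,
) -> List[Hashable]:
    """Build S greedily: repeatedly add the document of R minus S that maximizes f(d; S, q)."""
    selected: List[Hashable] = []
    remaining = list(baseline)  # R, in baseline order
    while remaining and len(selected) < size:
        # max() returns the first maximal element, so ties keep baseline order.
        best = max(remaining, key=lambda d: objective(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): baseline scores and one topic label per document.
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
topic = {"d1": "a", "d2": "a", "d3": "b"}


def toy_objective(d, selected):
    # Made-up objective: baseline score minus a penalty per already
    # selected document that shares the same topic.
    overlap = sum(1 for s in selected if topic[s] == topic[d])
    return scores[d] - 0.5 * overlap


ranking = greedy_diversify(["d1", "d2", "d3"], toy_objective, size=3)
print(ranking)
```

In this toy run the second pick moves to the document on the uncovered topic, which is the behavior the greedy objective is meant to produce.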
IR diversity - Instantiations of objective function
State of the art aspect-based approaches, with explicit query aspects $z$
IA-Select scheme (Agrawal 2009)
$f(d; S, q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$f(d; S, q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Components of both schemes:
- Query aspect coverage: $p(z \mid q)$
- Document "relevance" for query aspect: $p(z \mid d)\, p(d \mid q)$ in IA-Select, $p(d \mid q, z)$ in xQuAD
- Redundancy penalization: the product over $d' \in S$
- Mixture with baseline (xQuAD): $\lambda$ sets the degree of diversification
Both schemes are expressed through the probability to observe documents, whereas the objective is
$f(d; S, q) \approx p(d \text{ is \textbf{relevant}} \wedge \text{no } d' \in S \text{ is \textbf{relevant}} \mid q)$
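The two objective functions above share one shape: an aspect-weighted document value, discounted by how well the already selected documents cover each aspect. The sketch below makes that shared shape explicit. All function names, dictionary layouts and probability values are assumptions chosen for illustration; they are not taken from the IA-Select or xQuAD reference implementations.

```python
def aspect_term(d, selected, p_z_q, value):
    """sum_z p(z|q) * value[z][d] * prod_{d' in S} (1 - value[z][d'])."""
    total = 0.0
    for z, p_z in p_z_q.items():
        per_doc = value.get(z, {})
        uncovered = 1.0
        for s in selected:
            uncovered *= 1.0 - per_doc.get(s, 0.0)
        total += p_z * per_doc.get(d, 0.0) * uncovered
    return total


def ia_select(d, selected, p_z_q, p_z_d, p_d_q):
    # Per-aspect document value taken as p(z|d) * p(d|q).
    value = {
        z: {doc: p_z_d.get(doc, {}).get(z, 0.0) * p for doc, p in p_d_q.items()}
        for z in p_z_q
    }
    return aspect_term(d, selected, p_z_q, value)


def xquad(d, selected, p_z_q, p_d_qz, p_d_q, lam):
    # Mixture of the baseline score and the aspect term, weighted by lambda.
    novelty = aspect_term(d, selected, p_z_q, p_d_qz)
    return (1.0 - lam) * p_d_q.get(d, 0.0) + lam * novelty


# Toy distributions (hypothetical values).
p_z_q = {"z1": 0.5, "z2": 0.5}
p_d_q = {"a": 0.5, "b": 0.3, "c": 0.2}
p_d_qz = {"z1": {"a": 0.6, "b": 0.4}, "z2": {"c": 1.0}}
p_z_d = {"a": {"z1": 1.0}, "b": {"z1": 1.0}, "c": {"z2": 1.0}}

# With "a" already selected, "c" (the only document on z2) overtakes "b".
ia_b = ia_select("b", ["a"], p_z_q, p_z_d, p_d_q)
ia_c = ia_select("c", ["a"], p_z_q, p_z_d, p_d_q)
xq_b = xquad("b", ["a"], p_z_q, p_d_qz, p_d_q, lam=0.5)
xq_c = xquad("c", ["a"], p_z_q, p_d_qz, p_d_q, lam=0.5)
print(ia_b, ia_c, xq_b, xq_c)
```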
IR diversity - Relevance-based instantiation of objective function
Our proposal: probability of relevance
IA-Select scheme, relevance-based
$f(d; S, q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
xQuAD scheme, relevance-based
$f(d; S, q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
- More literal interpretation of the initial problem statement:
$f(d; S, q) \approx p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
- The two schemes are equivalent for $\lambda = 1$
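The equivalence for $\lambda = 1$ can be checked numerically with a small sketch of the relevance-based objective. The function names, the `(document, aspect)` keying of the relevance table and every probability value below are assumptions made up for this example.

```python
def rel_ia_select(d, selected, p_z_q, p_r_dqz):
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    total = 0.0
    for z, p_z in p_z_q.items():
        none_relevant = 1.0  # probability that no selected document is relevant to z
        for s in selected:
            none_relevant *= 1.0 - p_r_dqz.get((s, z), 0.0)
        total += p_z * p_r_dqz.get((d, z), 0.0) * none_relevant
    return total


def rel_xquad(d, selected, p_r_dq, p_z_q, p_r_dqz, lam):
    """(1 - lambda) p(r|d,q) + lambda * relevance-based aspect term."""
    return (1.0 - lam) * p_r_dq.get(d, 0.0) + lam * rel_ia_select(
        d, selected, p_z_q, p_r_dqz
    )


# Toy values (hypothetical). Note that these are probabilities of relevance,
# so they do not need to sum to one over documents.
p_z_q = {"z1": 0.7, "z2": 0.3}
p_r_dq = {"a": 0.66, "b": 0.56, "c": 0.28}
p_r_dqz = {
    ("a", "z1"): 0.9, ("a", "z2"): 0.1,
    ("b", "z1"): 0.8, ("b", "z2"): 0.0,
    ("c", "z1"): 0.1, ("c", "z2"): 0.7,
}

score_b = rel_ia_select("b", ["a"], p_z_q, p_r_dqz)
score_c = rel_ia_select("c", ["a"], p_z_q, p_r_dqz)
same_at_one = rel_xquad("c", ["a"], p_r_dq, p_z_q, p_r_dqz, lam=1.0)
print(score_b, score_c, same_at_one)
```

With `lam=1.0` the baseline term vanishes and the xQuAD-style function returns exactly the IA-Select-style value, which is the unification the slide refers to.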
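Relevance distribution vs. document distribution
Objective: $(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
Document distribution: $\sum_d p(d \mid q, z) = 1$
Relevance distribution: each $p(r \mid d, q, z) \in [0, 1]$, and $\sum_d p(r \mid d, q, z) = E[\text{nr relevant docs}] \geq 1$
Different potential behavior, e.g. stronger redundancy penalization
$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$: the difference does matter (in this context)
Potential rank equivalences do not apply here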
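A small numeric sketch may help to see why the two kinds of estimate behave differently in the redundancy factor. The probabilities are invented, and turning the relevance values into a document distribution by simple normalization is an assumption made only so that both variants rank the documents identically.

```python
import math

# Hypothetical p(r|d,q,z) for four documents on one aspect z.
p_rel = [0.9, 0.8, 0.7, 0.6]

# Illustrative assumption: a document distribution p(d|q,z) obtained by
# normalizing the same values so that they sum to one.
total = sum(p_rel)
p_doc = [p / total for p in p_rel]


def residual(probs, n_selected):
    """Redundancy factor prod (1 - p) over the first n_selected documents."""
    return math.prod(1.0 - p for p in probs[:n_selected])


residual_doc = residual(p_doc, 2)  # about 0.51
residual_rel = residual(p_rel, 2)  # about 0.02
print(sum(p_doc), sum(p_rel), residual_doc, residual_rel)
```

Both lists induce the same ordering of documents, yet after two selections the relevance-based factor leaves far less weight for the aspect. This is the sense in which rank equivalence is not enough and the specific values matter.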
Relevance-based greedy diversification
Relevance-based reformulation of diversification algorithm
1. Need to estimate $p(r \mid d, q, z)$
2. Does it work? Test empirically
3. Further development: parameterized tolerance to redundancy
Aspect-based relevance model
Estimate $p(r \mid d, q, z)$
- Cannot use odds, logs, constant removal... or any other rank-preserving step (we need the specific values)
Quantities involved: $p(r \mid d, q)$, $p(r \mid d, q, z)$, $p(z \mid d)$, $p(z \mid q)$, $p(d \mid q)$, $p(z)$
- $p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)
- $p(r \mid d, q)$: positional relevance $p(r \mid \text{rank}(d, q))$
- Estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations:
  • $z$ as document classes (e.g. ODP)
  • $z$ as subqueries (e.g. reformulations)
  Then derive the other two parameters
Positional relevance distribution estimate
$p(r \mid d, q) \sim p(r \mid \text{rank}(d, q)) = p(r \mid k)$
[Figure: $p(r \mid k)$ on a log scale (1E-05 to 1E+00) against rank $k$ (0 to 200), with curves for pLSA, Lemur and AOL, obtained from click log statistics and precision estimates]
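One simple way to obtain a positional estimate of this kind from relevance judgments is sketched below. It reads $p(r \mid k)$ as the fraction of training queries whose document at rank $k$ is judged relevant, which is an assumption about the estimator, not a description of the exact procedure used in the talk; the function name and the toy judgments are likewise hypothetical, and no smoothing is applied.

```python
def positional_relevance(judged_runs, max_rank):
    """Estimate p(r|k) for k = 1..max_rank.

    judged_runs[i][k-1] is 1 if the document at rank k for training
    query i is judged relevant, and 0 otherwise. Runs may be shorter
    than max_rank; a rank is estimated only from the runs that reach it.
    """
    estimates = []
    for k in range(max_rank):
        at_rank = [run[k] for run in judged_runs if len(run) > k]
        estimates.append(sum(at_rank) / len(at_rank) if at_rank else 0.0)
    return estimates


# Toy judgments for four training queries (hypothetical).
runs = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0],
]
p_r_k = positional_relevance(runs, max_rank=4)
print(p_r_k)
```

The same function could be fed click indicators instead of judgments, in which case the estimate describes click-through by position rather than judged relevance.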
Experiments: Search diversity
Collection: ClueWeb09 category B (50M documents)
Query/subtopic set: TREC 2009/10 diversity task (100 queries)
Baseline ranking: Lemur Indri search engine (Web service)
Diversified top n: 100
Query aspect space:
a) ODP categories level 4 (~7K categories)
b) TREC subtopics (oracle for reference)
Specific parameter estimates:
$p(z \mid q)$: Uniform
$p(z \mid d)$:
- ODP categories: semi-supervised text classification by Textwise
- TREC subtopics: Indri search system run on $z$ as if a query
$p(r \mid k)$:
i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation)
ii. Click statistics from AOL log (thus different IR system)
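Experiments - Search diversity on TREC
[Figure: ERR-IA against $\lambda$ for the xQuAD scheme, one panel for ODP categories and one for TREC subtopics; curves compare the version based on $p(d \mid q, z)$ with the version based on $p(r \mid d, q, z)$, with $p(r \mid k)$ from qrels]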
Experiments - Search diversity on TREC
Aspect space        Diversifier                  λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
(baseline)          Lemur                        -     0.2587      0.1630      0.2396       0.4636
a) ODP categories   IA-Select                    -     0.2651      0.1681      0.2423       0.4483
a) ODP categories   xQuAD                        0.9   0.2675      0.1656      0.2451       0.4864
a) ODP categories   Rel-based xQuAD, i. Qrels    0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
a) ODP categories   Rel-based xQuAD, ii. Clicks  0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
b) TREC subtopics   IA-Select                    -     0.3541      0.2346      0.3213       0.5787
b) TREC subtopics   xQuAD                        1.0   0.3445      0.2241      0.3127       0.5704
b) TREC subtopics   Rel-based xQuAD, i. Qrels    1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
b) TREC subtopics   Rel-based xQuAD, ii. Clicks  1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△
λ: "informally" maximizing ERR-IA by 0.1 steps for each diversifier
Best values were marked in bold green on the slide
▲ ▼: p < 0.05
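Of the metrics in the table, S-recall@k (subtopic recall) is the simplest to state: the fraction of a query's subtopics covered by at least one of the top k documents. The sketch below follows that standard definition; the function name and the toy judgments are assumptions for illustration and are unrelated to the TREC data reported above.

```python
def s_recall_at_k(ranking, doc_subtopics, n_subtopics, k):
    """Fraction of the query's subtopics covered by the top-k documents."""
    covered = set()
    for d in ranking[:k]:
        # Documents without judgments contribute no subtopics.
        covered |= doc_subtopics.get(d, set())
    return len(covered) / n_subtopics


# Toy judgments (hypothetical): the query has four subtopics in total.
doc_subtopics = {"a": {1}, "b": {1}, "c": {2, 3}}
ranking = ["a", "b", "c"]

print(s_recall_at_k(ranking, doc_subtopics, n_subtopics=4, k=2))
print(s_recall_at_k(ranking, doc_subtopics, n_subtopics=4, k=3))
```

In the toy ranking the second document adds nothing because it repeats subtopic 1, so the metric only rises once the third document is reached.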
Experiments: Recommendation diversity
Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):
- Queries: users
- Documents: items (movies, music artists)
- Subtopics: item features (genres, tags)
- Relevance judgments: test ratings from data split
Dataset 1: MovieLens 1M
- Collection: 6K users, 4K movies, 1M ratings
- Subtopic set: 10 movie genres
Dataset 2: Last.fm crawl
- Collection: 1K users, 175K artists, 20M playcounts
- Subtopic set: 120K social tags on artists by Last.fm users
Baseline rankings:
a) pLSA
b) Popularity-based recommendation
Diversified top n: 100
Specific parameter estimates:
$p(z \mid q)$: Uniform
$p(z \mid d)$: Uniform on $d$ (based on binary aspect/item association)
$p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
Experiments - Recommendation diversity on MovieLens and Last.fm
[Figure: ERR-IA against $\lambda$ in four panels: MovieLens 1M and Last.fm, each with the pLSA recommender and with recommendation by item popularity; curves compare the version based on $p(d \mid q, z)$ with the version based on $p(r \mid d, q, z)$]
Adjustable tolerance to redundancy
Generalization of relevance-based diversification scheme
Formally support adjustable redundancy penalization
Approach: generalize relevance to browsing model
$f(d; S, q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \text{stop}_S \mid q) = \cdots$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\, p(\text{stop} \mid r)\bigr)$
Tolerance to redundancy: adjustable parameter $p(\text{stop} \mid r) \in [0, 1]$
- High $p(\text{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
- In this view, the original formulations would implicitly assume $p(\text{stop} \mid r) = 1$, i.e. a single relevant document is sought
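The effect of the tolerance parameter can be seen in a minimal sketch of the generalized objective. The function name, the `(document, aspect)` keying and the toy probabilities are assumptions for illustration; only the formula itself comes from the slide.

```python
def tolerant_objective(d, selected, p_r_dq, p_z_q, p_r_dqz, lam, p_stop):
    """(1 - lam) p(r|d,q) + lam sum_z p(z|q) p(r|d,q,z) prod (1 - p(r|d',q,z) p(stop|r))."""
    novelty = 0.0
    for z, p_z in p_z_q.items():
        keeps_browsing = 1.0  # probability that the user has not stopped yet on aspect z
        for s in selected:
            keeps_browsing *= 1.0 - p_r_dqz.get((s, z), 0.0) * p_stop
        novelty += p_z * p_r_dqz.get((d, z), 0.0) * keeps_browsing
    return (1.0 - lam) * p_r_dq.get(d, 0.0) + lam * novelty


# Toy values (hypothetical): one aspect, two documents that both cover it.
p_z_q = {"z1": 1.0}
p_r_dq = {"a": 0.9, "b": 0.8}
p_r_dqz = {("a", "z1"): 0.9, ("b", "z1"): 0.8}

# Score of "b" once "a" is already selected, under three tolerance settings.
strict = tolerant_objective("b", ["a"], p_r_dq, p_z_q, p_r_dqz, lam=1.0, p_stop=1.0)
medium = tolerant_objective("b", ["a"], p_r_dq, p_z_q, p_r_dqz, lam=1.0, p_stop=0.5)
lenient = tolerant_objective("b", ["a"], p_r_dq, p_z_q, p_r_dqz, lam=1.0, p_stop=0.0)
print(strict, medium, lenient)
```

Setting `p_stop=1.0` recovers the relevance-based scheme without the extra parameter, while `p_stop=0.0` removes the redundancy penalty entirely, so a second document on the same aspect keeps its full value.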
Adjustable tolerance to redundancy
Empirical observation: $p(\text{stop} \mid r)$ vs. $\alpha$ in $\alpha$-nDCG
[Figure: two panels, "Search task: Lemur on TREC / Subtopics" and "Recommendation task: pLSA on MovieLens / Genres"; axes are $\alpha$ and $p(\text{stop} \mid r)$, each from 0 to 1; for each column the best and the worst $\alpha$-nDCG value are marked]
Conclusion
Alternative, relevance-based formulation of greedy aspect-based diversification
- Unifies two previous aspect-based algorithms
- More literal expression of formal problem statement (and metrics?)
$p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
- Literal value estimates needed (rather than rank-equivalent approximations)
- Estimate based on positional relevance (relevance or click data needed)
Seems to perform well empirically
- Light requirements on relevance or click data for training positional relevance
- Improvement trend, but needs to be tested under further optimizations
Formal support for redundancy tolerance adjustment