on statistical analysis and optimization of information retrieval effectiveness metrics
Post on 05-Jul-2015
On Statistical Analysis and Optimization of Information Retrieval
Effectiveness Metrics
Jun Wang
Joint work with Jianhan Zhu
Department of Computer Science
University College London
J.Wang@cs.ucl.ac.uk
Motivation
IR Models
Calculate (relevance) scores for individual documents
Probability Indexing
BM25
Language Models
The Binary Independence Relevance Model
Motivation
A general definition of an IR metric:
m(a rank order | “true” relevance of documents)
[Figure: a ranked list whose documents are marked relevant (✔) or non-relevant (✖)]
Motivation
We have different rank preferences and thus different IR metrics: NDCG, MRR, MAP, …
Something is missing in between the IR models and the IR metrics.
Motivation
The fundamental question
What is the underlying generative retrieval process?
Outline
• What is happening right now
• The statistical retrieval process
• Text retrieval experiments
What is happening right now (1)?
• Still focusing on the (relevance) score, but acknowledging the final rank context
– The “less is more” model [Chen&Karger 2006] extended the relevance model
– It assumes that the previously retrieved documents are non-relevant when calculating the relevance of documents for the current rank position,
– which is equivalent to maximizing the Reciprocal Rank measure
What is happening right now (2)?
• Still focusing on the (relevance) score, but acknowledging the final rank context
– In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]
What is happening right now (3)?
• Focusing on IR metrics and ranking
– bypass the step of estimating the relevance states of individual documents
– construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009]
• However, not all IR metrics necessarily summarize the (training) data well; thus, the training data may not be fully exploited [Yilmaz&Robertson2009]
A “balanced” view of the retrieval process
– First, let us understand (infer) the relevance of documents as accurately as possible,
– and summarize it by the joint probability of the documents’ relevance
– dependency between documents is considered
– Second, the rank preference is specified by an IR metric.
– The ranking decision is a stochastic one because of the uncertainty about relevance
– As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric
Given an IR Metric
The statistical document ranking process
$$\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\dots,a_N} \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
• p(r1,...,rN | q): the joint probability of relevance given a query
• IR metric m. Input: 1. a rank order a1,...,aN; 2. the relevance of documents r1,...,rN
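The argmax above can be illustrated with a minimal brute-force sketch. It assumes binary relevance, a tiny document set, and a joint distribution given explicitly as a dictionary. The function names and the example distribution are hypothetical and chosen only for illustration; the talk does not specify an implementation, and enumerating all rank orders is feasible only for very small N.

```python
from itertools import permutations


def reciprocal_rank(order, rel):
    """1/rank of the first relevant document in `order`, or 0 if none is relevant."""
    for rank, doc in enumerate(order, start=1):
        if rel[doc]:
            return 1.0 / rank
    return 0.0


def expected_metric(order, joint, metric):
    """E(m | q): the metric summed over relevance vectors, weighted by p(r | q)."""
    return sum(p * metric(order, rel) for rel, p in joint.items())


def optimal_ranking(joint, metric, n_docs):
    """Brute-force argmax over all rank orders (only feasible for tiny N)."""
    return max(
        permutations(range(n_docs)),
        key=lambda order: expected_metric(order, joint, metric),
    )


# Hypothetical joint distribution p(r1, r2, r3 | q) over three documents.
# Documents 0 and 1 are relevant together; document 2 is relevant only
# when they are not.
joint = {
    (1, 1, 0): 0.6,
    (0, 0, 1): 0.4,
}

best = optimal_ranking(joint, reciprocal_rank, n_docs=3)
by_marginals = (0, 1, 2)  # documents sorted by marginal probability of relevance

print(best, expected_metric(best, joint, reciprocal_rank))
print(by_marginals, expected_metric(by_marginals, joint, reciprocal_rank))
```

In this toy distribution, sorting by marginal relevance places the two mutually dependent documents first, while the expected-metric maximizer puts document 2 in second position. The difference comes from the dependency between documents that the joint probability captures.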
The Optimal Ranker
[Diagram labels: uncertainty p(r1,...,rN | q); a fixed IR metric m; rank order a1,...,aN; output: the estimated performance score E(m | q)]
$$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
Now the question is how to calculate the expected IR metric under the joint probability of relevance, if we predefine the IR metric m(a1,...,aN | r1,...,rN):
$$E(m \mid q) = \sum_{r_1,\dots,r_N} m(a_1,\dots,a_N \mid r_1,\dots,r_N)\, p(r_1,\dots,r_N \mid q)$$
We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank)
• Certain assumptions are needed
• The joint distribution of relevance p(r1,...,rN | q) is summarized by the marginal means E(r1 | q),...,E(rN | q) and the covariances cov(ri, rj | q)
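As a rough illustration of this summary, the sketch below computes marginal means and covariances from an explicit joint distribution. It then evaluates the two metrics that are linear in relevance, Precision at N and DCG with a linear gain, from the means alone. All names and the example distribution are hypothetical. The linear-gain DCG is an assumption made here for simplicity. Metrics that involve products of relevance variables (Average Precision, Reciprocal Rank) additionally need the covariances and are not covered by this sketch.

```python
import math


def marginal_means(joint, n_docs):
    """E(r_i | q) for each document i, from an explicit joint distribution."""
    return [sum(p * rel[i] for rel, p in joint.items()) for i in range(n_docs)]


def covariance(joint, i, j, means):
    """cov(r_i, r_j | q) = E(r_i r_j | q) - E(r_i | q) E(r_j | q)."""
    e_ij = sum(p * rel[i] * rel[j] for rel, p in joint.items())
    return e_ij - means[i] * means[j]


def expected_precision_at_n(order, means, n):
    """E[P@n]: by linearity, the average of the marginal means of the top n."""
    return sum(means[doc] for doc in order[:n]) / n


def expected_dcg(order, means):
    """E[DCG], assuming a linear gain r / log2(rank + 1)."""
    return sum(
        means[doc] / math.log2(rank + 1)
        for rank, doc in enumerate(order, start=1)
    )


# Hypothetical joint distribution p(r1, r2, r3 | q) with binary relevance.
joint = {
    (1, 1, 0): 0.5,
    (1, 0, 0): 0.2,
    (0, 0, 1): 0.3,
}

means = marginal_means(joint, 3)
order = (0, 1, 2)
print(means)
print(covariance(joint, 0, 1, means), covariance(joint, 0, 2, means))
print(expected_precision_at_n(order, means, 2), expected_dcg(order, means))
```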
Some of the results
• Expected Average Precision:
• Expected Reciprocal Rank (two documents):
E[ m ]
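To see how the covariance enters for two documents, here is a short derivation under the assumption of binary relevance, with document 1 ranked above document 2. It is a sketch from the definition of Reciprocal Rank and may differ in form from the result shown in the talk.

$$RR = r_1 + \tfrac{1}{2}(1 - r_1)\, r_2$$

$$E(RR \mid q) = E(r_1 \mid q) + \tfrac{1}{2}\left[ E(r_2 \mid q) - E(r_1 r_2 \mid q) \right]$$

$$E(RR \mid q) = E(r_1 \mid q) + \tfrac{1}{2}\left[ E(r_2 \mid q) - E(r_1 \mid q)\,E(r_2 \mid q) - \mathrm{cov}(r_1, r_2 \mid q) \right]$$

The last step uses E(r1 r2 | q) = E(r1 | q) E(r2 | q) + cov(r1, r2 | q). A negative covariance between the two documents therefore raises the expected Reciprocal Rank.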
Properties of IR metrics under the uncertainty
But can this analysis be used in practice?
• The key question is how to obtain the joint probability of relevance
– Click-through data
– Marginal means E(r1 | q),...,E(rN | q)
• Current IR models: relevance models, language models
– Covariance of relevance cov(ri, rj | q)
- Use the documents’ score correlation to estimate the relevance correlation.
- It is query-independent. We approximate it by sampling queries and calculating the correlation between documents’ ranking scores
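The covariance estimate described above might be sketched as follows. The sketch assumes a matrix of ranking scores with one row per sampled query and one column per document. It also assumes that the score correlation is used directly as the relevance correlation and rescaled by Bernoulli standard deviations. The function names, the synthetic scores, and the marginal means are all hypothetical; the talk does not state the exact estimator.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def score_correlation(scores):
    """scores[k][i] is the ranking score of document i for sampled query k."""
    n_docs = len(scores[0])
    cols = [[row[i] for row in scores] for i in range(n_docs)]
    return [[pearson(cols[i], cols[j]) for j in range(n_docs)]
            for i in range(n_docs)]


def relevance_covariance(corr, means):
    """Assumption: cov(r_i, r_j) = corr_ij * sd_i * sd_j with Bernoulli sd."""
    sd = [math.sqrt(m * (1.0 - m)) for m in means]
    n = len(means)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]


# Synthetic scores for 200 sampled queries and 3 documents (illustration only):
# document 1 scores almost like document 0, document 2 is unrelated.
rng = random.Random(0)
scores = []
for _ in range(200):
    a = rng.random()
    scores.append([a, a + 0.1 * rng.random(), rng.random()])

corr = score_correlation(scores)
means = [0.7, 0.5, 0.3]  # hypothetical marginal means E(r_i | q)
cov = relevance_covariance(corr, means)
print(corr[0][1], cov[0][1])
```

Because the correlation is computed across sampled queries, it is the same for every query, while the marginal means that rescale it can still be query-specific.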
TREC evaluation
No free lunch
The idea can be applied to evaluation too.
[Diagram labels: input: an IR model (rank order a1,...,aN) and relevance judgments (uncertainty p(r1,...,rN | q)); a fixed IR metric m; output: the estimated performance score E(m | q)]