On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics
Jun Wang
Joint work with Jianhan Zhu
Department of Computer Science
University College London
Motivation
IR Models
Calculate (relevance) scores for individual documents
Probabilistic Indexing
BM25
Language Models
The Binary Independence Model
Motivation
[Figure: documents marked relevant (✔) or non-relevant (✖)]
A general definition of an IR metric:
m(a rank order | “true” relevance of documents)
Motivation
We have different rank preferences and thus different IR metrics (NDCG, MRR, MAP, …).
Between IR models and IR metrics (?), something is missing.
Motivation
The fundamental question
What is the underlying generative retrieval process?
Outline
• What is happening right now
• The statistical retrieval process
• Text retrieval experiments
What is happening right now (1)?
• Still focusing on (relevance) scores, but acknowledging the final rank context
– The “less is more” model [Chen&Karger 2006] extended the relevance model
– assumed the previously retrieved documents to be non-relevant when calculating the relevance of documents for the current rank position,
– equivalent to maximizing the Reciprocal Rank measure
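A minimal sketch of the greedy idea, assuming binary relevance and a small, hypothetical joint distribution over relevance vectors given as explicit scenarios. Chen & Karger estimate the conditional probabilities within a probabilistic (language) model; that estimator is not reproduced here, and the function name `greedy_less_is_more` is illustrative. At each rank the procedure picks the document most likely to be relevant given that every previously ranked document is non-relevant.

```python
from typing import Dict, List, Tuple

# Hypothetical toy joint distribution p(r_1, ..., r_N | q): each key is a binary
# relevance vector (one entry per document id), each value its probability.
Joint = Dict[Tuple[int, ...], float]


def greedy_less_is_more(joint: Joint, n_docs: int) -> List[int]:
    """Greedy ranking: at each position, pick the document with the highest
    probability of relevance given that all documents already ranked are
    non-relevant (sketch; not the original model-based estimator)."""
    ranking: List[int] = []
    remaining = set(range(n_docs))
    while remaining:
        # Condition on the event "every document ranked so far is non-relevant".
        consistent = {r: p for r, p in joint.items()
                      if all(r[d] == 0 for d in ranking)}
        mass = sum(consistent.values())

        def cond_rel(d: int) -> float:
            if mass == 0.0:
                return 0.0  # conditioning event has zero probability
            return sum(p for r, p in consistent.items() if r[d] == 1) / mass

        # max() returns the first maximum, so ties go to the smaller doc id.
        best = max(sorted(remaining), key=cond_rel)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Documents 0 and 1 are redundant (relevant together); document 2 is not.
toy_joint: Joint = {(1, 1, 0): 0.5, (0, 0, 1): 0.4, (0, 0, 0): 0.1}
print(greedy_less_is_more(toy_joint, 3))  # [0, 2, 1]
```

Ranking by marginal probability alone (0.5, 0.5, 0.4) would place the redundant document 1 second; conditioning on non-relevance of document 0 pushes document 2 up instead.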
What is happening right now (2)?
• Still focusing on (relevance) scores, but acknowledging the final rank context
– In the Language Model framework, various loss functions were defined to incorporate various ranking strategies [Zhai&Lafferty 2006]
What is happening right now (3)?
• Focusing on IR metrics and ranking
– bypass the step of estimating the relevance states of individual documents
– construct a document ranking model from training data by directly optimizing an IR metric [Volkovs&Zemel 2009]
• However, not all IR metrics summarize the (training) data equally well; thus, the training data may not be fully exploited [Yilmaz&Robertson 2009]
A “balanced” view of the retrieval process
– let us first understand (infer) the relevance of documents as accurately as possible,
– and summarize it by the joint probability of the documents’ relevance,
– so that dependencies between documents are considered.
– Secondly, rank preference is specified by an IR metric.
– The ranking decision is a stochastic one, due to the uncertainty about relevance.
– As a result, the optimal ranking action is the one that maximizes the expected value of the IR metric
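The statistical document ranking process
Given an IR metric $m$, the optimal rank order maximizes its expected value:
$$\hat{a} = \arg\max_{a} E(m \mid q) = \arg\max_{a_1,\ldots,a_N} \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$$
where $p(r_1,\ldots,r_N \mid q)$ is the joint probability of relevance given a query $q$, and the IR metric $m$ takes as input:
1. a rank order $a_1,\ldots,a_N$;
2. the relevance of documents $r_1,\ldots,r_N$.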
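The Optimal Ranker
Inputs: the uncertainty, expressed as the joint probability of relevance $p(r_1,\ldots,r_N \mid q)$; a fixed IR metric $m$; and a rank order $a_1,\ldots,a_N$.
Output: the estimated performance score
$$E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$$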
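The expected-metric formula and the argmax over rank orders can be sketched by brute force, assuming binary relevance and a small, hypothetical joint distribution listed explicitly. This is an illustration of the definitions only: it enumerates every relevance vector and all $N!$ rank orders, which is feasible only for a handful of documents. The helper names are illustrative, not from the talk.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple

Relevance = Tuple[int, ...]       # binary relevance, indexed by document id
Joint = Dict[Relevance, float]    # p(r_1, ..., r_N | q); omitted vectors have p = 0
Metric = Callable[[Sequence[int], Relevance], float]  # m(a_1..a_N | r_1..r_N)


def reciprocal_rank(ranking: Sequence[int], rel: Relevance) -> float:
    for pos, doc in enumerate(ranking, start=1):
        if rel[doc] == 1:
            return 1.0 / pos
    return 0.0


def average_precision(ranking: Sequence[int], rel: Relevance) -> float:
    n_rel = sum(rel)
    if n_rel == 0:
        return 0.0  # convention: AP = 0 when no document is relevant
    hits, total = 0, 0.0
    for pos, doc in enumerate(ranking, start=1):
        if rel[doc] == 1:
            hits += 1
            total += hits / pos  # precision at this relevant document's rank
    return total / n_rel


def expected_metric(metric: Metric, ranking: Sequence[int], joint: Joint) -> float:
    """E(m | q) = sum_r m(a | r) p(r | q), by full enumeration of the joint."""
    return sum(p * metric(ranking, r) for r, p in joint.items())


def optimal_ranking(metric: Metric, joint: Joint, n_docs: int) -> Tuple[int, ...]:
    """argmax over all N! rank orders (first maximum wins on ties)."""
    return max(permutations(range(n_docs)),
               key=lambda a: expected_metric(metric, a, joint))


toy_joint: Joint = {(1, 1, 0): 0.5, (0, 0, 1): 0.4, (0, 0, 0): 0.1}
print(optimal_ranking(reciprocal_rank, toy_joint, 3))    # (0, 2, 1)
print(optimal_ranking(average_precision, toy_joint, 3))  # (2, 0, 1)
```

On this toy joint the two metrics disagree: expected Reciprocal Rank is maximized by putting one of the redundant pair first, while expected Average Precision prefers the document that is relevant on its own. The optimal rank order depends on the chosen metric, not only on the relevance probabilities.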
Now the question is how to calculate the expected IR metric under the joint probability of relevance, given a predefined IR metric $m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)$:
$$E(m \mid q) = \sum_{r_1,\ldots,r_N} m(a_1,\ldots,a_N \mid r_1,\ldots,r_N)\, p(r_1,\ldots,r_N \mid q)$$
We worked it out for the major IR metrics (Average Precision, DCG, Precision at N, Reciprocal Rank).
• Certain assumptions are needed.
• The joint distribution of relevance $p(r_1,\ldots,r_N \mid q)$ is summarized by the marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$ and the covariances $\mathrm{cov}(r_i,r_j \mid q)$.
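For metrics that are linear in the relevance values, the summary by means and covariances is enough. By linearity of expectation, the expected Precision at $k$ and the expected DCG (with gain equal to the relevance value) depend only on the marginal means, while the variance of Precision at $k$ is $\frac{1}{k^2}\sum_{i,j \le k} \mathrm{cov}(r_{a_i}, r_{a_j} \mid q)$. Average Precision and Reciprocal Rank are not linear in the relevance values, so this shortcut does not apply to them directly. A minimal sketch, with hypothetical toy means and covariances and illustrative function names:

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def expected_precision_at_k(ranking: Sequence[int], means: Sequence[float], k: int) -> float:
    """E[P@k] = (1/k) * sum of E(r_d | q) over the top-k documents."""
    return sum(means[d] for d in ranking[:k]) / k


def variance_precision_at_k(ranking: Sequence[int], cov: Matrix, k: int) -> float:
    """Var[P@k] = (1/k^2) * sum over i, j in the top k of cov(r_i, r_j | q)."""
    top = ranking[:k]
    return sum(cov[i][j] for i in top for j in top) / (k * k)


def expected_dcg_at_k(ranking: Sequence[int], means: Sequence[float], k: int) -> float:
    """E[DCG@k] with gain = relevance value and discount 1/log2(rank + 1)."""
    return sum(means[d] / math.log2(pos + 1)
               for pos, d in enumerate(ranking[:k], start=1))


# Toy means and covariances of a hypothetical joint in which documents 0 and 1
# are always relevant together and never together with document 2.
means = [0.5, 0.5, 0.4]
cov = [[0.25, 0.25, -0.2],
       [0.25, 0.25, -0.2],
       [-0.2, -0.2, 0.24]]
for ranking in ([0, 1, 2], [0, 2, 1]):
    print(ranking,
          expected_precision_at_k(ranking, means, 2),
          variance_precision_at_k(ranking, cov, 2))
```

Both rankings have a similar expected P@2 (0.5 versus 0.45), but placing the negatively correlated document second cuts the variance from 0.25 to 0.0225; this is where the dependency between documents shows up.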
Some of the results
• Expected Average Precision:
• Expected Reciprocal Rank (two documents):
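As an illustrative derivation (not necessarily the form used in the talk), assume binary relevance and let $r_1, r_2$ denote the relevance of the documents at ranks 1 and 2. The reciprocal rank is then $RR = r_1 + \tfrac{1}{2}(1 - r_1)\, r_2$, and by linearity of expectation, using $E(r_1 r_2 \mid q) = E(r_1 \mid q)\,E(r_2 \mid q) + \mathrm{cov}(r_1, r_2 \mid q)$:

$$E[RR \mid q] = E(r_1 \mid q) + \tfrac{1}{2}\big(E(r_2 \mid q) - E(r_1 r_2 \mid q)\big) = E(r_1 \mid q) + \tfrac{1}{2}E(r_2 \mid q) - \tfrac{1}{2}\big(E(r_1 \mid q)\,E(r_2 \mid q) + \mathrm{cov}(r_1, r_2 \mid q)\big)$$

For fixed marginal means, a larger covariance between the two documents lowers the expected reciprocal rank, so ranking redundant documents next to each other is penalized.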
Properties of IR metrics under the uncertainty
[Figure: plot of the expected metric E[m]]
But can this analysis be used in practice?
• The key question is how to obtain the joint probability of relevance.
– Marginal means $E(r_1 \mid q),\ldots,E(r_N \mid q)$: from click-through data, or from current IR models (relevance models, language models).
– Covariance of relevance $\mathrm{cov}(r_i,r_j \mid q)$: use the correlation between documents’ scores to estimate the relevance correlation. The estimate is query-independent: we approximate it by sampling queries and calculating the correlation between the documents’ ranking scores.
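A minimal sketch of the covariance estimation step. It assumes a matrix of ranking scores for N documents over a set of sampled queries (the values below are hypothetical), Pearson correlation as the correlation measure, and binary relevance when turning correlations into covariances with the marginal means, so that $\mathrm{Var}(r_d \mid q) = E(r_d \mid q)\,(1 - E(r_d \mid q))$. Function names are illustrative, not from the talk.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # undefined for constant scores; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def score_correlation(scores: Matrix) -> Matrix:
    """scores[q][d]: ranking score of document d for sampled query q.
    Returns the document-by-document Pearson correlation across queries."""
    n_docs = len(scores[0])
    cols = [[row[d] for row in scores] for d in range(n_docs)]
    return [[pearson(cols[i], cols[j]) for j in range(n_docs)]
            for i in range(n_docs)]


def relevance_covariance(means: Sequence[float], corr: Matrix) -> Matrix:
    """Assumes binary relevance (Bernoulli variances) and uses the score
    correlation as a stand-in for the relevance correlation."""
    sd = [math.sqrt(m * (1.0 - m)) for m in means]
    return [[corr[i][j] * sd[i] * sd[j] for j in range(len(means))]
            for i in range(len(means))]


# Hypothetical scores of 3 documents for 3 sampled queries.
scores = [[1.0, 2.0, 3.0],
          [2.0, 4.0, 1.0],
          [3.0, 6.0, 2.0]]
corr = score_correlation(scores)
cov = relevance_covariance([0.5, 0.5, 0.4], corr)
```

A score correlation combined with Bernoulli variances need not correspond to any valid joint distribution of binary relevance, so in practice the resulting covariances may need to be clipped.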
TREC evaluation
No free lunch
The idea can be applied to evaluation too.
Inputs: an IR model, which produces the rank order $a_1,\ldots,a_N$; relevance judgments, with their uncertainty expressed as $p(r_1,\ldots,r_N \mid q)$; and a fixed IR metric $m$.
Output: the estimated performance score $E(m \mid q)$.