a general approximation framework for direct optimization of information retrieval measures


Page 1: A general approximation framework for direct optimization of information retrieval measures

A general approximation framework for direct optimization of information retrieval measures

Presenter: Shih-Hsiang Lin (林士翔)

Tao Qin, Tie-Yan Liu, Hang Li
Microsoft Research Asia, Beijing, China

Reference:
1. Joachims, T. (2002). Optimizing search engines using clickthrough data. In KDD ’02.
2. Freund, Y., et al. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
3. Burges, C., et al. (2005). Learning to rank using gradient descent. In ICML ’05.
4. Cao, Z., et al. (2007). Learning to rank: From pairwise approach to listwise approach. In ICML ’07.
5. Xu, J., & Li, H. (2007). AdaRank: A boosting algorithm for information retrieval. In SIGIR ’07.
6. He, Y., et al. (2008). Are algorithms directly optimizing IR measures really direct? Technical Report MSR-TR-2008-154, Microsoft Corporation.
7. Xia, F., et al. (2008). Listwise approach to learning to rank: Theory and algorithm. In ICML ’08.
8. Xu, J., et al. (2008). Directly optimizing evaluation measures in learning to rank. In SIGIR ’08.

Page 2: A general approximation framework for direct optimization of information retrieval measures

Recently, direct optimization of information retrieval (IR) measures has become a new trend in learning to rank
◦ IR measures are explicitly considered in the direct optimization approach
◦ Generally, these approaches can be grouped into two categories
 - introduce upper bounds of the IR measures
 - approximate the IR measures using smooth functions

Open problems
◦ The relationships between the surrogate functions and the corresponding IR measures have not been sufficiently studied
◦ Some of the proposed surrogate functions are not easy to optimize

INTRODUCTION

Page 3: A general approximation framework for direct optimization of information retrieval measures

The main contributions of this work include
◦ They set up a general framework for direct optimization; it is applicable to any position-based IR measure
◦ They take AP and NDCG as two examples to show how to optimize the position-based IR measures as surrogate functions within the framework
◦ They provide a theoretical justification for the direct optimization approach

INTRODUCTION

Page 4: A general approximation framework for direct optimization of information retrieval measures

Precision@k
◦ Evaluates the top k positions of a ranked list using two levels (relevant and irrelevant) of relevance judgment

Pre@k = (1/k) Σ_{j=1}^{k} r_j

where k denotes the truncation position, and r_j equals one if the document in the j-th position is relevant and zero otherwise

Average Precision (AP)

AP = (1/|D+|) Σ_j r_j · Pre@j

where |D+| denotes the number of relevant documents w.r.t. the query
◦ e.g., relevant docs ranked at 1, 5, 10: the precisions are 1/1, 2/5, 3/10, so AP = (1/1 + 2/5 + 3/10)/3 ≈ 0.57

MAP is defined as the mean of AP over a set of queries

REVIEW ON IR MEASURES (1/3)
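As a quick illustrative sketch (my own code, not from the paper), Pre@k and AP over binary relevance labels can be computed as:

```python
def precision_at_k(rels, k):
    """Pre@k = (1/k) * sum_{j=1}^{k} r_j for 0/1 relevance labels in rank order."""
    return sum(rels[:k]) / k

def average_precision(rels):
    """AP = (1/|D+|) * sum of Pre@j over positions j where r_j = 1."""
    num_rel = sum(rels)
    return sum(precision_at_k(rels, j)
               for j, r in enumerate(rels, start=1) if r) / num_rel

# Slide example: relevant docs ranked at positions 1, 5, and 10
rels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
ap = average_precision(rels)  # (1/1 + 2/5 + 3/10) / 3
```

MAP would simply average `average_precision` over the queries of a collection.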

Page 5: A general approximation framework for direct optimization of information retrieval measures

Normalized Discounted Cumulative Gain (NDCG)
◦ It is designed for multiple levels of relevance judgments
◦ It uses graded relevance as a measure of the usefulness, or gain, from examining a document
◦ Discounted Cumulative Gain (DCG) is the total gain accumulated at a particular rank k

DCG@k = Σ_{j=1}^{k} (2^{r_j} − 1) / log2(1 + j)

◦ e.g., 10 ranked documents judged on a 0-3 relevance scale (first seven positions shown):

rank j:                 1      2      3      4      5      6      7
r_j:                    3      3      2      2      1      1      1
gain 2^{r_j} − 1:       7      7      3      3      1      1      1
discount 1/log2(1+j):   1   0.63    0.5   0.43   0.39   0.36   0.33
DCG@j:                  7  11.41  12.91   14.2  14.59  14.95  15.28

REVIEW ON IR MEASURES (2/3)
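The DCG formula can be checked against the slide's worked example with a few lines (a sketch, not the authors' code):

```python
import math

def dcg_at_k(rels, k):
    """DCG@k = sum_{j=1}^{k} (2^{r_j} - 1) / log2(1 + j) for graded relevance."""
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(rels[:k], start=1))

rels = [3, 3, 2, 2, 1, 1, 1]   # graded judgments from the slide's example
dcg = dcg_at_k(rels, 7)        # accumulates to roughly 15.28
```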

Page 6: A general approximation framework for direct optimization of information retrieval measures

◦ NDCG is defined as

NDCG@k = (1/N_k) Σ_{j=1}^{k} (2^{r_j} − 1) / log2(1 + j)

where N_k is a query-dependent constant chosen so that the maximum value of NDCG@k for that query is 1

REVIEW ON IR MEASURES (3/3)
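A minimal NDCG sketch, taking N_k as the DCG@k of the ideally sorted list, which is what makes the maximum value 1 (helper names are mine):

```python
import math

def dcg_at_k(rels, k):
    # DCG@k = sum_{j=1}^{k} (2^{r_j} - 1) / log2(1 + j)
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """NDCG@k = DCG@k / N_k, with N_k the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal

# A perfectly ordered list scores exactly 1.0; any demotion lowers the score
perfect = ndcg_at_k([3, 2, 1, 0], 4)
shuffled = ndcg_at_k([0, 3, 2, 1], 4)
```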

Page 7: A general approximation framework for direct optimization of information retrieval measures

The framework consists of four steps:
◦ Reformulating an IR measure from ‘indexed by positions’ to ‘indexed by documents’
◦ Approximating the position function with a logistic function of the ranking scores of documents
◦ Approximating the truncation function with a logistic function of the positions of documents
◦ Applying a global optimization technique to optimize the approximated measure (the surrogate function)

A GENERAL APPROXIMATION FRAMEWORK

Page 8: A general approximation framework for direct optimization of information retrieval measures

Most IR measures, for example Precision@k, AP, and NDCG, are position based
◦ The summations in the definitions of the IR measures are taken over positions
◦ The position of a document may change during the training process, which makes the optimization of the IR measures difficult

When indexed by documents, Precision@k can be re-written as

Pre@k = (1/k) Σ_{x∈X} r(x) · 1{π(x) ≤ k}

where X is the set of documents, r(x) equals one for a relevant document and zero otherwise, π(x) denotes the position of x in the ranked list π, and 1{·} is the truncation (indicator) function

STEP 1: Measure Reformulation (1/2)
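A small sketch (toy data, names invented) showing that the document-indexed form gives the same value as the position-indexed one:

```python
def pre_at_k_positional(rels, k):
    # original definition: average relevance over the top-k positions
    return sum(rels[:k]) / k

def pre_at_k_by_documents(docs, r, pi, k):
    # reformulated: Pre@k = (1/k) * sum_{x in X} r(x) * 1{pi(x) <= k}
    return sum(r[x] for x in docs if pi[x] <= k) / k

docs = ["a", "b", "c", "d"]
r = {"a": 1, "b": 0, "c": 1, "d": 0}        # relevance r(x)
pi = {"a": 2, "b": 4, "c": 1, "d": 3}       # position pi(x) in the ranked list
ranked = sorted(docs, key=lambda x: pi[x])  # the list implied by pi

k = 2
by_docs = pre_at_k_by_documents(docs, r, pi, k)
by_pos = pre_at_k_positional([r[x] for x in ranked], k)
```

The sum now runs over documents, whose identity is fixed during training, rather than over positions, which are not.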

Page 9: A general approximation framework for direct optimization of information retrieval measures

With documents as indexes, AP can be re-written as

AP = (1/|D+|) Σ_{y∈X} r(y) · Pre@π(y)

Combining the above two equations yields

AP = (1/|D+|) Σ_{y∈X} (r(y)/π(y)) · (1 + Σ_{x∈X, x≠y} r(x) · 1{π(x) < π(y)})

So far, these measures are still non-continuous and non-differentiable

STEP 1: Measure Reformulation (2/2)
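The combined document-indexed AP formula can be sanity-checked against the earlier ranked-at-1,-5,-10 example (a sketch, not the paper's code):

```python
def ap_by_documents(r, pi, num_rel):
    """AP = (1/|D+|) * sum_y r(y)/pi(y) * (1 + sum_{x != y} r(x) * 1{pi(x) < pi(y)})."""
    total = 0.0
    for y in r:
        if not r[y]:
            continue
        # count relevant documents ranked above y
        rels_above = sum(1 for x in r if x != y and r[x] and pi[x] < pi[y])
        total += (1 + rels_above) / pi[y]
    return total / num_rel

# 10 documents, doc i sitting at position i; relevant ones at 1, 5, 10
pi = {i: i for i in range(1, 11)}
r = {i: 1 if i in (1, 5, 10) else 0 for i in range(1, 11)}
ap = ap_by_documents(r, pi, 3)   # (1/1 + 2/5 + 3/10) / 3
```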

Page 10: A general approximation framework for direct optimization of information retrieval measures

The position function can be represented as a function of ranking scores

π(x) = 1 + Σ_{y∈X, y≠x} 1{s_{x,y} < 0}, where s_{x,y} = s_x − s_y

Due to the indicator function in it, the position function is still non-continuous and non-differentiable
◦ They propose approximating the indicator function 1{s_{x,y} < 0} with a logistic function

π̂(x) = 1 + Σ_{y∈X, y≠x} exp(−α·s_{x,y}) / (1 + exp(−α·s_{x,y}))

where α is a scaling constant and α > 0

STEP 2: Position Function Approximation (1/2)
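A sketch of the approximated position function (the stable-sigmoid helper and the α value are my own choices, not the paper's):

```python
import math

def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def approx_positions(scores, alpha=100.0):
    """pi_hat(x) = 1 + sum_{y != x} exp(-a*s_xy) / (1 + exp(-a*s_xy)), s_xy = s_x - s_y."""
    n = len(scores)
    return [1.0 + sum(sigmoid(-alpha * (scores[x] - scores[y]))
                      for y in range(n) if y != x)
            for x in range(n)]

scores = [2.0, 0.5, 1.2]
pi_hat = approx_positions(scores)  # close to the true positions [1, 3, 2]
```

With a large α and well-separated scores, each logistic term is nearly 0 or 1, so π̂ essentially recovers the true integer positions.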

Page 11: A general approximation framework for direct optimization of information retrieval measures

Examples of position approximation (figure comparing true and approximated positions; not recoverable from the transcript)

◦ The approximation is very accurate in this case

STEP 2: Position Function Approximation (2/2)

Page 12: A general approximation framework for direct optimization of information retrieval measures

Some measures, such as Precision@k, AP, and NDCG@k, have truncation functions in their definitions; these measures need a further approximation of the truncation function

To approximate the truncation function 1{π(x) < π(y)}, a simple way is to use the logistic function once again

1{π(x) < π(y)} ≈ exp(β(π̂(y) − π̂(x))) / (1 + exp(β(π̂(y) − π̂(x))))

where β is a scaling constant and β > 0

Thus, we obtain the approximation of AP as follows

ÂP = (1/|D+|) Σ_{y∈X} (r(y)/π̂(y)) · (1 + Σ_{x∈X, x≠y} r(x) · exp(β(π̂(y) − π̂(x))) / (1 + exp(β(π̂(y) − π̂(x)))))

STEP 3: Truncation Function Approximation
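Putting steps 2 and 3 together, ApproxAP can be sketched as follows (the α and β values are illustrative defaults, not tuned as in the paper's experiments):

```python
import math

def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def approx_ap(scores, rels, alpha=100.0, beta=10.0):
    """ApproxAP: positions and truncations both replaced by logistic functions."""
    n = len(scores)
    # Step 2: smooth positions pi_hat(x)
    pi_hat = [1.0 + sum(sigmoid(-alpha * (scores[x] - scores[y]))
                        for y in range(n) if y != x)
              for x in range(n)]
    # Step 3: smooth truncation 1{pi(x) < pi(y)} ~ sigmoid(beta * (pi_hat(y) - pi_hat(x)))
    num_rel = sum(rels)
    total = 0.0
    for y in range(n):
        if not rels[y]:
            continue
        inner = 1.0 + sum(rels[x] * sigmoid(beta * (pi_hat[y] - pi_hat[x]))
                          for x in range(n) if x != y)
        total += inner / pi_hat[y]
    return total / num_rel

# Scores already sorted descending; relevant docs end up at ranks 1, 5, 10
scores = [10.0 - i for i in range(10)]
rels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
surrogate = approx_ap(scores, rels)  # close to the exact AP of (1 + 2/5 + 3/10)/3
```

Unlike exact AP, this surrogate is differentiable in the scores, so gradient methods apply.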

Page 13: A general approximation framework for direct optimization of information retrieval measures

With the aforementioned approximation techniques, the surrogate objective functions become continuous and differentiable with respect to the parameters of the ranking model f(x; ·)

However, considering that the original IR measures contain many local optima, their approximations will also contain local optima
◦ One should therefore prefer global optimization methods, such as random restart and simulated annealing, in order to avoid being trapped in local optima

STEP 4: Surrogate Function Optimization (1/3)

Page 14: A general approximation framework for direct optimization of information retrieval measures

Gradient of ApproxAP
◦ The gradient of ÂP with respect to the model parameters is obtained by the chain rule (the detailed equations did not survive transcription)

STEP 4: Surrogate Function Optimization (2/3)

Page 15: A general approximation framework for direct optimization of information retrieval measures

STEP 4: Surrogate Function Optimization (3/3)

Page 16: A general approximation framework for direct optimization of information retrieval measures

In general, we would like to create a ranking model that maximizes the accuracy in terms of an IR measure on the training data,

max Σ_{i=1}^{m} E(π_i, y_i)

or, equivalently, minimizes the loss function defined as follows

min Σ_{i=1}^{m} (1 − E(π_i, y_i)) = min Σ_{i=1}^{m} (E(π_i*, y_i) − E(π_i, y_i))

where π_i is the permutation selected for query q_i, π_i* is the optimal permutation for q_i, and E(π_i, y_i) is the evaluation of π_i w.r.t. the ground truth y_i (the measure is assumed normalized so that E(π_i*, y_i) = 1)

Directly optimizing techniques try to minimize the above loss function

Comparisons with other directly optimizing techniques
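As a toy numeric check (the per-query E values are invented), the basic loss simply sums 1 − E over queries:

```python
def basic_loss(evals):
    """sum_i (1 - E(pi_i, y_i)), assuming the measure's maximum per query is 1."""
    return sum(1 - e for e in evals)

evals = [0.9, 0.75, 1.0]   # E(pi_i, y_i) for three hypothetical queries
loss = basic_loss(evals)   # 0.1 + 0.25 + 0.0
```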

Page 17: A general approximation framework for direct optimization of information retrieval measures

From the viewpoint of loss function optimization, these methods fall into three categories
◦ Minimize upper bounds of the basic loss function defined on the IR measures (AdaRank, SVMmap)
◦ Approximate the IR measures with functions that are easy to handle (this paper, SoftRank)
◦ Use specially designed technologies for optimizing the non-smooth IR measures

Comparisons with other directly optimizing techniques (cont.)

Page 18: A general approximation framework for direct optimization of information retrieval measures

Minimize upper bounds of the basic loss function
◦ Type one bound
 - the logistic function log(1 + exp(−x))
 - the exponential function exp(−x), an upper bound since e^(−x) ≥ 1 − x

(figure: curves of 1 − x, exp(−x), and log(1 + exp(−x)) plotted over [0, 1])

Comparisons with other directly optimizing techniques (cont.)

Page 19: A general approximation framework for direct optimization of information retrieval measures

◦ Type two bound
 - The loss function measures the loss incurred when the worst prediction is made
 - [[·]] is one if the condition is satisfied and zero otherwise

Comparisons with other directly optimizing techniques (cont.)

Page 20: A general approximation framework for direct optimization of information retrieval measures

Comparisons with other directly optimizing techniques (cont.)

Page 21: A general approximation framework for direct optimization of information retrieval measures

Datasets
◦ LETOR 3.0 datasets
 - a benchmark collection for research on learning to rank for information retrieval
 - TD2003, TD2004, and OHSUMED

Retrieval method
◦ A linear ranking model is used for ApproxAP and ApproxNDCG in the experiments

EXPERIMENTAL SETUP

Page 22: A general approximation framework for direct optimization of information retrieval measures

On the approximation of IR measures

Approximation error: (1/|Q|) Σ_{q∈Q} |AP(q) − ÂP(q)|

◦ The approximation accuracy is very high, and it becomes more accurate as α or β increases

EXPERIMENTAL RESULTS (1/3)
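This behavior can be illustrated on toy data (not the LETOR collections; the `approx_ap` sketch repeats the logistic construction from the framework slides):

```python
import math

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def approx_ap(scores, rels, alpha, beta):
    # ApproxAP with logistic positions (alpha) and logistic truncations (beta)
    n = len(scores)
    pi_hat = [1.0 + sum(sigmoid(-alpha * (scores[x] - scores[y]))
                        for y in range(n) if y != x)
              for x in range(n)]
    total = 0.0
    for y in range(n):
        if rels[y]:
            inner = 1.0 + sum(rels[x] * sigmoid(beta * (pi_hat[y] - pi_hat[x]))
                              for x in range(n) if x != y)
            total += inner / pi_hat[y]
    return total / sum(rels)

scores = [1.0, 0.8, 0.6, 0.4, 0.2]
rels = [1, 0, 1, 0, 0]
exact_ap = (1 / 1 + 2 / 3) / 2   # relevant docs at ranks 1 and 3
errors = [abs(approx_ap(scores, rels, a, a) - exact_ap)
          for a in (1.0, 10.0, 100.0)]
# the approximation error shrinks as the scaling constants grow
```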

Page 23: A general approximation framework for direct optimization of information retrieval measures

On the performance of ApproxAP
◦ Five-fold cross validation, as suggested in LETOR, for both the TD2003 and TD2004 datasets
 - α ∈ {50, 100, 150, 200, 250, 300}, β ∈ {1, 10, 20, 50, 100}
 - δ = 0.001, η = 0.01, K = 10
◦ The results clearly show the advantage of using the proposed method for direct optimization

EXPERIMENTAL RESULTS (2/3)

Page 24: A general approximation framework for direct optimization of information retrieval measures

◦ It can also be seen that AdaRank.MAP and SVMmap are not as good as Ranking SVM and ListNet
 - AdaRank.MAP and SVMmap optimize an upper bound of AP, and it is not clear whether the bound is tight
 - If the bound is very loose, optimizing the bound does not always lead to optimization of AP, so they may not perform well on some datasets

EXPERIMENTAL RESULTS (3/3)

Page 25: A general approximation framework for direct optimization of information retrieval measures

In this paper, they have set up a general framework to approximate position-based IR measures
◦ The key part of the framework is to approximate the positions of documents by logistic functions of their scores

There are several advantages to this framework
◦ The way of approximating position-based measures is simple yet general
◦ Many existing optimization techniques can be directly applied, and the optimization process itself is measure independent
◦ It is easy to analyze the accuracy of the approach, and high approximation accuracy can be achieved by setting appropriate parameters

CONCLUSIONS AND FUTURE WORK (1/2)

Page 26: A general approximation framework for direct optimization of information retrieval measures

There are still some issues that need further study
◦ The approximated measures are not convex, so there may be many local optima in training
◦ Conduct experiments to test the algorithms with other function classes

CONCLUSIONS AND FUTURE WORK (2/2)