Fine-tuning Ranking Models:a two-step optimization approach
Vitor
Jan 29, 2008
Text Learning Meeting - CMU
With invaluable ideas from ….
Motivation
• Rank, Rank, Rank…
  – Web retrieval, movie recommendation, NFL draft, etc.
  – Einat's contextual search
  – Richard's set expansion (SEAL)
  – Andy's context-sensitive spelling correction algorithm
  – Selecting seeds in Frank's political blog classification algorithm
  – Ramnath's Thunderbird extension for
    • Email Leak prediction
    • Email Recipient suggestion
Help your brothers!
• Try Cut Once!, our Thunderbird extension
  – Works well with Gmail accounts
• It's working reasonably well
• We need feedback
Leak warnings: hit x to remove recipient
Pause or cancel send of message
Timer: msg is sent after 10sec by default
Suggestions: hit + to add
Thunderbird plug-in
Classifier/rankers written in JavaScript
Email Recipient Recommendation
[Chart: MAP on the TOCCBCC and CCBCC tasks for the Frequency, Recency, M1uc, M2uc, TFIDF, and KNN rankers; y-axis from 0.15 to 0.5]
36 Enron users
Email Recipient Recommendation
[Chart: MAP on the TOCCBCC and CCBCC tasks for the Frequency, Recency, M1uc, M2uc, TFIDF, KNN, and Threaded rankers; y-axis from 0.15 to 0.5]
[Carvalho & Cohen, ECIR-08]
Aggregating Rankings
• Many "Data Fusion" methods – 2 types:
  • Normalized scores: CombSUM, CombMNZ, etc.
  • Unnormalized scores: BordaCount, Reciprocal Rank Sum, etc.
• Reciprocal Rank:
  – The sum of the inverse of the rank of the document in each ranking:

    RR_q(d) = Σ_{i ∈ Rankings_q} 1 / rank_i(d)

[Aslam & Montague, 2001]; [Ogilvie & Callan, 2003]; [Macdonald & Ounis, 2006]
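The Reciprocal Rank fusion above can be sketched in a few lines. This is an illustrative implementation, not the code used in the talk; the function name and toy rankings are mine.

```python
# Hedged sketch of Reciprocal Rank fusion: each document's aggregated
# score is the sum, over the input rankings, of 1 / its (1-based) rank.

def reciprocal_rank_fusion(rankings):
    """rankings: list of rankings, each a list of doc ids, best first.
    Documents absent from a ranking contribute nothing to its score."""
    scores = {}
    for ranking in rankings:
        for position, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / position
    # Documents sorted by aggregated score, best first.
    return sorted(scores, key=scores.get, reverse=True)

# Example: two rankers that agree on "a" and "b" but disagree on the tail.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["a", "b", "d"]])
# → ["a", "b", "c", "d"]  (scores: a=2, b=1, c=1/3, d=1/4)
```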
Aggregated Ranking Results
[Chart: MAP of aggregated rankings on the TOCCBCC and CCBCC intelligent email auto-completion tasks]
[Carvalho & Cohen, ECIR-08]
Can we do better?
• Not using other features, but better ranking methods
• Machine learning to improve ranking: Learning to Rank
  – Many (recent) methods:
    • ListNet, Perceptrons, RankSVM, RankBoost, AdaRank, Genetic Programming, Ordinal Regression, etc.
  – Mostly supervised
  – Generally small training sets
  – Workshop in SIGIR-07 (Einat was in the PC)
Pairwise-based Ranking

Rank for query q: d_1, d_2, d_3, d_4, d_5, d_6, …, d_T, where each document d_i is represented by a feature vector, e.g., d_6 ↦ (x_61, x_62, …, x_6m).

Goal: induce a ranking function f(d) s.t.

  d_i ≻ d_j ⟺ f(d_i) > f(d_j)

We assume a linear function f:

  f(d_i) = ⟨w, d_i⟩ = w_1·x_i1 + w_2·x_i2 + … + w_m·x_im

Therefore, the constraints are:

  d_i ≻ d_j ⟺ ⟨w, d_i − d_j⟩ > 0
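The pairwise view turns a ranked list into difference-vector constraints. A minimal sketch, assuming documents come as feature vectors ordered best-first; the function name, toy documents, and weight vector are illustrative.

```python
import numpy as np

# Sketch: build all pairwise difference vectors (d_i - d_j) with d_i
# ranked above d_j. A linear ranker w satisfies a constraint when
# w.dot(diff) > 0; the remaining pairs are misranks.

def pairwise_differences(docs_best_first):
    """Return all (d_i - d_j) vectors with d_i ranked above d_j."""
    diffs = []
    n = len(docs_best_first)
    for i in range(n):
        for j in range(i + 1, n):
            diffs.append(np.asarray(docs_best_first[i])
                         - np.asarray(docs_best_first[j]))
    return diffs

docs = [[3.0, 1.0], [2.0, 0.0], [1.0, 2.0]]   # d1 ≻ d2 ≻ d3
diffs = pairwise_differences(docs)            # 3 pairs for 3 documents
w = np.array([1.0, 0.5])
misranks = sum(1 for d in diffs if w.dot(d) <= 0)
```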
Ranking with Perceptrons
• Nice convergence properties and mistake bounds
  – bound on the number of mistakes/misranks
• Fast and scalable
• Many variants [Collins 2002; Gao et al., 2005; Elsas et al., 2008]
  – Voting, averaging, committee, pocket, etc.
  – General update rule (on a misranked pair ⟨d_R, d_NR⟩):

    W^(t+1) = W^(t) + (d_R − d_NR)

  – Here: averaged version of the perceptron
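The averaged variant mentioned above can be sketched as follows. This is an illustrative toy implementation under my own naming, assuming training data given as (preferred, non-preferred) feature-vector pairs; the data and epoch count are arbitrary.

```python
import numpy as np

# Hedged sketch of an averaged ranking perceptron: on each misranked
# pair apply w <- w + (d_R - d_NR), and return the average of the
# weight vectors seen during training (more stable than the last w).

def averaged_rank_perceptron(pairs, n_features, epochs=10):
    """pairs: list of (d_R, d_NR) feature vectors, d_R preferred."""
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)
    for _ in range(epochs):
        for d_r, d_nr in pairs:
            if w.dot(d_r - d_nr) <= 0:   # misrank: perceptron update
                w = w + (d_r - d_nr)
            w_sum += w                   # accumulate for averaging
    return w_sum / (epochs * len(pairs))

pairs = [(np.array([2.0, 0.0]), np.array([1.0, 1.0])),
         (np.array([3.0, 1.0]), np.array([0.0, 2.0]))]
w_avg = averaged_rank_perceptron(pairs, n_features=2)
# After training, both pairs are ranked correctly by w_avg.
ok = all(w_avg.dot(d_r - d_nr) > 0 for d_r, d_nr in pairs)
```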
Rank SVM
• Equivalent to maximizing AUC

  min_w L_ranksvm(w) = (1/2)‖w‖² + C Σ_i ξ_i
  subject to: ⟨w, d_R − d_NR⟩ ≥ 1 − ξ_i, ξ_i ≥ 0, ∀⟨d_R, d_NR⟩ ∈ P

Equivalent to:

  min_w L_ranksvm(w) = Σ_{⟨d_R, d_NR⟩ ∈ P} [1 − ⟨w, d_R − d_NR⟩]_+ + λ‖w‖²,
  where λ = 1/(2C).

[Herbrich et al., 2000]; [Joachims, KDD-02]
Loss Function
[Plot (built up over three slides): per-pair loss as a function of x = ⟨w, d_R − d_NR⟩, plotted over −3 to 3 with loss values from 0 to 2]

  1/(1 + e^x) = e^(−x)/(1 + e^(−x)) = 1 − 1/(1 + e^(−x)) = 1 − sigmoid(x)
Loss Functions
• SVMrank:

  min_w L_ranksvm(w) = Σ_{⟨d_R, d_NR⟩ ∈ P} [1 − ⟨w, d_R − d_NR⟩]_+ + λ‖w‖²

• SigmoidRank:

  min_w L_SigmoidRank(w) = Σ_{⟨d_R, d_NR⟩ ∈ P} [1 − sigmoid(⟨w, d_R − d_NR⟩)] + λ‖w‖²,
  where sigmoid(x) = 1/(1 + e^(−x))

  Not convex!
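The two per-pair terms behave very differently as the pair margin varies. A small sketch comparing them, as a function of x = ⟨w, d_R − d_NR⟩; the function names are mine.

```python
import math

# Illustrative comparison of the per-pair losses above.

def hinge_pair_loss(x):
    """SVMrank per-pair term: [1 - x]_+ (hinge); convex."""
    return max(0.0, 1.0 - x)

def sigmoid_pair_loss(x):
    """SigmoidRank per-pair term: 1 - sigmoid(x); non-convex but a
    close, smooth approximation of the 0/1 misrank indicator."""
    return 1.0 - 1.0 / (1.0 + math.exp(-x))

# Large positive margin: both losses near 0.
# Large negative margin: hinge grows linearly without bound, while the
# sigmoid loss saturates at 1, so badly misranked outliers cannot
# dominate the objective.
print(hinge_pair_loss(3.0), sigmoid_pair_loss(3.0))
print(hinge_pair_loss(-3.0), sigmoid_pair_loss(-3.0))
```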
Fine-tuning Ranking Models
• Step 1 – Base Ranker: a base ranking model, e.g., RankSVM, Perceptron, etc.
• Step 2 – Sigmoid Rank: non-convex; minimizes a very close approximation of the number of misranks → final model
Gradient Descent

  w^(k+1) = w^(k) + δ^(k)
  δ^(k) = −η_k ∇_w L_SigmoidRank(w^(k))

Since (d/dx) sigmoid(x) = sigmoid(x)[1 − sigmoid(x)]:

  ∇_w L_SigmoidRank(w) = −Σ_{⟨d_R, d_NR⟩ ∈ P} sigmoid(⟨w, d_R − d_NR⟩)[1 − sigmoid(⟨w, d_R − d_NR⟩)](d_R − d_NR) + 2λw
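The two-step procedure can be sketched end to end: start from a base model and descend on the SigmoidRank loss. This is a toy sketch under my own naming; the learning rate, λ, step count, and data are illustrative, and a real base model would come from RankSVM or a perceptron.

```python
import numpy as np

# Hedged sketch of the fine-tuning step: gradient descent on the
# SigmoidRank loss, starting from a base ranker's weights w0.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoidrank_gradient(w, pairs, lam):
    """Gradient of sum_P [1 - sigmoid(<w, d_R - d_NR>)] + lam*||w||^2."""
    grad = 2.0 * lam * w
    for d_r, d_nr in pairs:
        diff = d_r - d_nr
        s = sigmoid(w.dot(diff))
        grad -= s * (1.0 - s) * diff   # d/dx sigmoid = sigmoid*(1 - sigmoid)
    return grad

def fine_tune(w0, pairs, lam=0.01, eta=0.1, steps=100):
    w = w0.copy()
    for _ in range(steps):
        w -= eta * sigmoidrank_gradient(w, pairs, lam)
    return w

pairs = [(np.array([2.0, 0.0]), np.array([1.0, 1.0]))]
w0 = np.array([0.0, 0.5])             # base ranker misranks the pair
w = fine_tune(w0, pairs)
diff = pairs[0][0] - pairs[0][1]
loss0 = 1.0 - sigmoid(w0.dot(diff))   # sigmoid loss before fine-tuning
loss1 = 1.0 - sigmoid(w.dot(diff))    # ... and after
```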
Results in CC prediction
[Chart: MAP on the TOCCBCC and CCBCC tasks for Frequency, Recency, TFIDF, KNN, Percep, Percep+Sigmoid, RankSVM, and RankSVM+Sigmoid; values shown include 0.472, 0.516, 0.479, 0.524, 0.521, and 0.480; y-axis from 0.25 to 0.55]
36 Enron users
Set Expansion (SEAL) Results
[Chart: MAP on SEAL-1, SEAL-2, and SEAL-3 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet, and ListNet+Sigmoid; y-axis from 0.8 to 0.94]
[ListNet: Cao et al., ICML-07]
[Wang & Cohen, ICDM-2007]
Results in Letor
[Chart: MAP on Ohsumed, Trec3, and Trec4 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet, and ListNet+Sigmoid; y-axis from 0 to 0.5]
Learning Curve
[Plot: AUC vs. epoch (0–30) for Perceptron and RankSVM]
TOCCBCC Enron: user lokay-m
Learning Curve
[Plot: AUC vs. epoch (0–40) for Perceptron and RankSVM]
CCBCC Enron: user campbel-m
Regularization Parameter
[Chart: MAP of RankSVM vs. RankSVM+Sigmoid on TREC3, TREC4, and Ohsumed for C = 10, 1, 0.1, 0.01, 0.001; y-axis from 0.10 to 0.50]
Some Ideas
• Instead of the number of misranks, optimize other loss functions:
  – Mean Average Precision, MRR, etc.
  – Rank term:

    Rank(d_i) = 1 + Σ_{j ≠ i} [1 − sigmoid(⟨w, d_i − d_j⟩)]

  – Some preliminary results with Sigmoid-MAP
• Does it work for classification?
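The smoothed rank term above is differentiable, so it could be plugged into MAP- or MRR-style objectives. A minimal sketch of the idea, with names of my own choosing; with well-separated scores the soft ranks approach the true 1-based ranks.

```python
import math

# Hedged sketch of the smoothed rank term: sigmoids of pairwise score
# differences replace the hard comparisons in a document's rank.

def soft_rank(scores, i):
    """Smooth approximation of the rank of item i (1 = top):
    1 + sum over j != i of [1 - sigmoid(s_i - s_j)]."""
    s_i = scores[i]
    return 1.0 + sum(1.0 - 1.0 / (1.0 + math.exp(-(s_i - s_j)))
                     for j, s_j in enumerate(scores) if j != i)

scores = [4.0, 2.0, 0.0]
approx = [soft_rank(scores, i) for i in range(len(scores))]
# approx is close to the true ranks [1, 2, 3] and preserves their order.
```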
Thanks