Fine-tuning Ranking Models:a two-step optimization approach
Vitor
Jan 29, 2008
Text Learning Meeting - CMU
With invaluable ideas from ….
Motivation
• Rank, Rank, Rank…
  – Web retrieval, movie recommendation, NFL draft, etc.
  – Einat's contextual search
  – Richard's set expansion (SEAL)
  – Andy's context-sensitive spelling correction algorithm
  – Selecting seeds in Frank's political blog classification algorithm
  – Ramnath's Thunderbird extension for
    • Email Leak prediction
    • Email Recipient suggestion
Help your brothers!
• Try Cut Once!, our Thunderbird extension
  – Works well with Gmail accounts
• It's working reasonably well
• We need feedback
Leak warnings: hit x to remove recipient
Pause or cancel send of message
Timer: msg is sent after 10sec by default
Suggestions: hit + to add
Thunderbird plug-in
Classifier/rankers written in JavaScript
Email Recipient Recommendation
[Chart: MAP on the TOCCBCC and CCBCC tasks for the Frequency, Recency, M1uc, M2uc, TFIDF, and KNN rankers; y-axis from 0.15 to 0.5]
36 Enron users
Email Recipient Recommendation
[Chart: MAP on the TOCCBCC and CCBCC tasks for the Frequency, Recency, M1uc, M2uc, TFIDF, KNN, and Threaded rankers; y-axis from 0.15 to 0.5]
[Carvalho & Cohen, ECIR-08]
Aggregating Rankings
• Many "Data Fusion" methods – 2 types:
  • Normalized scores: CombSUM, CombMNZ, etc.
  • Unnormalized scores: BordaCount, Reciprocal Rank Sum, etc.
• Reciprocal Rank:
  – The sum of the inverse of the rank of the document in each ranking:

    RR_q(d) = Σ_{i ∈ Rankings_q} 1 / rank_i(d)

[Aslam & Montague, 2001]; [Ogilvie & Callan, 2003]; [Macdonald & Ounis, 2006]
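The Reciprocal Rank fusion above can be sketched in a few lines. This is an illustrative implementation, not the code used in the talk; the function name and toy rankings are mine.

```python
# Hedged sketch of Reciprocal Rank fusion: each document's aggregated
# score is the sum, over the input rankings, of 1 / its (1-based) rank.

def reciprocal_rank_fusion(rankings):
    """rankings: list of rankings, each a list of doc ids, best first.
    Documents absent from a ranking contribute nothing to its score."""
    scores = {}
    for ranking in rankings:
        for position, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / position
    # Documents sorted by aggregated score, best first.
    return sorted(scores, key=scores.get, reverse=True)

# Example: two rankers that agree on "a" and "b" but disagree on the tail.
fused = reciprocal_rank_fusion([["a", "b", "c"], ["a", "b", "d"]])
# → ["a", "b", "c", "d"]  (scores: a=2, b=1, c=1/3, d=1/4)
```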
Aggregated Ranking Results
[Chart: MAP of aggregated rankings on the TOCCBCC and CCBCC intelligent email auto-completion tasks]
[Carvalho & Cohen, ECIR-08]
Can we do better?
• Not using other features, but better ranking methods
• Machine learning to improve ranking: Learning to Rank
  – Many (recent) methods:
    • ListNet, Perceptrons, RankSVM, RankBoost, AdaRank, Genetic Programming, Ordinal Regression, etc.
  – Mostly supervised
  – Generally small training sets
  – Workshop in SIGIR-07 (Einat was in the PC)
Pairwise-based Ranking

Rank for query q: d_1, d_2, d_3, d_4, d_5, d_6, …, d_T, where each document d_i is represented by a feature vector, e.g., d_6 ↦ (x_61, x_62, …, x_6m).

Goal: induce a ranking function f(d) s.t.

  d_i ≻ d_j ⟺ f(d_i) > f(d_j)

We assume a linear function f:

  f(d_i) = ⟨w, d_i⟩ = w_1·x_i1 + w_2·x_i2 + … + w_m·x_im

Therefore, the constraints are:

  d_i ≻ d_j ⟺ ⟨w, d_i − d_j⟩ > 0
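The pairwise view turns a ranked list into difference-vector constraints. A minimal sketch, assuming documents come as feature vectors ordered best-first; the function name, toy documents, and weight vector are illustrative.

```python
import numpy as np

# Sketch: build all pairwise difference vectors (d_i - d_j) with d_i
# ranked above d_j. A linear ranker w satisfies a constraint when
# w.dot(diff) > 0; the remaining pairs are misranks.

def pairwise_differences(docs_best_first):
    """Return all (d_i - d_j) vectors with d_i ranked above d_j."""
    diffs = []
    n = len(docs_best_first)
    for i in range(n):
        for j in range(i + 1, n):
            diffs.append(np.asarray(docs_best_first[i])
                         - np.asarray(docs_best_first[j]))
    return diffs

docs = [[3.0, 1.0], [2.0, 0.0], [1.0, 2.0]]   # d1 ≻ d2 ≻ d3
diffs = pairwise_differences(docs)            # 3 pairs for 3 documents
w = np.array([1.0, 0.5])
misranks = sum(1 for d in diffs if w.dot(d) <= 0)
```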
Ranking with Perceptrons
• Nice convergence properties and mistake bounds
  – bound on the number of mistakes/misranks
• Fast and scalable
• Many variants [Collins 2002; Gao et al., 2005; Elsas et al., 2008]
  – Voting, averaging, committee, pocket, etc.
  – General update rule (on a misranked pair ⟨d_R, d_NR⟩):

    W^(t+1) = W^(t) + (d_R − d_NR)

  – Here: averaged version of the perceptron
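The averaged variant mentioned above can be sketched as follows. This is an illustrative toy implementation under my own naming, assuming training data given as (preferred, non-preferred) feature-vector pairs; the data and epoch count are arbitrary.

```python
import numpy as np

# Hedged sketch of an averaged ranking perceptron: on each misranked
# pair apply w <- w + (d_R - d_NR), and return the average of the
# weight vectors seen during training (more stable than the last w).

def averaged_rank_perceptron(pairs, n_features, epochs=10):
    """pairs: list of (d_R, d_NR) feature vectors, d_R preferred."""
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)
    for _ in range(epochs):
        for d_r, d_nr in pairs:
            if w.dot(d_r - d_nr) <= 0:   # misrank: perceptron update
                w = w + (d_r - d_nr)
            w_sum += w                   # accumulate for averaging
    return w_sum / (epochs * len(pairs))

pairs = [(np.array([2.0, 0.0]), np.array([1.0, 1.0])),
         (np.array([3.0, 1.0]), np.array([0.0, 2.0]))]
w_avg = averaged_rank_perceptron(pairs, n_features=2)
# After training, both pairs are ranked correctly by w_avg.
ok = all(w_avg.dot(d_r - d_nr) > 0 for d_r, d_nr in pairs)
```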
Rank SVM
• Equivalent to maximizing AUC

  min_w L_ranksvm(w) = (1/2)‖w‖² + C Σ_i ξ_i
  subject to: ⟨w, d_R − d_NR⟩ ≥ 1 − ξ_i, ξ_i ≥ 0, ∀⟨d_R, d_NR⟩ ∈ P

Equivalent to:

  min_w L_ranksvm(w) = Σ_{⟨d_R, d_NR⟩ ∈ P} [1 − ⟨w, d_R − d_NR⟩]_+ + λ‖w‖²,
  where λ = 1/(2C).

[Herbrich et al., 2000]; [Joachims, KDD-02]
Loss Function
[Plot (built up over three slides): per-pair loss as a function of x = ⟨w, d_R − d_NR⟩, plotted over −3 to 3 with loss values from 0 to 2]

  1/(1 + e^x) = e^(−x)/(1 + e^(−x)) = 1 − 1/(1 + e^(−x)) = 1 − sigmoid(x)
Loss Functions
• SVMrank:

  min_w L_ranksvm(w) = Σ_{⟨d_R, d_NR⟩ ∈ P} [1 − ⟨w, d_R − d_NR⟩]_+ + λ‖w‖²

• SigmoidRank:

  min_w L_SigmoidRank(w) = Σ_{⟨d_R, d_NR⟩ ∈ P} [1 − sigmoid(⟨w, d_R − d_NR⟩)] + λ‖w‖²,
  where sigmoid(x) = 1/(1 + e^(−x))

  Not convex!
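The two per-pair terms behave very differently as the pair margin varies. A small sketch comparing them, as a function of x = ⟨w, d_R − d_NR⟩; the function names are mine.

```python
import math

# Illustrative comparison of the per-pair losses above.

def hinge_pair_loss(x):
    """SVMrank per-pair term: [1 - x]_+ (hinge); convex."""
    return max(0.0, 1.0 - x)

def sigmoid_pair_loss(x):
    """SigmoidRank per-pair term: 1 - sigmoid(x); non-convex but a
    close, smooth approximation of the 0/1 misrank indicator."""
    return 1.0 - 1.0 / (1.0 + math.exp(-x))

# Large positive margin: both losses near 0.
# Large negative margin: hinge grows linearly without bound, while the
# sigmoid loss saturates at 1, so badly misranked outliers cannot
# dominate the objective.
print(hinge_pair_loss(3.0), sigmoid_pair_loss(3.0))
print(hinge_pair_loss(-3.0), sigmoid_pair_loss(-3.0))
```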
Fine-tuning Ranking Models
• Step 1 – Base Ranker: a base ranking model, e.g., RankSVM, Perceptron, etc.
• Step 2 – Sigmoid Rank: non-convex; minimizes a very close approximation of the number of misranks → final model
Gradient Descent

  w^(k+1) = w^(k) + δ^(k)
  δ^(k) = −η_k ∇_w L_SigmoidRank(w^(k))

Since (d/dx) sigmoid(x) = sigmoid(x)[1 − sigmoid(x)]:

  ∇_w L_SigmoidRank(w) = −Σ_{⟨d_R, d_NR⟩ ∈ P} sigmoid(⟨w, d_R − d_NR⟩)[1 − sigmoid(⟨w, d_R − d_NR⟩)](d_R − d_NR) + 2λw
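The two-step procedure can be sketched end to end: start from a base model and descend on the SigmoidRank loss. This is a toy sketch under my own naming; the learning rate, λ, step count, and data are illustrative, and a real base model would come from RankSVM or a perceptron.

```python
import numpy as np

# Hedged sketch of the fine-tuning step: gradient descent on the
# SigmoidRank loss, starting from a base ranker's weights w0.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoidrank_gradient(w, pairs, lam):
    """Gradient of sum_P [1 - sigmoid(<w, d_R - d_NR>)] + lam*||w||^2."""
    grad = 2.0 * lam * w
    for d_r, d_nr in pairs:
        diff = d_r - d_nr
        s = sigmoid(w.dot(diff))
        grad -= s * (1.0 - s) * diff   # d/dx sigmoid = sigmoid*(1 - sigmoid)
    return grad

def fine_tune(w0, pairs, lam=0.01, eta=0.1, steps=100):
    w = w0.copy()
    for _ in range(steps):
        w -= eta * sigmoidrank_gradient(w, pairs, lam)
    return w

pairs = [(np.array([2.0, 0.0]), np.array([1.0, 1.0]))]
w0 = np.array([0.0, 0.5])             # base ranker misranks the pair
w = fine_tune(w0, pairs)
diff = pairs[0][0] - pairs[0][1]
loss0 = 1.0 - sigmoid(w0.dot(diff))   # sigmoid loss before fine-tuning
loss1 = 1.0 - sigmoid(w.dot(diff))    # ... and after
```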
Results in CC prediction
[Chart: MAP on the TOCCBCC and CCBCC tasks for Frequency, Recency, TFIDF, KNN, Percep, Percep+Sigmoid, RankSVM, and RankSVM+Sigmoid; values shown include 0.472, 0.516, 0.479, 0.524, 0.521, and 0.480; y-axis from 0.25 to 0.55]
36 Enron users
Set Expansion (SEAL) Results
[Chart: MAP on SEAL-1, SEAL-2, and SEAL-3 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet, and ListNet+Sigmoid; y-axis from 0.8 to 0.94]
[ListNet: Cao et al., ICML-07]
[Wang & Cohen, ICDM-2007]
Results in Letor
[Chart: MAP on Ohsumed, Trec3, and Trec4 for Percep, Percep+Sigmoid, RankSVM, RankSVM+Sigmoid, ListNet, and ListNet+Sigmoid; y-axis from 0 to 0.5]
Learning Curve
[Plot: AUC vs. epoch (0–30) for Perceptron and RankSVM]
TOCCBCC Enron: user lokay-m
Learning Curve
[Plot: AUC vs. epoch (0–40) for Perceptron and RankSVM]
CCBCC Enron: user campbel-m
Regularization Parameter
[Chart: MAP of RankSVM vs. RankSVM+Sigmoid on TREC3, TREC4, and Ohsumed for C = 10, 1, 0.1, 0.01, 0.001; y-axis from 0.10 to 0.50]
Some Ideas
• Instead of the number of misranks, optimize other loss functions:
  – Mean Average Precision, MRR, etc.
  – Rank term:

    Rank(d_i) = 1 + Σ_{j ≠ i} [1 − sigmoid(⟨w, d_i − d_j⟩)]

  – Some preliminary results with Sigmoid-MAP
• Does it work for classification?
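The smoothed rank term above is differentiable, so it could be plugged into MAP- or MRR-style objectives. A minimal sketch of the idea, with names of my own choosing; with well-separated scores the soft ranks approach the true 1-based ranks.

```python
import math

# Hedged sketch of the smoothed rank term: sigmoids of pairwise score
# differences replace the hard comparisons in a document's rank.

def soft_rank(scores, i):
    """Smooth approximation of the rank of item i (1 = top):
    1 + sum over j != i of [1 - sigmoid(s_i - s_j)]."""
    s_i = scores[i]
    return 1.0 + sum(1.0 - 1.0 / (1.0 + math.exp(-(s_i - s_j)))
                     for j, s_j in enumerate(scores) if j != i)

scores = [4.0, 2.0, 0.0]
approx = [soft_rank(scores, i) for i in range(len(scores))]
# approx is close to the true ranks [1, 2, 3] and preserves their order.
```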
Thanks