Self-Paced Learning for Semantic Segmentation
M. Pawan Kumar
Self-Paced Learning for Latent Structural SVM
Daphne Koller
Benjamin Packer
M. Pawan Kumar
Aim: To learn accurate parameters for latent structural SVM
Input x
Output y ∈ Y
"Deer"
Hidden Variable h ∈ H
Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}
Aim: To learn accurate parameters for latent structural SVM
Feature Φ(x,y,h) - (HOG, BoW)
(y*, h*) = argmax_{y ∈ Y, h ∈ H} wᵀΦ(x, y, h)
Parameters w
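The prediction rule can be sketched in a few lines: score every (label, hidden variable) pair with wᵀΦ and keep the best. The feature map, weights, and toy numbers below are illustrative stand-ins, not the paper's actual HOG/BoW features:

```python
import numpy as np

def infer(w, x, labels, hidden, phi):
    """Prediction in a latent structural SVM: jointly maximize the linear
    score w^T phi(x, y, h) over the label y and the hidden variable h.
    `phi` is a user-supplied joint feature map (hypothetical here)."""
    best, best_score = None, -np.inf
    for y in labels:
        for h in hidden:
            score = w @ phi(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best

# Toy example: 2-D features, two labels, two hidden values.
phi = lambda x, y, h: np.array([x * (y == "Deer"), x * h])
w = np.array([1.0, 0.5])
print(infer(w, 2.0, ["Bison", "Deer"], [0, 1], phi))  # -> ('Deer', 1)
```

In real applications the double loop is replaced by problem-specific inference (e.g. a sliding-window search when h ranges over bounding boxes).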
Motivation
Real Numbers
Imaginary Numbers
e^{iπ} + 1 = 0
Math is for losers!!
FAILURE … BAD LOCAL MINIMUM
Motivation
Real Numbers
Imaginary Numbers
e^{iπ} + 1 = 0
Euler was a Genius!!
SUCCESS … GOOD LOCAL MINIMUM
Motivation
Start with “easy” examples, then consider “hard” ones
Easy vs. Hard
Expensive
Easy for human / Easy for machine
Simultaneously estimate easiness and parameters
Easiness is a property of data sets, not single instances
Outline
• Latent Structural SVM
• Concave-Convex Procedure
• Self-Paced Learning
• Experiments
Latent Structural SVM
Training samples xi
Ground-truth label yi
Loss Function Δ(yi, yi(w), hi(w))
Felzenszwalb et al., 2008; Yu and Joachims, 2009
Latent Structural SVM
(yi(w), hi(w)) = argmax_{y ∈ Y, h ∈ H} wᵀΦ(xi, y, h)
min ||w||² + C ∑i Δ(yi, yi(w), hi(w))
Non-convex Objective
Minimize an upper bound
Latent Structural SVM
min ||w||² + C ∑i ξi
max_{hi} wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Still non-convex, but a difference of convex functions
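A quick numeric check of the upper-bound claim, with a toy feature map and hand-picked weights (all values illustrative): the smallest feasible slack ξi always dominates the prediction loss Δ(yi, yi(w), hi(w)).

```python
import numpy as np

# Toy setup: two labels, two hidden values, 2-D features (all illustrative).
labels, hidden = [0, 1], [0, 1]
phi = lambda x, y, h: np.array([x * (2 * y - 1), x * h])
delta = lambda y, yp: float(y != yp)
w, x, yi = np.array([-0.2, 1.0]), 1.0, 1

# Prediction (yi(w), hi(w)) and its true loss.
yp_, hp_ = max(((y, h) for y in labels for h in hidden),
               key=lambda t: w @ phi(x, *t))
pred_loss = delta(yi, yp_)

# Minimal feasible slack from the constraint:
# max_hi w.phi(x, yi, hi) - w.phi(x, y, h) >= Delta(yi, y, h) - xi_i
xi = (max(delta(yi, y) + w @ phi(x, y, h) for y in labels for h in hidden)
      - max(w @ phi(x, yi, h) for h in hidden))
print(pred_loss, round(xi, 6))  # 1.0 1.4 : the surrogate dominates the loss
```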
CCCP Algorithm - converges to a local minimum
(yi(w), hi(w)) = argmax_{y ∈ Y, h ∈ H} wᵀΦ(xi, y, h)
Outline
• Latent Structural SVM
• Concave-Convex Procedure
• Self-Paced Learning
• Experiments
Concave-Convex Procedure
Start with an initial estimate w0
Update hi = argmax_{h ∈ H} wtᵀΦ(xi, yi, h)

Update wt+1 by solving a convex problem
min ||w||² + C ∑i ξi
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Concave-Convex Procedure
Looks at all samples simultaneously
“Hard” samples will cause confusion
Start with “easy” samples, then consider “hard” ones
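The two CCCP updates can be sketched as a short loop: impute the hidden variables with the current w, then run subgradient descent on the resulting convex structural-SVM problem. This is an illustrative toy sketch, not the paper's implementation; all names and the exhaustive argmax are stand-ins:

```python
import numpy as np

def cccp(data, labels, hidden, phi, delta, C=1.0, outer=10, inner=50, lr=0.01):
    """Sketch of CCCP for the latent structural SVM.
    Outer loop: impute each hidden variable h_i with the current w
    (this linearizes the concave part of the objective).
    Inner loop: subgradient descent on the convex objective
    0.5*||w||^2 + C * sum_i xi_i with the h_i held fixed."""
    w = np.zeros(len(phi(*data[0], hidden[0])))
    for _ in range(outer):
        # Impute: h_i = argmax_h w^T phi(x_i, y_i, h)
        imputed = [max(hidden, key=lambda h: w @ phi(x, y, h)) for x, y in data]
        for _ in range(inner):
            g = w.copy()  # gradient of the 0.5*||w||^2 term
            for (x, y), hi in zip(data, imputed):
                # Loss-augmented inference over all (y', h') pairs
                yb, hb = max(((yp, hp) for yp in labels for hp in hidden),
                             key=lambda t: w @ phi(x, *t) + delta(y, t[0]))
                if delta(y, yb) - (w @ phi(x, y, hi) - w @ phi(x, yb, hb)) > 0:
                    g += C * (phi(x, yb, hb) - phi(x, y, hi))  # active slack
            w -= lr * g
    return w

# Toy usage: 1-D problem, single hidden value (reduces to a standard SSVM).
phi = lambda x, y, h: np.array([x if y == 1 else -x])
delta = lambda y, yp: float(y != yp)
w = cccp([(1.0, 1), (-1.0, 0)], [0, 1], [0], phi, delta)
print(w[0] > 0)  # the learned weight separates the two toy examples
```

Note that every sample enters every inner iteration, which is exactly the behavior the next slides improve on.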
Outline
• Latent Structural SVM
• Concave-Convex Procedure
• Self-Paced Learning
• Experiments
Self-Paced Learning
REMINDER
Simultaneously estimate easiness and parameters
Easiness is a property of data sets, not single instances
Self-Paced Learning
Start with an initial estimate w0
Update hi = argmax_{h ∈ H} wtᵀΦ(xi, yi, h)

Update wt+1 by solving a convex problem
min ||w||² + C ∑i ξi
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Self-Paced Learning
min ||w||² + C ∑i ξi
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Self-Paced Learning
min ||w||² + C ∑i vi ξi
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
vi ∈ {0,1}
Trivial Solution: vi = 0 for all i
Self-Paced Learning
vi ∈ {0,1}
Large K → Medium K → Small K
min ||w||² + C ∑i vi ξi - ∑i vi / K
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Self-Paced Learning
vi ∈ [0,1]
min ||w||² + C ∑i vi ξi - ∑i vi / K
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Large K → Medium K → Small K
Biconvex Problem
Alternating Convex Search
Self-Paced Learning
Start with an initial estimate w0
Update hi = argmax_{h ∈ H} wtᵀΦ(xi, yi, h)

Update wt+1 by solving a convex problem (alternating over w and v)
min ||w||² + C ∑i vi ξi - ∑i vi / K
wᵀΦ(xi, yi, hi) - wᵀΦ(xi, y, h) ≥ Δ(yi, y, h) - ξi
Decrease K: K ← K/μ
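With w fixed, the v-subproblem above has a closed-form per-sample solution, which makes the "easy sample" rule and the K annealing explicit. A minimal sketch, with made-up slack values:

```python
def self_paced_select(slacks, C, K):
    """v-step of self-paced learning: minimizing
    C * sum_i v_i * xi_i - sum_i v_i / K over v_i in {0,1}
    decouples per sample, so v_i = 1 (sample i is 'easy')
    exactly when C * xi_i < 1/K."""
    return [1 if C * xi < 1.0 / K else 0 for xi in slacks]

# Annealing: K <- K/mu (mu > 1) lowers the bar each round,
# so more samples enter as learning proceeds.
slacks = [0.05, 0.4, 0.9]  # made-up slack values for three samples
C, K, mu = 1.0, 4.0, 2.0
for _ in range(3):
    print(self_paced_select(slacks, C, K))  # [1,0,0], then [1,1,0], then [1,1,1]
    K /= mu
```

The progression from one selected sample to all three mirrors the Large K → Small K schedule on the slide.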
Outline
• Latent Structural SVM
• Concave-Convex Procedure
• Self-Paced Learning
• Experiments
Object Detection
Feature Φ(x,y,h) - HOG
Input x - Image
Output y ∈ Y
Latent h - Box
Δ - 0/1 Loss
Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}
Object Detection
271 images, 6 classes
90/10 train/test split
4 folds
Mammals Dataset
Object Detection: CCCP vs. Self-Paced
[Image slides: example detections by CCCP and Self-Paced; objective value and test error compared]
Object Detection
[Bar charts: objective value (range 4-5) and test error (range 0-25) for CCCP vs. SPL on Folds 1-4]
Handwritten Digit Recognition
Feature Φ(x,y,h) - PCA + Projection
Input x - Image
Output y ∈ Y
Y = {0, 1, … , 9}
Latent h - Rotation
MNIST Dataset
Δ - 0/1 Loss
Handwritten Digit Recognition
[Charts: test error for CCCP vs. SPL across values of C; statistically significant differences marked]
Motif Finding
Feature Φ(x,y,h) - Ng and Cardie, ACL 2002
Input x - DNA Sequence
Output y ∈ Y
Y = {0, 1}
Latent h - Motif Location
Δ - 0/1 Loss
Motif Finding
40,000 sequences
50/50 train/test split
5 folds
UniProbe Dataset
Motif Finding
[Charts: average Hamming distance of inferred motifs, CCCP vs. SPL]
Motif Finding
[Bar charts: objective value (range 0-160) and test error (range 0-50) for CCCP, Curriculum (Curr), and SPL on Folds 1-5]
Noun Phrase Coreference
Feature Φ(x,y,h) - Yu and Joachims, ICML 2009
Input x - Nouns
Output y - Clustering
Latent h - Spanning Forest over Nouns
Noun Phrase Coreference
60 documents
50/50 train/test split
1 predefined fold
MUC6 Dataset
Noun Phrase Coreference
[Charts: MITRE Loss and Pairwise Loss for CCCP vs. SPL; significant improvements and decrements marked]
Summary
• Automatic Self-Paced Learning
• Concave-Biconvex Procedure
• Generalization to other Latent models
– Expectation-Maximization
– E-step remains the same
– M-step includes indicator variables vi
Kumar, Packer and Koller, NIPS 2010