Loss-based Learning with Weak Supervision (mpawankumar.info/tutorials/cvpr2013/slides/cvpr...)
TRANSCRIPT
[Slide 1]
Loss-based Learning with Weak Supervision
M. Pawan Kumar
[Slide 2]
[Chart: log(size) vs. information content of annotation – Segmentation: ~2,000]
Computer Vision Data
[Slide 3]
[Chart: log(size) vs. information – Segmentation: ~2,000; Bounding Box: ~1 M]
Computer Vision Data
[Slide 4]
[Chart: log(size) vs. information – Segmentation: ~2,000; Bounding Box: ~1 M; Image-Level ("Car", "Chair"): > 14 M]
Computer Vision Data
[Slide 5]
[Chart: log(size) vs. information – Segmentation: ~2,000; Bounding Box: ~1 M; Image-Level: > 14 M; Noisy Label: > 6 B]
Computer Vision Data
[Slide 6]
Learn with missing information (latent variables)
Detailed annotation is expensive
Sometimes annotation is impossible
Desired annotation keeps changing
Computer Vision Data
[Slide 7]
• Two Types of Problems
• Part I – Annotation Mismatch
• Part II – Output Mismatch
Outline
[Slide 8]
Annotation Mismatch
Input x
Annotation y
Latent h
x
y = “jumping”
h
Action Classification
Mismatch between desired and available annotations
Exact value of latent variable is not “important”
Desired output during test time is y
[Slide 9]
Output Mismatch
Input x
Annotation y
Latent h
x
y = “jumping”
h
Action Classification
[Slide 10]
Output Mismatch
Input x
Annotation y
Latent h
x
y = “jumping”
h
Action Detection
Mismatch between output and available annotations
Exact value of latent variable is important
Desired output during test time is (y,h)
[Slide 11]
Part I
[Slide 12]
• Latent SVM
• Optimization
• Practice
• Extensions
Outline – Annotation Mismatch
Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
[Slide 13]
Weakly Supervised Data
Input x
Output y ∈ {-1,+1}
Hidden h
x
y = +1
h
[Slide 14]
Weakly Supervised Classification
Feature Φ(x,h)
Joint Feature Vector
Ψ(x,y,h)
x
y = +1
h
[Slide 15]
Weakly Supervised Classification
Feature Φ(x,h)
Joint Feature Vector
Ψ(x,+1,h) = [Φ(x,h); 0]
x
y = +1
h
[Slide 16]
Weakly Supervised Classification
Feature Φ(x,h)
Joint Feature Vector
Ψ(x,-1,h) = [0; Φ(x,h)]
x
y = +1
h
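In code, the block construction of the joint feature vector reads as follows (a minimal sketch; `phi` stands in for a hypothetical Φ(x,h)):

```python
import numpy as np

def joint_feature(phi, y):
    """Joint feature vector Psi(x, y, h) for binary y in {-1, +1}.

    Stacks Phi(x, h) into the block selected by the label, so a single
    weight vector w can score both classes.
    """
    zeros = np.zeros_like(phi)
    if y == +1:
        return np.concatenate([phi, zeros])   # Psi(x, +1, h) = [Phi; 0]
    return np.concatenate([zeros, phi])       # Psi(x, -1, h) = [0; Phi]

phi = np.array([0.5, -1.0, 2.0])              # a hypothetical Phi(x, h)
print(joint_feature(phi, +1))                  # [ 0.5 -1.   2.   0.   0.   0. ]
print(joint_feature(phi, -1))                  # [ 0.   0.   0.   0.5 -1.   2. ]
```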
[Slide 17]
Weakly Supervised Classification
Feature Φ(x,h)
Joint Feature Vector
Ψ(x,y,h)
Score f : Ψ(x,y,h) → (-∞, +∞)
Optimize score over all possible y and h
x
y = +1
h
[Slide 18]
Latent SVM
Parameters w
Scoring function: wᵀΨ(x,y,h)
Prediction: y(w), h(w) = argmax_{y,h} wᵀΨ(x,y,h)
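Prediction enumerates y and h and keeps the highest-scoring pair. A minimal sketch, assuming binary y and a small finite set of candidate h whose features are stacked row-wise in a hypothetical `phis` array:

```python
import numpy as np

def predict(w, phis):
    """Latent SVM prediction: maximize w^T Psi(x, y, h) over y and h.

    Because Psi stacks Phi(x, h) into the block chosen by y, the score
    of (y, h) is the dot product of Phi(x, h) with one half of w.
    """
    d = phis.shape[1]
    best = None
    for y, w_block in ((+1, w[:d]), (-1, w[d:])):
        for h, phi in enumerate(phis):
            score = float(phi @ w_block)
            if best is None or score > best[0]:
                best = (score, y, h)
    return best  # (score, y(w), h(w))

phis = np.array([[1.0, 0.0], [0.0, 1.0]])   # Phi(x, h) for two candidate h
w = np.array([2.0, 0.0, 0.0, 3.0])          # illustrative parameters
print(predict(w, phis))                      # (3.0, -1, 1)
```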
[Slide 19]
Learning Latent SVM
Training data {(x_i, y_i), i = 1, 2, …, n}
Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w))
No restriction on the loss function
Annotation mismatch
[Slide 20]
Learning Latent SVM
Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w))
Non-convex
Parameters cannot be regularized
Find a regularization-sensitive upper bound
[Slide 21]
Learning Latent SVM
Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + wᵀΨ(x_i, y_i(w), h_i(w)) - wᵀΨ(x_i, y_i(w), h_i(w))
[Slide 22]
Learning Latent SVM
Δ(y_i, y_i(w)) ≤ Δ(y_i, y_i(w)) + wᵀΨ(x_i, y_i(w), h_i(w)) - max_{h_i} wᵀΨ(x_i, y_i, h_i)
since y(w), h(w) = argmax_{y,h} wᵀΨ(x,y,h)
[Slide 23]
Learning Latent SVM
min_w ||w||² + C Σ_i ξ_i
s.t. max_{y,h} [Δ(y_i, y) + wᵀΨ(x_i, y, h)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i
Parameters can be regularized
Is this also convex?
[Slide 24]
Learning Latent SVM
min_w ||w||² + C Σ_i ξ_i
s.t. max_{y,h} [Δ(y_i, y) + wᵀΨ(x_i, y, h)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i
Convex - Convex
Difference of convex (DC) program
[Slide 25]
Recap
Scoring function: wᵀΨ(x,y,h)
Prediction: y(w), h(w) = argmax_{y,h} wᵀΨ(x,y,h)
Learning:
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y, h) + Δ(y_i, y) - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i, for all y, h
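For a fixed w, the smallest feasible slack ξ_i is the left-hand side of the learning constraint: loss-augmented inference minus the best latent completion of the ground truth. A minimal sketch, assuming binary y and an enumerable set of candidate h (`phis`, `w`, and `delta` are hypothetical stand-ins):

```python
import numpy as np

def slack(w, phis, y_true, delta):
    """Upper bound xi_i on the loss for one example (binary y).

    xi_i = max_{y,h} [Delta(y_i, y) + w^T Psi(x, y, h)]
           - max_{h} w^T Psi(x, y_i, h)
    """
    d = phis.shape[1]
    blocks = {+1: w[:d], -1: w[d:]}
    # Loss-augmented inference over all (y, h).
    aug = max(delta(y_true, y) + float(phi @ blocks[y])
              for y in (+1, -1) for phi in phis)
    # Best latent completion of the ground-truth label.
    fit = max(float(phi @ blocks[y_true]) for phi in phis)
    return aug - fit

delta = lambda a, b: float(a != b)           # 0-1 loss
phis = np.array([[1.0, 0.0], [0.0, 1.0]])
w = np.array([2.0, 0.0, 0.0, 3.0])
print(slack(w, phis, +1, delta))             # 2.0
```

Here the prediction under w is y = -1, so the true loss is 1; the computed slack 2.0 is indeed an upper bound on it.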
[Slide 26]
• Latent SVM
• Optimization
• Practice
• Extensions
Outline – Annotation Mismatch
[Slide 27]
Learning Latent SVM
min_w ||w||² + C Σ_i ξ_i
s.t. max_{y,h} [Δ(y_i, y) + wᵀΨ(x_i, y, h)] - max_{h_i} wᵀΨ(x_i, y_i, h_i) ≤ ξ_i
Difference of convex (DC) program
[Slides 28–32]
Concave-Convex Procedure
Objective: max_{y,h} [Δ(y_i, y) + wᵀΨ(x_i, y, h)] - max_{h_i} wᵀΨ(x_i, y_i, h_i)
Repeat until convergence:
– Linearly upper-bound the concave part, -max_{h_i} wᵀΨ(x_i, y_i, h_i)
– Optimize the resulting convex upper bound
Linear upper bound?
[Slide 33]
Linear Upper Bound
Current estimate = w_t
h_i* = argmax_{h_i} w_tᵀΨ(x_i, y_i, h_i)
-wᵀΨ(x_i, y_i, h_i*) ≥ -max_{h_i} wᵀΨ(x_i, y_i, h_i)
[Slide 34]
CCCP for Latent SVM
Start with an initial estimate w_0
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε-optimal solution of
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
Repeat until convergence
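The alternation can be sketched as a short loop. The inner structural-SVM solve is abstracted behind a placeholder `solve_ssvm` callback (any cutting-plane or subgradient solver would do); `examples` and `dim` are hypothetical names:

```python
import numpy as np

def cccp(examples, dim, solve_ssvm, max_iter=20, tol=1e-4):
    """CCCP for latent SVM (sketch, binary y with enumerable h).

    `examples` is a list of (phis, y) pairs, where phis[h] = Phi(x, h).
    `solve_ssvm(examples, h_star)` must return the eps-optimal w of
      min ||w||^2 + C sum_i xi_i
      s.t. w^T Psi(x_i, y_i, h_i*) - w^T Psi(x_i, y, h) >= Delta(y_i, y) - xi_i.
    """
    w = np.zeros(2 * dim)
    for _ in range(max_iter):
        # Impute latent variables under the current w:
        # h_i* = argmax_h w^T Psi(x_i, y_i, h).
        h_star = []
        for phis, y in examples:
            block = w[:dim] if y == +1 else w[dim:]
            h_star.append(int(np.argmax(phis @ block)))
        w_new = solve_ssvm(examples, h_star)
        if np.linalg.norm(w_new - w) < tol:   # convergence check
            return w_new
        w = w_new
    return w

examples = [(np.array([[1.0, 0.0], [0.0, 1.0]]), +1)]
dummy = lambda ex, h_star: np.ones(4)        # stand-in solver for illustration
print(cccp(examples, dim=2, solve_ssvm=dummy))   # [1. 1. 1. 1.]
```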
[Slide 35]
• Latent SVM
• Optimization
• Practice
• Extensions
Outline – Annotation Mismatch
[Slide 36]
Action Classification
Input x
Output y = "Using Computer"
PASCAL VOC 2011
80/20 Train/Test Split, 5 Folds
Jumping
Phoning
Playing Instrument
Reading
Riding Bike
Riding Horse
Running
Taking Photo
Using Computer
Walking
Train Input xi Output yi
[Slide 37]
• 0-1 loss function
• Poselet-based feature vector
• 4 seeds for random initialization
• Code + Data
• Train/Test scripts with hyperparameter settings
Setup
http://www.centrale-ponts.fr/tutorials/cvpr2013/
[Slides 38–41: plots of Objective, Train Error, Test Error, and Time]
[Slide 42]
• Latent SVM
• Optimization
• Practice
  – Annealing the Tolerance
  – Annealing the Regularization
  – Self-Paced Learning
  – Choice of Loss Function
• Extensions
Outline – Annotation Mismatch
[Slide 43]
Start with an initial estimate w_0
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε-optimal solution of
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
Repeat until convergence
Overfitting in initial iterations
[Slide 44]
Annealing the Tolerance
Start with an initial estimate w_0 and a loose tolerance ε'
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε'-optimal solution of
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
Anneal ε' ← ε'/K; repeat until convergence with ε' = ε
[Slides 45–52: plots of Objective, Train Error, Test Error, and Time]
[Slide 53]
• Latent SVM
• Optimization
• Practice
  – Annealing the Tolerance
  – Annealing the Regularization
  – Self-Paced Learning
  – Choice of Loss Function
• Extensions
Outline – Annotation Mismatch
[Slide 54]
Start with an initial estimate w_0
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε-optimal solution of
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
Repeat until convergence
Overfitting in initial iterations
[Slide 55]
Annealing the Regularization
Start with an initial estimate w_0 and a small C'
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε-optimal solution of
min_w ||w||² + C' Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
Anneal C' ← C' × K; repeat until convergence with C' = C
[Slides 56–63: plots of Objective, Train Error, Test Error, and Time]
[Slide 64]
• Latent SVM
• Optimization
• Practice
  – Annealing the Tolerance
  – Annealing the Regularization
  – Self-Paced Learning
  – Choice of Loss Function
• Extensions
Outline – Annotation Mismatch
Kumar, Packer and Koller, NIPS 2010
[Slide 65]
CCCP for Human Learning
1 + 1 = 2
1/3 + 1/6 = 1/2
e^{iπ} + 1 = 0
"Math is for losers!!"
FAILURE … BAD LOCAL MINIMUM
[Slide 66]
Self-Paced Learning
1 + 1 = 2
1/3 + 1/6 = 1/2
e^{iπ} + 1 = 0
"Euler was a Genius!!"
SUCCESS … GOOD LOCAL MINIMUM
[Slide 67]
Self-Paced Learning
Start with "easy" examples, then consider "hard" ones
Determining easy vs. hard by hand is expensive
Easy for a human ≠ easy for a machine
Simultaneously estimate easiness and parameters
Easiness is a property of data sets, not single instances
[Slide 68]
CCCP for Latent SVM
Start with an initial estimate w_0
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε-optimal solution of
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
[Slide 69]
Self-Paced Learning
min_w ||w||² + C Σ_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i
[Slide 70]
Self-Paced Learning
min_w ||w||² + C Σ_i v_i ξ_i
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i
v_i ∈ {0,1}
Trivial solution: v_i = 0 for all i
[Slide 71]
Self-Paced Learning
min_w ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i
v_i ∈ {0,1}
[Illustration: examples selected for large, medium, and small K]
[Slide 72]
Self-Paced Learning
min_w ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i
v_i ∈ [0,1]
Biconvex Problem → Alternating Convex Search
[Illustration: examples selected for large, medium, and small K]
[Slide 73]
SPL for Latent SVM
Start with an initial estimate w_0
Update h_i* = argmax_{h_i ∈ H} w_tᵀΨ(x_i, y_i, h_i)
Update w_{t+1} as the ε-optimal solution of
min_w ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K
s.t. wᵀΨ(x_i, y_i, h_i*) - wᵀΨ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i
Decrease K ← K/µ
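With w fixed, the v-step of the biconvex objective decouples over examples and has a closed form; a minimal sketch (variable names are illustrative):

```python
def select_easy(xis, C, K):
    """Closed-form v-step of self-paced learning.

    With w (hence the slacks xi_i) fixed, the objective is linear in each
    v_i: including example i contributes C*xi_i - 1/K, so
      v_i = 1  iff  xi_i < 1 / (C * K)   ("easy" example: small loss bound)
    """
    return [1 if xi < 1.0 / (C * K) else 0 for xi in xis]

# As K decreases (K <- K/mu), the threshold 1/(C*K) grows and harder
# examples enter the training set.
print(select_easy([0.05, 0.4, 2.0], C=1.0, K=5.0))   # [1, 0, 0]
print(select_easy([0.05, 0.4, 2.0], C=1.0, K=1.0))   # [1, 1, 0]
```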
[Slides 74–81: plots of Objective, Train Error, Test Error, and Time]
![Page 82: Loss-based Learning with Weak Supervisionmpawankumar.info/tutorials/cvpr2013/slides/CVPR... · Weakly Supervised Classification Feature Φ(x,h) Joint Feature Vector Ψ(x,-1,h) 0 Φ(x,h)](https://reader034.vdocuments.site/reader034/viewer/2022051908/5ffc7e57851a3d6a65169e00/html5/thumbnails/82.jpg)
Outline – Annotation Mismatch
• Latent SVM
• Optimization
• Practice
  – Annealing the Tolerance
  – Annealing the Regularization
  – Self-Paced Learning
  – Choice of Loss Function
• Extensions
Behl, Jawahar and Kumar, In Preparation
Ranking (Rank 1 – Rank 6)

Average Precision = 1, Accuracy = 1
Average Precision = 0.92, Accuracy = 0.67
Average Precision = 0.81
Ranking

During testing, AP is frequently used.
During training, a surrogate loss is used.
This contradicts the loss-based learning philosophy.
Solution: optimize AP directly.
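The AP values on the ranking slide follow directly from the definition (the mean of the precision at each relevant position). A short sketch; the relevance patterns below are chosen to be consistent with the reported AP values, since the images behind each ranking are not recoverable here:

```python
def average_precision(labels):
    """AP of a ranked list of binary relevance labels (1 = relevant)."""
    hits, precisions = 0, []
    for rank, y in enumerate(labels, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

# three rankings of 6 images with 3 relevant, as on the slide
print(round(average_precision([1, 1, 1, 0, 0, 0]), 2))  # 1.0
print(round(average_precision([1, 1, 0, 1, 0, 0]), 2))  # 0.92
print(round(average_precision([1, 0, 1, 1, 0, 0]), 2))  # 0.81
```

Note the last two rankings have the same accuracy but different AP, which is why training against a surrogate for accuracy can mis-order them.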
Results
Statistically significant improvement
Speed – Proximal Regularization

Start with a good initial estimate w0

Repeat until convergence:
hi* = argmaxhi∈H wtTΨ(xi,yi,hi)
Update wt+1 as the ε-optimal solution of
min ||w||2 + C∑i ξi + Ct ||w - wt||2
s.t. wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi, y) - ξi, for all y, h
Speed – Cascades
Weiss and Taskar, AISTATS 2010 Sapp, Toshev and Taskar, ECCV 2010
Accuracy – (Self) Pacing
Pacing the sample complexity – NIPS 2010
Pacing the model complexity
Pacing the problem complexity
Building Accurate Systems

Model: 85%   Inference: 5%   Learning: 10%

Learning cannot provide huge gains without a good model
Inference cannot provide huge gains without a good model
Outline – Annotation Mismatch
• Latent SVM
• Optimization
• Practice
• Extensions
  – Latent Variable Dependent Loss
  – Max-Margin Min-Entropy Models
Yu and Joachims, ICML 2009
Latent Variable Dependent Loss

y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Δ(yi, yi(w), hi(w))
  ≤ Δ(yi, yi(w), hi(w)) + wTΨ(xi,yi(w),hi(w)) - maxhi wTΨ(xi,yi,hi)
  ≤ maxy,h { Δ(yi, y, h) + wTΨ(xi,y,h) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

minw ||w||2 + C Σiξi
Optimizing Precision@k

Input X = {xi, i = 1, …, n}
Annotation Y = {yi, i = 1, …, n} ∈ {-1,+1}n
Latent H = ranking
Loss Δ(Y*, Y, H) = 1 - Precision@k
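Precision@k itself is immediate to compute; a small sketch, with the ranking given as binary relevance labels in ranked order:

```python
def precision_at_k(ranked_labels, k):
    """Fraction of relevant items among the top k positions of the ranking."""
    return sum(ranked_labels[:k]) / k

def precision_at_k_loss(ranked_labels, k):
    """The loss used above: 1 - Precision@k."""
    return 1.0 - precision_at_k(ranked_labels, k)
```

For example, a ranking with relevance pattern [1, 1, 0, 1, 0, 0] has Precision@3 = 2/3 and therefore loss 1/3.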
Outline – Annotation Mismatch
• Latent SVM
• Optimization
• Practice
• Extensions
  – Latent Variable Dependent Loss
  – Max-Margin Min-Entropy (M3E) Models
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012
Running vs. Jumping Classification

Score wTΨ(x,y,h) → (-∞, +∞)

wTΨ(x,y1,h): 0.00 0.00 0.25 0.00 0.25 0.00 0.25 0.00 0.00
wTΨ(x,y2,h): 0.00 0.24 0.00 0.00 0.00 0.00 0.01 0.00 0.00

Only the maximum score is used. No other useful cue?
Uncertainty in h
M3E

Scoring function
Pw(y,h|x) = exp(wTΨ(x,y,h))/Z(x), where Z(x) is the partition function

Prediction
y(w) = argminy { Hα(Pw(h|y,x)) – log Pw(y|x) }

Here Hα is the Rényi entropy and Pw(y|x) the marginalized probability; their sum is the Rényi entropy of the generalized distribution, Gα(y;x,w).
Rényi Entropy

Gα(y;x,w) = (1/(1-α)) log [ Σh Pw(y,h|x)α / Σh Pw(y,h|x) ]

α = 1: Shannon entropy of the generalized distribution
G1(y;x,w) = - Σh Pw(y,h|x) log Pw(y,h|x) / Σh Pw(y,h|x)
Rényi Entropy

Gα(y;x,w) = (1/(1-α)) log [ Σh Pw(y,h|x)α / Σh Pw(y,h|x) ]

α → ∞: minimum entropy of the generalized distribution
G∞(y;x,w) = - maxh log Pw(y,h|x)
Rényi Entropy

α → ∞: minimum entropy of the generalized distribution
G∞(y;x,w) = - maxh log Pw(y,h|x) = log Z(x) - maxh wTΨ(x,y,h)

Since log Z(x) does not depend on y, minimizing G∞ gives the same prediction as latent SVM.
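The limiting cases can be checked numerically. A sketch of Gα for a generalized (unnormalized) distribution over h; the example vectors reuse the nonzero entries of the running-vs-jumping slide (zeros dropped to keep the logarithms finite):

```python
import numpy as np

def renyi_gen_entropy(p, alpha):
    """G_alpha of a generalized distribution {Pw(y,h|x)}_h (need not sum to 1)."""
    p = np.asarray(p, dtype=float)
    if np.isinf(alpha):                       # minimum entropy
        return float(-np.log(p.max()))
    if alpha == 1.0:                          # Shannon entropy (generalized)
        return float(-(p * np.log(p)).sum() / p.sum())
    return float(np.log((p ** alpha).sum() / p.sum()) / (1.0 - alpha))

# y1: three equally likely latent values; y2: one dominant latent value
g1 = renyi_gen_entropy([0.25, 0.25, 0.25], float('inf'))  # -log 0.25
g2 = renyi_gen_entropy([0.24, 0.01], float('inf'))        # -log 0.24
# argmin over y at alpha = infinity picks y1, matching the latent SVM prediction
```

Note Gα of an unnormalized distribution absorbs the -log Pw(y|x) term, which is exactly why M3E predicts by minimizing Gα rather than a plain entropy.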
Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

w* = argminw Σi Δ(yi,yi(w))

Highly non-convex in w; cannot regularize w to prevent overfitting
Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

Δ(yi,yi(w)) ≤ Δ(yi,yi(w)) + Gα(yi;xi,w) - Gα(yi(w);xi,w)
           ≤ maxy { Δ(yi,y) + Gα(yi;xi,w) - Gα(y;xi,w) }
Learning M3E

minw ||w||2 + C Σiξi
s.t. Gα(yi;xi,w) + Δ(yi,y) – Gα(y;xi,w) ≤ ξi, for all y

When α tends to infinity, M3E = latent SVM; other values of α can give better results
Motif Finding Results

Motif + Markov Background Model (Yu and Joachims, 2009)
Part II
Output Mismatch

Action Detection: Input x, Annotation y = “jumping”, Latent h

Mismatch between the output and the available annotations
The exact value of the latent variable is important
The desired output during test time is (y,h)
Outline – Output Mismatch
• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments
Weakly Supervised Data

Input x
Output y ∈ {0,1,…,C}
Hidden h (example image: y = 0)
Weakly Supervised Detection

Feature Φ(x,h), Joint Feature Vector Ψ(x,y,h)

Ψ(x,y,h) places Φ(x,h) in the block corresponding to label y, with zeros elsewhere:
Ψ(x,0,h) = [Φ(x,h); 0; …; 0]
Ψ(x,1,h) = [0; Φ(x,h); …; 0]
Ψ(x,C,h) = [0; …; 0; Φ(x,h)]

Score f : Ψ(x,y,h) → (-∞, +∞)
Optimize the score over all possible y and h
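The block construction of Ψ(x,y,h) is easy to write down. A minimal sketch; the feature dimension and label count below are illustrative:

```python
import numpy as np

def joint_feature(phi, y, num_labels):
    """Psi(x,y,h): copy Phi(x,h) into the block indexed by y, zeros elsewhere."""
    d = phi.shape[0]
    psi = np.zeros(num_labels * d)
    psi[y * d:(y + 1) * d] = phi
    return psi

phi = np.array([0.7, 0.2])        # illustrative Phi(x,h)
psi = joint_feature(phi, 1, 3)    # -> [0, 0, 0.7, 0.2, 0, 0]
```

With this construction, wTΨ(x,y,h) reads out the block of w belonging to label y, so a single parameter vector scores all labels.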
Linear Model

Parameters w

Scoring function: wTΨ(x,y,h)
Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)
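For small label and latent spaces the prediction rule can be evaluated by exhaustive search. A sketch; the one-hot feature map and the spaces below are illustrative, not the tutorial's:

```python
import numpy as np

def predict(w, psi, x, labels, latents):
    """y(w), h(w) = argmax over (y,h) of w . Psi(x,y,h)."""
    return max(((y, h) for y in labels for h in latents),
               key=lambda yh: float(w @ psi(x, yh[0], yh[1])))

# illustrative joint feature: one-hot over the four (y,h) pairs
def psi(x, y, h):
    v = np.zeros(4)
    v[2 * y + h] = 1.0
    return v

w = np.array([0.0, 0.5, 3.0, 1.0])
y_hat, h_hat = predict(w, psi, None, labels=[0, 1], latents=[0, 1])  # (1, 0)
```

In practice the argmax over h is a structured inference problem (e.g. a search over bounding boxes) rather than a flat enumeration.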
Minimizing General Loss

minw Σi Δ(yi,hi,yi(w),hi(w)) + Σi Δ’(yi,yi(w),hi(w))
(first sum: supervised samples; second sum: weakly supervised samples)

Unknown latent variable values
Minimizing General Loss

minw Σi Σhi Pw(hi|xi,yi) Δ(yi,hi,yi(w),hi(w))

A single distribution has to achieve two objectives
Outline – Output Mismatch
• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments
Kumar, Packer and Koller, ICML 2012
Problem
Model Uncertainty in Latent Variables
Model Accuracy of Latent Variable Predictions
Solution
Model Uncertainty in Latent Variables
Model Accuracy of Latent Variable Predictions
Use two different distributions for the two different tasks
Pθ(hi|yi,xi) models the uncertainty in the latent variables
Pw(yi,hi|xi) models the accuracy of the latent variable predictions (yi(w),hi(w))
The Ideal Case: no latent variable uncertainty, correct prediction
In Practice: restrictions in the representation power of models
Our Framework: minimize the dissimilarity between the two distributions, for a user-defined dissimilarity measure
Our Framework: Minimize Rao’s Dissimilarity Coefficient

Hi(w,θ) = Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)
Hi(θ,θ) = Σh,h’ Δ(yi,h,yi,h’) Pθ(h|yi,xi) Pθ(h’|yi,xi)

The coefficient is Hi(w,θ) - β Hi(θ,θ) - (1-β) Δ(yi(w),hi(w),yi(w),hi(w)); the last term vanishes since Δ(a,a) = 0.

minw,θ Σi { Hi(w,θ) - β Hi(θ,θ) }
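For a discrete latent space both terms can be computed directly. A toy sketch in which Δ depends only on the latent values (the matrix `delta`) and Pθ(h|yi,xi) is a vector `p`; both are illustrative:

```python
import numpy as np

def diversity(delta, p, q):
    """H(P,Q) = sum over h,h' of Delta(h,h') P(h) Q(h'): Rao's diversity."""
    return float(np.asarray(p) @ np.asarray(delta) @ np.asarray(q))

def dissimilarity(delta, p, h_pred, beta=0.5):
    """Hi(w,theta) - beta * Hi(theta,theta): diversity to the point-mass
    prediction minus beta times the self-diversity of P_theta.
    (Delta(a,a) = 0, so the prediction's own self-diversity term vanishes.)"""
    point = np.zeros(len(p)); point[h_pred] = 1.0
    return diversity(delta, p, point) - beta * diversity(delta, p, p)

delta = 1.0 - np.eye(2)     # 0/1 loss on the latent variable
p = np.array([0.5, 0.5])    # maximally uncertain P_theta
```

A certain and correct Pθ (a point mass on the predicted h) gives dissimilarity zero, which is the ideal case described above.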
Outline – Output Mismatch
• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments
Optimization

minw,θ Σi Hi(w,θ) - β Hi(θ,θ)

Initialize the parameters to w0 and θ0
Repeat until convergence:
  Fix w and optimize θ
  Fix θ and optimize w
End
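The alternating scheme is ordinary block-coordinate descent. A toy sketch on an illustrative objective f(w,θ) = (w - θ)² + (θ - 3)², whose block minimizers have closed form (nothing here comes from the tutorial's actual sub-solvers):

```python
def alternating_minimize(argmin_w, argmin_theta, w, theta, rounds=30):
    """Block-coordinate descent: alternately fix one block, minimize the other."""
    for _ in range(rounds):
        theta = argmin_theta(w)   # fix w and optimize theta
        w = argmin_w(theta)       # fix theta and optimize w
    return w, theta

# block minimizers of f(w,theta) = (w - theta)**2 + (theta - 3)**2:
# w*(theta) = theta and theta*(w) = (w + 3) / 2
w, theta = alternating_minimize(lambda t: t, lambda w: (w + 3) / 2, 0.0, 0.0)
# both variables converge towards the joint minimum w = theta = 3
```

In the actual framework, each block problem is itself non-trivial: the θ-step and w-step are the two sub-problems treated on the next slides.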
Optimization of θ

minθ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ)

Case I: yi(w) = yi
Case II: yi(w) ≠ yi

Solved by stochastic subgradient descent
Optimization of w

minw Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

The expected loss models the uncertainty in h
The form of the optimization is similar to latent SVM; if Δ is independent of h, it reduces to latent SVM
Solved by the Concave-Convex Procedure (CCCP)
Outline – Output Mismatch
• Problem Formulation
• Dissimilarity Coefficient Learning
• Optimization
• Experiments
Action Detection

Input x, Output y = “Using Computer”, Latent Variable h

PASCAL VOC 2011, 60/40 Train/Test Split, 5 Folds
Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking

Train: Input xi, Output yi
Results – 0/1 Loss

Average test loss over Folds 1–5, LSVM vs. Our (bar chart omitted)

Statistically significant
Results – Overlap Loss

Average test loss over Folds 1–5, LSVM vs. Our (bar chart omitted)

Statistically significant
Questions?
http://www.centrale-ponts.fr/personnel/pawan