![Page 1: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/1.jpg)
Learning Globally-Consistent Local Distance Functions for Shape-Based
Image Retrieval and Classification
Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik
![Page 2: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/2.jpg)
Goal
![Page 3: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/3.jpg)
Nearest neighbor classification
D ( , )
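A minimal sketch of nearest neighbor classification with a pluggable distance function, to make the goal concrete. The function name `nearest_neighbor_label`, the toy points, and the squared Euclidean stand-in distance are illustrative assumptions, not taken from the slides; the talk is about learning a better `distance` to plug in here.

```python
def nearest_neighbor_label(query, training_set, distance):
    """Return the label of the training example closest to `query`."""
    best_label, best_dist = None, float("inf")
    for example, label in training_set:
        d = distance(query, example)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def sq_euclid(a, b):
    """Stand-in distance (assumption): squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical 2-D "images" with made-up labels.
training = [((0.0, 0.0), "cat"), ((5.0, 5.0), "dog"), ((0.0, 6.0), "bird")]
label = nearest_neighbor_label((4.0, 4.5), training, sq_euclid)
```

Swapping `sq_euclid` for a learned distance changes which neighbor wins without touching the classifier itself.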
![Page 4: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/4.jpg)
Learning a Distance Metric from Relative Comparisons
[Schultz & Joachims, NIPS ’03]
$D(x_i, x_j) < D(x_i, x_k)$
$D(x, y) = (x - y)^{T}\, W\, (x - y)$
![Page 5: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/5.jpg)
![Page 6: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/6.jpg)
Approach
image j
image i
![Page 7: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/7.jpg)
Approach
image j
image i
$d_{ji,m}$: elementary distance from the $m$-th patch feature of image j to image i
![Page 8: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/8.jpg)
Approach
image j
image i
$D_{ji} = \sum_m w_{j,m}\, d_{ji,m}$
image k
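A small sketch of the weighted sum on this slide: the distance from image j to another image is a weighted combination of the elementary distances of j's own patch features. The function name `local_distance` and all numbers are made-up illustrations.

```python
def local_distance(w_j, d_ji):
    """D_ji = sum over m of w_{j,m} * d_{ji,m}."""
    if len(w_j) != len(d_ji):
        raise ValueError("need one weight per patch feature of image j")
    return sum(w * d for w, d in zip(w_j, d_ji))


# Hypothetical values: image j has three patch features.
w_j = [0.5, 0.0, 2.0]      # weights attached to image j's features
d_ji = [1.0, 4.0, 0.25]    # elementary distances from j's features to image i
d_jk = [0.5, 1.0, 1.5]     # elementary distances from j's features to image k

D_ji = local_distance(w_j, d_ji)
D_jk = local_distance(w_j, d_jk)
```

Because the weights belong to image j, D_ji and D_ij are computed with different weight vectors and need not be equal.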
![Page 9: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/9.jpg)
Approach
image j
image i
image k
$D_{ji} < D_{ki}$
![Page 10: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/10.jpg)
Core
$w_{j,m}$ ?
image j
image i
image k
$D_{ji} < D_{ki}$
![Page 11: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/11.jpg)
Derivations
• Notation
• Large-margin formulation
• Dual problem
• Solution
![Page 12: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/12.jpg)
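Notation
$D_{ji} = \sum_m w_{j,m}\, d_{ji,m} \;\Longrightarrow\; D_{ji} = w_j \cdot d_{ji}$
$D_{ki} > D_{ji} \;\Longrightarrow\; w_k \cdot d_{ki} > w_j \cdot d_{ji}$ for triplet $(i, j, k)$
$w_k \cdot d_{ki} - w_j \cdot d_{ji} \ge 1$
$W = [\, w_1 \;\; w_2 \;\; \dots \;\; w_k \;\; \dots \;\; w_j \;\; \dots \,]$
$X_{ijk} = [\, 0 \;\; 0 \;\; \dots \;\; d_{ki} \;\; \dots \;\; -d_{ji} \;\; \dots \,]$
$w_k \cdot d_{ki} - w_j \cdot d_{ji} \ge 1 \;\Longrightarrow\; W \cdot X_{ijk} \ge 1$
$\sum_{i,j,k} [\, 1 - W \cdot X_{ijk} \,]_+$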
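The stacking trick on this slide can be checked numerically: concatenating all per-image weight vectors into W and placing d_ki and -d_ji in the slots of images k and j gives a vector X_ijk with W·X_ijk = w_k·d_ki - w_j·d_ji. The sketch below assumes, for simplicity only, that every image has the same number of features; `build_x` and the toy numbers are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def build_x(num_images, feats_per_image, j, k, d_ji, d_ki):
    """X_ijk: d_ki in image k's slot, -d_ji in image j's slot, zeros elsewhere."""
    x = [0.0] * (num_images * feats_per_image)
    for m in range(feats_per_image):
        x[k * feats_per_image + m] = d_ki[m]
        x[j * feats_per_image + m] = -d_ji[m]
    return x


# Toy setup (assumption): 3 images, 2 features each; triplet i=0, j=1, k=2.
weights = [[1.0, 1.0], [0.5, 2.0], [1.0, 0.0]]
W = [w for image_w in weights for w in image_w]   # flattened W
d_ji = [1.0, 0.5]
d_ki = [3.0, 1.0]

X_ijk = build_x(3, 2, j=1, k=2, d_ji=d_ji, d_ki=d_ki)
stacked = dot(W, X_ijk)
direct = dot(weights[2], d_ki) - dot(weights[1], d_ji)
```

Here `stacked` and `direct` agree, so the triplet constraint can be written as a single linear constraint on W.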
![Page 13: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/13.jpg)
Large-margin formulation
$\sum_{i,j,k} [\, 1 - W \cdot X_{ijk} \,]_+$
$\frac{1}{2} \|W\|^2 + C \sum_{i,j,k} [\, 1 - W \cdot X_{ijk} \,]_+$
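As a sketch of how this objective is evaluated, the snippet below computes the regularizer plus C times the summed hinge loss over triplet vectors. The helper names and the toy W and X values are assumptions for illustration.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def hinge(z):
    """[z]_+ = max(0, z)."""
    return max(0.0, z)


def objective(W, triplet_vectors, C):
    """0.5 * ||W||^2 + C * sum of [1 - W . X_ijk]_+ over all triplets."""
    regularizer = 0.5 * dot(W, W)
    loss = sum(hinge(1.0 - dot(W, X)) for X in triplet_vectors)
    return regularizer + C * loss


# Made-up example: margins W.X are 2.0, 0.75 and -0.5.
W = [1.0, 0.5]
Xs = [[2.0, 0.0], [0.5, 0.5], [-1.0, 1.0]]
value = objective(W, Xs, C=2.0)
```

Only the second and third triplets contribute to the loss, since the first already has a margin of at least 1.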
![Page 14: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/14.jpg)
SVM
![Page 15: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/15.jpg)
SVM
![Page 16: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/16.jpg)
SVM
![Page 17: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/17.jpg)
SVM
![Page 18: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/18.jpg)
Soft-margin SVM
![Page 19: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/19.jpg)
Derivation
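$L(W, \xi, \alpha, \beta, \mu) = \frac{1}{2} \|W\|^2 + C \sum_{ijk} \xi_{ijk} + \sum_{ijk} \alpha_{ijk} (1 - \xi_{ijk} - W \cdot X_{ijk}) - \sum_{ijk} \beta_{ijk} \xi_{ijk} - \mu \cdot W$, with $\alpha \ge 0$, $\beta \ge 0$, $\mu \ge 0$
$\frac{\partial L}{\partial \xi_{ijk}} = C - \alpha_{ijk} - \beta_{ijk} = 0 \;\Longrightarrow\; 0 \le \alpha_{ijk} \le C$
$\frac{\partial L}{\partial W} = W - \sum_{ijk} \alpha_{ijk} X_{ijk} - \mu = 0 \;\Longrightarrow\; W = \sum_{ijk} \alpha_{ijk} X_{ijk} + \mu$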
![Page 20: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/20.jpg)
Dual
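$F(\alpha, \mu) = \sum_{ijk} \alpha_{ijk} - \frac{1}{2} \left\| \sum_{ijk} \alpha_{ijk} X_{ijk} + \mu \right\|^2$, with $0 \le \alpha_{ijk} \le C$, $\mu \ge 0$
$\frac{\partial F}{\partial \alpha_{abc}} = 1 - \left( \sum_{ijk \ne abc} \alpha_{ijk} X_{ijk} + \mu \right) \cdot X_{abc} - \alpha_{abc} \|X_{abc}\|^2 = 0$
$\alpha_{abc} = \frac{1 - \left( \sum_{ijk \ne abc} \alpha_{ijk} X_{ijk} + \mu \right) \cdot X_{abc}}{\|X_{abc}\|^2} = \alpha_{abc}^{\text{old}} + \frac{1 - \left( \sum_{ijk} \alpha_{ijk} X_{ijk} + \mu \right) \cdot X_{abc}}{\|X_{abc}\|^2}$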
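A minimal sketch of solving the dual one variable at a time, under stated assumptions: each dual variable for a triplet is set to the value that zeroes its partial derivative, then clipped to [0, C] (the clipping is the standard box-constraint handling and is assumed here rather than read off the slide), and the nonnegativity multiplier `mu` is reset to the positive part of minus the weighted sum of triplet vectors. `solve_dual` and the toy data are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve_dual(X, C, sweeps=50):
    """Coordinate ascent on the dual; returns (W, alpha) with W >= 0."""
    dim = len(X[0])
    alpha = [0.0] * len(X)
    v = [0.0] * dim      # v = sum of alpha_ijk * X_ijk
    mu = [0.0] * dim     # multiplier keeping W = v + mu nonnegative
    for _ in range(sweeps):
        for a, x in enumerate(X):
            sq = dot(x, x)
            if sq == 0.0:
                continue
            w = [vi + mi for vi, mi in zip(v, mu)]
            # Unconstrained maximizer for this coordinate, then clip to [0, C].
            target = alpha[a] + (1.0 - dot(w, x)) / sq
            new_alpha = min(C, max(0.0, target))
            delta = new_alpha - alpha[a]
            v = [vi + delta * xi for vi, xi in zip(v, x)]
            alpha[a] = new_alpha
            # Best mu for the current alpha: cancel negative entries of v.
            mu = [max(0.0, -vi) for vi in v]
    w = [vi + mi for vi, mi in zip(v, mu)]
    return w, alpha


# Toy problem (assumption): two triplet vectors in a 2-D weight space.
X_toy = [[1.0, 0.0], [0.0, 2.0]]
W_toy, alpha_toy = solve_dual(X_toy, C=10.0)
```

On this toy problem both margins W·X end up at exactly 1, which is what the hard constraints ask for when C is large enough.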
![Page 21: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/21.jpg)
Details – Features and descriptors
• Find ~400 features per image
• Compute geometric blur descriptor
![Page 22: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/22.jpg)
Descriptors
• Geometric blur
![Page 23: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/23.jpg)
Descriptors
• Two sizes of geometric blur (42 pixels and 70 pixels)
  – Each is 204 dimensions (4 orientations and 51 samples each)
• HSV histograms of 42-pixel patches
![Page 24: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/24.jpg)
Choosing triplets
• Caltech101 – at 15 images per class
  – 31.8 million triplets
  – Many are easy to satisfy
• For each image j, for each feature
  – Find the N images I with closest features
  – For each negative example i in I, form triplets (j, k, i)
• Eliminates about half of the triplets
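One way to arrive at the 31.8 million figure, and a sketch of the pruning heuristic. The count assumes 101 classes with 15 training images each and one triplet per (reference, same-class, other-class) combination. In `select_triplets`, the assumption is that k ranges over all same-class images of j; the function names, the `elem_dist[j][i][m]` layout, and the toy data are hypothetical.

```python
def count_all_triplets(num_classes, per_class):
    """Reference image x same-class image x other-class image."""
    total = num_classes * per_class
    return total * (per_class - 1) * (total - per_class)


def select_triplets(elem_dist, labels, n_closest):
    """Keep triplets (j, k, i) only when negative image i is among the
    n_closest images to j for at least one of j's features.
    elem_dist[j][i][m]: distance from feature m of image j to image i."""
    kept = set()
    n = len(labels)
    for j in range(n):
        others = [i for i in range(n) if i != j]
        positives = [k for k in others if labels[k] == labels[j]]
        for m in range(len(elem_dist[j][others[0]])):
            closest = sorted(others, key=lambda i: elem_dist[j][i][m])[:n_closest]
            for i in closest:
                if labels[i] != labels[j]:
                    kept.update((j, k, i) for k in positives)
    return kept


full_count = count_all_triplets(num_classes=101, per_class=15)

# Toy data (assumption): 4 images on a line, one feature each.
positions = [0.0, 1.0, 1.2, 5.0]
labels = [0, 0, 1, 1]
elem_dist = [[[abs(p - q)] for q in positions] for p in positions]
kept = select_triplets(elem_dist, labels, n_closest=1)
```

The heuristic keeps the triplets whose negative image is actually confusable with j on some feature, which are the ones likely to violate the margin.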
![Page 25: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/25.jpg)
Choosing C
![Page 26: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/26.jpg)
Choosing C
• Train with multiple values of C, testing on a held-out part of the training set
• Choose whichever gives the best results
• For each C, run online version of the training algorithm
  – Make one sweep through training triplets
  – For each misclassified triplet (i,j,k), update weights for the three images
  – Choose C which gets the most right answers
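A rough sketch of the selection loop described here. The slide does not spell out the online update, so this swaps in a passive-aggressive style step capped at C, followed by clipping the weights at zero; it also updates on any triplet whose margin is below 1 rather than only on misordered ones. Those choices, the helper names, and the toy triplet vectors are assumptions.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def one_sweep(train_X, C):
    """Single online pass over triplet vectors; returns nonnegative W."""
    W = [0.0] * len(train_X[0])
    for X in train_X:
        sq = dot(X, X)
        margin = dot(W, X)
        if margin < 1.0 and sq > 0.0:
            step = min(C, (1.0 - margin) / sq)   # step size capped at C
            W = [max(0.0, w + step * x) for w, x in zip(W, X)]
    return W


def accuracy(W, held_out_X):
    """Fraction of held-out triplets ordered correctly (W . X > 0)."""
    return sum(1 for X in held_out_X if dot(W, X) > 0.0) / len(held_out_X)


def choose_c(train_X, held_out_X, candidates):
    """Return (best C, its held-out accuracy); ties keep the earlier C."""
    best_c, best_acc = None, -1.0
    for C in candidates:
        acc = accuracy(one_sweep(train_X, C), held_out_X)
        if acc > best_acc:
            best_c, best_acc = C, acc
    return best_c, best_acc


# Made-up triplet vectors for illustration only.
train_X = [[1.0, -0.5], [0.5, 1.0], [-0.2, 1.0]]
held_out_X = [[1.0, -0.3], [-0.5, 1.0], [0.4, 0.4]]
best_c, best_acc = choose_c(train_X, held_out_X, [0.01, 0.1, 1.0])
```

A single cheap sweep per candidate keeps the search over C affordable before committing to full training with the chosen value.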
![Page 27: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/27.jpg)
Results
• At 15 training examples per class: 63.2% (~3% improvement)
• At 20 training examples per class: 66.6% (~5% improvement)
![Page 28: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/28.jpg)
Results
• Confusion matrix
Hardest categories: crocodile, cougar_body, cannon, bass
![Page 29: Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik](https://reader036.vdocuments.site/reader036/viewer/2022070419/56815b13550346895dc8c31b/html5/thumbnails/29.jpg)
Questions
• Is there any disadvantage to a non-metric distance function?
• Could the images be embedded in a metric space?
• Why not learn everything?
  – Include a feature for each image pixel
  – Include multiple types of descriptors
• Could this be used to do unsupervised learning for sets of tagged images (e.g., for image segmentation)?
• Can you learn a single distance per class?