Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost

Florent Perronnin (1)

work published at ECCV 2012 with: Thomas Mensink (1,2), Jakob Verbeek (2), Gabriela Csurka (1)

(1) Xerox Research Centre Europe, (2) INRIA

NIPS BigVision Workshop, December 7, 2012
Motivation
Real-life image datasets are always evolving:
• new images are added every second
• new labels, tags, faces and products appear over time
• for example: Facebook, Flickr, Twitter, Amazon...

Need to annotate these items for indexing and retrieval

Therefore, we are interested in methods for large-scale visual classification where we can add new images and new classes at near-zero cost, on the fly
Outline
1. Introduction
2. Distance Based Classifiers
3. Metric learning for NCM Classifier
4. Experimental Evaluation
5. Conclusion
Introduction

Recent focus on large-scale image classification
• ImageNet data set [1]
• Currently over 14 million images and 20 thousand classes

Standard large-scale classification pipeline:
• High-dim. features: Super Vector [3] & Fisher Vector [4]
• Linear 1-vs-Rest SVM classifiers [2,3,4]
• Stochastic Gradient Descent (SGD) training [3,4]

→ In this work, we take features for granted and focus on the learning problem.

1. Deng et al., ImageNet: A large-scale hierarchical image database, CVPR'09
2. Deng et al., What does classifying 10,000 image categories tell us?, ECCV'10
3. Lin et al., Large-scale image classification: Fast feature extraction, CVPR'11
4. Sanchez and Perronnin, High-dimensional signature compression for large-scale image classification, CVPR'11
Challenges of open-ended datasets

1-vs-Rest + SGD might look ideal for our problem:
• 1-vs-Rest: classes are trained independently
• SGD: online algorithm can accommodate new data

Still, several issues need to be addressed:
• Given a new sample, feed it to all classifiers? → costly and suboptimal [1]
• How to balance the negatives and positives?
• How to regularize (and choose the step-size)?

→ We turn to distance-based classifiers.

1. Perronnin et al., Towards good practice in large-scale learning for image classification, CVPR'12
Outline
1. Introduction
2. Distance Based Classifiers
3. Metric learning for NCM Classifier
4. Experimental Evaluation
5. Conclusion
Distance Based Classifiers
Classify based on the distance between images, or between an image and class representatives:
• k-Nearest Neighbors
• Nearest Class Mean Classification

Trivial addition of new images or new classes

Critically depends on the distance function
k-Nearest Neighbor Classifier

Assign an image i to the most common class among the k closest images from the training set

✓ Very flexible non-linear model
✓ Easy to integrate new images
✓ Easy to integrate new classes
✗ Expensive at test time!

Metric Learning: Large Margin Nearest Neighbors [1]

1. Weinberger et al., Distance Metric Learning for LMNN Classification, NIPS'06
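As a concrete sketch (not code from the talk): under a low-rank metric W, k-NN reduces to plain Euclidean k-NN after projecting by W. All names here (`knn_predict`, `X_train`, ...) are hypothetical, and W stands for any learned m × D projection, e.g. one obtained with LMNN.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, W, k=5):
    """Majority vote among the k nearest training images under metric W."""
    Z = X_train @ W.T                  # project training images (N x m)
    z = W @ x                          # project the query image
    d = np.linalg.norm(Z - z, axis=1)  # Euclidean distances in the m-dim space
    votes = y_train[np.argsort(d)[:k]] # labels of the k closest images
    return Counter(votes).most_common(1)[0][0]
```

In practice the projected training set Z would be precomputed once, but each query still compares against all N images, which is the "expensive at test time" point above.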
Nearest Class Mean Classifier

Assign an image i to the class with the closest class mean:

µ_c = (1/N_c) ∑_{i : y_i = c} x_i

c* = argmin_c d(x, µ_c)

✓ Very fast at test time: linear model
✓ Easy to integrate new images
✓ Easy to integrate new classes
✗ Class only represented by its mean: not flexible enough?

We introduce metric learning
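A minimal sketch of the two formulas above (hypothetical names; X is the N × D feature matrix, y the label vector). Plain Euclidean distance is used as a stand-in for d until the metric is learned in the next section.

```python
import numpy as np

def class_means(X, y):
    """mu_c = (1/N_c) * sum over {i : y_i = c} of x_i, for each class c."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def ncm_predict(x, classes, mus):
    """c* = argmin_c d(x, mu_c); Euclidean d as a placeholder metric."""
    return classes[np.argmin(np.linalg.norm(mus - x, axis=1))]
```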
Outline
1. Introduction
2. Distance Based Classifiers
3. Metric learning for NCM Classifier
4. Experimental Evaluation
5. Conclusion
Mahalanobis Distance Learning

d(x, x′) = (x − x′)^⊤ M (x − x′)

d_W(x, x′) = ||Wx − Wx′||₂²

1. M = I: Euclidean distance
• Likely to be suboptimal

2. M: D × D, full Mahalanobis distance
• Huge number of parameters for large D
• Expensive to compute distances, in O(D²)

3. M = W^⊤W, low-rank projection W: m × D
• Controllable number of parameters: m × D
• Allows for compression of images to only m dimensions
• Cheap computation of distances, in O(m²)
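A small sketch contrasting options 2 and 3 (hypothetical names and shapes: D input dimension, m projected dimension). The final assertion checks that setting M = W^⊤W makes the two distances agree.

```python
import numpy as np

def d_full(x, xp, M):
    """Option 2: full D x D Mahalanobis distance, O(D^2) per pair."""
    diff = x - xp
    return diff @ M @ diff

def d_lowrank(x, xp, W):
    """Option 3: project by W (m x D), then squared L2 in m dimensions."""
    return np.sum((W @ (x - xp)) ** 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 128))                     # m = 16, D = 128
x, xp = rng.normal(size=128), rng.normal(size=128)
assert np.isclose(d_full(x, xp, W.T @ W), d_lowrank(x, xp, W))
```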
NCM Metric Learning (NCMML)

Probabilistic formulation using the soft-min function:

p(c|x) = exp(−d_W(x, µ_c)) / ∑_{c′=1}^{C} exp(−d_W(x, µ_c′))

Corresponds to the class posterior in a generative model:
→ p(x|c) = N(x; µ_c, Σ), with shared covariance matrix

Crucial point: the parameters W and {µ_c, c = 1, ..., C} can be learned independently on different data subsets.
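A direct transcription of the soft-min posterior (a sketch with hypothetical names; `mus` stacks the C class means as rows):

```python
import numpy as np

def ncm_posterior(x, mus, W):
    """p(c|x) = exp(-d_W(x, mu_c)) / sum_c' exp(-d_W(x, mu_c'))."""
    d = np.sum(((x - mus) @ W.T) ** 2, axis=1)  # d_W(x, mu_c) for all c
    d -= d.min()                                # shift for numerical stability
    p = np.exp(-d)
    return p / p.sum()
```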
NCM Metric Learning (NCMML)

Discriminative maximum likelihood training:
• We maximize with respect to W:

L(W) = ∑_{i=1}^{N} ln p(y_i | x_i)

• Implicit regularization through the rank of W

Stochastic Gradient Descent (SGD): at time t
• Pick a random sample (x_t, y_t)
• Update:

W^(t) = W^(t−1) + η_t ∇_W ln p(y_t | x_t), with the gradient evaluated at W = W^(t−1)

→ mini-batch more efficient
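One possible single-sample SGD step, as a sketch rather than the authors' exact implementation. Differentiating the posterior from the previous slide gives ∇_W ln p(y|x) = 2W ∑_c (p(c|x) − [c = y]) (x − µ_c)(x − µ_c)^⊤, which the code below evaluates without forming the outer products explicitly; all names are hypothetical.

```python
import numpy as np

def ncmml_sgd_step(W, x, y_idx, mus, lr=1e-2):
    """One gradient-ascent step on ln p(y|x) with respect to W."""
    d = np.sum(((x - mus) @ W.T) ** 2, axis=1)  # d_W(x, mu_c) for all c
    p = np.exp(-(d - d.min()))
    p /= p.sum()                                # posterior p(c|x)
    coeff = p.copy()
    coeff[y_idx] -= 1.0                         # p(c|x) - [c == y]
    diffs = x - mus                             # C x D rows: x - mu_c
    # 2 W sum_c coeff_c (x - mu_c)(x - mu_c)^T, computed as matrix products
    grad = 2.0 * W @ ((diffs * coeff[:, None]).T @ diffs)
    return W + lr * grad
```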
Illustration of Learned Distances

[Figure]
Relationship to FDA

[Figure sequence:]
• Three non-linearly separable classes
• Fisher Discriminant Analysis: maximizes variance between all class means
• NCMML: maximizes variance between nearby class means
Relation to other linear classifiers

f_c(x) = b_c + w_c^⊤ x

Linear SVM
• Learn {b_c, w_c} per class

WSABIE [1]
• w_c = v_c W, with W ∈ R^{d×D}
• Learn {v_c} per class and a shared W

Nearest Class Mean
• b_c = ||Wµ_c||₂², w_c = −2 (W^⊤W µ_c)
• Learn a shared W

1. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11
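A sketch of the NCM row of this comparison (hypothetical names): precomputing b_c and w_c turns NCM prediction into evaluating C linear scores. Prediction uses argmin because f_c(x) differs from d_W(x, µ_c) only by the class-independent term ||Wx||₂².

```python
import numpy as np

def ncm_as_linear(x, mus, W):
    WM = mus @ W.T                 # C x m: projected class means
    b = np.sum(WM ** 2, axis=1)    # b_c = ||W mu_c||_2^2
    Wc = -2.0 * (WM @ W)           # rows are w_c = -2 W^T W mu_c
    scores = b + Wc @ x            # f_c(x) = b_c + w_c^T x
    return np.argmin(scores)       # same winner as argmin_c d_W(x, mu_c)
```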
Outline
1. Introduction
2. Distance Based Classifiers
3. Metric learning for NCM Classifier
4. Experimental Evaluation
5. Conclusion
Experimental Evaluation

Data sets:
• ILSVRC'10: 1,000 classes; 1.2M training, 50K validation, 150K test images
• INET10K: ≈ 10K classes; 4.5M training, 50K validation, 4.5M test images

Features:
• 4K- and 64K-dimensional Fisher Vectors [1]
• PQ compression on the 64K features [2]

1. Perronnin et al., Improving the Fisher kernel for image classification, ECCV'10
2. Jegou et al., Product quantization for nearest neighbor search, PAMI'11
Evaluation: ILSVRC'10 (Top-5 accuracy)

• k-NN & NCM improve with metric learning
• NCM outperforms the more flexible k-NN
• NCM competitive with SVM and WSABIE

4K Fisher Vectors
Projection dimensionality    256    512    1024   ℓ2
k-NN, LMNN [1] - dynamic     61.0   60.9   59.6   44.1
NCM, learned metric          62.6   63.0   63.0   32.0
WSABIE [2]                   61.6   61.3   61.5
Baseline: 1-vs-Rest SVM      61.8

1. Weinberger et al., Distance Metric Learning for LMNN Classification, NIPS'06
2. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11
Generalization on INET10K (Top-1 accuracy)

Nearest Class Mean Classifier
• Compute means of 10K classes, in about 1 CPU hour
• Re-use the metric learned on ILSVRC'10

1-vs-Rest SVM baseline
• Train 10K SVM classifiers, in about 280 CPU days

Method       NCM    SVM    SVM [1]   SVM [2]   DL [3]
Feat. dim.   64K    64K    21K       128K      ≈ 60K
Flat top-1   13.9   21.9   6.4       19.1      19.2

1. Deng et al., What does classifying 10,000 image categories tell us?, ECCV'10
2. Perronnin et al., Good practice in large-scale image classification, CVPR'12
3. Le et al., Building high-level features using large scale unsupervised learning, ICML'12
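This is what "near-zero cost" means operationally: with W frozen, adding a class is just computing its mean, and adding an image is a running-mean update. A hedged sketch (hypothetical container, not from the paper):

```python
import numpy as np

class NCMIndex:
    """Minimal open-ended NCM index: W stays fixed, class means evolve."""
    def __init__(self, W):
        self.W, self.mus, self.counts = W, {}, {}

    def add_image(self, x, label):
        # running-mean update; a brand-new label simply starts a new mean
        n = self.counts.get(label, 0)
        mu = self.mus.get(label, np.zeros_like(x))
        self.mus[label] = (n * mu + x) / (n + 1)
        self.counts[label] = n + 1

    def predict(self, x):
        labels = list(self.mus)
        d = [np.sum((self.W @ (x - self.mus[l])) ** 2) for l in labels]
        return labels[int(np.argmin(d))]
```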
Transfer Learning - Zero-Shot Prior

Use the ImageNet class hierarchy to estimate a mean [1]

[Figure legend: Internal nodes — Training nodes — New class]

1. Rohrbach et al., Evaluating knowledge transfer and zero-shot learning in a large-scale setting, CVPR'11
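One plausible instantiation of such a hierarchy-based prior, stated as an assumption rather than the paper's exact estimator: give each ancestor node the average of its descendant training-class means, then seed the new class with the average over its ancestors.

```python
import numpy as np

def zero_shot_mean(new_class, ancestors, descendants, class_means):
    """ancestors[new_class]: internal nodes above the new class;
    descendants[node]: training classes below that node;
    class_means: dict mapping class -> mean vector."""
    node_means = [np.mean([class_means[c] for c in descendants[a]], axis=0)
                  for a in ancestors[new_class]]
    return np.mean(node_means, axis=0)
```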
Transfer Learning - Results on ILSVRC'10

Step 1: Metric learning on 800 classes
Step 2: Estimate means of the remaining 200 classes for evaluation:
• Data mean (Maximum Likelihood)
• Zero-Shot prior + data mean (Maximum a Posteriori)
[Figure: Top-5 accuracy (0–80%) vs. number of samples per class (0, 1, 10, 100, 1000)]
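The ML/MAP distinction above can be sketched with a simple pseudo-count blend (an assumed weighting, not necessarily the paper's): with zero samples the estimate is the zero-shot prior, and it converges to the data mean as samples accumulate, matching the trend of the curve.

```python
import numpy as np

def map_mean(X_new, mu_prior, pseudo_count=10.0):
    """Blend the zero-shot prior with the empirical mean of n samples."""
    n = len(X_new)
    if n == 0:
        return mu_prior                  # pure zero-shot prior (MAP with n = 0)
    # prior acts like `pseudo_count` virtual samples; its weight fades with n
    return (X_new.sum(axis=0) + pseudo_count * mu_prior) / (n + pseudo_count)
```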
Outline
1. Introduction
2. Distance Based Classifiers
3. Metric learning for NCM Classifier
4. Experimental Evaluation
5. Conclusion
Conclusion

Nearest Class Mean (NCM) Classification
• We proposed NCM Metric Learning
• Outperforms k-NN, on par with SVM and WSABIE

Advantages of NCM over alternatives:
• Allows adding new images and classes at near-zero cost
• Shows competitive results on unseen classes
• Can benefit from class priors for small sample sizes

Further improvements:
• Extension using multiple class centroids [1]

1. Mensink et al., Large Scale Metric Learning for Distance-Based Image Classification, Tech report, 2012