ClusterFit: Improving Generalization of Visual Representations

Xueting Yan*, Ishan Misra*, Abhinav Gupta, Deepti Ghadiyaram†, Dhruv Mahajan† CVPR 2020

STRUCT Group Seminar Presenter: Wenjing Wang

2020.05.17

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

BACKGROUND

➤ Background

➤ Overview of the proposed method

➤ Compared with existing methods

BACKGROUND

➤ Weak or self-supervision pre-training

BACKGROUND

➤ Weakly supervised learning

• Defining the proxy tasks using the associated meta-data

• Hashtag prediction

• Search query prediction

• GPS

• Word or n-gram prediction

BACKGROUND

➤ Self-supervised Learning

• Defining the proxy tasks without extra data

• Domain agnostic

• Domain-specific information, e.g. spatial structure

• Color and illumination

• Temporal structure

BACKGROUND

➤ Weak or self-supervision pre-training

• Pre-training proxy not well-aligned with the transfer tasks

• Label noise: polysemy (apple the fruit vs. Apple Inc.), linguistic ambiguity, lack of visualness of tags (#love)

• The last layer is more “aligned” with the proxy objective

➤ This paper: avoid overfitting to the proxy objective

• Smoothing the feature space learned via proxy objectives

BACKGROUND

➤ Background

➤ Overview of the proposed method

➤ Compared with existing methods

BACKGROUND

➤ Proposed method: ClusterFit

• Step 1. Cluster: feature clustering

• Step 2. Fit: predict cluster assignments

BACKGROUND

➤ Background

➤ Overview of the proposed method

➤ Compared with existing methods

➤ Cluster-based self-supervised learning

• DeepCluster [1]

• DeeperCluster [2]

[1] Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze: Deep Clustering for Unsupervised Learning of Visual Features. ECCV 2018
[2] Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin: Unsupervised Pre-Training of Image Features on Non-Curated Data. ICCV 2019

BACKGROUND

➤ Cluster-based self-supervised learning

• DeepCluster [1], DeeperCluster [2]

• Require alternating between feature clustering and network training

➤ This paper

• No alternate optimization

• More stable and computationally efficient

BACKGROUND

➤ Model Distillation

• Transferring knowledge from a teacher to a student

➤ This paper

• Distilling knowledge from a higher capacity teacher model Npre to a lower-capacity student model Ncf

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

PROPOSED METHOD

➤ ClusterFit

• Use the second-last layer of Npre to extract features

• Cluster the features into K groups using k-means

• Train a new network Ncf from scratch with the K cluster assignments as labels
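The two steps above can be sketched as follows. This is a toy NumPy sketch, not the paper's implementation: the paper clusters millions of features with a scalable k-means and trains a full ConvNet in the Fit step, both of which are simplified away here, and all names are illustrative.

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    """Toy k-means; the paper uses a large-scale implementation."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k randomly chosen feature vectors.
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every feature vector to its nearest center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its members.
        for j in range(k):
            if (assign == j).any():
                centers[j] = feats[assign == j].mean(axis=0)
    return assign

# Step 1 (Cluster): features from the penultimate layer of Npre
# (random vectors stand in for real features here).
feats = np.random.default_rng(1).normal(size=(200, 16))
pseudo_labels = kmeans(feats, k=8)

# Step 2 (Fit): the K cluster indices become classification targets for
# training a new network Ncf from scratch (training loop omitted).
```

The point of the sketch is that the Fit step sees only the K discrete cluster ids, not the original proxy labels, which is what makes the re-learned features less tied to the pre-training objective.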

PROPOSED METHOD

➤ Why?

• ClusterFit: a lossy compression scheme

• Captures the essential visual invariances in the feature space

• Gives the ‘re-learned’ network an opportunity to learn features that are less sensitive to the original pre-training objective → making them more transferable.

PROPOSED METHOD

➤ Notes

• Npre is trained on Dpre

• Ncf is trained on Dcf

• Dtar is the target dataset

PROPOSED METHOD

➤ Control Experiment using Synthetic Noise

• Adding varying amounts (p%) of uniform random label noise

• Npre: pre-trained on the noisy labels, then frozen; linear classifiers are trained on top for evaluation

• Dpre = Dcf = ImageNet-1K

• Npre = Ncf = ResNet-50

• Dtar = ImageNet-1K, ImageNet-9K, iNaturalist
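The noise-injection step itself is straightforward; a sketch (function name and the exact corruption recipe are illustrative, not taken from the paper's code):

```python
import numpy as np

def add_uniform_label_noise(labels, p, num_classes, seed=0):
    """Replace roughly p% of the labels with classes drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p / 100.0          # which examples to corrupt
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy

# Toy stand-in for ImageNet-1K labels, corrupted at p = 25%.
clean = np.arange(1000) % 10
noisy = add_uniform_label_noise(clean, p=25, num_classes=10)
```

Note that a "flipped" label can land on its original class by chance, so the fraction of labels actually changed is slightly below p%.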

PROPOSED METHOD

➤ Control Experiment using Synthetic Noise

• Results

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

EXPERIMENTAL RESULTS

➤ Benchmarking

➤ Analysis of ClusterFit

EXPERIMENTAL RESULTS

➤ Compared Methods

• Distillation

• A weighted average of 2 loss functions:

• (a) cross-entropy with soft targets computed using Npre

• (b) cross-entropy with labels in weakly-supervised setup

• Prototype

• Instead of random initialization, uses the label information in Dcf to initialize the cluster centers

• Longer pre-training (Npre 2×)
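The distillation baseline's objective can be sketched as a weighted sum of the two cross-entropies listed above. The weighting `alpha` and temperature `T` are illustrative hyper-parameters, not the paper's values:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, weak_labels, alpha=0.5, T=2.0):
    # (a) cross-entropy against the teacher's softened predictions
    p_teacher = softmax(teacher_logits, T)
    soft_ce = -(p_teacher * np.log(softmax(student_logits, T))).sum(axis=-1).mean()
    # (b) cross-entropy against the weak (e.g. hashtag) labels
    log_p = np.log(softmax(student_logits))
    hard_ce = -log_p[np.arange(len(weak_labels)), weak_labels].mean()
    return alpha * soft_ce + (1 - alpha) * hard_ce

rng = np.random.default_rng(0)
loss = distillation_loss(rng.normal(size=(4, 5)), rng.normal(size=(4, 5)),
                         np.array([0, 1, 2, 3]))
```

Unlike ClusterFit, this baseline keeps the original weak labels in the loss, so it remains tied to the noisy proxy objective.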

EXPERIMENTAL RESULTS

➤ Benchmarking

• Weakly-supervised images

• Weakly-supervised videos

• Self-supervised images

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Images

• Dpre = Dcf = IG-ImageNet-1B

• Npre = Ncf = ResNet-50

• Dtar = ImageNet-1K, ImageNet-9K, Places365, iNaturalist

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Images

• Results

• ImageNet-1K: smaller gains, since the IG-ImageNet-1B hashtags are hand-crafted to align with the ImageNet-1K labels

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Videos

• Dpre = Dcf = IG-Verb-19M

• Npre = Ncf = R(2+1)D-34 [1]

• Dtar = Kinetics, Sports1M, Something-Something V1

[1] Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri: A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Videos

• Results

EXPERIMENTAL RESULTS

➤ Self-Supervised Images

• Dpre = Dcf = ImageNet-1K, JigSaw & Rotation

• Npre = Ncf = ResNet-50

• Dtar = VOC07, ImageNet-1K, Places205, iNaturalist

EXPERIMENTAL RESULTS

➤ Self-Supervised Images

• Results

EXPERIMENTAL RESULTS

➤ Self-Supervised Images

• Layer-wise results

EXPERIMENTAL RESULTS

➤ Benchmarking

➤ Analysis of ClusterFit

EXPERIMENTAL RESULTS

➤ Relative model capacity of Npre and Ncf

• Dpre = IG-Verb-19M, Dcf = IG-Verb-62M

• Ncf = R(2+1)D-18 (33M parameters)

• Npre = R(2+1)D-18, R(2+1)D-34 (64M parameters)

EXPERIMENTAL RESULTS

➤ Relative model capacity of Npre and Ncf

• Results

EXPERIMENTAL RESULTS

➤ Unsupervised vs. Per-Label Clustering

• Per-label clustering:

• Cluster videos belonging to each label into kl clusters
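Per-label clustering can be sketched on top of any clustering routine. Here `cluster_fn` is a pluggable stand-in for k-means, and the offsetting scheme (label index times kl) is one illustrative way to keep the pseudo-labels of different weak labels disjoint:

```python
import numpy as np

def per_label_clusters(feats, labels, k_l, cluster_fn):
    """Cluster each weak label's examples into k_l groups, so examples of
    different labels never share a pseudo-label (offset by label index)."""
    out = np.empty(len(labels), dtype=int)
    for i, lab in enumerate(np.unique(labels)):
        mask = labels == lab
        out[mask] = i * k_l + cluster_fn(feats[mask], k_l)
    return out

# Dummy cluster_fn standing in for k-means: round-robin assignment.
toy_cluster = lambda f, k: np.arange(len(f)) % k
feats = np.zeros((12, 4))
labels = np.array([0] * 6 + [1] * 6)
pseudo = per_label_clusters(feats, labels, k_l=3, cluster_fn=toy_cluster)
```

This yields kl clusters per label (kl × number-of-labels pseudo-labels overall), versus a single unsupervised k-means over all features at once.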

EXPERIMENTAL RESULTS

➤ Properties of Dpre

• Number of labels

• IG-Verb-62M (438 weak verb labels)

• Label number: 10, 30, 100, 438

• Reducing the number of labels implies reduced content diversity

EXPERIMENTAL RESULTS

➤ Properties of Dpre

• Results

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

CONCLUSION

➤ ClusterFit: first cluster the original feature space, then re-learn a new model from scratch on the cluster assignments

➤ Improves the generalization of both weakly- and self-supervised representations
