ClusterFit: Improving Generalization of Visual Representations Xueting Yan*, Ishan Misra*, Abhinav Gupta, Deepti Ghadiyaram†, Dhruv Mahajan† CVPR 2020 STRUCT Group Seminar Presenter: Wenjing Wang 2020.05.17


Page 1: ClusterFit: Improving Generalization of Visual Representations (39.96.165.147/Seminar/WenjingWang_200517.pdf, 2020.5.18)

ClusterFit: Improving Generalization of Visual Representations

Xueting Yan*, Ishan Misra*, Abhinav Gupta, Deepti Ghadiyaram†, Dhruv Mahajan† CVPR 2020

STRUCT Group Seminar Presenter: Wenjing Wang

2020.05.17

Page 2

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion


Page 4

BACKGROUND

➤ Background

➤ Overview of the proposed method

➤ Compared with existing methods


Page 6

BACKGROUND

➤ Weak or self-supervision pre-training

Page 7

BACKGROUND

➤ Weakly supervised learning

• Defining the proxy tasks using the associated meta-data

• Hashtag prediction

• Search query prediction

• GPS prediction

• Word or n-gram prediction

Page 8

BACKGROUND

➤ Self-supervised Learning

• Defining the proxy tasks without extra data

• Domain agnostic

• Domain-specific information, e.g. spatial structure

• Color and illumination

• Temporal structure

Page 9

BACKGROUND

➤ Weak or self-supervision pre-training

• Pre-training proxy not well-aligned with the transfer tasks

• Label noise: polysemy (apple the fruit vs. Apple Inc.), linguistic ambiguity, lack of visualness of tags (#love)

• The last layer is more “aligned” with the proxy objective

➤ This paper: avoid overfitting to the proxy objective

• Smoothing the feature space learned via proxy objectives

Page 10

BACKGROUND

➤ Background

➤ Overview of the proposed method

➤ Compared with existing methods

Page 11

BACKGROUND

➤ Proposed method: ClusterFit

• Step 1. Cluster: feature clustering

• Step 2. Fit: predict cluster assignments

Page 12

BACKGROUND

➤ Proposed method: ClusterFit

Page 13

BACKGROUND

➤ Background

➤ Overview of the proposed method

➤ Compared with existing methods

Page 14

BACKGROUND

➤ Cluster-based self-supervised learning

• DeepCluster [1]

• DeeperCluster [2]

[1] Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze: Deep Clustering for Unsupervised Learning of Visual Features. ECCV 2018
[2] Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin: Unsupervised Pre-Training of Image Features on Non-Curated Data. ICCV 2019


Page 16

BACKGROUND

➤ Cluster-based self-supervised learning

• DeepCluster [1], DeeperCluster [2]

• Require alternate optimization

➤ This paper

• No alternate optimization

• More stable and computationally efficient


Page 17

BACKGROUND

➤ Model Distillation

• Transferring knowledge from a teacher to a student

➤ This paper

• Distilling knowledge from a higher capacity teacher model Npre to a lower-capacity student model Ncf

Page 18

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

Page 19

PROPOSED METHOD

➤ ClusterFit

• Use the second-to-last (penultimate) layer of Npre to extract features

• Cluster features using k-means into K groups

• Train a new network Ncf from scratch to predict the K cluster assignments as labels
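The three steps above can be sketched in a few lines. This is a toy numpy illustration, not the paper's large-scale implementation: the random features stand in for Npre's penultimate-layer outputs over Dcf, and K and all array sizes are illustrative.

```python
import numpy as np

def kmeans(feats, K, iters=20, seed=0):
    """Plain k-means; returns a cluster id in {0, ..., K-1} for each row."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=K, replace=False)].copy()
    for _ in range(iters):
        # Assign each feature vector to its nearest center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        # Recompute centers, keeping the old one if a cluster went empty.
        for k in range(K):
            if (assign == k).any():
                centers[k] = feats[assign == k].mean(0)
    return assign

# Toy stand-in for penultimate-layer features of N_pre over D_cf.
rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 32))

# Cluster step: pseudo-labels from k-means.
pseudo_labels = kmeans(feats, K=8)

# Fit step (not run here): train a fresh network N_cf from scratch on
# (image, pseudo_label) pairs with a standard K-way classification loss.
```

Because the pseudo-labels come from the geometry of the feature space rather than from the proxy objective itself, the re-learned network is decoupled from the original pre-training task.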

Page 20

PROPOSED METHOD

➤ Why?

• ClusterFit: a lossy compression scheme

• Captures the essential visual invariances in the feature space

• Gives the ‘re-learned’ network an opportunity to learn features that are less sensitive to the original pre-training objective → making them more transferable.

Page 21

PROPOSED METHOD

➤ Notes

• Npre is trained on Dpre

• Ncf is trained on Dcf

• Dtar is the target dataset

Page 22

PROPOSED METHOD

➤ Control Experiment using Synthetic Noise

• Adding varying amounts (p%) of uniform random label noise

• Npre: pre-train on the noisy labels, then keep it fixed and train linear classifiers on top

• Dpre = Dcf = ImageNet-1K

• Npre = Ncf = ResNet-50

• Dtar = ImageNet-1K, ImageNet-9K, iNaturalist
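The noise-injection step of this control experiment can be sketched as follows (a hedged illustration: the toy label array, p, and the class count are stand-ins, not the actual ImageNet-1K setup).

```python
import numpy as np

def corrupt_labels(labels, p, num_classes, seed=0):
    """Replace p% of the labels with classes drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < p / 100.0
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy

labels = np.arange(10_000) % 1000          # toy ImageNet-1K-style labels
noisy = corrupt_labels(labels, p=50, num_classes=1000)
frac_changed = (noisy != labels).mean()    # slightly below 0.5, since a
                                           # uniform draw can match the original
```

Npre is then pre-trained on `noisy` instead of `labels`, and transfer is measured by linear classifiers on the frozen features.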

Page 23

PROPOSED METHOD

➤ Control Experiment using Synthetic Noise


Page 25

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

Page 26

EXPERIMENTAL RESULTS

➤ Benchmarking

➤ Analysis of ClusterFit


Page 28

EXPERIMENTAL RESULTS

➤ Compared Methods

• Distillation

• A weighted average of two loss functions:

• (a) cross-entropy with soft targets computed using Npre

• (b) cross-entropy with labels in weakly-supervised setup

• Prototype

• Instead of random cluster initialization, use the label information in Dcf to initialize the cluster centers

• Longer pre-training (Npre 2×)
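The distillation baseline's objective (a) + (b) can be sketched as a weighted sum of two cross-entropies. This is a minimal numpy sketch; `alpha` and the temperature `T` are illustrative placeholders, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, weak_labels, alpha=0.5, T=2.0):
    """alpha * CE(student, soft targets from N_pre)      -- term (a)
       + (1 - alpha) * CE(student, weak labels)          -- term (b)"""
    soft_targets = softmax(teacher_logits, T)                 # from N_pre
    log_p_soft = np.log(softmax(student_logits, T) + 1e-12)
    loss_a = -(soft_targets * log_p_soft).sum(-1).mean()
    log_p = np.log(softmax(student_logits) + 1e-12)
    loss_b = -log_p[np.arange(len(weak_labels)), weak_labels].mean()
    return alpha * loss_a + (1 - alpha) * loss_b

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 10))       # toy student logits
t = rng.normal(size=(4, 10))       # toy teacher (N_pre) logits
y = np.array([0, 3, 7, 9])         # toy weak labels
loss = distill_loss(s, t, y)
```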

Page 29

EXPERIMENTAL RESULTS

➤ Benchmarking

• Weakly-supervised images

• Weakly-supervised videos

• Self-supervised images

Page 30

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Images

• Dpre = Dcf = IG-ImageNet-1B

• Npre = Ncf = ResNet-50

• Dtar = ImageNet-1K, ImageNet-9K, Places365, iNaturalist

Page 31

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Images

• Results

• ImageNet-1K is a special case: the hand-crafted labels of IG-ImageNet-1B are already aligned with ImageNet-1K

Page 32

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Videos

• Dpre = Dcf = IG-Verb-19M

• Npre = Ncf = R(2+1)D-34 [1]

• Dtar = Kinetics, Sports1M, Something-Something V1

[1] Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri: A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018

Page 33

EXPERIMENTAL RESULTS

➤ Weakly-Supervised Videos

• Results

Page 34

EXPERIMENTAL RESULTS

➤ Self-Supervised Images

• Dpre = Dcf = ImageNet-1K; proxy tasks: Jigsaw & Rotation

• Npre = Ncf = ResNet-50

• Dtar = VOC07, ImageNet-1K, Places205, iNaturalist

Page 35

EXPERIMENTAL RESULTS

➤ Self-Supervised Images

• Results


Page 37

EXPERIMENTAL RESULTS

➤ Self-Supervised Images

• Layer-wise results

Page 38

EXPERIMENTAL RESULTS

➤ Benchmarking

➤ Analysis of ClusterFit

Page 39

EXPERIMENTAL RESULTS

➤ Relative model capacity of Npre and Ncf

• Dpre = IG-Verb-19M, Dcf = IG-Verb-62M

• Ncf = R(2+1)D-18 (33M parameters)

• Npre = R(2+1)D-18, R(2+1)D-34 (64M parameters)

Page 40

EXPERIMENTAL RESULTS

➤ Relative model capacity of Npre and Ncf

• Results

Page 41

EXPERIMENTAL RESULTS

➤ Unsupervised vs. Per-Label Clustering

• Per-label clustering:

• Cluster the videos belonging to each label into k_l clusters
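Per-label clustering can be sketched like so. This is a toy numpy illustration: the small k-means helper, the random features, and all sizes are stand-ins for the paper's video setup; the per-label offsets simply keep cluster ids globally unique.

```python
import numpy as np

def kmeans(feats, K, iters=15, seed=0):
    """Plain k-means; returns a cluster id in {0, ..., K-1} for each row."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=K, replace=False)].copy()
    for _ in range(iters):
        assign = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if (assign == k).any():
                centers[k] = feats[assign == k].mean(0)
    return assign

def per_label_clusters(feats, labels, k_l, seed=0):
    """Cluster the examples of each weak label separately into k_l clusters;
    label i's clusters get ids i*k_l, ..., i*k_l + k_l - 1."""
    out = np.empty(len(feats), dtype=int)
    for i, lab in enumerate(np.unique(labels)):
        idx = np.where(labels == lab)[0]
        out[idx] = i * k_l + kmeans(feats[idx], k_l, seed=seed)
    return out

rng = np.random.default_rng(2)
feats = rng.normal(size=(300, 16))
labels = rng.integers(0, 3, size=300)              # 3 toy weak labels
pseudo = per_label_clusters(feats, labels, k_l=4)  # 3 * 4 = 12 possible ids
```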

Page 42

EXPERIMENTAL RESULTS

➤ Properties of Dpre

• Number of labels

• IG-Verb-62M (438 weak verb labels)

• Number of labels: 10, 30, 100, 438

• Reducing the number of labels implies reduced content diversity

Page 43

EXPERIMENTAL RESULTS

➤ Properties of Dpre

• Results

Page 44

OUTLINE

➤ Authorship

➤ Background

➤ Proposed Method

➤ Experimental Results

➤ Conclusion

Page 45

CONCLUSION

➤ ClusterFit: first cluster the original feature space, then re-learn a new model on the cluster assignments

➤ Improves the generalization of both weakly- and self-supervised representations