
Page 1: Few-Shot and Zero-Shot Learning

Few-Shot and Zero-Shot Learning

Xiaolong Wang

Page 2: Few-Shot and Zero-Shot Learning

This Class

• Few-shot learning

• Meta-learning for few-shot learning

• Zero-shot learning

Page 3: Few-Shot and Zero-Shot Learning

Few-shot Learning

Page 4: Few-Shot and Zero-Shot Learning

The Problem

• Humans can learn a novel concept from a few samples

• Goal: let machine learning algorithms learn from a few samples

(Figure: a few sample images of a Saiga antelope, a novel concept that humans can recognize from just a few examples)

Page 5: Few-Shot and Zero-Shot Learning

Introduction

• Issue: learning with insufficient data causes overfitting

Page 6: Few-Shot and Zero-Shot Learning

Introduction

• Intuition: humans can learn quickly, as they have a lot of relevant experience


Page 7: Few-Shot and Zero-Shot Learning

Introduction

• Solution: transfer learning

• Base classes: classes with sufficient samples (training set)

• Novel classes: classes with only a few samples

Page 8: Few-Shot and Zero-Shot Learning

N-way K-shot Task

• N novel classes
• Support set: N×K images (the task's training set)
• Query set: images to classify, typically N×Q

• Common evaluation protocol: many such tasks are sampled from the dataset, and accuracy is averaged over them

• Below, “task” / “FSL task” denotes this setup by default
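The sampling protocol above can be sketched in a few lines (a toy NumPy version; the dataset here is just an array of labels, and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(labels, n_way=5, k_shot=1, q_query=15):
    """Sample one N-way K-shot task: pick N novel classes, then
    K support and Q query indices per class (kept disjoint)."""
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + q_query])
    return np.array(support), np.array(query), classes

# toy dataset: 20 novel classes with 30 images each (indices only)
labels = np.repeat(np.arange(20), 30)
s, q, cls = sample_task(labels)
print(len(s), len(q))  # 5 75
```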

Page 9: Few-Shot and Zero-Shot Learning

Main Types of FSL Algorithms

• Transferring a standard classification model
  • Nearest neighbor / centroid
  • Fine-tuning

• Meta-learning
  • Metric-based
  • Optimization-based

Page 10: Few-Shot and Zero-Shot Learning

Transferring Standard Classification Model

Page 11: Few-Shot and Zero-Shot Learning

Baseline #1: Nearest centroid

1. Train a classifier for all base classes (1st stage)

Page 12: Few-Shot and Zero-Shot Learning

Baseline #1: Nearest centroid

1. Train a classifier for all base classes (1st stage)

2. Remove the last FC layer and get a feature encoder

Page 13: Few-Shot and Zero-Shot Learning

Baseline #1: Nearest centroid

1. Train a classifier for all base classes (1st stage)

2. Remove the last FC layer and get a feature encoder

3. In an FSL task: compute the mean feature for each class in the support set, then classify the query set by nearest centroid

Page 14: Few-Shot and Zero-Shot Learning

Baseline #1: Nearest centroid

• The mean feature is the “prototype” of a class

• It can also be viewed as the estimated weights of an FC layer

• Distance: squared Euclidean / cosine similarity
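A minimal sketch of steps 2–3, assuming features from the stripped encoder are already computed (NumPy, squared Euclidean distance):

```python
import numpy as np

def nearest_centroid(support_feats, support_labels, query_feats):
    """Classify each query feature to the nearest class prototype,
    i.e. the mean support feature, by squared Euclidean distance."""
    classes = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == c].mean(0)
                       for c in classes])                        # (N, D)
    d = ((query_feats[:, None, :] - protos[None]) ** 2).sum(-1)  # (Q, N)
    return classes[d.argmin(1)]

# 1-shot toy example with well-separated 2-d features
sup = np.array([[0., 0.], [10., 10.]])
qry = np.array([[1., 0.], [9., 10.]])
print(nearest_centroid(sup, np.array([0, 1]), qry))  # [0 1]
```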

Page 15: Few-Shot and Zero-Shot Learning

Improving #1: Cosine Classifier

Use cosine similarity in both stages:

• Training (1st stage): replace the last FC layer with a cosine classifier

• Inference: cosine distance to the prototypes

The scale τ is a learnable parameter

Gidaris et al. Dynamic Few-Shot Visual Learning without Forgetting. CVPR 2018
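A sketch of the scaled-cosine logits (τ is fixed here for illustration; in the paper it is learned along with the network):

```python
import numpy as np

def cosine_logits(feats, weights, tau=10.0):
    """Logits are tau * cosine similarity between features and class
    weight vectors (at inference, the weights are the prototypes)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return tau * f @ w.T
```

Because every logit lies in [−τ, τ], the scale controls how peaked the softmax over classes can get.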

Page 16: Few-Shot and Zero-Shot Learning

Improving #1: Cosine Classifier

Gidaris et al. Dynamic Few-Shot Visual Learning without Forgetting. CVPR 2018

Page 17: Few-Shot and Zero-Shot Learning

Improving #1: Cosine Classifier

(ablation study on the validation set under the generalized FSL setting; focus on the Novel results)

Page 18: Few-Shot and Zero-Shot Learning

Baseline #2: Fine-tuning

1. Train a classifier for all base classes (1st stage)
2. In an FSL task: fine-tune with the support set

Fine-tuning the whole network may cause overfitting
Option: fine-tune only the last FC layer
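The "last-FC-only" option can be sketched as softmax regression on frozen support features (a toy NumPy version; hyperparameters are illustrative):

```python
import numpy as np

def finetune_fc(feats, labels, n_cls, lr=0.5, steps=200):
    """Fit a fresh softmax FC layer on frozen support features by
    gradient descent; the encoder stays fixed to limit overfitting."""
    W = np.zeros((feats.shape[1], n_cls))
    y = np.eye(n_cls)[labels]
    for _ in range(steps):
        z = feats @ W
        p = np.exp(z - z.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)              # softmax probabilities
        W -= lr * feats.T @ (p - y) / len(feats)  # cross-entropy gradient
    return W
```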

Page 19: Few-Shot and Zero-Shot Learning

Improving #2: “Baseline++”

• Use cosine-classifier for both the 1st stage and fine-tuning

Chen et al. A Closer Look at Few-shot Classification. ICLR 2019

Page 20: Few-Shot and Zero-Shot Learning

Improving #2: “Baseline++”

Page 21: Few-Shot and Zero-Shot Learning

How to get a good representation for FSL?

Idea: let the learning objective describe our goal

—— directly optimize towards the FSL tasks

Page 22: Few-Shot and Zero-Shot Learning

Meta-Learning for FSL

• Learning the model by optimizing towards FSL tasks sampled from images in training set (base classes)

Page 23: Few-Shot and Zero-Shot Learning

Meta-Learning for FSL

• A task: N-way K-shot (and Q-query), N×(K+Q) images

1. Sample a task (support set + query set) from the base classes
2. A model processes the support set, then classifies samples in the query set
3. Compute the query-set classification loss (using the ground truths) and optimize towards it

http://web.stanford.edu/class/cs330/

Page 24: Few-Shot and Zero-Shot Learning

Meta-Learning for FSL

1. Sample a task (support set + query set) from the base classes
2. A model processes the support set, then classifies samples in the query set
3. Compute the query-set classification loss (using the ground truths) and optimize towards it

The key is Step 2: a differentiable algorithm

• A differentiable algorithm ↔ a meta-learning method

Page 25: Few-Shot and Zero-Shot Learning

Meta-Learning for FSL

1. Sample a task (support set + query set) from the base classes
2. A model processes the support set, then classifies samples in the query set
3. Compute the query-set classification loss (using the ground truths) and optimize towards it

• Metric-based: Get features of the support-set, classify the query-set by feature comparison

• Optimization-based: the model optimizes towards the support-set for a few steps, then classifies the query-set

Page 26: Few-Shot and Zero-Shot Learning

Metric-Based Meta-Learning

Page 27: Few-Shot and Zero-Shot Learning

Matching Network

• Get features of the support / query images
• Classify query images by nearest neighbor (cosine distance)

Vinyals et al. Matching Networks for One Shot Learning. NeurIPS 2016
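The prediction rule can be sketched as soft attention over support examples by cosine similarity (ignoring the paper's full context embeddings):

```python
import numpy as np

def matching_predict(s_feats, s_labels, q_feats, n_cls):
    """Each query attends over all support examples with a softmax of
    cosine similarities; its label distribution is the attention-
    weighted sum of the support one-hot labels."""
    sn = s_feats / np.linalg.norm(s_feats, axis=1, keepdims=True)
    qn = q_feats / np.linalg.norm(q_feats, axis=1, keepdims=True)
    att = np.exp(qn @ sn.T)                # (Q, S) attention weights
    att /= att.sum(1, keepdims=True)
    return att @ np.eye(n_cls)[s_labels]   # (Q, N) class probabilities
```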

Page 28: Few-Shot and Zero-Shot Learning

Prototypical Network

• Get the mean feature for each class in the support set
• Classify query images to the nearest class center

Snell et al. Prototypical Networks for Few-shot Learning. NeurIPS 2017

Page 29: Few-Shot and Zero-Shot Learning

Prototypical Network

Simplifies the Matching Network.

Differences:
1. Merge class features by averaging, instead of 1-to-1 matching
2. Squared Euclidean distance instead of cosine
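One episode's query loss, which meta-training backpropagates through the encoder, can be sketched as (NumPy; q_labels index into the episode's class list):

```python
import numpy as np

def protonet_loss(s_feats, s_labels, q_feats, q_labels):
    """Prototypes = class means of the support set; query logits =
    negative squared Euclidean distances; loss = cross-entropy."""
    classes = np.unique(s_labels)
    protos = np.stack([s_feats[s_labels == c].mean(0) for c in classes])
    d = ((q_feats[:, None] - protos[None]) ** 2).sum(-1)   # (Q, N)
    m = (-d).max(1, keepdims=True)                         # stable log-softmax
    logp = -d - (m + np.log(np.exp(-d - m).sum(1, keepdims=True)))
    return -logp[np.arange(len(q_labels)), q_labels].mean()
```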

Page 30: Few-Shot and Zero-Shot Learning

Relation Network

• Relation net g: a learnable comparison module

Sung et al. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018
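The comparator can be sketched as a tiny MLP over concatenated (query, prototype) pairs; the weights below are illustrative placeholders, not trained parameters:

```python
import numpy as np

def relation_scores(q_feats, protos, W1, b1, W2, b2):
    """Relation-Network-style scoring: concatenate each (query,
    prototype) pair, pass it through a small MLP g, and output a
    relation score in (0, 1) per pair."""
    Q, N = len(q_feats), len(protos)
    pairs = np.concatenate([np.repeat(q_feats, N, axis=0),
                            np.tile(protos, (Q, 1))], axis=1)  # (Q*N, 2D)
    h = np.maximum(pairs @ W1 + b1, 0)                # ReLU hidden layer
    s = 1 / (1 + np.exp(-(h @ W2 + b2)))              # sigmoid score
    return s.reshape(Q, N)

rng = np.random.default_rng(0)
scores = relation_scores(rng.normal(size=(4, 3)), rng.normal(size=(2, 3)),
                         rng.normal(size=(6, 8)), np.zeros(8),
                         rng.normal(size=(8, 1)), np.zeros(1))
print(scores.shape)  # (4, 2)
```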

Page 31: Few-Shot and Zero-Shot Learning

Results

Metric-based meta-learning:

Learning how to do matching.

Page 32: Few-Shot and Zero-Shot Learning

Optimization-Based Meta-Learning

Page 33: Few-Shot and Zero-Shot Learning

MAML

• Learn an initialization θ that works well for per-task fine-tuning

Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017

Page 34: Few-Shot and Zero-Shot Learning

MAML

• The computation of the fine-tuning process is differentiable

• θ ← θ − β ∇θ L(θ − α ∇θ L(θ, S), Q)
• S: support set
• Q: query set
• Requires a 2nd-order gradient

Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
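The update can be worked out exactly for a toy scalar model f(x) = θx with squared loss, where the 2nd-order term (1 − α L_S″) has a closed form (an illustrative sketch, not the paper's implementation):

```python
import numpy as np

def grad(theta, x, y):
    """dL/dθ for L(θ) = mean (θx − y)²."""
    return 2 * np.mean(x * (theta * x - y))

def hess(x):
    """d²L/dθ² is constant for the squared loss."""
    return 2 * np.mean(x ** 2)

def maml_step(theta, xs, ys, xq, yq, alpha=0.01, beta=0.1):
    """One MAML meta-update: inner step on the support set S, then
    the exact outer gradient of the query loss through that inner
    step, including the 2nd-order factor (1 − α L_S'')."""
    theta_inner = theta - alpha * grad(theta, xs, ys)
    outer_grad = (1 - alpha * hess(xs)) * grad(theta_inner, xq, yq)
    return theta - beta * outer_grad
```

Dropping the (1 − α L_S″) factor gives first-order MAML, which larger-scale implementations often fall back to.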

Page 35: Few-Shot and Zero-Shot Learning

MAML

• Works only for small networks

• Performance is highly task-dependent and can vary a lot

• Can perform worse than simple fine-tuning on larger datasets and networks

Page 36: Few-Shot and Zero-Shot Learning

Summary of Few-Shot Learning

• Few-Shot learning is an important problem

• Meta-Learning makes the form of training/testing consistent

• Challenges
  • Scalability of the meta-learning algorithms
  • More practical settings: generalized FSL, any-shot / higher-shot
  • Discrepancy between base classes and novel classes…

Page 37: Few-Shot and Zero-Shot Learning

Zero-Shot Learning

Page 38: Few-Shot and Zero-Shot Learning

Word2vec Embeddings

Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. NeurIPS 2013

Page 39: Few-Shot and Zero-Shot Learning

Skip-gram model
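In the skip-gram model, each center word predicts the words in its context window; generating the training pairs can be sketched as:

```python
def skipgram_pairs(tokens, window=2):
    """Emit (center, context) training pairs: every word within
    `window` positions of the center is a positive context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```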

Page 40: Few-Shot and Zero-Shot Learning

Word2vec Embeddings

Page 41: Few-Shot and Zero-Shot Learning

DeViSE

Frome et al. DeViSE: A Deep Visual-Semantic Embedding Model. NeurIPS 2013

Page 42: Few-Shot and Zero-Shot Learning

DeViSE

• DeViSE uses the implicit relations between words captured by word embeddings

• How about using the explicit relations in a knowledge graph?
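The DeViSE idea can be sketched as a linear map from visual features into the word-embedding space, with classification by nearest word vector; M and the vectors below are illustrative, not the trained DeViSE parameters:

```python
import numpy as np

def devise_predict(img_feat, M, word_vecs, class_names):
    """Project the visual feature with the learned map M, then pick
    the class whose word embedding is most cosine-similar. Unseen
    classes only need a word vector, which enables zero-shot."""
    v = M @ img_feat
    v = v / np.linalg.norm(v)
    w = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    return class_names[int((w @ v).argmax())]
```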

Page 43: Few-Shot and Zero-Shot Learning

Using Knowledge Graphs

• Never-Ending Language Learning (NELL) Knowledge Graph

https://rtw.ml.cmu.edu

Page 44: Few-Shot and Zero-Shot Learning

Zero-Shot Recognition

• Word Embedding + Knowledge Graph
• Graph Convolutional Network (GCN)
• Training classes: x₁, x₂; testing class: x₃

Wang et al. Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs. CVPR 2018

(Figure: the GCN maps 300-d word embeddings to 2048-d visual classifier weights)
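The core propagation step can be sketched as a standard GCN layer: stacking a few such layers maps each node's 300-d word embedding toward a 2048-d classifier-weight vector, so unseen classes get classifiers from their graph neighborhood (sizes below are small illustrative stand-ins):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: symmetrically normalized adjacency (with
    self-loops) times node features, then a linear map and ReLU."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0)

# toy knowledge graph: 3 class nodes, 4-d "word embeddings" -> 5-d output
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
rng = np.random.default_rng(0)
out = gcn_layer(A, rng.normal(size=(3, 4)), rng.normal(size=(4, 5)))
print(out.shape)  # (3, 5)
```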

Page 45: Few-Shot and Zero-Shot Learning

Zero-Shot Recognition


Page 46: Few-Shot and Zero-Shot Learning

Zero-Shot Recognition

(Results at three test scales: 2.5K / 8.8K / 21K ImageNet classes)

Page 47: Few-Shot and Zero-Shot Learning

This Class

• Few-shot learning

• Meta-learning for few-shot learning

• Zero-shot learning