l24 less labels - georgia institute of technology€¦ · meta-learning for few-shot recognition...

34
CS 4803 / 7643: Deep Learning Zsolt Kira Georgia Tech Topics: (Continue) Low-label ML Formulations

Upload: others

Post on 17-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

CS 4803 / 7643: Deep Learning

Zsolt KiraGeorgia Tech

Topics: – (Continue) Low-label ML Formulations

Page 2: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Administrative• Projects!

– Poster details out on piazza– Note: No late days for anything project related!– Also note: Keep track of your GCP usage and costs! Set

limits on spending

(C) Dhruv Batra & Zsolt Kira 31

Page 3: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learning for Few-Shot Recognition

• Key idea: We want to learn from a few examples (called the support set) to make predictions on query set for novel classes– Assume: We have larger labeled dataset for a different set of

categories (base classes)

• How do we test this?– N-way k-shot test– k: Number of examples in support set– N: Number of “confusers” that we have to choose target

class among

(C) Dhruv Batra & Zsolt Kira 32

Target

Query Set

Page 4: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Normal Approach• Do what we always do: Fine-tuning

– Train classifier on base classes

– Freeze features– Learn classifier weights for new classes using few amounts

of labeled data (during “inference” time!)

(C) Dhruv Batra & Zsolt Kira 33A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang

Page 5: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Cons of Normal Approach• The training we do on the base classes does not

factor the task into account

• No notion that we will be performing a bunch of N-way tests

• Idea: simulate what we will see during test time

(C) Dhruv Batra & Zsolt Kira 34

Page 6: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Training Approach• Set up a set of smaller tasks during training which

simulates what we will be doing during testing

– Can optionally pre-train features on held-out base classes (not typical)

• Testing stage is now the same, but with new classes

(C) Dhruv Batra & Zsolt Kira 35

Page 7: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learning Approaches• Learning a model conditioned on support set

(C) Dhruv Batra & Zsolt Kira 36

Page 8: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner• How to parametrize learning algorithms?

• Two approaches to defining a meta-learner– Take inspiration from a known learning algorithm

• kNN/kernel machine: Matching networks (Vinyals et al. 2016)• Gaussian classifier: Prototypical Networks (Snell et al. 2017)• Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017) ,

MAML (Finn et al. 2017)

– Derive it from a black box neural network• MANN (Santoro et al. 2016)• SNAIL (Mishra et al. 2018)

(C) Dhruv Batra & Zsolt Kira 37Slide Credit: Hugo Larochelle

Page 9: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner• How to parametrize learning algorithms?

• Two approaches to defining a meta-learner– Take inspiration from a known learning algorithm

• kNN/kernel machine: Matching networks (Vinyals et al. 2016)• Gaussian classifier: Prototypical Networks (Snell et al. 2017)• Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017) ,

MAML (Finn et al. 2017)

– Derive it from a black box neural network• MANN (Santoro et al. 2016)• SNAIL (Mishra et al. 2018)

(C) Dhruv Batra & Zsolt Kira 38Slide Credit: Hugo Larochelle

Page 10: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Matching Networks

(C) Dhruv Batra & Zsolt Kira 39Slide Credit: Hugo Larochelle

Page 11: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Prototypical Networks

(C) Dhruv Batra & Zsolt Kira 40Slide Credit: Hugo Larochelle

Page 12: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Prototypical Networks

(C) Dhruv Batra & Zsolt Kira 41Slide Credit: Hugo Larochelle

Page 13: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

More Sophisticated Meta-Learning Approaches

• Learn gradient descent: – Parameter initialization and update rules

• Learn just an initialization and use normal gradient descent (MAML)

(C) Dhruv Batra & Zsolt Kira 42

Page 14: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner LSTM

(C) Dhruv Batra & Zsolt Kira 43Slide Credit: Hugo Larochelle

Page 15: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner LSTM

(C) Dhruv Batra & Zsolt Kira 44Slide Credit: Hugo Larochelle

Page 16: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner LSTM

(C) Dhruv Batra & Zsolt Kira 45Slide Credit: Hugo Larochelle

Page 17: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner LSTM

(C) Dhruv Batra & Zsolt Kira 46Slide Credit: Hugo Larochelle

Page 18: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learning Algorithm

(C) Dhruv Batra & Zsolt Kira 47

Page 19: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner LSTM

(C) Dhruv Batra & Zsolt Kira 48Slide Credit: Hugo Larochelle

Page 20: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Model-Agnostic Meta-Learning (MAML)

(C) Dhruv Batra & Zsolt Kira 49Slide Credit: Hugo Larochelle

Page 21: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Model-Agnostic Meta-Learning (MAML)

(C) Dhruv Batra & Zsolt Kira 50

Page 22: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Model-Agnostic Meta-Learning (MAML)

(C) Dhruv Batra & Zsolt Kira 51Slide Credit: Sergey Levine

Page 23: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Model-Agnostic Meta-Learning (MAML)

(C) Dhruv Batra & Zsolt Kira 52Slide Credit: Sergey Levine

Page 24: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Comparison

(C) Dhruv Batra & Zsolt Kira 53Slide Credit: Sergey Levine

Page 25: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Meta-Learner• How to parametrize learning algorithms?

• Two approaches to defining a meta-learner– Take inspiration from a known learning algorithm

• kNN/kernel machine: Matching networks (Vinyals et al. 2016)• Gaussian classifier: Prototypical Networks (Snell et al. 2017)• Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017) ,

MAML (Finn et al. 2017)

– Derive it from a black box neural network• MANN (Santoro et al. 2016)• SNAIL (Mishra et al. 2018)

(C) Dhruv Batra & Zsolt Kira 54Slide Credit: Hugo Larochelle

Page 26: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Experiments

(C) Dhruv Batra & Zsolt Kira 55Slide Credit: Hugo Larochelle

Page 27: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Memory-Augmented Neural Network

(C) Dhruv Batra & Zsolt Kira 56Slide Credit: Hugo Larochelle

Page 28: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

But beware

(C) Dhruv Batra & Zsolt Kira 57Slide Credit: Hugo Larochelle

A Closer Look at Few-shot Classification,

Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang

Page 29: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

(C) Dhruv Batra & Zsolt Kira 58

Page 30: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Distribution Shift• What if there is a

distribution shift (cross-domain)?

• Lesson: Methods that are successful within-domainmight be worse across domains!

(C) Dhruv Batra & Zsolt Kira 59

Page 31: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Distribution Shift

(C) Dhruv Batra & Zsolt Kira 60

Page 32: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Random Task Proposals

(C) Dhruv Batra & Zsolt Kira 61

Page 33: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Does it Work?

(C) Dhruv Batra & Zsolt Kira 62

Page 34: L24 less labels - Georgia Institute of Technology€¦ · Meta-Learning for Few-Shot Recognition • Key idea: We want to learn from a few examples ... Matching networks (Vinyalset

Discussions

(C) Dhruv Batra & Zsolt Kira 63

• What is the right definition of distributions over problems?– varying number of classes / examples per class (meta-

training vs. meta-testing) ?– semantic differences between meta-training vs. meta-testing

classes ?– overlap in meta-training vs. meta-testing classes (see recent

“low-shot” literature) ?• Move from static to interactive learning

– how should this impact how we generate episodes ?– meta-active learning ? (few successes so far)

Slide Credit: Hugo Larochelle