what’s wrong with meta-learningmetalearning.ml/2018/slides/meta_learning_2018_levine.pdf ·...

34
What’s Wrong with Meta-Learning (and how we might fix it) Sergey Levine UC Berkeley Google Brain

Upload: others

Post on 09-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

What’s Wrong with Meta-Learning(and how we might fix it)

Sergey LevineUC Berkeley Google Brain

Page 2: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images
Page 3: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Yahya, Li, Kalakrishnan, Chebotar, Levine, ‘16

Page 4: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Kalashnikov, Irpan, Pastor, Ibarz, Herzong, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills

Page 5: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Kalashnikov, Irpan, Pastor, Ibarz, Herzong, Jang, Quillen, Holly, Kalakrishnan, Vanhoucke, Levine. QT-Opt: Scalable Deep Reinforcement Learning of Vision-Based Robotic Manipulation Skills

Page 6: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

can we transfer past experience in order to learn how to learn?

people can learn new skills extremely quickly

about four hours about four weeks, nonstop

how?

we never learn from scratch!

Page 7: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

The meta-learning/few-shot learning problem

A simpler, model-agnostic, meta-learning method

Unsupervised meta-learning

Page 8: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

The meta-learning/few-shot learning problem

A simpler, model-agnostic, meta-learning method

Unsupervised meta-learning

Page 9: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Few-shot learning: problem formulation in pictures

image credit: Ravi & Larochelle ‘17

Page 10: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Few-shot learning: problem formulation in equations

(few shot) training set

input (e.g., image) output (e.g., label)

training set

• How to read in training set?• Many options, RNNs can work

test input

test label

Page 11: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Some examples of representations

Santoro et al. “Meta-Learning with Memory-Augmented Neural Networks.”

Vinyals et al. “Matching Networks for One-Shot Learning”

Snell et al. “Prototyping Networks for Few-Shot Learning”

…and many many many others!

Page 12: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

this implements the “learned learning algorithm”

test input

test label

RNN-based meta-learning

• Does it converge?

• Kind of?

• What does it converge to?

• Who knows…

• What to do if it’s not good enough?

• Nothing…

What kind of algorithm is learned?

Page 13: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

The meta-learning/few-shot learning problem

A simpler, model-agnostic, meta-learning method

Unsupervised meta-learning

Page 14: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Let’s step back a bit…

is pretraining a type of meta-learning?

better features = faster learning of new task!

Page 15: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Model-agnostic meta-learning

a general recipe:

* in general, can take more than one gradient step here

** we often use 4 – 10 steps“meta-loss” for task i

Finn et al., “Model-Agnostic Meta-Learning”

Chelsea Finn

Page 16: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

What did we just do?

Just another computation graph…

Can implement with any autodiffpackage (e.g., TensorFlow)

Page 17: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Why does it work?

this implements the “learned learning algorithm”

test input

test label

RNN-based meta-learning

• Does it converge?

• Kind of?

• What does it converge to?

• Who knows…

• What to do if it’s not good enough?

• Nothing…

MAML

• Does it converge?

• Yes (it’s gradient descent…)

• What does it converge to?

• A local optimum (it’s gradient descent…)

• What to do if it’s not good enough?

• Keep taking gradient steps (it’s gradient descent…)

Page 18: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Universality

Did we lose anything?

Universality: meta-learning can learn any “algorithm”

Finn & Levine. “Meta-Learning and Universality”

Page 19: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Model-agnostic meta-learning: forward/backward locomotion

after MAML trainingafter 1 gradient step

(forward reward)

after 1 gradient step

(backward reward)

Page 20: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Related work

…and many many many others!

Andrychowicz et al. “Learning to learn by gradient descent by gradient descent.”

Li & Malik. “Learning to optimize”Maclaurin et al. “Gradient-based hyperparameter optimization”

Ravi & Larochelle. “Optimization as a model for few-shot learning”

Page 21: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Follow-up work

…and the results keep getting better

MiniImagenet few-shot benchmark: 5-shot 5-way

Finn et al. ‘17: 63.11%

Kim et al. ‘18 (AutoMeta): 76.29%

Li et al. ‘17: 64.03%

Page 22: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

The meta-learning/few-shot learning problem

A simpler, model-agnostic, meta-learning method

Unsupervised meta-learning

Page 23: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Let’s Talk about Meta-Overfitting

• Meta learning requires task distributions

• When there are too few meta-training tasks, we can meta-overfit

• Specifying task distributions is hard, especially for meta-RL!

• Can we propose tasks automatically?

after MAML training after 1 gradient step

Page 24: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

A General Recipe for Unsupervised Meta-RL

environment

Unsupervised Meta-RL

Meta-learned

environment-specific

RL algorithm

reward-maximizing

policy

reward

function

Unsupervised

Task AcquisitionMeta-RL

Fast

Adaptation

Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.

Chelsea FinnAbhishek Gupta Ben Eysenbach

environment

Unsupervised Meta-RL

Meta-learned

environment-specific

RL algorithm

reward-maximizing

policy

reward

function

Unsupervised

Task AcquisitionMeta-RL

Fast

Adaptationenvironment

Unsupervised Meta-RL

Meta-learned

environment-specific

RL algorithm

reward-maximizing

policy

reward

function

Unsupervised

Task AcquisitionMeta-RL

Fast

Adaptation

Page 25: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Random Task Proposals

◼ Use randomly initialize discriminators for reward functions

◼ Important: Random functions over state space, not random policies

D → randomly initialized network

Page 26: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Diversity-Driven Proposals

Policy(Agent)

Discriminator(D)

Skill (z)

Environment

Action State

Predict Skill

◼ Policy → visit states which are discriminable

◼ Discriminator → predict skill from state

Task Reward for UML:

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

Page 27: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Examples of Acquired Tasks

Eysenbach, Gupta, Ibarz, Levine. Diversity is All You Need.

Cheetah Ant

Page 28: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Does it work?

Gupta, Eysenbach, Finn, Levine. Unsupervised Meta-Learning for Reinforcement Learning.

2D Navigation CheetahAnt

Meta-test performance with rewards

Page 29: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

What about supervised learning?

Page 30: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Can we meta-train on only unlabeled images?

Hsu, Levine, Finn. Unsupervised Learning via Meta-Learning.

unsupervised learning task proposals

trainingimages

testimages

Class 1

Class 2

Class 1

Class 2

trainingimages

testimages

Class 1

Class 2

MAML

meta-learning

Chelsea FinnKyle Hsu

But... does it outperform unsupervised learning?

Page 31: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

Results: unsupervised meta-learning

no truelabelsat all!

unsupervised learning task proposals meta-learning

a few choices:

BiGAN – Donahue et al. ’17DeepCluster – Caron et al. ‘18

miniImageNet: 5 shot, 5 way

method accuracy

MAML with labels 62.13%

BiGAN kNN 31.10%

BiGAN logistic 33.91%

BiGAN MLP + dropout 29.06%

BiGAN cluster matching 29.49%

BiGAN CACTUs 51.28%

DeepCluster CACTUs 53.97%

Hsu, Levine, Finn. Unsupervised Learning via Meta-Learning.

Same story across:

• 3 different embedding methods

• 4 datasets (Omniglot, miniImageNet, CelebA, MNIST)

Clustering to Automatically Construct Tasks for Unsupervised

Meta-Learning (CACTUs)

Page 32: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

The meta-learning/few-shot learning problem

A simpler, model-agnostic, meta-learning method

Unsupervised meta-learning

Page 33: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

What’s next?

Meta-learning online learning & continual learning

Meta-learning to interpret weak supervision and natural language

Probabilistic meta-learning: learn to sample multiple hypotheses

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning. 2018.

Nagabandi, Finn, Levine. Deep Online Learning via Meta-Learning: Continual Adaptation via Model-Based RL. 2018.

Co-Reyes, Gupta, Sanjeev, Altieri, DeNero, Abbeel, Levine. Meta-Learning Language-Guided Policy Learning. 2018.

Yu*, Finn*, Xie, Dasari, Abbeel, Levine. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. 2018.

Correction 1: Enter the blue room.

Correction 2: Enter the red room.

Instruction: Move blue triangle to green goal.

Page 34: What’s Wrong with Meta-Learningmetalearning.ml/2018/slides/meta_learning_2018_Levine.pdf · experience in order to learn how to learn? people can learn new skills ... training images

RAIL

Robotic AI & Learning Lab

website: http://rail.eecs.berkeley.edu

source code: http://rail.eecs.berkeley.edu/code.html