click to edit master title style efficient rl model-based sampleebrun/15889e/lectures/sample... ·...

43
2/18/15 1 Click to edit Master title style Click to edit Master subtitle style 2/18/15 1 Model-based Sample Efficient RL Emma Brunskill 15-889e Fall 2015

Upload: others

Post on 20-Aug-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 1

Click to edit Master title style

Click to edit Master subtitle style

2/18/15 1

Model-based Sample Efficient RL

Emma Brunskill15-889eFall 2015

Page 2: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 2

Sample Efficient RL

• Objectives• Probably Approximately Correct• Minimizing regret• Bayes-optimal RL

• Today: Model-based data efficient RL

Page 3: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 3

Model-based Sample Efficient RL

• What objective is algorithm optimizing?• Using function approximation for the model• Planning with complex models • Computational constraints

Page 4: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 4

Model Based Approaches

• Linear representations are fairly limited• Lots of powerful function approximators, e.g.

• Gaussian processes• Random forests• Neural networks

Page 5: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 5

Exploration / Exploitation when Using Function Approximation for Models

• When learning a single policy from a batch of data, we didn’t have to address exploration vs exploitation

• Now we are doing online RL• If using function approximator to represent

the transition/reward models, how should we address exploration/exploitation?

Page 6: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 6

s’ = + s

Figure adjusted from Wilson et al. JMLR 2014

s

Gaussian Process to Model MDP

Page 7: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 7

s’ = + s

Figure adjusted from Wilson et al. JMLR 2014

s

Gaussian Process: Explicit Representation of Uncertainty Over Model Parameters

Page 8: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 8

Slide from Ghahramanihttp://www.eurandom.tue.nl/events/workshops/2010/YESIV/Prog-Abstr_files/Ghahramani-lecture2.pdf

Page 9: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 9

Can Exploit Structure in Dynamics

Slide modified from Jung & Stone ECML 2010

Page 10: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 10

Gaussian

Processes for

Sample Efficient

Reinforcement

Learning with

RMAX-like

Exploration

(Jung & Stone,

ECML 2010)

Slide modified from Jung & Stone ECML 2010

Page 11: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 11

Model Learning

Slide modified from Jung & Stone ECML 2010

Page 12: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 12

Planning

Slide modified from Jung & Stone ECML 2010

Page 13: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 13

Computational Cost

This is expensive!

Hard to scale to large dim state

spaces

Slide modified from Jung & Stone ECML 2010

Page 14: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 14

Simulation Experiments

Slide modified from Jung & Stone ECML 2010

Page 15: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 15

Slide modified from Jung & Stone ECML 2010

Page 16: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 16

Slide modified from Jung & Stone ECML 2010

GP model with no exploration doing best

Page 17: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 17

GP model + no exploration also best in larger domains

Page 18: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 18

Summary

• GP models can be very useful for quickly learning a good dynamics model, especially if there’s structure in the domain

• Planning can be computationally expensive

• In domains considered here, leveraging GP’s representation of model parameter uncertainty not needed

Slide modified from Jung & Stone ECML 2010

Page 19: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 19

Model Based Approaches

• Linear representations are fairly limited• Lots of powerful function approximators, e.g.

• Gaussian processes• Random forests• Neural networks

Page 20: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 20

TEXPLORE

Slide modified from Todd Hester

Page 21: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 21

Slide modified from Todd Hester

Page 22: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 22

Decision Trees for MDP Model

Slide modified from Todd Hester

Page 23: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 23

• Assume actions have similar effect across states

• srel = = s’-s• in some cases may be independent of s

(or be shared by many states)• Brunskill et al. 2008, Leffler et al. 2007,

Jong & Stone 2007

Assumption: Relative Effects

Page 24: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 24

Using Decision Trees for Dynamics Model

Slide modified from Todd Hester

Page 25: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 25

Representing Uncertainty Over Model: Random Forest

Slide modified from Todd Hester

Page 26: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 26

Exploration/Exploitation with Random Forest Model of MDP

Slide modified from Todd Hester

Page 27: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 27

TEXPLORE Planning Using Random Forest of

Models

• Essentially, compute an average model (from random forest)

• Then use that for planning• Some computational advantages too

Equation from Hester & Stone JMLR 2013

Page 28: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 28

MCTS for TEXPLORE Planning

Slide modified from Todd Hester

Page 29: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 29

TEXPLORE: Planning using UCT

Figure from Hester & Stone JMLR 2013

Page 30: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 30

TEXPLORE: Reuse Tree Across Time Steps

Figure from Hester & Stone JMLR 2013

Page 31: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 31

TEXPLORE: UCT + lambda-returns

Figure from Hester & Stone JMLR 2013

Page 32: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 32

TEXPLORE

Slide modified from Todd Hester

Page 33: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 33

Simulations on Car Driving

Slide modified from Todd Hester

Page 34: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 34

Slide modified from Todd Hester

Page 35: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 35

Comparing to Other Approaches

Slide modified from Todd Hester

Page 36: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 36

Slide modified from Todd Hester

Page 37: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 37

Fuel World

Slide modified from Todd Hester

Page 38: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 38

Slide modified from Todd Hester

Page 39: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 39

Slide modified from Todd Hester

Page 40: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 40

Model Accuracy

Slide modified from Todd Hester

Page 41: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 41

Does it work on real car?

Slide modified from Todd Hester

Page 42: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 42

TEXPLORE

• Uses random forests to represent MDP dynamics & rewards

• Uses MCTS for planning

• In domains presented, little explicit exploration needed

Slide modified from Todd Hester

Page 43: Click to edit Master title style Efficient RL Model-based Sampleebrun/15889e/lectures/Sample... · 2015. 11. 4. · Efficient RL Emma Brunskill 15-889e Fall 2015. 2/18/15 2 Sample

2/18/15 43

Summary: Model-based Sample Efficient RL

• What objective is algorithm optimizing?• Today, empirical performance. No formal guarantees

• Using function approximation for the model can greatly speed learning (can exploit structure in dynamics model)

• Exploration / exploitation• Do we need to explicitly explore? • We’ll always explore things that look promising• In results saw today, didn’t need much explicit exploration

• Planning with complex models • Can be computationally prohibitive• Approximate approaches, like MCTS, useful