Transcript
Page 1: Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System

Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System

Akihiko Yamaguchi and Christopher G. Atkeson

Robotics Institute, Carnegie Mellon University
http://akihikoy.net/

Page 2

http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg

http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg

http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg

http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg

http://www.nescafe.com/upload/golden_roast_f_711.png

My pizza demonstration https://youtu.be/Wgj32blPGiE

Page 3

https://youtu.be/GjwfbOur3CQ

Page 4

Pouring: Manipulation of a Deformable Object

Planning actions
Planning parameters of actions
= Dynamic Programming (optimal control, MPC, …)
Dynamics are partially unknown

Reinforcement Learning Problem
RL in pouring

Adaptation: not very hard
Generalization: hard
Are deep neural networks useful in this problem? (How can they be used in an RL framework?)

Page 5

Remarks on Reinforcement Learning
It is useful to contrast model-free RL vs. model-based RL
Successful RL in robot learning is model-free (direct policy search) [cf. Kober et al. 2013]

Good at fine-tuning, lower computation cost (at execution)
Robust to POMDPs
Model-based: Simulation biases

Model-based:
1. Generalization ability
2. Sharable / reusable
3. Adaptable to reward changes

2 and 3: Thanks to symbolic (hierarchical) representation


Figure: FK ANN with input, hidden, and output layers and an update step [Magtanong et al. 2012]

Page 6

How to deal with simulation biases?
Do not learn dx/dt = F(x,u) (dt: small, like xx ms)

Learn (sub)task-level dynamics
Parameters -> F_grasp -> Grasp result
Parameters -> F_flow_ctrl -> Flow control result
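A minimal sketch of (sub)task-level dynamics, assuming each subtask is summarized by one mapping from action parameters (plus the preceding result) to a result. The functions `f_grasp` and `f_flow_ctrl` below are hypothetical hand-written stand-ins for learned models, and the parameter names (`grasp_height`, `tilt`) and all numbers are invented for illustration; none of them come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]

@dataclass
class SubtaskModel:
    """One (sub)task-level model: maps (state, parameters) to a result state."""
    name: str
    predict: Callable[[State, State], State]

def f_grasp(state: State, params: State) -> State:
    # Hypothetical stand-in: grasp quality drops as the height moves away from 0.5.
    quality = max(0.0, 1.0 - abs(params["grasp_height"] - 0.5))
    return {**state, "grasp_quality": quality}

def f_flow_ctrl(state: State, params: State) -> State:
    # Hypothetical stand-in: poured amount depends on tilt and on the grasp result.
    amount = state["grasp_quality"] * min(1.0, params["tilt"] / 2.0)
    return {**state, "poured_amount": amount}

def rollout(models: List[SubtaskModel], params_per_subtask: List[State],
            state: State) -> State:
    """Chain the subtask models: each result feeds the next subtask."""
    for model, params in zip(models, params_per_subtask):
        state = model.predict(state, params)
    return state

chain = [SubtaskModel("grasp", f_grasp), SubtaskModel("flow_ctrl", f_flow_ctrl)]
result = rollout(chain, [{"grasp_height": 0.5}, {"tilt": 1.0}], {})
print(result)
```

Each model is evaluated once per subtask rather than once per small time step, so prediction errors are not compounded over hundreds of integration steps.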

Use stochastic models
Gaussian -> F -> Gaussian

Stochastic Neural Networks [Yamaguchi, Atkeson, ICRA 2016]
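One common way to map a Gaussian input to a Gaussian output is first-order linearization: the output mean is F evaluated at the input mean, and the output covariance is J Σ Jᵀ, where J is the Jacobian of F at the mean. The sketch below assumes this approximation; the cited ICRA 2016 paper may use a different one. `toy_dynamics` is a hypothetical F chosen only for illustration.

```python
import math

def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (rows: outputs, columns: inputs)."""
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        hi = list(x)
        hi[j] += eps
        lo = list(x)
        lo[j] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for i in range(n_out):
            jac[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    return jac

def propagate_gaussian(f, mean, cov):
    """Approximate the output distribution as N(f(mean), J cov J^T)."""
    jac = jacobian(f, mean)
    n_out, n_in = len(jac), len(mean)
    out_cov = [[sum(jac[i][k] * cov[k][l] * jac[j][l]
                    for k in range(n_in) for l in range(n_in))
                for j in range(n_out)]
               for i in range(n_out)]
    return f(mean), out_cov

def toy_dynamics(x):
    # Hypothetical F, used only to exercise the propagation.
    return [2.0 * x[0] + x[1], math.sin(x[1])]

out_mean, out_cov = propagate_gaussian(
    toy_dynamics, [1.0, 0.0], [[0.01, 0.0], [0.0, 0.04]])
print(out_mean, out_cov)
```

The approximation is exact for linear F and degrades as the input covariance grows relative to the curvature of F.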

Use stochastic dynamic programming
Stochastic Differential Dynamic Programming [Yamaguchi, Atkeson, Humanoids 2015]

Model-based RL with Neural Networks for Hierarchical Dynamic System

Page 7

Stochastic Neural Networks

Propagation of probability distribution from input to output
Gradients of output expectation w.r.t. an input
Difficulty: Nonlinear activation functions

ReLU (f(x)=max(0,x))
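For a single ReLU unit with a Gaussian input, the output mean and variance have a standard closed form, and the gradient of the output expectation with respect to the input mean is the Gaussian CDF. The sketch below uses that textbook result for one scalar unit; it is an illustration of why the nonlinearity is tractable here, not necessarily the approximation used in the cited work. The function names are hypothetical.

```python
import math

def relu_gaussian_moments(mu, sigma):
    """Mean and variance of max(0, x) for x ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(0.0, mu), 0.0
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    # Clamp tiny negative values caused by floating-point cancellation.
    var = max(0.0, second_moment - mean * mean)
    return mean, var

def relu_mean_gradient(mu, sigma):
    """d E[max(0, x)] / d mu, which equals the Gaussian CDF at mu / sigma."""
    if sigma <= 0.0:
        return 1.0 if mu > 0.0 else 0.0
    return 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))

mean0, var0 = relu_gaussian_moments(0.0, 1.0)
grad0 = relu_mean_gradient(0.0, 1.0)
print(mean0, var0, grad0)
```

Unlike the ReLU itself, the expected output is smooth in the input mean, so its gradient is defined everywhere when sigma is positive.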


Figure: a mean model and an error model that share the same input
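A mean model paired with an error model on a shared input can be sketched as follows. This is a minimal illustration under two assumptions that are not stated in the slides: the error model is fit to the absolute residuals of the mean model, and a least-squares line stands in for each neural network. The data values are invented.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; a stand-in for training a network."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def fit_mean_and_error(xs, ys):
    """Fit a mean model, then an error model on its absolute residuals."""
    a, b = fit_line(xs, ys)
    residuals = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]
    c, d = fit_line(xs, residuals)

    def mean_model(x):
        return a * x + b

    def error_model(x):
        # Predicted error magnitude, clamped so it is never negative.
        return max(0.0, c * x + d)

    return mean_model, error_model

# Invented data: roughly y = 2x, with noise that grows with x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 2.1, 3.8, 6.3, 7.6, 10.5]
mean_model, error_model = fit_mean_and_error(xs, ys)
print(mean_model(2.0), error_model(0.0), error_model(5.0))
```

Both models take the same input, so a planner can query the predicted outcome and its expected error at any candidate input.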

Page 8

Use Case

Independent neural networks for each (sub)dynamical system

Page 9

Stochastic Differential Dynamic Programming
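The slide gives no detail of the algorithm, so the following is not stochastic DDP itself. It is a much simpler sketch of the underlying idea: propagate a mean and variance through a chain of stage models, then improve the action parameters by gradient ascent on the expected reward. The two stage models, the reward, and all constants are hypothetical, and the gradient is taken by finite differences rather than by the analytic backward pass a DDP method would use.

```python
def stage1(u1):
    # Hypothetical model: result mean and variance; larger |u1| is noisier.
    return 2.0 * u1, 0.01 + 2.0 * u1 * u1

def stage2(x1_mean, x1_var, u2):
    # Hypothetical model, linear in x1, so the variance propagates exactly.
    gain = 0.5
    return gain * x1_mean + u2, gain * gain * x1_var + 0.01

def expected_reward(u, target=1.0):
    """E[-(x2 - target)^2] for x2 ~ N(mean, var) is -(mean - target)^2 - var."""
    m1, v1 = stage1(u[0])
    m2, v2 = stage2(m1, v1, u[1])
    return -((m2 - target) ** 2) - v2

def gradient(fn, u, eps=1e-6):
    """Central-difference gradient of fn at u."""
    grad = []
    for i in range(len(u)):
        hi = list(u)
        hi[i] += eps
        lo = list(u)
        lo[i] -= eps
        grad.append((fn(hi) - fn(lo)) / (2.0 * eps))
    return grad

def optimize(u, steps=500, learning_rate=0.1):
    """Plain gradient ascent on the expected reward."""
    for _ in range(steps):
        g = gradient(expected_reward, u)
        u = [ui + learning_rate * gi for ui, gi in zip(u, g)]
    return u

u_opt = optimize([0.0, 0.0])
print(u_opt, expected_reward(u_opt))
```

In this toy problem many action pairs reach the target on average, and the variance term makes the optimizer prefer the pair that avoids the noisy first-stage action.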


Page 10

Results of Experiments

DNN+DDP was better than LWR+DDP

Using redundant features did not affect the learning performance

Worked in pouring with a PR2 robot

Video: https://youtu.be/aM3hE1J5W98

Page 11

More Information
http://akihikoy.net/
https://www.youtube.com/AkihikoYamaguchi

Akihiko Yamaguchi and Christopher G. Atkeson: Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems, in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA2016), Stockholm, Sweden, May, 2016.
https://www.researchgate.net/publication/294729454

Akihiko Yamaguchi and Christopher G. Atkeson: Differential Dynamic Programming with Temporally Decomposed Dynamics, in Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids2015), pp. 696-703, Seoul, 2015.
https://www.researchgate.net/publication/282157952

Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara: Pouring Skills with Planning and Learning Modeled from Human Demonstrations, International Journal of Humanoid Robotics, Vol.12, No.3, pp.1550030, July, 2015.
https://www.researchgate.net/publication/280733055
