model-based reinforcement learning with neural networks on hierarchical dynamic system

Post on 25-Jan-2017

  • Model-based Reinforcement Learning with Neural Networks on Hierarchical Dynamic System

    Akihiko Yamaguchi and Christopher G. Atkeson

    Robotics Institute, Carnegie Mellon University

    http://akihikoy.net/

  • http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg

    http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg

    http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg

    http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg

    http://www.nescafe.com/upload/golden_roast_f_711.png

    My pizza demonstration https://youtu.be/Wgj32blPGiE


  • https://youtu.be/GjwfbOur3CQ


  • Pouring: A Manipulation of Deformable Object

    Planning actions and planning parameters of actions = Dynamic Programming (optimal control, MPC, ...)

    Dynamics are partially unknown

    Reinforcement Learning Problem

    RL in pouring

    Adaptation: not very hard

    Generalization: hard

    Is a deep NN useful in this problem? (How to use it in an RL framework?)

  • Remarks on Reinforcement Learning

    It is useful to compare model-free RL vs. model-based RL

    Successful robot-learning RL is model-free (direct policy search) [cf. Kober et al. 2013]

    Model-free: good at fine-tuning, less computation cost (at execution), robust to POMDP

    Model-based: simulation biases

    Model-based:

    1. Generalization ability

    2. Sharable / reusable

    3. Can handle reward changes

    2 and 3: thanks to symbolic (hierarchical) representation

    [Figure: FK ANN with input, hidden, and output layers; labels: u, update] [Magtanong et al. 2012]
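The remark that a model-based learner can handle reward changes can be illustrated with a toy planner. This is only a sketch under stated assumptions: `dynamics`, `plan`, the candidate action grid, and both reward functions are hypothetical and are not taken from the talk. The point it shows is that the same learned model is reused when the reward changes, and only the planning step is repeated.

```python
def dynamics(x, u):
    """Hypothetical task-level model: predicted poured amount after action u."""
    return x + 0.5 * u


def plan(x0, reward, candidates):
    """Pick the candidate action whose predicted outcome has the highest reward."""
    return max(candidates, key=lambda u: reward(dynamics(x0, u)))


# Hypothetical discrete action grid: 0.0, 0.1, ..., 2.0
candidates = [i / 10 for i in range(21)]


def reward_a(x):
    """Original task: target amount 0.3 (assumed value)."""
    return -(x - 0.3) ** 2


def reward_b(x):
    """Changed task: target amount 0.8 (assumed value); the model is untouched."""
    return -(x - 0.8) ** 2


u_a = plan(0.0, reward_a, candidates)
u_b = plan(0.0, reward_b, candidates)
print(u_a, u_b)
```

A model-free policy trained for `reward_a` would have to be retrained for `reward_b`; here only `plan` is called again with the new reward.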

  • How to deal with simulation biases?

    Do not learn dx/dt = F(x,u) (dt: small, like xx ms)

    Learn (sub)task-level dynamics:

        Parameters → F_grasp → Grasp result

        Parameters → F_flow_ctrl → Flow ctrl result

    Use stochastic models: Gaussian → F → Gaussian

        Stochastic Neural Networks [Yamaguchi, Atkeson, ICRA 2016]

    Use stochastic dynamic programming:

        Stochastic Differential Dynamic Programming [Yamaguchi, Atkeson, Humanoids 2015]

    https://www.researchgate.net/publication/294729454

    https://www.researchgate.net/publication/282157952
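The idea of passing a Gaussian through chained (sub)task-level models can be sketched in one dimension. This is a minimal illustration, not the authors' method: it uses first-order linearization with a finite-difference slope instead of a stochastic neural network, and `f_grasp`, `f_flow_ctrl`, and the `model_noise` values are hypothetical stand-ins for learned models and their error estimates.

```python
import math


def propagate(f, mean, var, eps=1e-5):
    """First-order propagation of a 1-D Gaussian N(mean, var) through f.

    The output variance is slope^2 * var, with the slope estimated by
    central finite differences around the input mean.
    """
    slope = (f(mean + eps) - f(mean - eps)) / (2 * eps)
    return f(mean), slope * slope * var


def f_grasp(p):
    """Hypothetical subtask model: grasp parameter -> grasp result."""
    return math.tanh(p)


def f_flow_ctrl(g):
    """Hypothetical subtask model: grasp result -> flow control result."""
    return 0.8 * g + 0.1


def chain(p_mean, p_var, model_noise=(0.01, 0.02)):
    """Propagate a Gaussian through both subtask models in sequence.

    model_noise holds assumed per-model prediction variances, which are
    added after each stage.
    """
    m, v = propagate(f_grasp, p_mean, p_var)
    v += model_noise[0]
    m, v = propagate(f_flow_ctrl, m, v)
    v += model_noise[1]
    return m, v


mean, var = chain(0.5, 0.04)
print(mean, var)
```

Because each stage returns a mean and a variance, a planner can score a parameter choice by its expected outcome and its uncertainty, rather than by a point prediction alone.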

  • Stochastic Neural Networks

    Propagation of probability distribution from input to output

    Gradients of output expectation w.r.t. an input

    Difficulty: Nonlinear activation functions

    ReLU (f(x)=max(0,x))

    [Figure labels: Mean model; Error model; Input (shared)]
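For a single ReLU unit with a Gaussian input, the output mean, output variance, and the gradient of the output expectation with respect to the input mean have closed forms. The sketch below uses these standard rectified-Gaussian formulas; it may differ from the approximation used in the cited ICRA 2016 paper, and the function name `relu_gaussian_moments` is an assumption introduced here.

```python
import math


def relu_gaussian_moments(mu, sigma):
    """Moments of max(0, x) for x ~ N(mu, sigma^2).

    Returns (mean, variance, d mean / d mu).
    """
    if sigma <= 0.0:
        # Deterministic input: ReLU applies directly.
        return max(0.0, mu), 0.0, 1.0 if mu > 0.0 else 0.0
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    var = max(second_moment - mean * mean, 0.0)  # clamp tiny negative round-off
    return mean, var, cdf  # the derivative of the mean w.r.t. mu equals the CDF


m, v, g = relu_gaussian_moments(0.0, 1.0)
print(m, v, g)
```

Applying such a rule unit by unit is one way to push a distribution through a network layer, although it ignores correlations between units, which a full treatment would have to handle.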

  • Use Case

    Independent neural networks for each (sub)dynamical system

  • Stochastic Differential Dynamic Programming


  • Results of Experiments

    DNN+DDP was better than LWR+DDP

    Using redundant features did not affect the learning performance

    Worked in pouring with a PR2 robot

    Video: https://youtu.be/aM3hE1J5W98

  • More Information

    http://akihikoy.net/

    https://www.youtube.com/AkihikoYamaguchi

    Akihiko Yamaguchi and Christopher G. Atkeson: Neural Networks and Differential Dynamic Programming for Reinforcement Learning Problems, in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA2016), Stockholm, Sweden, May, 2016. https://www.researchgate.net/publication/294729454

    Akihiko Yamaguchi and Christopher G. Atkeson: Differential Dynamic Programming with Temporally Decomposed Dynamics, in Proceedings of the 15th IEEE-RAS International Conference on Humanoid Robots (Humanoids2015), pp. 696-703, Seoul, 2015. https://www.researchgate.net/publication/282157952

    Akihiko Yamaguchi, Christopher G. Atkeson, and Tsukasa Ogasawara: Pouring Skills with Planning and Learning Modeled from Human Demonstrations, International Journal of Humanoid Robotics, Vol.12, No.3, pp.1550030, July, 2015. https://www.researchgate.net/publication/280733055
