Machine Learning for Robot Control
Robotics: Science and Systems
Zhibin (Alex) Li, School of Informatics, University of Edinburgh


Page 1:

Machine Learning for Robot Control

Robotics: Science and Systems

Zhibin (Alex) Li, School of Informatics

University of Edinburgh

Page 2:

Content


● Why learning?

● Common learning paradigms

● Deep learning for robot control

● A brief summary

* covers only the machine learning cases for robot control

Page 3:

Why learning?


Page 4:

What was missing in the DRC (DARPA Robotics Challenge)?


1. Operators manually placed footsteps for the robots, which introduced human error

2. No use of hands during locomotion

3. No reaction while falling, no autonomous behaviors

4. …

© DARPAtv

Page 5:

Lesson learned from the DRC: robots need to be more autonomous

How to fill the gap?


[Diagram: high-level human supervision and low-level control for task execution are linked only by limited communication; machine learning fills the gap between them.]

Page 6:

Why learning? It is suitable for handling cases that are:

1. stochastic, with uncertainty;
2. a system or process that is hard to model, strongly nonlinear, or state-dependent (time-varying);
3. multiple scenarios (difficult to enumerate), with multi-modality;
4. a high-dimensional action space (multiple actions can lead towards the same goal, so the future must be predicted).


Page 7:

Why learning? Advantages:

1. it automates the process of designing policies, freeing people from tedious coding routines;
2. it can explore new solutions, uncover our blind spots, and create more inventions;
3. it can potentially facilitate the progress of science and technology, if used appropriately.


Page 8:

Common learning paradigms

● Supervised learning: we know how to classify the data (it is labeled), and we train a neural network to learn that classification.

● Unsupervised learning: we do not know how to classify the data (it is unlabeled), and we train a neural network to discover the underlying principles by which to classify it.

● Reinforcement learning: we do not know how to classify the data, but a reward is given if the end result is good, or a penalty if the result is bad (e.g. an Atari video game); a minimal sketch of this loop follows below.
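To make the reinforcement-learning setting concrete, here is a minimal Python sketch of the agent-environment loop. The toy environment, its dynamics, and every name in it are illustrative assumptions, not from the slides; the point is only that the learning signal is a scalar reward rather than a label.

```python
# Toy RL interaction loop (all names and dynamics are assumptions):
# the agent never sees a labeled "correct" action, only a scalar reward.
import random

class ToyBalanceEnv:
    """A point on a line; the agent tries to keep it near the origin."""
    def reset(self):
        self.x = random.uniform(-1.0, 1.0)
        return self.x

    def step(self, action):
        self.x += action              # apply the chosen nudge
        reward = -abs(self.x)         # closer to the origin => higher reward
        done = abs(self.x) > 2.0      # drifted too far: episode over
        return self.x, reward, done

env = ToyBalanceEnv()
state = env.reset()
for t in range(100):
    action = random.choice([-0.1, 0.0, 0.1])  # a trained policy would act here
    state, reward, done = env.step(action)
    if done:
        state = env.reset()
```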


Page 9:

Deep learning for robot control

Going beyond video games: a case study of using deep learning for robot balancing and locomotion control.

The example uses Deep Deterministic Policy Gradient (DDPG), a model-free, actor-critic reinforcement learning algorithm.

Chuanyu Yang, Taku Komura, Zhibin Li, "Emergence of Human-Comparable Balancing Behaviors by Deep Reinforcement Learning," in IEEE-RAS International Conference on Humanoid Robots, 2017.

Page 10:

Deep learning

DDPG consists of two networks:

1. the actor network learns the policy that generates the optimal action;
2. the critic network evaluates the performance of the policy learnt by the actor network.

[Figure: actor network and critic network.]
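A minimal sketch of these two networks in PyTorch. This is an illustration under assumptions: the slide does not give the actual architectures, so the layer sizes and names here are invented for clarity.

```python
# Sketch of DDPG's actor and critic networks (sizes are assumptions).
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state observation to a deterministic action in [-1, 1]."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded action output
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q(s, a)
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

The Tanh on the actor's output keeps actions bounded, which suits joint torque or position commands with known limits.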

Page 11:

Deep Deterministic Policy Gradient (DDPG)

[Diagram: the actor network maps the state input to an action output; the critic network takes the state input and the actor's action as inputs and outputs a Q-value.]
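The diagram corresponds to the standard DDPG updates (Lillicrap et al., 2015). The equations are not shown in the transcript, so they are restated here from the published algorithm, with θ^Q and θ^μ the critic and actor parameters and primed symbols the slowly-updated target networks:

```latex
% Critic target from the target networks Q' and \mu':
y_i = r_i + \gamma \, Q'\!\big(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big)

% Critic loss over a minibatch of N transitions:
L = \frac{1}{N} \sum_i \big( y_i - Q(s_i, a_i \mid \theta^{Q}) \big)^2

% Deterministic policy gradient for the actor:
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_i
  \nabla_a Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)}
  \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s_i}
```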

Page 12:

Learn an optimal policy

[Figure: actor network and critic network.]

Page 13:

Deep learning (DL) for push recovery


Page 14:

DL for rough terrain locomotion

Next steps:

● from balancing to walking, stepping, running, jumping, and leaping;
● using perception to coordinate control strategies.

Add a recurrent neural network: mix DDPG with Recurrent Deterministic Policy Gradients (RDPG), as sketched below.
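A minimal sketch of what the recurrent addition means for the actor, again in PyTorch with illustrative (assumed) sizes: the LSTM hidden state is carried across timesteps, so the policy can integrate a history of observations when the terrain is only partially observable.

```python
# Sketch of a recurrent actor for RDPG (sizes and names are assumptions).
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, time, obs_dim); hidden_state carries memory
        # between calls, which is what gives the policy its "recurrence".
        out, hidden_state = self.lstm(obs_seq, hidden_state)
        return self.head(out), hidden_state
```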


Page 15:

DL for rough terrain locomotion

Feedback: body orientation, horizontal and vertical speed, joint positions and velocities, foot-ground contact.
Perception: a partially observable height map from 10 lidar scans.

Doo Re Song, Chuanyu Yang, Christopher McGreavy, Zhibin Li, "Recurrent Network-based Deterministic Policy Gradient for Solving Bipedal Walking Challenge on Rugged Terrains," submitted to IEEE International Conference on Robotics and Automation, 2018.
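For illustration only (the slide does not give the exact layout, so the dimensions and names below are assumptions), the observation vector described above might be assembled like this:

```python
# Sketch of assembling the observation vector (layout is an assumption).
import numpy as np

def build_observation(body_orientation,   # e.g. roll/pitch/yaw, shape (3,)
                      horizontal_speed,   # scalar
                      vertical_speed,     # scalar
                      joint_positions,    # shape (n_joints,)
                      joint_velocities,   # shape (n_joints,)
                      foot_contacts,      # binary contact flags, shape (2,)
                      lidar_heights):     # height map from 10 lidar scans, (10,)
    """Concatenate all feedback and perception terms into one flat vector."""
    return np.concatenate([
        np.asarray(body_orientation, dtype=np.float32),
        np.asarray([horizontal_speed, vertical_speed], dtype=np.float32),
        np.asarray(joint_positions, dtype=np.float32),
        np.asarray(joint_velocities, dtype=np.float32),
        np.asarray(foot_contacts, dtype=np.float32),
        np.asarray(lidar_heights, dtype=np.float32),
    ])
```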

Page 16:

Learning optimal policies

[Figure: actor network and critic network.]

Page 17:

DL locomotion: obstacles


Page 18:

DL locomotion: stairs


Page 19:

DL locomotion: pitfalls


Page 20:

Quiz

Previously, in the inverse kinematics (IK) problem, we had an objective function for optimization. [Equation from the slide not captured in the transcript.]

So what should the reward be, if we would like to use reinforcement learning to solve the IK problem?


Page 21:

Quiz

Previously, in the inverse kinematics (IK) problem, we had an objective function for optimization. [The slide's equations are not captured in the transcript; a reconstruction follows.]
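As a plausible reconstruction only, assuming the earlier lecture's IK objective was a least-squares end-effector error for joint angles q and forward kinematics f:

```latex
% Assumed IK objective from the earlier lecture (not in the transcript):
\min_{q} \; \big\| x_{\text{target}} - f(q) \big\|^{2}

% A natural RL reward is its negative, so that maximizing the expected
% reward minimizes the end-effector error:
r = -\big\| x_{\text{target}} - f(q) \big\|^{2}
```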


Page 22:

Is learning everything? A brief summary


Page 23:

A right attitude towards machine learning

What would a future superintelligence think about today's learning tools and math- or model-based tools?


● Mathematical tools, physics laws, and so on were created by the top intellectuals in human history. From a superintelligence's point of view, it would be unwise not to use these physical laws, and to use a weak AI instead to learn from scratch in a very primitive way.

Page 24:

A right attitude towards machine learning

● Learning from scratch can be acceptable for a non-physical learning process, e.g. computer graphics or playing Go inside a computer: whether it takes 30 iterations or 30,000 or 30 million makes no big difference, if time is not a problem.

● However, for real physical systems, e.g. robots, it can be hard, because robots do break before learning is done. Even we, resilient biological systems, can be injured before reaching successful performance.


Right tool for the right thing! Machine learning is complementary to the other control tools we have learned.