makers of matlab and simulink - deep learning workshop ...€¦ · deep neural network policy...
TRANSCRIPT
![Page 1: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/1.jpg)
1© 2019 The MathWorks, Inc.
Reinforcement Learning:
Leveraging Deep Learning for Controls
Christoph Stockhammer MathWorks Application Engineering
![Page 2: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/2.jpg)
2
Final Deployment
![Page 3: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/3.jpg)
3
Goal: We hope you walk away knowing the answer to these questions
▪ What is reinforcement learning and why should I care about it?
▪ How do I set it up and solve it? [from an engineer’s perspective]
▪ What are some benefits and drawbacks?
![Page 4: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/4.jpg)
4
Let’s try to solve this problem the traditional way
Motor
Control
Leg &
Trunk
Trajectories
Balance
Motor
Commands
Observations
Sensors
Camera
Data
Feature
Extraction
State
Estimation
Plant
Model
+
Control
System
Observations
Motor
Commands
![Page 5: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/5.jpg)
5
What is the alternative approach?
Sensors
Camera
Data
Feature
Extraction
State
Estimation
Plant
Model
+
Control
System
Observations
Motor
Commands
Sensors
Camera
DataBlack Box
Controller
Motor
Commands
![Page 6: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/6.jpg)
6
What is reinforcement learning?
![Page 7: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/7.jpg)
7
Unsupervised
Learning[No Labeled Data]
Clustering
Machine Learning
Reinforcement Learning vs Machine Learning vs Deep Learning
![Page 8: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/8.jpg)
8
Unsupervised
Learning[No Labeled Data]
Supervised Learning
[Labeled Data]
Clustering Classification Regression
Machine Learning
Reinforcement Learning vs Machine Learning vs Deep Learning
![Page 9: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/9.jpg)
9
Unsupervised
Learning[No Labeled Data]
Supervised Learning
[Labeled Data]
Clustering Classification Regression
Machine Learning
Reinforcement Learning vs Machine Learning vs Deep Learning
Deep Learning
Supervised learning typically involves
feature extraction
Deep learning typically simplifies
feature extraction
![Page 10: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/10.jpg)
10
Unsupervised
Learning[No Labeled Data]
Supervised Learning
[Labeled Data]
Clustering Classification Regression
Deep Learning
Machine Learning
Reinforcement
Learning
[Interaction Data]
Decision
MakingControl
Reinforcement Learning vs Machine Learning vs Deep Learning
Reinforcement learning:
▪ Learning through trial & error
[interaction]
▪ Complex problems typically
need deep learning
[Deep Reinforcement
Learning]
▪ It’s about learning a
behavior or accomplishing a
task
![Page 11: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/11.jpg)
11
Reinforcement Learning vs Machine Learning vs Deep Learning
![Page 12: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/12.jpg)
12
Reinforcement Learning vs Machine Learning vs Deep Learning
![Page 13: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/13.jpg)
13
Reinforcement Learning vs Machine Learning vs Deep Learning
![Page 14: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/14.jpg)
14
Reinforcement Learning vs Machine Learning vs Deep Learning
![Page 15: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/15.jpg)
15
A Practical Example of Reinforcement LearningTraining a Self-Driving Car
▪ Vehicle’s computer learns how to drive…
(agent)
▪ using sensor readings from LIDAR, cameras,…
(state)
▪ that represent road conditions, vehicle position,…
(environment)
▪ by generating steering, braking, throttle commands,…
(action)
▪ based on an internal state-to-action mapping…
(policy)
▪ that tries to get you from A to B without an accident while
possibly optimizing driver comfort & fuel efficiency…
(reward).
▪ The policy is updated through repeated trial-and-error by a
reinforcement learning algorithm
AGENT
Reinforcement
Learning
Algorithm
Policy
ENVIRONMENT
ACTION
REWARD
STATE
Policy update
![Page 16: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/16.jpg)
16
A Practical Example of Reinforcement Learning A Trained Self-Driving Car Only Needs A Policy To Operate
▪ Vehicle’s computer uses the final state-to-action mapping…
(policy)
▪ to generate steering, braking, throttle commands,…
(action)
▪ based on sensor readings from LIDAR, cameras,…
(state)
▪ that represent road conditions, vehicle position,…
(environment)
POLICY
ENVIRONMENT
ACTIONSTATE
By definition, this trained policy
puts into practice what is in the
reward function
![Page 17: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/17.jpg)
17
A deep neural network trained using reinforcement learning is a
black-box model that determines the best possible action
Current State
(Image, Radar,
Sensor, etc.)
Previous
Action
(optional)
Next
Action
Deep Neural Network Policy
(captures environment
dynamics…somehow)
By representing policies using deep neural networks, we can solve problems
for complex, non-linear systems (continuous or discrete) by directly using
data that traditional approaches cannot use easily
![Page 18: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/18.jpg)
18
How do I set it up and solve it?
![Page 19: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/19.jpg)
19
Steps in the Reinforcement Learning Workflow
Environment Reward Policy Training Deployment
![Page 20: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/20.jpg)
20
Environment
![Page 21: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/21.jpg)
21
Video of Simulink model
![Page 22: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/22.jpg)
22
Steps in the Reinforcement Learning Workflow
![Page 23: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/23.jpg)
23
S0
S1
S2
S3
S4
T=0 T=2
T=1
T=1
T=1
a1 (5)
a2 (10)
a3 (40)
Reward Function
Long-Term Reward (Q)
▪ Q(S0,a1) = +105 (optimal)
▪ Q(S0,a2) = +60
▪ Q(S0,a3) = +30
b1 (100)
b2 (50)
b3 (-10)
The logic you used right now to
decide you should take action a1
is a policy
![Page 24: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/24.jpg)
24
Policy & Agent
![Page 25: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/25.jpg)
25
![Page 26: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/26.jpg)
26
Reinforcement Learning vs Controls
Control system Reinforcement learning system
PLANTCONTROLLER
REFERENCE
MEASUREMENT
MANIPULATED
VARIABLE
+
-
ERROR
Reinforcement learning has parallels to control system design
Controller Policy
Plant Environment
Measurement Observation
Manipulated variable
Error/Cost function
Adaptation mechanism
Action
Reward
RL Algorithm
![Page 27: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/27.jpg)
27
When would you use Reinforcement Learning?
Reinforcement learning might be a good fit if
▪ An environment model is available (trial & error on hardware can be expensive), and
▪ Training/tuning time is not critical for the application, and
▪ Uncertain environments or nonlinear environments
Controller
Capability
Computational Cost
in Training/Tuning
Computational Cost
in Deployment
PID Low Low Low
Model Pred Control High Low High
Reinforcement Learning Very High High Medium
![Page 28: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/28.jpg)
28
Reinforcement Learning Toolbox
New in R2019a
▪ Built-in and custom algorithms for reinforcement
learning
▪ Environment modeling in MATLAB and Simulink
▪ Deep Learning Toolbox support for designing policies
▪ Training acceleration through GPUs and cloud
resources
▪ Deployment to embedded devices and production
systems
▪ Reference examples for getting started
![Page 29: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/29.jpg)
29
Automotive Applications
▪ Controller Design
▪ Lane Keep Assist
▪ Adaptive Cruise Control
▪ Path Following Control
▪ Trajectory Planning
![Page 30: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/30.jpg)
30
Takeaways
![Page 31: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/31.jpg)
31
Simulation and virtual models are a key aspect of reinforcement
learning
▪ Reinforcement learning needs a lot of data
(sample inefficient)
– Training on hardware can be prohibitively
expensive and dangerous
▪ Virtual models allow you to simulate conditions
hard to emulate in the real world
– This can help develop a more robust
solution
▪ Many of you have already developed MATLAB
and Simulink models that can be reused
![Page 32: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/32.jpg)
32
Resources
▪ Examples for automotive and
autonomous system applications
▪ Documentation written for
engineers and domain experts
▪ Tech Talk video series on
reinforcement learning concepts for
engineers
![Page 33: Makers of MATLAB and Simulink - Deep Learning Workshop ...€¦ · Deep Neural Network Policy (captures environment dynamics ... Environment Reward Policy Training Deployment. 20](https://reader033.vdocuments.site/reader033/viewer/2022050607/5fae280e7a00c81514469317/html5/thumbnails/33.jpg)
33
Thank You!