introduction to deep reinforcement learning · pdf fileintroduction to deep reinforcement...
TRANSCRIPT
![Page 1: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/1.jpg)
Introduction to Deep Reinforcement Learning
Shenglin Zhao Department of Computer Science & Engineering
The Chinese University of Hong Kong
![Page 2: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/2.jpg)
Outline
• Background • Deep Learning • Reinforcement Learning • Deep Reinforcement Learning • Conclusion
![Page 3: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/3.jpg)
Outline
• Background • Deep Learning • Reinforcement Learning • Deep Reinforcement Learning • Conclusion
![Page 4: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/4.jpg)
Milestone Issues
• NIPS 2013, DeepMind, Playing Atari with Deep Reinforcement Learning, https://arxiv.org/abs/1312.5602
• Nature cover paper 2015, DeepMind, Human-level control through deep reinforcement learning, www.nature.com/articles/nature14236
• Nature cover paper 2016, DeepMind, Mastering the game of Go with deep neural networks and tree search, www.nature.com/articles/nature16961
![Page 5: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/5.jpg)
Reinforcement Learning in a nutshell
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 6: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/6.jpg)
Deep Learning in a nutshell
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 7: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/7.jpg)
Deep Reinforcement Learning: AI = RL + DL
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 8: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/8.jpg)
Examples of Deep RL@DeepMind
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 9: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/9.jpg)
Outline
• Background • Deep Learning • Reinforcement Learning • Deep Reinforcement Learning • Conclusion
![Page 10: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/10.jpg)
Deep Learning = Learning Representations/Features
• Traditional Model
• Deep Learning
http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
![Page 11: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/11.jpg)
Deep Representations
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 12: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/12.jpg)
Deep Neural Network
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 13: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/13.jpg)
Example-CNN
http://xrds.acm.org/blog/2016/06/convolutional-neural-networks-cnns-illustrated-explanation/
Convolutional Operator
Discrete Form
Matrix Element-wise Multiplication
1 2 0
0 1 0
0 0 1
2 2 7
5 0 7
1 2 1 = 7
![Page 14: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/14.jpg)
Example-CNN
https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
Pixel representation of filter Visualization
![Page 15: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/15.jpg)
Example-CNN
https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
Original image What we find
![Page 16: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/16.jpg)
Example-CNN
http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
Low-Level Feature
Mid-Level Feature
High-Level Feature
Trainable Classifier
![Page 17: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/17.jpg)
Outline
• Background • Deep Learning • Reinforcement Learning • Deep Reinforcement Learning • Conclusion
![Page 18: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/18.jpg)
Reinforcement Learning
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 19: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/19.jpg)
Agent and Environment
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
Agent
Environment
![Page 20: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/20.jpg)
State
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 21: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/21.jpg)
Major Components
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 22: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/22.jpg)
Policy
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 23: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/23.jpg)
Value Function
Bellman equation
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 24: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/24.jpg)
Optimal Case
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 25: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/25.jpg)
Approaches to Reinforcement Learning
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
![Page 26: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/26.jpg)
RL Example • Assumption
• Suppose we have 5 rooms in a building connected by doors • The outside of the building can be thought of as one big room (5)
• Target • Put an agent in any room, and from that room, go outside the building
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 27: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/27.jpg)
RL Example
![Page 28: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/28.jpg)
Q-learning for the RL Problem
• Assuming rewards for each step, the goal is to reach the state with the highest reward. • Terms— state: room, action: move decision, reward: 0 or 100
RL Problem Markov Decision Process
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 29: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/29.jpg)
Q-learning for the RL Problem
Reward Table Q-value Table
![Page 30: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/30.jpg)
Q-learning for the RL Problem
• Q-table is the brain of our agent, representing the memory of what the agent has learned through experience. • The agent starts out knowing nothing, the matrix Q is initialized to
zero. • Simple transition rule of Q learning,
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 31: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/31.jpg)
Q-learning for the RL Problem
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 32: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/32.jpg)
Q-learning for the RL Problem
• Example • Initial state: room 1, action: move to 5
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 33: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/33.jpg)
Q-learning for the RL Problem
• Example • Initial state: room 3, action: move to 1
Q(3, 1) = 0 + 0.8 * 100 = 80
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 34: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/34.jpg)
Q-learning for the RL Problem
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
![Page 35: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/35.jpg)
Outline
• Background • Deep Learning • Reinforcement Learning • Deep Reinforcement Learning • Conclusions
![Page 36: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/36.jpg)
Deep Reinforcement Learning
http://videolectures.net/rldm2015_silver_reinforcement_learning/
Deep Learning
Reinforcement Learning
Deep Reinforcement
Learning
![Page 37: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/37.jpg)
Deep Q-learning
Deep Q-Network (DQN)
![Page 38: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/38.jpg)
DQN Architecture
Naive formulation of deep Q-network DQN in DeepMind paper https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
![Page 39: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/39.jpg)
DQN Architecture
https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
![Page 40: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/40.jpg)
DQN
Loss function Q-table update algorithm
https://www.nervanasys.com/demystifying-deep-reinforcement-learning/
![Page 41: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/41.jpg)
Exploration-Exploitation
![Page 42: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/42.jpg)
Value Iteration
![Page 43: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/43.jpg)
Policy Iteration
![Page 44: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/44.jpg)
Stability Issues with Deep RL
http://videolectures.net/rldm2015_silver_reinforcement_learning/
![Page 45: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/45.jpg)
DQN
http://videolectures.net/rldm2015_silver_reinforcement_learning/
![Page 46: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/46.jpg)
Experience Replay
http://videolectures.net/rldm2015_silver_reinforcement_learning/
![Page 47: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/47.jpg)
Fixed Target Q-Network
http://videolectures.net/rldm2015_silver_reinforcement_learning/
![Page 48: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/48.jpg)
Reward/Value Range
http://videolectures.net/rldm2015_silver_reinforcement_learning/
![Page 49: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/49.jpg)
DQN Results in Atari
![Page 50: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/50.jpg)
DQN Atari Demo
![Page 51: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/51.jpg)
Conclusion
![Page 52: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/52.jpg)
Demo
![Page 53: Introduction to Deep Reinforcement Learning · PDF fileIntroduction to Deep Reinforcement Learning Shenglin Zhao](https://reader033.vdocuments.site/reader033/viewer/2022051105/5aa7b1a57f8b9ac5648c76ef/html5/thumbnails/53.jpg)
References
• http://karpathy.github.io/2016/05/31/rl/ • https://gym.openai.com/docs/rl • https://www.nervanasys.com/demystifying-deep-reinforcement-
learning/ • http://mnemstudio.org/path-finding-q-learning-tutorial.htm • http://videolectures.net/rldm2015_silver_reinforcement_learning/ • http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf