![Page 1: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/1.jpg)
Deep Reinforcement Learning, Decision Making, and Control
CS 285
![Page 2: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/2.jpg)
Course logistics
![Page 3: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/3.jpg)
Class Information & Resources
• Course website: http://rail.eecs.berkeley.edu/deeprlcourse
• Piazza: UC Berkeley, CS285
• Gradescope: UC Berkeley, CS285
• Subreddit (for non-enrolled students): www.reddit.com/r/berkeleydeeprlcourse/
• Office hours: check course website (mine are after class on Wed)
Sergey LevineInstructor
Frederik EbertHead GSI
Anusha NagabandiGSI
Avi SinghGSI
Kelvin XuGSI
![Page 4: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/4.jpg)
Prerequisites & Enrollment
• All enrolled students must have taken CS189, CS289, CS281A, or an equivalent course at your home institution• Please contact Sergey Levine if you haven’t
• If you are not eligible to enroll directly into the class, fill out the enrollment application form: http://rail.eecs.berkeley.edu/deeprlcourse/• We will enroll subject to availability based on responses to this form
• We will not use the official CalCentral wait list!
• Fill out an application before Friday!
• Lectures are recorded and live streamed (link on course website)
![Page 5: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/5.jpg)
What you should know
• Assignments will require training neural networks with standard automatic differentiation packages (TensorFlow by default)
• Review Section• Kelvin Xu will review TensorFlow on Mon in week 3 (Sep 9)
• You should be able to understand the overview here: https://www.tensorflow.org/guide/low_level_intro• If not, make sure to attend Kelvin’s lecture and ask questions!
![Page 6: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/6.jpg)
What we’ll cover
• Material will be similar to previous year: http://rail.eecs.berkeley.edu/deeprlcourse-fa18/
1. From supervised learning to decision making
2. Model-free algorithms: Q-learning, policy gradients, actor-critic
3. Advanced model learning and prediction
4. Transfer and multi-task learning, meta-learning
5. Exploration
6. Open problems, research talks, invited lectures
![Page 7: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/7.jpg)
![Page 8: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/8.jpg)
Assignments
1. Homework 1: Imitation learning (control via supervised learning)
2. Homework 2: Policy gradients (“REINFORCE”)
3. Homework 3: Q learning and actor-critic algorithms
4. Homework 4: Model-based reinforcement learning
5. Homework 5: Advanced model-free RL algorithms
6. Final project: Research-level project of your choice (form a group of up to 2-3 students, you’re welcome to start early!)
Grading: 50% homework (10% each), 50% project
5 late days total
![Page 9: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/9.jpg)
Your “Homework” Today
1. Sign up for Piazza (UC Berkeley CS285)
2. Start forming your final project groups, unless you want to work alone, which is fine
3. Review this: https://www.tensorflow.org/guide/low_level_intro
![Page 10: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/10.jpg)
What is reinforcement learning, and why should we care?
![Page 11: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/11.jpg)
How do we build intelligent machines?
![Page 12: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/12.jpg)
Intelligent machines must be able to adapt
![Page 13: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/13.jpg)
Deep learning helps us handle unstructured environments
![Page 14: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/14.jpg)
Reinforcement learning provides a formalism for behavior
decisions (actions)
consequencesobservationsrewards
Mnih et al. ‘13Schulman et al. ’14 & ‘15
Levine*, Finn*, et al. ‘16
![Page 15: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/15.jpg)
What is deep RL, and why should we care?
standardcomputer
vision
features(e.g. HOG)
mid-level features(e.g. DPM)
classifier(e.g. SVM)
deeplearning
Felzenszwalb ‘08
end-to-end training
standardreinforcement
learning
features more features linear policyor value func.
deepreinforcement
learning
end-to-end training
? ? action
action
![Page 16: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/16.jpg)
What does end-to-end learning mean for sequential decision making?
![Page 17: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/17.jpg)
Action(run away)
perception
action
![Page 18: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/18.jpg)
Action(run away)
sensorimotor loop
![Page 19: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/19.jpg)
Example: robotics
roboticcontrolpipeline
observationsstate
estimation(e.g. vision)
modeling & prediction
planninglow-level control
controls
![Page 20: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/20.jpg)
tiny, highly specialized “visual cortex”
tiny, highly specialized “motor cortex”
![Page 21: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/21.jpg)
The reinforcement learning problem is the AI problem!
decisions (actions)
consequencesobservationsrewards
Actions: muscle contractionsObservations: sight, smellRewards: food
Actions: motor current or torqueObservations: camera imagesRewards: task success measure (e.g., running speed)
Actions: what to purchaseObservations: inventory levelsRewards: profit
Deep models are what allow reinforcement learning algorithms to solve complex problems end to end!
![Page 22: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/22.jpg)
Complex physical tasks…
Rajeswaran, et al. 2018
![Page 23: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/23.jpg)
Unexpected solutions…
Mnih, et al. 2015
![Page 24: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/24.jpg)
In the real world…
Kalashnikov et al. ‘18
![Page 25: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/25.jpg)
In the real world…
Kalashnikov et al. ‘18
![Page 26: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/26.jpg)
Not just games and robots!
Cathy Wu
![Page 27: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/27.jpg)
Why should we study this now?
1. Advances in deep learning
2. Advances in reinforcement learning
3. Advances in computational capability
![Page 28: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/28.jpg)
Why should we study this now?
L.-J. Lin, “Reinforcement learning for robots using neural networks.” 1993
Tesauro, 1995
![Page 29: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/29.jpg)
Why should we study this now?
Atari games:Q-learning:V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. “Playing Atari with Deep Reinforcement Learning”. (2013).
Policy gradients:J. Schulman, S. Levine, P. Moritz, M. I. Jordan, and P. Abbeel. “Trust Region Policy Optimization”. (2015).V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, et al. “Asynchronous methods for deep reinforcementlearning”. (2016).
Real-world robots:Guided policy search:S. Levine*, C. Finn*, T. Darrell, P. Abbeel. “End-to-end training of deep visuomotor policies”. (2015).
Q-learning:D. Kalashnikov et al. “QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation”. (2018).
Beating Go champions:Supervised learning + policy gradients + value functions + Monte Carlo tree search:D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, et al. “Mastering the game of Go with deep neural networks and tree search”. Nature (2016).
![Page 30: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/30.jpg)
What other problems do we need to solve to enable real-world sequential decision making?
![Page 31: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/31.jpg)
Beyond learning from reward
• Basic reinforcement learning deals with maximizing rewards
• This is not the only problem that matters for sequential decision making!
• We will cover more advanced topics• Learning reward functions from example (inverse reinforcement learning)
• Transferring knowledge between domains (transfer learning, meta-learning)
• Learning to predict and using prediction to act
![Page 32: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/32.jpg)
Where do rewards come from?
![Page 33: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/33.jpg)
Are there other forms of supervision?
• Learning from demonstrations• Directly copying observed behavior
• Inferring rewards from observed behavior (inverse reinforcement learning)
• Learning from observing the world• Learning to predict
• Unsupervised learning
• Learning from other tasks• Transfer learning
• Meta-learning: learning to learn
![Page 34: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/34.jpg)
Imitation learning
Bojarski et al. 2016
![Page 35: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/35.jpg)
More than imitation: inferring intentions
Warneken & Tomasello
![Page 36: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/36.jpg)
Inverse RL examples
Finn et al. 2016
![Page 37: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/37.jpg)
Prediction
![Page 38: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/38.jpg)
Ebert et al. 2017
Prediction for real-world control
![Page 39: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/39.jpg)
Xie et al. 2019
Using tools with predictive models
![Page 40: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/40.jpg)
Playing games with predictive models
Kaiser et al. 2019
realpredicted
But sometimes there are issues…
![Page 41: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/41.jpg)
How do we build intelligent machines?
![Page 42: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/42.jpg)
How do we build intelligent machines?
• Imagine you have to build an intelligent machine, where do you start?
![Page 43: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/43.jpg)
Learning as the basis of intelligence
• Some things we can all do (e.g. walking)
• Some things we can only learn (e.g. driving a car)
• We can learn a huge variety of things, including very difficult things
• Therefore our learning mechanism(s) are likely powerful enough to do everything we associate with intelligence• But it may still be very convenient to “hard-code” a few really important bits
![Page 44: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/44.jpg)
A single algorithm?
[BrainPort; Martinez et al; Roe et al.]
Seeing with your tongue
Auditory
Cortex
adapted from A. Ng
• An algorithm for each “module”?
• Or a single flexible algorithm?
![Page 45: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/45.jpg)
What must that single algorithm do?
• Interpret rich sensory inputs
• Choose complex actions
![Page 46: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/46.jpg)
Why deep reinforcement learning?
• Deep = can process complex sensory input▪ …and also compute really complex functions
• Reinforcement learning = can choose complex actions
![Page 47: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/47.jpg)
Some evidence in favor of deep learning
![Page 48: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/48.jpg)
Some evidence for reinforcement learning
• Percepts that anticipate reward become associated with similar firing patterns as the reward itself
• Basal ganglia appears to be related to reward system
• Model-free RL-like adaptation is often a good fit for experimental data of animal adaptation• But not always…
![Page 49: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/49.jpg)
What can deep learning & RL do well now?
• Acquire high degree of proficiency in domains governed by simple, known rules
• Learn simple skills with raw sensory inputs, given enough experience
• Learn from imitating enough human-provided expert behavior
![Page 50: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/50.jpg)
What has proven challenging so far?
• Humans can learn incredibly quickly• Deep RL methods are usually slow
• Humans can reuse past knowledge• Transfer learning in deep RL is an open problem
• Not clear what the reward function should be
• Not clear what the role of prediction should be
![Page 51: Deep Reinforcement Learning CS 294 - rail.eecs.berkeley.edurail.eecs.berkeley.edu/deeprlcourse/static/slides/lec-1.pdf · Prerequisites & Enrollment •All enrolled students must](https://reader036.vdocuments.site/reader036/viewer/2022081606/5e013b8a153ac91b66572c22/html5/thumbnails/51.jpg)
Instead of trying to produce aprogram to simulate the adultmind, why not rather try toproduce one which simulates thechild's? If this were then subjectedto an appropriate course ofeducation one would obtain theadult brain.
- Alan Turing
general learning algorithm
environment
ob
serv
atio
ns
acti
on
s