Deep Reinforcement Learning for Robotics
Pieter Abbeel, UC Berkeley EECS
Object Detection in Computer Vision
State-of-the-art object detection until 2012:
Input Image -> Hand-engineered features (SIFT, HOG, DAISY, …) -> Support Vector Machine (SVM) -> “cat” “dog” “car” …
Deep Supervised Learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):
Input Image -> 8-layer neural network with 60 million parameters to learn -> “cat” “dog” “car” …
60 million learned parameters (since then, billions of parameters)
~1.2 million training images
Performance
graph credit Matt Zeiler, Clarifai
AlexNet
Speech Recognition
graph credit Matt Zeiler, Clarifai
History
Is deep learning 3, 30, or 60 years old?
2000s Sparse, Probabilistic, and Energy models (Hinton, Bengio, LeCun, Ng)
Rosenblatt’s Perceptron
(Olshausen, 1996)
based on history by K. Cho
Data
1.2M training examples
× 2048 (shifts)
× 90 (PCA re-coloring)
1.2M × 2k × 90 ≈ 0.216 trillion
Human eye: at 1k frames/s, seeing that many frames would take ~6.84 years
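The arithmetic on the Data slide can be checked with a short calculation. This is only a sketch: the variable and function names are made up for illustration, and the 365.25-day year is an assumption (the slide does not say which year length it used).

```python
# Back-of-the-envelope check of the "Data" slide figures.
# All names here are illustrative; the year length is an assumption.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

BASE_IMAGES = 1.2e6        # training examples (from the slide)
RECOLORINGS = 90           # PCA re-colorings per shifted image (from the slide)
FRAMES_PER_SECOND = 1000   # the slide's "human eye" rate


def augmented_count(n_shifts):
    """Number of distinct training inputs after augmentation."""
    return BASE_IMAGES * n_shifts * RECOLORINGS


def years_of_viewing(n_images):
    """Years needed to see n_images at FRAMES_PER_SECOND."""
    return n_images / FRAMES_PER_SECOND / SECONDS_PER_YEAR


rounded_total = augmented_count(2000)   # the slide's rounded "2k" shifts
exact_total = augmented_count(2048)     # the unrounded shift count

print(f"rounded: {rounded_total:.3e} images, {years_of_viewing(rounded_total):.2f} years")
print(f"exact:   {exact_total:.3e} images, {years_of_viewing(exact_total):.2f} years")
```

With the rounded factor of 2k the result matches the slide's figure; using 2048 shifts instead gives a slightly larger total of roughly 7 years.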
Compute power
Two NVIDIA GTX 580 GPUs
5-6 days of training time
What’s Changed
Nonlinearity
Sigmoid
ReLU
Regularization
Drop-out
(Training data augmentation)
Exploration of model structure
Optimization know-how
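Two of the items above, the sigmoid-to-ReLU change and drop-out, are easy to illustrate in a few lines of NumPy. This is a minimal sketch, not the code from the cited work: the function names are illustrative, and the dropout shown is the "inverted" variant (rescaling at training time), which is an assumption about implementation style rather than something the slide specifies.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; never exceeds 0.25 and vanishes for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return np.maximum(0.0, x)


def relu_grad(x):
    # Derivative of ReLU; exactly 1 for every positive input.
    return (x > 0).astype(float)


def dropout(activations, keep_prob, rng):
    """Inverted dropout (assumed variant): zero units at random and
    rescale the survivors by 1/keep_prob so the expected value is unchanged."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


x = np.array([-6.0, -1.0, 0.5, 6.0])
print("sigmoid gradient:", sigmoid_grad(x))
print("ReLU gradient:   ", relu_grad(x))

rng = np.random.default_rng(0)
print("ReLU + dropout:  ", dropout(relu(x), keep_prob=0.5, rng=rng))
```

The sigmoid gradient shrinks toward zero for large positive or negative inputs, while the ReLU gradient stays at 1 for any positive input, which is one commonly given reason deep networks train more easily with ReLU.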
Robotics
Current state-of-the-art robotics:
Percepts -> Hand-engineered state-estimation -> Hand-engineered control policy class, hand-tuned (or learned) 10’ish free parameters -> Motor commands
Deep reinforcement learning:
Percepts -> Many-layer neural network with many parameters to learn -> Motor commands
Reinforcement Learning (RL)
Robotics
Marketing / Advertising
Dialogue
Optimizing operations / logistics
Queue management
…
Robot + Environment
probability of taking action a in state s
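A stochastic policy of this kind, giving the probability of taking action a in state s, can be sketched as follows. The linear-softmax form, the discrete action set, and the names `policy_probs`, `sample_action`, and `theta` are assumptions made for illustration; the talk itself is concerned with neural-network policies.

```python
import numpy as np


def policy_probs(theta, state):
    """Probability of each action in `state` under a linear-softmax policy.

    theta has shape (n_actions, state_dim); both are hypothetical here.
    """
    logits = theta @ state
    logits = logits - logits.max()   # subtract the max for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()


def sample_action(theta, state, rng):
    """Draw one action index according to the policy's probabilities."""
    probs = policy_probs(theta, state)
    return rng.choice(len(probs), p=probs)


rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 4))            # hypothetical: 3 actions, 4-dim state
state = np.array([0.1, -0.2, 0.0, 0.3])    # hypothetical percept / state vector

probs = policy_probs(theta, state)
action = sample_action(theta, state, rng)
print("action probabilities:", probs)
print("sampled action:", action)
```

Reinforcement learning then adjusts the policy parameters (`theta` in this sketch) so that actions leading to higher reward become more probable.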
How About Deep RL?
Pong Enduro Beamrider Q*bert
Deep Q-learning [Mnih et al, 2013]
Monte Carlo Tree Search [Guo et al, 2014]
Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, A., 2014]
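As background for the first method listed, here is a minimal sketch of the Q-learning update in its tabular form. Deep Q-learning fits a neural network to the same kind of target instead of storing a table; the table, the state and action indices, and the step size `alpha` below are illustrative assumptions, not details from the talk.

```python
import numpy as np


def q_target(reward, next_q_values, gamma, done):
    """Bootstrapped target: reward plus the discounted best next-state value."""
    if done:
        return reward
    return reward + gamma * np.max(next_q_values)


def q_update(Q, s, a, reward, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the target."""
    target = q_target(reward, Q[s_next], gamma, done)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


# Hypothetical 2-state, 2-action problem with all values initialized to zero.
Q = np.zeros((2, 2))
Q = q_update(Q, s=0, a=1, reward=1.0, s_next=1, done=True)
print(Q)
```

In the deep variant, the table lookup `Q[s, a]` is replaced by a network evaluated on the raw game frames, and the update becomes a gradient step on the squared difference between the network output and the target.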
Deep Reinforcement Learning for Atari Games
[Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015]
Experiments in Locomotion
How About Real Robotic Visuo-Motor Skills?
Architecture (92,000 parameters)
[Levine*, Finn*, Darrell, Abbeel, 2015, TR at: rll.berkeley.edu/deeplearningrobotics]
Block Stacking – Learning the Controller for a Single Instance
Learned Skills
Architectures for shared learning / transfer learning
Multiple robots and sensors (including simulation)
Multiple tasks
Simulation – Real world
Frontiers / Limitations
Exploration
Controllers that require memory / estimation
Temporal hierarchy
Thank you