TRANSCRIPT
Tutorial on Safe Exploration for Reinforcement Learning
Felix Berkenkamp, Angela P. Schoellig, Andreas Krause
@RL Summer SCOOL, July 10th 2019
Reinforcement Learning (RL)
Need to trade exploration & exploitation
Reinforcement Learning: An Introduction, R. Sutton, A. G. Barto, 1998
(Diagram: agent-environment loop with state, action, and reward.)
Towards Safe RL
...
How can we learn to act safely in unknown environments?
Therapeutic Spinal Cord Stimulation
S. Harkema, The Lancet, Elsevier
Safe Exploration for Optimization with Gaussian Processes, Y. Sui, A. Gotovos, J. W. Burdick, A. Krause
Stagewise Safe Bayesian Optimization with Gaussian Processes, Y. Sui, V. Zhuang, J. W. Burdick, Y. Yue
Safe Controller Tuning
Safe Controller Optimization for Quadrotors with Gaussian Processes, F. Berkenkamp, A. P. Schoellig, A. Krause, ICRA 2016
Outline
Specifying safety requirements and quantifying risk
Acting safely in known environments
Acting safely in unknown environments
Safe exploration (model-free and model-based)
Specifying safe behavior
Is this trajectory safe?
Monitoring temporal properties of continuous signals, O. Maler, D. Nickovic, FT, 2004
Safe Control under Uncertainty, D. Sadigh, A. Kapoor, RSS, 2016
What does it mean to be safe?
Safety ≅ avoid bad trajectories (states/actions)
Fix a policy. How do I quantify uncertainty and risk?
Stochastic environment / policy
Expected safety
Safety function is now a random variable
Expected safety can be misleading
Safe in expectation!
Expected safety and variance
Risk sensitivity
Even at low variance, a significant fraction of trajectories may still be unsafe.
Value at Risk
Use a confidence lower bound instead!
Conditional Value at Risk
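The difference between the mean, Value at Risk, and Conditional Value at Risk can be made concrete on samples. A minimal numpy sketch with hypothetical safety values (the bimodal distribution and risk level are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical safety values: most trajectories are fine, 10% hit a bad mode.
safety = np.concatenate([rng.normal(1.0, 0.1, 900), rng.normal(-2.0, 0.1, 100)])

alpha = 0.05  # risk level

# Expected safety looks fine and hides the bad mode entirely.
mean = safety.mean()

# Value at Risk: the alpha-quantile of the safety distribution.
var = np.quantile(safety, alpha)

# Conditional Value at Risk: expected safety within the worst alpha-fraction.
cvar = safety[safety <= var].mean()

print(mean, var, cvar)  # mean > 0, but VaR and CVaR are strongly negative
```

Here the expectation alone suggests the policy is safe, while both tail measures expose the rare catastrophic mode.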
Worst-case
Notions of safety
Stochastic: expected risk, moment penalized, value at risk, conditional value at risk
Worst-case → robust control, formal verification
Acting in known model with safety constraints
Constrained Markov decision processes, Eitan Altman, CRC Press, 1999
Risk-sensitive Markov decision processes, Ronald A. Howard, James E. Matheson, 1972
Markov Decision Processes with Average-Value-at-Risk Criteria, Nicole Bäuerle, Jonathan Ott
Reinforcement Learning
Key challenge: Don’t know the consequences of actions!
How to start acting safely?
No knowledge! Now what?
Initial policy
Can find an initial, safe policy based on domain knowledge.
How to improve?
Prior knowledge as backup for learning
Provably safe and robust learning-based model predictive control, A. Aswani, H. Gonzalez, S. S. Sastry, C. Tomlin, Automatica, 2013
Safe Reinforcement Learning via Shielding, M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, U. Topcu, AAAI, 2018
Linear Model Predictive Safety Certification for Learning-based Control, K. P. Wabersich, M. N. Zeilinger, CDC, 2018
Safety controller takes over
Know what is safe. The learner is seen as a disturbance.
Safe Exploration of State and Action Spaces in Reinforcement Learning, J. Garcia, F. Fernandez, JAIR, 2012
Safe Exploration in Continuous Action Spaces, G. Dalal, K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, Y. Tassa, arXiv, 2018
Prior knowledge as backup for learning
Safety controller takes over
Know what is safe. The learner is seen as a disturbance.
Need to know what is unsafe in advance.
Without learning, need significant prior knowledge.
The learner does not know what’s happening!
Safety as improvement in performance (Expected safety)
[Di Castro, Shie Mannor, ICML 2012]
Performance
Initial, stochastic policy
Safety constraint
Need to estimate it based only on data from the initial policy
Off-Policy Policy Evaluation
What does this tell me about a different policy?
Importance sampling:
(there are better ways to do this)
Eligibility Traces for Off-Policy Policy Evaluation, Doina Precup, Richard S. Sutton, S. Singh
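A minimal sketch of the importance-sampling estimator on a two-armed bandit, used here as a stand-in for a one-step MDP (the policies and reward values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Behavior policy pi_b collects the data; pi is the candidate to evaluate.
pi_b = np.array([0.5, 0.5])         # data-collecting policy over actions {0, 1}
pi = np.array([0.2, 0.8])           # candidate policy
true_reward = np.array([0.0, 1.0])  # expected reward per action

# Collect trajectories (here: single actions) under pi_b.
actions = rng.choice(2, size=10000, p=pi_b)
rewards = true_reward[actions] + rng.normal(0, 0.1, size=actions.size)

# Importance sampling: reweight each return by pi(a) / pi_b(a) so that the
# average estimates the candidate policy's performance from off-policy data.
weights = pi[actions] / pi_b[actions]
is_estimate = np.mean(weights * rewards)

print(is_estimate)  # close to the candidate policy's true value of 0.8
```

The unweighted average of the collected rewards estimates the behavior policy (about 0.5 here); the reweighted average recovers the candidate policy's value without ever executing it.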
Guaranteeing improvement
Generate trajectories using the current policy.
This gives an unbiased estimate of the new policy's performance. What about its confidence?
With probability at least 1 − δ:
Use a concentration inequality to obtain confidence intervals.
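As one concrete instance (the slides do not commit to a specific inequality), Hoeffding's bound yields such a high-confidence lower bound for returns in [0, 1]; the data, threshold, and failure probability below are illustrative:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# n bounded returns in [0, 1], collected for the candidate policy.
returns = rng.uniform(0.4, 1.0, size=500)
n = returns.size
delta = 0.05  # allowed failure probability

# Hoeffding's inequality: with probability >= 1 - delta, the true expected
# return is at least the empirical mean minus sqrt(log(1/delta) / (2 n)).
lower_bound = returns.mean() - math.sqrt(math.log(1.0 / delta) / (2 * n))

safety_threshold = 0.5
if lower_bound >= safety_threshold:
    print("deploy candidate policy")  # improvement certified with high probability
else:
    print("keep current policy")
```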
Overview of expected safety pipeline
Trajectory data: training set → candidate policy
Evaluate safety on a test set
Use new policy if safe
High Confidence Policy Improvement, Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, ICML 2015
Constrained Policy Optimization, Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel, ICML, 2017
Summary part one
Stochastic: expected risk, moment penalized, VaR / CVaR
Worst-case: formal verification, robust optimization
Reviewed safety definitions
Requirement for prior knowledge
Reviewed a first method for safe learning in expectation
Second half: Explicit safe exploration
More model-free safe exploration
Model-based safe exploration without ergodicity
Reinforcement learning (recap)
Image: Plainicon, https://flaticon.com
(Diagram: policy → exploration → policy update loop.)
Safe reinforcement learning
Statistical models to guarantee safety
Direct policy optimization: estimate and optimize
Model-based: estimate/identify, then plan/control
Model-free reinforcement learning
Tracking performance
Safety constraint
Few, noisy experiments
Safety for all experiments
Safe policy optimization
(Noisy) Reward
(Noisy) Constraint
Goal:
Safety: with probability ≥ 1 − δ
Safe policy optimization illustration
(Illustration: objective J(θ) and constraint g(θ) over parameters θ, with a safety threshold; starting from a safe seed, only the reachable optimum can be attained, not the global optimum.)
Starting Point: Bayesian Optimization
Expected/most probable improvement [Močkus et al. ’78, ’89]; information gain about the maximum [Villemonteix et al. ’09]; knowledge gradient [Powell et al. ’10]; Predictive Entropy Search [Hernández-Lobato et al. ’14]; TruVaR [Bogunovic et al. ’17]; Max-Value Entropy Search [Wang et al. ’17]
Constraints/multiple objectives [Snoek et al. ’13, Gelbart et al. ’14, Gardner et al. ’14, Zuluaga et al. ’16]
Acquisition function
Gaussian process
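A minimal numpy sketch of a Gaussian process posterior with the confidence bounds that safe exploration methods rely on (squared-exponential kernel; the data and hyperparameters are illustrative):

```python
import numpy as np

def rbf(a, b, ell=0.5):
    # Squared-exponential kernel on scalar inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# A few noisy observations of an unknown function (illustrative data).
X = np.array([-1.0, 0.0, 1.0])
y = np.sin(X)
noise = 1e-2

Xs = np.linspace(-2.0, 2.0, 50)  # test inputs

# GP posterior mean and variance via the Cholesky factor of the kernel matrix.
K = rbf(X, X) + noise * np.eye(len(X))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
Ks = rbf(Xs, X)
mu = Ks @ alpha
v = np.linalg.solve(L, Ks.T)
var = 1.0 - np.sum(v**2, axis=0)

# Confidence bounds used for safety: mu +/- beta * sd.
beta = 2.0
sd = np.sqrt(np.clip(var, 0.0, None))
lower, upper = mu - beta * sd, mu + beta * sd
```

The posterior variance shrinks near the data and grows away from it, which is exactly the structure the safe-exploration methods below exploit: they only trust the model where the confidence interval is tight.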
SafeOPT: Constrained Bayesian optimization
SafeOPT Guarantees
Theorem (informal):
Under suitable conditions on the kernel and on the safety function, there exists a sample-complexity bound such that for any ε > 0 and δ > 0, with probability at least 1 − δ:
1) SafeOpt never makes an unsafe decision;
2) after at most that many iterations, it has found an ε-optimal reachable point.
Safe Exploration for Active Learning with Gaussian Processes, J. Schreiter, D. Nguyen-Tuong, M. Eberts, B. Bischoff, H. Markert, M. Toussaint
Safe Exploration for Optimization with Gaussian Processes, Y. Sui, A. Gotovos, J. W. Burdick, A. Krause
Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics, F. Berkenkamp, A. P. Schoellig, A. Krause
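A toy SafeOpt-style loop on a 1-D grid conveys the idea (a sketch, not the authors' implementation): maintain GP confidence bounds, keep a safe set of parameters whose lower bound clears the threshold, and evaluate the most uncertain safe parameter. The objective, kernel, and threshold are made up:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    # Squared-exponential kernel on scalars.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_bounds(X, y, Xs, noise=1e-4, beta=2.0):
    # GP posterior confidence bounds [mu - beta*sd, mu + beta*sd].
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sd = np.sqrt(np.clip(var, 0.0, None))
    return mu - beta * sd, mu + beta * sd

f = lambda x: 1.0 - 3.0 * (x - 0.6) ** 2  # unknown function (reward = safety here)
grid = np.linspace(0.0, 1.0, 101)
threshold = 0.0                            # safety: f must stay above 0

X, y = [0.2], [f(0.2)]                     # safe seed parameter
for _ in range(15):
    lo, hi = gp_bounds(np.array(X), np.array(y), grid)
    safe = lo > threshold                  # parameters certified safe by the GP
    # Evaluate the most uncertain safe parameter; this expands the safe set.
    idx = np.flatnonzero(safe)[np.argmax((hi - lo)[safe])]
    X.append(grid[idx])
    y.append(f(grid[idx]))
```

The safe set grows outward from the seed as uncertainty shrinks at its boundary, which is the mechanism behind the reachable-optimum guarantee above.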
Modelling context
Additional parameters
Multiple sources of information
Cheap, inaccurate (simulation) vs. expensive, accurate (real experiment)
Automatic tradeoff
Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization, A. Marco, F. Berkenkamp, P. Hennig, A. P. Schoellig, A. Krause, S. Schaal, S. Trimpe, ICRA 2017
Modeling this in a Gaussian process
(Two information sources: simulation and experiment.)
Safe reinforcement learning
Statistical models to guarantee safety
Direct policy optimization: estimate and optimize
Model-based: estimate/identify, then plan/control
From bandits to Markov decision processes
Bandit → Markov decision process
Can use the same Bayesian model to determine safety of states
Challenges with long-term action dependencies
Image: Plainicon, VectorsMarket, https://flaticon.com
Non-ergodic MDP
Rendering exploration safe
Safe Exploration in Markov Decision Processes, T. M. Moldovan, P. Abbeel, ICML, 2012
Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, M. Turchetta, F. Berkenkamp, A. Krause, NIPS, 2016
Safe Exploration and Optimization of Constrained MDPs using Gaussian Processes, Akifumi Wachi, Yanan Sui, Yisong Yue, Masahiro Ono, AAAI, 2018
Exploration:
Reduce model uncertainty
Only visit states from which the agent can recover safely
Safe Control under Uncertainty, D. Sadigh, A. Kapoor, RSS, 2016
On a real robot
From bandits to Markov decision processes
Bandit → Markov decision process
Next: model-based reinforcement learning
Reinforcement learning (recap)
(Diagram: policy → exploration → policy update loop.)
Model-based reinforcement learning
(Diagram: policy → exploration → model learning loop.)
Safe model-based reinforcement learning
(Diagram: policy → safe exploration → statistical model learning loop.)
A Bayesian dynamics model
Dynamics
Region of attraction
(Illustration: region of attraction; states outside it are unsafe.)
The baseline policy is safe.
Linear case
Safe and Robust Learning Control with Gaussian Processes, F. Berkenkamp, A. P. Schoellig, ECC, 2015
Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator, S. Dean, H. Mania, N. Matni, B. Recht, S. Tu, arXiv, 2018
Designing safe controllers for quadratic costs is a convex optimization problem
Uncertainty about the entries of the system matrices
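A crude sanity check in this spirit (illustrative only, not the convex synthesis from the references): given elementwise interval uncertainty on the entries of A, verify closed-loop stability of a candidate linear feedback at every vertex of the uncertainty box. All numbers are made up:

```python
import numpy as np
from itertools import product

# Nominal dynamics x_{t+1} = A x_t + B u_t; entries of A are only known up to
# an elementwise bound dA (hypothetical double-integrator-like system).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
dA = 0.02

K = np.array([[-10.0, -5.0]])  # candidate linear feedback u = K x

# Discrete-time stability = spectral radius of the closed loop below 1,
# checked at every vertex of the uncertainty box around A.
stable = True
for signs in product([-1.0, 1.0], repeat=A.size):
    Av = A + dA * np.array(signs).reshape(A.shape)
    rho = np.abs(np.linalg.eigvals(Av + B @ K)).max()
    stable = stable and (rho < 1.0)
```

Vertex enumeration is exponential in the number of uncertain entries; the cited papers avoid this by solving convex robust-synthesis problems instead.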
Forward-propagating uncertain, nonlinear dynamics
The outer approximation contains the true dynamics for all time steps with probability at least 1 − δ.
Learning-based Model Predictive Control for Safe Exploration, T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018
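The cited work propagates ellipsoidal outer approximations through GP dynamics; a simpler interval version for a linear system with a bounded disturbance conveys the idea (all numbers illustrative):

```python
import numpy as np

# Uncertain linear dynamics x_{t+1} = A x_t + w with |w_i| <= w_max[i].
A = np.array([[1.0, 0.1], [0.0, 0.9]])
w_max = np.array([0.01, 0.01])

def propagate(lo, hi):
    # Map an axis-aligned box through the linear dynamics, then inflate by
    # the disturbance bound: interval arithmetic gives an outer approximation.
    c, r = (lo + hi) / 2.0, (hi - lo) / 2.0
    c_next = A @ c
    r_next = np.abs(A) @ r + w_max
    return c_next - r_next, c_next + r_next

lo, hi = np.array([0.9, -0.1]), np.array([1.1, 0.1])  # initial state box
for _ in range(5):
    lo, hi = propagate(lo, hi)
# Every true trajectory starting in the initial box remains inside [lo, hi].
```

Note how the box grows at every step: this conservatism is why the horizon must stay short and why the MPC scheme below always keeps a certified return path to the safe set.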
Region of attraction
(Illustration: unsafe region, safety trajectory, and exploration trajectory; the first step of both trajectories is the same.)
Theorem (informal):
Under suitable conditions, we can always guarantee that we are able to return to the safe set.
Model predictive control references
Learning-based Model Predictive Control for Safe Exploration, T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018
Reachability-Based Safe Learning with Gaussian Processes, A. K. Akametalu, J. F. Fisac, J. H. Gillula, S. Kaynama, M. N. Zeilinger, C. J. Tomlin, CDC, 2014
Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C. J. Ostafew, A. P. Schoellig, T. D. Barfoot, IJRR, 2016
Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control, S. Kamthe, M. P. Deisenroth, AISTATS, 2018
Chance Constrained Model Predictive Control, A. T. Schwarm, M. Nikolaou, AIChE, 1999
Example
Robust constrained learning-based NMPC enabling reliable mobile robot path tracking, C. J. Ostafew, A. P. Schoellig, T. D. Barfoot, IJRR, 2016
Region of attraction
(Illustration: unsafe region, safety trajectory, and exploration trajectory; the first step of both trajectories is the same.)
Exploration limited by size of the safe set!
Region of attraction
Theorem (informal):
Under suitable conditions, we can identify a (near-)maximal subset of X on which π is stable, while never leaving the safe set.
Initial safe policy
Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A. P. Schoellig, A. Krause, NIPS, 2017
Lyapunov functions
[A.M. Lyapunov 1892]
Lyapunov Design for Safe Reinforcement Learning, T. J. Perkins, A. G. Barto, JMLR, 2002
Lyapunov functions
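A minimal sketch of the Lyapunov decrease condition V(f(x)) < V(x), checked on sampled states for an illustrative stable linear closed loop (the actual methods verify the condition with GP confidence bounds on the dynamics rather than samples):

```python
import numpy as np

# Closed-loop dynamics x_{t+1} = A_cl x_t under some fixed policy
# (a hypothetical stable linear system, not from the slides).
A_cl = np.array([[0.9, 0.1], [-0.1, 0.8]])

# Quadratic Lyapunov candidate V(x) = x^T P x with P positive definite.
P = np.eye(2)
def V(x):
    return x @ P @ x

# Check the decrease condition V(A_cl x) < V(x) on sampled states; the level
# sets of V on which it holds everywhere estimate the region of attraction.
rng = np.random.default_rng(3)
states = rng.uniform(-1.0, 1.0, size=(1000, 2))
decrease = np.array([V(A_cl @ x) - V(x) for x in states])
certified = bool(np.all(decrease < 0))
```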
Illustration of safe learning
Policy
Need to safely explore!
Safe Model-based Reinforcement Learning with Stability Guarantees, F. Berkenkamp, M. Turchetta, A. P. Schoellig, A. Krause, NIPS, 2017
Illustration of safe learning
Policy
Lyapunov function
Finding the right Lyapunov function is difficult!
The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamic Systems, S. M. Richards, F. Berkenkamp, A. Krause
Weights: positive definite. Nonlinearities: trivial nullspace.
Decision boundary
Towards safe reinforcement learning
(Illustration: unsafe region, safety trajectory, and exploration trajectory; the first step of both trajectories is the same.)
How to select exploration objective?
Summary
Stochastic: expected risk, moment penalized, VaR / CVaR
Worst-case: formal verification, robust optimization
Reviewed safety definitions
Requirement for prior knowledge
Reviewed a first method for safe learning in expectation
Safe Bayesian optimization for safe exploration
How to transfer this intuition to safe exploration in MDPs
Model-based methods (reachability=safety, certification, exploration)
Where to go from here?
Safe Reinforcement Learning
Machine learning
Control theory
Statistics
Decision theory
Scalability (computational & statistical)
Safe learning in other contexts (e.g., imitation)
Trade off safety and performance (theory & practice)
Lower bounds; define function classes that are safely learnable
Formal methods