Introduction to Deep Reinforcement Learning
Model-based Methods
Zhiao Huang
Model-free Methods
• The model 𝑝(𝑠′|𝑠, 𝑎) is unknown
• We estimate 𝑄, 𝑉, and the policy from sampled trajectories/transitions
Model-based Method
• Learn the environment model 𝑝(𝑠′|𝑠, 𝑎) directly
• Learn 𝑅(𝑠, 𝑎, 𝑠′) if it is unknown
Model-based Method
• Learn the environment model 𝑝(𝑠′|𝑠, 𝑎) directly by supervised learning
• Search for the solution with the model directly
Model-based Methods
• Model and search have broad meanings
  • Model: physics, geometry, probability models, inverse dynamics, game engines, …
  • Search: MCTS, CEM, RL, iLQR, RRT/PRM, …
Model-Predictive Control
• Forward model with parameters 𝜃
  𝑓𝜃(𝑠, 𝑎): 𝑆 × 𝐴 → 𝑆
• Predicts what will happen if we execute action 𝑎 at state 𝑠
Forward Model
• We may have a forward model in mind
Learning the Forward Model
• Sample transitions (𝑠, 𝑎, 𝑠′) from the replay buffer and train the model with supervised learning
  min𝜃 𝔼‖𝑓𝜃(𝑠, 𝑎) − 𝑠′‖²
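This objective can be minimized with any supervised learner. A minimal NumPy sketch, using a toy linear environment and a linear model (both hypothetical, chosen so the least-squares fit has a closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "environment": s' = A s + B a + noise (unknown to the learner).
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

def step(s, a):
    return A_true @ s + B_true @ a + 0.01 * rng.standard_normal(2)

# Replay buffer of transitions (s, a, s').
S = rng.standard_normal((500, 2))
A_act = rng.standard_normal((500, 1))
S_next = np.array([step(s, a) for s, a in zip(S, A_act)])

# Linear forward model f_theta(s, a) = [s; a] @ theta; least squares
# solves min_theta E ||f_theta(s, a) - s'||^2 in closed form.
X = np.hstack([S, A_act])                      # (500, 3)
theta, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def f_model(s, a):
    return np.concatenate([s, a]) @ theta
```

With a neural network instead of a linear model, the same loss is minimized by stochastic gradient descent on minibatches from the buffer.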
Rollout
• Predict a short trajectory 𝑠1, 𝑠2, …, 𝑠𝑇 if we start at 𝑠0 and execute 𝑎0, 𝑎1, …, 𝑎𝑇−1 sequentially:
  𝑠0 → 𝑠1 → 𝑠2 → ⋯ → 𝑠𝑇, where 𝑠𝑖+1 = 𝑓𝜃(𝑠𝑖, 𝑎𝑖)
• “Rollout” the forward model
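In code, a rollout is just repeated application of the model; a minimal sketch (the toy model passed in is hypothetical):

```python
def rollout(f, s0, actions):
    """Roll the forward model f(s, a) -> s' from s0 through a sequence of
    actions; returns the predicted states [s1, s2, ..., sT]."""
    states, s = [], s0
    for a in actions:
        s = f(s, a)
        states.append(s)
    return states

# Usage with a toy deterministic model f(s, a) = s + a:
print(rollout(lambda s, a: s + a, 0.0, [1.0, 2.0, 3.0]))  # [1.0, 3.0, 6.0]
```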
Model-Predictive Control
• Given the forward model 𝑓𝜃 and the current state 𝑠0, find a sequence of actions 𝑎0, 𝑎1, …, 𝑎𝑇−1 that has the maximum reward
  max𝑎0:𝑇−1 Σ𝑖=0..𝑇−1 𝑅(𝑠𝑖, 𝑎𝑖, 𝑠𝑖+1)  s.t.  𝑠𝑖+1 = 𝑓𝜃(𝑠𝑖, 𝑎𝑖)
Random Shooting
• Sample 𝑁 random action sequences:
  𝑎0¹, 𝑎1¹, 𝑎2¹, …, 𝑎𝑇−1¹
  𝑎0², 𝑎1², 𝑎2², …, 𝑎𝑇−1²
  ⋮
  𝑎0ᴺ, 𝑎1ᴺ, 𝑎2ᴺ, …, 𝑎𝑇−1ᴺ
Random Shooting
• Evaluate the reward of each action sequence by simulating the model 𝑓𝜃
• “Rollout” the forward model 𝑓𝜃(𝑠, 𝑎): for each sequence 𝑘, 𝑠0 → 𝑠1ᵏ → 𝑠2ᵏ → ⋯ → 𝑠𝑇ᵏ, yielding total rewards 𝑅¹, 𝑅², …, 𝑅ᴺ
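Sampling and evaluation together make the whole planner a few lines; a minimal sketch, where the 1-D dynamics and reward are hypothetical toys:

```python
import numpy as np

def random_shooting(f, reward, s0, horizon, n_seq, action_dim, rng):
    """Sample n_seq random action sequences, roll each out through the
    model f, and return the sequence with the highest total reward."""
    best_R, best_seq = -np.inf, None
    for _ in range(n_seq):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, R = s0, 0.0
        for a in seq:
            s_next = f(s, a)
            R += reward(s, a, s_next)
            s = s_next
        if R > best_R:
            best_R, best_seq = R, seq
    return best_seq, best_R

# Toy 1-D problem: dynamics s' = s + a, reward pulls the state toward 1.
f = lambda s, a: s + a[0]
reward = lambda s, a, s_next: -(s_next - 1.0) ** 2
seq, R = random_shooting(f, reward, s0=0.0, horizon=3, n_seq=1000,
                         action_dim=1, rng=np.random.default_rng(0))
```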
Random Shooting
• Return the best action sequence 𝑎0:𝑇−1* and execute it in the real environment
Planning at Each Step
• The action sequence 𝑎1, 𝑎2, …, 𝑎𝑇−1 maximizes the reward from 𝑠1′ = 𝑓𝜃(𝑠0, 𝑎0), but not from the actual next state 𝑠1
• Search for new actions from state 𝑠1 again!
• If we execute the searched action sequence 𝑎0, 𝑎1, 𝑎2, … in the environment, the model predicts 𝑠0 → 𝑠1′ while reality gives 𝑠0 → 𝑠1
Model Predictive Control
• Repeat
  • Observe the current state 𝑠
  • Sample 𝑁 random action sequences
  • Evaluate the reward of each action sequence from 𝑠 with the model 𝑓𝜃; find the best sequence {𝑎0ᵏ, 𝑎1ᵏ, …, 𝑎𝑇−1ᵏ}
  • Execute 𝑎0ᵏ in the environment
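Put together, MPC replans at every step and executes only the first action. A minimal sketch with a random-shooting planner; the 1-D dynamics serve as both the model and the "real" environment, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dynamics, used both as the model f_theta and as the "real"
# environment so the sketch stays self-contained.
f = lambda s, a: s + 0.1 * a
reward = lambda s, a, s_next: -abs(s_next - 1.0)   # drive the state to 1

def plan(s, horizon=5, n_seq=200):
    """Random-shooting planner: best of n_seq random action sequences."""
    best_R, best_seq = -np.inf, None
    for _ in range(n_seq):
        seq = rng.uniform(-1.0, 1.0, size=horizon)
        st, R = s, 0.0
        for a in seq:
            nxt = f(st, a)
            R += reward(st, a, nxt)
            st = nxt
        if R > best_R:
            best_R, best_seq = R, seq
    return best_seq

# MPC loop: replan at every step, execute only the first action.
s = 0.0
for _ in range(30):
    s = f(s, plan(s)[0])
print(s)  # close to the target state 1.0
```

In a real system the environment step would come from the robot or simulator, not from `f`, which is exactly why replanning at each step matters.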
Cross-Entropy Method
• Black-box function optimization
  max𝑥 𝑓(𝑥)
• We have many other choices
• Cross-entropy method
Cross-Entropy Method
• Basically the simplest evolutionary algorithm
• Maintain a distribution over solutions
Cross-Entropy Method
• Initialize 𝜇 ∈ ℝᵈ, 𝜎 ∈ ℝᵈ with 𝜎 > 0
• For iteration = 1, 2, …
  • Sample 𝑛 candidates 𝑥𝑖 ~ 𝑁(𝜇, diag(𝜎²))
  • For each 𝑥𝑖, evaluate its value 𝑓(𝑥𝑖)
  • Select the top 𝑘 candidates as elites
  • Fit a new diagonal Gaussian to the elites and update 𝜇, 𝜎
Cross-Entropy Method (in Python)
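The Python listing on this slide did not survive extraction; a minimal sketch consistent with the algorithm on the previous slide (function and parameter names are my own):

```python
import numpy as np

def cem(f, mu, sigma, n_iter=30, n_pop=100, n_elite=10, rng=None):
    """Cross-entropy method: maximize f(x) over x in R^d.

    Maintains a diagonal Gaussian N(mu, diag(sigma^2)) over solutions and
    repeatedly refits it to the top-k ("elite") samples."""
    rng = rng or np.random.default_rng()
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    for _ in range(n_iter):
        x = rng.normal(mu, sigma, size=(n_pop, mu.size))  # sample candidates
        values = np.array([f(xi) for xi in x])
        elites = x[np.argsort(values)[-n_elite:]]         # top-k by value
        mu = elites.mean(axis=0)                          # refit the Gaussian
        sigma = elites.std(axis=0) + 1e-6                 # keep sigma > 0
    return mu

# Maximize f(x) = -||x - 2||^2; the optimum is at x = (2, 2).
x_star = cem(lambda x: -np.sum((x - 2.0) ** 2),
             mu=np.zeros(2), sigma=np.ones(2),
             rng=np.random.default_rng(0))
print(x_star)  # ≈ [2, 2]
```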
![Page 20: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/20.jpg)
Model Predictive Control
• Hyperparameters: 𝜇𝑎, 𝜎𝑎, 𝑛iter, 𝑛pop, 𝑛elite
• Initialize an action sequence 𝜇 = {𝑎𝑖 = 𝜇𝑎}𝑖<𝑇
• Repeat
  • Observe the current state 𝑠
  • Search for a new action sequence with CEM: {𝑎0′, 𝑎1′, …, 𝑎𝑇−1′} = CEM(𝜇, {𝜎𝑎}𝑖<𝑇)
  • Execute 𝑎0′ in the environment
  • Update 𝜇 ← {𝑎1′, 𝑎2′, …, 𝑎𝑇−1′, 𝜇𝑎}
Notes
• CEM performs well on most control tasks
• Instead of searching for an action sequence, we can also search for the parameters of the network
• A general “gradient descent”
• CEM and random shooting work poorly for very long horizons 𝑇 or dense rewards
Performance of CEM
• When the model is known
Comparisons
• On-policy methods: Policy
• Off-policy methods: Value/Q
• Model-based methods: Model
Model ⇒ Value ⇒ Policy
Comparisons
• How difficult it is to model
• Q Value > Policy
• Model depends on the priors
• Robustness
• Model < Q Value < Policy
• Time complexity
• Model > Q/Policy
• Data-efficient/Generalization
• Model > Q Value > Policy
Conclusion
• Very little data / we know the model well
• Model-based methods
• We can’t model the environment and we don’t
want to sample too much
• Off-policy methods
• We have enough time/money
• Off-policy + On-policy methods
Sample-based Motion Planning
Topics
• Problem formulation
• Probabilistic roadmap method (PRM)
• Rapidly exploring random trees (RRT)
Configuration Space
• The configuration space (C-space) 𝐶 is a subset of ℝⁿ containing all possible states of the system (the state space in RL).
• 𝐶free ⊆ 𝐶 contains all valid states.
• 𝐶obs ⊆ 𝐶 represents obstacles.
• Examples:
  • All valid poses of a robot.
  • All valid joint values of a robot.
  • …
Motion Planning
• Problem:
  • Given a configuration space 𝐶free
  • Given a start state 𝑞start and a goal state 𝑞goal in 𝐶free
  • Calculate a sequence of actions that leads from start to goal
• Challenges:
  • Need to avoid obstacles
  • Long planning horizon
  • High-dimensional planning space
Motion Planning
LaValle, Steven M. Planning algorithms. Cambridge university press, 2006.
Examples
https://medium.com/@theclassytim/robotic-path-planning-rrt-and-rrt-212319121378
Examples
Ratliff, N., Zucker, M., Bagnell, J. A., et al. CHOMP: Gradient optimization techniques for efficient motion planning. ICRA 2009.
Schulman, John, et al. Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization. RSS 2013.
Sample-based algorithms
• The key idea is to randomly explore a smaller subset of possibilities without exhaustively exploring all of them.
• Pros:
  • Probabilistically complete
  • Solve the problem after knowing only part of 𝐶free
  • Apply easily to high-dimensional C-spaces
• Cons:
  • Require finding a path between two close points
  • Do not work well when the connectivity of 𝐶free is bad
  • Never optimal
Topics
• Problem formulation
• Probabilistic roadmap method (PRM)
• Rapidly exploring random trees (RRT)
Probabilistic roadmap (PRM)
• The algorithm contains two stages:
  • Construction phase
    • Randomly sample states in 𝐶free
    • Connect every sampled state to its neighbors
    • Connect the start and goal states to the graph
  • Query phase
    • Run a path-finding algorithm such as Dijkstra’s
Kavraki, Lydia E., et al. "Probabilistic roadmaps for path planning in high-dimensional configuration spaces." IEEE transactions on Robotics and Automation 12.4 (1996): 566-580.
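The two phases can be sketched in a few lines for a 2-D C-space; the obstacle, sample count, and neighbor count below are illustrative choices:

```python
import heapq
import numpy as np

def prm(start, goal, in_free, n_samples=300, k=8, rng=None):
    """PRM sketch in [0,1]^2: sample free states, connect each node to its
    k nearest neighbors with collision-free straight edges, run Dijkstra."""
    rng = rng or np.random.default_rng()
    nodes = [np.asarray(start, float), np.asarray(goal, float)]
    while len(nodes) < n_samples + 2:          # construction: rejection sampling
        q = rng.uniform(0, 1, 2)
        if in_free(q):
            nodes.append(q)
    def edge_free(a, b):                       # check an edge by interpolation
        return all(in_free(a + t * (b - a)) for t in np.linspace(0, 1, 20))
    adj = {i: [] for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        d = [np.linalg.norm(a - b) for b in nodes]
        for j in np.argsort(d)[1:k + 1]:       # skip index 0 (the node itself)
            if edge_free(a, nodes[j]):
                adj[i].append((j, d[j]))
                adj[j].append((i, d[j]))
    # Query phase: Dijkstra from node 0 (start) to node 1 (goal).
    dist, pq = {0: 0.0}, [(0.0, 0)]
    while pq:
        c, u = heapq.heappop(pq)
        if u == 1:
            return c                           # cost of the shortest path
        if c > dist.get(u, np.inf):
            continue
        for v, w in adj[u]:
            if c + w < dist.get(v, np.inf):
                dist[v] = c + w
                heapq.heappush(pq, (c + w, v))
    return None                                # start and goal not connected

# Circular obstacle of radius 0.2 in the middle of the unit square.
in_free = lambda q: np.linalg.norm(q - 0.5) > 0.2
cost = prm([0.05, 0.05], [0.95, 0.95], in_free,
           rng=np.random.default_rng(0))
```

The returned cost exceeds the straight-line distance because the path must detour around the obstacle.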
Rejection Sampling
• Aim to sample uniformly in 𝐶free.
• Method:
  • Sample uniformly over 𝐶.
  • Reject samples that are not in the feasible area.
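A minimal sketch of this loop; the circular obstacle is an illustrative choice:

```python
import numpy as np

def rejection_sample(n, in_free, low, high, rng):
    """Sample n points uniformly over C = [low, high]^d, keeping only
    those that in_free() reports as lying in C_free."""
    samples = []
    while len(samples) < n:
        q = rng.uniform(low, high, size=len(low))
        if in_free(q):           # reject samples inside obstacles
            samples.append(q)
    return np.array(samples)

# C = [0,1]^2 with a circular obstacle of radius 0.3 at the center.
in_free = lambda q: np.linalg.norm(q - 0.5) > 0.3
pts = rejection_sample(100, in_free, np.zeros(2), np.ones(2),
                       np.random.default_rng(0))
```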
Pipeline
Challenges
• Connecting neighboring points:
  • In general this requires solving a boundary value problem.
• Collision checking:
  • It takes a lot of time to check whether the edges lie in the free configuration space.
Example
• PRM generates a graph 𝐺 = (𝑉, 𝐸) such that every edge lies in the free configuration space, without colliding with obstacles.
Example
• Find the path from the start state 𝑞start to the goal state 𝑞goal
Limitations: narrow passages
• It is unlikely to sample points inside a narrow bridge
Gaussian Sampling
• Generate one sample 𝑞1 uniformly in the configuration space
• Generate another sample 𝑞2 from a Gaussian distribution 𝒩(𝑞1, 𝜎²)
• If 𝑞1 ∈ 𝐶free and 𝑞2 ∉ 𝐶free, then add 𝑞1
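A minimal sketch of one Gaussian-sampling trial; the obstacle and 𝜎 are illustrative:

```python
import numpy as np

def gaussian_sample(in_free, sigma, rng):
    """One Gaussian-sampling trial in [0,1]^2: keep q1 only when q1 is
    free but its Gaussian neighbor q2 is not, biasing accepted samples
    toward obstacle boundaries."""
    q1 = rng.uniform(0.0, 1.0, 2)
    q2 = rng.normal(q1, sigma)
    if in_free(q1) and not in_free(q2):
        return q1
    return None

# Circular obstacle of radius 0.3 centered in the unit square.
in_free = lambda q: np.linalg.norm(q - 0.5) > 0.3
rng = np.random.default_rng(0)
near_boundary = [q for q in (gaussian_sample(in_free, 0.05, rng)
                             for _ in range(2000)) if q is not None]
```

Accepted samples cluster near the obstacle boundary, which is where a roadmap most needs coverage.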
Bridge sampling
• Generate one sample 𝑞1 uniformly in the configuration space
• Generate another sample 𝑞2 from a Gaussian distribution 𝒩(𝑞1, 𝜎²)
• Let 𝑞3 = (𝑞1 + 𝑞2)/2
• If 𝑞1 and 𝑞2 are not in 𝐶free, then add 𝑞3
Sun, Zheng, et al. "Narrow passage sampling for probabilistic roadmap planning." IEEE Transactions on Robotics 21.6 (2005): 1105-1115.
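The bridge test can be sketched in the same style; the two-slab obstacle with a narrow horizontal passage is an illustrative choice:

```python
import numpy as np

def bridge_sample(in_free, sigma, rng):
    """One bridge-sampling trial in [0,1]^2: if q1 and q2 are both in
    collision but their midpoint q3 is free, q3 likely lies inside a
    narrow passage."""
    q1 = rng.uniform(0.0, 1.0, 2)
    q2 = rng.normal(q1, sigma)
    q3 = (q1 + q2) / 2
    if not in_free(q1) and not in_free(q2) and in_free(q3):
        return q3
    return None

# Two slabs (y <= 0.45 and y >= 0.55) are obstacles; the free space is
# the narrow strip between them.
in_free = lambda q: 0.45 < q[1] < 0.55
rng = np.random.default_rng(0)
passage = [q for q in (bridge_sample(in_free, 0.2, rng)
                       for _ in range(5000)) if q is not None]
```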
Topics
• Problem formulation
• Probabilistic roadmap method (PRM)
• Rapidly exploring random trees (RRT)
Rapidly-exploring random tree (RRT)
• RRT grows a tree rooted at the start state by using random samples from the configuration space.
• As each sample is drawn, a connection is attempted between it and the nearest state in the tree. If the connection lies in the free configuration space, this results in a new state in the tree.
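The growth loop can be sketched in 2-D; the step size, goal bias, and obstacle below are illustrative choices (a goal bias is a common practical addition, not stated on the slide):

```python
import numpy as np

def rrt(start, goal, in_free, eps=0.05, goal_bias=0.1, n_iter=5000, rng=None):
    """RRT sketch in [0,1]^2: sample a random configuration (with a small
    goal bias), find the nearest tree node, and take one eps-step toward
    the sample; keep the new node only if it is collision-free."""
    rng = rng or np.random.default_rng()
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    nodes, parent = [start], {0: None}
    for _ in range(n_iter):
        target = goal if rng.random() < goal_bias else rng.uniform(0, 1, 2)
        i = int(np.argmin([np.linalg.norm(target - q) for q in nodes]))
        d = target - nodes[i]
        q_new = nodes[i] + eps * d / max(np.linalg.norm(d), 1e-9)
        if not in_free(q_new):
            continue                      # extension collides; discard it
        parent[len(nodes)] = i
        nodes.append(q_new)
        if np.linalg.norm(q_new - goal) < eps:
            path, j = [], len(nodes) - 1  # walk parents back to the root
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None

in_free = lambda q: np.linalg.norm(q - 0.5) > 0.2
path = rrt([0.05, 0.05], [0.95, 0.95], in_free, rng=np.random.default_rng(0))
```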
Extend operation
Pipeline
Challenges
• Find the nearest neighbor in the tree
  • We need to support fast online queries
  • Example: k-d trees
• Need to choose a good step size 𝜖 to expand the tree efficiently
  • Large 𝜖: hard to generate new samples
  • Small 𝜖: too many samples in the tree
Examples
RRT-Connect
• Grow two trees starting from 𝑞start and 𝑞goal respectively, instead of just one.
• Grow the trees towards each other rather than towards random configurations.
• Use stronger greediness by growing the tree with multiple 𝜖-steps instead of a single one.
Kuffner, James J., and Steven M. LaValle. "RRT-connect: An efficient approach to single-query path planning." Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065). Vol. 2. IEEE, 2000.
![Page 53: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/53.jpg)
A single RRT-Connect iteration
![Page 54: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/54.jpg)
One tree grown using a random target
![Page 55: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/55.jpg)
New node becomes target for the other tree
![Page 56: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/56.jpg)
Calculate nearest node to target
![Page 57: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/57.jpg)
Try to add the new node
![Page 58: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/58.jpg)
If successful, keep extending
![Page 59: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/59.jpg)
If successful, keep extending
![Page 60: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/60.jpg)
If successful, keep extending
![Page 61: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/61.jpg)
Path found if branch reaches target
![Page 62: Introduction to Deep Reinforcement Learning Model-based](https://reader031.vdocuments.site/reader031/viewer/2022012101/61db8e89932c3a797949efbc/html5/thumbnails/62.jpg)
Local planners
• Both RRT and PRM rely on a local planner, which gives a sequence of actions to connect two states.
• RL can provide local planners without solving the dynamics equations explicitly.
• RL cannot solve long-horizon, high-dimensional planning problems efficiently because of exploration.
PRM+RL
• Sample some landmarks and build a graph in the configuration space, as in PRM.
• Find the planned path on the graph.
• The action sequence is found by an RL algorithm such as DDPG.
Huang, Zhiao, Fangchen Liu, and Hao Su. "Mapping state space using landmarks for universal goal reaching." Advances in Neural Information Processing Systems. 2019.
MoveIt
• MoveIt is a package for ROS
• Many useful motion planning algorithms can be found there
Summary
• Motion planning
  • PRM
    • Sampling: Gaussian sampling, bridge sampling
  • RRT
    • RRT-Connect
• Motion planning + RL
  • PRM + RL