
Reinforcement Learning and Motion Planning

Mrinal Kalakrishnan, University of Southern California

August 25, 2010


Reinforcement Learning

- Holy grail of learning for robotics
- Curse of dimensionality...
- Trajectory-based RL
  - High dimensions
  - Continuous states and actions
  - State-of-the-art: Policy Improvement with Path Integrals (Theodorou et al., 2010)


Motion Planning

- Sampling-based planners
  - Solve very difficult problems
  - Jerky paths, require smoothing
  - Feasible paths, not optimal
- Optimization-based planners
  - CHOMP (Ratliff et al., 2009)
  - Covariant gradient descent
  - Smooth trajectories
  - Solves "easy" problems
  - Local minima


Our New Motion Planner

Apply PI² to motion planning

- Create a new policy: ẋ = K(u − x)
- Control command u(t) = state x(t+1)
- Quadratic control cost: u^T R u
- R = A^T A (a sketch of this construction follows the list):
  - A is an acceleration differentiation matrix
  - R measures squared accelerations
- Cost = control cost + state costs
- State costs can include:
  - Collision cost
  - Energy cost
  - Constraint violation cost
  - Need not be differentiable!
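To make the smoothness cost concrete, here is a minimal numpy sketch (illustrative, not the planner's code): a hypothetical acceleration_matrix helper builds the central second-difference matrix A over the waypoints of a discretized one-joint trajectory, and R = A^T A then measures squared accelerations. The (n-2) x n shape and the unit time step are assumptions.

```python
import numpy as np

def acceleration_matrix(n, dt=1.0):
    """(n-2) x n central second-difference matrix: (A @ x)[i] approximates
    the acceleration at interior waypoint i+1 of an n-waypoint trajectory."""
    A = np.zeros((n - 2, n))
    for i in range(n - 2):
        A[i, i:i + 3] = np.array([1.0, -2.0, 1.0]) / dt**2
    return A

n = 20
A = acceleration_matrix(n)
R = A.T @ A                    # quadratic cost metric: x^T R x sums squared accelerations

x = np.linspace(0.0, 1.0, n)   # a straight-line trajectory for one joint
print(x @ R @ x)               # ~0: a straight line has zero acceleration cost
```

A straight line has zero second differences and hence zero control cost, which is why it makes a natural initial trajectory for the algorithm described below.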


Collision cost

- Distance field / distance transform
- Answers clearance and penetration depth queries
- Voxelize robot body and add up costs for each voxel (see the sketch below)
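A minimal sketch of this cost, assuming scipy's Euclidean distance transform, a voxel grid with its origin at zero, and a hinge-shaped per-voxel penalty (all illustrative choices; the slides do not specify the exact cost shape):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Precompute the distance field once from an occupancy grid.
resolution = 0.01                                    # 1 cm voxels (assumed)
occupancy = np.zeros((100, 100, 100), dtype=bool)
occupancy[40:60, 40:60, 40:60] = True                # a box obstacle
# For each free voxel: metric distance to the nearest occupied voxel.
distance_field = distance_transform_edt(~occupancy, sampling=resolution)

def collision_cost(voxel_points, clearance=0.05):
    """Sum clearance costs over the robot body's voxel centers (in meters).

    Each voxel looks up its distance to the nearest obstacle; voxels closer
    than `clearance` are penalized (hinge cost). No gradients are required.
    Assumes the grid origin is at (0, 0, 0).
    """
    idx = np.round(voxel_points / resolution).astype(int)
    idx = np.clip(idx, 0, np.array(distance_field.shape) - 1)
    d = distance_field[idx[:, 0], idx[:, 1], idx[:, 2]]
    return np.sum(np.maximum(clearance - d, 0.0))
```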


The algorithm

- Generate initial straight-line trajectory
- Repeat until convergence:
  - Create noisy rollouts around the trajectory (the noise does not modify the start or goal, since Σ = R^-1!)
  - Compute costs for each rollout
  - Apply PI² update: reward-weighted average (sketched below)
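Here is a simplified single-update sketch for a one-joint trajectory, reusing R = A^T A from the earlier sketch. Restricting the noise covariance Σ = R^-1 to the interior waypoints keeps rollouts smooth and leaves the start and goal untouched. The whole-trajectory (rather than per-time-step) reward weighting, the temperature h, and the toy 1-D obstacle cost are simplifying assumptions relative to the full PI² update.

```python
import numpy as np

def pi2_update(x, R, state_cost, n_rollouts=20, h=10.0, rng=None):
    """One PI2-style update of trajectory x (start and goal stay fixed).

    Noise over the interior waypoints is drawn with covariance R_int^-1,
    so rollouts are smooth and never move the endpoints. A simplified
    whole-trajectory reward-weighted average, not the full PI2 derivation.
    """
    rng = rng or np.random.default_rng()
    n = len(x)
    R_int = R[1:-1, 1:-1]                    # interior block is positive definite
    sigma = np.linalg.inv(R_int)             # Sigma = R^-1
    eps = rng.multivariate_normal(np.zeros(n - 2), sigma, size=n_rollouts)
    rollouts = np.tile(x, (n_rollouts, 1))
    rollouts[:, 1:-1] += eps                 # noisy rollouts around the trajectory
    # total cost per rollout: state cost + quadratic control (smoothness) cost
    costs = np.array([state_cost(r) + r @ R @ r for r in rollouts])
    c = costs - costs.min()
    w = np.exp(-h * c / (c.max() + 1e-10))   # exponentiated costs -> weights
    w /= w.sum()
    x_new = x.copy()
    x_new[1:-1] += w @ eps                   # reward-weighted average of the noise
    return x_new

# Usage: straight-line init, then iterate (a fixed budget stands in for a
# convergence test); the 1-D "obstacle" band around 0.5 is purely illustrative.
n = 20
A = acceleration_matrix(n)                   # from the earlier sketch
R = A.T @ A
obstacle = lambda r: np.sum(np.maximum(0.1 - np.abs(r - 0.5), 0.0))
x = np.linspace(0.0, 1.0, n)                 # initial straight-line trajectory
for _ in range(100):
    x = pi2_update(x, R, obstacle)
```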


The algorithm

[Slide animation: the initial straight-line trajectory, a series of noisy rollouts sampled around it, and the resulting updated trajectory]

Video: Pole


Video: Test Setup


Test Results

Condition       Success rate
Unconstrained   39 / 42
Constrained     38 / 42


Video: Real-world


Conclusion

- Optimization-based motion planner that does not require gradients
- Generates collision-free, smooth trajectories
- Optimizes arbitrary secondary criteria (constraints, torques)
- May handle local minima better than CHOMP (needs further testing)
- ICRA 2011 submission pending
- Code is in the optimization_motion_planning package, coming soon to a sandbox near you...


Future Work

- Torque optimality
- Trajectory libraries, cached plans

Thanks:
- Sachin Chitta
- Peter Pastor
- Willow Garage

