
Robust Belief-based Execution of Manipulation Programs

Kaijen Hsiao, Tomás Lozano-Pérez, Leslie Pack Kaelbling

MIT CSAIL

Achieving Goals under Uncertainty

Two kinds of uncertainty:
• Current state:
  • need to plan in information space
• Results of future actions:
  • search branches on outcomes as well as actions

The choice of action must depend on the current information state

Discrete POMDP Formulation

• states
• actions
• observations
• transition model
• observation model
• reward
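For reference, the six components above can be written as the standard POMDP tuple, together with the Bayesian belief update a discrete filter performs after taking action a and receiving observation o. The symbols (S, A, O, T, Z, R, b) are our notation, not the slides'.

```latex
\begin{aligned}
&\langle S, A, O, T, Z, R \rangle, \qquad
T(s, a, s') = P(s' \mid s, a), \qquad
Z(s', a, o) = P(o \mid s', a), \qquad
R : S \times A \to \mathbb{R} \\[4pt]
&b'(s') = \frac{Z(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}
               {\sum_{s'' \in S} Z(s'', a, o) \sum_{s \in S} T(s, a, s'')\, b(s)}
\end{aligned}
```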

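[Figure: controller loop. The state estimator (SE) turns actions and sensing from the environment into a belief; the controller maps the belief to an action.]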

POMDP Controller

• State estimation is a discrete Bayesian filter
• The policy maps belief states to actions
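A minimal sketch of one discrete Bayesian filter step in plain Python. It assumes the transition and observation models are stored as nested lists indexed `T[a][s][s2]` and `Z[a][s2][o]`; that layout is hypothetical, chosen for illustration, and is not the authors' data structure.

```python
def bayes_filter(belief, action, obs, T, Z):
    """One step of a discrete Bayesian filter.

    belief:           list of P(s) over discrete states
    T[action][s][s2]: P(s2 | s, action)    (hypothetical layout)
    Z[action][s2][o]: P(o | s2, action)    (hypothetical layout)
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)]
    # Correct: weight each successor state by the likelihood of the observation.
    weighted = [predicted[s2] * Z[action][s2][obs] for s2 in range(n)]
    total = sum(weighted)  # P(obs | belief, action)
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return [w / total for w in weighted]


# Usage: two states, a "look" action that leaves the state unchanged and
# reports the true state with probability 0.8.
T = {"look": [[1.0, 0.0], [0.0, 1.0]]}
Z = {"look": [[0.8, 0.2], [0.2, 0.8]]}
print(bayes_filter([0.5, 0.5], "look", 0, T, Z))  # [0.8, 0.2]
```

One observation from a sensor that is right 80% of the time moves a uniform two-state belief to [0.8, 0.2]; the normalizer `total` is exactly the observation probability that forward search branches on.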

Action selection in POMDPs

• Off-line optimal policy generation
  • intractable for large spaces

• On-line search: finite-depth expansion of the belief-space tree from the current belief state to select a single action
  • tractable in a broad subclass of problems

Challenges for action selection

• Continuous state spaces

• Requirement to select action for any belief state

• Long horizon

• Action branching factor

• Outcome branching factor

• Computationally complex observation and transition models

Grasping in uncluttered environments

Points of leverage:

• Robot pose is approximately observable

• Robot dynamics are nearly deterministic

• Bounded uncertainty over unobserved object parameters

• Room to maneuver

Online belief-space search

Continuous state space: discretize object state space

Discretize object configuration space

[Figure: object pose in the workspace, the corresponding discretized configuration space, and the belief state over it]
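One way to build such a discretization, sketched under assumptions: object poses are (x, y, theta) in cm and degrees on a regular grid, and the initial belief is an axis-aligned Gaussian with the kind of spread given later for the grasping experiments (5 cm, 30 deg). `make_grid` and `gaussian_belief` are hypothetical helpers, and angle wraparound is ignored.

```python
import itertools
import math


def make_grid(x_range, y_range, th_range, dx, dy, dth):
    """Regular grid of object poses (x, y, theta); ranges are inclusive (lo, hi)."""
    def axis(lo, hi, step):
        n = int(round((hi - lo) / step))
        return [lo + i * step for i in range(n + 1)]

    return list(itertools.product(axis(*x_range, dx),
                                  axis(*y_range, dy),
                                  axis(*th_range, dth)))


def gaussian_belief(grid, mean, std):
    """Axis-aligned Gaussian prior over the grid, normalized to sum to 1.
    Angle wraparound is ignored in this sketch."""
    weights = []
    for pose in grid:
        z2 = sum(((p - m) / s) ** 2 for p, m, s in zip(pose, mean, std))
        weights.append(math.exp(-0.5 * z2))
    total = sum(weights)
    return [w / total for w in weights]


# Usage: +/-10 cm in x and y at 2 cm, +/-60 deg at 10 deg (11 * 11 * 13 cells).
grid = make_grid((-10, 10), (-10, 10), (-60, 60), 2, 2, 10)
belief = gaussian_belief(grid, mean=(0, 0, 0), std=(5, 5, 30))
print(len(grid))  # 1573
```

Grid size grows with the product of the three axis resolutions, which is why the cell counts in the timing backup slide rise with the initial uncertainty.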



Action for any belief: search forward from current belief state

Search forward from current belief

• Low-entropy belief states enable a reliable grasp
• Use entropy as the static evaluation function at the leaves
• Actions can be useful for information gathering
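A minimal sketch of finite-depth forward search over a discrete belief tree, using negative entropy as the static leaf value. It assumes small discrete models in the same hypothetical `T[a][s][s2]` / `Z[a][s2][o]` layout, and it ignores action costs and discounting. The tree has (|A| * |O|)^d leaves at depth d, which is why both branching factors matter.

```python
import math


def entropy(b):
    """Shannon entropy (nats) of a discrete belief."""
    return -sum(p * math.log(p) for p in b if p > 0.0)


def update(b, a, o, T, Z):
    """Return (P(o | b, a), posterior belief) for a discrete model."""
    n = len(b)
    pred = [sum(b[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
    weighted = [pred[s2] * Z[a][s2][o] for s2 in range(n)]
    p_o = sum(weighted)
    if p_o == 0.0:
        return 0.0, None
    return p_o, [w / p_o for w in weighted]


def search(b, depth, actions, observations, T, Z):
    """Finite-depth expectimax over the belief tree.

    Leaves are scored by -entropy(b); returns (value, best_action)."""
    if depth == 0:
        return -entropy(b), None
    best_value, best_action = -math.inf, None
    for a in actions:
        value = 0.0
        for o in observations:          # branch on outcomes as well as actions
            p_o, b2 = update(b, a, o, T, Z)
            if p_o == 0.0:
                continue                # impossible outcome: no subtree
            child_value, _ = search(b2, depth - 1, actions, observations, T, Z)
            value += p_o * child_value
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action


# Usage: "look" is informative, "wait" is not, so search should prefer "look".
stay = [[1.0, 0.0], [0.0, 1.0]]
T = {"look": stay, "wait": stay}
Z = {"look": [[0.8, 0.2], [0.2, 0.8]], "wait": [[0.5, 0.5], [0.5, 0.5]]}
print(search([0.5, 0.5], 1, ["wait", "look"], [0, 1], T, Z)[1])  # look
```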




Long horizon: use temporally extended actions

Use temporally extended actions

Entire trajectories instead of primitive actions:
• reduce the horizon
• observations at the end of each trajectory




Large action branching factor: parameterize a small set of action types by the current belief

Parameterize actions with belief

Actions are entire world-relative trajectories (WRTs)

In the current belief state:
• execute with respect to the most likely object configuration
• terminate on contact or at the end of the trajectory
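A planar sketch of executing a WRT with respect to the most likely object configuration: waypoints are stored in the object frame, mapped into the robot frame using the most likely (x, y, theta) in the belief, and followed until contact is reported. The 2-D waypoints and the `in_contact` callback are hypothetical stand-ins for real arm trajectories and tactile sensing.

```python
import math


def most_likely_pose(grid, belief):
    """Grid pose with the highest probability (ties: the first one)."""
    return grid[max(range(len(belief)), key=belief.__getitem__)]


def to_robot_frame(waypoints, pose):
    """Map object-frame (x, y) waypoints into the robot frame for pose (x, y, theta_deg)."""
    ox, oy, th = pose
    c, s = math.cos(math.radians(th)), math.sin(math.radians(th))
    return [(ox + c * px - s * py, oy + s * px + c * py) for px, py in waypoints]


def execute_wrt(waypoints, grid, belief, in_contact):
    """Follow the WRT relative to the most likely pose; stop on contact or at the end.

    in_contact(q) is a hypothetical callback: move to q and report contact.
    Returns (index of last waypoint reached, that waypoint, contact flag)."""
    path = to_robot_frame(waypoints, most_likely_pose(grid, belief))
    for k, q in enumerate(path):
        if in_contact(q):
            return k, q, True
    return len(path) - 1, path[-1], False


# Usage: the belief favors an object at (1, 2) rotated 90 deg; no contact occurs.
grid = [(0.0, 0.0, 0.0), (1.0, 2.0, 90.0)]
belief = [0.3, 0.7]
print(execute_wrt([(0.0, 0.0), (1.0, 0.0)], grid, belief, lambda q: False))
```

Because the trajectory is re-anchored to whatever the belief currently favors, the same small set of WRTs covers every belief state.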





Computationally complex observation and transition models: precompute models

Precompute models

Execute WRT:
• with respect to estimated state e
• in world state w

Record the expected observation and transition for each (e, w) pair, based on geometric simulation





Large observation branching factor: canonicalize observations for each discrete state and action

Canonicalize observations

All (e, w) pairs with the same relative transformation have the same world-relative outcomes and observations
• Only sample for one e, with w varying within the initial range of uncertainty

Cluster observations, and represent each bin of object configurations by a single representative configuration
• Only branch on canonical observations
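A toy, one-dimensional sketch of both ideas. `probe_contact_height` is a hypothetical geometric simulation whose result depends only on the offset w - e, so one table indexed by offset replaces simulating every (e, w) pair. Observations are then binned; each bin becomes a single branch whose probability is the total belief mass of its configurations, represented by the most probable configuration in it (our choice of representative, not necessarily the authors'). States are integer grid indices so offsets can serve as dictionary keys.

```python
import math


def probe_contact_height(e, w, half_width=2, top=10.0, slope=0.5):
    """Hypothetical geometric simulation for one WRT.

    A finger is lowered where the object is believed to be (e); the object is
    actually at w. Returns the contact height, or None on a miss. The result
    depends only on the offset d = w - e."""
    d = w - e
    if abs(d) > half_width:
        return None
    return top - slope * abs(d)  # sloped top surface, for illustration


def canonical_table(offsets, simulate, e0=0):
    """Simulate once per relative offset d, using a single reference state e0."""
    return {d: simulate(e0, e0 + d) for d in offsets}


def canonical_branches(e, states, belief, table, bin_size):
    """Bin predicted observations; return {bin: (representative w, probability)}.

    The representative is the most probable w in the bin (an assumption)."""
    acc = {}
    for w, p in zip(states, belief):
        obs = table[w - e]
        key = "miss" if obs is None else math.floor(obs / bin_size)
        total, rep, rep_p = acc.get(key, (0.0, w, -1.0))
        if p > rep_p:
            rep, rep_p = w, p
        acc[key] = (total + p, rep, rep_p)
    return {k: (rep, total) for k, (total, rep, _) in acc.items()}


# Usage: nine integer cells, uniform belief, finger aimed at cell 0.
states = list(range(-4, 5))
belief = [1.0 / len(states)] * len(states)
table = canonical_table(range(-8, 9), probe_contact_height)
print(canonical_branches(0, states, belief, table, bin_size=1.0))
```

With a uniform belief over nine cells this yields three branches: a miss (4/9), an edge contact (4/9) and a centered contact (1/9), so the search branches three ways instead of nine.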

Algorithm

Off-line:
• plan WRTs for grasping and info gathering
• compute models

On-line:
• while current belief state doesn't satisfy goal:
  • compute expected info gain of each WRT
  • execute best WRT until termination
  • use observation to update current belief
  • return to initial pose
• execute final grasp trajectory
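A minimal sketch of the on-line loop, assuming a discrete belief, an object that does not move during a WRT, and per-WRT observation matrices `Z[wrt][s][o]` that are already precomputed (in the slides they are computed relative to the estimated state; here they are fixed for simplicity). Expected info gain is H(b) - sum over o of P(o) H(b | o). `execute` is a hypothetical callback that runs a WRT until termination, returns its observation, and brings the robot back to the initial pose; the final grasp is left to the caller.

```python
import math


def entropy(b):
    return -sum(p * math.log(p) for p in b if p > 0.0)


def posterior(b, Zw, o):
    """Bayes update with an object-static model: returns (P(o), new belief or None)."""
    weighted = [p * Zw[s][o] for s, p in enumerate(b)]
    p_o = sum(weighted)
    return p_o, ([x / p_o for x in weighted] if p_o > 0.0 else None)


def expected_info_gain(b, Zw, n_obs):
    """H(b) - sum_o P(o) H(b | o)."""
    gain = entropy(b)
    for o in range(n_obs):
        p_o, post = posterior(b, Zw, o)
        if p_o > 0.0:
            gain -= p_o * entropy(post)
    return gain


def run_online(b, Z, n_obs, goal_reached, execute, max_steps=20):
    """Z[wrt][s][o]: precomputed observation model per WRT (hypothetical layout).
    execute(wrt) -> observation index (hypothetical robot interface)."""
    steps = 0
    while not goal_reached(b) and steps < max_steps:
        best = max(Z, key=lambda wrt: expected_info_gain(b, Z[wrt], n_obs))
        o = execute(best)  # run until termination, then return to initial pose
        p_o, b = posterior(b, Z[best], o)
        if p_o == 0.0:
            raise ValueError("observation inconsistent with the model")
        steps += 1
    return b, steps  # the caller then executes the final grasp trajectory


# Usage: "info" reveals the state exactly, "noop" reveals nothing; true state is 2.
Z = {"noop": [[1, 0, 0]] * 3, "info": [[1, 0, 0], [0, 1, 0], [0, 0, 1]]}
b, steps = run_online([1 / 3] * 3, Z, 3, lambda b: max(b) > 0.9,
                      lambda wrt: 2 if wrt == "info" else 0)
print(b, steps)  # [0.0, 0.0, 1.0] 1
```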

Application to grasping with a simulated robot arm

Initial conditions (ultimately from vision)

• Object shape is roughly known (contacted vertices should be within ~1 cm of actual positions)

• Object is on the table and its pose (x, y, rotation) is roughly known (center-of-mass std ~5 cm, 30 deg)

Achieve a specific grasp of the object

Observations

Fingertips: 6-axis force/torque sensors
• contact position
• contact normal

Additional contact sensors:
• contact only (no position or normal)

A swept non-colliding path rules out poses that would have generated contact
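A sketch of how a contact-free sweep acts as negative information, assuming a hypothetical axis-aligned rectangular object footprint around each candidate (x, y) center and a swept path sampled as points; orientation is ignored to keep it short. Poses whose footprint contains any swept point would have produced contact, so their probability is set to zero and the belief is renormalized.

```python
def rule_out_swept(grid, belief, swept_points, half_x, half_y):
    """grid: (x, y) object centers. A pose is ruled out if any point on the
    contact-free swept path lies inside its (hypothetical) rectangular footprint."""
    def would_have_hit(pose):
        cx, cy = pose
        return any(abs(px - cx) <= half_x and abs(py - cy) <= half_y
                   for px, py in swept_points)

    new = [0.0 if would_have_hit(pose) else p for pose, p in zip(grid, belief)]
    total = sum(new)
    if total == 0.0:
        raise ValueError("every pose was ruled out; model or path is inconsistent")
    return [p / total for p in new]


# Usage: the finger swept through x = 4..6 without contact, ruling out x = 5.
grid = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
belief = [1 / 3] * 3
swept = [(4.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(rule_out_swept(grid, belief, swept, half_x=1.0, half_y=1.0))  # [0.5, 0.0, 0.5]
```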

Grasping a Box

Most likely robot-relative position vs. where it actually is

Initial belief state

Summed over theta

Tried to move down; finger hit corner

Probability of contact observation at each location

Updated belief

Re-centered

Trying again, with new belief

Back up; try again

Final state and observation

Grasp; observation probabilities

Updated belief state: Success!

Goal: variance < 1 cm in x, 15 cm in y, 6 deg in theta
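A sketch of the goal test. It assumes the thresholds on this slide are applied to per-dimension standard deviations (in cm and degrees) of the discrete belief, since they are stated in cm and deg; angle wraparound is ignored, and `belief_std` / `goal_reached` are hypothetical names.

```python
import math


def belief_std(grid, belief):
    """Per-dimension standard deviation of a discrete belief over (x, y, theta)."""
    dims = len(grid[0])
    means = [sum(p * pose[d] for pose, p in zip(grid, belief)) for d in range(dims)]
    var = [sum(p * (pose[d] - means[d]) ** 2 for pose, p in zip(grid, belief))
           for d in range(dims)]
    return [math.sqrt(v) for v in var]


def goal_reached(grid, belief, tol=(1.0, 15.0, 6.0)):
    """True when every dimension's spread is below its tolerance (cm, cm, deg)."""
    return all(s < t for s, t in zip(belief_std(grid, belief), tol))


# Usage: two candidate poses 2 cm apart in x.
grid = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(goal_reached(grid, [0.5, 0.5]), goal_reached(grid, [0.9, 0.1]))  # False True
```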

What if the y coordinate of the grasp matters?

Need explicit information gathering

Simulation Experiments

Methods tested:

• Single open-loop execution of goal-achieving WRT with respect to the most likely state

• Repeated execution of goal-achieving WRT with respect to the most likely state

• Online selection of information-gathering and goal-achieving grasps (1-step lookahead)

Box experiments

Allowed variation in goal grasp: 1 cm, 1 cm, 5 deg
Initial uncertainty: 5 cm, 5 cm, 30 deg

[Figure: percent grasped correctly (0 to 100) for open loop, repeated WRT, and repeated WRT with info-grasp]

Cup experiments


Goal: 1 cm x, 1 cm y; rotation doesn't matter (no info-grasps used)
Start uncertainty: 30 deg theta (x, y varies)

[Figure: percent grasped correctly (0 to 100) vs. uncertainty std in x, y (1, 3, 5 cm; increasing uncertainty) for open loop and repeated WRT]

Grasping a Brita Pitcher

Target grasp:

Put one finger through the handle and grasp

Brita Pitcher experiments

Brita Pitcher results

[Figure: percent grasped correctly (0 to 100) vs. uncertainty standard dev (cm, deg) at loc 1, rot 3; loc 3, rot 9; loc 5, rot 15; loc 5, rot 30 (increasing uncertainty), for open loop with perfect info, repeated WRT, hand-generated guarded moves, open loop with imperfect info, and repeated WRT with info-grasps]

Other recent probabilistic approaches to manipulation

Off-line POMDP solution for grasping (Hsiao et al. 2007)

Bayesian state estimation using tactile sensors to locate object before grasping (Petrovskaya et al. 2006)

Finding a fixed trajectory that is most likely to succeed under uncertainty (Alterovitz et al. 2007, Burns and Brock 2007)

The End.

Timing for Brita Pitcher

(2.16 GHz processor, 3.24 GB RAM running Python, times in seconds)

Uncertainty std (cm, deg) | 1 cm, 3 deg | 3 cm, 9 deg | 5 cm, 15 deg | 5 cm, 30 deg
Grid size | 5733 | 16337 | 14415 | 24025
Computing observation matrix (1 traj) | 12 | 33 | 29 | 51
1st belief-state update | 4 | 10 | 10 | 19
Choosing 1st info-grasp | 10 | 9 | 17 | 30

Number of Actions Used

Uncertainty std (cm, deg) | 1 cm, 3 deg | 3 cm, 9 deg | 5 cm, 15 deg | 5 cm, 30 deg
Robust execution of target | 1.9 | 2.5 | 3.3 | 3.5
Robust execution with info-grasps | not run | 4.4 | 4.1 | 4.2

Creating Information-gain Trajectories

Trajectory generation:
• generate endpoints, then use a randomized planner (such as OpenRAVE) to find a nominal collision-free path
• sweep through the entire workspace

Choose a small set based on information gain from the start uncertainty
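A sketch of choosing the small set, assuming each candidate trajectory's geometric simulation yields one deterministic predicted observation per discrete object state. In that case the expected information gain from the start belief equals the entropy of the predicted-observation distribution, I(S; O) = H(O), so candidates can be ranked by that entropy and the top k kept. Ranking candidates independently, rather than selecting a complementary set jointly, is a simplification; the function names are hypothetical.

```python
import math
from collections import defaultdict


def info_gain_deterministic(belief, predicted_obs):
    """predicted_obs[i]: observation the simulation predicts if the object is in
    state i. With deterministic observations, I(S; O) = H(O)."""
    p_obs = defaultdict(float)
    for p, o in zip(belief, predicted_obs):
        p_obs[o] += p
    return -sum(p * math.log(p) for p in p_obs.values() if p > 0.0)


def choose_info_trajectories(belief, candidates, k):
    """candidates: {name: predicted_obs list}; keep the k with the largest gain."""
    ranked = sorted(candidates,
                    key=lambda name: info_gain_deterministic(belief, candidates[name]),
                    reverse=True)
    return ranked[:k]


# Usage: four equally likely states and four candidate sweeps.
belief = [0.25] * 4
candidates = {
    "a": ["hit", "hit", "miss", "miss"],   # splits the states in half
    "b": ["hit", "miss", "miss", "miss"],  # isolates one state
    "c": ["miss"] * 4,                     # tells us nothing
    "d": [0, 1, 2, 3],                     # distinguishes every state
}
print(choose_info_trajectories(belief, candidates, k=2))  # ['d', 'a']
```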
