Robot Navigation: From Abilities to Capabilities
Machine Learning in Robot Motion Planning Workshop @ IROS 2018
Aleksandra Faust, Ph.D., Google Brain Robotics
October 2018


Page 1:

Robot Navigation: From Abilities to Capabilities
Machine Learning in Robot Motion Planning Workshop @ IROS 2018

Aleksandra Faust, Ph.D., Google Brain Robotics

October 2018

Page 2:

Page 3:

Go down the hallway and take the second right.

Navigation

Page 4:

Go down the hallway and take the second right.

Navigation

Perception

Planning

Controls

SLAM

Page 5:

Go down the hallway and take the second right.

Navigation

Sight

Hearing

Sensors, hardware, and geometry determine how the robot perceives the world.

Page 6:

Go down the hallway and take the second right.

Navigation

Sight

Hearing

And what it can do: how it can communicate, move, and interact.

Page 7:

Abilities

Sight

Hearing

Let's focus on the robot's abilities and learn the foundational motion behaviors.

Page 8:

Essential Navigation behavior

Moving obstacle avoidance

Sight

Hearing

Page 9:

Navigation behaviors based on robot’s abilities

● With primitive sensors
● Robust to noise
● Dynamically feasible
● Transfers between environments

Moving obstacle avoidance for real robots

Page 10:

Navigation behaviors based on robot’s abilities

● With primitive sensors
● Robust to noise
● Dynamically feasible
● Transfers between environments

Behaviors to learn:
● Point to point navigation
● Path following

Moving obstacle avoidance for real robots

Page 11:


Learning navigation behaviors end to end. Under submission, https://arxiv.org/abs/1809.10124

Hao-Tian Lewis Chiang*, Aleksandra Faust*, Marek Fiser, Anthony Francis

*Equal contributions


Page 12:

Learning navigation task

[Figure: a DDPG agent [Lillicrap et al. 2015] interacting with a 22x18 m simulated world. The world sends observations; the actor-critic policy 𝛑θ(o, a) = P(a|o), with parameters θ, maps each observation o to an action. True objective: reach the goal.]

[Chiang et al., under submission]

Page 13:

Learning navigation task

[Figure: the same DDPG diagram, annotated with related work on learning from sensor input and on handling task and robot dynamics: [Bagnell and Schneider '01], [Mülling et al., '11], [Faust et al, '14], [Levine et al., '16], [Yahya et al., '16]. Helicopter image from [Kober et al, '13].]

Page 14:

Learning Navigation Task

[Figure: DDPG agent [Lillicrap et al. 2015] in a 22x18 m world. Observations: noisy 1D lidar + goal + orientation. Actions: velocity and orientation @ 5 Hz. The actor-critic policy 𝛑θ(o, a) = P(a|o), with parameters θ, maps observations to actions. True objective: reach the goal.]

[Chiang et al., under submission]
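To make this setup concrete, here is a minimal sketch of the observation and action interface the slide describes. The slide only specifies noisy 1D lidar + goal + orientation observations and velocity/orientation actions at 5 Hz; the number of lidar beams, the noise level, and the velocity limits below are illustrative assumptions.

```python
import numpy as np

# Assumed constants for illustration; the slide only states "noisy 1D lidar +
# goal + orientation" observations and velocity/orientation actions at 5 Hz.
NUM_BEAMS = 64          # assumed lidar resolution
CONTROL_HZ = 5          # control frequency stated on the slide
MAX_LINEAR_VEL = 0.5    # assumed limit, m/s
MAX_ANGULAR_VEL = 1.0   # assumed limit, rad/s

def make_observation(lidar_ranges, goal_xy, heading, noise_std=0.1):
    """Pack one observation o = [noisy lidar, relative goal, orientation]."""
    noisy_lidar = np.asarray(lidar_ranges) + np.random.normal(0.0, noise_std, NUM_BEAMS)
    return np.concatenate([noisy_lidar, goal_xy, [np.sin(heading), np.cos(heading)]])

def clip_action(raw_action):
    """Clip a 2D action [linear velocity, angular velocity] to the assumed limits."""
    v, w = raw_action
    return np.array([np.clip(v, 0.0, MAX_LINEAR_VEL),
                     np.clip(w, -MAX_ANGULAR_VEL, MAX_ANGULAR_VEL)])
```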

Page 15:

RL Setup is Hard

[Figure: the same DDPG diagram. The reward is parameterized: r(s | θr) = R(s, θr). Selecting the reward is hard.]

[Chiang et al., under submission]

Page 16:

RL Setup is Hard

[Figure: the same DDPG diagram, with the actor expanded into three feed-forward layers and the critic into two feed-forward layers plus a Q output. True objective: reach the goal.]

r(s | θr) = R(s, θr). Selecting the reward is hard.

Network architecture selection is hard.

[Chiang et al., under submission]
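The actor and critic in the diagram are plain feed-forward networks. Below is a minimal Keras sketch of that structure; the layer widths and the observation dimension are placeholders, since choosing them well is exactly the problem the following slides address.

```python
import tensorflow as tf

OBS_DIM = 68   # assumed: 64 lidar beams + relative goal (2) + orientation (2)
ACT_DIM = 2    # linear and angular velocity

def build_actor(widths=(50, 50, 50)):
    """Three feed-forward layers mapping an observation to an action."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(widths[0], activation="relu", input_shape=(OBS_DIM,)),
        tf.keras.layers.Dense(widths[1], activation="relu"),
        tf.keras.layers.Dense(widths[2], activation="relu"),
        tf.keras.layers.Dense(ACT_DIM, activation="tanh"),  # scaled to velocity limits elsewhere
    ])

def build_critic(widths=(50, 50)):
    """Two feed-forward layers mapping (observation, action) to a Q value."""
    obs_in = tf.keras.Input(shape=(OBS_DIM,))
    act_in = tf.keras.Input(shape=(ACT_DIM,))
    x = tf.keras.layers.Concatenate()([obs_in, act_in])
    x = tf.keras.layers.Dense(widths[0], activation="relu")(x)
    x = tf.keras.layers.Dense(widths[1], activation="relu")(x)
    q = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model([obs_in, act_in], q)
```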

Page 17:

Shaped-DDPG

[Figure: DDPG agents training in parallel → evaluate → select new weights → spawn new training agent → select best policy.]

Solution: large-scale gradient-free hyper-parameter optimization.

Page 18:

Shaped-DDPG

Shape the reward, fixed network.

[Figure: DDPG agents training in parallel → evaluate → select new weights → spawn new training agent → select best policy.]

Solution: large-scale gradient-free optimization.

r(s | θr) = R(s, θr). Find the best reward function that maximizes the true objective.

[Chiang et al., under submission]
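For illustration, here is one way such a parameterized reward r(s | θr) could look. The specific shaping terms (progress toward the goal, collision penalty, step penalty, goal bonus) are assumptions; the slide only states that the reward has tunable parameters θr and that they are chosen to maximize the true objective.

```python
def shaped_reward(theta_r, prev_goal_dist, goal_dist, collided, reached_goal):
    """Example parameterized reward r(s | theta_r); the terms are illustrative assumptions.

    theta_r = [w_progress, w_collision, w_step, w_goal]
    """
    w_progress, w_collision, w_step, w_goal = theta_r
    reward = w_progress * (prev_goal_dist - goal_dist)  # reward progress toward the goal
    reward -= w_collision * float(collided)             # penalize collisions
    reward -= w_step                                    # small per-step penalty
    reward += w_goal * float(reached_goal)              # bonus for the true objective
    return reward
```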

Page 19:

Shaped-DDPG

Shape the reward, fixed network. Each agent uses a different reward function.

[Figure: DDPG agents training in parallel → evaluate → select new weights → spawn new training agent → select best policy.]

Solution: large-scale gradient-free optimization.

r(s | θr) = R(s, θr). Find the best reward function that maximizes the true objective.

[Chiang et al., under submission]

Page 20:

Shaped-DDPG

Shape the reward, fixed network (fixed nn): find the best reward function r(s | θr) = R(s, θr) that maximizes the true objective.

Shape the actor and critic, fixed reward (fixed r(s | θr)): search over the number of neurons in each layer of the actor and critic to find the NN architecture that maximizes the reward.

[Figure: DDPG agents training in parallel → evaluate → select new weights → spawn new training agent → best policy.]

Solution: large-scale gradient-free optimization.

[Chiang et al., under submission]
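A minimal sketch of the outer loop from these slides: a population of DDPG agents trains in parallel, each with different hyper-parameters (reward weights in one phase, layer widths in the other); finished agents are evaluated, and new agents are spawned around the best parameters found so far. The simple perturb-and-select rule below stands in for whatever gradient-free optimizer is actually used, and `train_ddpg` / `evaluate_true_objective` are placeholders for the expensive training and evaluation jobs.

```python
import numpy as np

def shape_hyperparameters(initial_params, train_ddpg, evaluate_true_objective,
                          generations=10, population=20, noise_std=0.1):
    """Gradient-free search over DDPG hyper-parameters (e.g. reward weights or layer widths).

    train_ddpg(params) -> policy and evaluate_true_objective(policy) -> score are
    placeholders; in the real system these agents train and evaluate in parallel.
    """
    best_params = np.asarray(initial_params, dtype=float)
    best_score = -np.inf
    for _ in range(generations):
        # Spawn a generation of agents, each with perturbed hyper-parameters.
        candidates = [best_params + np.random.normal(0.0, noise_std, best_params.shape)
                      for _ in range(population)]
        for params in candidates:
            policy = train_ddpg(params)                 # expensive inner training
            score = evaluate_true_objective(policy)     # evaluate on the true objective
            if score > best_score:                      # select new weights
                best_params, best_score = params, score
    return best_params, best_score
```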

Page 21:

Shaped-DDPG

Shape the reward, fixed network (fixed nn): each agent uses a different reward function r(s | θr) = R(s, θr); find the one that maximizes the true objective.

Shape the actor and critic, fixed reward (fixed r(s | θr)): each agent uses a different neural network architecture (number of neurons in each layer of the actor and critic); find the one that maximizes the reward.

[Figure: DDPG agents training in parallel → evaluate → select new weights → spawn new training agent → best policy.]

Solution: large-scale gradient-free optimization.

[Chiang et al., under submission]

Page 22:

Shaped-DDPG Learning Results for Path Following

Reward only shaping

1000 trials, 5 million steps each @ 5 Hz; trains in a week. [Chiang et al., under submission]

Page 23:

Shaped-DDPG Learning Results for Path Following

Reward-only shaping | Reward and NN shaping

Stable learning, consistent trials

1000 trials, 5 million steps each @ 5 Hz; trains in a week. [Chiang et al., under submission]

Page 24:

Shaped-DDPG Learning Results for Path Following

Reward-only shaping | Reward and NN shaping

Stable learning, consistent trials

1000 trials, 5 million steps each @ 5 Hz; trains in a week.

Equivalent of 32 years of collective experience. 12 days each trial, learning from previous generations.

[Chiang et al., under submission]
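A quick arithmetic check of these figures, using the numbers stated on the slide (1000 trials, 5 million steps per trial, 5 steps per second):

```python
steps_per_trial = 5_000_000
control_hz = 5
trials = 1000

seconds_per_trial = steps_per_trial / control_hz                  # 1,000,000 s
days_per_trial = seconds_per_trial / 86_400                       # ~11.6 days ("12 days each trial")
collective_years = trials * seconds_per_trial / (365 * 86_400)    # ~31.7 years ("32 years of experience")
print(round(days_per_trial, 1), round(collective_years, 1))
```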

Page 25:

Shaped-DDPG Learning Results for Path Following

[Figure: learned actor network with layers of 323, 47, and 560 neurons.]

[Chiang et al., under submission]

Page 26:

Shaped-DDPG Evaluation

● Two baselines
  ○ Learned: vanilla DDPG
  ○ Classic: artificial potential fields
● Evaluation environments
  ○ 3 large buildings
  ○ With moving obstacles

[Figure: floor plans. Training, 23 by 18 m; Building 1, 183 by 66 m; Building 2, 60 by 47 m; Building 3, 134 by 93 m.]

[Chiang et al., under submission]

Page 27:

Shaped-DDPG Evaluation: Success Rate

[Figure: success rate of shaped-DDPG, vanilla DDPG, and classic APF, for point-to-point and path-following tasks across the buildings.]

Higher success rate across all buildings.

[Chiang et al., under submission]

Page 28:

Shaped-DDPG vs. vanilla DDPG

Shaped-DDPG: smooth trajectories

Vanilla DDPG: suboptimal behavior

[Chiang et al., under submission]

Page 29:

Shaped-DDPG Evaluation: Impact of Noise

Point to Point

Path Following

Shaped-DDPG policy is more robust to noise

[Chiang et al., under submission]

Page 30:

Shaped-DDPG: On-robot experiments

Point to point | Path following. [Chiang et al., under submission]

Page 31:

Page 32:

Impact of Number of Obstacles

No moving obstacles | 30 moving obstacles. [Chiang et al., under submission]

Page 33:


Page 34:

Navigation behaviors based on robot’s abilities

Handles: sensors to controls, dynamics, noise; obstacle avoidance.
Transferable to new environments; easy sim2real.

Learned end-to-end methods handle noise.

Shaping optimizes the trajectories.

Traditional methods: well behaved, but brittle.

Page 35:


Navigation capabilities: learn to navigate by looking at a map

PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning. ICRA 2018, https://arxiv.org/abs/1710.03937. Best paper in Service Robotics.

Aleksandra Faust, Oscar Ramirez, Marek Fiser, Kenneth Oslund, Anthony Francis, James Davidson, Lydia Tapia


Page 36:

How to navigate to a location on a map?

Handles: sensors to controls, dynamics, noise; obstacle avoidance.
Transferable to new environments; easy sim2real.

Lacks the context needed for navigation.

Page 37:

Related work: Sampling-based planners

● Long-distance collision-free navigation
  ○ Approximate all possible robot motions
  ○ Sample and connect robot poses
  ○ Connect them with small, feasible motion transitions
● Checking the validity of pose transitions is expensive
● Sampling-based planners that create reusable roadmaps
  ○ Often consider geometry only

[Kavraki et al. '96], [LaValle & Kuffner, '01], [Hsu et al., '02], [Hauser et al., '06]

Page 38:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]

Page 39:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space

Page 40:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples

Page 41:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples
  ○ Connect samples only if a local planner finds a collision-free path

Page 42:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples
  ○ Connect samples only if a local planner finds a collision-free path

Page 43:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples
  ○ Connect samples only if a local planner finds a collision-free path
● Querying

Page 44:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples
  ○ Connect samples only if a local planner finds a collision-free path
● Querying
  ○ Add start and goal to the roadmap

Page 45:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples
  ○ Connect samples only if a local planner finds a collision-free path
● Querying
  ○ Add start and goal to the roadmap
  ○ Find the shortest path in the graph

Page 46:

Related Work: Probabilistic Roadmaps (PRMs)

● Building [Kavraki et al. '96]
  ○ Sample configuration space
  ○ Reject in-collision samples
  ○ Connect samples only if a local planner finds a collision-free path
● Querying
  ○ Add start and goal to the roadmap
  ○ Find the shortest path in the graph
● Path following
  ○ Path-guided artificial potential fields [Chiang et al. 2015]
  ○ Reinforcement learning [Faust et al. 2017]
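A compact sketch of the build-and-query pipeline described above, using networkx for the graph. The collision-free sampler and the local planner are supplied by the caller, and for brevity the query assumes the start and goal have already been added as roadmap nodes.

```python
import networkx as nx
import numpy as np

def build_prm(sample_free_pose, local_planner_connects, num_samples=500,
              connection_radius=2.0):
    """Build a roadmap: sample collision-free poses and connect nearby pairs
    whenever the local planner finds a collision-free path between them."""
    graph = nx.Graph()
    poses = [np.asarray(sample_free_pose()) for _ in range(num_samples)]
    for i, pose in enumerate(poses):
        graph.add_node(i, pose=pose)
    for i in range(num_samples):
        for j in range(i + 1, num_samples):
            dist = float(np.linalg.norm(poses[i] - poses[j]))
            if dist <= connection_radius and local_planner_connects(poses[i], poses[j]):
                graph.add_edge(i, j, weight=dist)
    return graph

def query_prm(graph, start_id, goal_id):
    """Query: shortest waypoint path between two roadmap nodes."""
    return nx.shortest_path(graph, start_id, goal_id, weight="weight")
```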

Page 47:

PRM-RL Algorithm

Trained point-to-point RL agent: the basic navigation behavior.

[Faust et al., ICRA 2018]

Page 48:

PRM-RL Algorithm

One-time setup.

Trained point-to-point RL agent: the basic navigation behavior.

[Faust et al., ICRA 2018]

Page 49:

PRM-RL Algorithm

Add an edge only if the RL agent can consistently navigate between the two nodes.

One-time setup.

Trained point-to-point RL agent: the basic navigation behavior.

[Faust et al., ICRA 2018]

Page 50:

PRM-RL Algorithm

Add an edge only if the RL agent can consistently navigate between the two nodes.

One-time setup.

Execute long trajectories.

Trained point-to-point RL agent: the basic navigation behavior.

[Faust et al., ICRA 2018]
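In PRM-RL the local planner of the classic PRM is replaced by the trained point-to-point agent. A hedged sketch of the edge test: an edge is added only if Monte Carlo rollouts of the agent between the two poses succeed consistently. The rollout function and the success threshold are placeholders (a later slide mentions 20 trials).

```python
def rl_agent_connects(rollout_policy, start_pose, goal_pose,
                      num_trials=20, success_threshold=0.9):
    """Edge test for building a PRM-RL roadmap.

    rollout_policy(start, goal) -> bool is a placeholder that runs the trained
    point-to-point agent in simulation with sensor and action noise. The edge is
    kept only if the agent navigates between the poses consistently; the 0.9
    threshold here is an assumption.
    """
    successes = sum(rollout_policy(start_pose, goal_pose) for _ in range(num_trials))
    return successes / num_trials >= success_threshold
```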

Page 51:

PRM-RL: Indoor Navigation, Building PRMs

[Figure: floor plans of the evaluation buildings, 180x65 m, 60x47 m, and 134x92 m; up to 60x larger than the training environment.]

20 trials with 85% confidence.

[Faust et al., ICRA 2018]

Page 52:

PRM-RL: Indoor Navigation, Building PRMs

One-time set-up: 2 hours to build.

Largest roadmap: 1,700 nodes, 60,000 edges, 23 million collision checks.

[Faust et al., ICRA 2018]

Page 53:

PRM-RL: Results for Indoor Navigation

Longest trajectory: 215 meters, 45 waypoints.

[Faust et al., ICRA 2018]

Page 54:

PRM-RL Experimental Results

Four noisy trials

All successful, because the map is tuned to the robot’s abilities

[Faust et al., ICRA 2018]

Page 55:

How to navigate to a location on a map?

Handles: sensors to controls, dynamics, noise; obstacle avoidance.
Transferable to new environments; easy sim2real.

Navigates over long distances.

Requires a one-time set-up.

Page 56:


Navigation capabilities: Following Directions

Following Natural Language Navigation Instructions with Deep Reinforcement Learning. Under submission.
Aleksandra Faust, Chase Kew, Dilek Hakkani-Tur, Marek Fiser, Pararth Shah

FollowNet: Towards Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning. MLPC @ ICRA 2018, https://arxiv.org/abs/1805.06150.
Pararth Shah, Marek Fiser, Aleksandra Faust, Chase Kew, Dilek Hakkani-Tur


Page 57:

How to follow directions?

Handles: sensors to controls, dynamics, noise; obstacle avoidance.
Transferable to new environments; easy sim2real.
One-time building / robot setup.

Go down the hallway and take the second right.

Hearing

Page 58:

Following instructions

Go down the hallway and take the second right.

Start

Goal

[Shah et al., 2018]

Page 59:

Following instructions

Go down the hallway and take the second right.

Start

Goal

[Shah et al., 2018]

Dataset: 150 instructions, 2 buildings

Page 60:

Following instructions: Related work

[Figure: related work plotted by language complexity (processed language tokens → unfiltered, natural language) and environment complexity (full environment observability → partial observations), with FollowNet at unfiltered natural language and partial observations. Cited works: [Thomason, et al, 2015], [Mei et al., 2016], [Misra et al. 2017], [Arumugam et al. 2017], [Thomason et al., 2017], [Anderson et al., 2017], [Chaplot et al., 2018], [Das et al, 2018], [Yu et al. 2018].]

[Shah et al., 2018]

Page 61:

Following instructions: RL Training

[Figure: a DQN agent [Mnih et al. 2015] interacting with the 22x18 m world. Observations: images and the natural-language instruction ("Go down the hallway and take a second right."). Actions: turn left, turn right, go straight. Reward: waypoint reached. The FollowNet architecture computes the policy 𝛑θ(o, a) = P(a|o) with parameters θ.]

[Shah et al., 2018]
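A minimal sketch of the pieces named on this slide: the three discrete actions, a sparse reward for reaching the next instruction waypoint, and the standard DQN bootstrap target from [Mnih et al. 2015]. The waypoint radius and discount factor are illustrative assumptions.

```python
import numpy as np

ACTIONS = ("turn_left", "turn_right", "go_straight")   # discrete action set from the slide

def waypoint_reward(robot_xy, waypoint_xy, reached_radius=0.5):
    """Sparse reward when the next instruction waypoint is reached (radius assumed)."""
    dist = np.linalg.norm(np.asarray(robot_xy) - np.asarray(waypoint_xy))
    return 1.0 if dist < reached_radius else 0.0

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN target y = r + gamma * max_a' Q(o', a'), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * float(np.max(next_q_values)))
```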

Page 62:

FollowNet: Architecture

[Figure: network blocks annotated with: 3, 8, and 16 outputs, [1,1], [4,4], [3,3] kernels, 1, 2, 1 strides; 8 and 16 outputs, [4,4], [3,3] kernels, 2, 1 strides; 32 outputs; 16 and 8 hidden layers; 16 hidden states.]

[Shah et al., 2018]
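A rough Keras sketch that matches the layer sizes listed above. The slide does not spell out how the blocks are wired, which convolutional stack sees which observation stream, or where the attention sits, so the wiring below (two image streams, an embedding feeding a 16-unit recurrent layer, and the attention mechanism omitted entirely) is an assumption for illustration; the image sizes and vocabulary are also placeholders.

```python
import tensorflow as tf

VOCAB_SIZE = 1000          # assumed instruction vocabulary size
MAX_INSTRUCTION_LEN = 20   # assumed maximum instruction length
NUM_ACTIONS = 3            # turn left, turn right, go straight

def build_follownet_like(image_shape=(64, 64, 3), depth_shape=(64, 64, 1)):
    """FollowNet-shaped DQN sketch: two conv stacks, a 16-state recurrent unit over
    the instruction, and 16- and 8-unit dense layers to Q values (attention omitted)."""
    # Conv stack with 3, 8, 16 outputs; [1,1], [4,4], [3,3] kernels; 1, 2, 1 strides.
    image_in = tf.keras.Input(shape=image_shape)
    x = tf.keras.layers.Conv2D(3, (1, 1), strides=1, activation="relu")(image_in)
    x = tf.keras.layers.Conv2D(8, (4, 4), strides=2, activation="relu")(x)
    x = tf.keras.layers.Conv2D(16, (3, 3), strides=1, activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)

    # Conv stack with 8, 16 outputs; [4,4], [3,3] kernels; 2, 1 strides.
    depth_in = tf.keras.Input(shape=depth_shape)
    y = tf.keras.layers.Conv2D(8, (4, 4), strides=2, activation="relu")(depth_in)
    y = tf.keras.layers.Conv2D(16, (3, 3), strides=1, activation="relu")(y)
    y = tf.keras.layers.Flatten()(y)

    # Instruction: 32-output embedding into a recurrent layer with 16 hidden states.
    tokens_in = tf.keras.Input(shape=(MAX_INSTRUCTION_LEN,))
    z = tf.keras.layers.Embedding(VOCAB_SIZE, 32)(tokens_in)
    z = tf.keras.layers.GRU(16)(z)

    # Feed-forward layers with 16 and 8 units, then Q values over the 3 actions.
    h = tf.keras.layers.Concatenate()([x, y, z])
    h = tf.keras.layers.Dense(16, activation="relu")(h)
    h = tf.keras.layers.Dense(8, activation="relu")(h)
    q_values = tf.keras.layers.Dense(NUM_ACTIONS)(h)
    return tf.keras.Model([image_in, depth_in, tokens_in], q_values)
```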

Page 63:

FollowNet: Results

Better learning curve than the baseline

Baseline: model without attention

Attention over steps

Learns what is important

Learns what to ignore

[Faust et al., under submission]

Page 64:

Instruction complexity

The number of waypoints measures instruction complexity, not the number of words or the path length.

[Faust et al., under submission]

Page 65:

FollowNet on New Instructions

52% success on new instructions; 67% at least partial success.

30% increase over the baseline.

Page 66:

FollowNet performance per word

Words with spatial semantics are more likely to be successful

Landmarks, less so

[Faust et al., under submission]

Page 67:

Page 68:

How to follow directions?

Handles: sensors to controls, dynamics, noise; obstacle avoidance.
Transferable to new environments; easy sim2real.
One-time building / robot setup.

Go down the hallway and take the second right.

Does not require building set-up. Promising results.

Page 69:

From robot abilities to navigation capabilities

Go down the hallway and take the second right.

End-to-end basic navigation behaviors. End-to-end complex navigation tasks.

Hearing

Page 70:

Go down the hallway and take the second right.

Navigation

Perception

Planning

Controls

SLAM

Page 71:

Go down the hallway and take the second right.

Navigation

Basic behaviors → Simple tasks → Complex tasks

Sight

Hearing

Page 72:

Navigation

Basic behaviors → Simple tasks → Complex tasks

Sight

Hearing

Thank you! Questions?

Page 73:

Thank you

Anthony Francis, J. Chase Kew, Hao-Tien Chiang, Marek Fiser, Dilek Hakkani-Tur, Pararth Shah, James Davidson, Ken Oslund, Oscar Ramirez, Lydia Tapia