
Autonomous Aerial Search and Mapping of Unknown Built

Environments

by

C. P. Charland

B.S., University of Colorado Boulder, 2020

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Master of Science

Department of Aerospace Engineering Sciences

2020

Committee Members:

Prof. Nisar Ahmed

Prof. Morteza Lahijanian

Prof. Zachary Sunberg


Charland, C. P. (M.S., Aerospace Engineering Sciences)

Autonomous Aerial Search and Mapping of Unknown Built Environments

Thesis directed by Prof. Nisar Ahmed

In this thesis, an overview of real time decision making processes and observation

frameworks is given. These processes are then used to investigate the simultaneous search

and mapping problem. This problem describes an agent in an unknown indoor environment,

attempting to determine both a map of the environment as well as the pose of targets in this

environment. This work formulates the problem as a partially observable Markov decision

process (POMDP), using a Monte-Carlo tree search based method to solve the POMDP.

To improve the efficiency of the planner, wall projection, A* augmentation, and progressive

widening are employed. Simulation results in both a simple two dimensional world and a

three dimensional realistic environment show this framework is successful at planning in

the problem space. This augmented planner captures the target 9.2% faster than the naïve

planner when operating in a representative environment.


Acknowledgements

I would like to thank Prof. Nisar Ahmed for his understanding, compassion, and guidance throughout the creation of this document, as well as the other members of the committee for their guidance on this project.


Contents

Chapter

1 Introduction
1.1 Previous Work
1.2 Thesis Contributions

2 Background
2.1 Sensing
2.2 Map Representations

3 Problem Formulation and Solution Approach for 2D Environments
3.1 Problem Introduction
3.1.1 Quadrotor Dynamics
3.2 Formal Problem Definition
3.3 Approximations to the POMDP
3.3.1 Monte Carlo Tree Search
3.4 Algorithms
3.5 Naïve Planning
3.6 Double Progressive Widening
3.6.1 Determining Tuning Parameters
3.6.2 POMCP-DPW Results
3.7 Wall Continuity
3.7.1 The Hough Transform
3.8 A* Rollout Biasing
3.8.1 Blob Detection
3.9 Results

4 Extending to Three Dimensions
4.1 Mapping in 3D
4.2 System Architecture
4.3 Results

5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work

Bibliography


Tables

Table

2.1 Realsense D435 Parameters
3.1 Baseline Results
3.2 Tested Parameter Values
3.3 Results
3.4 Results
3.5 Results
3.6 Results of Fully Augmented Planner vs Naïve Planner
4.1 Results of Fully Augmented Planner vs Naïve Planner in 3D Case


Figures

Figure

2.1 Occupancy Grid
3.1 Testbed Map approximately 50 x 50 meters
3.2 Map with Target Shown in Red
3.3 Nonlinear quadrotor dynamics
3.4 Resulting positions of a 'forward' command
3.5 Translate Forward
3.6 Rotate Right
3.7 Observation Reward vs. Number of Observations for a Single Cell
3.8 Simulated Observation
3.9 POMCP Tree
3.10 Malformed POMCP Tree
3.11 Typical run of Naive Planner
3.12 Parameter Tuning
3.13 Hough Values
3.14 Result of Hough Transform
3.15 Wall Projection
3.16 Wall Biasing for 2D Indoor Environment
3.17 Run of the planner without A* augmentation
3.18 With A* augmentation
3.19 Total map entropy over 30 runs with finite time
3.20 Mean Reward Gain Over Time
4.1 Octomap Volume Breakdown
4.2 Hector Robot in the Simulated Space
4.3 Top Down View of Willow Garage
4.4 Target Sphere
4.5 Observed Occupied Space
4.6 3D Planner Reward Accumulation


Chapter 1

Introduction

Search and rescue operations often involve entering structures that are not safe; in these situations, a robotic agent capable of entering the structure and searching for survivors is invaluable. A highly maneuverable robot, such as a quadrotor, could traverse spaces that a ground rover could not, while also having a higher vantage point from which to view the scene. On the battlefield, such an agent could enter a space, determine the location of enemy combatants, and relay this information to the war-fighter on the ground prior to entering a dangerous building. In all of these scenarios, having an accurate map of the space prior to entry allows responders to navigate the structure more safely, potentially keeping them out of harm's way.

Robotics problems commonly focus on either the search problem or the mapping problem. Typical mapping approaches focus on building the best possible map of the environment, whereas typical search problems work with a known map. Planning frameworks, however, often are not built to both search and map at the same time. This thesis investigates searching and mapping through an unknown environment, as well as methods that allow a planner in this space to use domain knowledge to plan more efficiently in an online manner with real sensor data.

1.1 Previous Work

The previous work in this domain can be broken into three main groups: works that focus on building a map of an unknown environment, works that focus on searching through a known map for a target, and general planning algorithms. Beginning with mapping, one of the earliest methods for mapping an unknown environment was investigated by Yamauchi [19], who describes a method of traversing a map towards the boundaries between unexplored and explored space. Yamauchi's was one of the first papers to discuss map exploration methods that are not based on wall following with a simple topological planner. More modern approaches for map building and navigation through an unknown space, such as the methods proposed by Richter and Roy [13], attempt to traverse an unknown map quickly by predicting collisions based on training data. However, this method does not allow the agent to take actions to reduce the uncertainty of the states.

In the mapping domain, extensive work has been done, namely in the area of representing the physical world as a discretized set of objects. The major introduction into the mapping world occurred in the mid 1980s with Elfes' sonar based mapping in unknown environments [5]. This work describes a method of representing the world as a set of "pixel like" volumes, from which the map is then incrementally built up as the robot receives more measurements about the environment. The robotics community then shifted its focus heavily towards simultaneous localization and mapping problems, with numerous works published on the topic since the problem was first coined by Leonard and Durrant-Whyte in 1991 [8]. From there many works explored mapping as it pertained to the SLAM problem. Moving into mapping in three dimensions is important, as aerial robots such as quadcopters and multirotors interact with the environment in three dimensions. An extension of the ideas of two dimensional modeling is shown in Hornung's Octomap 3D mapping framework [7]. This work describes a representation based on occupied cubic volumes whose resolution can be scaled dynamically to nearly any level as needed during run-time. The Octomap framework is used extensively in this thesis for the 3D simulations of the quadcopter environment. Leveraging domain knowledge for map building is important in this work as well. Luperto [9] discusses a method of extracting building layouts from grid based maps; this method takes advantage of the regularity of building geometry to build a polygon based map of the environment. This thesis looks at a different, novel strategy for building "map priors" online for dynamic mapping and motion planning in unknown indoor environments.

Decision making frameworks are pervasive throughout the literature, and there are many different methods for solving decision making problems. In this work I focus on methods for solving partially observable Markov decision processes (POMDPs). Silver et al. [15] describe an online Monte Carlo tree search method known as partially observable Monte Carlo planning (POMCP). Monte Carlo tree search methods are surveyed at length by Browne [3], mainly focusing on methods of increasing the computational efficiency of tree generation. POMCP is modified by Sunberg [17] to account for the numerical difficulties of generating plans in problems with large state and observation spaces, such as the one investigated in this work. This thesis examines techniques for translating methods like POMCP into practical online planners for autonomous robots working with realistic sensor data in unknown environments.

1.2 Thesis Contributions

In this thesis, a simultaneous search and mapping problem is formalized as a POMDP. This framing of the simultaneous search and mapping problem is the major contribution of this thesis. As a second contribution, a planning framework is built to solve this POMDP online, with real robotic sensing and perception data to support mapping and target search. Finally, as a third contribution, the performance of this online POMDP planning framework is characterized using realistic 2D and 3D environment single-target search and mapping simulations for an autonomous multirotor platform. Several key augmentations to this planner are discussed, as they are needed to improve the performance of the planner online and to incorporate domain knowledge. A simulation environment in Gazebo representing a realistic scenario with realistic sensors was developed to test the proposed planning and sensing framework.


Chapter 2

Background

2.1 Sensing

The way in which the robot receives information about the environment is a critical component of the system design. For this problem the main sensing modality is a structured light depth camera along with a standard RGB camera, based on the Intel Realsense D435. This camera system uses a structured light projector to generate a depth image in infrared. This image is then registered to the pose of the robot, and the map is updated with the new depth data. To find the target, an object classifier is run on the RGB image output of the camera. This provides a detection or non-detection of the object in the frame, but gives no information about the location of the object within the frame. Thus the pose of the object is not directly observable by the robot.

In order to simulate future observations given a sampled map and target position, an observation model of the sensor p(z_t | x_t) is required, where z_t and x_t are the observation and the total state, respectively. To simulate this measurement, the methods described by Mallick [10] are used here. The major parameters of this sensor are listed below in Table 2.1.

Parameter       Value
Field of View   87° (horizontal) x 58° (vertical) x 95° (diagonal)
Resolution      848 x 480
Frame Rate      90 fps

Table 2.1: Realsense D435 Parameters


The variance in a depth measurement can be fit by the following equation, described by Sung Ahn [1]:

σ_z(z, θ_y) = 0.001063 + 0.0007278 z + 0.003949 z^2 + 0.022 z^(3/2) θ_y / (π/2 − θ_y)^2    (2.1)

where z and θ_y are, respectively, the distance to the obstacle and the angle of the ray from the camera to the sensed object. The mean of the error is 0, and the error can be modeled as a Gaussian random variable N(0, σ_z). This error is then fed into the occupancy grid implementation discussed in the next section.
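To make this noise model concrete, the following minimal Python sketch (illustrative only, not the thesis's MATLAB/C++ implementation) evaluates Eq. 2.1 and draws one noisy depth sample; the function names are hypothetical.

    import numpy as np

    def depth_stddev(z, theta_y):
        # Depth noise standard deviation from Eq. 2.1.
        # z: distance to the obstacle [m]; theta_y: ray angle [rad].
        return (0.001063 + 0.0007278 * z + 0.003949 * z**2
                + 0.022 * z**1.5 * theta_y / (np.pi / 2 - theta_y)**2)

    def sample_depth(true_z, theta_y, rng=None):
        # Simulate one noisy depth measurement: true depth plus N(0, sigma_z).
        rng = np.random.default_rng() if rng is None else rng
        return true_z + rng.normal(0.0, depth_stddev(true_z, theta_y))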

Object (i.e., target of interest) detection is not the main focus of this work. However, object detection in this work was completed using the OpenCV implementation of the You Only Look Once (YOLO) detection framework [12] trained on the COCO dataset, where the object of interest was taken to be a static red ball of diameter 0.5 meters. The detection rate of this object detector for the ball was set to 65.5%, as described in the 2010 paper. The actual rate of detection on image frames fully containing the object was observed to be empirically higher than this threshold.

2.2 Map Representations

Occupancy grids are probabilistic discrete approximations of a continuous partially

occluded space, where the map is broken into grid cells of finite size.

Figure 2.1: Occupancy Grid


Fig. 2.1 above shows a typical occupancy grid, where the grey cells have not yet been observed, the white cells are probably unoccupied, and the black cells are probably occupied. The vision cone shown in the picture (the triangle) depicts the robot sensor's limited field of view. The belief over the occupancy of any given cell is represented by a binomial distribution. One of the major assumptions of the occupancy grid representation is independence of the cells; that is, the probability of occupancy for each cell is independent of the occupancy state of the surrounding cells. This assumption and its limitations will be further examined in this thesis.

The log odds representation of occupancy over the map is given in Eq. 2.2, where m_i is the grid cell at index i and p(m_i) ∈ [0, 1] is its occupancy probability. Using the log odds representation of occupancy prevents numerical instability near zero and one.

l_{t,i} = log[ p(m_i | z_{1:t}, x_{1:t}) / (1 − p(m_i | z_{1:t}, x_{1:t})) ]    (2.2)

Algorithm 1 occupancy grid mapping(l_{t−1,i}, x_t, z_t)
  l_{t−1,i} = occupancy log odds of cell i
  x_t = robot pose
  z_t = measurement
  for all cells m_i do
    if m_i is in the field of view of z_t then
      l_{t,i} = l_{t−1,i} + inverse_model(m_i, x_t, z_t) − l_0
    else
      l_{t,i} = l_{t−1,i}
    end if
  end for

As described in Thrun [18] Algorithm 1 updates the occupancy probability of the

grid cell i based on the observed measurement zt written in log odds form. l0 is the prior

probability of occupancy represented in the log odds form. The inverse model mentioned

in Alg. 1 is derived from Eq. 2.2. Note that in the simultaneous search and mapping

problem, it is assumed that the lower/upper bounds for all dimensions of the indoor space

to be traversed are known in advance.
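A minimal Python sketch of the log odds update from Eq. 2.2 and Algorithm 1 is given below. It assumes an inverse sensor model that has already been evaluated per cell; the names are illustrative and this is not the thesis implementation.

    import numpy as np

    def update_occupancy(log_odds, in_fov, inv_model_values, l0=0.0):
        # One occupancy-grid update step (Algorithm 1) in log odds form.
        #   log_odds:         array of current cell log odds l_{t-1,i}
        #   in_fov:           boolean mask of cells inside the sensor field of view
        #   inv_model_values: inverse sensor model log odds for each cell
        #   l0:               prior occupancy in log odds form
        updated = log_odds.copy()
        updated[in_fov] += inv_model_values[in_fov] - l0
        return updated

    def occupancy_probability(log_odds):
        # Recover p(m_i | z_{1:t}, x_{1:t}) from the log odds representation (Eq. 2.2).
        return 1.0 - 1.0 / (1.0 + np.exp(log_odds))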


Chapter 3

Problem Formulation and Solution Approach for 2D Environments

3.1 Problem Introduction

Real time planning for the simultaneous mapping and search problem was first investigated in a two dimensional grid world. The algorithms described in this chapter were implemented in MATLAB as a prototype before being ported to a C++ implementation. This prototype was used to quickly iterate over the algorithm design and allowed rapid modification of the project during development. In Chapter 4, extensions of these techniques to a realistic simulated 3D search space are presented.

The problem was first simplified to a two dimensional grid world example. The problem domain was reduced to a discrete domain with a point robot. The robot is able to rotate in 90° increments and move forward in increments of one or two cell widths. The robot probabilistically observes the occupancy of the cells in a cone in front of it, as discussed in Chapter 2. The area to be explored by the robot is defined by a binary "ground truth" occupancy grid, shown in Fig. 3.1 and referred to as the 'map'.


Figure 3.1: Testbed Map approximately 50 x 50 meters

The ground truth map contains multiple walls, shown as black cells, with the unoccupied regions shown as white cells. The map measures 100 x 100 cells, with each cell measuring 0.5 meters by 0.5 meters. However, the resolution of the cells in the grid world is arbitrary and can be scaled to match the scale of the quadrotor. This map was used due to its long continuous walls and the multiple rooms that are representative of an office space.

Inside of this space, a single target is introduced. The location of this target is shown

in red on Fig. 3.2 but is unknown to the robot at the beginning of the search mission (the

robot has some prior distribution over the target’s location, given that it knows the extent

of the indoor map boundaries a priori). The target can be anything that the detector is able

to detect reliably; in this case, it is assumed that the target is a red ball.


Figure 3.2: Map with Target Shown in Red

3.1.1 Quadrotor Dynamics

In order to capture the configuration of the robot properly, a state vector x was defined

to represent the current position and orientation of the robot inside of the map.

The total quadrotor state vector is described in Eq. 3.1; these terms encapsulate the current state of the quadrotor. The frame of reference in which each term is defined is given in square brackets. This is either the 'Earth' frame, fixed to the world that the robot operates in and described in a North-East-Down reference system, or the 'Body' frame, defined relative to the quadrotor (this reference frame moves with the quadrotor through the space).


x = [φ, θ, ψ, p, q, r, u, v, w, x, y, z]^T    (3.1)

where:
φ - roll angle [Earth]
θ - pitch angle [Earth]
ψ - yaw angle [Earth]
p - roll rate [Body]
q - pitch rate [Body]
r - yaw rate [Body]
u - linear velocity [Body]
v - linear velocity [Body]
w - linear velocity [Body]
x - position [Earth]
y - position [Earth]
z - position [Earth]

u = [f_t  τ_x  τ_y  τ_z]    (3.2)

The non-linear dynamics of the quadrotor are described in [14]; this set of equations is shown in Fig. 3.3. These equations relate the dynamics of the vehicle to its physical characteristics, the inertias I_{x,y,z} and the vehicle mass m. They describe the progression of the states through time and can be approximated to generate trajectories.

For small perturbations we can set [φ̇ θ̇ ψ̇]^T = [p q r]^T, which holds only for small angles of displacement [4].

The quadrotor dynamics were thus simplified to linear dynamics: the nonlinear equations are linearized about a stable hover to produce the set of dynamics shown in Eq. 3.3, where u is the set of nominal control inputs needed to produce a stable hover, with input elements corresponding to the torques about x, y, z as well as the total force. The set of hover states is


Figure 3.3: Nonlinear quadrotor dynamics


described in Eq. 3.4.

ẋ = [p, q, r, 0, 0, 0, −gθ, gφ, 0, u, v, w]^T    (3.3)

i.e., φ̇ = p, θ̇ = q, ψ̇ = r, ṗ = q̇ = ṙ = 0, u̇ = −gθ, v̇ = gφ, ẇ = 0, ẋ = u, ẏ = v, ż = w.

x = [0 0 0 0 0 0 0 0 0 x y z]^T    (3.4)

u = [mg 0 0 0]    (3.5)

The control vector described in Eq. 3.5 shows that the force of gravity acting on the robot must be counteracted in the z direction. In a stable hover this force, mass times gravity, is the only constant force acting on the robot. In the lateral directions the robot requires no constant input to maintain a simple hover, hence the zeros for the torque terms. As the robot drifts from this hover, feedback is employed to return the quadrotor to its initial state of steady hover.

The linearized dynamics were used to compute the transition probabilities of the quadrotor from one occupancy map grid cell to another. These are then used to form a probabilistic state transition matrix during the tree generation phase of the online planner described later. This approximation to the stochastic dynamics is required due to the computational complexity of simulating true control inputs on the fly. Instead, the results of many simulations performed offline are sufficient to allow the planner to much more quickly sample the outcome of the desired control action (which in this case is limited to 90 degree turns and forward translation maneuvers for the point robot).

Additive white Gaussian noise is introduced into the system at the level of the accelerations u̇ and v̇. This model of process noise is used as an analogue for system disturbances that cannot be properly modeled by the first-principles equations of the robot's dynamics alone; effects such as wind (e.g., from open doors/windows, hallway drafts, ventilation, or ground/side wash) and control and estimation errors are all captured by this noise term. The simulated process noise covariance was modeled in discrete time with intensity matrix Q defined in Eq. 3.6.

Q = diag([0 0 0 0 0 0 0.18 0.18 0 0 0 0]) [m/s^2]    (3.6)

Figure 3.4: Resulting positions of a ‘forward’ command

Fig. 3.4 shows the result of a 'forward' motion command using the full quadrotor dynamics. This is the model that is used in the forward planning simulation portion of this work. There is uncertainty involved with this motion: in this case, there is a 96% chance of ending up in the desired grid cell. The figure shows each run of the low-level controller as a green plus, where the low-level controller in this case is an implementation of the controller discussed in [2], the same controller that is implemented in the PX4 open source hardware stack. All runs of this particular controller started at the black dot; in the true planner this is not the case, but it is shown here for illustrative purposes.

After a 'forward' command, the robot returns to a stable hover in the desired cell. Then the whole process is repeated in the next cycle of the planner. Rotations are completed in nearly the same way as the translational motion, using the same low-level controller. In the rotational case, the injected noise does not directly cause a rotation, so the resulting rotational command ends within 0.5° of the commanded angle in every case.
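The cell-to-cell transition probabilities described above can be estimated offline by repeatedly simulating the closed-loop maneuver and histogramming the cells actually reached. The Python sketch below illustrates the idea with a simplified double-integrator stand-in for the full closed-loop quadrotor model; the gains, noise level, and durations are assumptions chosen only for illustration, not the thesis's values.

    import numpy as np
    from collections import Counter

    def estimate_forward_transitions(n_runs=1000, cell=0.5, dt=0.02, duration=2.0,
                                     accel_noise=0.18, rng=None):
        # Offline Monte Carlo estimate of where a "forward one cell" command ends up.
        # A proportional-derivative law drives the vehicle toward the next cell center
        # while white acceleration noise (cf. Q in Eq. 3.6) perturbs the motion.
        rng = np.random.default_rng(0) if rng is None else rng
        counts = Counter()
        for _ in range(n_runs):
            pos, vel, goal = 0.0, 0.0, cell
            for _ in range(int(duration / dt)):
                accel = 4.0 * (goal - pos) - 3.0 * vel      # simple feedback toward the goal cell
                accel += rng.normal(0.0, accel_noise)       # process noise on acceleration
                vel += accel * dt
                pos += vel * dt
            counts[int(round(pos / cell))] += 1             # index of the cell actually reached
        return {k: v / n_runs for k, v in counts.items()}

    # Example output shape (values depend on the assumed gains and noise):
    # {1: 0.96, 0: 0.02, 2: 0.02}  ->  one row of the probabilistic transition matrix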

3.2 Formal Problem Definition

Before describing the planning methods, the problem needs to be formally defined. First, we recap the objective of the simultaneous mapping and search problem. We have a robot moving in an unknown indoor environment that is looking for a single static object. The robot's objectives are to detect and locate the target as quickly as possible (due to limited flight time) while also mapping out as much of the environment as possible, taking care to avoid collisions with obstacles. The robot is assumed to know its pose at all times (i.e., perfect onboard navigation and localization capability), but can have its movements disrupted by stochastic disturbances in the environment. The ground truth map is assumed to stay constant, but is unknown a priori at the start of the search problem. The static target's location is unknown but can be described by a prior probability distribution (e.g., uniform over all grid cell locations), which can be updated as the robot gathers new information from its sensors.

Defining this problem as a POMDP begins with the tuple (S, A, T, R, O, Z, γ), starting with the state space S = (m_{1:n}, x, τ), where m_{1:n} are the binary occupancy states of the map cells, x is the pose of the robot, defined as the position in x and y as well as the yaw angle θ, and τ is the unknown position of the target in x and y (so for the ground truth map defined in Figure 3.1, there are a total of over 10000 states). A is the set of actions that the robot can take, where in this problem the set of actions is: move forward a meter, move forward a half meter, turn left 90 degrees, and turn right 90 degrees. While this is a minimal action set, if the problem needs to be redefined to have more actions, augmenting the action set is simple and straightforward. This minimal action set simplifies the computation, allowing the Monte Carlo tree (presented later for online planning) to be built faster than for larger action sets. Figs. 3.5 & 3.6 depict two of the actions available to the agent.

Figure 3.5: Translate Forward
Figure 3.6: Rotate Right

T is the probabilistic state transition model p(s'|s, a), taking a state s and action a to probabilistically select a new state s'. R is the reward function, which for this problem is defined in Eq. 3.7, where r_obs is the reward gained by observing map grid cells and r_target is the reward for observing the target in the workspace. r_obs is calculated from the number of times each map cell has been observed (n_obs,i); the reward for a cell is proportional to the inverse of that count plus one. This was chosen to incentivize the planner to observe each cell a few times, but as a cell is observed more than a few times its reward quickly goes to near zero, as shown in Fig. 3.7.

r(s, z_t) = r_obs + r_target    (3.7)

r_obs = Σ_{i=0}^{n} 1 / (1 + n_obs,i)    (3.8)

Figure 3.7: Observation Reward vs. Number of Observations for a Single Cell
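As a concrete illustration of Eq. 3.8, the short Python sketch below accumulates the observation reward for the set of cells in view; the diminishing 1/(1 + n) term reproduces the trend in Fig. 3.7. This is illustrative code, not the thesis implementation.

    import numpy as np

    def observation_reward(obs_counts, observed_idx):
        # Reward for one observation per Eq. 3.8: sum of 1/(1 + n_obs,i) over cells in view.
        return float(np.sum(1.0 / (1.0 + obs_counts[observed_idx])))

    # Example: a cell seen 0, 1, 2, ... times contributes 1, 1/2, 1/3, ... reward,
    # so revisiting well-observed cells quickly becomes worthless to the planner.
    counts = np.zeros(100)
    in_view = np.arange(10)
    r_first = observation_reward(counts, in_view)   # 10.0 on the first look
    counts[in_view] += 1
    r_second = observation_reward(counts, in_view)  # 5.0 on the second look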

The observation space O of this problem is the major sticking point, as the observation consists of two components: a detection or non-detection of the target in the image frame, and a set of observed cell values from the processed range data in {0, 1}^n, where n depends on the number of cells in view of the sensor, 0 means observed unoccupied, and 1 means observed occupied. Only cells that are in view of the sensor are updated by the mapping program. The sensor observes depth data between 0.3 and 5 meters. A representative observation is shown in Fig. 3.8, where teal represents observed cells that are empty, yellow shows occupied cells, and dark blue shows unobserved cells. Note that the occupied cells occlude areas behind them relative to the robot, shown in green. With each cell being part of the observation, and especially in the 3D case, the robot has an extremely large observation space. Even in the small 2D case shown below the robot can observe hundreds of cells, so the total number of possible observations is 2^(n+1), where n is the number of cells observed and the additional bit is the detect/no-detect of the target. Observing 100 cells gives a total of roughly 2.5 x 10^30 possible unique observations.

Figure 3.8: Simulated Observation

p(τ_occ | o) = p(o | τ_occ) p(τ_occ) / Σ_i p(o | τ_i) p(τ_i)    (3.9)

V_π = E[ Σ_{t=0}^{∞} γ^t r_t ]    (3.10)

Z for the POMDP defines the observation model p(o|s, a, s'), based on the noise model defined in Eq. 2.1 and the detection rate of the YOLO object detector. The γ term of the POMDP is the future reward discount factor, in the range [0, 1], that reduces the value of rewards received in the future. This tuning knob determines how important future reward is compared to near-term reward. By maximizing the discounted future reward, a policy π can be generated that maps a belief state to an action. The discounted future reward, Eq. 3.10, is the total expected reward from the current time to infinity, where future reward values are reduced by the discount factor γ, increasing the importance of rewards gained earlier in time.

The target belief is updated through a Bayesian belief update for τ given a detection or non-detection over the map grid cells in view of the detector; Eq. 3.9 describes the update over the visible grid cells i for the pose of the target.
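A sketch of the Bayesian target-belief update of Eq. 3.9 over the grid cells currently in view is shown below. It assumes a fixed detector true-positive rate and a small false-positive rate; both numbers and all names are illustrative assumptions, not the thesis's implementation.

    import numpy as np

    def update_target_belief(belief, in_view, detected, p_detect=0.655, p_false=0.05):
        # Bayesian update of p(target in cell i) given a detect / no-detect observation.
        #   belief:  array of prior probabilities over all cells (sums to 1)
        #   in_view: boolean mask of cells currently inside the camera frustum
        # Likelihood of the observation for each hypothesized target cell:
        if detected:
            like = np.where(in_view, p_detect, p_false)
        else:
            like = np.where(in_view, 1.0 - p_detect, 1.0 - p_false)
        posterior = like * belief
        return posterior / posterior.sum()    # normalization (denominator of Eq. 3.9)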

For the simultaneous search and mapping problem, the observation space is so large that computing the analytical solution to the problem is computationally intractable. A newly simulated observation is almost always unique; the probability of sampling a large space and returning the same value twice is astronomically small for large discrete sets, and zero in the case of continuous distributions.

3.3 Approximations to the POMDP

Computing a solution to a POMDP with a large state space and a large observation space requires computation beyond anything that is possible in real time. In order to find a solution to the problem in reasonable time, this thesis investigates a combination of tree-based methods and augmentations that significantly reduce the search space. The widely used tree-based method for solving a POMDP online is POMCP, as described in [15]. This is a Monte Carlo tree search method that uses an unweighted particle filter to represent the belief states in a tree of future histories: a series of possible state and action pairs up to a finite horizon. This tree provides an explicit approximation of the value function using a particle set at the starting state, rather than solving for the entire policy over the full belief space, which is computationally intractable even offline. POMCP has been used widely in the community for online POMDP solvers, but is challenged by continuous or very large observation spaces.

Monte Carlo based planning uses a simulator as a generative model G of the POMDP.


This simulator generates a successor state, observation, and reward for a given state and action pair, and is used to generate sequences of states, observations, and rewards. These sequences, known as future histories, are used to update the value function without requiring the true dynamics of the underlying system.

The challenge posed by the continuous observation space is that the probability of

sampling the same observation twice from a continuous random variable is 0. Therefore

after the first step the Monte Carlo search tree explodes, and no meaningful conclusion can

be drawn over any planning horizon greater than a single observation step in the future.

This phenomenon can be seen in Fig. 3.10: as the planner generates new future histories,

none of the observations are the same; once the planner takes an action the observation will

not match any of the generated future histories. A properly formed tree of future histories is

shown in Fig. 3.9. Starting from a prior belief b, the future histories are generated through simulated action-observation pairs leading to new beliefs b_i.

Figure 3.9: POMCP Tree


Figure 3.10: Malformed POMCP Tree

3.3.1 Monte Carlo Tree Search

Monte Carlo Tree Search uses Monte Carlo simulation to evaluate the nodes of the search tree. In this tree there is a single node for a given history h, containing the value of that history as well as a visitation count N(h). The value of a history, V(h), is the estimated mean return of all simulations that begin at h. Each simulation begins at an initial state sampled from the belief B(h_t). On initialization, the values of Q and N are set to zero. There are two stages of simulation that are performed. The first is used when child nodes exist for all actions; in this configuration, actions are selected to maximize the upper confidence bound (UCB) described in Eq. 3.11.

UCB(s, a) = Q(s, a) + c sqrt( log N(s) / N(s, a) )    (3.11)

where N(s) = Σ_a N(s, a) and c is a parameter specific to the problem. In the second stage of simulation, actions are selected by a rollout policy π_rollout(h, a), such as random uniform action selection. After a simulation, one node is added to the tree corresponding to the first new history encountered by the simulation. When planning is complete, the agent selects the action a_t that has the greatest value and then receives a real observation o_t. At this point the node corresponding to the history h a_t o_t becomes the new root of the tree, and the remainder is pruned.
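A minimal Python sketch of the first-stage UCB action selection of Eq. 3.11 is shown below; storing the node statistics in plain dictionaries is an illustrative choice, not the thesis's C++ implementation.

    import math

    def ucb_select(actions, Q, N_action, c=80.0):
        # Pick the action maximizing Q(h,a) + c*sqrt(log N(h) / N(h,a))  (Eq. 3.11).
        #   Q, N_action: dicts mapping action -> value estimate / visit count.
        # Unvisited actions get an infinite score so each is tried at least once.
        total_visits = sum(N_action.values())

        def score(a):
            if N_action.get(a, 0) == 0:
                return float("inf")
            return Q.get(a, 0.0) + c * math.sqrt(math.log(total_visits) / N_action[a])

        return max(actions, key=score)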


3.4 Algorithms

This section describes the anatomy of the planner and the associated algorithms. The following are the complete algorithms, which will be discussed piece by piece in the following sections.

Algorithm 2 Plan(b)
  L ← HoughTransform(b)
  for line ∈ L do
    bias b along line
  end for
  while planning time remains do
    s ← sample from b
    SIMULATE(s, b, d_max)
    if out of planning time then
      return arg max_a Q(ba)
    end if
  end while

Defining the terms used throughout these algorithms: the starting point is the 'Plan' subroutine. First, given the current map, representative lines are generated through the Hough transform and those lines are used to project walls into the map. A history h is generated consisting of an observation o and an action a, repeating through (b, a1, o1, ..., ak, ok). The general flow is then this: a tree of future histories is generated until the process is out of compute time; when new branches of the tree are generated, the rollout policy periodically selects the centroid of unexplored space as the target of an A* search. When reward is far away, this allows the robot to get unstuck. This procedure allows the robot to discover new areas when it has fully explored a sub-area, but not the entire map. The rollout strategy described here is a significant factor contributing to the novelty and success of this work.

For completeness, h, ha, and hao denote different stages of the tree generation: h represents a history (b, a1, o1, ...), where b is the initial belief, ha is the node with an action selected, and hao is the node with an action and an observation selected.


Algorithm 3 SIMULATE(s, h, d)
  if d = 0 then
    return 0
  end if
  a ← ActionProgWiden(h)
  if |C(ha)| ≤ k_0 N(ha)^α_0 then
    s', o, r ← G(s, a)
    C(ha) ← C(ha) ∪ {o}
    M(hao) ← M(hao) + 1
    append s' to B(hao)
    if M(hao) = 1 then
      total ← r + γ ROLLOUT(s', hao, d−1)
    else
      total ← r + γ SIMULATE(s', hao, d−1)
    end if
  else
    o ← select o ∈ C(ha) w.p. M(hao) / Σ_o M(hao)
    s' ← select s' ∈ B(hao) w.p. 1 / |B(hao)|
    r ← R(s, a, s')
    total ← r + γ SIMULATE(s', hao, d−1)
  end if
  N(h) ← N(h) + 1
  N(ha) ← N(ha) + 1
  Q(ha) ← Q(ha) + (total − Q(ha)) / N(ha)
  return total

d is the planning depth, i.e., how many levels down the tree should be expanded. C is the list of children of a node. N counts the number of times that particular node has been visited. M counts the number of times a history has been generated. The list of states associated with a node is denoted B. Finally, Q is the estimated value function at that node. All of these are initialized to 0 or ∅. s is the state of the system, and r is the reward function evaluated at a particular state. The simulated dynamics function G is a black box simulator that takes the state and the action and forward simulates the system in time, using the linearized dynamics and controller described earlier in Section 3.1.1.

In the next sections, various implementations of the POMCP online planner are described, building up to the most sophisticated implementation, which constitutes the novel contribution of this thesis.


Algorithm 4 ROLLOUT(s, h, d)
  n ← random ∈ [0, 1]
  if n ≤ bias_rate then
    goal ← centroid of unexplored region
    s', h, a ← A*(s, goal, h)
  else
    a ← select from action set
    s', o ← G(a, s)
  end if
  r ← R(s, a, s')
  total ← r + γ ROLLOUT(s', hao, d−1)
  return total

This implementation uses double progressive widening as well as a novel technique for generating and incorporating wall priors to account for the issue of continuous measurements described earlier.

3.5 Naïve Planning

In the fully naïve case, the planner is run without double progressive widening, wall continuity, or A* augmentation. This case represents the bare-bones planning scenario, using the POMCP framework to generate approximations to the policy π. In this fully naïve case the planner is prone to becoming stuck in small rooms and other confined spaces. This can be seen in Fig. 3.11, where the robot trajectory shown in red reveals that the robot is unable to find a path out of the small room that it began in, or is otherwise trapped in tight spaces.


Figure 3.11: Typical run of Naive Planner

Statistic                            Baseline Planner
Median Time to Capture (TTC)         451 seconds
Capture Rate                         13.3%
Mean Reward per Second               3.4 reward/sec
Mean Total Reward at 500 seconds     1727
Total Reward Variance                63.2

Table 3.1: Baseline Results

The simulation with the naïve planner was run 30 times for 500 seconds each run. The exploration parameter c was set to 80, the tree exploration depth was set to 10, and the particle set contained 300 particles. The 500 second time limit is enforced by the battery constraints of a typical quadrotor platform. As described in Table 3.1, the fully naïve planner does not capture the target in more than 13% of the run cases. For a target search planner this rate of capture is not acceptable. The median time to capture for cases where the target was successfully captured also supports this fact: in the cases where the planner did successfully capture the target, it did so at nearly the end of the full trial run, which is a major contributing factor to the poor capture rate. With a mean reward per second of 3.4, the agent is still uncovering new cells and exploring the space. However, as the observation space is so large, the planner is essentially operating as a greedy planner with a horizon of one. This prevents the planner from taking into account information further out from the current pose of the robot. The remedies for this are discussed in the following section.

3.6 Double Progressive Widening

For online path generation, a tree-based approach to approximating the value function of the POMDP was used. This method incrementally builds a tree of future histories based on the current belief. It only requires a black box simulation of the problem domain, making it well suited to this problem, as forward simulation of this problem is simple. During the planning phase, future histories of the tree are built, and then the robot takes the action that generates the maximal reward. Due to the extreme size of the observation space, double progressive widening as described by Sunberg [17] was implemented. This reduces the branching of the tree significantly, allowing the POMDP to be approximated by removing unlikely observations and sending more particles down the tree along more likely observation paths. Fig. 3.9 shows a tree of future histories that is properly formed; this tree has particles that are evenly distributed by the planner, and the branches of the tree that are more common have more particles than those that are unlikely. Without progressive widening, the planner would never sample the same observation twice, even from the same state. This leads to a tree like the one shown in Fig. 3.10, where there are a large number of child nodes from a single node. Progressive widening artificially limits the number of child nodes that can be generated to k N^α, where k and α are tuning parameters determined by the problem itself. When the limit of node children is reached, the planner selects a child node that has already been simulated with probability M(hao) / Σ_o M(hao), where M(hao) is the number of times a history has been generated by the model. This selects observations that are more common and increases the number of particles sent through those nodes, thereby increasing the particles on more plausible tracks.

The major procedures for DPW as described in [17] allow the tree expansion to be reduced, allowing the tree to be formed more properly. Without DPW, the large observation space prevents planning beyond a small time horizon, as the probability of receiving an observation that was simulated is essentially zero. This means the generated tree of future state histories and observations degenerates after receiving the first actual set of observations from the robot's sensors.
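The core of the observation widening in Algorithm 3 can be sketched as follows: a new observation node is created only while |C(ha)| ≤ k N(ha)^α; otherwise an existing child is re-used with probability proportional to M(hao). This Python sketch is illustrative only, with k and α standing in for the tuning parameters of Table 3.2.

    import random

    def widen_or_reuse(children, n_visits, k, alpha, sample_new_obs):
        # Observation progressive widening for one action node ha.
        #   children:  dict mapping observation -> generation count M(hao)
        #   n_visits:  N(ha), visit count of the action node
        # Returns (observation, is_new).
        if len(children) <= k * (n_visits ** alpha):
            obs = sample_new_obs()                 # draw o from the generative model G
            children[obs] = children.get(obs, 0) + 1
            return obs, True
        # Re-use an existing child, weighted by how often it has been generated.
        obs_list = list(children.keys())
        weights = [children[o] for o in obs_list]
        obs = random.choices(obs_list, weights=weights, k=1)[0]
        children[obs] += 1
        return obs, False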

3.6.1 Determining Tuning Parameters

As the planner becomes more complex, the number of tuning parameters also increases. For simple POMCP, the major parameter c determines the exploration of the tree: a larger c increases the likelihood that the planner takes paths that it has not taken before. To determine the values of the tuning parameters, the goal is to maximize some statistic of the planner; in this problem, the total reward was maximized with respect to the tuning parameters. The planner was run 30 times starting in a location selected at random over the map; this location was randomized to prevent the planner from fitting the parameters to a specific starting location.

Parameter    Tested Values
α            4, 5, 6, 7
K            1/8, 1/5, 1/2, 1

Table 3.2: Tested Parameter Values


Figure 3.12: Parameter Tuning

Table 3.2 shows the parameter values used to tune the planner. These parameters are not the only ones that were tested, but were the ones that were most productive; higher and lower values of α and K did not produce better results for this planner. Fig. 3.12 shows the results of the parameter tuning. There is not a large difference between the parameter combinations, but the combination of α = 5 and K = 1/5 produced the largest total reward. This result is an improvement over the fully naïve case. This planner still relies on the agent generating observations that are coherent, and also assumes that all adjacent cells are completely independent of one another. The next section details how domain knowledge can be leveraged to improve the observations that are sampled by POMCP.

3.6.2 POMCP-DPW Results

Statistic                            Baseline Planner    POMCP-DPW
Median Time to Capture (TTC)         451 seconds         450 seconds
Capture Rate                         13.3%               17.3%
Mean Reward per Second               3.4 reward/sec      3.49 reward/sec
Mean Total Reward at 500 seconds     1727                1896.5
Total Reward Variance                63.2                62.4

Table 3.3: Results


Using double progressive widening allows the planner to perform better than the baseline planner. For statistical testing, the null hypothesis is that the planner with progressive widening is no better than the baseline POMCP planner with respect to total reward gained over the trial run. With the given n of 30 and the p for the null hypothesis of 0.133, a Z-test is not appropriate as np ≤ 10, so a standard binomial test is used for the capture probability. With a significance level of 10%, the null hypothesis is rejected. Therefore, adding double progressive widening does improve the planning capability of the agent.

However, the planner still assumes that there is no relationship between grid cells; this assumption will be explored and evaluated in the next section.

3.7 Wall Continuity

In order to further reduce the observation space, assumptions about the built environment are made. In a human-made environment, walls have a tendency to continue on. This prior knowledge of the built environment allows the planner to increase its confidence about the environment through the generated occupancy grid map.

3.7.1 The Hough Transform

Traditionally used for feature extraction in image recognition algorithms, the Hough Transform is a voting algorithm that takes in an image and determines the 'strongest' line candidates in that image. In the simplest cases, the Hough Transform is used to determine line segments in an image; it has been extensively modified to extract other features such as polygons. For this work, however, the Hough Transform is used to extract lines that represent walls of the building. As humans have a tendency to build environments with straight walls and ninety degree corners, the Hough Transform is an ideal candidate for determining where walls are in a map. The Hough Transform uses an array known as the 'accumulator' to characterize lines as described in Eq. 3.12, where r is the minimum distance from the origin to the line and θ is the angle between the x-axis and the line running from the origin to the closest point, as shown in Fig. 3.13. The algorithm calculates parameters (r, θ) in the neighborhood of each pixel; if there is evidence that a straight line exists at that pixel, the bin for that set of parameters is incremented. At the end of a run, the highest values in the parameter space are used to determine the lines.

r = x cos(θ) + y sin(θ) (3.12)

Figure 3.13: Hough Values

Using the Hough transform, the representative walls can be extracted from the 2D

map. These segments can then be used to project the walls out into the unexplored regions

of space. These projections can be used to bias the prior in these unexplored regions. Since

the walls tend to continue into this unexplored region, the bias on the prior allows the

simulated observations to more closely match the unexplored part of the map. The result of

a Hough transform on a completed map can be seen in Fig. 3.14, where the results of the

Hough transform are shown as green line segments with red x’s on either end. These line

segments closely follow the underlying walls of the map that the transform was run on.


Figure 3.14: Result of Hough Transform
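The wall extraction step can be prototyped with OpenCV's probabilistic Hough transform. The sketch below treats confidently occupied cells of the grid as white pixels and returns line segment endpoints; the threshold and line-length parameters are illustrative assumptions, not the thesis's tuned values.

    import numpy as np
    import cv2

    def extract_wall_segments(occ_prob, occupied_thresh=0.7):
        # Run a probabilistic Hough transform over the occupancy grid.
        #   occ_prob: 2D array of cell occupancy probabilities in [0, 1].
        # Returns a list of (x1, y1, x2, y2) wall segment endpoints in cell coordinates.
        img = (occ_prob > occupied_thresh).astype(np.uint8) * 255   # binarize occupied cells
        lines = cv2.HoughLinesP(img, rho=1, theta=np.pi / 180, threshold=15,
                                minLineLength=8, maxLineGap=2)
        return [tuple(l[0]) for l in lines] if lines is not None else []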

While the Hough transform gives the representative line segments of walls on the map, it would be incorrect to assume that the walls continue into the unknown regions of space indefinitely. So, as a wall segment is extrapolated into the unknown areas of the map, the strength of the biasing is reduced the further a cell is from the segment's endpoints. The biasing can be seen as the dull yellow in Fig. 3.16. The Hough transform gives the representative lines along with their endpoints.


Figure 3.15: Wall Projection

Wall biasing for a simple hallway can be seen in Fig. 3.15, which depicts a robot as a circle facing north. This robot can sense the walls shown in black. From this information, it projects the walls into the neighboring cells, shown as progressively lighter gray tones; the lighter the tone, the less impact the observed wall has on the map. The blue shows the unobserved space that the robot has no information about, and the white space is observed unoccupied. Eq. 3.13 describes the amount by which the map is biased away from the terminating points of the representative line segments. This equation implicitly assumes that biased cells should only exist along the same axis as the representative line segment; these cells are the ones that would remain occupied by the wall assuming that it continues along that same path. β_max is the bias factor that describes how strongly the projection biases the cells that lie along the same axis as the wall segment. The bias applied to cell m_i with prior l_0 is based on the inverse of the distance from the endpoint of the wall to the centroid of the cell in question.

bias(m_i) = l_0 + β_max / sqrt( (x_{m,i} − x_end)^2 + (y_{m,i} − y_end)^2 )    (3.13)

This method has the benefit of resolving some of the dependency between cells that is

not accounted for in the occupancy grid representation of the map.
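A sketch of the wall projection bias of Eq. 3.13 is given below: cells along the extrapolated wall axis beyond a segment endpoint have their prior log odds increased by an amount that decays with distance from the endpoint. The values of beta_max and the projection length are assumptions for illustration only.

    import numpy as np

    def project_wall_bias(log_odds, endpoint, direction, l0=0.0, beta_max=2.0,
                          max_cells=10, cell=0.5):
        # Bias unexplored cells along a wall's axis beyond its observed endpoint (Eq. 3.13).
        #   endpoint:  (row, col) of the wall segment end
        #   direction: unit step (drow, dcol) along the wall axis, e.g. (0, 1)
        r, c = endpoint
        for step in range(1, max_cells + 1):
            ri, ci = r + step * direction[0], c + step * direction[1]
            if not (0 <= ri < log_odds.shape[0] and 0 <= ci < log_odds.shape[1]):
                break
            dist = step * cell                          # distance from endpoint to cell centroid
            log_odds[ri, ci] = l0 + beta_max / dist     # bias decays with 1/distance
        return log_odds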

Figure 3.16: Wall Biasing for 2D Indoor Environment.

The biasing of the map can easily be seen in Fig. 3.16, where the robot shown in red,

the observations in dark blue and the unknown space in teal. The robot sees a small section

of wall in the map, and the planner extends that wall segment into the unknown regions

of the map where it thinks the wall is likely to be. Here multiple line segments are shown

overlapping spaces that there are likely to be walls, even though the robot has not yet seen

what is in those spaces.


Statistic                            Baseline Planner    POMCP-DPW          Wall Projection
Median Time to Capture (TTC)         451 seconds         450 seconds        438 seconds
Capture Rate                         13.3%               17.3%              19.6%
Mean Reward per Second               3.4 reward/sec      3.49 reward/sec    3.92 reward/sec
Mean Total Reward at 500 sec         1727                1896.5             1960.4
Total Reward Variance                63.2                62.4               60.2

Table 3.4: Results

Wall projection continues to improve the performance of the planner. The major component of this performance increase comes from more of the Monte Carlo planning tree being maintained between action selections, as well as fewer possibilities in the observation space being considered. Binomial hypothesis testing was again used to determine the significance of the improvement in the capture rate, at a 10% significance level, and the null hypothesis was again rejected. Therefore, wall projection is an effective method of improving the planner beyond the limits of progressive widening.

This leaves one issue to address: the robot still occasionally becomes stuck in confined spaces. The next section discusses a remedy for this problem.

3.8 A* Rollout Biasing

Since the planner only works over a finite horizon, as the map becomes more and more explored there is a larger possibility of the planner becoming trapped in a region of explored space. In such a region the reward is far enough away from the current position of the robot that the planner cannot effectively 'see' it. To fix this problem, the rollout step of POMCP is biased with cycles of A* planning towards centroids of unexplored space. The A* search is accomplished through the methods discussed in [6], where the planner maintains each grid cell as a node in the A* search graph, and the transitions between cells are defined by the most probable transition for a given control input. However, the A* planner requires a goal destination in addition to the start location and the map defining the transitions between cells, so the planner must determine where that goal should be. To find this destination in the map space, the planner employs another image processing technique, namely blob detection.

3.8.1 Blob Detection

The blob detection algorithm takes in the map and returns the centroids of 'blobs', or contiguous regions of unexplored cells. It does this by thresholding the entropy of the cells of the map and then computing the Laplacian of the Gaussian of the map in the (x, y) frame. The result of this operation is a set of map segments that contain contiguous regions of unexplored space. The centroid of the largest of these regions is then easily computed. The A* planner is run from the current location of the robot towards this centroid, and the result is included in the POMCP tree as a branch. By the time the robot needs this biasing, the cells between the robot and these frontiers of unexplored space have been observed enough that the robot is certain of their occupancy.

Statistic                        Baseline    POMCP-DPW    Wall Proj    A* Biasing
Time to Capture (sec)            451         450          438          405
Capture Rate                     13.3%       17.3%        19.6%        20.1%
Mean Reward per Second           3.4         3.49         3.92         4.07
Mean Total Reward at 500 sec     1727        1896.5       1960.4       2037.8
Total Reward Variance            63.2        62.4         60.2         61.5

Table 3.5: Results

The statistics for these augmented planners are collated in Table 3.5. Again the hypothesis is tested through a standard binomial test with a 10% significance level, and the null hypothesis that the augmented planner is not able to gather more total reward is rejected. Therefore, augmentation of the planner does increase its ability to gather total reward over these trials relative to the non-augmented planner.


3.9 Results

The simulation was run for 500 time steps to emulate a battery-based time constraint. The action set is to be expanded in future work on this problem to allow the robot to cover much more ground than the MATLAB implementation shows here.

Figure 3.17: Run of the planner without A* augmentation

Fig. 3.17 shows a typical run of the planner without A* augmentation, DPW, or wall projection. Due to computation constraints, it is very easy for the planner to become stuck in an area where it can no longer ‘see’ the reward; this is shown by the robot (the red trace) staying in the top-left corner of the space. Stated another way, the tree the planner generates is no longer deep enough to gather reward from the far-away regions, so the robot instead becomes trapped in the small room.

By including the A* augmentation, the planner is able to see the reward outside of the room and leaves it for the unexplored regions of space. This is seen in the typical run of the planner with A* augmentation shown in Fig. 3.18.


Figure 3.18: Run of the planner with A* augmentation

Figure 3.19: Total map entropy over 30 runs with finite time

The differences between the unaugmented planner and the augmented planner can further be seen by examining the total map entropy over time, as shown in Fig. 3.19. The algorithm without A* often becomes trapped in a room, unable to leave, and the total map entropy begins to stall. The A*-augmented robot, on the other hand, leaves the room, allowing it to reduce the total map entropy much further before hitting the time limit. Total map entropy can be thought of as a measure of the robot's certainty in the map it has generated: the lower the entropy, the better the robot believes it knows the map.
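For reference, one standard way to compute this quantity under the per-cell occupancy model used here is as the sum of the per-cell binary entropies over the occupancy probabilities p_i,

H(m) = - \sum_i [ p_i \log p_i + (1 - p_i) \log(1 - p_i) ],

which is largest when every cell is completely unknown (p_i = 0.5) and approaches zero as each cell converges to being occupied or free.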


Figure 3.20: Mean Reward Gain Over Time

The mean reward gained by the fully augmented planner is compared to the naive case in Fig. 3.20 as the robot traverses the space, where the reward gain per time step is shown in blue for the totally unmodified planner and in red for the fully augmented case. The fully augmented case maintains a higher reward per second than the non-augmented planner. This is expected, as the modifications to the planner have been shown to increase the exploration of the robot as it traverses the space. Table 3.6 shows a numerical breakdown of the results for these runs; note that the numbers here are from the same runs described by Fig. 3.20.

Statistic | Baseline Planner | Augmented Planner
Median Time to Capture (TTC) | 451 seconds | 405 seconds
Capture Rate | 13.3% | 20.1%
Mean Reward per Second | 3.4 reward/sec | 4.07 reward/sec
Mean Total Reward at 500 sec | 1727.0 | 2037.8
Total Reward Variance | 63.2 | 61.5

Table 3.6: Results of the fully augmented planner vs. the naive planner


Chapter 4

Extending to Three Dimensions

After using the two-dimensional case to test the planner in a lower-dimensional space, the code was transferred to a three-dimensional simulator. For the move to three dimensions, the Robot Operating System (ROS) was used along with the Gazebo simulation environment. Gazebo is a general physics simulator that ships natively with ROS, and it allows simulation of both the sensors of the system and its response to control inputs. The mapping solution must also be migrated from two dimensions to three dimensions using Octomap. These changes also necessitated a migration from the MATLAB-based simulator to a C++ implementation, both for speed and because the ROS tooling used here is built around C++ bindings rather than MATLAB.

4.1 Mapping in 3D

The two-dimensional grid is augmented to include a third dimension. For this problem the third dimension is segmented into three discrete levels, each separated from the next by 1 meter. Viewed from above, the map is segmented the same way as in the two-dimensional case, broken into a grid of squares.

The Octomap representation [7] of a 3D occupancy grid segments the space into equal volumes which, as in the two-dimensional case, maintain a binomial distribution over whether the volume is occupied or free. As in the two-dimensional case, the observations update the occupancy of the volumes that are visible from the perspective of the simulated robot sensor.


Figure 4.1: Octomap Volume Breakdown

Fig. 4.1 shows how the Octomap representation of the space can operate at differing resolutions depending on what is requested by the user. In this way the Octomap can be queried at different resolutions to trade computational speed against fidelity. In future iterations this project could leverage this to quickly build coarse paths and then refine them at a finer, more computationally expensive resolution.

As in the two-dimensional case, the occupancy probability can be described by Eq. 2.2, where the log odds of occupancy l_{t,i} is defined. The log-odds form is used here as well because it avoids numerical issues as the occupancy probability approaches zero or one.
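For reference, the standard log-odds occupancy update from [18] takes the form

l_{t,i} = l_{t-1,i} + \log [ p(m_i | z_t, x_t) / (1 - p(m_i | z_t, x_t)) ] - l_{0,i},    with    p(m_i | z_{1:t}, x_{1:t}) = 1 - 1 / (1 + \exp(l_{t,i})),

where l_{0,i} is the prior log odds of voxel i and z_t, x_t are the measurement and robot pose at time t; the exact form adopted in this work is the one given by Eq. 2.2.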

Once the Octomap is built, the different vertical levels of the map are used to generate

a projected 2D grid map that the robot planner can use for online planning at a given

altitude. The Octomap implementation for ROS easily allows for this altitude based slicing,

making it possible to use the 2D grid world POMCP planner described in Chapter 3 along

with the various augmentation elements.
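A minimal sketch of such a slice-and-project step, written directly against the Octomap C++ API, is shown below. The grid origin, cell size, dimensions, and slice band are illustrative assumptions, and pruned leaves that span several grid cells are assigned only to the cell containing their center for brevity; the actual projection in this work is handled by the ROS Octomap tooling described above.

```cpp
#include <octomap/octomap.h>
#include <cmath>
#include <vector>

// Project the Octomap leaves that fall within +/- band meters of a chosen
// altitude into a row-major 2D grid: -1 = unknown, 0 = free, 1 = occupied.
// Cells never touched by a leaf stay unknown, since Octomap stores no leaves
// for unobserved space.
std::vector<int> sliceToGrid(const octomap::OcTree& tree, double altitude,
                             double band, double x0, double y0,
                             double cell, int w, int h) {
    std::vector<int> grid(w * h, -1);
    for (octomap::OcTree::leaf_iterator it = tree.begin_leafs(),
                                        end = tree.end_leafs();
         it != end; ++it) {
        if (std::fabs(it.getZ() - altitude) > band) continue;   // outside the slice
        int gx = static_cast<int>((it.getX() - x0) / cell);
        int gy = static_cast<int>((it.getY() - y0) / cell);
        if (gx < 0 || gy < 0 || gx >= w || gy >= h) continue;
        if (tree.isNodeOccupied(*it)) {
            grid[gy * w + gx] = 1;             // any occupied voxel marks the cell
        } else if (grid[gy * w + gx] == -1) {
            grid[gy * w + gx] = 0;             // free only if not already occupied
        }
    }
    return grid;
}
```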

4.2 System Architecture

The system architecture is based on the Hector Gazebo ROS package [11], which contains an open-source quadrotor simulator. The Hector quadrotor is seen in Fig. 4.2. This package handles the generation of the quadrotor dynamics in the Gazebo simulation environment, which for this thesis was selected to be the built-in "Willow Garage" world.


Figure 4.2: Hector Robot in the Simulated Space

This world simulates an indoor space with straight planar walls and ninety-degree corners.

Figure 4.3: Top Down View of Willow Garage

Fig. 4.3 shows the layout of the Willow Garage world. This environment contains both large open spaces and long corridors, making it representative of a fairly standard built environment.

In this environment the target is represented by a sphere with a 0.5 meter radius, as seen in Fig. 4.4. The robot is assumed to be able to discern the target from the environment with 90% accuracy as long as the target is in the frame of the camera.
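One way to read this assumption at the simulation level is as a Bernoulli detection model; the sketch below is a hypothetical illustration and only models missed detections, not false alarms.

```cpp
#include <random>

// Hypothetical sketch of the assumed detector: if the target sphere lies inside
// the camera frustum, a detection is reported with probability 0.9; targets
// outside the frame are never reported. False positives are not modeled here.
bool simulateDetection(bool target_in_frame, std::mt19937& rng) {
    std::bernoulli_distribution detect(0.9);  // 90% detection accuracy in frame
    return target_in_frame && detect(rng);
}
```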


Figure 4.4: Target Sphere

Figure 4.5: Observed Occupied Space

As the robot traverses the space, the map that is built is seen in Fig. 4.5. This figure also shows the two-dimensional projection that allows the planner to operate in a lower-dimensional space. Operating on this flattened representation lowers the computational requirements of the planner; while this allows faster computation of plans, it also precludes the robot from using map features such as open windows to traverse the space.


4.3 Results

As in the two-dimensional case, the robot was run in this environment for a span of time equivalent to the expected battery life, which is around 500 seconds of flight for this configuration. The preliminary results are shown in Table 4.1. These results reflect runs of 500 seconds over 25 separate trials that all start at the same location on the map. The median time to capture the target, the capture rate, the mean reward per second, the mean total reward, and the total reward variance were recorded during the 25 trials of each of the augmented and unaugmented planners. The mean reward gained per second for these runs is also shown in Fig. 4.6.

Figure 4.6: 3D Planner Reward Accumulation

The time to capture the target is highly dependent on the location of the target relative to the starting pose of the robot: if the target is placed close to the starting location, the robot is much more likely to find it quickly, whereas if the target is well hidden in the map, the ability of the robot to find it is greatly diminished. As shown in the two-dimensional


case, the fully augmented planner performs slightly better than the naive planner. The augmentation allows the robot to find the target around three times as often as the naive planner for this configuration of the space. However, these results may not generalize to every space or configuration; more work is required to determine the limits of where this planner outperforms the baseline, and where tuning may be required to make the augmentation perform better.

Statistic | Baseline Planner | Augmented Planner
Median Time to Capture (TTC) | 306 seconds | 280 seconds
Capture Rate | 8% | 24%
Mean Reward per Second | 2.3 reward/sec | 4.5 reward/sec
Mean Total Reward at 500 sec | 1164.7 | 2254.7
Total Reward Variance | 170.5 | 130.5

Table 4.1: Results of the fully augmented planner vs. the naive planner in the 3D case

Concluding the three-dimensional work would require more tuning of the parameters, as well as trials in different environments in the Gazebo simulator. However, the preliminary results shown here are promising and suggest that this method of planner augmentation could be used to plan for real vehicles in POMDP-style problems. Augmentation of the planner shows promise as a way to incorporate domain information into the planning process and to leverage this information to build a better planner than a simple naive planner in the same space.


Chapter 5

Conclusions and Future Work

5.1 Conclusions

This thesis presented an approach to facilitate simultaneous search and mapping of an unknown structured environment. This environment was constrained to be a built indoor setting, allowing the robot to use prior information about the way humans build the world around them to generate a better model of future observations. The problem was first

formally cast into the POMDP planning framework, allowing the simultaneous mapping and

search problem to be tackled in a unified way using computational methods that account for

planning and sensing under uncertainty at the same time. A reward function for the problem

was developed that enabled the robot to both explore/map the unknown space and search for

an unknown static target as efficiently as possible to minimize search time. Approximations

to the analytically intractable problem of finding POMDP policies were then formulated

using the POMCP Monte Carlo Tree Search algorithm for online planning. This formulation

is particularly relevant for realistic robotic platforms like aerial vehicles, which have limited

flight time and fields of view, as well as limited computational abilities.

By leveraging prior environment information, the simulated robot was able to more accurately predict what actual sensor observations may be obtained for online planning. This

biasing allows the extremely high dimensionality of the observation space to be sampled in

a way that is more representative of the ground truth. The Hough transform allows this

information to be extracted from the map in the form of straight lines. These lines are


then expanded in both directions, biasing the prior of the map in unexplored regions. This modification increases the computational efficiency of the tree generation by around 10%.

Double progressive widening of the tree of future histories for the POMDP prevents the tree from degenerating into a wide, shallow ‘bush’ in which each observation branch is generated by only one particle; such a structure contains no statistically significant information about the distribution of the map. Double progressive widening instead forces particles to follow branches that have already been generated. With the inclusion of double progressive widening and the biasing, the algorithm is able to select observations that are more likely to occur.
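For reference, the widening criterion commonly used for this purpose (see [17]) only admits a new child observation at a history node h while

|C(h)| \le k_o N(h)^{\alpha_o},

where N(h) is the visit count of node h, C(h) is its current set of children, and k_o and \alpha_o are tuning parameters; once the bound is reached, new simulations are routed down existing branches instead.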

Transferring this information to three-dimensional space allows the algorithm to be tested in an environment that is more representative of the operating space, namely the Willow Garage environment available through the Gazebo simulator. In this three-dimensional simulator, the planner was implemented to work with Octomap and modified to operate in three dimensions. Here the augmented POMCP planner still outperformed the naive planner for the specific configuration that was presented, showing promising proof-of-concept results for implementation on a realistic robot platform operating in an indoor environment with actual sensing and perception hardware.

5.2 Future Work

In future versions of this work, implementing this algorithm on real hardware should be investigated. Doing so would answer questions about the suitability of this process for real-life situations, along with how well the system noise was modeled during algorithm development. Further augmentation of the planner could also be performed to adapt to other types of semi-structured environments, where underlying patterns in the way the area is constructed can be further exploited to enhance online planning under uncertainty.


Bibliography

[1] Min Sung Ahn, Hosik Chae, Donghun Noh, Hyunwoo Nam, and Dennis Hong. Analysis and noise modeling of the Intel RealSense D435 for mobile robots. 2019 16th International Conference on Ubiquitous Robots (UR), pages 707–711, 2019.

[2] Dario Brescianini, Markus Hehn, and Raffaello D'Andrea. Nonlinear quadrocopter attitude control: Technical report. Technical report, ETH Zurich, 2013.

[3] Cameron B. Browne, Edward Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis, and Simon Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.

[4] A. Das, F. Lewis, and K. Subbarao. Dynamic inversion with zero-dynamics stabilisation for quadrotor control. IET Control Theory Applications, 3(3):303–314, January 2009.

[5] A. Elfes. Sonar-based real-world mapping and navigation. IEEE Journal on Robotics and Automation, 3(3):249–265, 1987.

[6] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.

[7] Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots, 34(3):189–206, 2013.

[8] J. J. Leonard and H. F. Durrant-Whyte. Mobile robot localization by tracking geometric beacons. IEEE Transactions on Robotics and Automation, 7(3):376–382, 1991.

[9] Matteo Luperto and Francesco Amigoni. Extracting structure of buildings using layout reconstruction. In Marcus Strand, Rudiger Dillmann, Emanuele Menegatti, and Stefano Ghidoni, editors, Intelligent Autonomous Systems 15, pages 652–667, Cham, 2019. Springer International Publishing.

[10] Tanwi Mallick, Partha Pratim Das, and Arun Kumar Majumdar. Characterizations of noise in Kinect depth images: A review. IEEE Sensors Journal, 14(6):1731–1740, 2014.


[11] Johannes Meyer, Alexander Sendobry, Stefan Kohlbrecher, Uwe Klingauf, and Oskar von Stryk. Comprehensive simulation of quadrotor UAVs using ROS and Gazebo. Volume 7628, pages 400–411, November 2012.

[12] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.

[13] Charles Richter, William Vega-Brown, and Nicholas Roy. Bayesian Learning for Safe High-Speed Navigation in Unknown Environments, pages 325–341. Springer International Publishing, Cham, 2018.

[14] Francesco Sabatino. Quadrotor control: modeling, nonlinear control design, and simulation. PhD thesis, KTH School of Electrical Engineering (EES), Stockholm, Sweden, 2015.

[15] David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2164–2172. Curran Associates, Inc., 2010.

[16] Adhiraj Somani, Nan Ye, David Hsu, and Wee Sun Lee. DESPOT: Online POMDP planning with regularization. In Advances in Neural Information Processing Systems, pages 1772–1780, 2013.

[17] Zachary Sunberg and Mykel J. Kochenderfer. POMCPOW: An online algorithm for POMDPs with continuous state, action, and observation spaces. CoRR, abs/1709.06196, 2017.

[18] Sebastian Thrun. Probabilistic robotics. Communications of the ACM, 45(3):52–57, 2002.

[19] Brian Yamauchi. Frontier-based exploration using multiple robots. In Proceedings of the Second International Conference on Autonomous Agents, pages 47–53, 1998.