A Feature-based Solution for Kidnapped Robot Problem
by
Lei Wei
A thesis submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Master of Science
Auburn, Alabama
December 12, 2015
Keywords: Kidnapped robot problem, Robot localization, Feature matching, SIFT
Copyright 2015 by Lei Wei
Approved by
Thaddeus Roppel, Chair, Associate Professor of Electrical and Computer Engineering
John Hung, Professor of Electrical and Computer Engineering
R. Mark Nelms, Professor of Electrical and Computer Engineering
Abstract
The Kidnapped Robot problem in robotics commonly refers to a situation where an
autonomous robot can’t locate itself against the map. It can be caused by an external force, when a
robot is carried to an arbitrary place, or by a wake-up sequence, etc.
In this thesis, a solution is proposed and implemented using feature matching to solve the
problem in an efficient and robust way. First, a flood coverage algorithm is proposed to mark the
building with a limited number of fiducial markers and environmental features. Then a
match-then-vote method is implemented based on the ORB (Oriented FAST and Rotated BRIEF) and SIFT
matching algorithms to recognize the place. The robot’s position can then be estimated from its
distance and angle to the marker. After calibration by an AMCL (Adaptive Monte-Carlo
Localization) node in ROS (Robot Operating System) using feature matching, the position
message is published in ROS to locate the robot in a map.
Tested in Broun Hall at Auburn University, the solution proves to be a functional method for
kidnapped recovery with acceptable error and speed: the average error of the distance
measurement is below 1%, the average error in position estimation is about 0.31 m, and the
angle error in the pose estimation is less than 3.5 degrees. The recovery process with a marker
in sight takes less than 1.5 seconds.
Acknowledgments
I would first and foremost like to thank my advisor Dr. Thaddeus Roppel for his support
and guidance during my time at Auburn University. Dr. Roppel gave me the opportunity to be
involved in the cooperative robotics lab in the EE department during my graduate studies at
Auburn, and put me in touch with many resources during my research. I also want to thank
Dr. Nelms and Dr. Hung for serving on my thesis committee.
I also want to thank my parents, who have trusted, supported and loved me all these years.
And finally, I want to thank all my friends, for their words and for their time listening to my boring
ideas.
Table of Contents
Abstract
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
Chapter 1: Introduction
1.1 Goals
1.2 Motivation
Chapter 2 Literature Review
2.1 Autonomous Robot and Kidnapped Robot Problem
2.2 Robot Operating System (ROS)
2.3 Computer Vision in Feature Detection and Matching
2.3.1 Scale-Invariant Feature Transform (SIFT)
2.3.2 Oriented FAST and Rotated BRIEF (ORB)
2.3.3 Feature Matching
2.3.4 OpenCV (Open Source Computer Vision)
2.4 Hardware Specification
2.4.1 Hokuyo LIDAR
2.4.2 Logitech C920 Camera
Chapter 3 Solution Implementation
3.1 Mark the Building
3.1.1 Find Out the FOV and Angle Limitation of the Robot’s View
3.1.2 Pick a Start Point of the Map and Start Marking Sequence
3.1.3 Form and Maintain a Feature Library
3.1.4 Mark the Rooms If Needed
3.2 Place Recognition
3.3 Position Estimation
3.3.1 Distance Estimation
3.3.2 When the Place Is a Hallway
3.3.3 When the Place Is a Room
3.4 Error Prediction
3.4.1 Error in Place Recognition
3.4.2 Error in Position Estimation
Chapter 4 Experiment Setup and Results
4.1 Marking the Map
4.2 The Experiment Recovery System Implementation
4.3 The Recovery Result and Analyzation
4.3.1 Random Recovery Test
4.3.2 Experiment for Error Observation
4.3.3 The Speed of the Recovery Program
Chapter 5 Conclusion and future work
5.1 Summary and Recommendation
5.2 Future Work
References
List of Figures
Figure 1. Hallway Kidnapped Robot Example
Figure 2. DOG Computing Process [22]
Figure 3. FAST corner detection [25]
Figure 4. Hallway Map Example
Figure 5. Fiducial Marker Example
Figure 6. Hallway Marking Example
Figure 7. Hallway Marking Flood
Figure 8. Hallway Marker
Figure 9. Main Marking Sequence Flow Chart
Figure 10. Camera projection model y-z plane
Figure 11. Camera projection model x-z plane
Figure 12. Hallway Position Illustration
Figure 13. Room Position Illustration
Figure 14. Bivariate Normal Distribution
Figure 15. Broun Hall Third Floor West Wing Test Map
Figure 16. Trigger Marker Used In Experiment
Figure 17. FOV Test Matching
Figure 18. Place Candidates in Map
Figure 19. Message Flow in Recovery System
Figure 20. Terminal Window of the Recovery Node
Figure 21. Recovery Result Showing in RVIZ
Figure 22. Recovery Speed
List of Tables
Table 1. Feature Library
Table 2. Voting after object 1
Table 3. Voting after object 2
Table 4. Voting after object 3
Table 5. Voting after object 4
Table 6. Voting after object 5
Table 7. Feature Lib after Place A Added
Table 8. Feature Lib after Place B, D Added
Table 9. Distance Test for Candidate E and H
Table 11. Random Recovery Test 1
Table 13. Recovery test 3 result
List of Abbreviations
ROS Robot Operating System
AMCL Adaptive Monte-Carlo Localization
CRRLAB Cooperative Robotics Research Lab
FOV Field of View
LIDAR Light Detection and Ranging
RVIZ Robot Visualization Tool
SLAM Simultaneous Localization and Mapping
RFID Radio frequency identification
DOG Difference of Gaussian function
SIFT Scale-invariant feature transform
SURF Speeded-Up Robust Features
FAST Features from Accelerated Segment Test
BRIEF Binary Robust Independent Elementary Features
ORB Oriented FAST and Rotated BRIEF
FLANN Fast Library for Approximate Nearest Neighbors
Chapter 1: Introduction
Although the term ‘robotics’ was first coined by Asimov in his science fiction in 1941, the idea
of automata can be traced back to many ancient civilizations, such as ancient China and Ptolemaic
Egypt [1]. An account in Lie Zi, a 3rd century text, describes a humanoid automaton. The
automaton, built by Yan Shi, was presented to the king as a life-size, human-shaped figure
filled with artificial organs made of leather and wood [2]. After years of development, robots
now form a big family with many branches, ranging from commercial manufacturing to medical agents
inside the human body [3].
Today, many robots are equipped with legs, wheels or propellers so they can move through
different terrain or space to finish their tasks. When a robot is capable of moving, the first
problem that emerges is the localization problem. To gain accurate knowledge of a robot’s
position within its operating environment, many solutions have been proposed based on human
interaction, computer vision, RFID or LIDAR features. Many of them are combined with a
probabilistic model [4] [5], which uses recently acquired information and previous robot
state information to estimate the current position. This kind of model needs to keep tracking the
robot’s past positions to give a more precise estimate. What if a robot experiences a power issue or
is moved by an external force, and can’t locate itself? This is when the kidnapped robot problem
arises.
1.1 Goals
The work presented in this thesis aims to develop and implement a feature-based solution to
recover the autonomous robot’s location when it has been kidnapped. The system, implemented
in ROS, is mainly based on feature detection using a single camera and LIDAR. The solution
also proposes a method to mark building hallways and rooms, and to maintain a library of objects
for detection.
1.2 Motivation
Released in 2009, ROS is a flexible framework for robot operating packages. Since it is
open source and compatible with multiple platforms, it has been widely used around the world
[6]. Its developers and users have formed a community in which many program packages are
produced and shared every day [7]. One of the most frequently used programs is the ROS navigation
stack. The navigation stack provided by Eitan Marder-Eppstein is a simple and efficient solution
for 2D navigation [8]. However, when the program has been started up and given a task, it still
needs an initial position as input to localize the robot against a map. This means a robot system
with the navigation stack on board is still a human-in-the-loop system.
Here, the gap between autonomous and human-in-the-loop control has been referred to as the
kidnapped robot problem [9] [14]. Solutions proposed based on computer vision can be separated
into two branches.
The first group of solutions uses computer vision combined with markers for place recognition
and localization, as in [10], [11] and [12]. In solutions based on fiducial markers, the
robot usually needs three markers to locate itself against the map [13]. These algorithms not only
require multiple pictures of the robot’s surroundings (at least two), but also need a large set
of fiducial markers. Both requirements reduce the recovery speed.
The other branch of solutions is based on visual words, such as K-means clustering algorithms
combined with a vocabulary tree in [20]. These solutions need a large set of training pictures
to achieve high accuracy.
The solution proposed in this thesis is an attempt to combine those two branches.
Instead of a large set of fiducial markers, a very limited number of markers combined with a set of
training objects is used to mark the place. Instead of computing with multiple pictures, the
algorithm finishes recognition with only one picture. A voting algorithm based on fuzzy
logic, similar to the vocabulary tree in [20], is used to increase the speed.
The remainder of this thesis is organized in the following manner: Chapter 2 provides an
overview of the field of autonomous robots and computer vision, with a focus on the kidnapped robot
problem and object matching. Chapter 3 gives a specific implementation of the algorithm,
followed by Chapter 4, which shows the experiment design and results that demonstrate the
solution. Chapter 5 presents the conclusion, suggestions and future work.
Chapter 2 Literature Review
2.1 Autonomous Robot and Kidnapped Robot Problem
An autonomous robot is an intelligent machine equipped with computational resources,
capable of performing tasks in the world without human interaction [16]. Nyagudi summarized
the following rules for what an autonomous robot could do in [17]:
1. Gain information about the environment.
2. Work for an extended period without human intervention.
3. Move itself throughout its operating environment without human assistance.
4. Avoid situations that are harmful to people, property, or itself unless those are part of its
design specifications.
The kidnapped robot problem commonly refers to a situation in which an autonomous robot
in action loses its location information. It can be caused by an external force, when a robot is carried
to an arbitrary place, or by a wake-up sequence, etc. No matter what the situation is,
when a robot is kidnapped, the localization algorithm it uses cannot give accurate
position information in the current environment, so the task assigned to it is forced to stop.
Not all localization algorithms can solve the kidnapped robot problem properly. For
example, the localization method used in the ROS navigation stack will sometimes fail a kidnapped
recovery because of its own limitations. The algorithm uses LIDAR feature matching along with a
probabilistic model to localize the robot against a 2-D map. To demonstrate the kidnapped
situation, consider the map shown in Figure 1, which is the third floor hallway map of Broun Hall
at Auburn University.
Assume the robot is working autonomously at position A, with a circle indicating its LIDAR
detection range, and is then kidnapped to B. With the information gathered from the
LIDAR, the system could report a match with several places in the map. When tested with the ROS
navigation stack, the kidnapped situation triggers a recovery sequence, in which the
robot rotates to gather more LIDAR information at position B. The recovery fails
every time, simply because the LIDAR features at position B are never enough to
recognize the place, and the former position estimates in the probabilistic model are of no
help.
Figure 1. Hallway Kidnapped Robot Example
In the situation above, the kidnapped problem could be solved by adding a program package
that suspends the former task and sends an explore command. When the robot moves to position C or
position D, the feature matching gives a more conclusive matching report, and the position is then
recovered by the probabilistic model. But this solution could add a huge latency when the
exploration process interrupts the robot’s given task for too long.
Several other localization algorithms are capable of solving the kidnapped robot
problem, for example a semantic localization method proposed by Chuho Yi and Byung-Uk in
[18], feature-based SLAM in [19], or computer vision combined with a vocabulary tree in [20].
These localization algorithms are not primarily designed to solve the kidnapped robot
problem, but after a kidnapped situation occurs, they are capable of recovering the
robot’s position in the map.
2.2 Robot Operating System (ROS)
Robot Operating System, aka ROS, is a collection of software frameworks for robot software
development. It was first implemented in the Stanford AI Lab to support the Stanford AI Robot (STAIR)
project [21]. After that, it was developed and maintained by Willow Garage, a robot research
institute. Now, ROS has become an open source platform that provides standard operating system
services, such as hardware abstraction, low level device control, application implementation,
message passing between multiple level processes, and file management.
In this thesis, the solution is implemented as a ROS node, the most common and basic
level of application in a ROS system. When running, this node can communicate and transfer
data with other nodes through ROS messages. A message is published in a standard format
under a certain topic, so other nodes can subscribe to the topic and read the message.
Two main nodes are used in this solution. One is the AMCL node, provided by the ROS
navigation stack. The function of this node is to localize the robot against a given map, using
LIDAR feature matching and an adaptive (or KLD-sampling) Monte Carlo localization
approach. The other is written in Python and uses computer vision for place recognition
and distance measurement.
2.3 Computer Vision in Feature Detection and Matching
2.3.1 Scale-Invariant Feature Transform (SIFT)
The Scale-Invariant Feature Transform (SIFT) was proposed by D. Lowe of the University of British
Columbia in 1999 [24]. It is used to detect and describe local features in images.
Image features are usually the corners or blobs in the image. Such features are widely used in
object detection, place recognition, etc. In SIFT, for any object in the image, interest points
can be extracted as a feature description of the object. The feature description can then be used to
detect and locate the object in another test image which contains many other objects.
A SIFT sequence has four basic steps.
1. Scale-space extrema detection
This stage of the filtering process attempts to identify locations and scales that can be
repeatably assigned under different views of the same object [22]. This can be achieved using a
scale-space function based on the Gaussian function, shown below:
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$
In the function above, $*$ is the convolution operator, $G(x, y, \sigma)$ is a variable-scale Gaussian
function and $I(x, y)$ is the input image.
A Difference-of-Gaussian (DOG) function can then be described as below:
$$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$
As shown in Figure 2 [23], each DOG image is computed as the difference between two blurred
images. The initial image is repeatedly convolved with Gaussians to produce the images on the left,
and adjacent Gaussian images are subtracted to produce the DOG images on the right.
Once the DOG images are produced, local extrema are extracted by comparing each pixel
with its 8 neighbors in the same scale and the 9 neighboring pixels in each of the scales above
and below. If the value is the maximum or minimum among them, it is an extremum.
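The two operations described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than Lowe's full implementation: the function names, the 3-sigma kernel truncation, and the default values of sigma, k and the number of levels are assumptions chosen for the example, and octaves and subsampling are omitted.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), as two 1-D convolutions."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dog_stack(image, sigma=1.6, k=2 ** 0.5, levels=4):
    """Subtract adjacent Gaussian images: D = L(k * sigma) - L(sigma)."""
    blurred = [gaussian_blur(image, sigma * k ** i) for i in range(levels)]
    return np.stack([blurred[i + 1] - blurred[i] for i in range(levels - 1)])

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly above or below all 26 neighbors
    (8 in the same scale, 9 in the scale above, 9 in the scale below)."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    centre = dog[s, y, x]
    others = np.delete(cube.ravel(), 13)  # index 13 is the centre sample
    return bool(centre > others.max() or centre < others.min())
```

Candidate keypoints would then be the interior positions `(s, y, x)` of the stack for which `is_extremum` returns true.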
2. Keypoint Localization
The DOG function has a strong response along edges, even where the location along the
edge is poorly determined and therefore unstable to noise [22]. Thus the candidate points
extracted in step 1 are discarded if they are poorly localized on an edge. For the same reason,
points that have low contrast are also rejected.
Figure 2. DOG Computing Process [22]
For each point $(x, y, \sigma)$ obtained in step 1, the offset $\hat{\mathbf{x}}$ of the extremum is
calculated as:
$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$
where the derivatives are taken with respect to $\mathbf{x} = (x, y, \sigma)^T$.
If the function value at the extremum, $|D(\hat{\mathbf{x}})|$, is below a certain threshold (0.03 in
[23]), the keypoint is rejected for low contrast.
For edge-located keypoints, the principal curvatures are computed at the keypoint’s location and
scale using a 2x2 Hessian matrix. Since the eigenvalues of the Hessian matrix are
proportional to the principal curvatures of D, the ratio between the larger eigenvalue and
the smaller one can be used as a criterion. If the ratio is above a certain threshold, the keypoint is
rejected.
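The edge criterion can be checked without computing the eigenvalues explicitly, because the trace and determinant of the Hessian give their sum and product. The sketch below follows that idea; the function name is hypothetical, and the default ratio limit `r = 10` is an assumed illustrative value rather than one stated in this thesis.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Return True if the keypoint is kept (its response is not edge-like).

    dxx, dyy and dxy are the second derivatives of D at the keypoint,
    i.e. the entries of the 2x2 Hessian [[dxx, dxy], [dxy, dyy]].
    """
    trace = dxx + dyy                 # sum of the two eigenvalues
    det = dxx * dyy - dxy * dxy       # product of the two eigenvalues
    if det <= 0:
        # The curvatures have opposite signs (or one is zero): reject.
        return False
    # trace^2 / det depends only on the eigenvalue ratio and grows with it,
    # so bounding it by (r + 1)^2 / r bounds the ratio by r.
    return trace * trace / det < (r + 1) ** 2 / r
```

A blob-like point with equal curvatures gives `trace^2 / det = 4` and is kept, while a point on a straight edge has one curvature much larger than the other and is rejected.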
3. Orientation Assignment
An orientation is assigned to each keypoint in this step to make it invariant to image
rotation. For each keypoint, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are
calculated using pixel differences [22]:
$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$
$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$
An orientation histogram is formed from the gradient orientations of sample points
within a region around the keypoint. The highest peak of the histogram, together with any other
local peak within 80% of the height of this peak, is used to create a keypoint with that orientation.
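As a rough illustration of this step, the sketch below evaluates the two formulas with pixel differences and builds a magnitude-weighted orientation histogram. It is a simplified version under stated assumptions: the image `L` is a list of rows indexed as `L[y][x]`, the 36-bin histogram and the window radius are illustrative choices, and the Gaussian weighting of samples used in the full algorithm is left out.

```python
import math

def gradient(L, x, y):
    """Gradient magnitude and orientation (degrees, 0..360) at pixel (x, y)."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    magnitude = math.hypot(dx, dy)
    # atan2 keeps the quadrant, which a plain arctangent of dy/dx would lose.
    orientation = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, orientation

def orientation_histogram(L, cx, cy, radius=2, bins=36):
    """Accumulate gradient magnitudes into orientation bins around (cx, cy)."""
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, theta = gradient(L, x, y)
            hist[int(theta // (360.0 / bins)) % bins] += m
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Bin angles of the highest peak and of local peaks within 80% of it."""
    peak = max(hist)
    bins = len(hist)
    return [i * 360.0 / bins for i, h in enumerate(hist)
            if h >= ratio * peak and h > hist[i - 1] and h >= hist[(i + 1) % bins]]
```

For an image whose intensity increases only from left to right, every sample falls into the bin at 0 degrees, so a single orientation is returned.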
4. Keypoint descriptor
To create the keypoint descriptor, a 16 by 16 neighborhood around the keypoint is taken and
divided into 16 sub-blocks of size 4 by 4. For each sub-block, an 8-bin orientation histogram is
created, so a total of 128 bin values are available for each keypoint. These values are
represented as a vector to form the keypoint descriptor.
After keypoint descriptors are generated in two or more images, several matching algorithms
can be applied to locate an object or place of interest.
2.3.2 Oriented FAST and Rotated BRIEF (ORB)
The ORB algorithm was first introduced by Ethan Rublee, Vincent Rabaud, Kurt Konolige
and Gary R. Bradski in their paper ‘ORB: An efficient alternative to SIFT or SURF’ in 2011. It
is basically a fusion of the FAST keypoint detector and the BRIEF descriptor, with many modifications
to enhance the performance [24].
FAST is short for Features from Accelerated Segment Test. Developed by Edward Rosten and
Tom Drummond, the algorithm is used for high-speed corner detection. Consider the following
image with pixels enlarged:
For each pixel P, its intensity is denoted $I_p$. A threshold $t$ is chosen, and four pixels on
the circle are tested first: 1, 5, 9 and 13. If more than two of their intensity values are all larger than
$(I_p + t)$ or all smaller than $(I_p - t)$, the full set of 16 pixels around P is tested. If there exists a set
of N contiguous pixels whose intensity values are all larger than $(I_p + t)$ or all smaller than
$(I_p - t)$, then P is a corner [25].
Figure 3. FAST corner detection [25]
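The segment test can be sketched in plain Python as follows. This is an illustrative version, not the optimized detector: the image is assumed to be a list of rows indexed as `img[y][x]`, the pixel offsets are the usual radius-3 circle with pixel 1 at the top, and the default `n = 12` is an assumed value for N (the quick check on pixels 1, 5, 9 and 13 is only a valid shortcut for that choice).

```python
# Offsets (dx, dy) of the 16 circle pixels, pixel 1 first, going clockwise.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def longest_run(flags):
    """Length of the longest run of True values on a circular list."""
    if all(flags):
        return len(flags)
    best = run = 0
    for f in flags + flags:  # doubling the list handles wrap-around
        run = run + 1 if f else 0
        best = max(best, run)
    return best

def is_fast_corner(img, x, y, t, n=12):
    """Segment test for pixel (x, y) with intensity threshold t."""
    ip = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > ip + t for v in ring]
    darker = [v < ip - t for v in ring]
    # Quick rejection using pixels 1, 5, 9 and 13 (list indices 0, 4, 8, 12):
    # at least three of them must be brighter, or at least three darker.
    quick = (0, 4, 8, 12)
    if sum(brighter[i] for i in quick) < 3 and sum(darker[i] for i in quick) < 3:
        return False
    return longest_run(brighter) >= n or longest_run(darker) >= n
```

A pixel on a straight edge has only about half of the circle on the other side of the threshold, so it fails the test, which is the intended behavior for a corner detector.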
After corner keypoints are detected by FAST, an orientation value is added to each point in the ORB
algorithm. ORB computes the intensity-weighted centroid of the patch with the located corner at its center.
The direction of the vector from this corner point to the centroid gives the orientation [24].
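A minimal sketch of the intensity centroid idea is shown below. It assumes a square patch given as a list of rows with the corner at the patch center, and it uses image coordinates, so the y axis points down; the function name is hypothetical.

```python
import math

def patch_orientation(patch):
    """Angle (radians) of the vector from the patch center to its
    intensity-weighted centroid."""
    c = (len(patch) - 1) / 2.0        # center coordinate of the square patch
    m10 = m01 = 0.0                   # first-order moments about the center
    for row, line in enumerate(patch):
        for col, value in enumerate(line):
            m10 += (col - c) * value  # x-weighted intensity
            m01 += (row - c) * value  # y-weighted intensity
    # The centroid lies at (m10 / m00, m01 / m00); its direction does not
    # depend on the total intensity m00, so atan2 of the moments is enough.
    return math.atan2(m01, m10)
```

A patch that is brighter on its right-hand side yields an angle of 0, and one that is brighter at the bottom yields pi/2.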
ORB then uses a BRIEF descriptor for the keypoints. Binary Robust Independent Elementary
Features (BRIEF) was proposed by Michael Calonder, Vincent Lepetit, Christoph Strecha, and
Pascal Fua as a more robust way to modify the feature descriptors to gain better CPU
performance [26]. In ORB, for any feature set with N binary tests at pixel $(x_p, y_p)$, a matrix of
size $2 \times N$ is defined. The matrix contains the coordinates of the pixel set
around $(x_p, y_p)$. Using the orientation found before, the matrix is rotated by a rotation
matrix to form a new matrix, which is used to compute ORB’s descriptor.
In terms of performance, according to the authors in [25], ORB is much faster than SIFT and SURF,
and ORB’s descriptor works better than the SURF descriptor.
2.3.3 Feature Matching
Several matching algorithms can be used to match two data sets. In this thesis,
two of them are used.
One is the Brute-Force matcher. Its principle is quite simple: it takes a descriptor from one data
set and compares it with all feature descriptors in the second set to see if there is a similar
match. A distance is calculated to evaluate the similarity between two descriptors.
The other one is the FLANN-based matcher. FLANN is short for Fast Library for Approximate
Nearest Neighbors [27]. It performs a nearest neighbor search between two data sets
[28]. For large data sets, the FLANN-based matcher can be much faster than the
Brute-Force matcher [29].
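To make the brute-force principle concrete, the following sketch matches binary descriptors (such as those produced by ORB) using the Hamming distance. It is a plain-Python illustration under stated assumptions: descriptors are equal-length `bytes` objects, and the function names and the optional `max_distance` cut-off are hypothetical. In OpenCV the corresponding classes are `cv2.BFMatcher` and `cv2.FlannBasedMatcher`; they are not used here so that the example runs without OpenCV.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def brute_force_match(query, train, max_distance=None):
    """For every query descriptor, find the closest train descriptor.

    Returns a list of (query_index, train_index, distance) tuples.
    Every query descriptor is compared against every train descriptor,
    so the cost grows with len(query) * len(train).
    """
    matches = []
    for qi, q in enumerate(query):
        ti, dist = min(((i, hamming(q, d)) for i, d in enumerate(train)),
                       key=lambda pair: pair[1])
        if max_distance is None or dist <= max_distance:
            matches.append((qi, ti, dist))
    return matches
```

The quadratic cost noted in the docstring is the reason an approximate nearest neighbor search becomes attractive when the descriptor sets are large.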
2.3.4 OpenCV (Open Source Computer Vision)
OpenCV is a programming library mainly aimed at real-time computer vision processing [30].
It was originally developed by Intel research center in Russia, and now is maintained by Itseez.
In the early days of OpenCV, the goals of this project were described as [31]:
1. Advance vision research by providing not only open-source but also optimized code for
basic vision infrastructure.
2. Disseminate vision knowledge by providing a common infrastructure that developers
could build on, so that code would be more readable and transferable.
3. Advance vision-based commercial applications by making portable, performance-
optimized code available for free.
Since then, several versions of the library have been released, and it has gained users all
around the world. In this thesis, most of the image processing work is based on the OpenCV
library.
2.4 Hardware Specification
2.4.1 Hokuyo LIDAR
The Hokuyo URG-04LX-UG01 is a dedicated laser range finder (LIDAR) used in many robot
applications, with an accuracy of +/-30 mm at a 5.6 m range. The sensor has an approximately
240-degree field of view, but in this thesis, when mounted on our lab’s TurtleBot, only the front
180 degrees are used to gather range information for feature matching.
2.4.2 Logitech C920 Camera
The Logitech C920 is an HD web camera with a maximum resolution of 1920*1080 pixels. Its
focal length is 3.67 mm, and its sensor is 3.60 mm in height and 4.80 mm in width. The camera is
used in this thesis to gather image information for place recognition and robot localization.
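From these figures, a nominal field of view can be estimated with the ideal pinhole model, where the angle is twice the arctangent of half the sensor size over the focal length. The sketch below is only that idealized calculation: the function name is hypothetical, lens distortion is ignored, and a measured FOV may differ from the result.

```python
import math

def field_of_view_deg(sensor_size_mm, focal_length_mm):
    """Pinhole-model field of view in degrees: 2 * atan(d / (2 * f))."""
    return math.degrees(2.0 * math.atan(sensor_size_mm / (2.0 * focal_length_mm)))

# Sensor dimensions and focal length quoted above for the Logitech C920.
horizontal = field_of_view_deg(4.80, 3.67)
vertical = field_of_view_deg(3.60, 3.67)
print("horizontal FOV: %.1f deg, vertical FOV: %.1f deg" % (horizontal, vertical))
```

With the values above, this gives roughly 66 degrees horizontally and 52 degrees vertically.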
Chapter 3 Solution Implementation
As shown in Figure 1, when using solutions based on LIDAR feature matching and a
probabilistic model, the kidnapped recovery process may introduce a huge latency or even fail. The
latency is caused not only by the matching process, but also by the exploring operation. The
exploring operation, which is used to choose the most probable position among many
candidates, sometimes may not work even after the robot has explored the whole map. Consider the
hallway map in Figure 4, where the gray area represents rooms or walls and the white area is the
hallway.
Figure 4. Hallway Map Example
In this map, if we use the solution above, the exploring process will always end up with
twin candidates that share the same probability, making it impossible to recover the
location autonomously. Symmetric room and hallway designs are widely used in modern building
layouts. This means the possibility of incurring a huge latency or a recovery failure not only
exists, but could also be very high.
To solve the kidnapped robot problem more efficiently, a solution should meet the
following requirements:
1. First, the solution should be able to recover the position and direction information of the
robot. The information should be accurate enough for a later task, such as navigating through
a map to another location.
2. Secondly, a solution should introduce as little latency as possible. The algorithm should
therefore use only a short period of exploring time, or none. Since the robot may still have a pending
task to finish, it shouldn’t spend a long time or travel a long distance in location recovery.
3. For a feature-based solution, the feature library used for matching should stay small
and be easy to maintain.
4. No solution will give a position with 100% accuracy, so the solution should be
able to calculate a confidence value for its position estimate. Later in use, if the
confidence is lower than a certain threshold, it can discard the result and set up another
recovery process or report a failure.
5. A good solution should be transferable, which means it is not tied to a unique map. With
slight modification, it can be applied to another building or indoor environment.
To meet the requirements above, a feature-based solution is implemented in the following
sections.
This solution to the kidnapped robot problem mainly contains two parts: place recognition and
position estimation. To recognize the place more efficiently, a limited number of fiducial
markers are first used to mark the robot’s operating area. The robot can then collect place
information by recognizing the fiducial markers together with the environmental objects around
each marker. After this, a voting procedure is used to find out at which place the robot is
located and the confidence of the recognition result. Finally, the position is estimated using
the marker and the LIDAR features obtained.
In this thesis, the solution implementation is separated into three steps, so that a specific
design for a given building can be formed by following these steps one by one.
3.1 Mark the Building
Since most indoor layouts use symmetric designs and indoor places share many similarities,
adding fiducial markers makes a place more distinguishable and thus speeds up the recognition
process.
Figure 5. Fiducial Marker Example
However, if we use a unique marker for each place where the robot may operate, the marker
library becomes very large. To reduce the size and complexity of the marker library, the
solution tries to follow the criteria below:
1. If the kidnapped robot does not need to move, it will take one or several pictures of its
surroundings. In this scenario, the robot should see at least one marker in the pictures taken.
2. The fiducial marker combined with the objects nearby should form a unique set for each
place in the map.
For this purpose, the idea of a trigger marker is introduced. As its name indicates, a trigger
marker is used to initiate a recovery sequence. According to criterion 1 above, the kidnapped
robot may need to take several pictures, rather than only one, to find a marked place where the
matching algorithm can begin. Checking only one or two triggers in each picture, instead of
matching every marker and object, saves a large amount of time. Usually the same trigger can be
used for places that share an attribute, for example places on the same floor, places that are
all rooms, or places that are all in the hallway. Triggers can therefore also be used to break
the set of place candidates down into several branches, with each branch using its own library
of interesting objects.
Another concept introduced here is the distance between two sets of interesting objects. For
places inside a building, it is very hard to assign each place an object set whose elements
cannot be found in any other place. It is much more likely that several places share similar
object sets in different combinations, or that only one or two objects in a set are unique. To
compare the similarity of two interesting object sets, a distance function is defined. Let
$O_A$ and $O_B$ be the object sets of places A and B, let $N_{SAB}$ be the number of elements
shared by $O_A$ and $O_B$, and let $N_A$ and $N_B$ be the numbers of elements in $O_A$ and
$O_B$. The distance between $O_A$ and $O_B$ is defined as:
$$D(O_A, O_B) = 1 - \frac{N_{SAB}}{N_A + N_B - N_{SAB}}$$
When $N_{SAB} = 0$, which means places A and B share no interesting object, $D(O_A, O_B) = 1$.
When A and B hold the same object set, the distance is zero. This calculation not only gives a
similarity reference for two sets but also scales it into [0,1], which makes it easier to use
later as a performance reference for a feature library.
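As a minimal sketch of this distance function, the Python snippet below computes D for two object sets. The function name `set_distance` and the example object names are illustrative assumptions. Treating two empty sets as identical is also an assumption, since the text does not cover that case.

```python
def set_distance(objects_a, objects_b):
    """Distance D(O_A, O_B) = 1 - N_SAB / (N_A + N_B - N_SAB)."""
    a, b = set(objects_a), set(objects_b)
    shared = len(a & b)                  # N_SAB
    union = len(a) + len(b) - shared     # N_A + N_B - N_SAB
    if union == 0:
        return 0.0   # assumption: two empty sets are treated as identical
    return 1.0 - shared / union


# Hypothetical object sets for two places
place_a = {"fire alarm", "poster 1", "door"}
place_b = {"fire alarm", "poster 2"}
print(set_distance(place_a, place_b))   # one shared object out of four: 1 - 1/4
```

Because the result is already scaled into [0,1], the same function can be reused unchanged for the threshold test and the total distance test described later.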
Another parameter introduced in this solution is the confidence value. It is used to estimate
the accuracy of a SIFT matching result. When the FLANN-based matcher in OpenCV is used, a
distance ratio can be obtained for each pair of matched descriptors. A filter is then applied
to discard any pair of descriptors that fails the distance-ratio threshold test. A threshold of
0.7 is recommended in OpenCV. After filtering, if the number of good matches is larger than a
certain number, a positive match result is reported. Otherwise, the program reports a match
failure.
In the program, the number of well-matched descriptors and their distance ratios are scaled
into a confidence number. The lower the confidence, the higher the probability of a wrong match
result. Let $Dr_{match}$ be the distance ratio of each pair of matched descriptors, $N_m$ the
total number of matched pairs, and $N_q$ the total number of keypoints in the query image, for
example a fiducial marker. The confidence of a successful match is defined as:
$$Confidence_{match} = \left( r + (1 - r) \cdot \frac{\sum_{1}^{N_m} Dr_{match}}{N_q} \right) \cdot 100\%$$
The confidence of a failed match is defined as:
$$Confidence_{match} = \left( r + (1 - r) \cdot \left( 1 - \frac{\sum_{1}^{N_m} Dr_{match}}{N_q} \right) \right) \cdot 100\%$$
Here $r$ is a scale parameter. In this thesis, $r$ is 0.8.
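The two confidence formulas can be sketched in Python as follows. The function name `match_confidence` and the example ratios are assumptions for illustration. How each distance ratio is obtained from the matcher is outside the scope of this sketch.

```python
def match_confidence(distance_ratios, n_query_keypoints, matched, r=0.8):
    """Confidence in percent of a match report, following the two formulas.

    distance_ratios:   Dr_match for each of the N_m matched pairs
    n_query_keypoints: N_q, number of keypoints in the query image
    matched:           True for a successful match, False for a match failure
    r:                 scale parameter (0.8 in this thesis)
    """
    if n_query_keypoints <= 0:
        raise ValueError("the query image needs at least one keypoint")
    score = sum(distance_ratios) / n_query_keypoints
    if not matched:
        score = 1.0 - score          # a failure report uses the complement
    return (r + (1.0 - r) * score) * 100.0


# Hypothetical values: three matched pairs on a marker with four keypoints
print(match_confidence([0.9, 0.8, 0.7], 4, matched=True))
print(match_confidence([0.9, 0.8, 0.7], 4, matched=False))
```

With $r = 0.8$ the reported confidence always lies between 80% and 100%, provided each ratio is at most 1 and $N_m \le N_q$.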
In this section, an algorithm to mark the building hallway is implemented first. It can then
be slightly modified to mark the rooms. Using the triggers and the distance defined above, the
following steps are taken to mark a hallway map:
3.1.1 Find Out the FOV and Angle Limitation of the Robot’s View
In this solution, the FOV is a distance range bounded by the longest and shortest distances at
which the robot’s camera can recognize the trigger marker and the other objects. The range can
be determined by an object matching test, which involves setting up the camera and matching a
sample set of objects and markers. If all the SIFT matching results pass the confidence
threshold, for example 85%, the distance is inside the FOV range. The distance is then
increased and the matching repeated until we have a distance long enough to apply in step
3.1.2, or until the confidence falls below the threshold. This gives the upper boundary of the
FOV range. Similarly, the lower boundary can be found by decreasing the distance from the
camera to the object set.
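The boundary search can be sketched as a simple stepping loop. Everything named here is an assumption for illustration: `confidences_at` stands for a hypothetical callback that runs the matching test at a given distance and returns the confidence of every sample match in percent, and `step` and `max_distance` are chosen by the experimenter.

```python
def find_fov_range(confidences_at, start, step, max_distance, threshold=85.0):
    """Return (lower, upper) distances at which every sample match passes.

    confidences_at: hypothetical callback, distance -> list of confidences (%)
    start:          a distance assumed to lie inside the FOV range
    max_distance:   a distance long enough for the marking step, search stops here
    """
    def passes(distance):
        return all(c >= threshold for c in confidences_at(distance))

    if not passes(start):
        raise ValueError("start distance must be inside the FOV range")

    upper = start
    while upper + step <= max_distance and passes(upper + step):
        upper += step                 # move the objects further away
    lower = start
    while lower - step > 0 and passes(lower - step):
        lower -= step                 # move the objects closer
    return lower, upper


# Stand-in for a real matching test: matches pass between 2 m and 14 m.
def fake_test(distance):
    return [90.0, 88.0] if 2.0 <= distance <= 14.0 else [50.0, 60.0]

print(find_fov_range(fake_test, start=5.0, step=1.0, max_distance=20.0))
```

The resolution of the reported boundaries is limited by `step`, so a smaller step gives a finer range at the cost of more matching tests.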
To widen the FOV range, we can enlarge the trigger marker. Since the ORB algorithm is used to
match triggers, a bigger trigger improves ORB’s performance. But the marker should not be too
big. Since we cannot increase the size of the objects, an oversized trigger gives no extra
confidence in object matching. It will also be seen in a following section that a bigger
trigger may introduce a larger distance error.
The angle limitation of the view is also required, since the marker does not always face the
camera directly. It is the angle formed by the marker plane and the image plane when the robot
is facing in the direction of the marker. If the indoor space is too big and the angle is too
large, for example close to 90 degrees, the robot will not be able to recognize the marker no
matter how many photos are taken.
3.1.2 Pick a Start Point of the Map and Start Marking Sequence
In this step, we first choose a start point. A flood coverage sequence is then applied to pick
place candidates around the map, mark the places and form a feature library.
Here, the hallway example shown in Figure 4 is used for demonstration. Each corner, which is
formed by two walls, can be marked as shown in Figure 6. We use $C_{AB}$ for the corner at
place 1, and the other corners are marked as $C_{CD}$, $C_{EF}$ and $C_{GH}$. Similarly, the
hallways are marked as $H_{IJ}$, $H_{AF}$, etc. For the starting point, a corner of the map,
like place 1 or 2, is recommended. If the start point is chosen in the middle of the hallway,
like place 3, the ‘flood’ will spread in multiple directions and increase the complexity of the
calculation.
Figure 6. Hallway Marking Example
Each white pixel in the map is a possible position for the robot. We therefore perform the
flood sequence by changing every white pixel, with an RGB value of (255,255,255), into a black
pixel with an RGB value of (0,0,0). Changing all the pixels from white to black resembles a
flood covering the map, so it is called the flood marking sequence. When no white pixel is
left, we know that the marking work has covered all possible positions of the robot in the map.
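A minimal sketch of such a flood sequence is given below. It assumes a single-channel grid instead of RGB triples, a breadth-first visiting order and 4-connected neighbours. None of these choices is specified in the text, and the names `flood_sequence`, `WHITE`, `BLACK` and `GRAY` are illustrative.

```python
from collections import deque

WHITE, BLACK, GRAY = 255, 0, 128   # free hallway, flooded, unknown or wall


def flood_sequence(grid, start):
    """Flood the white area of `grid` from `start`, yielding each position once.

    grid:  list of lists of pixel values, modified in place
    start: (row, col) of a white pixel, for example a corner of the map
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    grid[start[0]][start[1]] = BLACK
    while queue:
        r, c = queue.popleft()
        yield (r, c)                       # the marking test would run here
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == WHITE:
                grid[nr][nc] = BLACK       # white to black: position covered
                queue.append((nr, nc))


# Tiny cross-shaped hallway as a stand-in for a real map
hallway = [
    [GRAY, WHITE, GRAY],
    [WHITE, WHITE, WHITE],
    [GRAY, WHITE, GRAY],
]
order = list(flood_sequence(hallway, (0, 1)))
print(order)
```

When the generator is exhausted, no white pixel reachable from the start point remains, which corresponds to the stopping condition of the marking sequence.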
In Figure 6, it can be seen that each corner or hallway proposes two place candidates. For
instance, at corner AB, a marker can be put either on A or on B. Now, let’s start the flood
sequence. Note that in the map, the gray area represents unknown places, while the black lines
mark the boundary of the hallway. The flooding procedure is shown in Figure 7.
Figure 7. Hallway Marking Flood Sequence
When we first start the flood sequence at position 1 of the map shown in Figure 6, we can
simply choose a nearby place inside the FOV. Here we choose A and put a trigger marker there.
We then choose the objects beside the trigger to form the set of interesting objects for
place A.
Then, for every new pixel that is flooded, we do the following test. First, within the FOV and
angle criteria, does a place with a trigger marker already exist? If the answer is yes, we move
on to the next position. If the answer is no, we check the hallway in which the robot is
located. Since we begin at position 1 in Figure 6, the first set of flooded positions should
locate themselves in hallways $H_{BC}$ and $H_{AF}$, which give four candidate places: A, B, C
and F.
Figure 8. Hallway Marker Assignment
Consider positions a, b, c and d in Figure 8, and let’s locate them one by one. For position b,
we check the FOV first. Because position b is too close to A and B, assume the robot can’t
recognize the marker at place A and would not recognize a possible marker at B either. So we
are left with place candidates C and F. We then consider the FOV again: is place F covered by
the FOV when the robot is at position b? If not, candidate F is discarded. The same test is
done on place C. If both answers are yes, an interesting object set is formed for each of C and
F. The candidate sets $O_C$ and $O_F$ are then used to calculate the distance $D$ defined in
Section 3.1 to every set already in the library. In this case, the only such set is $O_A$. Each
calculated distance is first compared to a threshold. If the distance is below the threshold,
the new candidate set is rejected. If more than one candidate set passes the threshold test,
the total distance (dissimilarity) is calculated as:
$$TotalDistance_{candidate} = \sum_{1}^{Size(library)} D(O_{candidate}, O_{library})$$
After the calculation, the candidate holding the least total distance value is chosen as a new
member of the feature library.
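The threshold test and the total distance test can be sketched together as below. The helper `set_distance` restates the distance $D$ from Section 3.1. The function name `pick_candidate`, the dictionary layout and the example object names are assumptions for illustration. The selection rule follows the text as written, taking the least total distance among the candidates that pass the threshold.

```python
def set_distance(a, b):
    """Distance D between two object sets, as defined in Section 3.1."""
    a, b = set(a), set(b)
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return 0.0 if union == 0 else 1.0 - shared / union


def pick_candidate(candidates, library, threshold):
    """Return the name of the chosen candidate, or None if all are rejected.

    candidates: dict, place name -> object set of a place candidate
    library:    dict, place name -> object set already in the library
    """
    passed = {}
    for name, objects in candidates.items():
        distances = [set_distance(objects, lib) for lib in library.values()]
        if any(d < threshold for d in distances):
            continue                      # too similar to an existing place
        passed[name] = sum(distances)     # total distance to the library
    if not passed:
        return None                       # caller would add a fiducial marker
    # The text selects the candidate with the least total distance.
    return min(passed, key=passed.get)


# Hypothetical library and candidates
library = {"A": {"trigger", "fiducial 1", "fire alarm"}}
candidates = {
    "C": {"trigger", "poster", "door"},        # D to A = 1 - 1/5 = 0.8
    "F": {"trigger", "fire alarm", "door"},    # D to A = 1 - 2/4 = 0.5
}
print(pick_candidate(candidates, library, threshold=0.6))
```

A return value of `None` corresponds to the case discussed next in the text, where every candidate fails the threshold and a fiducial marker has to be added.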
In the same way, a place is chosen for positions a, c and d. One situation that may arise is
that the FOV of the robot is so limited that it can’t detect any hallway end. For example, at
position c, if the robot’s FOV cannot reach places A and F, the algorithm will fail. In this
situation, the first thing to do is to try to increase the FOV. This can be done by using a
camera with higher resolution or by enlarging the trigger. If the FOV still can’t reach any
hallway end, we can hang the trigger image from the ceiling to solve the problem.
When the feature library grows large, another situation may occur in which none of the place
candidates for a position passes the distance threshold test. This can also happen when each
candidate for a place holds exactly the same object set as a set already in the library. In
this case, a fiducial marker is added to each candidate place, and the distance test is
performed again to choose the most suitable candidate. If the threshold still can’t be reached,
we can add a second marker, a third, and so on.
During the process above, more than one fiducial marker may be added to the library to satisfy
the distance test. When this happens, each time we need to add a fiducial marker, we first pick
from the markers that have already been used. Only if they all fail the distance test do we add
a new fiducial marker to the library. This principle helps to reduce the total number of
fiducial markers used.
The main marking sequence could also be described as a flow chart shown in Figure 9.
[Flow chart nodes: Start Flooding Sequence; Next Possible Position; Apply FOV, Angle Limit;
Marker Existed? (Yes/No); Pick Place Candidates; Distance Threshold Test; All failed? (Yes/No);
Add a Fiducial Marker; Total Distance Test; Pick Candidate, Add Into Library; All positions
covered? (Yes/No); Finished Marking Sequence]
Figure 9. Main Marking Sequence Flow Chart
It should also be mentioned that different start points for the flooding sequence may give
different libraries, because different places may be chosen and different marker allocations
used. The performance of a library can be evaluated by calculating the average distance between
each pair of place sets in it.
3.1.3 Form and Maintain a Feature Library
Each place newly added to the library is numbered, as are the objects in its set. The library
can then be represented by the following table, in which ‘1’ means the object is included in
the set of a place and ‘0’ means it is not. Each column shows which objects are included in the
set of a place, and each row shows which places an object belongs to.
             Place A   Place C   Place B   Place F   Place G   Place I
Trigger 1       1         1         1         1         1         1
Object 1        1         1         0         0         0         1
Object 2        1         0         1         1         0         1
Object 3        0         1         1         0         1         0
Object 4        0         0         1         0         1         1
Object 5        0         0         0         1         0         1
Table 1. Feature Library
Table 1 shows one possible marking result for the right wing of the hallway in Figure 6. Using
places as columns and objects as rows, the computer can simply keep the library as a matrix. To
maintain a library in operation, for example when a new object is added to one or several
existing sets, we add a new row to the matrix. Similarly, each time we add a new place, a new
column is added to the matrix.
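As an illustration, the matrix form of Table 1 and its two maintenance operations can be sketched with plain Python lists. The function names are hypothetical. Counting the trigger row when computing the distance is an assumption. The last function evaluates a library by its average pairwise distance, as suggested at the end of Section 3.1.2.

```python
from itertools import combinations

# Table 1 as a matrix: rows are objects, columns are places.
places = ["A", "C", "B", "F", "G", "I"]
objects = ["Trigger 1", "Object 1", "Object 2", "Object 3", "Object 4", "Object 5"]
matrix = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
]


def add_object(name, flags):
    """Add a row: flags[i] is 1 if place i contains the new object."""
    if len(flags) != len(places):
        raise ValueError("one flag per place is required")
    objects.append(name)
    matrix.append(list(flags))


def add_place(name, flags):
    """Add a column: flags[j] is 1 if the new place contains object j."""
    if len(flags) != len(objects):
        raise ValueError("one flag per object is required")
    places.append(name)
    for row, flag in zip(matrix, flags):
        row.append(flag)


def column_distance(i, j):
    """Set distance D between the object sets of places i and j."""
    shared = sum(1 for row in matrix if row[i] and row[j])
    union = sum(1 for row in matrix if row[i] or row[j])
    return 1.0 - shared / union if union else 0.0


def average_distance():
    """Average D over every pair of places, used to compare libraries."""
    pairs = list(combinations(range(len(places)), 2))
    return sum(column_distance(i, j) for i, j in pairs) / len(pairs)


print(column_distance(0, 1))   # places A and C share 2 of 4 objects
print(average_distance())
```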
3.1.4 Mark the Rooms If Needed
The steps to mark the rooms are similar to the steps to mark the hallway. Choose one room to
start with, or use the room numbers to mark them one by one. For a small room with unobstructed
space, only one trigger is needed. If a room is too big to be covered by the robot’s FOV, place
candidates are picked as in the hallway, followed by the distance test.
A feature library is formed for the rooms as well. If the trigger used in the rooms is the same
as in the hallway, we can combine the two libraries into one matrix. If not, it is recommended
to store them in different matrices to improve the query performance later.
A major difference in marking the rooms is that for every trigger, another marker is placed
beside it at a certain distance. This marker is the same for every room and does not change
with the triggers, meaning that all the rooms can share one marker. This second marker is not
part of the library. It is added for position estimation, which is described in Section 3.3.3.
3.2 Place Recognition
Once we have finished marking the map, the feature library matrix can be imported into the
recovery program on the robot’s computer, and the kidnapped robot recovery process can start.
When the robot is kidnapped or powered on, it starts the recovery program. First, a picture is
taken by the onboard camera and matched against the trigger marker using the ORB detector. If
no match is reported, the robot rotates by a certain angle and takes another picture. The
procedure is repeated until a match is reported. Although the ORB detector is less precise than
SIFT, it still provides a reliable match report at a speed several times faster than SIFT.
Once the trigger is recognized, the robot knows that the place it is looking at is one of the
places in the feature library. When several triggers are used, the robot also knows which
feature library the place belongs to. The next step is to decide which place it is. To achieve
this, the robot needs to match the picture with the objects in the feature library. Since every
place in the library holds a unique combination of objects, a unique result should be reported
after a full library check. Because of its precision, the SIFT detector is used for feature
library matching. When a trigger marker is detected, the algorithm takes the objects one by one
from the library and matches them with the picture using SIFT. SIFT matching reports one of two
kinds of result: either a successful match with a confidence ratio, or a match failure with a
confidence ratio. But SIFT is much slower than the ORB algorithm. Running the test for the
whole feature library introduces a large latency, especially when there is a large number of
place candidates. To speed up the process, a voting process based on fuzzy logic is performed
right after every object match test.
First, let’s consider a vote whose result is either true or false. When an object is in the
picture, we vote for true. This means that each place whose set includes this object gains one
vote, while each place without this object loses one vote. A vote count cannot be less than
zero. When the object is not in the picture, every place whose set includes this object loses
one vote. Note that no set gains a vote in this situation, because that would reduce the
accuracy of the estimation later.
However, a match reported by the SIFT algorithm is never a 100% match or a 100% match failure.
Instead, a confidence number in percent is reported, telling us the probability that the object
exists or does not exist in the picture. Here, we use $R_c$ to represent the ratio of
confidence. Since $0 \le R_c \le 1$, instead of voting 1 or 0, we can split the vote into $R_c$
and $(1 - R_c)$. For each object that reports a match with confidence $R_c$, we assign $R_c$ to
true and $(1 - R_c)$ to false. Thus a place with this object receives $(2 R_c - 1)$ votes in
total, and a place without this object receives $(-R_c)$ votes. Similarly, for a failure report
with confidence $R_c$, a place with the object receives $(1 - 2 R_c)$ votes, while a place
without the object receives $(R_c - 1)$ votes.
After each object match, we update the vote of each place and then calculate the weighting of
each place’s vote in the total vote. If one place holds the leading vote, and it is the only
one in the leading position, we break the matching loop and report a successful recognition. A
threshold test can also be introduced here. It is optional. When it is applied to the voting,
the winning place needs not only a leading vote but also a weighting above the threshold. The
threshold can force the matching algorithm to match more objects than strictly needed, and thus
increase the accuracy of the recognition.
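The voting rule can be sketched in Python as below. The function names, the dictionary layout and the tolerance `eps` used to detect a tie are assumptions for illustration. The object sets are read from Table 1, with the trigger row left out because the trigger is matched separately with ORB, and the match reports use a confidence of 0.9 for every detection, as in the demonstration that follows.

```python
# Object numbers per place, taken from Table 1 (trigger row omitted).
TABLE_1 = {
    "A": {1, 2},
    "C": {1, 3},
    "B": {2, 3, 4},
    "F": {2, 5},
    "G": {3, 4},
    "I": {1, 2, 4, 5},
}


def vote_delta(has_object, matched, rc):
    """Vote change for one place after one object test with confidence rc."""
    if matched:
        return 2 * rc - 1 if has_object else -rc
    return 1 - 2 * rc if has_object else rc - 1


def recognize(library, reports, threshold=0.0, eps=1e-9):
    """Return (place, weighting, votes); place is None without a unique leader.

    reports: iterable of (object, matched, rc) in the order they are tested
    """
    votes = {place: 0.0 for place in library}
    for obj, matched, rc in reports:
        for place, objs in library.items():
            delta = vote_delta(obj in objs, matched, rc)
            votes[place] = max(0.0, votes[place] + delta)   # never below zero
        total = sum(votes.values())
        if total <= 0:
            continue
        best = max(votes.values())
        leaders = [p for p, v in votes.items() if best - v < eps]
        weight = best / total
        if len(leaders) == 1 and weight >= threshold:
            return leaders[0], weight, votes                # break the loop early
    return None, 0.0, votes


# Match, match, failure, failure, failure for objects 1 to 5
REPORTS = [(1, True, 0.9), (2, True, 0.9), (3, False, 0.9),
           (4, False, 0.9), (5, False, 0.9)]
place, weight, votes = recognize(TABLE_1, REPORTS)
print(place, round(weight, 4))
```

Without a threshold the loop stops after object 4, when place A is the only leader. Passing a threshold such as 0.6 forces the test for object 5 as well.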
Here, let’s demonstrate the voting process using the feature library in Table 1. Assume the
robot is kidnapped in the hallway shown in Figure 8, at position c, facing place A. The
confidence ratio for each detection is 0.9.
After the test for object 1, the program reports a match, and the voting result is shown in
Table 2:
             Place A   Place C   Place B   Place F   Place G   Place I
vote          0.8       0.8       0         0         0         0.8
weighting     0.3333    0.3333    0         0         0         0.3333
Table 2. Voting after object 1
After the test for object 2, the program reports a match, with the result in Table 3:
             Place A   Place C   Place B   Place F   Place G   Place I
vote          1.6       0         0.8       0.8       0         1.6
weighting     0.3333    0         0.1667    0.1667    0         0.3333
Table 3. Voting after object 2
After the test for object 3, the program reports a match failure, with the result in Table 4:
             Place A   Place C   Place B   Place F   Place G   Place I
vote          1.5       0         0         0.7       0         1.5
weighting     0.4054    0         0         0.1892    0         0.4054
Table 4. Voting after object 3
After the test for object 4, the program reports a match failure, with the result in Table 5:
             Place A   Place C   Place B   Place F   Place G   Place I
vote          1.4       0         0         0.6       0         0.7
weighting     0.5185    0         0         0.2222    0         0.2593
Table 5. Voting after object 4
After this round of voting, place A holds the leading vote alone, with a weighting of 0.5185,
so we can end the matching process now. In this case, the voting process reduces the time cost
by skipping the matching operation on object 5. However, matching more objects increases the
accuracy of the voting result. This can be done by setting a weighting threshold larger than
0.5185, so that the algorithm also takes object 5 for matching.
After the test for object 5, the program reports a match failure, with the result in Table 6:
             Place A   Place C   Place B   Place F   Place G   Place I
vote          1.3       0         0         0         0         0
weighting     1         0         0         0         0         0
Table 6. Voting after object 5
After this round, the weighting is 1, which is certainly above the threshold, so we can break
out of the loop and report a recognition.
However, it is possible that even with all objects matched, the leading weighting is still
below the threshold. This means the threshold setting has a flaw. It can be solved either by
using a lower threshold or by adding more markers to increase the distance between two similar
sets.
3.3 Position Estimation
Once the robot has recognized the place it is facing, the next step is to get its position
against a known map. The distance from the robot to the trigger marker is calculated first.
The position of the robot is then estimated by one of two algorithms, depending on the kind of
place in which it is located: a room or a hallway.
3.3.1 Distance Estimation
In the sample camera projection model shown in Figure 10, $d$ is the distance between the
fiducial marker and the camera projection center along the z axis.
In this model, the camera’s y axis and the fiducial marker are always perpendicular to the
ground, and the center of the fiducial marker lies in the y-z plane. Assume the height of the
fiducial marker is $l_f$, and its height is $l_i$ when projected onto the image plane. When the
focal distance is $f$, we have:
$$\frac{d}{f} = \frac{l_f}{l_i}$$
But the marker’s center is not always in the y-z plane of the camera coordinate system. When it
is not, as shown in Figure 11 for the camera’s x-z plane, the projection triangles in Figure 10
lie along a line with lengths $d$ and $f'$. Assume this line forms an angle $\partial$, so that
$f' = f / \cos \partial$. Then $d$ can be calculated as below:
$$d = \frac{l_f}{l_i} \cdot \frac{f}{\cos \partial}$$
Figure 10. Camera projection model y-z plane
Since $l_f$ is proportional to the pixel number $P_{fid}$ of the marker in the fiducial marker
image file, and $l_i$ is proportional to the pixel number $P_{cam}$ of the marker in the camera
image, the distance can be calculated as
$$d = r \cdot \frac{P_{fid}}{P_{cam} \cdot \cos \partial}$$
In the equation above, $r$ is a parameter that can be estimated by experiment, so we do not
need to know the technical parameters of the camera when the camera is changed. To estimate
$\partial$, we can use a Taylor series. In Figure 11, we can see that $\partial$ is not
proportional to the offset of the marker from the image center, but $\tan \partial$ is. For a
camera with a limited FOV, for example within [−30,30] degrees, it is accurate enough to
estimate $\partial$ in [0,30] or [−30,0] using a two-stage Taylor series. The parameters of the
Taylor series can be calculated from sample points in the camera image for which $\partial$ has
been measured manually.
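The distance formula and the angle estimate can be sketched as follows. Reading the two-stage Taylor series as the first two terms of the arctangent series is an assumption, as are the function names, the proportionality constant `k` between the pixel offset and $\tan \partial$, and all numeric values. In practice `k` and `r` would be fitted from measured sample points.

```python
import math


def estimate_angle(offset_px, k):
    """Angle in radians from the marker's pixel offset to the image center.

    tan(angle) is taken as k * offset_px, and atan is approximated by the
    first two terms of its Taylor series: atan(t) ~ t - t**3 / 3.
    """
    t = k * offset_px
    return t - t ** 3 / 3.0


def calibrate_r(known_distance, p_fid, p_cam, angle=0.0):
    """Solve d = r * P_fid / (P_cam * cos(angle)) for r at a known distance."""
    return known_distance * p_cam * math.cos(angle) / p_fid


def estimate_distance(r, p_fid, p_cam, angle):
    """Distance d = r * P_fid / (P_cam * cos(angle))."""
    return r * p_fid / (p_cam * math.cos(angle))


# Hypothetical calibration: a 400-pixel-high marker image appears 200 pixels
# high in the camera image at a measured distance of 2 m, facing the marker.
r = calibrate_r(2.0, p_fid=400, p_cam=200)
print(estimate_distance(r, p_fid=400, p_cam=100, angle=0.0))

# Hypothetical camera where an offset of 320 pixels corresponds to 30 degrees
k = math.tan(math.radians(30)) / 320
print(math.degrees(estimate_angle(320, k)))
```

At the edge of a 30-degree range the two-term approximation is off by well under one degree, and the error shrinks quickly toward the image center.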
Figure 11. Camera projection model x-z plane
In the computer program, the SIFT detector is used to find a pair of matched points between the
fiducial marker and the camera image, and their coordinate difference along the camera’s y axis
is then used to solve for the distance. To pick a good pair of matched points, two filters are
used.
The first filter is used to reduce the error of the distance estimation. During the estimation,
error is introduced mainly from two sources. The first is the estimation of $l_i$, which is
calculated from a number of pixels. Since this number is a discrete integer value, $l_i$ always
comes with a certain error. However, the error can be reduced by increasing the total number of
pixels along the y axis, which means picking the pair of matched keypoints with the largest
pixel distance along the y axis. The second source of error is the perspective transformation.
The two points of a reported pair may differ in their x coordinates in the image plane and the
marker plane. When the points are transformed from the marker plane into the image plane, this
difference causes a change in their distance along the image y axis. Assume the lower point is
at height $h$ in Figure 10, the z axis forms an acute angle $\partial$ with the marker plane,
and the distance between the points in the fiducial marker along the image x coordinate is
$\Delta x$. According to [32], the ratio of the error to the length $l_f$, denoted $r_{error}$,
is given by:
$$r_{error} = \frac{h \, \Delta x \cos \partial}{d \, l_f}$$
To reduce this error, we can lower the position of the fiducial marker and choose points with a
smaller $\Delta x$. Usually, about 3 to 16 points are reported as matches for each fiducial
marker. For each pair of points, we therefore calculate a criterion $C$:
$$C = \frac{\Delta x}{l_i^2}$$
The pair of points with the smallest $C$ value is chosen to estimate the distance.
The second filter is used to eliminate wrong matches. During the recognition process, the SIFT
detector may report a match with a point outside the fiducial marker. Such a wrongly matched
point can easily take a leading position in the first filter’s test and cause an estimation
failure.
Observation during testing reveals that a wrongly matched point, wherever it is, usually lies
outside the cluster of the fiducial marker’s keypoints. If we calculate the geometric center of
all matched points and the distance from it to each point, the wrongly matched point usually
has the longest distance to the geometric center. An average distance to the center is
therefore calculated, followed by the ratio of each point’s distance to the average distance.
If the ratio exceeds a certain threshold, the point is discarded before the first filter is
applied.
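The two filters can be sketched together as below, applied in the order described: outlier rejection first, then pair selection. The function names, the ratio threshold of 2.0 and the sample coordinates are assumptions for illustration. For simplicity, both $\Delta x$ and $l_i$ are measured here in camera image pixels.

```python
import math
from itertools import combinations


def remove_outliers(points, max_ratio=2.0):
    """Second filter: drop points far from the geometric center of all matches.

    points:    list of (x, y) matched keypoint positions in the camera image
    max_ratio: hypothetical threshold on distance / average distance
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    dists = [math.hypot(x - cx, y - cy) for x, y in points]
    mean = sum(dists) / len(dists)
    if mean == 0:
        return list(points)
    return [p for p, d in zip(points, dists) if d / mean <= max_ratio]


def best_pair(points):
    """First filter: return the pair with the smallest C = dx / l_i**2."""
    best, best_c = None, float("inf")
    for p, q in combinations(points, 2):
        dx = abs(p[0] - q[0])          # offset along the image x axis
        l_i = abs(p[1] - q[1])         # pixel distance along the image y axis
        if l_i == 0:
            continue                   # no vertical extent, cannot give a distance
        c = dx / l_i ** 2
        if c < best_c:
            best, best_c = (p, q), c
    return best


# Four points on a marker and one wrong match far away from the cluster
matches = [(100, 100), (102, 180), (98, 140), (105, 120), (400, 300)]
kept = remove_outliers(matches)
print(kept)
print(best_pair(kept))
```

Squaring $l_i$ in the criterion favors pairs that are far apart vertically, which also reduces the pixel quantization error described for the first error source.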
With the distance calculated, the position can be found according to the type of place. There
are two possibilities: a hallway or a room.
3.3.2 When the Place Is a Hallway
Figure 12. Hallway Position Illustration
When the place is a hallway, because of the angle and FOV limits obtained in 3.1.1, the robot
position always lies on an arc as shown in Figure 12.
Since the place with the marker can be found in the map with a coordinate reference, the center
of the arc can be calculated as $p$ in Figure 12. This position is then input into the AMCL
node in ROS. Based on LIDAR feature matching against the walls of the hallway, AMCL publishes a
position estimate using the TF publisher in ROS.
3.3.3 When the Place Is a Room
When the place is a room, the LIDAR features may have a confusing layout because of the indoor
objects, so a second marker is used to estimate the position. The marker is placed at a certain
distance or angle from the trigger marker. For accuracy, it is recommended to put this marker
where the robot’s camera can’t capture it in the same image frame as the trigger. After the
robot recognizes the room and the trigger marker, it therefore needs to rotate by a certain
angle to find the second marker and its distance, in the same way as it located the trigger.
When the distances to both markers have been calculated, the program can find two possible
positions, as shown in Figure 13.
Figure 13. Room Position Illustration
The extra position can be discarded according to the rotation of the camera. For example, if
the camera is rotated counterclockwise, marker 2 is in the first frame and marker 1 is in one
of the following frames, then $p_2$ is the robot position and $p_1$ is discarded. The heading
direction of the robot can be estimated from the second marker’s position in the camera frame.
A full position estimate can then be published to ROS topics.
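Geometrically, the two possible positions are the intersections of two circles centered on the markers. A minimal sketch is given below. The function names and coordinates are illustrative, and the disambiguation step assumes map coordinates with x to the right and y upward and a rotation of less than 180 degrees between the two sightings.

```python
import math


def circle_intersections(m1, d1, m2, d2):
    """Return the points at distance d1 from m1 and d2 from m2, or []."""
    dx, dy = m2[0] - m1[0], m2[1] - m1[1]
    base = math.hypot(dx, dy)                      # distance between markers
    if base == 0 or base > d1 + d2 or base < abs(d1 - d2):
        return []                                  # circles do not intersect
    a = (d1 ** 2 - d2 ** 2 + base ** 2) / (2 * base)
    h = math.sqrt(max(0.0, d1 ** 2 - a ** 2))
    fx, fy = m1[0] + a * dx / base, m1[1] + a * dy / base
    return [(fx - h * dy / base, fy + h * dx / base),
            (fx + h * dy / base, fy - h * dx / base)]


def pick_position(candidates, first_seen, then_seen, counterclockwise=True):
    """Keep the candidate whose view matches the observed rotation direction.

    first_seen, then_seen: map positions of the markers in sighting order
    """
    for p in candidates:
        ax, ay = first_seen[0] - p[0], first_seen[1] - p[1]
        bx, by = then_seen[0] - p[0], then_seen[1] - p[1]
        cross = ax * by - ay * bx      # > 0: then_seen lies counterclockwise
        if (cross > 0) == counterclockwise:
            return p
    return None


# Hypothetical room: markers 4 m apart, both measured at 2.5 m
marker_1, marker_2 = (0.0, 0.0), (4.0, 0.0)
candidates = circle_intersections(marker_1, 2.5, marker_2, 2.5)
print(candidates)
# Marker 2 seen first, then marker 1 after a counterclockwise rotation
print(pick_position(candidates, first_seen=marker_2, then_seen=marker_1))
```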
3.4 Error Prediction
3.4.1 Error in Place Recognition
For place recognition, an error occurs when a wrong place is recognized and reported. Here, the
weighting of the final vote can also be used to calculate a confidence reference. For place A
in the library, assume we run a vote in which every matching result has 100% confidence. The
winning vote weighting is then calculated as $V_{binary}$. If place A is the winner of a voting
process with final vote weighting $V_p$, the confidence of the recognition is defined as:
$$confidence_{place} = \frac{V_p}{V_{binary}} \cdot 100\%$$
This confidence can be used to evaluate the probability that a wrong recognition result is
reported.
3.4.2 Error in Position Estimation
For position estimation in the hallway, error comes from both the distance calculation and the
AMCL feature matching. AMCL in ROS usually gives a result with a fixed covariance for
reference. The error in the distance calculation can be estimated using a normal distribution,
as below, with the distance value as the mean $\mu$ and with $\sigma$ obtained from a distance
test. The distance test program takes a known marker, calculates the distance hundreds of
times, and calculates $\sigma$ against the known distance.
$$f(x \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
When the robot is in a room, we can estimate the error in a similar way. With two distances
following two normal distributions that share the same $\sigma$, a bivariate normal
distribution can be described as follows [34]:
$$f(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{(x - \mu_x)^2 + (y - \mu_y)^2}{2\sigma^2}}$$
Basically, it locates a circle of possible positions with a distribution similar to Figure 14.
Figure 14. Bivariate Normal Distribution
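The distance test and the two densities can be sketched as below. The function names and sample values are assumptions. $\sigma$ is computed here as the root-mean-square deviation from the known distance, which is one reading of calculating $\sigma$ against a known distance, and the bivariate density is taken as the product of two independent normal densities sharing the same $\sigma$.

```python
import math


def sigma_against_known(measurements, known_distance):
    """Root-mean-square deviation of repeated measurements from the true distance."""
    n = len(measurements)
    return math.sqrt(sum((m - known_distance) ** 2 for m in measurements) / n)


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def bivariate_pdf(x, y, mu_x, mu_y, sigma):
    """Density of two independent normal errors sharing the same sigma."""
    return normal_pdf(x, mu_x, sigma) * normal_pdf(y, mu_y, sigma)


# Hypothetical distance test against a marker placed at 10 m
samples = [9.9, 10.1, 10.0, 10.2, 9.8]
sigma = sigma_against_known(samples, 10.0)
print(sigma)
print(normal_pdf(10.0, 10.0, sigma))
```

A real test would use hundreds of measurements, as described above. Five samples are used here only to keep the example short.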
Chapter 4 Experiment Setup and Results
In this thesis, the west wing of the third floor of Auburn University’s Broun Hall, together
with room 368, is used to form a map for testing. First, we use a flood sequence to mark the
map and form a feature library. We then test the recovery program at multiple positions to
evaluate the result and accuracy of the recovery solution.
Since the kidnapped recovery sequence only needs a camera and a LIDAR, we do not need to test
the robot in a fully functional mode. The robot is remotely controlled and moved to a certain
position on the map, and the recovery program is then started to test the performance. The
rotation of the robot can be achieved by remote control or manually. Neither affects the
recovery sequence.
Figure 15. Broun Hall Third Floor West Wing Test Map
Before the camera is mounted on the robot, it is calibrated for radial and tangential
distortion. This can be done using the functions provided in OpenCV, which are based on an
algorithm proposed in [36].
With a calibrated camera, the FOV and camera angle are measured first, and the map is then
marked and a feature library formed in 4.1. With a marked map, the recovery results are given
in 4.2. Finally, the performance of the system is analyzed in 4.3.
4.1 Marking the Map
The first step is to measure the FOV and angle limitation of the camera. The FOV can be
measured with a trigger marker. In this case, the fiducial marker shown in Figure 16 is used.
The marker is chosen as the trigger because it has a good height-to-width ratio and a
good-sized keypoint set in SIFT, which helps reduce the error in the distance calculation
later.
Figure 16. Trigger Marker Used In Experiment
In this test, every frame taken by the camera is matched with the marker in Figure 16 using the
ORB detector. A threshold is set on the number of matched descriptors. If the number of matched
descriptors exceeds the threshold, the result is displayed as in Figure 17, which means we have
a match.
The FOV is reported as a range of distances. The lower boundary is the distance below which the camera loses the fiducial marker and the other objects in the image. The upper boundary is the distance beyond which the ORB detector can no longer report a match when directly facing the trigger marker. In the test, the Logitech C920 camera with the trigger marker in figure 16 had a FOV from about 2 meters to approximately 14 meters in the hallway and from 1 to 14 meters in a room, with a FOV angle of about 60 degrees. The lower boundary in a room is smaller than in the hallway because the objects in the rooms are mostly chairs, desks, vases or bookshelves, which sit at a lower height than the posters and fire alarms in the hallway.
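The measured limits can be turned into a simple visibility check, which is useful when deciding whether a marker is in range from a given sample position. The sketch below is only an illustration under two assumptions that are not stated in the measurements: the 60 degree FOV is taken to be symmetric about the optical axis, and the camera is taken to look along the robot heading. The function name `marker_in_fov` and its arguments are hypothetical.

```python
import math

# Measured limits reported above for the Logitech C920 with the trigger marker.
FOV_RANGE_M = {"hallway": (2.0, 14.0), "room": (1.0, 14.0)}
FOV_ANGLE_DEG = 60.0


def marker_in_fov(robot_xy, robot_heading_deg, marker_xy, environment="hallway"):
    """Return True if the marker lies inside the measured distance and angle limits.

    Assumptions: the FOV is symmetric about the optical axis and the camera
    looks along the robot heading. Coordinates are in meters.
    """
    dx = marker_xy[0] - robot_xy[0]
    dy = marker_xy[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx)) - robot_heading_deg
    bearing = (bearing + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    near, far = FOV_RANGE_M[environment]
    return near <= distance <= far and abs(bearing) <= FOV_ANGLE_DEG / 2.0


print(marker_in_fov((0.0, 0.0), 0.0, (5.0, 0.0)))           # True
print(marker_in_fov((0.0, 0.0), 0.0, (1.5, 0.0)))           # False: too close for a hallway
print(marker_in_fov((0.0, 0.0), 0.0, (1.5, 0.0), "room"))   # True
```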
Figure 17. FOV Test Matching
Now we can start marking the map in figure 15. First, all the place candidates are marked as in figure 18. Since room 368 shown in the map is small, it is marked as a single place. There are fewer than 26 place candidates in the map, so they are labeled alphabetically for ease of illustration. For the same reason, the sample positions are numbered from 1 to 6. In this map, the distance threshold between two sets in the feature library is 0.6, so the minimum distance between any two places in the library is 0.6.
Here, we choose the bottom left corner in figure 18, which is corner 𝐶𝐵𝐷, to start the flooding sequence. The first place added to the library is place A, since at position 1, near 𝐶𝐵𝐷, the markers at B and D are outside the FOV range.
After the objects are chosen, the feature library looks like the table below. A fiducial marker is used here because place A has only one object, which would give a low distance value when another place candidate is added.
                    Place A
Trigger marker      1
Fiducial marker 1   1
Fire alarm          1
Table 7. Feature Lib after Place A Added
Figure 18. Place Candidates in Map (place candidates labeled A to H, sample positions numbered 1 to 6, scale bar 1m)
As the flood sequence proceeds to positions 2 and 6, places B and D are marked, and the library is updated as shown in table 8.
                    Place A   Place B   Place D
Trigger marker      1         1         1
Fiducial marker 1   1         0         0
Fire alarm          1         1         0
Fire extinguisher   0         1         0
Door                0         1         1
Poster 1            0         0         1
Table 8. Feature Lib after Place B, D Added
When position 3 is flooded, two new candidates are reported: place E and place H. Place E has a fire alarm and a poster different from poster 1, while place H has a door and a fire alarm. Under this condition, the distances are calculated as in table 9.
          Distance to A   Distance to B   Distance to D   Total
Place E   0.75            0.6             0.8             2.15
Place H   0.6667          0.5             0.75            1.9167
Table 9. Distance Test for Candidate E and H
As can be seen in table 9, place H not only fails the threshold test (its distance to B, 0.5, is below the 0.6 threshold) but also has the lower total distance. So place E is added to the feature library.
When position 5 is flooded, places C and G are chosen as candidates. C has a door in its object set, while G has only a fire alarm. It is not hard to see that both fail the distance test, so fiducial marker 1 is added to C and G for a new round of distance tests. This time G fails the distance test while C passes all the tests, so C is added to the library.
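The distance test applied to candidates E and H can be sketched as follows. The sketch takes the distances in table 9 as given and does not recompute them from the object sets. Two readings of the text are assumed here: a candidate passes when every distance is at least the threshold (so a distance equal to 0.6 passes), and among passing candidates the one with the largest total distance is preferred. The function names are hypothetical.

```python
THRESHOLD = 0.6  # distance threshold between two sets in the feature library

# Distances from each candidate to the places already in the library (table 9).
candidates = {
    "E": {"A": 0.75, "B": 0.6, "D": 0.8},
    "H": {"A": 0.6667, "B": 0.5, "D": 0.75},
}


def passes_threshold(distances, threshold=THRESHOLD):
    """A candidate passes when it is at least `threshold` away from every library place."""
    return all(d >= threshold for d in distances.values())


def select_candidate(candidates, threshold=THRESHOLD):
    """Pick the passing candidate with the largest total distance, or None if none pass."""
    totals = {name: sum(d.values()) for name, d in candidates.items()
              if passes_threshold(d, threshold)}
    if not totals:
        return None
    return max(totals, key=totals.get)


print(select_candidate(candidates))  # E
```

When no candidate passes, as in the first round for C and G, `select_candidate` returns `None`, which corresponds to the step where a fiducial marker is added and the test is repeated.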
After the hallway has been marked, place F is added as a room place candidate. When the marking process is finished, the library is as shown in table 10.
                    Place A   Place B   Place C   Place D   Place E   Place F
Trigger marker      1         1         1         1         1         1
Fiducial marker 1   1         0         1         0         0         0
Fire alarm          1         1         0         0         1         0
Fire extinguisher   0         1         0         0         0         0
Door                0         1         1         1         0         1
Poster 1            0         0         0         1         0         0
Poster 2            0         0         0         0         1         0
Road Block          0         0         0         0         0         1
Chair 2             0         0         0         0         0         1
Turbot              0         0         0         0         0         1
Table 10. Final feature library
4.2 The Experiment Recovery System Implementation
This section describes the initialization of the recovery system. Before the recovery system is started, the following parameters and data are fed to the recovery program or published in ROS to prepare for the recovery sequence.
1. The reference point of the trigger marker in each place. Each trigger’s position in the map is recorded by saving the coordinate of its top left corner in an array. It is used to calculate the robot’s position in map coordinates.
2. The parameters used in the distance calculation. The first is the estimate of 𝜕: as discussed before, a set of image points with known offset and angle is used to form a Taylor series. The parameter r in the distance function can then be estimated by measuring a set of sample points at known distances. With both 𝜕 and r settled, the covariance of the distance estimation can be calculated from a large set of sample distances. After calculation, the covariance is 43.376mm.
3. The map is provided by the ROS map server, with a resolution of 0.05m/pixel, under the frame named ‘map’.
4. The LIDAR information is provided by the ROS URG node. Using a Hokuyo LIDAR, it publishes the scanned LIDAR features under the ‘laser’ frame on the topic ‘scan’.
5. A static transform publisher, which publishes the transforms from frame ‘laser’ all the way to ‘odom’, so that the AMCL node can receive the messages from the Hokuyo LIDAR and work properly.
Figure 19. Message Flow in Recovery System (nodes shown: map_server, recovery node, AMCL, urg_node, static TF, Odom TF)
The message flow in the whole system is shown in figure 19. In this experiment, the input of the recovery program is the camera images; the output is the initial pose of the robot published on a ROS topic, and a transform from the map frame to the ‘Odom’ frame. The transform can optionally be enabled to replace the initial pose setting of the AMCL node. However, the main function of the initial pose sent by the recovery node is to act as a filter threshold: if AMCL reports an initial pose too far from the one given by the recovery node, it is filtered out.
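The filtering step described above can be illustrated with a small distance check. This is a minimal sketch under stated assumptions, not the code of the recovery node: poses are taken to be map-pixel coordinates that are converted to meters with the 0.05m/pixel map resolution, and the acceptance radius `max_offset_m` is a hypothetical value, since the actual threshold is not given in the text.

```python
import math

MAP_RESOLUTION_M_PER_PIXEL = 0.05  # resolution of the map served by the map server


def pixel_distance_m(a, b, resolution=MAP_RESOLUTION_M_PER_PIXEL):
    """Euclidean distance in meters between two map-pixel coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) * resolution


def accept_amcl_pose(recovery_xy, amcl_xy, max_offset_m=1.0):
    """Keep an AMCL pose only if it is close enough to the recovery node's pose.

    max_offset_m is a hypothetical value chosen for illustration.
    """
    return pixel_distance_m(recovery_xy, amcl_xy) <= max_offset_m


print(accept_amcl_pose((478, 107), (481, 105)))  # True: about 0.18 m apart
print(accept_amcl_pose((100, 100), (200, 100)))  # False: 5 m apart
```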
After a successful recovery, the recovery node prints the output shown in figure 20 in its terminal window. We can see that after the trigger has been detected, a match-then-vote sequence begins according to the feature library. The voting sequence stops when one place becomes the only leading winner among the place candidates. The distance is then calculated, followed by the initial pose in map coordinates. After processing by the AMCL node, the final recovery result looks like figure 21. In the image section at the bottom left of the figure, text is printed on the image showing the matched objects. The right part is the estimated pose array calculated by the AMCL node, shown as a set of arrows indicating the robot’s possible positions. The black line is part of the hallway map, while the white line shows the laser features of the surrounding environment.
Figure 20. Terminal Window of the Recovery Node
4.3 The Recovery Result and Analysis
4.3.1 Random Recovery Test
The first table below shows recovery results at some random positions in the hallway. Here, the real coordinates are manually measured and calculated. Since the map used has a resolution of 0.05m/pixel and was drawn by the SLAM ‘gmapping’ node in ROS, the measured coordinates in the table may not be very accurate. However, the measured distance error of the map is less than 1%, so the map and the recovery results are still acceptable to the ROS navigation stack.
Figure 21. Recovery Result Showing in RVIZ
Place  Vote       Confidence  Distance/mm  Distance     Angle error/  Coordinate           Coordinate  Coordinate
       weighting                           measured/mm  degree                             measured    error/mm
A      0.5053     100%        5773.88      5760.0       1.52          (478.532, 107.642)   (481, 105)  176.78
A      0.4926     98.52%      7420.58      7442.0       2.51          (501.935, 130.230)   (498, 139)  490.43
A      0.4938     98.76%      8988.92      9000.0       1.22          (524.122, 153.125)   (527, 151)  176.14
B      1.0        100%        3429.02      3467.0       1.67          (525.109, 161.817)   (525, 162)  11.18
B      1.0        100%        6820.32      6806.0       1.75          (440.609, 80.709)    (438, 74)   359.34
B      1.0        100%        8046.75      8110.5       3.00          (463.17, 100.800)    (471, 94)   518.52
C      0.9077     90.77%      5401.23      5422.0       1.07          (357.618, 114.382)   (360, 115)  130.00
C      0.9062     90.62%      6650.39      6671.0       1.66          (339.95, 132.045)    (344, 134)  225.85
D      1.0        100%        5672.96      5702.0       0.89          (518.772, 229.214)   (522, 235)  340.66
D      1.0        100%        10099.56     10005.0      3.12          (456.460, 291.538)   (482, 302)  1380.71
D      1.0        100%        6740.22      6812.0       1.50          (478.532, 107.642)   (481, 105)  180.35
D      Failure    Failure     N/A          15000.0      N/A           N/A                  N/A         N/A
E      1.0        100%        4229.89      4242.0       0.65          (370.746, 311.193)   (372, 313)  109.97
E      1.0        100%        6082.34      6100.0       1.27          (343.237, 284.987)   (343, 288)  150.47
E      1.0        100%        7829.85      7888.0       2.7           (319.281, 260.380)   (322, 268)  404.52
F      1.0        100%        3758.74      3757.0       1.5           (543.015, 384.458)   (542, 382)  132.68
F      1.0        100%        5055.97      5032         2.3           (528.467, 363.877)   (527, 361)  161.47
Table 11. Random Recovery Test 1
From the results shown in table 11, we can see:
1. Within the FOV, all the places in the map can be recognized with acceptable confidence.
2. The distance measurement in table 11 has a covariance of 30.12mm. The maximum error in the test is 98.56mm at a distance of about 10000mm, which is still below 1%.
3. The coordinates can be used to estimate the error distance of the recovered position. The maximum error distance is 1.38m, with an average error of about 0.309m.
4. The error in the distance measurement tends to increase as the distance grows.
5. The angle error becomes slightly larger as the distance increases, but all the angle errors are below 5 degrees.
6. Even with the trigger detected, the place recognition may still fail, as the failed example in table 11 shows. Although the trigger can be detected at 15 meters and starts a voting process, the place still cannot be recognized because of multiple match failures.
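The position and angle figures quoted in the list can be reproduced directly from the error columns of table 11. The short script below copies those two columns, leaving out the failed run at 15 meters, and computes the summary values; the variable names are chosen here for illustration.

```python
from statistics import mean

# Columns copied from table 11 (the failed run at 15 m is excluded).
coordinate_error_mm = [176.78, 490.43, 176.14, 11.18, 359.34, 518.52,
                       130.00, 225.85, 340.66, 1380.71, 180.35, 109.97,
                       150.47, 404.52, 132.68, 161.47]
angle_error_deg = [1.52, 2.51, 1.22, 1.67, 1.75, 3.00, 1.07, 1.66,
                   0.89, 3.12, 1.50, 0.65, 1.27, 2.7, 1.5, 2.3]

mean_position_error_m = mean(coordinate_error_mm) / 1000.0
max_position_error_m = max(coordinate_error_mm) / 1000.0
max_angle_error_deg = max(angle_error_deg)

print(f"mean position error: {mean_position_error_m:.3f} m")      # 0.309 m
print(f"max position error:  {max_position_error_m:.2f} m")       # 1.38 m
print(f"max angle error:     {max_angle_error_deg:.2f} degrees")  # 3.12 degrees
```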
4.3.2 Experiment for Error Observation
To get more information about the system’s performance, two experiments are set up to observe the error in position estimation and the recognition failures.
The first experiment is a recovery test at similar distances around one place. The recovery node gives a position estimate at the center of the arc shown in figure 12, and AMCL takes this pose to produce a more accurate estimate. In this test, the robot is moved roughly along an arc like the one in figure 12, with different rotation angles. The recovery results are shown in table 13.
It can be seen that, at the same position, increasing the rotation angle also increases the estimated distance. This is because the 𝜕 angle used in the distance calculation is estimated by a two-stage Taylor series. As the angle increases, the Taylor series approximation follows the true 𝜕 value less closely and thus introduces more error.
Place  Distance to    Rotation along  Distance/mm  Distance     Angle error/  Coordinate           Coordinate
       the arc        map x-axis/                  measured/mm  degree                             measured
       center/mm      degree
B      5.0            0               8097.85      8083.0       1.80          (461.915, 98.918)    (462, 98)
B      5.0            8.2             8152.87      8083.0       2.8           (460.624, 97.618)    (462, 98)
B      212            0               8060.87      8086.0       1.2           (462.034, 99.037)    (465, 95)
B      212            5.8             8082.5       8086         2.1           (461.951, 98.932)    (465, 95)
B      543            0               8063.02      8099.0       1.94          (462.023, 99.045)    (469, 91)
B      543            -7              8072.0       8099.0       2.2           (461.988, 98.965)    (469, 91)
B      1035           0               8032         8110.0       2.6           (463.786, 101.234)   (471, 94)
B      1035           5.7             8046.75      8110.0       3.00          (463.17, 100.800)    (471, 94)
Table 13. Recovery test 3 result
Another fact shown in table 13 is that the distance from the robot to the center line of the hallway also affects the pose estimation result. The larger this distance, the larger the error. This is because the recovery node uses an arc for estimation: when the robot moves away from the center line of the hallway, it is still treated as if it were on the center line, so the error increases. We can also see that the AMCL node cannot correct this error very effectively.
The second experiment tests how a wrong match affects the recovery result. If the robot cannot recognize the trigger, the recovery will certainly fail. After the trigger has been detected, the match process goes through the feature library. During the experiment in 4.3.1, it was observed that when a place wins a voting test, some objects in its feature set may remain untested. Assume place B is being tested, as shown in figure 20. After the matching test on the fire extinguisher gives a positive result, place B is recognized. In this case, the object ‘door’ is a redundant object.
The second experiment is implemented by recognizing places A to E with one object covered by white paper. The experiment in 4.3.1 shows that place B is the only one among A to E that has a redundant object, and the result of this test shows that B is the only place that can still be recognized with one object covered. This can also be verified by simulating a vote using the feature library. As the results show, with one object mismatched, a place with a redundant object may still be recognized, while a place without a redundant object will certainly fail the recognition.
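A simulation of this kind can be sketched with the object sets of table 10. The acceptance rule below is a simplified stand-in and is not the weighted voting rule used by the recovery node: it is assumed here that a place is accepted only when at least two of its non-trigger objects are matched and no other place contains all of the matched objects. Under that assumption, a place is robust to one covered object only if it has more objects than the rule requires, which is the role a redundant object plays. The names `MIN_MATCHES`, `recognize` and `survives_one_covered_object` are hypothetical.

```python
# Non-trigger objects of each place, taken from table 10.
LIBRARY = {
    "A": {"fiducial marker 1", "fire alarm"},
    "B": {"fire alarm", "fire extinguisher", "door"},
    "C": {"fiducial marker 1", "door"},
    "D": {"door", "poster 1"},
    "E": {"fire alarm", "poster 2"},
    "F": {"door", "road block", "chair 2", "turbot"},
}

MIN_MATCHES = 2  # assumed acceptance rule, not taken from the thesis


def recognize(matched, library=LIBRARY, min_matches=MIN_MATCHES):
    """Return the single place containing every matched object, or None."""
    if len(matched) < min_matches:
        return None
    winners = [place for place, objects in library.items() if matched <= objects]
    return winners[0] if len(winners) == 1 else None


def survives_one_covered_object(place, library=LIBRARY):
    """True if the place can still be recognized with some single object covered."""
    objects = library[place]
    return any(recognize(objects - {covered}, library) == place for covered in objects)


for place in "ABCDE":
    print(place, survives_one_covered_object(place))
```

With this rule, place B is also recognized from the fire alarm and fire extinguisher alone, leaving the door untested, while A, C, D and E each need both of their objects.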
It is also worth mentioning that during the test, even with no object deliberately blocked as above, place F was the most likely to experience a recovery failure, since the objects in the room are frequently moved or blocked.
4.3.3 The Speed of the Recovery Program
The speed of the recovery node can be evaluated using the publish rate of the initial pose message, as shown in figure 22. After testing all the places, for each recovery procedure in which the camera can take a picture with the trigger in sight, the maximum recovery time is 1.332s and the minimum time is 1.199s.
Figure 22. Recovery Speed
Chapter 5 Conclusion and future work
5.1 Summary and Recommendation
This thesis proposed a solution to the kidnapped robot problem. First, a set of fiducial markers combined with environmental features is used to mark a building. Then a voting algorithm based on object matching is implemented to recognize the place, followed by a position estimation based on the marker’s distance and angle.
As can be seen in the experimental results, using the algorithms above and a map marked with a feature library, the robot can recover its position with acceptable speed and error. The average error of the distance measurement is below 1%, while the average error in position estimation is about 0.31m. The angle in the pose estimation has an error of less than 3.5 degrees. This error can be corrected later with the LIDAR features and the probabilistic model in the AMCL node.
However, this solution may still fail to recover the robot’s position in certain situations. So the following recommendations are provided.
1. When marking the building, each place is recommended to have one or more redundant objects, so the algorithm can be more robust and reliable.
2. The algorithm works more efficiently in a hallway than in a room, since objects in a room are more likely to be obscured.
3. A more accurate estimation method for the angle 𝜕 is recommended to reduce the total error.
5.2 Future Work
One of the problems that emerged in this solution is that, when it is applied to room recognition, the objects in the room may be blocked or rotated, causing a recovery failure. A machine learning algorithm could be used here to make the recognition process more robust.
In a broader scope, the library building process could also be carried out by a computer program. A program could be implemented to detect the trigger autonomously, pick the objects and store them in a library. This would make the solution more user friendly.
References
[1] Currie Adam. “The History of Robotics”. July 2006.
URL: https://web.archive.org/web/20060718024255/http://www.faculty.ucr.edu/~currie/roboadam.htm
[2] Needham Joseph, “Science and Civilization in China: Volume 2, History of Scientific
Thought”, Cambridge University Press. ISBN 0-521-05800-7.
[3] CMU robotics institute, “2005-2010 research guide”,
URL: https://www.ri.cmu.edu/research_guide/medical_robotics.html.
[4] Maybeck,P.S., “The Kalman filter: An introduction to concepts”,1990, pp 194-204.
[5] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. “Probabilistic Robotics (Intelligent
Robotics and Autonomous Agents)”. The MIT Press, 2005.
[6] Morgan Quigley et al. “ROS: an open-source Robot Operating System”. In: ICRA Workshop
on Open Source Software. 2009.
[7] “ROS - Powering the world's robots”. 2014. URL: http://www.ros.org/.
[8] navigation stack - ROS Wiki. 2014. URL: http://wiki.ros.org/navigation.
[9] Negenborn Rudy, “Robot localization and kalman filters”. Institute of Information and
Computing Sciences, Utrecht University. 2003.
[10] Hyon Lim , “Real-time single camera SLAM using fiducial markers”, Aug 2009.
[11] Sivic, J., Okutomi, M. , Pajdla, T. “Visual Place Recognition with Repetitive Structures”,
March, In: Pattern Analysis and Machine Intelligence, IEEE Transactions
on (Volume:37, Issue: 11 ).
[12] Luke Ross, Dr Karen Bradshaw, “Fiducial Marker Navigation for Mobile Robots”, Oct, 2011
[13] Luke Ross, “Fiducial Marker Navigation for Mobile Robots”, Aug, 2012
[14] Alan Mutka ,Damjan Miklic ,Ivica Draganjac ,Stjepan Bogdan, “A low cost vision based
localization system using fiducial markers”, July 2008
[15] Qingxiang Wu. “Rough computational methods on reducing cost of computation in Markov
localization for mobile robots”. In: Intelligent Control and Automation, 2002. Proceedings
of the 4th World Congress. pp. 1226–1230.ISBN 0-7803-7268-9
[16] Roland Siegwart and Illah R. Nourbakhsh. “Introduction to Autonomous Mobile Robots”.
Scituate, MA, USA: Bradford Company, 2004.
[17] G.A. Bekey. “Autonomous Robots: From Biological Inspiration to Implementation and
Control”. In: Intelligent robotics and autonomous agents. MIT Press, 2005.
[18] Nyagudi, Nyagudi Musandu. “Humanitarian Algorithms: A Codified Key Safety Switch
Protocol for Lethal Autonomy”, Sep 2015.
[19] Chuho Yi, Byung-Uk Choi, “Detection and Recovery for Kidnapped-Robot Problem Using
Measurement Entropy” ,2012
[20] Jung, I., Lacroix, S., “Simultaneous localization and mapping with stereovision”, In:
Proceedings of the 11th International Symposium Robotic Research, Siena, Italy, 2005
[21] Nister, D., Stewenius, H., “Scalable recognition with a vocabulary tree”, 2006
[22] Morgan Quigley, Eric Berger, Andrew Y, “STAIR: Hardware and Software Architecture”,
Stanford University, 2007
[23] David G. Lowe, “Distinctive image features from scale-invariant keypoints”, In:
International Journal of Computer Vision, 2004.
[24] David G. Lowe, “Object recognition from local scale-invariant features”, In: International
Conference on Computer Vision, Corfu, Greece, Sep1999.
[25] Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski “ORB: An efficient
alternative to SIFT or SURF”, 2011
[26] Edward Rosten, Tom Drummond “Machine learning for high-speed corner detection”, 2006
[27] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, “BRIEF: Binary
Robust Independent Elementary Features”, In: 11th European Conference on Computer
Vision, Heraklion, Crete, Greece, Sep 2010
[28] Marius Muja and David G. Lowe: “Scalable Nearest Neighbor Algorithms for High
Dimensional Data”. In: Pattern Analysis and Machine Intelligence, 2009
[29] Marius Muja and David G. Lowe: “Fast Matching of Binary Features”. Conference on
Computer and Robot Vision (CRV) 2012.
[30] Marius Muja and David G. Lowe, “Fast Approximate Nearest Neighbors with Automatic
Algorithm Configuration”, In: International Conference on Computer Vision Theory and
Applications (VISAPP'09), 2009
[31] OpenCV. 2014. URL: http://opencv.org/
[32] OpenCV History, URL: https://en.wikipedia.org/wiki/OpenCV#cite_note-Itseez-1
[33] The Geometry of Perspective Projection,
URL: http://www.cse.unr.edu/~bebis/CS791E/Notes/PerspectiveProjection.pdf
[34] D. P. Bertsekas and J. N. Tsitsiklis, “Introduction to Probability”, 2nd edition (2008)
[35] Roland Siegwart, Illah R.Nourbakhsh, David Scaramuzza, “Introduction to Autonomous
Mobile Robots”, second edition, 2011, pages 195-227
[36] Z.Zhang. “Flexible Camera Calibration by Viewing a Plane from Unknown
Orientations”. In: International Conference on Computer Vision (ICCV'99), Corfu, Greece,
pages 666-673, Sep 1999.