2013 bayesian multicategorical soft data fusion for human–robot collaboration
TRANSCRIPT
8/11/2019 2013 Bayesian Multicategorical Soft Data Fusion for HumanRobot Collaboration
IEEE TRANSACTIONS ON ROBOTICS, VOL. 29, NO. 1, FEBRUARY 2013 189
Bayesian Multicategorical Soft Data Fusion for Human–Robot Collaboration
Nisar R. Ahmed, Member, IEEE, Eric M. Sample, and Mark Campbell, Member, IEEE
Abstract—This paper considers Bayesian data fusion of conventional robot sensor information with ambiguous human-generated categorical information about continuous world states of interest. First, it is shown that such soft information can be generally modeled via hybrid continuous-to-discrete likelihoods that are based on the softmax function. A new hybrid fusion procedure, called variational Bayesian importance sampling (VBIS), is then introduced to combine the strengths of variational Bayes approximations and fast Monte Carlo methods to produce reliable posterior estimates for Gaussian priors and softmax likelihoods. VBIS is then extended to more general fusion problems that involve complex Gaussian mixture (GM) priors and multimodal softmax likelihoods, leading to accurate GM approximations of highly non-Gaussian fusion posteriors for a wide range of robot sensor data and soft human data. Experiments for hardware-based multitarget search missions with a cooperative human-autonomous robot team show that humans can serve as highly informative sensors through proper data modeling and fusion, and that VBIS provides reliable and scalable Bayesian fusion estimates via GMs.
Index Terms—Bayesian methods, Gaussian mixtures, human–robot interaction, machine learning, Monte Carlo methods, recursive state estimation, robot sensor fusion, variational Bayes.
I. INTRODUCTION
IN order to behave intelligently in complex environments,
autonomous robots must continuously update their understanding of the world by combining new data from various
sources. Despite considerable recent advances in autonomous
robot control and perception, human inputs are still required in
many practical settings to overcome various actuation/sensing
limitations and ensure robustness in the presence of uncertain-
ties. As such, data fusion plays an important role in the application of collaborative human–robot teams to diverse areas such
as defense and security [1], search and rescue [2], space ex-
ploration [3], and social robotics [4]. However, looking beyond
the ability to provide supervisory validation or training data for
static abstract phenomena (e.g., categories for object types [5]
or places [6]), the potential richness of human sensor data is
often overlooked for robotics applications.
Manuscript received December 18, 2011; revised May 28, 2012; accepted August 18, 2012. Date of publication September 12, 2012; date of current version February 1, 2013. This paper was recommended for publication by Associate Editor C. Stachniss and Editor D. Fox upon evaluation of the reviewers' comments. This work was supported in part by the National Science Foundation Graduate Research Fellowship Program and in part by AFOSR MURI FA9550-08-1-0356.
The authors are with the Autonomous Systems Laboratory, Cornell University, Ithaca, NY 14850 USA (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TRO.2012.2214556
The problem considered here is the dynamic fusion of con-
ventional robot sensor data (i.e., hard data) with ambiguous
human-generated observations (i.e., soft data) related to un-
certain continuous/physical world states of interest, e.g., object
location, velocity, mass, temperature, etc. This study is moti-
vated by the fact that maintaining full observability over all
states of interest through robotic sensors alone can be challeng-
ing in many applications. For instance, as discussed in [7], a
robot which is equipped with a 2-D horizontal scanning lidar
can track the position and velocity of moving people, but will not
have direct access to height, weight, or goal location information that could be used to improve target motion models. More importantly, all target states become unobservable if targets are
occluded, confused with false alarms, or beyond sensor range
for a long time. By acting as an externally available sensor in
such cases, a helpful human agent can furnish the robot with
relevant data that substantially reduce uncertainty or inconsis-
tencies in desired state estimates, e.g., due to poor observability
or previous fusion of faulty information. However, unlike hard
data, soft data are difficult to model from first principles and
are not guaranteed to be provided in a consistent manner, since
they are highly context-specific and subject to uncertainties via
psychocognitive factors (e.g., expertise, stress, fatigue, memory,
and perception bias) [8]. Given these considerations, soft data fusion has been explored
in the context of several robotics applications, such as navigation
by social interaction [4], [9], cooperative tracking and surveil-
lance [10], and search and rescue [2], [11], [12]. However,
formal modeling and fusion of soft data via the standard
Bayesian state estimation paradigm [13], [14] have been con-
sidered in only a few relatively recent studies. Kaupp et al. developed a Bayesian method to fuse continuous soft range-with-
bearing data to tracked objects by modeling human sensors via
linear-Gaussian regression models, which were then incorpo-
rated into decentralized Kalman filters [15], [16]. The authors
of [17] extended this work to include probabilistic models of
human visual sensing, which were used to improve data asso-
ciation and object classification accuracy in joint human–robot
tracking tasks. Bourgault et al. considered grid-based Bayesian
fusion of binary human visual target detection likelihoods for
a distributed 2-D search problem [12]. Importantly, however,
these existing Bayesian approaches are inadequate to fuse in-
formation related through coarse/fuzzy terminology, which is
a predominant feature of soft data [8]. Some examples include
the following.
1) "The car is moving quickly around the block; a bike is close behind it."
2) "Nothing is behind the building, on top of the roof, or near the truck to the left of me."
1552-3098/$31.00 © 2012 IEEE
3) "The sidewalk is very steep; the nearby obstacle is much lighter than the robot."
The main issue at hand here is how such data can be sta-
tistically modeled and fused with hard robot data in a rigor-
ous Bayesian manner. Structured codebook modeling strate-
gies have already been successfully used in the development of
human-assisted motion planners, which use probabilistic mod-
els of symbolic/linguistic motion primitives to infer constraints
on robot paths (e.g., "go around the table and between the chairs") [4], [18], [19]. However, the models that are devel-
oped for these planners are geared toward characterizing human
motion commands and are thus unsuitable to extract dynamic
state information from purely observational soft inputs. A mixed
fuzzy-Bayesian modeling approach for soft-hard fusion was
proposed by Mahler using random finite sets [20], in which soft
linguistic observation codebook likelihoods are modeled via
fuzzy set interpretations of virtual linear state measurements
(this method was also adopted in [7]). However, such likelihood
models cannot describe ambiguous reports with highly non-
Gaussian uncertainties (e.g., range-only reports such as "the car is not too far from me").
Alternatively, it is proposed here to model such soft obser-
vations via multicategorical random variables that are condi-
tionally dependent on the states of interest. As such, terms
like "nearby" and "left of" imply uncertain discrete classifi-
cations of the continuous state by a human observer. Though
less precise than typical continuous hard data (e.g., lidar, sonar,
etc.), binary categorical data in the form of negative mea-
surements have already proved quite useful for Bayesian state
estimation in robotic mapping [13], localization [21], and ob-
ject search/tracking [2], [22], [23]. However, dynamic estimation of continuous states from discrete multicategorical data requires approximation of an analytically intractable hybrid
Bayesian inference problem. Various solutions exist in the esti-
mation [24]–[26] and machine learning literature [27], [28], but
these all have drawbacks that severely limit their suitability for
online dynamic data fusion.
This paper develops a novel recursive Bayesian fusion frame-
work to efficiently combine hard robot data with soft multi-
categorical observations of dynamic continuous states. Three
contributions are made in this regard. First, it is shown here
that soft multicategorical observations can be generally mod-
eled as discrete random variables via flexible hybrid likelihood
functions that are based on softmax distributions, which are easily learnable from training data and have convenient properties for online state estimation. Second, a new variational Bayesian
importance sampling (VBIS) algorithm is developed for reli-
able fusion of soft multicategorical data. The VBIS algorithm
overcomes key limitations of other existing hybrid Bayesian
inference algorithms and leads to the rigorous development
of compact Gaussian mixture (GM) posterior approximations
for general hard-soft fusion applications. Finally, the proposed
fusion framework is demonstrated through online multitarget
search experiments that involve a cooperative human–robot
team operating under various sensing modalities and prior in-
formation conditions. The experimental results show that the
proposed human sensor likelihood modeling approach, VBIS
algorithm, and GM-based recursive fusion framework enable
human collaborators to serve as effective information sources for
robotic state estimation tasks. This paper builds on preliminary
work in [29] and [30] by providing a more thorough explanation
and experimental evaluation of the proposed human–robot data
fusion framework.
II. HUMAN–ROBOT DATA FUSION AND SOFT CATEGORICAL DATA MODELING
A. General Problem Statement
The Bayesian data fusion approach proposed here models
soft (i.e., human-generated) descriptions of continuous states
with discrete random variables that represent contextually dis-
tinct sets of state categorizations. These discrete random vari-
able dependences on the state are modeled directly via flexible
continuous-to-discrete hybrid likelihood functions, thus en-
abling recursive Bayesian estimation of the unknown continu-
ous states from multicategorical soft data.
For discrete time index $k \in \mathbb{Z}_{0+}$, let $X_k \in \mathbb{R}^n$ be the continuous random state vector of interest with prior probability density function (pdf) $p(X_0)$ and transition pdf $p(X_k|X_{k-1})$ arising from known stochastic dynamics. Let $\zeta_k$ be a vector of hard robot sensor data, which may contain a mixture of continuous data (e.g., lidar returns) and discrete data (e.g., detection/no detection outputs from a vision-based object detector) with joint conditional observation likelihood $p(\zeta_k|X_k)$. Let $D_k$ be an $m$-valued discrete random variable that represents a categorical human observation, where $D_k$ has a conditional likelihood function $P(D_k = j|X_k)$ for $j \in \{1, \ldots, m\}$ and $m \in \mathbb{Z}_+$. The $m$ possible realizations of $D_k$ are assumed to be mutually exclusive and exhaustive, so that $\sum_{j=1}^{m} P(D_k = j|X_k) = 1$. The sequences of all $\zeta_k$ and $D_k$ until time $k$ are denoted as $\zeta_{1:k} \equiv \{\zeta_1, \ldots, \zeta_k\}$ and $D_{1:k} \equiv \{D_1, \ldots, D_k\}$, respectively.
This paper adopts a recursive Bayesian process to sequentially fuse $\zeta_{1:k}$ and $D_{1:k}$ information at each time step $k$ to update the pdf for $X_k$. Given $\zeta_{1:k-1}$ and $D_{1:k-1}$, the dynamics prediction step propagates the most recent pdf of $X_{k-1}$ forward in time via the Chapman–Kolmogorov equation [14]

$$p(X_k \mid \zeta_{1:k-1}, D_{1:k-1}) = \int p(X_k \mid X_{k-1})\, p(X_{k-1} \mid \zeta_{1:k-1}, D_{1:k-1})\, dX_{k-1}. \quad (1)$$

The robot measurement update step fuses the result of (1) with robot-generated information in $\zeta_k$ via Bayes' rule

$$p(X_k \mid \zeta_{1:k}, D_{1:k-1}) = \frac{p(\zeta_k \mid X_k)\, p(X_k \mid \zeta_{1:k-1}, D_{1:k-1})}{\int p(\zeta_k \mid X_k)\, p(X_k \mid \zeta_{1:k-1}, D_{1:k-1})\, dX_k}. \quad (2)$$

Finally, the human measurement update step fuses (2) with human-generated information in $D_k$ via Bayes' rule

$$p(X_k \mid \zeta_{1:k}, D_{1:k}) = \frac{P(D_k \mid X_k)\, p(X_k \mid \zeta_{1:k}, D_{1:k-1})}{\int P(D_k \mid X_k)\, p(X_k \mid \zeta_{1:k}, D_{1:k-1})\, dX_k}. \quad (3)$$
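As a concrete (hypothetical) illustration of the three-step recursion in (1)–(3), the following minimal 1-D grid sketch discretizes the state space and applies the prediction and update steps numerically; the grid extent, noise variances, and "nearby" softmax steepness are invented for illustration and this is not the paper's GM-based method.

```python
import numpy as np

# Minimal 1-D grid sketch of the recursion (1)-(3): dynamics prediction,
# robot measurement update, and human measurement update. All numbers
# (grid extent, noise variances, softmax steepness) are illustrative.

grid = np.linspace(-10.0, 10.0, 401)   # discretized support for X_k
dx = grid[1] - grid[0]

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def normalize(p):
    return p / (p.sum() * dx)

def predict(pdf, q_var=0.5):
    """Chapman-Kolmogorov step (1) with a random-walk transition pdf."""
    kernel = gauss(grid, 0.0, q_var)   # p(X_k | X_{k-1}) as a convolution kernel
    return normalize(np.convolve(pdf, kernel, mode="same"))

def update(pdf, likelihood):
    """Bayes' rule steps (2)/(3): multiply by a likelihood and renormalize."""
    return normalize(pdf * likelihood)

pdf = gauss(grid, 0.0, 4.0)                                # prior p(X_0)
pdf = predict(pdf)                                         # (1)
pdf = update(pdf, gauss(2.0, grid, 1.0))                   # (2): hard range datum at 2.0
nearby = 1.0 / (1.0 + np.exp(2.0 * (np.abs(grid) - 3.0)))  # soft "nearby" likelihood
pdf = update(pdf, nearby)                                  # (3): human datum D_k = "nearby"

post_mean = (grid * pdf).sum() * dx
```

Note how the soft human update enters the recursion exactly like a hard measurement: it is just another likelihood multiplied into the running pdf.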
-
8/11/2019 2013 Bayesian Multicategorical Soft Data Fusion for HumanRobot Collaboration
3/18
AHMEDet al.: BAYESIAN MULTICATEGORICAL SOFT DATA FUSION FOR HUMANROBOT COLLABORATION 191
The main problem then is to determine the posterior pdf $p(X_k|\zeta_{1:k}, D_{1:k})$ (i.e., the filtering density), which represents the uncertainty in $X_k$ given all information up to time $k$.$^1$
It is assumed without loss of generality that the pdfs in (1)
and (2) can be, respectively, estimated by the prediction and
measurement update steps of conventional filters, such as the
(extended/unscented) Kalman filter [14], particle filter [24], or
Gaussian sum filter [31].
This paper focuses primarily on the measurement update which is defined by (3); conditioning on $D_k = j$, $\zeta_{1:k}$, and $D_{1:k-1}$ is hereafter suppressed, so that

$$p(X_k) \equiv p(X_k \mid \zeta_{1:k}, D_{1:k-1}) \quad (4)$$

$$p(X_k \mid D_k) \equiv p(X_k \mid \zeta_{1:k}, D_{1:k-1}, D_k = j) \quad (5)$$

are the Bayesian prior and posterior in (3), respectively. Substituting these expressions into (3) gives

$$p(X_k \mid D_k) = \frac{P(D_k \mid X_k)\, p(X_k)}{\int P(D_k \mid X_k)\, p(X_k)\, dX} = \frac{p(X_k, D_k)}{P(D_k)} \quad (6)$$

where $p(X_k, D_k)$ is the joint pdf, and $P(D_k)$ is the marginal observation likelihood.
For any given continuous state $X_k$, the set of possible realizations for $D_k$ can be quite large and must be suitably tailored for each practical application. Hence, just as raw lidar data or camera images must be processed to generate meaningful $\zeta_k$ data, soft observations are assumed to be processed by an application-dependent interpreter to generate contextually recognizable $D_k$ data. As in most human–robot interaction applications, such an interpreter could be based on a predefined communication protocol that relies on a dictionary of known descriptor models and contextual reference values, to ensure consistent communication [4], [19]. It is assumed for simplicity that the $m$ possible values of $D_k$ represent all desired human categorizations of $X_k$. However, $D_k$ can also be a vector whose elements are discrete random variables that represent different types of categories over arbitrary subsets of $X_k$ (e.g., separate range-only bins and bearing-only bins), in which case (3) is performed sequentially for each element of $D_k$.$^2$
Since $X_k$ is continuous and $D_k$ discrete, (6) defines a hybrid Bayesian inference problem [33], for which two key issues must be addressed$^3$: 1) how to specify an appropriate human sensor likelihood model $P(D_k|X_k)$, and 2) how to subsequently evaluate (6) for any given $p(X_k)$?
B. Basic and Extended Softmax Models for Human Sensors
For each $j \in \{1, \ldots, m\}$, $P(D_k = j|X_k)$ must map $X_k = \mathbf{x}$ to the interval $[0, 1]$ such that $\sum_{j=1}^{m} P(D_k = j|X_k = \mathbf{x}) = 1$.
$^1$If $\zeta_k = \emptyset$ or $D_k = \emptyset$, then (2) or (3) is skipped, accordingly.
$^2$The vector model also allows binary categories to be defined, as in "nearby" versus "not nearby" and "next to" versus "not next to". This offers an alternative to lumping "nearby" and "next to" into exclusive realizations of the same random variable, so that different likelihoods for similar labels are obtained as a function of $X_k$. However, the interpreter must then ensure that contradictory realizations within $D_k$ (i.e., where elements have joint likelihood of zero) are either avoided or handled via Bayesian conflict resolution [32].
$^3$As shown in Section V, the techniques that are developed here for $D_k$ can be applied to categorical $\zeta_k$ data as well.
While many functions satisfy this criterion, this study exclu-
sively considers likelihoods that are defined via the softmax
function
$$P(D_k = j \mid X_k) = \frac{e^{\mathbf{w}_j^T \mathbf{x} + b_j}}{\sum_{h=1}^{m} e^{\mathbf{w}_h^T \mathbf{x} + b_h}} \quad (7)$$

where $\mathbf{w}_j, \mathbf{w}_h \in \mathbb{R}^n$ and $b_j, b_h \in \mathbb{R}$ are, respectively, vector weights and scalar biases for classes $j, h \in \{1, \ldots, m\}$. The softmax function (also known as the multinomial logistic func-
tion) is widely used in statistical pattern recognition [34] and
is naturally well suited to modeling hybrid continuous-to-
discrete mappings in complex stochastic systems with state-
dependent switching behavior [33], [35]. An interesting feature
of (7) is that the log-odds ratio between any categories $j$ and $c$ for a given $X_k = \mathbf{x}$ yields a linear hyperplane

$$\log \frac{P(D_k = j \mid X_k)}{P(D_k = c \mid X_k)} = (\mathbf{w}_j - \mathbf{w}_c)^T \mathbf{x} + (b_j - b_c) \quad (8)$$
which implies that the probabilistic boundaries between cat-
egories for a given likelihood ratio are also linear and completely specified by the parameter sets $W = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$ and $B = \{b_1, \ldots, b_m\}$. Note that the elements of $W$ control the steepness of the probability surface between categories and the locations of the class boundaries, while the elements of $B$ enable
shifts from the origin. The authors of [35] prove that boundaries
defined via (8) always lead to a complete convex decomposition
of $\mathbb{R}^n$, so that $X_k$ can always be fully partitioned among the $m$ classes of $D_k$.
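The softmax likelihood (7) and the linear log-odds property (8) can be checked numerically; the weights and biases below are made-up illustrative values, not parameters learned from human data.

```python
import numpy as np

# Numerical check of the softmax likelihood (7) and linear log-odds (8).
# W and b are invented illustrative parameters (m = 3 classes, n = 2 states).

W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])  # rows are w_j
b = np.array([0.5, 0.5, -0.2])                       # biases b_j

def softmax_likelihood(x):
    """P(D_k = j | X_k = x) for all j per (7), with max-shift stabilization."""
    a = W @ x + b
    a = a - a.max()          # does not change the ratio in (7)
    e = np.exp(a)
    return e / e.sum()

x = np.array([0.3, -1.2])
p = softmax_likelihood(x)

# Per (8), log-odds between classes j = 0 and c = 1 are linear in x:
lo_direct = np.log(p[0] / p[1])
lo_linear = (W[0] - W[1]) @ x + (b[0] - b[1])
```

The max-shift is a standard numerical guard: it cancels in both the ratio (7) and the log-odds (8), so the modeled probabilities are unchanged.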
Fig. 1(a) shows one possible softmax likelihood model for
a human providing one of 16 soft location labels (in terms of
categorical ranges and bearings) to indicate the relative 2-D
position $X_k = [X, Y]^T$ of an object relative to some arbitrary origin. This example shows how the model in (8) represents categorical ambiguities as a function of $X_k$; "softer" weights lead to
fuzzier probability contours between class labels (in range di-
rections, for this example), while steeper weights lead to nearly
deterministic probabilities over geometrically convex regions
defining classes (across bearing directions). W and B can be
learned from labeled training data using convex optimization
procedures that are based on maximum likelihood or maximum
a posteriori estimation [34].
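As a sketch of the kind of convex maximum-likelihood fitting cited from [34], the following fits softmax parameters by plain gradient ascent on synthetic labeled data; the data-generation scheme, step size, and iteration count are arbitrary illustrative choices and not the paper's calibration procedure.

```python
import numpy as np

# Hypothetical maximum-likelihood fit of (W, B) in (7) by gradient ascent
# on synthetic labeled data; the log-likelihood is concave in (W, B).

rng = np.random.default_rng(1)
n, m, N = 2, 3, 600
X = 3.0 * rng.normal(size=(N, n))
true_W = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
# Gumbel-max trick: argmax of logits + Gumbel noise samples the softmax category
labels = np.array([np.argmax(true_W @ x + rng.gumbel(size=m)) for x in X])

W_hat = np.zeros((m, n))
b_hat = np.zeros(m)
Y = np.eye(m)[labels]                                  # one-hot targets
for _ in range(400):
    A = X @ W_hat.T + b_hat
    A -= A.max(axis=1, keepdims=True)
    P = np.exp(A)
    P /= P.sum(axis=1, keepdims=True)
    G = Y - P                                          # d(log-lik)/d(logits)
    W_hat += 0.1 * (G.T @ X) / N
    b_hat += 0.1 * G.mean(axis=0)

accuracy = np.mean(np.argmax(X @ W_hat.T + b_hat, axis=1) == labels)
```

Because the multinomial logistic log-likelihood is concave, any reasonable first- or second-order optimizer converges to the same fit; plain gradient ascent is used here only to keep the sketch short.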
Equation (7) can be generalized by introducing hidden vari-
ables to induce nonconvex/multimodal categorical partitions
of Xk . One such generalization is the multimodal softmax
(MMS) model [36], which represents each observable class
$j \in \{1, \ldots, m\}$ as a collection of $s_j$ hidden subclasses dependent on $X_k$ that are mutually exclusive and exhaustive, where $s_j \ge 1$ and $\sum_{j=1}^{m} s_j = S$ is the total number of subclasses. Let $R$ represent the hidden subclass variable, which can take values $r \in \{1, \ldots, S\}$,$^4$ and define $D_k$ to be conditionally independent of $X_k$ given $R$, so that $P(D_k = j, R = r | X_k) = P(D_k = j | R = r)\, P(R = r | X_k)$. Furthermore, define $\sigma(j)$ to be the set of all $s_j$ subclasses of class $j$, where $\sigma(j) \cap \sigma(c) = \emptyset$ for $j \ne c$. If $P(D_k = j | R = r) = I(r \in \sigma(j))$ (the indicator
$^4$Assume without loss of generality that the subclasses are indexed sequentially in class order.
Fig. 1. (a) Probability surfaces for example softmax likelihood model, where class labels take on a discrete range in {"Next To", "Nearby", "Far From"} and/or a canonical bearing {N, NE, E, SE, . . . , NW}. (b) Probability surfaces for example MMS range-only model, where labels with similar range categories from (a) are treated as subclasses that define one geometrically convex class ("Next To" with $s_1 = 1$) and two nonconvex ones ("Nearby" with $s_2 = 6$ and "Far From" with $s_3 = 8$). (c) and (d) Calibration training data and learned MMS range-only probabilities for $D_k$ = "Nearby" for two different human subjects.
function) and $P(R = r|X_k)$ is defined via the softmax model, then marginalization of $R$ from $P(D_k, R|X_k)$ gives

$$P(D_k \mid X_k) = \sum_{r=1}^{S} P(D_k \mid r)\, P(r \mid X_k) = \frac{\sum_{r \in \sigma(j)} e^{\mathbf{w}_r^T \mathbf{x} + b_r}}{\sum_{c=1}^{S} e^{\mathbf{w}_c^T \mathbf{x} + b_c}}. \quad (9)$$
Hence, the MMS likelihood for $D_k = j$ given $X_k$ is the sum of all $s_j$ subclass softmax likelihoods that are associated with class $j$. Given an appropriate subclass configuration $[s_1, \ldots, s_m]$, (9) can model an arbitrary continuous-to-discrete likelihood func-
tion using an embedded softmax model to produce piecewise
linear class boundaries. Fig. 1(b) shows a simple example of
an MMS model that is derived from the basic softmax model
in Fig. 1(a). In this example, the MMS subclass weights are di-
rectly obtained from the model in Fig. 1(a), as any basic softmax
model can be trivially converted to an MMS model. However,
it is also generally possible to estimate MMS model parame-
ters directly from training data using maximum likelihood or
Bayesian learning techniques, when a basic softmax model is
unavailable [36]. Fig. 1(c) and (d) shows estimated MMS range-
only models for two different human sensors using maximum
likelihood learning with actual data. The labeled $(X_k, D_k)$ training data points shown in these plots were acquired through an
experimental calibration procedure that requires human sub-
jects to provideDk observations under controlled conditions,
where Xk is known exactly. This principled statistical proce-
dure is very similar to the one described by Kaupp [10] to model continuous range-with-bearing human observations, ex-
cept that discrete multicategorical data are recorded instead of
continuous data, and nonlinear optimization techniques are used
for offline softmax/MMS model identification instead of linear
regression.
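A small numerical sketch of the MMS construction in (9) follows; the subclass partition $\sigma$ and all parameters are invented for a 1-D state with $m = 2$ observable classes and $S = 3$ subclasses (not a model learned from calibration data).

```python
import numpy as np

# Sketch of the MMS likelihood (9): softmax is taken over all S subclasses,
# then subclass probabilities are summed within each observable class.
# Parameters and the partition sigma are illustrative only.

Ws = np.array([[2.0], [-2.0], [0.0]])   # subclass weights w_r (1-D state)
bs = np.array([-2.0, -2.0, 0.0])        # subclass biases b_r
sigma = {0: [0, 1], 1: [2]}             # class j -> subclass indices sigma(j)

def mms_likelihood(x):
    a = Ws @ x + bs
    a = a - a.max()
    e = np.exp(a)
    sub = e / e.sum()                   # softmax over the S subclasses
    return np.array([sub[sigma[j]].sum() for j in sorted(sigma)])

# Class 0 is active for large |x| (a nonconvex region), class 1 near x = 0:
p_left = mms_likelihood(np.array([-3.0]))[0]
p_mid = mms_likelihood(np.array([0.0]))[0]
p_right = mms_likelihood(np.array([3.0]))[0]
```

Even though each subclass region is convex, class 0 is probable on both sides of the origin, which is exactly the kind of multimodal ("range-only") labeling that a basic softmax model cannot express.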
C. Hybrid Bayesian Inference for Soft Data Fusion
Although softmax-based functions are well suited to model-
ing $P(D_k|X_k)$, they unfortunately do not lead to closed-form posteriors $p(X_k|D_k)$ for any choice of $p(X_k)$. For instance, substituting (7) into (6) for any $p(X_k)$ yields

$$p(X_k \mid D_k) = \frac{1}{C}\, p(X_k)\, \frac{\exp\left(\mathbf{w}_j^T \mathbf{x} + b_j\right)}{\sum_{h=1}^{m} \exp\left(\mathbf{w}_h^T \mathbf{x} + b_h\right)} \quad (10)$$

where

$$C = \int p(X_k)\, \frac{\exp\left(\mathbf{w}_j^T \mathbf{x} + b_j\right)}{\sum_{h=1}^{m} \exp\left(\mathbf{w}_h^T \mathbf{x} + b_h\right)}\, dX. \quad (11)$$
Equation (10) cannot be represented in closed form, since the integral for the normalization constant $C$ has no analytical solution for any $p(X_k)$. Furthermore, even when $p(X_k)$ is a well-behaved pdf such as a uniform or Gaussian pdf, the softmax denomina-
tor in (10) cannot be absorbed along with the numerator and
prior into a known parametric pdf family. Therefore, (6) must
be approximated, as in all hybrid Bayesian inference problems
that involve continuous-to-discrete dependences [33].
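To see the intractability of (11) concretely, $C$ can still be estimated numerically; below is a 1-D quadrature sketch with an invented standard normal prior and a binary softmax (logistic) likelihood, for which symmetry makes the exact value $C = 0.5$.

```python
import numpy as np

# 1-D quadrature sketch of the normalization constant C in (11).
# Prior N(0, 1) and a binary softmax (logistic) likelihood are illustrative;
# by symmetry the exact marginal likelihood here is C = 0.5.

x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
prior = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)    # p(X_k)
lik = 1.0 / (1.0 + np.exp(-1.5 * x))                    # P(D_k = j | X_k), m = 2

C = np.sum(prior * lik) * dx                            # marginal likelihood P(D_k = j)
posterior = prior * lik / C                             # grid version of (6)
post_mean = np.sum(x * posterior) * dx
```

This kind of brute-force quadrature is exactly what does not scale with state dimension $n$, which motivates the grid/particle discussion below and, ultimately, the VB approach.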
Although standard EKF/UKF updates are not applicable, grid-based [2], [13] or Monte Carlo particle approximations
[13], [23], [24] of (6) could be used. Grids naturally support
recursive Bayesian fusion with arbitrary priors and likelihoods,
although they scale poorly with state dimension n, do not pro-
vide a compact posterior representation, and do not mesh easily
with typical filters for $\zeta_k$ data (e.g., EKFs/UKFs). Particle filter
approximations overcome the latter problem, but do not pro-
vide a compact approximation if many samples are needed. In
principle, particles could be compressed into a single Gaussian
pdf for filtering [25], although this leads to significant informa-
tion loss when (6) is highly non-Gaussian or multimodal. While
particles could also be compressed to flexible GM pdfs via online EM learning [25], [26], this is prone to poor local maxima and high computational expense. Particle approximations also
require special care to ensure accuracy and mitigate undesirable
phenomena such as sample degeneracy. For instance, the per-
formance of the standard bootstrap particle filter (BPF) [24] can
degrade significantly if $n$ is large or the observation likelihood is small, e.g., for a "surprising" observation [33].
Another possible approach to hybrid Bayesian inference
comes from variational Bayes (VB) methods, which attempt
to maximize the similarity between analytically intractable pos-
teriors and well-behaved posterior approximation pdfs that
are defined through freely optimizable parameters [34]. Mur-
phy proposed a local VB lower bound approximation to (6) for
the special case of Gaussian priors and $m = 2$ binary logistic likelihood functions [27]; this approximation uses the fact
that the posterior pdf is well approximated by a Gaussian pdf
and admits a lower bound to $C$ through a convex lower bound
to the logistic likelihood function proposed in [37]. While this
VB approach leads to a scalable, deterministic, and accurate
Gaussian approximation of the true posterior with guarantees
on $C$, it is limited to $P(D_k|X_k)$ for the special case of $m = 2$. Furthermore, the VB posterior leads to an optimistic posterior covariance estimate, which is highly undesirable in recursive
state estimation [14]. Bouchard [28] proposes to generalize the VB method to softmax likelihoods with $m \ge 2$, but only considers the dual problem to infer $(W, B)$ from $(X_k, D_k)$ training data for simple softmax models and thus does not generalize Murphy's method to approximate (6), present solutions to the
persistent optimistic covariance issue, or consider MMS models
for multimodal posteriors.
These issues are tackled next through new hybrid inference
approximations that not only generalize VB approximations to
$m \ge 2$ softmax likelihood functions, but also address the optimistic VB posterior covariance issue via novel application
of fast Monte Carlo importance sampling (IS), generalize to
inference with multimodal posterior distributions (induced by
non-Gaussian priors and MMS likelihoods), and guarantee con-
vergence to unique solutions (i.e., no poor local minima). These
approximations naturally lead to a fusion framework based on
compact GM pdf approximations, which are especially desirable for human–robot fusion applications, since they 1) lead to computational costs that scale well with $n$ and the number of categories $m$ and 2) greatly facilitate online storage, communication, and
fusion with hard $\zeta_k$ data.
III. BASELINE FUSION: GAUSSIAN-SOFTMAX INFERENCE
A. Baseline Variational Bayes Approximation
Assume a Gaussian prior $p(X_k) = \mathcal{N}(\mu, \Sigma)$ with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, and let $P(D_k|X_k)$ be given by (7) for $m \ge 2$. The local VB approximation derived here uses the fact that the analytically intractable joint pdf $p(X_k, D_k)$ in (6) can be well approximated by an unnormalized Gaussian lower bound pdf; this in turn leads to a VB Gaussian posterior approximation $\hat{p}(X_k|D_k)$ upon renormalization that also guarantees a lower bound to $C$. This proposed VB inference approach generalizes the method [27] derived for the special case of $m = 2$.

Let $f(D_k, X_k)$ be an unnormalized Gaussian function that approximates the softmax likelihood $P(D_k|X_k)$. The joint pdf and normalization constant (11) are approximated as

$$p(X_k, D_k) \approx \hat{p}(X_k, D_k) = p(X_k)\, f(D_k, X_k) \quad (12)$$

$$C \approx \hat{C} = \int \hat{p}(X_k, D_k)\, dX_k. \quad (13)$$

Note that $\hat{p}(X_k, D_k)$ is an unnormalized Gaussian, since it is the product of two Gaussians. This permits $\hat{C}$ to be evaluated in closed form as an approximation to the marginal likelihood of the discrete observation, $\hat{C} \approx P(D_k = j)$, in (6).
For $m \ge 2$, $f(D_k, X_k)$ is derived here via the upper bound to the problematic softmax denominator in (7) proposed in [28], which uses a variational product of $m$ unnormalized Gaussians. Specifically, for any set of scalars $\alpha$, $\xi_c$, and $y_c$ for $c \in \{1, \ldots, m\}$, [28] proves that

$$\log \sum_{c=1}^{m} e^{y_c} \le \alpha + \sum_{c=1}^{m} \left[ \frac{y_c - \alpha - \xi_c}{2} + \lambda(\xi_c)\left[(y_c - \alpha)^2 - \xi_c^2\right] + \log\left(1 + e^{\xi_c}\right) \right] \quad (14)$$

where $\lambda(\xi_c) = \frac{1}{2\xi_c}\left[\frac{1}{1 + e^{-\xi_c}} - \frac{1}{2}\right]$ and $y_c = \mathbf{w}_c^T \mathbf{x} + b_c$.
The variables $\alpha$ and $\xi_c$ are free variational parameters; given $y_c$, $\alpha$ and $\xi_c$ are selected to minimize the upper bound in (14), thus providing the tightest possible upper bounding approximation to the denominator of (7). Assume for now that $\alpha$ and $\xi_c$ are known (the selection of $\alpha$ and $\xi_c$ is considered in the next section). From (7), it follows that

$$\log P(D_k = j \mid X_k) = \mathbf{w}_j^T \mathbf{x} + b_j - \log \sum_{c=1}^{m} e^{\mathbf{w}_c^T \mathbf{x} + b_c}.$$

After replacing the second term on the right-hand side with the bound in (14), subsequent simplification gives

$$f(D_k = j, X_k) = \exp\left( g_j + \mathbf{h}_j^T \mathbf{x} - \frac{1}{2}\mathbf{x}^T K_j \mathbf{x} \right)$$

where

$$g_j = \frac{1}{2}\left( b_j - \sum_{c \ne j} b_c \right) + \left( \frac{m}{2} - 1 \right)\alpha + \sum_{c=1}^{m} \left[ \frac{\xi_c}{2} + \lambda(\xi_c)\left[\xi_c^2 - (\alpha - b_c)^2\right] - \log\left(1 + e^{\xi_c}\right) \right]$$

$$\mathbf{h}_j = \frac{1}{2}\left( \mathbf{w}_j - \sum_{c \ne j} \mathbf{w}_c \right) + 2\sum_{c=1}^{m} \lambda(\xi_c)(\alpha - b_c)\, \mathbf{w}_c$$

$$K_j = 2\sum_{c=1}^{m} \lambda(\xi_c)\, \mathbf{w}_c \mathbf{w}_c^T \quad (15)$$

and where $f(D_k, X_k) \le P(D_k \mid X_k)$ follows from (14). Since the prior can also be expressed as
$$p(X_k) = \exp\left( g_p + \mathbf{h}_p^T \mathbf{x} - \frac{1}{2}\mathbf{x}^T K_p \mathbf{x} \right)$$

where

$$g_p = -\frac{1}{2}\left( \log|2\pi\Sigma| + \mu^T K_p \mu \right), \quad \mathbf{h}_p = K_p \mu, \quad K_p = \Sigma^{-1} \quad (16)$$
substitution of (16) and (15) into (12) gives the unnormalized Gaussian joint pdf approximation

$$\hat{p}(X_k, D_k) = \exp\left( g_l + \mathbf{h}_l^T \mathbf{x} - \frac{1}{2}\mathbf{x}^T K_l \mathbf{x} \right)$$

$$g_l = g_p + g_j, \quad \mathbf{h}_l = \mathbf{h}_p + \mathbf{h}_j, \quad K_l = K_p + K_j. \quad (17)$$
Normalization of (17) gives the desired variational Gaussian posterior pdf approximation for $m \ge 2$

$$\hat{p}(X_k \mid D_k) = \mathcal{N}(\mu_{\mathrm{VB}}, \Sigma_{\mathrm{VB}}) \quad (18)$$

where

$$\Sigma_{\mathrm{VB}} = K_l^{-1}, \quad \mu_{\mathrm{VB}} = K_l^{-1} \mathbf{h}_l. \quad (19)$$
The approximate posterior mean and covariance updates in (19) for discrete measurements bear close resemblance to the corresponding continuous measurement updates for the Kalman information filter [14]. With this resemblance in mind, an examination of $K_j$ and $\mathbf{h}_j$ suggests that the softmax weight vectors $\mathbf{w}_j$ determine the average information content about $X_k$ contained in each category $j$. This is intuitively reasonable. As shown in Fig. 1, large magnitude weights indicate sharp log-odds boundaries between classes in (8) (i.e., less ambiguity and greater separability between discrete classes as a function of $X_k$), which leads to more informative updates for $X_k$, since $p(X_k)$ is "squashed" more strongly by $P(D_k|X_k)$ via (6). Note that $\Sigma_{\mathrm{VB}}$ is also independent of the actual discrete observation $D_k = j$, just as covariance/information matrix updates for the Kalman filter are independent of observed continuous measurements.
Variational Parameter Optimization: Analytical minimization of the right-hand side of (14) with respect to the free variational parameters $\alpha$ and $\xi_c$ gives

$$\xi_c^2 = y_c^2 + \alpha^2 - 2\alpha y_c \quad (20)$$

$$\alpha = \frac{\frac{m-2}{4} + \sum_{c=1}^{m} \lambda(\xi_c)\, y_c}{\sum_{c=1}^{m} \lambda(\xi_c)}. \quad (21)$$
However, these formulas cannot be used to compute (18) directly, since $y_c$ depends on $X_k$, which is unobserved. Therefore, following the same strategy as [27] for the $m = 2$ case, the variational parameters are chosen to minimize the expected value of (14) with respect to the posterior. This is equivalent to maximizing the approximate marginal log-likelihood of the observation $D_k = j$

$$\log \hat{C} = \log \int \hat{p}(X_k, D_k)\, dX_k \quad (22)$$

where $\log \hat{C} \le \log C$. Equation (22) can be expressed in closed form via standard Gaussian identities, but direct maximization of (22) with respect to $\alpha$ and $\xi_c$ involves cumbersome calculation of highly nonlinear gradient and Hessian terms. The expectation-maximization (EM) algorithm [34] can instead be invoked to iteratively optimize $\alpha$ and $\xi_c$ via expected values of (20) and (21), while alternately updating $\hat{p}(X_k|D_k)$ via simple closed-form expressions. The EM procedure is given in Algorithm 1, where the $y_c$ terms in (20) and (21) are replaced by their expected values under the current $\hat{p}(X_k|D_k)$ estimate at each E step

$$\langle y_c \rangle = \mathbf{w}_c^T \mu_{\mathrm{VB}} + b_c \quad (23)$$

$$\langle y_c^2 \rangle = \mathbf{w}_c^T \left( \Sigma_{\mathrm{VB}} + \mu_{\mathrm{VB}} \mu_{\mathrm{VB}}^T \right) \mathbf{w}_c + 2\mathbf{w}_c^T \mu_{\mathrm{VB}} b_c + b_c^2. \quad (24)$$

Since (20) and (21) are coupled, an extra iterative resubstitution loop is needed for convergence of $\xi_c$ and $\alpha$ ($n_{lc} = 15$ iterations were sufficient for this paper's studies).
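The resulting VB update, (19) with the EM-style loop of Algorithm 1 built from (15) and (20)-(24), can be sketched compactly; the initialization, iteration counts, and the toy model below are my own choices, so this is an illustrative implementation rather than the paper's exact code.

```python
import numpy as np

# Sketch of the VB measurement update (18)-(19) with the EM loop of
# Algorithm 1, using (15) and the E/M steps (20)-(24). Initialization,
# iteration counts, and the toy example are illustrative choices.

def lam(xi):
    """lambda(xi) from (14), guarded against division by zero."""
    xi = np.maximum(xi, 1e-8)
    return (1.0 / (2.0 * xi)) * (1.0 / (1.0 + np.exp(-xi)) - 0.5)

def vb_update(mu, Sigma, W, b, j, n_em=20, n_lc=15):
    """Fuse Gaussian prior N(mu, Sigma) with softmax datum D_k = j."""
    m, _ = W.shape
    Kp = np.linalg.inv(Sigma)              # prior information matrix, (16)
    hp = Kp @ mu
    xi, alpha = np.ones(m), 0.0
    mu_vb, Sig_vb = mu.copy(), Sigma.copy()
    for _ in range(n_em):
        # E step: moments of y_c under current posterior estimate, (23)-(24)
        y = W @ mu_vb + b
        y2 = np.einsum('ci,ij,cj->c', W, Sig_vb + np.outer(mu_vb, mu_vb), W) \
            + 2.0 * (W @ mu_vb) * b + b ** 2
        # M step: coupled resubstitution loop for alpha and xi, (20)-(21)
        for _ in range(n_lc):
            lc = lam(xi)
            alpha = ((m - 2.0) / 4.0 + lc @ y) / lc.sum()
            xi = np.sqrt(np.maximum(y2 + alpha ** 2 - 2.0 * alpha * y, 1e-12))
        # Refit the Gaussian posterior via (15), (17), and (19)
        lc = lam(xi)
        Kj = 2.0 * np.einsum('c,ci,cj->ij', lc, W, W)
        hj = W[j] - 0.5 * W.sum(axis=0) + 2.0 * ((lc * (alpha - b)) @ W)
        Sig_vb = np.linalg.inv(Kp + Kj)
        mu_vb = Sig_vb @ (hp + hj)
    return mu_vb, Sig_vb

# Toy 1-D check: prior N(0, 4) and three range classes; observing the class
# with weight +2 should pull the mean positive and shrink the variance.
W = np.array([[-2.0], [0.0], [2.0]])
b = np.array([0.0, 1.0, 0.0])
mu_vb, Sig_vb = vb_update(np.zeros(1), 4.0 * np.eye(1), W, b, j=2)
```

Note that the covariance always contracts, since $K_j$ is positive semidefinite ($\lambda(\xi_c) > 0$), mirroring the information-filter interpretation of (19) discussed above.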
It is straightforward to show that $p(X_k|D_k)$ is log-concave, which means that the exact baseline Gaussian-softmax posterior is unimodal. Hence, Algorithm 1 satisfies the necessary and sufficient condition derived in [38] to guarantee monotonic convergence to a unique set of variational parameters for the local VB lower bound Gaussian approximation. Convergence can be gauged by evaluating the change in (22) after each M step, where

$$\log \hat{C} = \langle y_j \rangle - \alpha + \sum_{c=1}^{m} \left[ \frac{1}{2}\left(\alpha + \xi_c - \langle y_c \rangle\right) - \lambda(\xi_c)\left[ \langle y_c^2 \rangle - 2\alpha \langle y_c \rangle + \alpha^2 - \xi_c^2 \right] - \log\left(1 + e^{\xi_c}\right) \right] + \frac{n}{2} - \frac{1}{2}\left[ \log\frac{|\Sigma|}{|\Sigma_{\mathrm{VB}}|} + \mathrm{tr}\left(\Sigma^{-1}\Sigma_{\mathrm{VB}}\right) + (\mu - \mu_{\mathrm{VB}})^T \Sigma^{-1} (\mu - \mu_{\mathrm{VB}}) \right] \quad (25)$$

and most of the required terms are already used in the E and M steps. However, it is often more convenient to monitor convergence of $\mu_{\mathrm{VB}}$ between iterations, so that the lower bound (25) can be evaluated at the end, if desired.
B. Improved VB Approximation With Importance Sampling

Fig. 2(a) and (b) shows that (17) is generally a close lower bound approximation of the true joint pdf; as shown in Fig. 2(c) and (d), a key benefit of the VB approximation is that $\mu_{\mathrm{VB}}$ closely approximates the true mean of (6), $\mu_{\mathrm{true\,post}}$, upon renormalization. Loosely speaking, this effect stems from the fact that Algorithm 1 returns $\hat{\xi}_c$ and $\hat{\alpha}$ values that maximize the softmax lower bound (15) on average; since $\xi_c$ and $\alpha$ can be uniquely determined from $\mathbf{x}_k$ via (20) and (21), $\hat{\xi}_c$ and $\hat{\alpha}$ tend to lie near the posterior average (i.e., mean) of $X_k$.$^5$ However, Fig. 2(c) and (d) also shows that since $\hat{C} \le C$, the approximate posterior (18) obtained from dividing (17) by $\hat{C}$ no longer lower bounds the true posterior $p(X_k|D_k)$. In fact, even if $\hat{p}(X_k, D_k)$ and $p(X_k, D_k)$ are quite similar, multiplication of (17) by $\hat{C}^{-1} \ge C^{-1}$ forces

$^5$In [38], a more technically precise explanation of this effect is given for the special case of the binary logistic VB lower bound.
Fig. 2. 1-D Bayesian update example for standard normal Gaussian prior (green) and binary softmax likelihood (blue), showing true posterior (magenta), VB softmax lower bound (black dash), and approximate joint pdf (red dash) for (a) small (soft) softmax weights and (b) large (steep) softmax weights. Renormalized posteriors are shown in (c) and (d), respectively, with corresponding Ĉ and C values.
(18) to be more concentrated around its peak than (6), and therefore, ΣVB is optimistic relative to the true posterior covariance Σtrue post.⁶ The goodness of μVB can be outweighed by optimism in ΣVB, since this can lead to severe overconfidence and inconsistencies during recursive Bayesian fusion. The bound Ĉ ≤ C also produces a small bias in μVB relative to μtrue post. However, the unimodality of p(Xk|Dk) and the fact that μVB is close to μtrue post can be exploited by another fast estimation procedure to significantly improve μVB and ΣVB in (18). Monte Carlo IS [39] is particularly well suited to this end, since
arbitrary moments of (6) can be quickly estimated using an importance distribution q(Xk) that roughly corresponds to (6). Specifically, given Ns samples {xi}_{i=1}^{Ns} ∈ Xk drawn from q(Xk), IS approximates the expectation of an arbitrary function z(Xk) with respect to p(Xk|Dk) as

⟨z(Xk)⟩ ≈ Σ_{i=1}^{Ns} ωi z(xi),   ωi ∝ p(xi) P(Dk|xi) / q(xi)    (26)

where ωi is the importance weight for sample i, and the desired estimates correspond to z(Xk) = Xk and z(Xk) = (Xk − μ)(Xk − μ)^T. Note that (26) uses the fact that p(Xk|Dk) only needs to be known up to a normalizing constant so that the joint pdf p(xi, Dk) = p(xi) P(Dk|xi) can be used to compute ωi (as is standard practice, the ωi are renormalized to sum to 1 [39]). Although q(Xk) can in theory be any pdf that is easy to sample from and ensures proper support coverage of p(Xk|Dk) (i.e., p(Xk|Dk) > 0 ⇒ q(Xk) > 0), IS is only reliable when q(Xk) is sufficiently close to p(Xk|Dk). Since
⁶ That is, (Σtrue post − ΣVB) will be positive semidefinite.
the true posterior (6) is unimodal and has a mean close to μVB, it is natural to specify q(Xk) as a unimodal pdf whose mean is parameterized by μVB. The prior covariance Σ can also be used to constrain the size/shape of q(Xk) to ensure adequate coverage of p(Xk|Dk). This is justified since conditioning on Dk reduces the uncertainty in the (unimodal) posterior relative to the (unimodal) prior such that (Σ − Σtrue post) is expected to be positive definite (as in the conventional KF) [14].
These considerations lead to the VBIS algorithm, proposed here to draw upon the strengths of both VB and IS. An outline of VBIS is shown in Algorithm 2. The VB estimate in Algorithm 1 is first used to define q(Xk), which is then applied to (26) to estimate μVBIS and ΣVBIS for the approximation p̂(Xk|Dk) = N(μVBIS, ΣVBIS). This work uses

q(Xk) = N(μVB, Σ)    (27)

since this pdf is easy to sample from, permits convenient calculation of ωi, and performs well in practice. Other, more sophisticated unimodal pdfs could serve as q(Xk) on the basis of μVB and Σ (e.g., heavy-tailed Laplace pdfs or mixture model pdfs). However, compared with (27), the benefits of such alternatives can be outweighed by the cost of sampling xi and evaluating ωi, especially if n ≥ 2 (e.g., Bessel functions are needed to evaluate a Laplace pdf with covariance Σ).
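The IS correction at the heart of VBIS, applied with the proposal (27), can be sketched as follows. This is a minimal NumPy sketch under the stated assumptions, not the authors' MATLAB implementation; the function names (`softmax_lik`, `vbis_moments`) and the specific softmax parameterization are illustrative.

```python
import numpy as np

def softmax_lik(x, W, b, j):
    """P(D = j | x) under a softmax model with weight rows W and offsets b.
    (Illustrative parameterization; not taken verbatim from the paper.)"""
    y = W @ np.atleast_1d(x) + b
    y = y - y.max()                     # shift for numerical stability
    p = np.exp(y)
    return p[j] / p.sum()

def vbis_moments(mu_vb, Sigma_prior, mu_prior, lik, Ns=200, rng=None):
    """IS correction step of VBIS, cf. (26)-(27): sample from the proposal
    q(X) = N(mu_vb, Sigma_prior) and reweight by prior * likelihood / q."""
    rng = np.random.default_rng(rng)
    S = np.atleast_2d(Sigma_prior)
    L = np.linalg.cholesky(S)
    xs = mu_vb + (L @ rng.standard_normal((S.shape[0], Ns))).T
    Sinv = np.linalg.inv(S)
    def log_gauss(x, m):                # unnormalized; p and q share Sigma
        d = x - m
        return -0.5 * d @ Sinv @ d
    logw = np.array([log_gauss(x, mu_prior) + np.log(lik(x)) - log_gauss(x, mu_vb)
                     for x in xs])
    w = np.exp(logw - logw.max())
    w = w / w.sum()                     # renormalize weights to sum to 1
    mu = w @ xs                         # IS estimate of the posterior mean
    diff = xs - mu
    Sigma = (w[:, None] * diff).T @ diff  # IS estimate of the covariance
    return mu, Sigma
```

Because q is centered on μVB but retains the (wider) prior covariance Σ, the weights stay well behaved even when the observation is surprising with respect to the prior.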
C. Likelihood Weighted Importance Sampling

Another possible IS strategy to compute μ and Σ is to bypass VB altogether in Algorithm 2 and simply set q(Xk) = p(Xk) so that ωi ∝ P(Dk|xi). This approach, which is popularly known as likelihood weighted importance sampling (LWIS) [33], [40], also defines the measurement update step of the standard BPF [24] and works well if p(Xk) and p(Xk|Dk) are similar. While faster and nominally more computationally convenient than VBIS, LWIS suffers if P(Dk|Xk) is highly peaked relative to p(Xk) or if Dk is surprising with respect to p(Xk) (i.e., the prior and posterior are not close) [33]. In such cases, ωi ≈ 0 for many samples, leading to inconsistent LWIS estimates. LWIS is presented here as a common benchmark algorithm to estimate complex non-Gaussian densities.
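For comparison, the LWIS update just described (q(Xk) = p(Xk), so ωi ∝ P(Dk|xi)) and the ESS diagnostic reported in Table I can be sketched as follows; this is an illustrative NumPy sketch, and the names are not from the paper. The ESS, 1/Σi ωi², shrinks toward 1 as the weights degenerate.

```python
import numpy as np

def lwis_moments(mu_prior, Sigma_prior, lik, Ns=200, rng=None):
    """LWIS update: sample from the prior, weight each sample by the
    likelihood alone, and report the ESS degeneracy diagnostic."""
    rng = np.random.default_rng(rng)
    S = np.atleast_2d(Sigma_prior)
    L = np.linalg.cholesky(S)
    xs = mu_prior + (L @ rng.standard_normal((S.shape[0], Ns))).T
    w = np.array([lik(x) for x in xs], float)
    w = w / w.sum()
    ess = 1.0 / np.sum(w**2)            # effective sample size, in (1, Ns]
    mu = w @ xs
    diff = xs - mu
    Sigma = (w[:, None] * diff).T @ diff
    return mu, Sigma, ess
```

A surprising, steep likelihood in the prior's tail leaves only a handful of samples with nonnegligible weight, which the ESS exposes immediately.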
Fig. 3. Synthetic 1-D fusion problem using exact and approximate inference methods. (a) Human observation softmax likelihood curves for P(Dk = j|Xk). (b)–(d) Posterior approximation results for human observations that are progressively more surprising relative to p(Xk) (five sample posterior results shown for LWIS and VBIS Gaussian approximations in each case).
TABLE I: RESULTS FOR 1-D FUSION PROBLEM IN FIG. 3
D. Numerical 1-D Example

Fig. 3(a) gives a hypothetical 1-D softmax likelihood model for a soft human observation Dk, with m = 5 categories relating to Xk, the location of a static target relative to a static robot. The prior p(Xk) at some fixed time step k is shown in gray for three different scenarios in (b)–(d), in which ζk = ∅ and the update relies solely on Dk. Fig. 3(b)–(d) shows the most likely Dk in each case relative to the true target location xtrue (black star). Moving from (b) to (d), the prior becomes less accurate (i.e., more surprising/inconsistent) compared with xtrue, e.g., due to an inaccurate/highly uncertain target dynamics model.
Fusion results are shown for exact numerical integration, VB, VBIS with Ns = 200, and LWIS with Ns = 200. The true mean and variance (μ, σ²) of the exact (non-Gaussian) posterior are shown in Table I, along with the corresponding estimates and MATLAB computation times for each approximation over 50 runs. The number of EM iterations for VB and VBIS are also shown.⁷ The effective sample size (ESS) is provided as a measure of sample efficiency, and hence closeness of q(Xk) to p(Xk|Dk), for VBIS and LWIS [39].
In each case, μVB is very close to μ with a small bias, while σ²VB < σ². (μVBIS, σ²VBIS) and (μLWIS, σ²LWIS) are accurate in (b), since Dk is unsurprising with respect to p(Xk). However, LWIS becomes steadily worse in (c) and (d) since p(Xk) and Dk disagree more, whereas VBIS always maintains a good approximation with only 200 samples. The poor performance of LWIS in (c) and (d) is reflected by its diminishing ESS and the inconsistent nature of μLWIS and σ²LWIS. LWIS improves in (c) with larger Ns, although this has limited impact in (d). Setting Ns = 10000 matches the computation time for VBIS but still yields worse performance (ESS = 200, μLWIS = 0.70 ± 0.75, σ²LWIS = 0.47 ± 0.08) than VBIS with Ns = 200.

⁷ Using a random initial guess for α in (21) and a tolerance of 1e-3 on Ĉ.
IV. GENERALIZED FUSION: NON-GAUSSIAN PRIOR AND MULTIMODAL LIKELIHOOD INFERENCE
The assumption that p(Xk) is Gaussian and that P(Dk|Xk) is well modeled by a basic softmax likelihood (7) with convexly separable classes can be easily violated in practical human–robot fusion scenarios. The prior p(Xk) can be non-Gaussian through multimodal initial beliefs, or if Xk evolves with non-Gaussian/nonlinear dynamics (e.g., unobservable dynamic mode changes), or if updates via ζk involve non-Gaussian likelihoods [2], [23]. Equation (7) can also be inadequate to model P(Dk|Xk); e.g., soft distance observations are better modeled by nonconvex categorical MMS likelihoods (see Section II). Fortunately, VBIS can be extended for recursive Bayesian fusion in such scenarios using GM pdf approximations.
In the sequel, assume that p(Xk) in (6) is given by an M-term GM

p(Xk) = Σ_{u=1}^{M} P(u) p(Xk|u) = Σ_{u=1}^{M} cu N(μu, Σu).    (28)

The hidden discrete variable U takes values u ∈ {1, . . . , M}; μu ∈ R^n and Σu ∈ R^{n×n} are, respectively, the uth component mean and covariance, and the component weights cu ≥ 0 satisfy Σ_{u=1}^{M} cu = 1. The universal approximation property of GMs was used in [31] to derive recursive non-Gaussian Bayesian
state estimators for continuous sensor data via parallel banks of
KFs/EKFs. This idea was later extended to incorporate parallel banks of UKFs [26] and PFs [25]. Due to their beneficial statistical properties and high flexibility, such GM filtering algorithms have since proven useful for many robotic Bayesian sensor fusion applications (see [17] and [41]). Thus, any such
GM filter can be assumed here to approximate (1) and (2) in the form of (28), where M is automatically determined by the GM filter to balance computational speed and estimation accuracy. Furthermore, it is assumed that the human sensor likelihood P(Dk|Xk) is given by the MMS model in (9).
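A GM prior of the form (28) can be represented concretely as follows; this is a self-contained NumPy sketch, with the class name and interface chosen for illustration rather than taken from the paper.

```python
import numpy as np

class GaussianMixture:
    """M-term GM as in (28): p(x) = sum_u c_u N(x; mu_u, Sigma_u)."""
    def __init__(self, weights, means, covs):
        self.c = np.asarray(weights, float)
        assert np.isclose(self.c.sum(), 1.0)   # weights must sum to 1
        self.mu = [np.atleast_1d(m) for m in means]
        self.S = [np.atleast_2d(s) for s in covs]

    def pdf(self, x):
        """Evaluate the mixture density at a single point x."""
        x = np.atleast_1d(x)
        n = x.size
        total = 0.0
        for c, m, S in zip(self.c, self.mu, self.S):
            d = x - m
            norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(S))
            total += c * np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / norm
        return total

    def sample(self, N, rng=None):
        """Draw N samples by first sampling the hidden component U."""
        rng = np.random.default_rng(rng)
        u = rng.choice(len(self.c), size=N, p=self.c)
        return np.array([rng.multivariate_normal(self.mu[i], self.S[i]) for i in u])
```

Sampling via the hidden component U mirrors the role U plays in the derivation that follows.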
A. Variational Bayesian Importance Sampling With Gaussian Mixture Priors and Multimodal Softmax Likelihoods
An approximation to p(Xk|Dk) is derived by first considering the joint pdf given Dk

p(Dk, Xk, U, R) = P(Dk, R|Xk, U) p(Xk, U)
= P(Dk, R|Xk) p(Xk, U) = P(Dk|R) P(R|Xk) p(Xk|U) P(U)    (29)

where the first line follows from Bayes' rule, and the second line follows from the conditional independence properties of the MMS model (see Section II). Recall from Section II that 1) R is a hidden subclass variable with values r ∈ {1, . . . , S}, where each subclass is deterministically mapped to a single class label j ∈ {1, . . . , m} for the observation Dk, and 2) σ(j) denotes the set of sj subclasses mapping to j, where P(Dk = j|r) = I(r ∈ σ(j)). From the law of total probability, the posterior p(Xk|Dk) is
p(Xk|Dk) = Σ_{u=1}^{M} Σ_{r∈σ(j)} p(Xk|u, r, Dk) P(u, r|Dk).    (30)

Using Bayes' rule and the joint pdf (29), the first term in the summand of (30) can be written as

p(Xk|u, r, Dk) = P(Dk|r) P(r|Xk) p(Xk|u) P(u) / ∫ P(Dk|r) P(r|Xk) p(Xk|u) P(u) dXk.    (31)
Canceling the terms that are independent of Xk gives

p(Xk|u, r, Dk) = P(r|Xk) p(Xk|u) / ∫ P(r|Xk) p(Xk|u) dXk    (32)

which is the conditional posterior given Dk = j, mixing component u, and subclass r ∈ σ(j). Note that the numerator in (32) is the product of a Gaussian p(Xk|u) = N(μu, Σu) and a softmax likelihood P(r|Xk), while the denominator is the marginal subclass r softmax observation likelihood under Gaussian component u. Therefore, (32) is a unimodal conditional pdf that can be well approximated by a Gaussian using the VBIS procedure in Algorithm 2 so that

p(Xk|u, r, Dk) ≈ p̂(Xk|Dk, u, r) = N(μur, Σur).    (33)
Next, the second term in the summand of (30), P(u, r|Dk), is

P(u, r|Dk) = P(u, r, Dk) / P(Dk)    (34)
= P(u, r, Dk) / [ Σ_{u=1}^{M} Σ_{r∈σ(j)} P(u, r, Dk) ] = (1/C) P(u, r, Dk)    (35)

where the numerator can be derived from (29) as

P(u, r, Dk = j) = ∫ p(Xk|u) P(r|Xk) P(Dk = j|r) P(u) dXk = P(u) ∫ p(Xk|u) P(r|Xk) dXk    (36)

where P(u) = cu from (28), and the last line follows from P(Dk = j|r) = 1 for r ∈ σ(j), by the definition of the MMS model. The integral in (36) is also the denominator in (32)

P(r|u) = ∫ p(Xk|u) P(r|Xk) dXk = Cur.    (37)

Substituting these expressions into (36) and then (35) gives

P(u, r|Dk) = (1/C) cu Cur.    (38)
Equation (37) is analytically intractable, but can be estimated in two ways. First, since VBIS (Algorithm 2) is used to estimate (32), (37) can be directly approximated by a corresponding VB lower bound Ĉur ≤ Cur obtained via (25) in Algorithm 1. In this case, the nominal conditioning on Dk = j in (25) is replaced by joint conditioning on U = u and R = r, so that individual μur, Σur, ξc,ur, and αur estimates are used in (25) for each possible u and r pairing to compute log Ĉur. Second, (37) can be estimated via direct sampling as

Ps(r|u) = (1/Nu) Σ_{l=1}^{Nu} P(r|Xk = xl)    (39)

where {xl}_{l=1}^{Nu} is a set of Nu samples drawn directly from the uth prior component N(μu, Σu). The first approach could bias the posterior approximation if the bound Ĉur ≤ Cur is too loose. However, the variance of Ps(r|u) is inversely proportional to P(r|u) and Nu, meaning that (39) can fall below the lower bound Ĉur if P(r|u) is very small (i.e., P(r|u) ≤ 0.01) and Nu is too small. Thus, to obtain a reasonable estimate, Ĉur is used to floor (39) as a consistency check for fixed Nu

P̂(r|u) = max[exp(log Ĉur), Ps(r|u)] ≈ P(r|u).    (40)
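The two-part estimate of P(r|u), direct sampling (39) floored by the VB bound as in (40), can be sketched as follows; this is an illustrative sketch in which `softmax_r` stands for the subclass softmax likelihood P(r|Xk) and `log_C_vb` for log Ĉur from Algorithm 1.

```python
import numpy as np

def subclass_prob(mu_u, Sigma_u, softmax_r, log_C_vb, Nu=500, rng=None):
    """Estimate P(r|u) = integral of N(x; mu_u, Sigma_u) * P(r|x) dx.
    Direct sampling as in (39), floored by the VB lower bound as in (40)."""
    rng = np.random.default_rng(rng)
    xs = rng.multivariate_normal(np.atleast_1d(mu_u), np.atleast_2d(Sigma_u),
                                 size=Nu)
    Ps = np.mean([softmax_r(x) for x in xs])   # Monte Carlo estimate (39)
    return max(np.exp(log_C_vb), Ps)           # consistency floor (40)
```

When P(r|u) is moderate, the sample average dominates; when it is tiny, the deterministic VB floor prevents the Monte Carlo estimate from collapsing to zero.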
Hence, (38) becomes

P(u, r|Dk) ≈ (1/Ĉ) cu P̂(r|u) = ω̃ur    (41)

where

Ĉ = Σ_{u=1}^{M} Σ_{r∈σ(j)} cu P̂(r|u).    (42)

Finally, combining (32) and (42) into (30) yields a GM approximation to p(Xk|Dk) ≈ p̂(Xk|Dk)

p̂(Xk|Dk) = Σ_{u=1}^{M} Σ_{r∈σ(j)} ω̃ur N(μur, Σur)    (43)
= Σ_{h=1}^{K} ω̃h N(μh, Σh)    (44)

with K = sj · M Gaussian components.
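Assembling the final GM posterior (43)-(44) from the per-component results is then a bookkeeping step, sketched below; this is illustrative, and the per-(u, r) means and covariances are assumed to come from Algorithm 2.

```python
import numpy as np

def assemble_gm_posterior(c_u, P_r_given_u, mu_ur, Sigma_ur):
    """Combine per-(u, r) Gaussians into the GM posterior (43)-(44):
    unnormalized weights c_u * P(r|u) as in (41), normalized by their
    sum, which plays the role of C-hat in (42)."""
    weights, means, covs = [], [], []
    for u, cu in enumerate(c_u):
        for r, Pru in enumerate(P_r_given_u[u]):
            weights.append(cu * Pru)
            means.append(mu_ur[u][r])
            covs.append(Sigma_ur[u][r])
    w = np.array(weights)
    w = w / w.sum()
    return w, means, covs
```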
B. Likelihood Weighted Importance Sampling and Variational
Bayes Gaussian Mixture Fusion
Algorithm 3 summarizes the generalized VBIS fusion algorithm. Note that if VBIS in step 4 of Algorithm 3 is replaced by LWIS for component h and P̂(r|u) = Ps(r|u) is instead used in step 5, an LWIS-based GM approximation to (30) is obtained. Likewise, a VB GM approximation is obtained by using only Algorithm 1 in step 4 (i.e., ignoring the IS correction) and setting P̂(r|u) = Ĉur in step 5 (i.e., ignoring step 3). The next example shows that the VBIS procedure in Algorithm 3 improves considerably on both alternatives. Note that the VB, VBIS, and LWIS baseline Gaussian approximations from Section III for Gaussian priors and softmax likelihoods are special cases of the corresponding GM approximations for GM priors and MMS likelihoods, with M = 1 and sj = 1 ∀ j ∈ {1, . . . , m}.

1) Numerical 1-D Example: Fig. 4 modifies the previous 1-D human–robot fusion example in Fig. 3 so that p(Xk) is now an M = 4 component multimodal GM (gray) and Dk now takes the form of a coarse range-only observation with m = 3 nonconvex categories ("Next To," "Nearby," "Far From"). Shown are the results of fusing the (surprising) human observation Dk = "Far From" via numerical integration to obtain the exact multimodal posterior pdf (magenta). Also shown are the full 8-component GM posterior approximations that are obtained with VB and 100 trials of both VBIS (Algorithm 3, Nu = Ns = 500) and LWIS (500 samples).
Due to its brittleness to surprising measurements, LWIS
clearly fails to approximate the minor posterior modes on the
positive Xk axis and struggles to approximate the major poste-
rior modes on the negative Xk axis. The VB GM approximation
(which required 11–23 EM steps per component) shows considerable improvement in approximating all posterior modes,
but it still significantly underestimates all component variances
as well as the largest component weight on the left. In contrast,
VBIS provides a very high-fidelity GM approximation to the exact posterior.

Fig. 4. Synthetic 1-D fusion problem with GM prior and range-only MMS likelihood model for P(Dk = j|Xk) derived by grouping the five softmax classes in Fig. 3(a) into three MMS classes; sample 8-mixand GM posterior approximations shown for Dk = "Far From" (likelihood in red dash), along with run time and KLD statistics.

Fig. 4 also shows the resulting computation times (using unoptimized MATLAB code) and Kullback–Leibler divergences (KLDs) between the true posterior p(Xk|Dk) (from numerical integration) and each GM approximation p̂(Xk|Dk), where the KLD is given by

KL[p ∥ p̂] = ∫ p(Xk|Dk) log[ p(Xk|Dk) / p̂(Xk|Dk) ] dXk    (45)
and a smaller KLD indicates that p̂(Xk|Dk) loses less information from p(Xk|Dk) (and is therefore a better approximation to the true posterior). Clearly, LWIS loses the most information on average, while VBIS loses the least. Repeating LWIS with 1500 samples matches the time required for VBIS with 500 samples, but only reduces the LWIS KLD by about half. In addition, the VBIS KLD increases to 0.23 ± 0.20 if only the direct-sampling estimate Ps(r|u) of Cur is used in step 5 of Algorithm 3 (i.e., if Ĉur from VB is ignored), since Ps(r|u) underestimates the weights for the minor GM posterior modes on the positive Xk axis. This shows that the VB bounds Ĉur help improve the posterior GM weight estimates in (40).
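For a 1-D problem like Fig. 4, the KLD (45) can be evaluated on a grid as follows; this is a minimal sketch assuming vectorized pdf callables, not the authors' implementation.

```python
import numpy as np

def kld_numeric(p, q, xs):
    """KL[p || q] on a uniform 1-D grid xs, cf. (45); p and q are
    vectorized pdf callables, renormalized on the grid."""
    dx = xs[1] - xs[0]
    px = p(xs); qx = q(xs)
    px = px / (px.sum() * dx)          # renormalize to a proper density
    qx = qx / (qx.sum() * dx)
    mask = px > 0                      # treat 0 * log 0 as 0
    return np.sum(px[mask] * np.log(px[mask] / qx[mask])) * dx
```

As a sanity check, KL between N(0, 1) and N(1, 1) is analytically 0.5, which the grid estimate recovers closely.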
C. Practicalities

1) Parallelization: The nested for loops that contain steps 2–6 of Algorithm 3 can be parallelized into sj · M independent VBIS updates. As such, parallelized GM filtering strategies for ζk fusion can be readily adapted to incorporate soft categorical measurements via (3) using Algorithm 3. In particular, if GM filters are used to approximate (1) and (2), then the complete hybrid Bayesian fusion cycle can be implemented as a bank of parallel Gaussian filters that are combined to produce a final GM posterior approximation p̂(Xk|ζ1:k, D1:k) at each time step k.
2) Mixture Condensation: The number of mixands in p̂(Xk|ζ1:k, D1:k) grows at each time step k if either sj > 1 in Algorithm 3 or (1) and (2) marginalize out discrete random variables
Fig. 5. Experimental setup. (a) Pioneer 3-DX robot used for experiment, featuring Vicon markers for accurate pose estimation; a Hokuyo URG-04LX lidar sensor for obstacle avoidance; an onboard Mini ATX-based computer with a 2.00 GHz Intel Core 2 processor, 2 GB of RAM, and WiFi networking; and a Unibrain Fire-I OEM Board camera. (b) Base field map used in all search missions, showing locations of six opaque obstacle walls and two generic landmarks. (c) Human–robot fusion GUI, which runs on a 2.66 GHz Intel Core 2 Duo workstation with 2 GB of RAM.
(e.g., via GM process/measurement noise models). Standard
GM compression methods should thus be applied to maintain
tractability while minimizing a suitable information loss metric
with respect to the full GM posterior approximation [42], [43].
3) Component Gating for Skipping Updates: If Ps(r|u) ≈ 1 from (39), then μur and Σur will be very close to μu and Σu. Step 4 of Algorithm 3 can thus be modified to apply a gating threshold γ after step 3 to determine whether the posterior component for the pair (u, r) requires EM iterations for the VBIS approximation. If Ps(r|u) ≥ γ, alternative component updates via LWIS or prior equivalence (i.e., μur = μu and Σur = Σu) are used, and step 5 becomes P̂(r|u) = Ps(r|u); otherwise, steps 4 and 5 are carried out with VBIS as usual. Note that γ should be set close to 1 (e.g., γ = 0.9999) to ensure that only those components that are definitely not worth updating by VBIS are conservatively skipped.
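The gating logic above can be sketched as follows; this is illustrative, with `vbis_update` standing in for the full Algorithm 2 update of a single (u, r) component.

```python
def gated_update(Ps_ru, mu_u, Sigma_u, vbis_update, gamma=0.9999):
    """Component gating: if P_s(r|u) >= gamma, the likelihood barely
    reshapes this component, so skip the expensive VBIS EM iterations
    and reuse the prior component; otherwise run the full update."""
    if Ps_ru >= gamma:
        return mu_u, Sigma_u, Ps_ru        # prior equivalence; step 5 uses P_s
    return vbis_update(mu_u, Sigma_u)      # full VBIS update for (u, r)
```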
V. COOPERATIVE MULTITARGET SEARCH EXPERIMENTS
As discussed in [11], [12], and [17], human–robot information fusion is particularly relevant for cooperative target search applications such as coordinated search and rescue, large-scale surveillance, and urban reconnaissance. To provide practical insight on the utility of the proposed soft human information fusion approach, an experimental application to cooperative indoor target search missions was conducted with a real human–robot team.⁸
A. Problem Setup

A single human agent and a single autonomous mobile robot were tasked with finding and correctly identifying five hidden targets as quickly as possible under a fixed time constraint. Fig. 5(b) shows the base map of the 5 m × 10.5 m indoor area used to conduct the multiple search mission experiments, which featured several movable obstacle walls and two generic landmarks. The walls are placed such that the human (who remained seated off field at a computer) could only see a small portion of the search area by direct line of sight. The five targets were static orange traffic cones labeled with unique ID numbers (1 through 5) that were hidden at various locations that differed across four separate search missions.

⁸ Similar experiments with 16 different human users were conducted in [44] to examine sensitivity to P(Dk|Xk); although not discussed here, the results from that study corroborate this paper's findings on the utility of the proposed fusion approach.
Each target location X^t ∈ R² is modeled by a GM prior

p(X^t) = Σ_{u=1}^{Mt} c^t_u N(μ^t_u, Σ^t_u)    (46)

where the number of targets is known a priori for t ∈ {1, . . . , 5} and p(X¹) = p(X²) = · · · = p(X⁵) at mission start; these priors are detailed in Section V-C. Each p(X^t) is updated over time using one or both of the following information sources: 1) ζ1:k, the set of all detection/no detection observations made by the robot's visual target detector, and 2) D1:k, the set of all soft target location data provided by the human.
Fig. 5(a) shows the Pioneer 3-DX autonomous mobile robot that is used in the experiment. The robot is equipped with a camera and vision-processing software that detects orange traffic cones up to a 1 m range with a 42.5° field of view at 2 Hz. The robot moves at a constant speed of 0.3 m/s with a known map of the search area and highly accurate pose data from Vicon motion tracking. The robot autonomously navigates toward intermediate search points (i.e., goal locations) based on the updated combined undetected target posterior GM pdf

p(X^comb_k) = Σ_{t∈Tk} (1/|Tk|) p(X^t|ζ1:k, D1:k)    (47)
where Tk is the set of undetected targets at time k. As in [2], the target pdfs are used to autonomously plan search paths using a simple suboptimal greedy strategy. Equation (47) is first discretized to select the highest value (nonobstacle) grid cell defining the robot's next search point; the robot then creates and follows a path using the D* algorithm to ensure that this point lies at the center of the target detector likelihood function,
shown in Fig. 6(e). The robot immediately repeats this planning procedure whenever it either reaches its current search point without detecting anything or receives new information from the human (described below). While other search strategies could be used, this approach works well and is tied to searches that are based on model predictive control [2], [23]. Comparisons with other search methods are beyond the scope of this study.

Fig. 6. (a) Example GM target location prior. (b)–(d) Base MMS models for prepositions. (e) MMS model for camera detection likelihood. (f)–(i) Posterior GMs from VBIS after fusing Dk in (b)–(d) with GM prior in (a). (j) Posterior GM from LWIS for ζk = "No Detection" report with GM prior in (a).
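The greedy search-point selection based on (47) can be sketched as follows; this is an illustrative sketch in which grid construction, D* path following, and occlusion handling are omitted.

```python
import numpy as np

def next_search_point(target_pdfs, grid_xy, obstacle_mask):
    """Greedy planner: average the undetected-target pdfs as in (47),
    evaluate on a grid of candidate cells, and pick the highest-value
    non-obstacle cell as the next search point."""
    vals = np.mean([np.array([p(xy) for xy in grid_xy]) for p in target_pdfs],
                   axis=0)
    vals = np.where(obstacle_mask, -np.inf, vals)   # exclude obstacle cells
    return grid_xy[int(np.argmax(vals))]
```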
The human remains seated at a computer station facing the field [coordinates x = 0.8 m, y = 3.3 m in Fig. 5(b)] and communicates with the robot through the graphical user interface
(GUI) shown in Fig. 5(c). The human has two tasks: 1) classify-
ing detections by the robot as either false alarms or actual targets,
and 2) voluntarily modifying the target GM pdfs via soft infor-
mation messages Dk . For task 1, the robot streams processed
camera images at 1 Hz to the GUI and pauses to report visual
target detections. If the human declares a false alarm, the robot
notes the objects location to prevent reacquisition. Otherwise,
the robot localizes the target via laser and camera data, and the
GM for the identified target t is removed from (47). For task 2
(the focus of this study), the human can use direct observations
of the field and the robots camera feed to send messages that
update (47) (detailed below). The human also has access to a
2-D surface plot of (47) overlaid on a labeled map of the search
area so that consistent contextual information is available for fu-
sion. The human can only send information and cannot directly
command the robot. However, the robot automatically replans
whenever a new Dk is fused, since the maximum of (47) can
change significantly.
B. Online Measurement Updates

Each p(X^t|ζ1:k, D1:k) is recursively updated online via (2) and (3); (1) is not needed since the targets are all static. The ζk updates are skipped for false alarms (assumed to be filtered out perfectly by the human), while Dk updates occur as human messages arrive spontaneously.
TABLE II: HRI GUI CODEBOOK CHOICES

1) Robot Visual Detection Model and ζk Updates: The robot's target detector likelihood P(ζk|X^t) is a hybrid probabilistic mapping from X^t to a discrete observation ζk ∈ {"No Detection," "Detection"}. As such, P(ζk|X^t) is well approximated by the 2-D MMS model shown in Fig. 6(e), which describes the "No Detection" class likelihood with a high probability outside the vision cone. The parameters for this model were learned offline and shifted online to account for the robot's pose and known occlusions (e.g., walls). Since P(ζk|X^t) is an MMS model, the inference methods in Section IV obtain a GM
approximation to (2). LWIS GM fusion with 1000 samples per component update and a component gate of γ = 0.9999 gave sufficiently accurate results, due to the robot's slow motion. Fig. 6(j) shows an example LWIS GM fusion update with the nominal MMS camera model, illustrating the posterior scattering effect induced by negative information from "No Detection" updates [2], [22].
2) Human Observation Models and Dk Updates: Structured three-field messages of the form Dk = "(Existence) is (Preposition) (Reference)" were sent sequentially by the human, where any combination of the predefined codebook entries shown in Table II could be selected in the GUI via mouse. Existence allows positive/negative soft observations to be sent, assuming each target's ID is unavailable until detection (the data association problem due to this ambiguity is addressed below). Reference determines each observation's spatial reference point, while Preposition determines the MMS model to use for modifying each target GM given the Existence and Reference fields. This study used three categorical ranges and two categorical bearings, giving Dk 90 distinct realizations.

Base MMS models for Preposition entries were learned offline with training data from the single human user who performed all missions in this study. Fig. 6(b)–(e) shows the
Fig. 7. True target locations and initial GM priors for Mission 4, showing (a) uniform and (b) bad search priors. The uniform GM prior in (a) is the same in all four search missions, and the bad priors for Missions 1–3 are qualitatively similar to (b).
TABLE III: EXPERIMENTAL SEARCH MISSION MATRIX
resulting models, whose origins all correspond to a nominal (0, 0) Reference Location position in X^t space. The weights of these base models are shifted/rotated online to be consistent with the desired Reference origin/orientation. Negative (i.e., "Nothing is . . .") observations with respect to a Preposition class j are handled via pseudopositive measurement updates with respect to all other classes t ≠ j in the corresponding MMS model. All Dk updates are performed online with VBIS GM fusion (Algorithm 3, using Nu = Ns = 500 and γ = 0.9999).

3) Data Association: Data association issues arise if Dk is not target-specific; i.e., "Something is . . ." could apply to any one target, while "Nothing is . . ." applies to all targets. This ambiguity is handled here through GM-based probabilistic data association (PDA) [45], in which (3) is computed for the hypothesis that Dk describes target t to give p(X^t|ζ1:k, D1:k), and the prefusion prior GM p(X^t|ζ1:k, D1:k−1) is assigned to all hypotheses where Dk does not describe t. Marginalizing out the association hypothesis gives the updated target t GM
p̂(X^t|ζ1:k, D1:k) = β(δ) p(X^t|ζ1:k, D1:k) + [1 − β(δ)] p(X^t|ζ1:k, D1:k−1)    (48)

where δ = 1 if Dk carries positive ("Something is . . .") data and δ = 0 for negative ("Nothing is . . .") data, and β(δ) is the probability of the hypothesis that Dk describes target t. Here, β(0) = 1 and β(1) = 1/|Tk|, where |Tk| is the number of undetected targets at time k. The probability of erroneous/false Dk is assumed to be zero for simplicity.⁹

Fig. 8. Search performance metrics. (a) and (b) Search mission times (sec) under uniform and bad priors. (c) and (d) Number of targets found per mission under uniform and bad priors.
4) Mixture Compression: Following ζk and Dk updates, each p(X^t|ζ1:k, D1:k) is compressed to M = 15 mixands via Salmond's joining method [42], which preserves overall GM mean and covariance. This requires O(Mo²) time for Mo initial mixands; therefore, only the 100 highest weighted components of p(X^t|ζ1:k, D1:k) are used for each merging operation.¹⁰
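The moment-preserving pairwise merge that underlies Salmond-style joining can be sketched as follows; this is an illustrative sketch of the standard moment-matching formulas, not Salmond's full component-selection procedure.

```python
import numpy as np

def merge_pair(w1, mu1, S1, w2, mu2, S2):
    """Moment-preserving merge of two mixands: the merged component
    matches the weight, mean, and covariance of the two-component
    sub-mixture, as in Salmond-style joining."""
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w
    mu = a1 * mu1 + a2 * mu2
    d1, d2 = mu1 - mu, mu2 - mu
    S = a1 * (S1 + np.outer(d1, d1)) + a2 * (S2 + np.outer(d2, d2))
    return w, mu, S
```

Merging two unit-variance components at ±1 with equal weight, for instance, yields a single component with mean 0 and variance 2, exactly matching the pair's moments.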
C. Target Priors and Fusion Scenarios
Four sets of search missions were conducted under the three
types of sensor fusion modalities and two types of initial target
GM priors shown in Table III (for a total of 24 search missions).
The same four missions (each characterized by a different set
of true target locations) are used to study all cells of Table III.
Fig. 7(a) and (b) shows the true target locations and priors for
Mission 4 (other maps and priors are not shown due to limited
space). The pseudouniform GM prior was the same for all four
missions; the bad GM priors were highly inconsistent with the
true target locations in each mission, reflecting worst case search
scenarios, where a priori information is badly flawed. To sim-
ulate realistic target discovery with the same human operator,
positive Dk messages were not sent until targets were actually
observed by the human during the mission. All missions ended
if the robot did not find all targets after 15 min (900 s). This
challenging time constraint was chosen after extensive testing
⁹ Nonzero probabilities can generally be incorporated into (48) to maintain a GM fusion pdf [32], [45].
¹⁰ This typically led to little information loss at each step, since each target's GM can only have 30–255 components following a ζk or Dk update and since GM weights above 0.01 are always concentrated in 100 mixands.
Fig. 9. Two-norm of MAP-estimated errors of each target's location over time based on GM in Mission 4 scenarios; black markers on the time axis denote instances where Dk is fused, and the red dashed line denotes the 0.5 m error mark (error traces end if the target is detected). Search with "Robot Only," "Human Only," and "Human With Robot" fusion shown from left to right; uniform and bad GM search priors shown in top and bottom rows, respectively.
showed it to be the minimum time required for the robot's greedy search to find all targets in all missions without Dk updates.
D. Results: Overall Search Performance
The overall search performance of the human–robot team is
gauged here via the search completion time and the number of
targets detected in each search mission. Fig. 8 shows the results
for these two metrics over all 24 search missions. Human With
Robot sensing clearly offers the best overall search perfor-
mance, since all five targets were always found in each mission
within 8.5–13 min. While more targets were found under Hu-
man Only sensing than under Robot Only sensing (which has
the worst overall performance), the mission completion times
for these conditions were about the same. The number of tar-
gets detected for each of these conditions drops slightly when moving from uniform to bad GM priors; in contrast, the prior
type did not significantly affect performance with Human With
Robot sensing.
The "Robot Only" results underscore the nontrivial nature of the search problem and the inadequacy of the greedy search strategy when only ζk is fused. With Dk available, the robot was more proficient at detecting targets via the greedy search (though performance was not necessarily optimal in any sense). The improvement from human input can be explained by comparing the typical level of informativeness of each GM p(Xt | D1:k, ζ1:k) over time under various fusion and prior conditions. Time traces of the MAP-estimated target position error εt = ||Xt,true − X̂t||, where X̂t = arg max p(Xt | D1:k, ζ1:k), for all Mission 4 runs are shown in Fig. 9. All Dk fusion instances are also shown to illustrate the typical frequency of voluntary human messaging and its influence on each target's posterior estimate.
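The MAP error metric εt above can be computed from a GM posterior by maximizing the mixture density, e.g., over a dense grid of candidate locations. The following is a minimal sketch of that computation; the grid resolution, mixture parameters, and function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gm_pdf(points, weights, means, covs):
    """Evaluate a 2-D Gaussian mixture density at an (N, 2) array of points."""
    vals = np.zeros(len(points))
    for w, m, c in zip(weights, means, covs):
        d = points - m
        inv_c = np.linalg.inv(c)
        expo = -0.5 * np.einsum("ni,ij,nj->n", d, inv_c, d)
        norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(c)))
        vals += w * norm * np.exp(expo)
    return vals

def map_error(weights, means, covs, x_true, grid):
    """Grid-based MAP estimate of the GM and its two-norm error vs. truth."""
    x_map = grid[np.argmax(gm_pdf(grid, weights, means, covs))]
    return x_map, float(np.linalg.norm(x_true - x_map))

# Two-component GM whose dominant mode sits at the true target location.
xs = np.linspace(0.0, 2.0, 81)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
w = [0.7, 0.3]
mu = [np.array([1.0, 1.0]), np.array([0.2, 1.8])]
cov = [0.05 * np.eye(2), 0.05 * np.eye(2)]
x_map, err = map_error(w, mu, cov, np.array([1.0, 1.0]), grid)
```

Because the dominant mode coincides with the true location here, the error is essentially zero; in the experiments, εt instead tracks how far the GM's strongest mode drifts from each target.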
While the X̂t estimates derived from the target GMs with Dk fusion are less precise than, say, estimates derived from conventional lidar data, Fig. 9 nevertheless shows that fusion of soft human information in Dk helps substantially improve the robot's estimated beliefs over time, even if a target spotted by the human is far from the robot's own sensor range.
AHMED et al.: BAYESIAN MULTICATEGORICAL SOFT DATA FUSION FOR HUMAN–ROBOT COLLABORATION
E. Results: Diversity of Soft Human Sensor Inputs
The volume and variety of human messages generally increased under either "Human Only" or bad-prior conditions (detailed results are omitted here due to limited space). There were also many more positive messages (1376) than negative messages (296) over all search scenarios, due to the fact that the contributions of positive/"Something is. . ." messages were downweighted via the PDA correction in (48), which favors the prior GM (i.e., the nonassociation hypothesis). Hence, the human often had to resend the same positive Dk message two or three times to convince the robot that something was in fact somewhere. In contrast, negative/"Nothing is. . ." messages were much less frequent, due in part to the fact that they are not downweighted by PDA. This dilution effect on positive information could potentially be avoided through the use of alternatives to PDA data association, e.g., multiple hypothesis tracking. However, an evaluation of such alternatives is outside the scope of this study.
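The dilution effect can be seen in a toy one-dimensional version of a PDA-style correction: each positive update keeps the prior mixture with a nonassociation weight β0 and adds a component at the human-reported location with weight 1 − β0, so the same message must be resent to move the estimate. The value β0 = 0.7 and the point-mass treatment of the fused component are illustrative simplifications, not the paper's exact (48):

```python
def pda_positive_update(weights, locs, reported_loc, beta0=0.7):
    """One PDA-style positive update on a discrete mixture: the prior keeps
    total mass beta0 (nonassociation); the reported location gets 1 - beta0."""
    return [beta0 * w for w in weights] + [1.0 - beta0], list(locs) + [reported_loc]

# Prior mass at x = 0; the human repeatedly reports a target near x = 5.
w, m = [1.0], [0.0]
means_over_time = []
for _ in range(3):  # the same positive message, resent three times
    w, m = pda_positive_update(w, m, 5.0)
    means_over_time.append(sum(wi * mi for wi, mi in zip(w, m)))
# means_over_time -> [1.5, 2.55, 3.285]: the belief only creeps toward the report
```

Each resend shrinks the un-updated prior mass geometrically (0.7, 0.49, 0.343, …), matching the observed need to repeat positive messages two or three times.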
F. Further Insights: Complementary Team Behavior
As noted in [2], a simple greedy search strategy generally leads to inefficient back-and-forth search paths over the map as a direct consequence of the scattering effect from ζk updates [see Fig. 6(j)]. As such, in "Robot Only" scenarios, the robot frequently jumped from one part of the search map to another without searching thoroughly around its goal points, leading to slow information gain. Since (47) also diminished around the missed Xt,true following missed detections, the robot could not remedy missed target detections until after greedily searching the rest of the map. As Fig. 10 illustrates, in "Human With Robot" scenarios, the human operator could quickly correct missed detections by sending relevant soft information that forced the robot to greedily re-examine areas around actual target locations.
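The scattering effect of a "No Detection" update can be sketched on a gridded pdf: the prior is multiplied by 1 − Pd(x) and renormalized, pushing probability mass away from the area the robot has just searched. The disc-shaped detection model and its parameters below are assumptions for illustration, not the paper's (47):

```python
import numpy as np

def no_detection_update(grid, prior, robot_pos, p_d_max=0.9, det_range=1.0):
    """Fuse zk = 'No Detection': downweight the prior wherever the robot
    could plausibly have seen the target, then renormalize."""
    dist = np.linalg.norm(grid - robot_pos, axis=1)
    p_d = np.where(dist < det_range, p_d_max, 0.0)  # crude range-limited Pd
    post = prior * (1.0 - p_d)
    return post / post.sum()

# Uniform prior over a 2 m x 2 m area; the robot sits at the center.
xs = np.linspace(0.0, 2.0, 41)
grid = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
prior = np.full(len(grid), 1.0 / len(grid))
post = no_detection_update(grid, prior, np.array([1.0, 1.0]))
# Mass scatters away from the searched disc toward unsearched cells.
```

Repeated updates of this kind around a missed target progressively suppress the pdf there, which is why the greedy planner alone could not revisit such areas until the rest of the map was exhausted.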
While "Human Only" fusion led to more target detections than did "Robot Only" fusion, one or two targets remained undetected in certain missions, and completion times were not improved consistently (especially with bad priors). This is largely attributable to the coarse nature of the soft Dk codebook, since (47) could not be precisely updated to allow the robot to nudge closer toward a target that was just outside of detection range. As Fig. 11 illustrates, the human spent considerable time in "Human Only" missions sending many extra messages to coax the robot into obtaining a better viewing position to detect targets that were right in front of it and just outside of detection range (especially those that were not close to landmarks, such as Target 1 in Mission 4). The resulting high volume of Dk messages (especially in the bad-prior missions) is also evident in Fig. 9; these scenarios led to human frustration in some cases. However, in "Human With Robot" cases, scattering via ζk = "No Detection" observations helped shift (47) closer to any targets just outside of detection range, thereby automatically refining the target GM pdfs following Dk fusion. This also led to smoother interaction between the human and the robot, as indicated by the significantly improved mission times
Fig. 10. "Human With Robot" fusion sequence showing human correction of a missed target detection via Dk updates. Sequence length is under 1 min.
Fig. 11. "Human Only" fusion sequence showing effects of limited codebook precision without ζk updates. Sequence length is almost 4 min.
and lower message volume/frequency compared with "Human Only" missions.
These results for the two different fusion conditions indicate that the simple codebook used here to generate Dk messages produces useful but ultimately limited information for localizing the targets. To enable reliable target localization without fusion of more precise ζk data, the codebook could be refined to include more diverse or contextually precise Dk preposition/reference primitives. Given that the only reference points allowed in Dk are discrete landmark/wall locations and the robot's current location, it is not surprising that the set of softmax/MMS
Fig. 12. Logarithm of KLDs at each time step for the final/remaining target posterior pdfs under each fusion condition in Mission 4, under uniform (left column) and bad priors (right column). (Standard deviations over ten Monte Carlo trials omitted from Dk fusion cases for clarity.)
likelihoods induced by the three range-only and two bearing-only preposition classes can be too imprecise for awkwardly located targets (e.g., in the corner of the search map away from any walls/landmarks).
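As a concrete illustration of why range-only preposition classes are coarse, consider a softmax model over three hypothetical classes ("next to," "near," "far from") as a function of target–robot range. The weights and biases below are made-up values chosen only to order the classes in range, not parameters learned in the paper:

```python
import numpy as np

def range_class_probs(r, w=(0.0, 2.0, 4.0), b=(0.0, -3.0, -10.0)):
    """Softmax probabilities of the classes ('next to', 'near', 'far from')
    at range r; illustrative linear logits w[i]*r + b[i]."""
    logits = np.array(w) * r + np.array(b)
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

# Each class dominates over a broad band of ranges, so a single message
# constrains the target location only coarsely.
labels = ["next to", "near", "far from"]
dominant = [labels[int(np.argmax(range_class_probs(r)))] for r in (0.2, 2.0, 6.0)]
# dominant -> ['next to', 'near', 'far from']
```

The wide dominance regions of each class are exactly the imprecision noted above: a "near" report is consistent with a whole annulus of target positions.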
G. Results: Accuracy of Approximate Gaussian Mixture
Posteriors
To assess the accuracy of online fusion of ζk data via LWIS and of Dk data via VBIS, the KLD of each GM posterior p(Xt | ζ1:k, D1:k) obtained at every time step k was computed offline for all search missions with respect to recursive grid-based ground-truth fusion posteriors at 0.1 m × 0.1 m grid resolution. To further assess the contribution of VBIS for Dk fusion, KLDs were also computed offline for a separate set of GM posteriors obtained by using LWIS GM fusion to fuse both ζ1:k and D1:k, with 1000 samples per component update (to match the total number of samples used for VBIS Dk updates). The KLDs for both the online LWIS-VBIS and offline LWIS-only GM approximations were evaluated over ten independent Monte Carlo runs to account for random sampling effects. The effects of the robot's closed-loop greedy planner were removed from the offline fusion results by using the same recorded robot trajectories, ζk and Dk data, and mixture management methods (PDA, Salmond's method) as for online LWIS ζk and VBIS Dk fusion.
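The accuracy check above amounts to discretizing each GM posterior on the same grid as the ground-truth pdf and computing the KLD between the two normalized cell arrays. A minimal sketch (the function name is ours):

```python
import numpy as np

def grid_kld(p_true, q_approx, floor=1e-300):
    """KLD D(p || q) in nats between two pdfs discretized on a common grid;
    the cell areas cancel once both arrays are normalized to sum to one."""
    p = np.asarray(p_true, dtype=float)
    q = np.asarray(q_approx, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    mask = p > 0  # 0 * log(0) terms contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], floor))))

p = np.array([0.1, 0.2, 0.3, 0.4])   # stand-in for the grid ground truth
q = np.array([0.25, 0.25, 0.25, 0.25])  # stand-in for a gridded GM posterior
d = grid_kld(p, q)  # > 0; grid_kld(p, p) == 0
```

The floor on q guards against infinite divergence where the approximation assigns (numerically) zero mass to cells that the ground truth supports, which is the failure mode behind the KLD spikes discussed below.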
Although details are not shown here, only small baseline KLD losses arose during "Robot Only" fusion under uniform and bad priors (i.e., due to LWIS fusion of ζk alone, along with artifacts of GM compression), showing that GMs can provide reasonable approximations to the exact target location posteriors over the course of a full 15-min search mission. The log KLDs in Fig. 12(a), (c) and (b), (d) show that the online VBIS and offline LWIS Dk updates generally offer comparable accuracy alongside LWIS ζk fusion. Note that the LWIS Dk fusion results here greatly benefit from the JPDA-based positive/"Something is. . ." updates in (48), since a substantial portion of the prior appears in the GM posterior (i.e., a nonassociation weight in [0.5, 0.8] when |Tk| > 1). The KLDs typically spike with Dk updates; the largest upward spikes tend to appear after about 10 min (600 s) due to increased sensitivity to accumulated information losses from baseline ζk LWIS fusion. For the "Human Only" cases in particular, the tails and several small components of the true posteriors after many Dk messages become very difficult to approximate with only 15-component GMs. The KLDs for both methods are noticeably smaller for "Human With Robot" fusion, as ζk helps reduce the number of Dk updates needed to modify the pdfs and thus limits the overall complexity of the true posteriors. Nevertheless, such spikes are often less than 1.5 log nats for VBIS Dk fusion before 600 s; larger KLD spikes typically occur after this time, but are still often less severe than those for LWIS. Indeed, the VBIS KLDs with bad priors are either statistically comparable with or significantly smaller than the corresponding LWIS KLDs.12 Fig. 12(d) shows one such discrepancy in accuracy at about k = 100, where a major GM posterior mode is missed by LWIS but not by VBIS.
H. Computation and Implementation Considerations
Although more reliable, the use of EM makes VBIS more expensive to implement than LWIS. VBIS required approximately 7 ms on average per GM component update in these experiments using managed C# code, while LWIS required approximately 2 ms.13 To overcome the fact that VBEM can converge slowly if initialized far from the final solution, several code optimization strategies (not implemented here) could be used, such as parallelization of Algorithm 3, clustering of VBEM initializations across similar GM components, and use of unmanaged pointer arithmetic. Such optimizations were not required for the present application, as VBIS did not lead to appreciable delays for online operation.
An important advantage of GM posterior approximations is their compactness compared with the offline-computed ground-truth discrete grids. A 15-component GM for one target at a single time step requires 720 bytes (double precision), while the grid requires approximately 61 times as much memory, at 44 064 bytes. Hence, for a full 900-s search mission, the target's full posterior time history recorded at 1 Hz requires 0.65 MB with a GM, versus 39.66 MB with a grid. This discrepancy is even larger if Xk is augmented to include additional states (e.g., vertical displacement and velocities). Such storage costs are highly relevant for applications in which pdfs over multiple time steps must be stored and/or communicated, e.g., decentralized data fusion sensor networks [17]. Note that the development of
12. Determined using Kruskal–Wallis tests with p = 0.01 on the time-averaged log KLD values.
13. These times did not increase significantly.
sophisticated, yet computationally affordable, online GM compression methods to avert excessive posterior information loss in realistic fusion scenarios (e.g., with hundreds or thousands of mixands) is still an active area of estimation research.
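The storage figures quoted above follow from simple byte counting, assuming each 2-D GM component stores a weight, a two-element mean, and the three unique entries of a symmetric 2 × 2 covariance at double precision:

```python
BYTES_PER_DOUBLE = 8
PARAMS_PER_COMPONENT = 1 + 2 + 3  # weight + 2-D mean + unique covariance entries

gm_bytes_per_step = 15 * PARAMS_PER_COMPONENT * BYTES_PER_DOUBLE  # 720 bytes
grid_bytes_per_step = 44_064  # grid cost per step, as quoted in the text
steps = 900                   # a 900-s mission recorded at 1 Hz

gm_total_mb = steps * gm_bytes_per_step / 1e6      # ~0.65 MB per target
grid_total_mb = steps * grid_bytes_per_step / 1e6  # ~39.66 MB per target
ratio = grid_bytes_per_step / gm_bytes_per_step    # ~61x per time step
```

The same arithmetic shows why the gap widens with state dimension: the GM cost grows roughly quadratically in the state size (via the covariance), while a dense grid grows exponentially.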
Finally, it is worth considering whether a standard particle filtering approach would be adequate for human–robot fusion in place of the GM methods discussed here. Additional offline fusion performance analyses for the multitarget search trials were performed with the common bootstrap particle filter (BPF) [24] using different sample sizes (500–10 000 particles) and resampling schemes. Unlike the GM filtering approaches considered here, the BPF approximates (46)–(48) with weighted samples (drawn initially from the prescribed GM priors at k = 0) and performs all Bayesian updates via likelihood-weighted IS. Although full details are omitted here due to limited space, the BPF's performance (in terms of robustness, consistency, and estimation accuracy) was generally found to be worse across all sample sizes compared with the performance of the GM filters. For instance, the BPF's final value of ε3 is always about 4 m for the "Human With Robot" Mission 4 trial under the benign uniform prior, whereas the VBIS+LWIS GM filter's final value of ε3 is always about 0.4 m. This behavior can be traced to particle degeneracies that arise in the BPF via likelihood-weighted IS and to the BPF's inability to explore new Xk values outside its initial sample set. These issues are neatly addressed by the proposed GM filter, which also provides a more compact and completely continuous approximation of the fusion posterior.
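The degeneracy failure mode can be reproduced in one dimension: a bootstrap-style likelihood-weighted update only reweights and duplicates existing particles, so a target far from the initial sample support is never reached. The prior, likelihood width, and seed below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def bpf_measurement_update(particles, likelihood):
    """Likelihood-weighted IS with multinomial resampling; resampling can
    only duplicate existing particles, never create new states."""
    w = likelihood(particles)
    if w.sum() == 0.0:  # fully degenerate: fall back to uniform weights
        w = np.ones_like(w)
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

prior_particles = rng.normal(0.0, 1.0, size=500)  # belief centered at x = 0
target = 6.0                                       # true state, far from the prior
lik = lambda x: np.exp(-0.5 * ((x - target) / 0.5) ** 2)
post_particles = bpf_measurement_update(prior_particles, lik)
# Every posterior particle is a copy of a prior one, so none lies near x = 6.
```

A GM filter with a continuous posterior representation does not suffer this support collapse, since fused components can be placed wherever the likelihood demands.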
VI. CONCLUSION
This paper derived a computationally efficient and accurate approximation to the recursive hybrid Bayesian inference problem involved in the dynamic fusion of soft categorical human observations with conventional hard robot sensor data. The proposed VBIS fusion method combines the strengths of fast standalone variational Bayes and Monte Carlo IS inference approximations to obtain consistent Gaussian posteriors in the baseline case of Gaussian state priors with softmax likelihood functions. VBIS was then extended to derive GM posterior approximations for GM priors with MMS likelihood models in order to handle more general recursive hybrid data fusion problems. Experimental multitarget search results for a real human–robot team showed that soft categorical observations from human sensors, although subject to limited precision and potential data association ambiguities, can still be highly useful and informative for recursive Bayesian estimation problems that feature a high degree of uncertainty or inconsistency. The results also provide valuable practical insight into the reliability of the proposed VBIS GM approximations under a variety of fusion conditions, vis-à-vis LWIS GM and grid-based ground-truth approximations. Soft categorical human sensor observations can be exploited in many different dynamic data fusion domains and are particularly convenient in situations where humans must share information quickly but do not have enough time to precisely estimate states of interest (e.g., the precise distance and bearing to a target in meters and degrees, respectively). Although the important issues of estimating error/false alarm and likelihood model uncertainties for human sensors are not addressed here in detail due to limited space, the proposed data fusion framework can incorporate these in a fully Bayesian manner [10], [32].
REFERENCES
[1] P. Bladon, P. Day, T. Hughes, and P. Stanley, "High-level fusion using Bayesian networks: Applications in command and control," in Proc. Inf. Fusion Command Support, 2004, pp. 4.4–4.18.
[2] F. Bourgault, "Decentralized control in a Bayesian world," Ph.D. dissertation, Sch. Aerosp., Mech. Mechatronic Eng., Univ. Sydney, N.S.W., Australia, 2005.
[3] T. Fong and I. Nourbakhsh, "Interaction challenges in human–robot space exploration," ACM Interact., vol. 12, no. 2, pp. 42–45, 2005.
[4] A. Bauer, K. Klasing, G. Lidoris, Q. Muhlbauer, F. Rohrmuller, S. Sosnowski, T. Xu, K. Kuhnlenz, D. Wollherr, and M. Buss, "The autonomous city explorer: Towards natural human–robot interaction in urban environments," Int. J. Soc. Robot., vol. 1, no. 2, pp. 127–140, 2009.
[5] T. Nakamura, T. Nagai, and N. Iwahashi, "Bag of multimodal LDA models for concept formation," in Proc. IEEE Int. Conf. Robot. Autom., May 2011, pp. 6233–6238.
[6] E. Topp and H. Christensen, "Topological modelling for human augmented mapping," in Proc. Int. Conf. Intell. Robots Syst., Beijing, China, 2006, pp. 2257–2263.
[7] B. Khaleghi, A. Khamis, and F. Karray, "Random finite set theoretic based soft/hard data fusion with application for target tracking," in Proc. Conf. Multisensor Fusion Integr. Intell. Syst., Salt Lake City, UT, 2010, pp. 50–55.
[8] D. Hall and J. Jordan, Human-Centered Information Fusion. Boston, MA: Artech House, 2010.
[9] M. Michalowski, S. Sabanovic, C. DiSalvo, D. Busquets, L. Hiatt, N. Melchior, and R. Simmons, "Socially distributed perception: Grace plays social tag at AAAI 2005," Auton. Robots, vol. 22, pp. 385–397, 2007.
[10] T. Kaupp, "Probabilistic human–robot information fusion," Ph.D. dissertation, Sch. Aerosp., Mech. Mechatronic Eng., Univ. Sydney, N.S.W., Australia, 2008.
[11] M. Lewis, H. Wang, P. Velagapudi, P. Scerri, and K. Sycara, "Using humans as sensors in robotic search," in Proc. 12th Int. Conf. Inf. Fusion, Seattle, WA, 2009, pp. 1249–1256.
[12] F. Bourgault, A. Chokshi, J. Wang, D. Shah, J. Schoenberg, R. Iyer, F. Cedano, and M. Campbell, "Scalable Bayesian human–robot cooperation in mobile sensor networks," in Proc. Int. Conf. Intell. Robots Syst., 2008, pp. 2342–2349.
[13] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA: MIT Press, 2005.
[14] Y. Bar-Shalom, X. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation. New York: Wiley, 2001.
[15] T. Kaupp, A. Makarenko, F. Ramos, B. Upcroft, S. Williams, and H. Durrant-Whyte, "Adaptive human sensor model in sensor networks," in Proc. 8th Int. Conf. Inf. Fusion, 2005, vol. 1, pp. 748–755.
[16] T. Kaupp, A. Makarenko, S. Kumar, B. Upcroft, and S. Williams, "Operators as information sources in sensor networks," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2005, pp. 936–941.
[17] T. Kaupp, B. Douillard, F. Ramos, A. Makarenko, and B. Upcroft, "Shared environment representation for a human–robot team performing information fusion," J. Field Robot., vol. 24, no. 11, pp. 911–942, 2007.
[18] M. Skubic, D. Perzanowski, S. Blisard, A. Schultz, W. Adams, M. Bugajska, and D. Brock, "Spatial language for human–robot dialogs," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 2, pp. 154–167, May 2004.
[19] A. Huang, S. Tellex,