

IEEE TRANSACTIONS ON ROBOTICS, VOL. 29, NO. 1, FEBRUARY 2013 189

Bayesian Multicategorical Soft Data Fusion for Human–Robot Collaboration

Nisar R. Ahmed, Member, IEEE, Eric M. Sample, and Mark Campbell, Member, IEEE

Abstract: This paper considers Bayesian data fusion of conventional robot sensor information with ambiguous human-generated categorical information about continuous world states of interest. First, it is shown that such soft information can be generally modeled via hybrid continuous-to-discrete likelihoods that are based on the softmax function. A new hybrid fusion procedure, called variational Bayesian importance sampling (VBIS), is then introduced to combine the strengths of variational Bayes approximations and fast Monte Carlo methods to produce reliable posterior estimates for Gaussian priors and softmax likelihoods. VBIS is then extended to more general fusion problems that involve complex Gaussian mixture (GM) priors and multimodal softmax likelihoods, leading to accurate GM approximations of highly non-Gaussian fusion posteriors for a wide range of robot sensor data and soft human data. Experiments for hardware-based multitarget search missions with a cooperative human-autonomous robot team show that humans can serve as highly informative sensors through proper data modeling and fusion, and that VBIS provides reliable and scalable Bayesian fusion estimates via GMs.

Index Terms: Bayesian methods, Gaussian mixtures, human-robot interaction, machine learning, Monte Carlo methods, recursive state estimation, robot sensor fusion, variational Bayes.

I. INTRODUCTION

In order to behave intelligently in complex environments, autonomous robots must continuously update their understanding of the world by combining new data from various sources. Despite considerable recent advances in autonomous robot control and perception, human inputs are still required in many practical settings to overcome various actuation/sensing limitations and ensure robustness in the presence of uncertainties. As such, data fusion plays an important role in the application of collaborative human–robot teams to diverse areas such as defense and security [1], search and rescue [2], space exploration [3], and social robotics [4]. However, looking beyond the ability to provide supervisory validation or training data for static abstract phenomena (e.g., categories for object types [5] or places [6]), the potential richness of human sensor data is often overlooked for robotics applications.

Manuscript received December 18, 2011; revised May 28, 2012; accepted August 18, 2012. Date of publication September 12, 2012; date of current version February 1, 2013. This paper was recommended for publication by Associate Editor C. Stachniss and Editor D. Fox upon evaluation of the reviewers' comments. This work was supported in part by the National Science Foundation Graduate Research Fellowship Program and in part by AFOSR MURI FA9550-08-1-0356.

The authors are with the Autonomous Systems Laboratory, Cornell University, Ithaca, NY 14850 USA (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TRO.2012.2214556

The problem considered here is the dynamic fusion of conventional robot sensor data (i.e., hard data) with ambiguous human-generated observations (i.e., soft data) related to uncertain continuous/physical world states of interest, e.g., object location, velocity, mass, temperature, etc. This study is motivated by the fact that maintaining full observability over all states of interest through robotic sensors alone can be challenging in many applications. For instance, as discussed in [7], a robot that is equipped with a 2-D horizontal scanning lidar can track the position and velocity of moving people, but will not have direct access to height, weight, or goal location information that could be used to improve target motion models. More importantly, all target states become unobservable if targets are occluded, confused with false alarms, or beyond sensor range for a long time. By acting as an externally available sensor in such cases, a helpful human agent can furnish the robot with relevant data that substantially reduce uncertainty or inconsistencies in desired state estimates, e.g., due to poor observability or previous fusion of faulty information. However, unlike hard data, soft data are difficult to model from first principles and are not guaranteed to be provided in a consistent manner, since they are highly context-specific and subject to uncertainties via psychocognitive factors (e.g., expertise, stress, fatigue, memory, and perception bias) [8].

Given these considerations, soft data fusion has been explored in the context of several robotics applications, such as navigation by social interaction [4], [9], cooperative tracking and surveillance [10], and search and rescue [2], [11], [12]. However, formal modeling and fusion of soft data via the standard Bayesian state estimation paradigm [13], [14] have been considered in only a few relatively recent studies. Kaupp et al. developed a Bayesian method to fuse continuous soft range-with-bearing data to tracked objects by modeling human sensors via linear-Gaussian regression models, which were then incorporated into decentralized Kalman filters [15], [16]. The authors of [17] extended this work to include probabilistic models of human visual sensing, which were used to improve data association and object classification accuracy in joint human–robot tracking tasks. Bourgault et al. considered grid-based Bayesian fusion of binary human visual target detection likelihoods for a distributed 2-D search problem [12]. Importantly, however, these existing Bayesian approaches are inadequate to fuse information related through coarse/fuzzy terminology, which is a predominant feature of soft data [8]. Some examples include the following.

1) "The car is moving quickly around the block; a bike is close behind it."

2) "Nothing is behind the building, on top of the roof, or near the truck to the left of me."

1552-3098/$31.00 © 2012 IEEE



3) "The sidewalk is very steep; the nearby obstacle is much lighter than the robot."

The main issue at hand here is how such data can be statistically modeled and fused with hard robot data in a rigorous Bayesian manner. Structured "codebook" modeling strategies have already been successfully used in the development of human-assisted motion planners, which use probabilistic models of symbolic/linguistic motion primitives to infer constraints on robot paths (e.g., "go around the table and between the chairs") [4], [18], [19]. However, the models that are developed for these planners are geared toward characterizing human motion commands and are thus unsuitable to extract dynamic state information from purely observational soft inputs. A mixed fuzzy-Bayesian modeling approach for soft-hard fusion was proposed by Mahler using random finite sets [20], in which soft linguistic observation codebook likelihoods are modeled via fuzzy set interpretations of virtual linear state measurements (this method was also adopted in [7]). However, such likelihood models cannot describe ambiguous reports with highly non-Gaussian uncertainties (e.g., range-only reports such as "the car is not too far from me").

Alternatively, it is proposed here to model such soft observations via multicategorical random variables that are conditionally dependent on the states of interest. As such, terms like "nearby" and "left of" imply uncertain discrete classifications of the continuous state by a human observer. Though less precise than typical continuous hard data (e.g., lidar, sonar, etc.), binary categorical data in the form of "negative measurements" have already proved quite useful for Bayesian state estimation in robotic mapping [13], localization [21], and object search/tracking [2], [22], [23]. However, dynamic estimation of continuous states from discrete multicategorical data requires approximation of an analytically intractable hybrid Bayesian inference problem. Various solutions exist in the estimation [24]–[26] and machine learning literature [27], [28], but these all have drawbacks that severely limit their suitability for online dynamic data fusion.

This paper develops a novel recursive Bayesian fusion framework to efficiently combine hard robot data with soft multicategorical observations of dynamic continuous states. Three contributions are made in this regard. First, it is shown here that soft multicategorical observations can be generally modeled as discrete random variables via flexible hybrid likelihood functions that are based on softmax distributions, which are easily learnable from training data and have convenient properties for online state estimation. Second, a new variational Bayesian importance sampling (VBIS) algorithm is developed for reliable fusion of soft multicategorical data. The VBIS algorithm overcomes key limitations of other existing hybrid Bayesian inference algorithms and leads to the rigorous development of compact Gaussian mixture (GM) posterior approximations for general hard-soft fusion applications. Finally, the proposed fusion framework is demonstrated through online multitarget search experiments that involve a cooperative human–robot team operating under various sensing modalities and prior information conditions. The experimental results show that the proposed human sensor likelihood modeling approach, VBIS algorithm, and GM-based recursive fusion framework enable human collaborators to serve as effective information sources for robotic state estimation tasks. This paper builds on preliminary work in [29] and [30] by providing a more thorough explanation and experimental evaluation of the proposed human–robot data fusion framework.

II. HUMAN–ROBOT DATA FUSION AND SOFT CATEGORICAL DATA MODELING

A. General Problem Statement

The Bayesian data fusion approach proposed here models soft (i.e., human-generated) descriptions of continuous states with discrete random variables that represent contextually distinct sets of state categorizations. These discrete random variable dependences on the state are modeled directly via flexible continuous-to-discrete hybrid likelihood functions, thus enabling recursive Bayesian estimation of the unknown continuous states from multicategorical soft data.

For discrete time index $k \in \mathbb{Z}^{0+}$, let $X_k \in \mathbb{R}^n$ be the continuous random state vector of interest with prior probability density function (pdf) $p(X_0)$ and transition pdf $p(X_k \mid X_{k-1})$ arising from known stochastic dynamics. Let $\zeta_k$ be a vector of hard robot sensor data, which may contain a mixture of continuous data (e.g., lidar returns) and discrete data (e.g., detection/no detection outputs from a vision-based object detector) with joint conditional observation likelihood $p(\zeta_k \mid X_k)$. Let $D_k$ be an $m$-valued discrete random variable that represents a categorical human observation, where $D_k$ has a conditional likelihood function $P(D_k = j \mid X_k)$ for $j \in \{1, \ldots, m\}$ and $m \in \mathbb{Z}^+$. The $m$ possible realizations of $D_k$ are assumed to be mutually exclusive and exhaustive so that $\sum_{j=1}^{m} P(D_k = j \mid X_k) = 1$. The sequences of all $\zeta_k$ and $D_k$ until time $k$ are denoted as $\zeta_{1:k} \equiv \{\zeta_1, \ldots, \zeta_k\}$ and $D_{1:k} \equiv \{D_1, \ldots, D_k\}$, respectively.

This paper adopts a recursive Bayesian process to sequentially fuse $\zeta_{1:k}$ and $D_{1:k}$ information at each time step $k$ to update the pdf for $X_k$. Given $\zeta_{1:k-1}$ and $D_{1:k-1}$, the dynamics prediction step propagates the most recent pdf of $X_{k-1}$ forward in time via the Chapman–Kolmogorov equation [14]

$$p(X_k \mid \zeta_{1:k-1}, D_{1:k-1}) = \int p(X_k \mid X_{k-1})\, p(X_{k-1} \mid \zeta_{1:k-1}, D_{1:k-1})\, dX_{k-1}. \tag{1}$$

The robot measurement update step fuses the result of (1) with robot-generated information in $\zeta_k$ via Bayes' rule

$$p(X_k \mid \zeta_{1:k}, D_{1:k-1}) = \frac{p(\zeta_k \mid X_k)\, p(X_k \mid \zeta_{1:k-1}, D_{1:k-1})}{\int p(\zeta_k \mid X_k)\, p(X_k \mid \zeta_{1:k-1}, D_{1:k-1})\, dX_k}. \tag{2}$$

Finally, the human measurement update step fuses (2) with human-generated information in $D_k$ via Bayes' rule

$$p(X_k \mid \zeta_{1:k}, D_{1:k}) = \frac{P(D_k \mid X_k)\, p(X_k \mid \zeta_{1:k}, D_{1:k-1})}{\int P(D_k \mid X_k)\, p(X_k \mid \zeta_{1:k}, D_{1:k-1})\, dX_k}. \tag{3}$$
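The three steps in (1)-(3) can be illustrated numerically on a 1-D grid. The following is a minimal sketch rather than the paper's implementation: the random-walk dynamics, the Gaussian robot sensor, the three-class softmax-style human likelihood, and all function names and numbers are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical 1-D setup: every number below is an illustrative assumption.
x = np.linspace(-10.0, 10.0, 401)          # grid over the state X_k
dx = x[1] - x[0]

def normalize(p):
    """Rescale a gridded density so that it integrates to one."""
    return p / (p.sum() * dx)

def gauss(z, mean, std):
    return np.exp(-0.5 * ((z - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def predict(p_prev, q_std=0.5):
    """Eq. (1): integrate a random-walk transition pdf against the previous pdf."""
    trans = gauss(x[:, None], x[None, :], q_std)   # trans[i, j] = p(x_i | x_j)
    return normalize(trans @ p_prev * dx)

def robot_update(p_pred, zeta, r_std=2.0):
    """Eq. (2): multiply by an assumed Gaussian likelihood p(zeta_k | X_k)."""
    return normalize(gauss(zeta, x, r_std) * p_pred)

def human_update(p_rob, j, W, b):
    """Eq. (3): multiply by a categorical likelihood P(D_k = j | X_k)."""
    logits = np.outer(x, W) + b                    # shape (grid points, m)
    logits -= logits.max(axis=1, keepdims=True)
    lik = np.exp(logits)
    lik /= lik.sum(axis=1, keepdims=True)
    return normalize(lik[:, j] * p_rob)

p0 = normalize(gauss(x, 0.0, 3.0))                 # prior p(X_0)
W = np.array([-1.0, 0.0, 1.0])                     # classes: "left", "near", "right"
b = np.array([-2.0, 0.0, -2.0])

p_pred = predict(p0)
p_rob = robot_update(p_pred, zeta=1.0)
p_post = human_update(p_rob, j=2, W=W, b=b)        # the human reports "right"
mean_rob = (x * p_rob).sum() * dx
mean_post = (x * p_post).sum() * dx
```

Because the assumed likelihood for "right" increases with the state, fusing that report moves the posterior mean to the right of the robot-only estimate. A grid is used here only because it makes each integral explicit; it does not scale to higher state dimensions.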



The main problem then is to determine the posterior pdf $p(X_k \mid \zeta_{1:k}, D_{1:k})$ (i.e., the filtering density), which represents the uncertainty in $X_k$ given all information up to time $k$.$^{1}$ It is assumed without loss of generality that the pdfs in (1) and (2) can be, respectively, estimated by the prediction and measurement update steps of conventional filters, such as the (extended/unscented) Kalman filter [14], particle filter [24], or Gaussian sum filter [31].

This paper focuses primarily on the measurement update which is defined by (3); conditioning on $D_k = j$, $\zeta_{1:k}$, and $D_{1:k-1}$ is hereafter suppressed so that

$$p(X_k) \equiv p(X_k \mid \zeta_{1:k}, D_{1:k-1}) \tag{4}$$

$$\text{and} \quad p(X_k \mid D_k) \equiv p(X_k \mid \zeta_{1:k}, D_{1:k-1}, D_k = j) \tag{5}$$

are the Bayesian prior and posterior in (3), respectively. Substituting these expressions into (3) gives

$$p(X_k \mid D_k) = \frac{P(D_k \mid X_k)\, p(X_k)}{\int P(D_k \mid X_k)\, p(X_k)\, dX_k} = \frac{p(X_k, D_k)}{P(D_k)} \tag{6}$$

where $p(X_k, D_k)$ is the joint pdf, and $P(D_k)$ is the marginal observation likelihood.

For any given continuous state $X_k$, the number of possible realizations for $D_k$ can be quite large and must be suitably tailored for each practical application. Hence, just as raw lidar data or camera images must be processed to generate meaningful $\zeta_k$ data, soft observations are assumed to be processed by an application-dependent interpreter to generate contextually recognizable $D_k$ data. As in most human–robot interaction applications, such an interpreter could be based on a predefined communication protocol that relies on a dictionary of known descriptor models and contextual reference values, to ensure consistent communication [4], [19]. It is assumed for simplicity that the $m$ possible values of $D_k$ represent all desired human categorizations of $X_k$. However, $D_k$ can also be a vector whose elements are discrete random variables that represent different types of categories over arbitrary subsets of $X_k$ (e.g., separate range-only bins and bearing-only bins), in which case (3) is performed sequentially for each element of $D_k$.$^{2}$

Since $X_k$ is continuous and $D_k$ discrete, (6) defines a hybrid Bayesian inference problem [33], for which two key issues must be addressed$^{3}$: 1) how to specify an appropriate human sensor likelihood model $P(D_k \mid X_k)$, and 2) how to subsequently evaluate (6) for any given $p(X_k)$?

B. Basic and Extended Softmax Models for Human Sensors

For each $j \in \{1, \ldots, m\}$, $P(D_k = j \mid X_k)$ must map $X_k = x$ to the interval $[0, 1]$ such that $\sum_{j=1}^{m} P(D_k = j \mid X_k = x) = 1$.

$^{1}$ If $\zeta_k = \emptyset$ or $D_k = \emptyset$, then (2) or (3) is skipped, accordingly.

$^{2}$ The vector model also allows binary categories to be defined, as in "nearby" versus "not nearby" and "next to" versus "not next to." This offers an alternative to lumping "nearby" and "next to" into exclusive realizations of the same random variable so that different likelihoods for similar labels are obtained as a function of $X_k$. However, the interpreter must then ensure that contradictory realizations within $D_k$ (i.e., where elements have joint likelihood of zero) are either avoided or handled via Bayesian conflict resolution [32].

$^{3}$ As shown in Section V, the techniques that are developed here for $D_k$ can be applied to categorical $\zeta_k$ data as well.

While many functions satisfy this criterion, this study exclusively considers likelihoods that are defined via the softmax function

$$P(D_k = j \mid X_k) = \frac{e^{w_j^T x + b_j}}{\sum_{h=1}^{m} e^{w_h^T x + b_h}} \tag{7}$$

where $w_j, w_h \in \mathbb{R}^n$ and $b_j, b_h \in \mathbb{R}^1$ are, respectively, vector weights and scalar biases for classes $j, h \in \{1, \ldots, m\}$. The softmax function (also known as the multinomial logistic function) is widely used in statistical pattern recognition [34] and is naturally well suited to modeling hybrid continuous-to-discrete mappings in complex stochastic systems with state-dependent switching behavior [33], [35]. An interesting feature of (7) is that the log-odds ratio between any categories $j$ and $c$ for a given $X_k = x$ yields a linear hyperplane

$$\log \frac{P(D_k = j \mid X_k)}{P(D_k = c \mid X_k)} = (w_j - w_c)^T x + (b_j - b_c) \tag{8}$$

which implies that the probabilistic boundaries between categories for a given likelihood ratio are also linear and completely specified by the parameter sets $W = \{w_1, \ldots, w_m\}$ and $B = \{b_1, \ldots, b_m\}$. Note that the elements of $W$ control the steepness of the probability surface between categories and the locations of the class boundaries, while the elements of $B$ enable shifts from the origin. The authors of [35] prove that boundaries defined via (8) always lead to a complete convex decomposition of $\mathbb{R}^n$ so that $X_k$ can always be fully partitioned among the $m$ classes of $D_k$.
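As a quick numerical illustration of (7) and (8), the sketch below evaluates a softmax likelihood and confirms that the log-odds between two classes equal the linear expression in (8). The weights, biases, and test point are made-up values, and the function name is hypothetical.

```python
import numpy as np

def softmax_likelihood(x, W, b):
    """Eq. (7): P(D_k = j | X_k = x) for all m classes; W is (m, n), b is (m,)."""
    logits = W @ x + b
    logits = logits - logits.max()        # shift for numerical stability
    e = np.exp(logits)
    return e / e.sum()

# Hypothetical 2-D example with m = 3 classes (weights are made-up values).
W = np.array([[0.0, 0.0], [1.5, 0.0], [0.0, 2.0]])
b = np.array([0.0, -1.0, -2.0])
x = np.array([0.7, -0.3])

P = softmax_likelihood(x, W, b)
j, c = 1, 2
log_odds = np.log(P[j] / P[c])                  # left-hand side of eq. (8)
linear = (W[j] - W[c]) @ x + (b[j] - b[c])      # right-hand side of eq. (8)
```

The denominator of (7) cancels in the ratio, which is why `log_odds` and `linear` agree to floating-point precision for any `x`.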

Fig. 1(a) shows one possible softmax likelihood model for a human providing one of 16 soft location labels (in terms of categorical ranges and bearings) to indicate the 2-D position $X_k = [X, Y]^T$ of an object relative to some arbitrary origin. This example shows how the model in (8) represents categorical ambiguities as a function of $X_k$; softer weights lead to fuzzier probability contours between class labels (in range directions, for this example), while steeper weights lead to nearly deterministic probabilities over geometrically convex regions defining classes (across bearing directions). $W$ and $B$ can be learned from labeled training data using convex optimization procedures that are based on maximum likelihood or maximum a posteriori estimation [34].
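To make the learning step concrete, the sketch below fits softmax weights and biases to labeled data by plain gradient ascent on a penalized log-likelihood (equivalent to maximum a posteriori estimation with a Gaussian prior on the weights). This is only one simple way to solve the convex problem and is not claimed to be the procedure used in the paper; the synthetic data, thresholds, step size, and penalty are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data (an assumption for illustration): a 1-D state x and
# a label in {0, 1, 2} assigned by fixed thresholds at -1 and +1.
x = rng.uniform(-3.0, 3.0, size=(300, 1))
labels = np.digitize(x[:, 0], [-1.0, 1.0])
Y = np.eye(3)[labels]                             # one-hot targets, shape (N, m)

def class_probs(x, W, b):
    logits = x @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def mean_log_lik(x, Y, W, b):
    return np.mean(np.sum(Y * np.log(class_probs(x, W, b) + 1e-300), axis=1))

W = np.zeros((3, 1))
b = np.zeros(3)
ll_init = mean_log_lik(x, Y, W, b)

lr, reg = 0.1, 1e-3          # step size and Gaussian-prior (L2) strength
for _ in range(2000):
    err = Y - class_probs(x, W, b)                # gradient w.r.t. the logits
    W += lr * (err.T @ x / len(x) - reg * W)
    b += lr * err.mean(axis=0)

ll_final = mean_log_lik(x, Y, W, b)
```

With all parameters initialized at zero, every class has probability 1/3, so the starting mean log-likelihood is -log 3; the fitted model should score higher. In practice a second-order or quasi-Newton solver would usually converge faster.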

Equation (7) can be generalized by introducing hidden variables to induce nonconvex/multimodal categorical partitions of $X_k$. One such generalization is the multimodal softmax (MMS) model [36], which represents each observable class $j \in \{1, \ldots, m\}$ as a collection of $s_j$ hidden subclasses dependent on $X_k$ that are mutually exclusive and exhaustive, where $s_j \geq 1$, and $\sum_{j=1}^{m} s_j = S$ is the total number of subclasses. Let $R$ represent the hidden subclass variable, which can take values $r \in \{1, \ldots, S\}$$^{4}$, and define $D_k$ to be conditionally independent of $X_k$ given $R$ so that $P(D_k = j, R = r \mid X_k) = P(D_k = j \mid R = r)\, P(R = r \mid X_k)$. Furthermore, define $\sigma(j)$ to be the set of all $s_j$ subclasses of class $j$, where $\sigma(j) \cap \sigma(c) = \emptyset$ for $j \neq c$. If $P(D_k = j \mid R = r) = I(r \in \sigma(j))$ (the indicator

$^{4}$ Assume without loss of generality that the subclasses are indexed sequentially in class order.




Fig. 1. (a) Probability surfaces for example softmax likelihood model, where class labels take on a discrete range in {"Next To," "Nearby," "Far From"} and/or a canonical bearing {"N," "NE," "E," "SE," ..., "NW"}. (b) Probability surfaces for example MMS range-only model, where labels with similar range categories from (a) are treated as subclasses that define one geometrically convex class ("Next To" with $s_1 = 1$) and two nonconvex ones ("Nearby" with $s_2 = 6$ and "Far From" with $s_3 = 8$). (c) and (d) Calibration training data and learned MMS range-only probabilities for $D_k$ = "Nearby" for two different human subjects.

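function) and $P(R = r \mid X_k)$ is defined via the softmax model, then marginalization of $R$ from $P(D_k, R \mid X_k)$ gives

$$P(D_k = j \mid X_k) = \sum_{r=1}^{S} P(D_k = j \mid r)\, P(r \mid X_k) = \frac{\sum_{r \in \sigma(j)} e^{w_r^T x + b_r}}{\sum_{c=1}^{S} e^{w_c^T x + b_c}}. \tag{9}$$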

Hence, the MMS likelihood for $D_k = j$ given $X_k$ is the sum of all $s_j$ subclass softmax likelihoods that are associated with class $j$. Given an appropriate subclass configuration $[s_1, \ldots, s_m]$, (9) can model an arbitrary continuous-to-discrete likelihood function using an embedded softmax model to produce piecewise linear class boundaries. Fig. 1(b) shows a simple example of an MMS model that is derived from the basic softmax model in Fig. 1(a). In this example, the MMS subclass weights are directly obtained from the model in Fig. 1(a), as any basic softmax model can be trivially converted to an MMS model. However, it is also generally possible to estimate MMS model parameters directly from training data using maximum likelihood or Bayesian learning techniques, when a basic softmax model is unavailable [36]. Fig. 1(c) and (d) shows estimated MMS range-only models for two different human sensors using maximum likelihood learning with actual data. The labeled $(X_k, D_k)$ training data points shown in these plots were acquired through an experimental calibration procedure that requires human subjects to provide $D_k$ observations under controlled conditions, where $X_k$ is known exactly. This principled statistical procedure is very similar to the one described by Kaupp [10] to model continuous range-with-bearing human observations, except that discrete multicategorical data are recorded instead of continuous data, and nonlinear optimization techniques are used for offline softmax/MMS model identification instead of linear regression.
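The marginalization in (9) amounts to summing subclass softmax probabilities within each observable class. The sketch below shows this for a hypothetical 1-D range-like model in which one class is the nonconvex union of two subclasses; the weights, the class labels, and the `subclass_to_class` index array are assumptions made for illustration.

```python
import numpy as np

def mms_likelihood(x, W, b, subclass_to_class, m):
    """Eq. (9): sum subclass softmax probabilities within each observable class."""
    logits = W @ x + b
    e = np.exp(logits - logits.max())
    p_sub = e / e.sum()                            # P(R = r | X_k = x)
    p_class = np.zeros(m)
    np.add.at(p_class, subclass_to_class, p_sub)   # marginalize the hidden subclass
    return p_class

# Hypothetical 1-D model: class 0 ("nearby") is one central subclass, and
# class 1 ("far") is the nonconvex union of a left and a right subclass.
W = np.array([[0.0], [-2.0], [2.0]])
b = np.array([0.0, -4.0, -4.0])
subclass_to_class = np.array([0, 1, 1])

p_center = mms_likelihood(np.array([0.0]), W, b, subclass_to_class, m=2)
p_left = mms_likelihood(np.array([-5.0]), W, b, subclass_to_class, m=2)
p_right = mms_likelihood(np.array([5.0]), W, b, subclass_to_class, m=2)
```

Near the origin the "nearby" class dominates, while both far-left and far-right states map to the same "far" label, which a basic softmax model with a single convex region per class cannot represent.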

C. Hybrid Bayesian Inference for Soft Data Fusion

Although softmax-based functions are well suited to modeling $P(D_k \mid X_k)$, they unfortunately do not lead to closed-form posteriors $p(X_k \mid D_k)$ for any choice of $p(X_k)$. For instance, substituting (7) into (6) for any $p(X_k)$ yields

$$p(X_k \mid D_k) = \frac{1}{C}\, p(X_k)\, \frac{\exp(w_j^T x + b_j)}{\sum_{h=1}^{m} \exp(w_h^T x + b_h)} \tag{10}$$

$$\text{where} \quad C = \int p(X_k)\, \frac{\exp(w_j^T x + b_j)}{\sum_{h=1}^{m} \exp(w_h^T x + b_h)}\, dX_k. \tag{11}$$

Equation (10) cannot be represented in closed form since the integral for the normalization constant $C$ has no analytical solution for any $p(X_k)$. Furthermore, even when $p(X_k)$ is a well-behaved pdf such as a uniform or Gaussian pdf, the softmax denominator in (10) cannot be absorbed along with the numerator and prior into a known parametric pdf family. Therefore, (6) must be approximated, as in all hybrid Bayesian inference problems that involve continuous-to-discrete dependences [33].

Although standard EKF/UKF updates are not applicable, grid-based [2], [13] or Monte Carlo particle approximations [13], [23], [24] of (6) could be used. Grids naturally support recursive Bayesian fusion with arbitrary priors and likelihoods, although they scale poorly with state dimension $n$, do not provide a compact posterior representation, and do not mesh easily with typical filters for $\zeta_k$ data (e.g., EKFs/UKFs). Particle filter approximations overcome the latter problem, but do not provide a compact approximation if many samples are needed. In principle, particles could be compressed into a single Gaussian pdf for filtering [25], although this leads to significant information loss when (6) is highly non-Gaussian or multimodal. While particles could also be compressed to flexible GM pdfs via online EM learning [25], [26], this is prone to poor local maxima and high computational expense. Particle approximations also require special care to ensure accuracy and mitigate undesirable phenomena such as sample degeneracy. For instance, the performance of the standard bootstrap particle filter (BPF) [24] can degrade significantly if $n$ is large or the observation likelihood is small, e.g., for a surprising observation [33].
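The degeneracy issue for surprising observations can be seen in a few lines. The sketch below performs only the weighting step of a bootstrap-style update (samples drawn from the prior, weights given by the categorical likelihood) and reports the effective sample size. The binary softmax model, the standard normal prior, and the helper name `ess_from_prior` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def ess_from_prior(j, W, b, mu=0.0, sigma=1.0, n_samples=5000):
    """Weight prior samples by P(D_k = j | x) and return the effective sample size."""
    xs = rng.normal(mu, sigma, size=n_samples)
    logits = np.outer(xs, W) + b
    logits -= logits.max(axis=1, keepdims=True)
    lik = np.exp(logits)
    lik /= lik.sum(axis=1, keepdims=True)
    w = lik[:, j] / lik[:, j].sum()               # normalized importance weights
    return 1.0 / np.sum(w ** 2)

W = np.array([0.0, 3.0])       # class 1 is likely only for large positive x
b = np.array([0.0, -12.0])
ess_expected = ess_from_prior(0, W, b)   # observation consistent with the prior
ess_surprise = ess_from_prior(1, W, b)   # observation supported only in the tail
```

When the observed class is plausible under the prior, nearly all samples carry weight; when it is supported only in the prior's tail, a handful of samples dominate and the effective sample size collapses.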

Another possible approach to hybrid Bayesian inference comes from variational Bayes (VB) methods, which attempt to maximize the similarity between analytically intractable posteriors and well-behaved posterior approximation pdfs that are defined through freely optimizable parameters [34]. Murphy proposed a local VB lower bound approximation to (6) for the special case of Gaussian priors and $m = 2$ binary logistic likelihood functions [27]; this approximation uses the fact that the posterior pdf is well approximated by a Gaussian pdf and admits a lower bound to $C$ through a convex lower bound to the logistic likelihood function proposed in [37]. While this VB approach leads to a scalable, deterministic, and accurate Gaussian approximation of the true posterior with guarantees on $C$, it is limited to $P(D_k \mid X_k)$ for the special case of $m = 2$. Furthermore, the VB posterior leads to an optimistic posterior covariance estimate, which is highly undesirable in recursive state estimation [14]. Bouchard [28] proposes to generalize the VB method to softmax likelihoods with $m \geq 2$, but only considers the dual problem to infer $(W, B)$ from $(X_k, D_k)$ training data for simple softmax models and thus does not generalize Murphy's method to approximate (6), present solutions to the persistent optimistic covariance issue, or consider MMS models for multimodal posteriors.

These issues are tackled next through new hybrid inference approximations that not only generalize VB approximations to $m \geq 2$ softmax likelihood functions, but also address the optimistic VB posterior covariance issue via novel application of fast Monte Carlo importance sampling (IS), generalize to inference with multimodal posterior distributions (induced by non-Gaussian priors and MMS likelihoods), and guarantee convergence to unique solutions (i.e., no poor local minima). These approximations naturally lead to a fusion framework based on compact GM pdf approximations, which are especially desirable for human–robot fusion applications since they 1) lead to computational costs that scale well with $n$ and number of categories $m$ and 2) greatly facilitate online storage, communication, and fusion with hard $\zeta_k$ data.

III. BASELINE FUSION: GAUSSIAN-SOFTMAX INFERENCE

A. Baseline Variational Bayes Approximation

Assume a Gaussian prior $p(X_k) = \mathcal{N}(\mu, \Sigma)$ with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, and let $P(D_k \mid X_k)$ be given by (7) for $m \geq 2$. The local VB approximation derived here uses the fact that the analytically intractable joint pdf $p(X_k, D_k)$ in (6) can be well approximated by an unnormalized Gaussian lower bound pdf; this in turn leads to a VB Gaussian posterior approximation $\hat{p}(X_k \mid D_k)$ upon renormalization that also guarantees a lower bound to $C$. This proposed VB inference approach generalizes the method [27] derived for the special case of $m = 2$.

Let $f(D_k, X_k)$ be an unnormalized Gaussian function that approximates the softmax likelihood $P(D_k \mid X_k)$. The joint pdf and normalization constant (11) are approximated as

$$p(X_k, D_k) \approx \hat{p}(X_k, D_k) = p(X_k)\, f(D_k, X_k) \tag{12}$$

$$C \approx \hat{C} = \int \hat{p}(X_k, D_k)\, dX_k. \tag{13}$$

Note that $\hat{p}(X_k, D_k)$ is an unnormalized Gaussian, since it is the product of two Gaussians. This permits $\hat{C}$ to be evaluated in closed form as an approximation to the marginal likelihood of the discrete observation, $C = P(D_k = j)$, in (6).

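For $m \geq 2$, $f(D_k, X_k)$ is derived here via the upper bound to the problematic softmax denominator in (7) proposed in [28], which uses a variational product of $m$ unnormalized Gaussians. Specifically, for any set of scalars $\alpha$, $\xi_c$, and $y_c$ for $c \in \{1, \ldots, m\}$, [28] proves that

$$\log\left(\sum_{c=1}^{m} e^{y_c}\right) \leq \alpha + \sum_{c=1}^{m} \left\{ \frac{y_c - \alpha - \xi_c}{2} + \lambda(\xi_c)\left[(y_c - \alpha)^2 - \xi_c^2\right] + \log\left(1 + e^{\xi_c}\right) \right\} \tag{14}$$

where $\lambda(\xi_c) = \frac{1}{2\xi_c}\left[\frac{1}{1 + e^{-\xi_c}} - \frac{1}{2}\right]$, and $y_c = w_c^T x + b_c$.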
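The bound in (14) holds for any choice of $\alpha$ and $\xi_c$, which can be checked numerically. The sketch below draws random logits and random variational parameters and records the smallest gap between the right-hand side of (14) and the exact log-sum-exp; the sampling ranges and function names are assumptions for illustration.

```python
import numpy as np

def lam(xi):
    """lambda(xi) from eq. (14); xi must be nonzero."""
    return (1.0 / (2.0 * xi)) * (1.0 / (1.0 + np.exp(-xi)) - 0.5)

def log_sum_exp(y):
    ymax = y.max()
    return ymax + np.log(np.sum(np.exp(y - ymax)))

def upper_bound(y, alpha, xi):
    """Right-hand side of eq. (14) for logits y, scalar alpha, and vector xi."""
    terms = (0.5 * (y - alpha - xi)
             + lam(xi) * ((y - alpha) ** 2 - xi ** 2)
             + np.log1p(np.exp(xi)))
    return alpha + terms.sum()

rng = np.random.default_rng(2)
gaps = []
for _ in range(1000):
    y = rng.normal(scale=3.0, size=4)           # arbitrary logits, m = 4
    alpha = rng.normal(scale=2.0)               # arbitrary variational parameters
    xi = rng.uniform(0.1, 5.0, size=4)
    gaps.append(upper_bound(y, alpha, xi) - log_sum_exp(y))
min_gap = min(gaps)
```

The gap is never negative; it only becomes small when $\alpha$ and $\xi_c$ are tuned to the particular logits, which is what the variational parameter optimization aims for.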

The variables $\alpha$ and $\xi_c$ are free variational parameters; given $y_c$, $\alpha$ and $\xi_c$ are selected to minimize the upper bound in (14), thus providing the tightest possible upper bounding approximation to the denominator of (7). Assume for now that $\alpha$ and $\xi_c$ are known (the selection of $\alpha$ and $\xi_c$ is considered in the next section). From (7), it follows that

$$\log P(D_k = j \mid X_k) = w_j^T x + b_j - \log\left(\sum_{c=1}^{m} e^{w_c^T x + b_c}\right).$$

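After replacing the second term on the right-hand side with the bound in (14), subsequent simplification gives

$$f(D_k = j, X_k) = \exp\left\{ g_j + h_j^T x - \tfrac{1}{2} x^T K_j x \right\}$$

where

$$\begin{aligned}
g_j &= \frac{1}{2}\Big[b_j - \sum_{c \neq j} b_c\Big] + \alpha\Big(\frac{m}{2} - 1\Big) + \sum_{c=1}^{m} \left\{ \frac{\xi_c}{2} + \lambda(\xi_c)\left[\xi_c^2 - (b_c - \alpha)^2\right] - \log\left(1 + e^{\xi_c}\right) \right\} \\
h_j &= \frac{1}{2}\Big[w_j - \sum_{c \neq j} w_c\Big] + 2\sum_{c=1}^{m} \lambda(\xi_c)(\alpha - b_c)\, w_c \\
K_j &= 2\sum_{c=1}^{m} \lambda(\xi_c)\, w_c w_c^T
\end{aligned} \tag{15}$$

and where $f(D_k, X_k) \leq P(D_k \mid X_k)$ follows from (14). Since the prior can also be expressed as

$$p(X_k) = \exp\left\{ g_p + h_p^T x - \tfrac{1}{2} x^T K_p x \right\}$$

where

$$g_p = -\frac{1}{2}\left(\log|2\pi\Sigma| + \mu^T K_p \mu\right), \quad h_p = K_p \mu, \quad K_p = \Sigma^{-1} \tag{16}$$

substitution of (16) and (15) into (12) gives the unnormalized Gaussian joint pdf approximation

$$\hat{p}(X_k, D_k) = \exp\left\{ g_l + h_l^T x - \tfrac{1}{2} x^T K_l x \right\}$$

$$g_l = g_p + g_j, \quad h_l = h_p + h_j, \quad K_l = K_p + K_j. \tag{17}$$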




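Normalization of (17) gives the desired variational Gaussian posterior pdf approximation for $m \geq 2$

$$\hat{p}(X_k \mid D_k) = \mathcal{N}(\mu_{VB}, \Sigma_{VB}) \tag{18}$$

$$\text{where} \quad \Sigma_{VB} = K_l^{-1}, \quad \mu_{VB} = K_l^{-1} h_l. \tag{19}$$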
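As a worked step, the normalization behind (18) and (19) can be made explicit by completing the square in the exponent of (17). This is the standard Gaussian integral identity, written here with the symbols of (17) and (19):

$$g_l + h_l^T x - \tfrac{1}{2} x^T K_l x = g_l + \tfrac{1}{2} h_l^T K_l^{-1} h_l - \tfrac{1}{2}\left(x - K_l^{-1} h_l\right)^T K_l \left(x - K_l^{-1} h_l\right)$$

so that integrating the exponential of both sides over $x$ gives the approximate normalization constant of (13) in closed form:

$$\log \hat{C} = g_l + \tfrac{1}{2} h_l^T \Sigma_{VB}\, h_l + \tfrac{1}{2} \log\left|2\pi \Sigma_{VB}\right|.$$

Dividing (17) by this constant leaves exactly the Gaussian density with mean $K_l^{-1} h_l$ and covariance $K_l^{-1}$.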

The approximate posterior mean and covariance updates in (19) for discrete measurements bear close resemblance to the corresponding continuous measurement updates for the Kalman information filter [14]. With this resemblance in mind, an examination of $K_j$ and $h_j$ suggests that the softmax weight vectors $w_j$ determine the average information content about $X_k$ contained in each category $j$. This is intuitively reasonable. As shown in Fig. 1, large magnitude weights indicate sharp log-odds boundaries between classes in (8) (i.e., less ambiguity and greater separability between discrete classes as a function of $X_k$), which leads to more informative updates for $X_k$ since $p(X_k)$ is "squashed" more strongly by $P(D_k \mid X_k)$ via (6). Note that $\Sigma_{VB}$ is also independent of the actual discrete observation $D_k = j$, just as covariance/information matrix updates for the Kalman filter are independent of observed continuous measurements.
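The information-form structure of (15)-(19) can be sketched directly. The code below builds $g_j$, $h_j$, and $K_j$ for fixed (not optimized) variational parameters, checks numerically that the resulting unnormalized Gaussian stays below the softmax likelihood, and then performs the additive update of (17) and (19). The softmax model, prior, parameter values, and function names are made-up assumptions.

```python
import numpy as np

def lam(xi):
    return (1.0 / (2.0 * xi)) * (1.0 / (1.0 + np.exp(-xi)) - 0.5)

def softmax_bound_terms(j, W, b, alpha, xi):
    """g_j, h_j, K_j of eq. (15) for class j, given alpha and xi (shape (m,))."""
    m = len(b)
    lam_xi = lam(xi)
    not_j = np.arange(m) != j
    g = (0.5 * (b[j] - b[not_j].sum()) + alpha * (m / 2.0 - 1.0)
         + np.sum(xi / 2.0 + lam_xi * (xi ** 2 - (b - alpha) ** 2)
                  - np.log1p(np.exp(xi))))
    h = 0.5 * (W[j] - W[not_j].sum(axis=0)) + 2.0 * (lam_xi * (alpha - b)) @ W
    K = 2.0 * (W.T * lam_xi) @ W
    return g, h, K

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 2))                     # made-up softmax model, m = 3, n = 2
b = rng.normal(size=3)
alpha, xi = 0.3, np.array([1.0, 2.0, 0.5])      # arbitrary fixed values
j = 1
g, h, K = softmax_bound_terms(j, W, b, alpha, xi)

worst = np.inf
for _ in range(500):
    x = rng.normal(scale=2.0, size=2)
    log_f = g + h @ x - 0.5 * x @ K @ x
    y = W @ x + b
    log_P = y[j] - (y.max() + np.log(np.sum(np.exp(y - y.max()))))
    worst = min(worst, log_P - log_f)           # should never be negative

# Eqs. (16)-(19): fuse with a Gaussian prior N(mu, Sigma) in information form.
mu, Sigma = np.zeros(2), np.eye(2)
K_p = np.linalg.inv(Sigma)
Sigma_vb = np.linalg.inv(K_p + K)
mu_vb = Sigma_vb @ (K_p @ mu + h)
```

Because $K_j$ is positive semidefinite, the update can only add information, so the approximate posterior covariance is never larger than the prior covariance.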

Variational Parameter Optimization: Analytical minimization of the right-hand side of (14) with respect to the free variational parameters $\alpha$ and $\xi_c$ gives

$$\xi_c^2 = y_c^2 + \alpha^2 - 2\alpha y_c \tag{20}$$

$$\alpha = \frac{\frac{m-2}{4} + \sum_{c=1}^{m} \lambda(\xi_c)\, y_c}{\sum_{c=1}^{m} \lambda(\xi_c)}. \tag{21}$$

However, these formulas cannot be used to compute (18) directly since $y_c$ depends on $X_k$, which is unobserved. Therefore, following the same strategy as [27] for the $m = 2$ case, the variational parameters are chosen to minimize the expected value of (14) with respect to the posterior. This is equivalent to maximizing the approximate marginal log-likelihood of the observation $D_k = j$

$$\log \hat{C} = \log \int \hat{p}(X_k, D_k)\, dX_k \tag{22}$$

where $\log \hat{C} \leq \log C$. Equation (22) can be expressed in closed form via standard Gaussian identities, but direct maximization of (22) with respect to $\alpha$ and $\xi_c$ involves cumbersome calculation of highly nonlinear gradient and Hessian terms. The expectation–maximization (EM) algorithm [34] can instead be invoked to iteratively optimize $\alpha$ and $\xi_c$ via expected values of (20) and (21), while alternately updating $\hat{p}(X_k \mid D_k)$ via simple closed-form expressions. The EM procedure is given in Algorithm 1, where the $y_c$ terms in (20) and (21) are replaced by their expected values under the current $\hat{p}(X_k \mid D_k)$ estimate at each E step

$$\langle y_c \rangle = w_c^T \mu_{VB} + b_c \tag{23}$$

$$\langle y_c^2 \rangle = w_c^T \left(\Sigma_{VB} + \mu_{VB}\mu_{VB}^T\right) w_c + 2 w_c^T \mu_{VB}\, b_c + b_c^2. \tag{24}$$

Since (20) and (21) are coupled, an extra iterative resubstitution loop is needed for convergence of $\xi_c$ and $\alpha$ ($n_{lc} = 15$ iterations were sufficient for this paper's studies).
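The alternating updates described above can be sketched as follows. This is a hedged reading of the procedure built only from (15)-(24), not a reproduction of the paper's Algorithm 1: the initialization at the prior, the stopping rule on the mean, the small floor inside the square root, and the function name `vb_softmax_update` are all assumptions. A brute-force grid posterior is computed for a 1-D binary example as a reference.

```python
import numpy as np

def lam(xi):
    return (1.0 / (2.0 * xi)) * (1.0 / (1.0 + np.exp(-xi)) - 0.5)

def vb_softmax_update(mu, Sigma, W, b, j, n_em=100, n_lc=15, tol=1e-8):
    """Gaussian approximation N(mu_vb, Sigma_vb) to the posterior for D_k = j."""
    m = len(b)
    K_p = np.linalg.inv(Sigma)
    h_p = K_p @ mu
    not_j = np.arange(m) != j
    mu_vb, Sigma_vb = mu.copy(), Sigma.copy()     # assumed initialization: the prior
    alpha, xi = 0.0, np.ones(m)
    for _ in range(n_em):
        # Expected logits under the current posterior estimate, eqs. (23)-(24).
        y1 = W @ mu_vb + b
        second = Sigma_vb + np.outer(mu_vb, mu_vb)
        y2 = np.einsum("ci,ij,cj->c", W, second, W) + 2.0 * (W @ mu_vb) * b + b ** 2
        # Resubstitution loop for the coupled updates (20) and (21).
        for _ in range(n_lc):
            xi = np.sqrt(np.maximum(y2 + alpha ** 2 - 2.0 * alpha * y1, 1e-12))
            lam_xi = lam(xi)
            alpha = ((m - 2.0) / 4.0 + lam_xi @ y1) / lam_xi.sum()
        # Refresh the Gaussian posterior in information form, eqs. (15)-(19).
        K_j = 2.0 * (W.T * lam_xi) @ W
        h_j = 0.5 * (W[j] - W[not_j].sum(axis=0)) + 2.0 * (lam_xi * (alpha - b)) @ W
        Sigma_vb = np.linalg.inv(K_p + K_j)
        mu_new = Sigma_vb @ (h_p + h_j)
        converged = np.linalg.norm(mu_new - mu_vb) < tol
        mu_vb = mu_new
        if converged:
            break
    return mu_vb, Sigma_vb

# Hypothetical 1-D check: standard normal prior, binary softmax, observe class 1.
W = np.array([[0.0], [2.0]])
b = np.array([0.0, 0.0])
mu_vb, Sigma_vb = vb_softmax_update(np.zeros(1), np.eye(1), W, b, j=1)

xg = np.linspace(-8.0, 8.0, 4001)                 # brute-force reference on a grid
post = np.exp(-0.5 * xg ** 2) / (1.0 + np.exp(-2.0 * xg))
post /= post.sum()
mu_true = (xg * post).sum()
var_true = ((xg - mu_true) ** 2 * post).sum()
```

In this toy case the VB mean lands near the grid-based posterior mean, while the VB variance comes out smaller than the grid-based variance, consistent with the optimistic covariance behavior discussed in the text.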

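It is straightforward to show that $p(X_k \mid D_k)$ is log-concave, which means that the exact baseline Gaussian-softmax posterior is unimodal. Hence, Algorithm 1 satisfies the necessary and sufficient condition derived in [38] to guarantee monotonic convergence to a unique set of variational parameters for the local VB lower bound Gaussian approximation. Convergence can be gauged by evaluating the change in (22) after each M step, where

$$\begin{aligned}
\log \hat{C} = {} & \langle y_j \rangle - \alpha + \sum_{c=1}^{m} \left\{ \frac{1}{2}\left(\alpha + \xi_c - \langle y_c \rangle\right) - \lambda(\xi_c)\left[\langle y_c^2 \rangle - 2\alpha \langle y_c \rangle + \alpha^2 - \xi_c^2\right] - \log\left(1 + e^{\xi_c}\right) \right\} \\
& + \frac{n}{2} - \frac{1}{2}\left[ \log\frac{|\Sigma|}{|\Sigma_{VB}|} + \mathrm{tr}\left(\Sigma^{-1}\Sigma_{VB}\right) + (\mu - \mu_{VB})^T \Sigma^{-1} (\mu - \mu_{VB}) \right]
\end{aligned} \tag{25}$$

and most of the required terms are already used in the E and M steps. However, it is often more convenient to monitor convergence of $\mu_{VB}$ between iterations so that the lower bound (25) can be evaluated at the end, if desired.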
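The lower-bound property of (22) can be checked numerically in one dimension. The sketch below evaluates the closed-form Gaussian integral of (17) for many arbitrary choices of $\alpha$ and $\xi_c$ and compares it against a brute-force value of $\log C$ from (11). The scalar softmax model, the prior, the grid, and the helper name `log_c_hat` are assumptions for illustration.

```python
import numpy as np

def lam(xi):
    return (1.0 / (2.0 * xi)) * (1.0 / (1.0 + np.exp(-xi)) - 0.5)

def log_c_hat(mu, var, w, b, j, alpha, xi):
    """Closed-form log of the Gaussian integral in (22) for a scalar state."""
    m = len(b)
    lam_xi = lam(xi)
    not_j = np.arange(m) != j
    g_j = (0.5 * (b[j] - b[not_j].sum()) + alpha * (m / 2.0 - 1.0)
           + np.sum(xi / 2.0 + lam_xi * (xi ** 2 - (b - alpha) ** 2)
                    - np.log1p(np.exp(xi))))
    h_j = 0.5 * (w[j] - w[not_j].sum()) + 2.0 * np.sum(lam_xi * (alpha - b) * w)
    k_j = 2.0 * np.sum(lam_xi * w ** 2)
    k_p = 1.0 / var
    g_p = -0.5 * (np.log(2.0 * np.pi * var) + mu ** 2 * k_p)
    g_l, h_l, k_l = g_p + g_j, k_p * mu + h_j, k_p + k_j
    return g_l + 0.5 * h_l ** 2 / k_l + 0.5 * np.log(2.0 * np.pi / k_l)

w = np.array([0.0, 2.0, -1.0])                     # made-up scalar softmax model
b = np.array([0.0, -1.0, 0.5])
mu, var, j = 0.5, 2.0, 1

x = np.linspace(-15.0, 15.0, 20001)                # numerical reference for log C
logits = np.outer(x, w) + b
lik = np.exp(logits[:, j] - np.logaddexp.reduce(logits, axis=1))
prior = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
log_c = np.log(np.sum(prior * lik) * (x[1] - x[0]))

rng = np.random.default_rng(4)
gaps = [log_c - log_c_hat(mu, var, w, b, j, rng.normal(), rng.uniform(0.1, 4.0, 3))
        for _ in range(200)]
```

Every gap is nonnegative up to integration error, so maximizing the closed-form value over $\alpha$ and $\xi_c$ tightens the bound without ever overshooting $\log C$.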

B. Improved VB Approximation With Importance Sampling

Fig. 2(a) and (b) shows that (17) is generally a close lower bound approximation of the true joint pdf; as shown in Fig. 2(c) and (d), a key benefit of the VB approximation is that $\mu_{VB}$ closely approximates the true mean of (6), $\mu_{\text{true post}}$, upon renormalization. Loosely speaking, this effect stems from the fact that Algorithm 1 returns $\xi_c$ and $\alpha$ values that maximize the softmax lower bound (15) on average; since $\xi_c$ and $\alpha$ can be uniquely determined from $x_k$ via (20) and (21), $\xi_c$ and $\alpha$ tend to lie near the posterior average (i.e., mean) of $X_k$.$^{5}$ However, Fig. 2(c) and (d) also shows that since $C \geq \hat{C}$, the approximate posterior (18) obtained from dividing (17) by $\hat{C}$ no longer lower bounds the true posterior $p(X_k \mid D_k)$. In fact, even if $p(X_k, D_k)$ and $\hat{p}(X_k, D_k)$ are quite similar, multiplication of (17) by $\hat{C}^{-1} \geq C^{-1}$ forces

$^5$ In [38], a more technically precise explanation of this effect is given for the special case of the binary logistic VB lower bound.




Fig. 2. 1-D Bayesian update example for standard normal Gaussian prior (green) and binary softmax likelihood (blue), showing true posterior (magenta), VB softmax lower bound (black dash), and approximate joint pdf (red dash) for (a) small ("soft") softmax weights and (b) large ("steep") softmax weights. Renormalized posteriors are shown in (c) and (d), respectively, with corresponding $C$ and $\hat{C}$ values.

(18) to be more concentrated around its peak than (6), and therefore, $\Sigma_{VB}$ is optimistic relative to the true posterior covariance $\Sigma_{\text{true post}}$.$^6$ The goodness of $\mu_{VB}$ can be outweighed by optimism in $\Sigma_{VB}$, since this can lead to severe overconfidence and inconsistencies during recursive Bayesian fusion. The bound $\hat{C} \le C$ also produces a small bias in $\mu_{VB}$ relative to $\mu_{\text{true post}}$. However, the unimodality of $p(X_k|D_k)$ and the fact that $\mu_{VB}$ is close to $\mu_{\text{true post}}$ can be exploited by another fast estimation procedure to significantly improve $\mu_{VB}$ and $\Sigma_{VB}$ in (18). Monte Carlo IS [39] is particularly well suited to this end, since arbitrary moments of (6) can be quickly estimated using an importance distribution $q(X_k)$ that roughly corresponds to (6). Specifically, given $N_s$ samples $\{x_i\}_{i=1}^{N_s}$ of $X_k$ drawn from $q(X_k)$, IS approximates the expectation of an arbitrary function $z(X_k)$ with respect to $p(X_k|D_k)$ as

$$\langle z(X_k) \rangle \approx \sum_{i=1}^{N_s} \omega_i\, z(x_i), \qquad \omega_i \propto \frac{p(x_i)\, P(D_k|x_i)}{q(x_i)} \tag{26}$$

where $\omega_i$ is the importance weight for sample $i$, and the desired estimates correspond to $\hat{\mu} = \langle X_k \rangle$ and $\hat{\Sigma} = \langle (X_k - \hat{\mu})(X_k - \hat{\mu})^T \rangle$. Note that (26) uses the fact that $p(X_k|D_k)$ only needs to be known up to a normalizing constant so that the joint pdf $p(x_i, D_k) = p(x_i) P(D_k|x_i)$ can be used to compute $\omega_i$ (as is standard practice, $\omega_i$ are renormalized to sum up to 1 [39]). Although $q(X_k)$ can in theory be any pdf that is easy to sample from and ensures proper support coverage of $p(X_k|D_k)$ (i.e., $p(X_k|D_k) > 0 \Rightarrow q(X_k) > 0$), IS is only reliable when $q(X_k)$ is sufficiently close to $p(X_k|D_k)$. Since

$^6$ That is, $(\Sigma_{\text{true post}} - \Sigma_{VB})$ will be positive semidefinite.

the true posterior (6) is unimodal and has a mean close to $\mu_{VB}$, it is natural to specify $q(X_k)$ as a unimodal pdf whose mean is parameterized by $\mu_{VB}$. The prior covariance $\Sigma$ can also be used to constrain the size/shape of $q(X_k)$ to ensure adequate coverage of $p(X_k|D_k)$. This is justified since conditioning on $D_k$ reduces the uncertainty in the (unimodal) posterior relative to the (unimodal) prior such that $(\Sigma - \Sigma_{\text{true post}})$ is expected to be positive definite (as in the conventional KF) [14].

These considerations lead to the VBIS algorithm, proposed here to draw upon the strengths of both VB and IS. An outline of VBIS is shown in Algorithm 2. The VB estimate in Algorithm 1 is first used to define $q(X_k)$, which is then applied to (26) to estimate $\mu_{VBIS}$ and $\Sigma_{VBIS}$ for the approximation $\hat{p}(X_k|D_k) = \mathcal{N}(\mu_{VBIS}, \Sigma_{VBIS})$. This work uses

$$q(X_k) = \mathcal{N}(\mu_{VB}, \Sigma) \tag{27}$$

since this pdf is easy to sample from, permits convenient calculation of $\omega_i$, and performs well in practice. Other, more sophisticated unimodal pdfs could serve as $q(X_k)$ on the basis of $\mu_{VB}$ and $\Sigma$ (e.g., heavy-tailed Laplace pdfs or mixture model pdfs). However, compared with (27), the benefits of such alternatives can be outweighed by the cost of sampling $x_i$ and evaluating $\omega_i$, especially if $n \ge 2$ (e.g., Bessel functions are needed to evaluate a Laplace pdf with covariance $\Sigma$).
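The importance-sampling correction of (26) with the proposal (27) can be sketched as follows. This is a minimal illustration rather than Algorithm 2 itself: the VB mean is assumed to be supplied by a separate implementation of Algorithm 1, and all function and argument names are hypothetical.

```python
import numpy as np

def softmax_log_lik(X, W, b, j):
    """log P(D = j | x) for each row x of X under a softmax model."""
    Y = X @ W.T + b
    Y = Y - Y.max(axis=1, keepdims=True)            # stabilise the exponentials
    return Y[:, j] - np.log(np.exp(Y).sum(axis=1))

def gauss_log_pdf(X, mu, Sigma):
    """Log density of N(mu, Sigma) evaluated at each row of X."""
    diff = X - mu
    sol = np.linalg.solve(Sigma, diff.T).T
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet
                   + mu.size * np.log(2.0 * np.pi))

def is_moments(mu_q, Sigma_q, mu0, Sigma0, W, b, j, n_samples=200, seed=0):
    """Self-normalised IS estimate of the posterior mean and covariance.

    mu_q, Sigma_q : proposal parameters; (27) corresponds to
                    mu_q = mu_VB (assumed given) and Sigma_q = prior covariance
    mu0, Sigma0   : Gaussian prior parameters
    """
    rng = np.random.default_rng(seed)
    X = rng.multivariate_normal(mu_q, Sigma_q, size=n_samples)
    log_w = (gauss_log_pdf(X, mu0, Sigma0) + softmax_log_lik(X, W, b, j)
             - gauss_log_pdf(X, mu_q, Sigma_q))     # log of p * P / q in (26)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                    # renormalise to sum to 1
    mean = w @ X
    diff = X - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov, w
```

Working with log weights and subtracting the maximum before exponentiating avoids underflow when the likelihood is sharply peaked; this is a numerical convenience chosen for the sketch and is not stated in the paper.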

C. Likelihood Weighted Importance Sampling

Another possible IS strategy to compute $\hat{\mu}$ and $\hat{\Sigma}$ is to bypass VB altogether in Algorithm 2 and simply set $q(X_k) = p(X_k)$ so that $\omega_i \propto P(D_k|x_i)$. This approach, which is popularly known as likelihood weighted importance sampling (LWIS) [33], [40], also defines the measurement update step of the standard BPF [24] and works well if $p(X_k)$ and $p(X_k|D_k)$ are similar. While faster and nominally more computationally convenient than VBIS, LWIS suffers if $P(D_k|X_k)$ is highly peaked relative to $p(X_k)$ or if $D_k$ is "surprising" with respect to $p(X_k)$ (i.e., the prior and posterior are not close) [33]. In such cases, $\omega_i \approx 0$ for many samples, leading to inconsistent LWIS estimates. LWIS is presented here as a common benchmark algorithm to estimate complex non-Gaussian densities.
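The weight degeneracy described above can be made concrete with a short sketch. The helper names are hypothetical, and the effective sample size formula $1/\sum_i \omega_i^2$ is one common estimate assumed here; the paper's exact definition follows [39] and may differ.

```python
import numpy as np

def lwis_weights(log_lik):
    """Normalised LWIS weights, omega_i proportional to P(D_k | x_i).

    log_lik holds log P(D_k | x_i) for samples x_i drawn from the prior.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    w = np.exp(log_lik - log_lik.max())   # shift for numerical stability
    return w / w.sum()

def effective_sample_size(w):
    """Common ESS estimate 1 / sum_i omega_i^2 for normalised weights."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w ** 2)
```

When the likelihood is flat over the prior samples, the weights are uniform and the estimate equals the number of samples; when a single sample dominates, it collapses toward 1, which is the degenerate regime in which LWIS estimates become unreliable.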




Fig. 3. Synthetic 1-D fusion problem using exact and approximate inference methods. (a) Human observation softmax likelihood curves for $P(D_k = j|X_k)$. (b)-(d) Posterior approximation results for human observations that are progressively more surprising relative to $p(X_k)$ (five sample posterior results shown for LWIS and VBIS Gaussian approximations in each case).

TABLE I
RESULTS FOR 1-D FUSION PROBLEM IN FIG. 3

D. Numerical 1-D Example

Fig. 3(a) gives a hypothetical 1-D softmax likelihood model for a soft human observation $D_k$, with $m = 5$ categories relating to $X_k$, the location of a static target relative to a static robot. The prior $p(X_k)$ at some fixed time step $k$ is shown in gray for three different scenarios in (b)-(d), in which $\zeta_k = \emptyset$ and the update relies solely on $D_k$. Fig. 3(b)-(d) shows the most likely $D_k$ in each case relative to the true target location $x_{true}$ (black star). Moving from (b) to (d), the prior becomes less accurate (i.e., more surprising/inconsistent) compared with $x_{true}$, e.g., due to an inaccurate/highly uncertain target dynamics model.

Fusion results are shown for exact numerical integration, VB, VBIS with $N_s = 200$, and LWIS with $N_s = 200$. The true mean and variance $(\mu, \sigma^2)$ of the exact (non-Gaussian) posterior are shown in Table I, along with the corresponding estimates and MATLAB computation times for each approximation over 50 runs. The number of EM iterations for VB and VBIS are also shown.$^7$ The effective sample size (ESS) is provided as a measure of sample efficiency, and hence closeness of $q(X_k)$ to $p(X_k|D_k)$, for VBIS and LWIS [39].

In each case, $\mu_{VB}$ is very close to $\mu$ with a small bias, while $\sigma^2_{VB} < \sigma^2$. $(\mu_{VBIS}, \sigma^2_{VBIS})$ and $(\mu_{LWIS}, \sigma^2_{LWIS})$ are accurate in (b), since $D_k$ is unsurprising with respect to $p(X_k)$. However, LWIS becomes steadily worse in (c) and (d) since $p(X_k)$ and $D_k$ disagree more, whereas VBIS always maintains a good approximation with only 200 samples. The poor performance of LWIS in (c) and (d) is reflected by its diminishing ESS and the inconsistent nature of $\mu_{LWIS}$ and $\sigma^2_{LWIS}$. LWIS improves in (c) with larger $N_s$, although this has limited impact in (d). Setting $N_s = 10000$ matches the computation time for VBIS but still yields worse performance (ESS = 200,

7

Using a random initial guess for in (21) and a tolerance of 1e-3 on C .

$\mu_{LWIS} = 0.70 \pm 0.75$, $\sigma^2_{LWIS} = 0.47 \pm 0.08$) than VBIS with $N_s = 200$.

IV. GENERALIZED FUSION: NON-GAUSSIAN PRIOR AND MULTIMODAL LIKELIHOOD INFERENCE

The assumption that $p(X_k)$ is Gaussian and that $P(D_k|X_k)$ is well modeled by a basic softmax likelihood (7) with convexly separable classes can be easily violated in practical human-robot fusion scenarios. The prior $p(X_k)$ can be non-Gaussian through multimodal initial beliefs, or if $X_k$ evolves with non-Gaussian/nonlinear dynamics (e.g., unobservable dynamic mode changes), or if updates via $\zeta_k$ involve non-Gaussian likelihoods [2], [23]. Equation (7) can also be inadequate to model $P(D_k|X_k)$, e.g., soft distance observations are better modeled by nonconvex categorical MMS likelihoods (see Section II). Fortunately, VBIS can be extended for recursive Bayesian fusion in such scenarios using GM pdf approximations.

In the sequel, assume that $p(X_k)$ in (6) is given by an $M$-term GM

$$p(X_k) = \sum_{u=1}^{M} P(u)\, p(X_k|u) = \sum_{u=1}^{M} c_u\, \mathcal{N}(\mu_u, \Sigma_u). \tag{28}$$

The hidden discrete variable $U$ takes values $u \in \{1, \ldots, M\}$, $\mu_u \in \mathbb{R}^n$ and $\Sigma_u \in \mathbb{R}^{n \times n}$ are, respectively, the $u$th component mean and covariance, and the component weights $c_u \in \mathbb{R}_0^+$ satisfy $\sum_{u=1}^{M} c_u = 1$. The universal approximation property of GMs was used in [31] to derive recursive non-Gaussian Bayesian state estimators for continuous sensor data via parallel banks of KFs/EKFs. This idea was later extended to incorporate parallel banks of UKFs [26] and PFs [25]. Due to their beneficial statistical properties and high flexibility, such GM filtering algorithms have since proven useful for many robotic Bayesian sensor fusion applications (see [17] and [41]). Thus, any such




GM filter can be assumed here to approximate (1) and (2) in the form of (28), where $M$ is automatically determined by the GM filter to balance computational speed and estimation accuracy. Furthermore, it is assumed that the human sensor likelihood $P(D_k|X_k)$ is given by the MMS model in (9).

A. Variational Bayesian Importance Sampling With Gaussian Mixture Priors and Multimodal Softmax Likelihoods

An approximation to $p(X_k|D_k)$ is derived by first considering the joint pdf given $D_k$

$$\begin{aligned}
p(D_k, X_k, U, R) &= P(D_k, R|X_k, U)\, p(X_k, U) \\
&= P(D_k, R|X_k)\, p(X_k, U) = P(D_k|R)\, P(R|X_k)\, p(X_k|U)\, P(U)
\end{aligned} \tag{29}$$

where the first line follows from Bayes' rule, and the second line follows from the conditional independence properties of the MMS model (see Section II). Recall from Section II that 1) $R$ is a hidden subclass variable with values $r \in \{1, \ldots, S\}$, where each subclass is deterministically mapped to a single class label $j \in \{1, \ldots, m\}$ for the observation $D_k$, and 2) $\sigma(j)$ denotes the set of $s_j$ subclasses mapping to $j$, where $P(D_k = j|r) = I(r \in \sigma(j))$. From the law of total probability, the posterior $p(X_k|D_k)$ is

$$p(X_k|D_k) = \sum_{u=1}^{M} \sum_{r \in \sigma(j)} p(X_k|u, r, D_k)\, P(u, r|D_k). \tag{30}$$

Using Bayes' rule and the joint pdf (29), the first term in the summand of (30) can be written as

$$p(X_k|u, r, D_k) = \frac{P(D_k|r)\, P(r|X_k)\, p(X_k|u)\, P(u)}{\int P(D_k|r)\, P(r|X_k)\, p(X_k|u)\, P(u)\, dX_k}. \tag{31}$$

Canceling the terms that are independent of $X_k$ gives

$$p(X_k|u, r, D_k) = \frac{P(r|X_k)\, p(X_k|u)}{\int P(r|X_k)\, p(X_k|u)\, dX_k} \tag{32}$$

which is the conditional posterior given $D_k = j$, mixing component $u$, and subclass $r \in \sigma(j)$. Note that the numerator in (32) is the product of a Gaussian $p(X_k|u) = \mathcal{N}(\mu_u, \Sigma_u)$ and a softmax likelihood $P(r|X_k)$, while the denominator is the marginal subclass $r$ softmax observation likelihood under Gaussian component $u$. Therefore, (32) is a unimodal conditional pdf that can be well approximated by a Gaussian using the VBIS procedure in Algorithm 2 so that

$$p(X_k|u, r, D_k) \approx \hat{p}(X_k|D_k, u, r) = \mathcal{N}(\mu_{ur}, \Sigma_{ur}). \tag{33}$$

Next, the second term in the summand of (30), $P(u, r|D_k)$, is

$$P(u, r|D_k) = \frac{P(u, r, D_k)}{P(D_k)} \tag{34}$$

$$= \frac{P(u, r, D_k)}{\sum_{u=1}^{M} \sum_{r \in \sigma(j)} P(u, r, D_k)} = \frac{1}{C}\, P(u, r, D_k) \tag{35}$$

where the numerator can be derived from (29) as

$$\begin{aligned}
P(u, r, D_k = j) &= \int p(X_k|u)\, P(r|X_k)\, P(D_k = j|r)\, P(u)\, dX_k \\
&= P(u) \int p(X_k|u)\, P(r|X_k)\, dX_k
\end{aligned} \tag{36}$$

where $P(u) = c_u$ from (28), and the last line follows from $P(D_k = j|r) = 1$ for $r \in \sigma(j)$, by the definition of the MMS model. The integral in (36) is also the denominator in (32)

$$P(r|u) = \int p(X_k|u)\, P(r|X_k)\, dX_k = C_{ur}. \tag{37}$$

Substituting these expressions into (36) and then (35) gives

$$P(u, r|D_k) = \frac{1}{C}\, c_u\, C_{ur}. \tag{38}$$

Equation (37) is analytically intractable, but can be estimated in two ways. First, since VBIS (Algorithm 2) is used to estimate (32), (37) can be directly approximated by a corresponding VB lower bound $\hat{C}_{ur} \le C_{ur}$ obtained via (25) in Algorithm 1. In this case, the nominal conditioning on $D_k = j$ in (25) is replaced by joint conditioning on $U = u$ and $R = r$ so that individual $\mu_{ur}$, $\Sigma_{ur}$, $\xi_{c,ur}$, and $\alpha_{ur}$ estimates are used in (25) for each possible $u$ and $r$ pairing to compute $\log \hat{C}_{ur}$. Second, (37) can be estimated via direct sampling as

$$P_s(r|u) = \frac{1}{N_u} \sum_{l=1}^{N_u} P(r|X_k = x_l) \tag{39}$$

where $\{x_l\}_{l=1}^{N_u}$ is a set of $N_u$ samples drawn directly from the $u$th prior component $\mathcal{N}(\mu_u, \Sigma_u)$. The first approach could bias the posterior approximation if the bound $\hat{C}_{ur} \le C_{ur}$ is too loose. However, the variance of $P_s(r|u)$ is inversely proportional to $P(r|u)$ and $N_u$, meaning that (39) can fall below the lower bound $\hat{C}_{ur}$ if $P(r|u)$ is very small (i.e., $P(r|u) \le 0.01$) and $N_u$ is too small. Thus, to obtain a reasonable estimate, $\hat{C}_{ur}$ is used to floor (39) as a consistency check for fixed $N_u$

$$\hat{P}(r|u) = \max\left[\exp(\log \hat{C}_{ur}),\, P_s(r|u)\right] \approx P(r|u). \tag{40}$$

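Hence, (38) becomes

$$P(u, r|D_k) \approx \frac{1}{\hat{C}}\, c_u\, \hat{P}(r|u) \equiv \omega_{ur} \tag{41}$$

$$\text{where } \hat{C} = \sum_{u=1}^{M} \sum_{r \in \sigma(j)} c_u\, \hat{P}(r|u). \tag{42}$$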

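Finally, combining (32) and (42) into (30) yields a GM approximation $\hat{p}(X_k|D_k) \approx p(X_k|D_k)$

$$\hat{p}(X_k|D_k) = \sum_{u=1}^{M} \sum_{r \in \sigma(j)} \omega_{ur}\, \mathcal{N}(\mu_{ur}, \Sigma_{ur}) \tag{43}$$

$$= \sum_{h=1}^{K} \omega_h\, \mathcal{N}(\mu_h, \Sigma_h) \tag{44}$$

with $K = s_j \cdot M$ Gaussian components.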
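The weight computation in (40)-(42) reduces to an elementwise floor followed by a normalization. The sketch below assumes the VB bounds and the direct-sampling estimates have already been computed elsewhere and are stored as arrays indexed by $(u, r)$; the function name and array layout are illustrative assumptions, not part of Algorithm 3 as published.

```python
import numpy as np

def gm_posterior_weights(c, log_C_vb, P_s):
    """Posterior mixture weights from prior weights and subclass likelihoods.

    c        : (M,) prior component weights c_u
    log_C_vb : (M, s_j) VB lower bounds on log P(r|u)
    P_s      : (M, s_j) direct-sampling estimates of P(r|u), as in (39)
    Returns the (M, s_j) normalised weights and the normaliser.
    """
    c = np.asarray(c, dtype=float)
    P_hat = np.maximum(np.exp(log_C_vb), P_s)   # floor the sampling estimate, (40)
    unnorm = c[:, None] * P_hat                 # c_u * P_hat(r|u)
    C_hat = unnorm.sum()                        # normaliser, (42)
    return unnorm / C_hat, C_hat                # weights, (41)
```

Flattening the returned $(M, s_j)$ array gives the $K = s_j \cdot M$ component weights of (44), in the same order as the flattened component means and covariances.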




B. Likelihood Weighted Importance Sampling and Variational Bayes Gaussian Mixture Fusion

Algorithm 3 summarizes the generalized VBIS fusion algorithm. Note that if VBIS in step 4 of Algorithm 3 is replaced by LWIS for component $h$ and $\hat{P}(r|u) = P_s(r|u)$ is instead used in step 5, an LWIS-based GM approximation to (30) is obtained. Likewise, a VB GM approximation is obtained by using only Algorithm 1 in step 4 (i.e., ignoring the IS correction) and setting $\hat{P}(r|u) = \hat{C}_{ur}$ in step 5 (i.e., ignoring step 3). The next example shows that the VBIS procedure in Algorithm 3 improves considerably on both alternatives. Note that the VB, VBIS, and LWIS baseline Gaussian approximations from Section III for Gaussian priors and softmax likelihoods are special cases of the corresponding GM approximations for GM priors and MMS likelihoods, with $M = 1$ and $s_j = 1\ \forall j \in \{1, \ldots, m\}$.

1) Numerical 1-D Example: Fig. 4 modifies the previous 1-D human-robot fusion example in Fig. 3 so that $p(X_k)$ is now an $M = 4$ component multimodal GM (gray) and $D_k$ now takes the form of a coarse range-only observation with $m = 3$ nonconvex categories ("Next To," "Nearby," "Far From"). Shown are the results of fusing the (surprising) human observation $D_k$ = "Far From" via numerical integration to obtain the exact multimodal posterior pdf (magenta). Also shown are the full 8-component GM posterior approximations that are obtained with VB and 100 trials of both VBIS (Algorithm 3, $N_u = N_s = 500$) and LWIS (500 samples).

Fig. 4. Synthetic 1-D fusion problem with GM prior and range-only MMS likelihood model for $P(D_k = j|X_k)$ derived by grouping the five softmax classes in Fig. 3(a) into three MMS classes; sample 8-mixand GM posterior approximations shown for $D_k$ = "Far From" (likelihood in red dash), along with run time and KLD statistics.

Due to its brittleness to surprising measurements, LWIS clearly fails to approximate the minor posterior modes on the positive $X_k$ axis and struggles to approximate the major posterior modes on the negative $X_k$ axis. The VB GM approximation (which required 11-23 EM steps per component) shows considerable improvement in approximating all posterior modes, but it still significantly underestimates all component variances as well as the largest component weight on the left. In contrast, VBIS provides a very high-fidelity GM approximation to the exact posterior. Fig. 4 also shows the resulting computation times (using unoptimized MATLAB code) and Kullback-Leibler divergences (KLDs) between the true posterior $p(X_k|D_k)$ (from numerical integration) and each GM approximation $\hat{p}(X_k|D_k)$, where the KLD is given by

$$\mathrm{KL}[p \,\|\, \hat{p}] = \int p(X_k|D_k) \log \frac{p(X_k|D_k)}{\hat{p}(X_k|D_k)}\, dX_k \tag{45}$$

and smaller KLD indicates that $\hat{p}(X_k|D_k)$ loses less information from $p(X_k|D_k)$ (and is therefore a better approximation to the true posterior). Clearly, LWIS loses the most information on average, while VBIS loses the least. Repeating LWIS with 1500 samples matches the time required for VBIS with 500 samples, but only reduces the LWIS KLD by about half. In addition, the VBIS KLD increases to $0.23 \pm 0.20$ if only the direct-sampling estimate $P_s(r|u)$ of $C_{ur}$ is used in step 5 of Algorithm 3 (i.e., if $\hat{C}_{ur}$ from VB is ignored), since $P_s(r|u)$ underestimates the weights for the minor GM posterior modes on the positive $X_k$ axis. This shows that the VB bounds $\hat{C}_{ur}$ help improve the posterior GM weight estimates in (40).
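For a 1-D problem such as this one, the KLD in (45) can be approximated by simple quadrature on a uniform grid. The sketch below is an assumed illustration of that calculation (the paper does not describe its numerical integration scheme); it requires the approximation to be strictly positive wherever the true density is positive.

```python
import numpy as np

def kld_on_grid(p, q, x):
    """Approximate KL[p || q] for 1-D densities sampled on a uniform grid x."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    dx = x[1] - x[0]
    mask = p > 0                       # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)

def normal_pdf(x, mu, var):
    """Univariate Gaussian density, used here only to build test inputs."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

x = np.linspace(-12.0, 12.0, 24001)
kld_same = kld_on_grid(normal_pdf(x, 0.0, 1.0), normal_pdf(x, 0.0, 1.0), x)
kld_shift = kld_on_grid(normal_pdf(x, 0.0, 1.0), normal_pdf(x, 1.0, 1.0), x)
```

The two Gaussian inputs serve as a check against the known closed form: the divergence between unit-variance Gaussians whose means differ by 1 is 0.5, and the divergence of a density from itself is 0.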

C. Practicalities

1) Parallelization: The nested for loops that contain steps 2-6 of Algorithm 3 can be parallelized into $s_j \cdot M$ independent VBIS updates. As such, parallelized GM filtering strategies for $\zeta_k$ fusion can be readily adapted to incorporate soft categorical measurements via (3) using Algorithm 3. In particular, if GM filters are used to approximate (1) and (2), then the complete hybrid Bayesian fusion cycle can be implemented as a bank of parallel Gaussian filters that are combined to produce a final GM posterior approximation $\hat{p}(X_k|\zeta_{1:k}, D_{1:k})$ at each time step $k$.

2) Mixture Condensation: The number of mixands in $\hat{p}(X_k|\zeta_{1:k}, D_{1:k})$ grows at each time step $k$ if either $s_j > 1$ in Algorithm 3 or (1) and (2) marginalize out discrete random variables




Fig. 5. Experimental setup. (a) Pioneer 3-DX robot used for experiment, featuring Vicon markers for accurate pose estimation; a Hokuyo URG-04LX lidar sensor for obstacle avoidance; an onboard Mini ATX-based computer with a 2.00 GHz Intel Core 2 processor, 2 GB of RAM and WiFi networking; and a Unibrain Fire-I OEM Board camera. (b) Base field map used in all search missions, showing locations of six opaque obstacle walls and two generic landmarks. (c) Human-robot fusion GUI, which runs on a 2.66 GHz Intel Core 2 Duo workstation with 2 GB of RAM.

(e.g., via GM process/measurement noise models). Standard

GM compression methods should thus be applied to maintain

tractability while minimizing a suitable information loss metric

with respect to the full GM posterior approximation [42], [43].

3) Component Gating for Skipping Updates: If $P_s(r|u) \approx 1$ from (39), then $\mu_{ur}$ and $\Sigma_{ur}$ will be very close to $\mu_u$ and $\Sigma_u$. Step 4 of Algorithm 3 can thus be modified to apply a gating threshold $\eta$ after step 3 to determine whether the posterior component for the pair $(u, r)$ requires EM iterations for the VBIS approximation. If $P_s(r|u) \ge \eta$, alternative component updates via LWIS or prior equivalence (i.e., $\mu_{ur} = \mu_u$ and $\Sigma_{ur} = \Sigma_u$) are used, and step 5 becomes $\hat{P}(r|u) = P_s(r|u)$; otherwise, steps 4 and 5 are carried out with VBIS as usual. Note that $\eta$ should be set close to 1 (e.g., $\eta = 0.9999$) to ensure only those components that are definitely not worth updating by VBIS are conservatively skipped.

V. COOPERATIVE MULTITARGET SEARCH EXPERIMENTS

As discussed in [11], [12], and [17], human-robot information fusion is particularly relevant for cooperative target search applications such as coordinated search and rescue, large-scale surveillance, and urban reconnaissance. To provide practical insight on the utility of the proposed soft human information fusion approach, an experimental application to cooperative indoor target search missions was conducted with a real human-robot team.$^8$

A. Problem Setup

A single human agent and a single autonomous mobile robot

were tasked with finding and correctly identifying five hidden

targets as quickly as possible under a fixed time constraint.

Fig. 5(b) shows the base map of the 5 m × 10.5 m indoor area used to conduct the multiple search mission experiments, which featured several movable obstacle walls and two

$^8$ Similar experiments with 16 different human users were conducted in [44] to examine sensitivity to $P(D_k|X_k)$; although not discussed here, the results from that study corroborate this paper's findings on the utility of the proposed fusion approach.

generic landmarks. The walls are placed such that the human

(who remained seated off field at a computer) could only see a

small portion of the search area by direct line of sight. The five

targets were static orange traffic cones labeled with unique ID

numbers (1 through 5) that were hidden at various locations that

differed across four separate search missions.

Each target location $X^t \in \mathbb{R}^2$ is modeled by a GM prior

$$p(X^t) = \sum_{u=1}^{M^t} c_u^t\, \mathcal{N}(\mu_u^t, \Sigma_u^t) \tag{46}$$

where the number of targets is known a priori for $t \in \{1, \ldots, 5\}$ and $p(X^1) = p(X^2) = \cdots = p(X^5)$ at mission start; these priors are detailed in Section V-C. Each $p(X^t)$ is updated over time using one or both of the following information sources: 1) $\zeta_{1:k}$: the set of all detection/no detection observations made by the robot's visual target detector, and 2) $D_{1:k}$: the set of all soft target location data provided by the human.

Fig. 5(a) shows the Pioneer 3-DX autonomous mobile robot that is used in the experiment. The robot is equipped with a camera and vision-processing software that detects orange traffic cones up to a 1 m range with a 42.5° field of view at 2 Hz. The robot moves at a constant speed of 0.3 m/s with a known map of the search area and highly accurate pose data from Vicon motion tracking. The robot autonomously navigates toward intermediate search points (i.e., goal locations) based on the updated combined undetected target posterior GM pdf

$$p(X_k^{comb}) = \sum_{t \in T_k} \frac{1}{|T_k|}\, p(X^t|\zeta_{1:k}, D_{1:k}) \tag{47}$$

where $T_k$ is the set of undetected targets at time $k$. As in [2], the target pdfs are used to autonomously plan search paths using a simple suboptimal greedy strategy. Equation (47) is first discretized to select the highest value (nonobstacle) grid cell defining the robot's next search point; the robot then creates and follows a path using the D* algorithm to ensure that this point lies at the center of the target detector likelihood function, shown in Fig. 6(e). The robot immediately repeats this planning procedure whenever it either reaches its current search point




Fig. 6. (a) Example GM target location prior. (b)-(d) Base MMS models for prepositions. (e) MMS model for camera detection likelihood. (f)-(i) Posterior GMs from VBIS after fusing $D_k$ in (b)-(d) with GM prior in (a). (j) Posterior GM from LWIS for $\zeta_k$ = "No Detection" report with GM prior in (a).

without detecting anything or if it receives new information from the human (described below). While other search strategies could be used, this approach works well and is tied to searches that are based on model predictive control [2], [23].

Comparisons with other search methods are beyond the scope

of this study.
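The grid-based selection of the next search point from (47) can be sketched as follows. This covers only the discretize-and-pick-the-maximum step of the greedy strategy, not path planning; the data layout (each target GM as a tuple of weights, means, and covariances) and the function names are assumptions made for illustration.

```python
import numpy as np

def gm_pdf(points, weights, means, covs):
    """Evaluate a 2-D Gaussian mixture pdf at each row of points."""
    total = np.zeros(len(points))
    for w, mu, S in zip(weights, means, covs):
        diff = points - mu
        sol = np.linalg.solve(S, diff.T).T
        expo = -0.5 * np.sum(diff * sol, axis=1)
        total += w * np.exp(expo) / (2.0 * np.pi * np.sqrt(np.linalg.det(S)))
    return total

def next_search_point(target_gms, grid_points, free_mask):
    """Average the undetected-target GMs as in (47) and pick the best free cell.

    target_gms  : list of (weights, means, covs) tuples, one per undetected target
    grid_points : (G, 2) array of grid cell centres
    free_mask   : (G,) boolean array, False for obstacle cells
    """
    combined = np.mean([gm_pdf(grid_points, *gm) for gm in target_gms], axis=0)
    combined = np.where(free_mask, combined, -np.inf)   # never pick obstacle cells
    return grid_points[np.argmax(combined)], combined
```

Averaging with equal weights matches the $1/|T_k|$ factor in (47); removing a detected target's GM from `target_gms` before the call mirrors the removal of identified targets from the combined pdf.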

The human remains seated at a computer station facing the field [coordinates x = 0.8 m, y = 3.3 m in Fig. 5(b)] and communicates with the robot through the graphic user interface (GUI) shown in Fig. 5(c). The human has two tasks: 1) classifying detections by the robot as either false alarms or actual targets, and 2) voluntarily modifying the target GM pdfs via soft information messages $D_k$. For task 1, the robot streams processed camera images at 1 Hz to the GUI and pauses to report visual target detections. If the human declares a false alarm, the robot notes the object's location to prevent reacquisition. Otherwise, the robot localizes the target via laser and camera data, and the GM for the identified target $t$ is removed from (47). For task 2 (the focus of this study), the human can use direct observations of the field and the robot's camera feed to send messages that update (47) (detailed below). The human also has access to a 2-D surface plot of (47) overlaid on a labeled map of the search area so that consistent contextual information is available for fusion. The human can only send information and cannot directly command the robot. However, the robot automatically replans whenever a new $D_k$ is fused, since the maximum of (47) can change significantly.

B. Online Measurement Updates

Each $p(X^t|\zeta_{1:k}, D_{1:k})$ is recursively updated online via (2) and (3); (1) is not needed since the targets are all static. The $\zeta_k$ updates are skipped for false alarms (assumed to be filtered out perfectly by the human), while $D_k$ updates occur as human messages arrive spontaneously.

1) Robot Visual Detection Model and $\zeta_k$ Updates: The robot's target detector likelihood $P(\zeta_k|X^t)$ is a hybrid probabilistic mapping from $X^t$ to a discrete observation $\zeta_k \in$ {"No Detection," "Detection"}. As such, $P(\zeta_k|X^t)$ is well approximated by the 2-D MMS model shown in Fig. 6(e), which describes the "No Detection" class likelihood with a high probability outside the vision cone. The parameters for this model were learned offline and shifted online to account for the robot's pose and known occlusions (e.g., walls). Since $P(\zeta_k|X^t)$ is an MMS model, the inference methods in Section IV obtain a GM approximation to (2). LWIS GM fusion with 1000 samples per component update and a component gate of $\eta = 0.9999$ gave sufficiently accurate results, due to the robot's slow motion. Fig. 6(j) shows an example LWIS GM fusion update with the nominal MMS camera model, illustrating the posterior "scattering" effect induced by "negative information" from "No Detection" updates [2], [22].

TABLE II
HRI GUI CODEBOOK CHOICES

2) Human Observation Models and $D_k$ Updates: Structured three-field messages of the form $D_k$ = "(Existence) is (Preposition) (Reference)" were sent sequentially by the human, where any combination of the predefined codebook entries shown in Table II could be selected in the GUI via mouse. Existence allows positive/negative soft observations to be sent, assuming each target's ID is unavailable until detection (the data association problem due to this ambiguity is addressed below). Reference determines each observation's spatial reference point, while Preposition determines the MMS model to use for modifying each target GM given the Existence and Reference fields. This study used three categorical ranges and two categorical bearings, giving $D_k$ 90 distinct realizations.

Base MMS models for Preposition entries were learned

offline with training data from the single human user who

performed all missions in this study. Fig. 6(b)-(e) shows the




Fig. 7. True target locations and initial GM priors for Mission 4, showing (a) "uniform" and (b) "bad" search priors. The "uniform" GM prior in (a) is the same in all four search missions and the "bad" priors for Missions 1-3 are qualitatively similar to (b).

TABLE III
EXPERIMENTAL SEARCH MISSION MATRIX

resulting models, whose origins all correspond to a nominal (0, 0) Reference Location position in $X^t$ space. The weights of these base models are shifted/rotated online to be consistent with the desired Reference origin/orientation. Negative (i.e., "Nothing is...") observations with respect to a Preposition class $j$ are handled via pseudopositive measurement updates with respect to all other classes $t \ne j$ in the corresponding MMS model. All $D_k$ updates are performed online with VBIS GM fusion (Algorithm 3, using $N_u = N_s = 500$ and $\eta = 0.9999$).

3) Data Association: Data association issues arise if $D_k$ is

not target-specific, i.e., "Something is..." could apply to any one target, while "Nothing is..." applies to all targets. This ambiguity is handled here through the GM-based probabilistic data association (PDA) [45], in which (3) is computed for the hypothesis that $D_k$ describes target $t$ to give $\hat{p}(X^t|\zeta_{1:k}, D_{1:k})$, and the prefusion prior GM $p(X^t|\zeta_{1:k}, D_{1:k-1})$ is assigned to all hypotheses where $D_k$ does not describe $t$. Marginalizing out the association hypothesis gives the updated target $t$ GM

$$p(X^t|\zeta_{1:k}, D_{1:k}) = \gamma(\kappa)\, \hat{p}(X^t|\zeta_{1:k}, D_{1:k}) + [1 - \gamma(\kappa)]\, p(X^t|\zeta_{1:k}, D_{1:k-1}) \tag{48}$$

where $\kappa = 1$ if $D_k$ has positive/"Something is..." data ($\kappa = 0$ otherwise for negative/"Nothing is..." data), and $\gamma(\kappa)$ is the

Fig. 8. Search performance metrics. (a) and (b) Search mission times (sec) under "uniform" and "bad" priors. (c) and (d) Number of targets found per mission under "uniform" and "bad" priors.

probability of the hypothesis that $D_k$ describes target $t$. Here, $\gamma(0) = 1$ and $\gamma(1) = 1/|T_k|$, where $|T_k|$ is the number of undetected targets at time $k$. The probability of erroneous/false $D_k$ is assumed to be zero for simplicity.$^9$
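Because both terms in (48) are GMs, the PDA update amounts to concatenating the two component lists and rescaling their weights. The following sketch assumes each GM is stored as a tuple of lists (weights, means, covariances); the function and argument names are hypothetical and the code is not taken from the paper.

```python
def pda_update(post_gm, prior_gm, n_undetected, positive):
    """Mix the fused GM and the pre-fusion GM as in (48).

    post_gm, prior_gm : (weights, means, covs) tuples of lists
    n_undetected      : number of undetected targets |T_k|
    positive          : True for "Something is..." data, False for "Nothing is..."
    """
    # Association probability: 1/|T_k| for positive data, 1 for negative data.
    assoc_prob = 1.0 / n_undetected if positive else 1.0
    post_w, post_m, post_c = post_gm
    prior_w, prior_m, prior_c = prior_gm
    weights = ([assoc_prob * w for w in post_w]
               + [(1.0 - assoc_prob) * w for w in prior_w])
    means = list(post_m) + list(prior_m)
    covs = list(post_c) + list(prior_c)
    keep = [i for i, w in enumerate(weights) if w > 0.0]   # drop zero-weight terms
    return ([weights[i] for i in keep],
            [means[i] for i in keep],
            [covs[i] for i in keep])
```

With four undetected targets, a positive message keeps only a quarter of the weight on the fused components, which illustrates why repeated positive messages are needed to shift the belief substantially.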

4) Mixture Compression: Following $\zeta_k$ and $D_k$ updates, each $p(X^t|\zeta_{1:k}, D_{1:k})$ is compressed to $M = 15$ mixands via Salmond's joining method [42], which preserves overall GM mean and covariance. This requires $O(M_o^2)$ time for $M_o$ initial mixands; therefore, only the 100 highest weighted components of $p(X^t|\zeta_{1:k}, D_{1:k})$ are used for each merging operation.$^{10}$
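The property that merging preserves the overall GM mean and covariance comes from replacing a pair of components with a single moment-matched Gaussian. The sketch below shows only that standard pairwise merge; it is not the full joining method of [42], which also specifies how the pair to merge is chosen, and the function name is an assumption.

```python
import numpy as np

def merge_components(w1, mu1, P1, w2, mu2, P2):
    """Moment-matched merge of two weighted Gaussian components.

    The merged component has the same total weight, mean, and covariance
    as the two-component sub-mixture it replaces.
    """
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    d1, d2 = mu1 - mu, mu2 - mu
    # Within-component covariance plus the spread of the component means.
    P = (w1 * (P1 + np.outer(d1, d1)) + w2 * (P2 + np.outer(d2, d2))) / w
    return w, mu, P
```

For example, merging components with weights 0.25 and 0.75, means 0 and 4, and unit variances gives a merged mean of 3 and a merged variance of 4.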

C. Target Priors and Fusion Scenarios

Four sets of search missions were conducted under the three

types of sensor fusion modalities and two types of initial target

GM priors shown in Table III (for a total of 24 search missions).

The same four missions (each characterized by a different set

of true target locations) are used to study all cells of Table III.

Fig. 7(a) and (b) shows the true target locations and priors for

Mission 4 (other maps and priors are not shown due to limited

space). The pseudouniform GM prior was the same for all four missions; the "bad" GM priors were highly inconsistent with the true target locations in each mission, reflecting worst case search scenarios, where a priori information is badly flawed. To simulate realistic target discovery with the same human operator,

positive Dk messages were not sent until targets were actually

observed by the human during the mission. All missions ended

if the robot did not find all targets after 15 min (900 s). This

challenging time constraint was chosen after extensive testing

$^9$ Nonzero probabilities can generally be incorporated into (48) to maintain a GM fusion pdf [32], [45].

$^{10}$ This typically led to little information loss at each step, since each target's GM can only have 30-255 components following a $\zeta_k$ or $D_k$ update and since GM weights above 0.01 are always concentrated in 100 mixands.




Fig. 9. Two-norm of MAP-estimated errors of each target's location over time based on GM in Mission 4 scenarios; black markers on time axis denote instances where $D_k$ is fused and the red dashed line denotes 0.5 m error mark (error traces end if target is detected). Search with "Robot Only," "Human Only," and "Human With Robot" fusion shown from left to right; "uniform" and "bad" GM search priors shown in top and bottom rows, respectively.

showed it to be the minimum time required for the robot's greedy search to find all targets in all missions without $D_k$ updates.

D. Results: Overall Search Performance

The overall search performance of the human-robot team is gauged here via the search completion time and the number of targets detected in each search mission. Fig. 8 shows the results for these two metrics over all 24 search missions. "Human With Robot" sensing clearly offers the best overall search performance, since all five targets were always found in each mission within 8.5-13 min. While more targets were found under "Human Only" sensing than under "Robot Only" sensing (which has the worst overall performance), the mission completion times for these conditions were about the same. The number of targets detected for each of these conditions drops slightly when moving from "uniform" to "bad" GM priors; in contrast, the prior type did not significantly affect performance with "Human With Robot" sensing.

The "Robot Only" results underscore the nontrivial nature of the search problem and the inadequacy of the greedy search strategy when only $\zeta_k$ is fused. With $D_k$ available, the robot was more proficient at detecting targets via the greedy search (though performance was not necessarily "optimal" in any sense). The improvement from human input can be explained by comparing the typical level of informativeness of each GM $p(X^t|D_{1:k}, \zeta_{1:k})$ over time under various fusion and prior conditions. Time traces of the MAP-estimated target position error $\epsilon^t = \|X^t_{true} - \hat{X}^t\|$, where $\hat{X}^t = \arg\max p(X^t|D_{1:k}, \zeta_{1:k})$, for all Mission 4 runs are shown in Fig. 9. All $D_k$ fusion instances are also shown to illustrate the typical frequency of voluntary human messaging and its influence on each target's posterior estimate.

While the $\hat{X}^t$ estimates that are derived from the target GMs with $D_k$ fusion are less precise than, say, estimates derived from conventional lidar data, Fig. 9, nevertheless, shows that fusion of soft human information in $D_k$ helps substantially improve the robot's estimated beliefs over time, i.e., even if a target spotted by the human is far from the robot's own sensor range. Indeed,

with Dk fused, t




E. Results: Diversity of Soft Human Sensor Inputs

The volume and variety of human messages generally increased under either "Human Only" or bad prior conditions (detailed results are omitted here due to limited space). There were also many more positive messages (1376) than negative messages (296) over all search scenarios, because the contributions of positive/"Something is..." messages were downweighted via the PDA correction in (48), which favors the prior GM (i.e., the nonassociation hypothesis) when = 1. Hence, the human often had to resend the same positive $D_k$ message two or three times to convince the robot that something was in fact somewhere. In contrast, negative/"Nothing is..." messages were much less frequent, due in part to the fact that they are not downweighted by PDA. This dilution effect on positive information could potentially be avoided through the use of alternatives to PDA data association, e.g., multiple hypothesis tracking. However, an evaluation of such alternatives is outside the scope of this study.
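The dilution effect can be illustrated with a toy discrete example. The sketch below is not the correction in (48); it assumes a single scalar association weight `beta` (hypothetical, as are the 1-D grid and likelihood values) and forms the posterior as a mixture of the unchanged prior (nonassociation) and the ordinary Bayes update (association). It shows why a positive message may need to be repeated before the belief concentrates on the reported region.

```python
def normalize(p):
    total = sum(p)
    return [v / total for v in p]

def bayes_update(prior, lik):
    """Ordinary Bayes update on a discrete grid."""
    return normalize([p * l for p, l in zip(prior, lik)])

def pda_style_update(prior, lik, beta):
    """Mixture of the prior (weight 1 - beta) and the Bayes update (weight beta)."""
    post = bayes_update(prior, lik)
    return [(1.0 - beta) * p + beta * q for p, q in zip(prior, post)]

cells = 10
region = (7, 8)                       # hypothetical cells named in the message
uniform = [1.0 / cells] * cells
lik = [0.9 if i in region else 0.05 for i in range(cells)]
beta = 0.3                            # hypothetical association weight

# Mass placed on the reported region by one undiluted Bayes update.
undiluted = sum(bayes_update(uniform, lik)[i] for i in region)

# Mass on the reported region after each of three repeated diluted updates.
belief = list(uniform)
mass = []
for _ in range(3):
    belief = pda_style_update(belief, lik, beta)
    mass.append(sum(belief[i] for i in region))

print("undiluted:", round(undiluted, 3))
print("diluted, per repeat:", [round(m, 3) for m in mass])
```

Under these assumed values the diluted belief grows with every repeat but still trails a single undiluted update after three messages, which is consistent with the resending behavior described above.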

F. Further Insights: Complementary Team Behavior

As noted in [2], a simple greedy search strategy generally leads to inefficient back-and-forth search paths over the map as a direct consequence of the scattering effect from $\zeta_k$ updates [see Fig. 6(j)]. As such, in "Robot Only" scenarios, the robot frequently jumped from one part of the search map to another without searching thoroughly around its goal points, leading to slow information gain. Since (47) also diminished around missed $X^t_{\mathrm{true}}$ following missed detections, the robot could not remedy missed target detections until after greedily searching the rest of the map. As Fig. 10 illustrates, in "Human With Robot" scenarios, the human operator could quickly correct missed detections by sending relevant soft information that forced the robot to greedily re-examine areas around actual target locations.

While "Human Only" fusion led to more target detections than did "Robot Only" fusion, one or two targets remained undetected in certain missions, and completion times were not improved consistently (especially with bad priors). This is largely attributable to the coarse nature of the soft $D_k$ codebook, since (47) could not be precisely updated to allow the robot to nudge closer toward the target if it was just outside of detection range. As Fig. 11 illustrates, the human spent considerable time in "Human Only" missions sending many extra messages to coax the robot into obtaining a better viewing position to detect some targets that were right in front of it and just outside of detection range (especially those that were not close to landmarks, such as Target 1 in Mission 4). The resulting high volume of $D_k$ messages (especially in the bad prior missions) is also evident in Fig. 9; these scenarios led to human frustration in some cases. However, in "Human With Robot" cases, scattering via $\zeta_k$ = "No Detection" observations helped shift (47) closer to any targets just outside of detection range, thereby automatically refining the target GM pdfs following $D_k$ fusion. This also led to smoother interaction between the human and the robot, as indicated by the significantly improved mission times and lower message volume/frequency compared with "Human Only" missions.

Fig. 10. "Human With Robot" fusion sequence showing human correction of missed target detection via $D_k$ updates. Sequence length is under 1 min.

Fig. 11. "Human Only" fusion sequence showing effects of limited codebook precision without $\zeta_k$ updates. Sequence length is almost 4 min.

These results for the two different fusion conditions indicate that the simple codebook used here to generate $D_k$ messages produces useful but ultimately limited information to localize the targets. To enable reliable target localization without fusion of more precise $\zeta_k$ data, the codebook could be refined to include more diverse or contextually precise $D_k$ preposition/reference primitives. Given that the only reference points allowed in $D_k$ are discrete landmark/wall locations and the robot's current location, it is not surprising that the set of softmax/MMS likelihoods induced by the three range-only and two bearing-only preposition classes can be too imprecise for awkwardly located targets (e.g., in the corner of the search map away from any walls/landmarks).

Fig. 12. Logarithm of KLDs for each time step for final/remaining target posterior pdfs under each fusion condition in Mission 4 under uniform (left column) and bad priors (right column). (Standard deviations over ten Monte Carlo trials omitted from $D_k$ fusion cases for clarity.)
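The coarseness of range-only categories can be seen in a toy softmax likelihood. The sketch below is a simplification under stated assumptions: it applies a softmax directly to the scalar range `r` (in meters) from a landmark, whereas the paper's softmax/MMS models are defined over the continuous target state; the three class labels, weights, and biases are hypothetical and chosen only so that the class boundaries fall at 1 m and 3 m.

```python
import math

# Hypothetical range classes and softmax parameters (not taken from the paper).
CLASSES = ("next to", "nearby", "far from")
WEIGHTS = (0.0, 3.0, 6.0)
BIASES = (0.0, -3.0, -12.0)

def softmax_likelihood(r):
    """Return P(class | range r) for each class via a numerically stable softmax."""
    logits = [w * r + b for w, b in zip(WEIGHTS, BIASES)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def most_likely_class(r):
    probs = softmax_likelihood(r)
    return CLASSES[probs.index(max(probs))]

# Two ranges 0.4 m apart receive almost the same "nearby" likelihood, so a
# single "nearby" message cannot discriminate between them.
p_a = softmax_likelihood(1.8)[1]
p_b = softmax_likelihood(2.2)[1]
print(most_likely_class(0.2), most_likely_class(2.0), most_likely_class(5.0))
print(round(p_a, 3), round(p_b, 3))
```

Because the likelihood is nearly flat inside each class region, repeated messages of the same category add little information about where in that region the target lies, which matches the limited localization precision discussed above.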

G. Results: Accuracy of Approximate Gaussian Mixture Posteriors

To assess the accuracy of online fusion of $\zeta_k$ data via LWIS and of $D_k$ data via VBIS, the KLD of each GM posterior $p(X^t \mid \zeta_{1:k}, D_{1:k})$ obtained at every time step $k$ was computed offline for all search missions with respect to recursive grid-based ground-truth fusion posteriors at 0.1 m × 0.1 m grid resolution. To further assess the contribution of VBIS for $D_k$ fusion, KLDs were also computed offline for a separate set of GM posteriors obtained by using LWIS GM fusion to fuse both $\zeta_{1:k}$ and $D_{1:k}$, with 1000 samples per component update (to match the total number of samples used for VBIS $D_k$ updates). The KLDs for both the online LWIS-VBIS and offline LWIS-only GM approximations were evaluated over ten independent Monte Carlo runs to account for random sampling effects. The effects of the robot's closed-loop greedy planner were removed from the offline fusion results by using the same recorded robot trajectories, $\zeta_k$ and $D_k$ data, and mixture management methods (PDA, Salmond's method) as for online LWIS $\zeta_k$ and VBIS $D_k$ fusion.
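A grid-based KLD evaluation of this kind can be sketched as follows. This is an illustrative example only: it assumes the divergence is taken from the grid ground truth to the GM approximation, i.e., KL(truth || GM), uses isotropic 2-D components stored as hypothetical `(weight, mean_x, mean_y, variance)` tuples, and builds a stand-in "truth" from a GM instead of a recursive grid filter. The 0.1 m cell size follows the text; the map size and all pdf parameters are invented.

```python
import math

def gm_pdf(x, y, gm):
    """Density of a 2-D GM with isotropic components (weight, mx, my, var)."""
    return sum(
        w * math.exp(-0.5 * ((x - mx) ** 2 + (y - my) ** 2) / v) / (2.0 * math.pi * v)
        for w, mx, my, v in gm
    )

def grid_probs(gm, nx, ny, h):
    """Cell probabilities from the density at cell centers, renormalized to sum to 1."""
    p = [gm_pdf((i + 0.5) * h, (j + 0.5) * h, gm) * h * h
         for i in range(nx) for j in range(ny)]
    total = sum(p)
    return [v / total for v in p]

def kld(p, q, eps=1e-300):
    """Discrete KL(p || q) in nats; eps guards against log of zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

h = 0.1                      # grid resolution in meters, as in the text
nx, ny = 50, 40              # hypothetical 5 m x 4 m map
truth = grid_probs([(0.6, 1.5, 2.0, 0.09), (0.4, 3.5, 1.0, 0.16)], nx, ny, h)
good = grid_probs([(0.6, 1.5, 2.0, 0.10), (0.4, 3.5, 1.0, 0.16)], nx, ny, h)
missed = grid_probs([(1.0, 1.5, 2.0, 0.09)], nx, ny, h)   # one mode missing

print("close approximation:", kld(truth, good))
print("missed mode:", kld(truth, missed))
```

With the divergence taken in this direction, an approximation that misses a posterior mode is penalized heavily, which is the kind of discrepancy a log-KLD time trace makes visible.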

Although details are not shown here, only small baseline KLD losses arose during "Robot Only" fusion under uniform and bad priors (i.e., due to LWIS fusion of $\zeta_k$ alone along with artifacts of GM compression), showing that GMs can provide reasonable approximations to the exact target location posteriors over the course of a full 15-min search mission. The log KLDs in Fig. 12(a)–(c) and (b)–(d) show that the online VBIS and offline LWIS $D_k$ updates generally offer comparable accuracy alongside LWIS $\zeta_k$ fusion. Note that the LWIS $D_k$ fusion results here greatly benefit from the JPDA-based positive/"Something is..." updates in (48), since a substantial portion of the prior appears in the GM posterior (i.e., 1 () [0.5, 0.8] when $|T_k| > 1$). The KLDs typically spike with $D_k$ updates; the largest upward spikes tend to appear after about 10 min (600 s) due to increased sensitivity to accumulated information losses from baseline $\zeta_k$ LWIS fusion. For the "Human Only" cases in particular, the tails and several small components of the true posteriors after many $D_k$ messages become very difficult to approximate with only 15-component GMs. The KLDs for both methods are noticeably smaller for "Human With Robot" fusion, as $\zeta_k$ helps reduce the number of $D_k$ needed to modify the pdfs and thus limits the overall complexity of the true posteriors. Nevertheless, such spikes are often less than 1.5 log nats for VBIS $D_k$ fusion before 600 s; larger KLD spikes typically occur after this time, but are still often less severe than those for LWIS. Indeed, the VBIS KLDs with bad priors are either statistically comparable with or significantly smaller than the corresponding LWIS KLDs.$^{12}$ Fig. 12(d) shows one such discrepancy in accuracy at about $k = 100$, where a major GM posterior mode is missed by LWIS but not by VBIS.

H. Computation and Implementation Considerations

Although more reliable, the use of EM makes VBIS more expensive to implement than LWIS. VBIS required approximately 7 ms on average per GM component update in these experiments using managed C# code, while LWIS required approximately 2 ms.$^{13}$ To overcome the fact that VBEM can converge slowly if initialized far from the final solution, several code optimization strategies (not implemented here) could be used, such as parallelization of Algorithm 3, clustering of VBEM initializations across similar GM components, and use of unmanaged pointer arithmetic. Such optimizations were not required for the present application, as VBIS did not lead to appreciable delays for online operation.

12 Determined using Kruskal–Wallis tests with p = 0.01 on the time-averaged log KLD values.

13 These times did not increase significantly for = 1.

An important advantage of GM posterior approximations is their compactness compared with the offline-computed ground-truth discrete grids. A 15-component GM for one target at a single time step requires 720 bytes (double precision), while the grid requires approximately 61 times as much memory at 44 064 bytes. Hence, for a full 900-s search mission, the target's full posterior time history recorded at 1 Hz requires 0.65 MB with a GM, versus 39.66 MB with a grid. This discrepancy is even larger if $X_k$ is augmented to include additional states (e.g., vertical displacement and velocities). Such storage costs are highly relevant for applications in which pdfs over multiple time steps must be stored and/or communicated, e.g., decentralized data fusion sensor networks [17]. Note that the development of sophisticated, yet computationally affordable, online GM compression methods to avert excessive posterior information loss in realistic fusion scenarios (e.g., with hundreds or thousands of mixands) is still an active area of estimation research.
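The storage figures quoted in this subsection can be checked with simple arithmetic. The sketch below assumes a 2-D state in which each GM component stores one weight, a two-element mean, and the three unique entries of a symmetric covariance matrix, all as 8-byte doubles, and it takes 1 MB as 10^6 bytes; these storage conventions are assumptions, but they reproduce the 720-byte, 0.65 MB, and 39.66 MB values. The 44 064-byte grid size is taken directly from the text, and the per-step byte counts give a grid-to-GM ratio of about 61.

```python
BYTES_PER_DOUBLE = 8

def gm_bytes(n_components, dim=2):
    """Assumed layout: weight + mean + unique covariance entries per component."""
    doubles_per_component = 1 + dim + dim * (dim + 1) // 2
    return n_components * doubles_per_component * BYTES_PER_DOUBLE

gm_step = gm_bytes(15)        # bytes per target per time step for the GM
grid_step = 44_064            # bytes per time step for the grid (from the text)
steps = 900                   # 900-s mission recorded at 1 Hz

gm_total_mb = gm_step * steps / 1e6
grid_total_mb = grid_step * steps / 1e6
ratio = grid_step / gm_step
grid_cells = grid_step // BYTES_PER_DOUBLE   # implied cell count if one double per cell

print(gm_step, round(gm_total_mb, 2), round(grid_total_mb, 2), round(ratio, 1), grid_cells)
```

Augmenting the state raises the per-component cost only polynomially in the dimension under this layout, whereas a grid at fixed resolution grows multiplicatively with every added state, which is why the gap widens for larger state vectors.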

Finally, it is worth considering whether a standard particle filtering approach is adequate for human–robot fusion in place of the GM methods that are discussed here. Additional offline fusion performance analyses for the multitarget search trials were performed with the common BPF [24] using different sample sizes (500–10 000 particles) and resampling schemes. Unlike the GM filtering approaches that are considered here, the BPF approximates (46)–(48) with weighted samples (drawn initially from the prescribed GM priors at $k = 0$) and performs all Bayesian updates via likelihood-weighted IS. Although full details are omitted here due to limited space, the BPF's performance (in terms of robustness, consistency, and estimation accuracy) was generally found to be worse across all sample sizes compared with the performance of the GM filters. For instance, the BPF's final MAP position error for Target 3 is always about 4 m for the "Human With Robot" Mission 4 trial under the benign uniform prior, whereas the corresponding final error for the VBIS+LWIS GM filter is always about 0.4 m. This behavior can be traced to particle degeneracies that arise in the BPF via likelihood-weighted IS and the BPF's inability to explore new $X_k$ values outside its initial sample set. These issues are neatly addressed by the proposed GM filter, which also provides a more compact and completely continuous approximation of the fusion posterior.
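The degeneracy mechanism can be reproduced in a toy setting. The sketch below is not the filter evaluated in the paper: it assumes a static 1-D state, a Gaussian measurement likelihood, multinomial resampling at every step, and a prior sample set that happens to exclude the true state; all values are hypothetical. Because the state is static and no new values are ever proposed, resampling can only duplicate existing particles, so the estimate cannot leave the initial support.

```python
import math
import random

def bootstrap_step(particles, z, sigma, rng):
    """One likelihood-weighted IS update followed by multinomial resampling."""
    w = [math.exp(-0.5 * ((z - x) / sigma) ** 2) for x in particles]
    total = sum(w)
    w = [v / total for v in w]
    ess = 1.0 / sum(v * v for v in w)          # effective sample size before resampling
    resampled = rng.choices(particles, weights=w, k=len(particles))
    return resampled, ess

rng = random.Random(0)
n = 500
x_true = 6.0          # hypothetical true state, outside the prior support
sigma = 0.5           # hypothetical measurement noise standard deviation
particles = [rng.uniform(0.0, 5.0) for _ in range(n)]
support_max = max(particles)

ess_trace = []
for _ in range(20):
    z = x_true + rng.gauss(0.0, sigma)
    particles, ess = bootstrap_step(particles, z, sigma, rng)
    ess_trace.append(ess)

estimate = sum(particles) / n
error = abs(x_true - estimate)
n_unique = len(set(particles))
print(f"first-step ESS: {ess_trace[0]:.1f}, unique particles left: {n_unique}")
print(f"estimate: {estimate:.2f}, error: {error:.2f}")
```

In this toy case the error stays above 1 no matter how many measurements are fused, since every surviving particle is a copy of a value drawn at initialization. A continuous GM posterior is not tied to a fixed sample set in this way.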

VI. CONCLUSION

This paper derived a computationally efficient and accurate approximation to the recursive hybrid Bayesian inference problem involved in the dynamic fusion of soft categorical human observations with conventional hard robot sensor data. The proposed VBIS fusion method combines the strengths of fast stand-alone variational Bayes and Monte Carlo IS inference approximations to obtain consistent Gaussian posteriors in the baseline case of Gaussian state priors with softmax likelihood functions. VBIS was then extended to derive GM posterior approximations for GM priors with MMS likelihood models in order to handle more general recursive hybrid data fusion problems. Experimental multitarget search results for a real human–robot team showed that soft categorical observations from human sensors, although subject to limited precision and potential data association ambiguities, can still be highly useful and informative for recursive Bayesian estimation problems that feature a high degree of uncertainty or inconsistency. The results also provide valuable practical insight into the reliability of the proposed VBIS GM approximations under a variety of fusion conditions, vis-à-vis LWIS GM and grid-based ground-truth approximations. Soft categorical human sensor observations can be exploited in many different dynamic data fusion domains and are particularly convenient in situations where humans must share information quickly but do not have enough time to precisely estimate states of interest (e.g., the precise distance and bearing to a target in meters and degrees, respectively). Although the important issues of estimating error/false alarm and likelihood model uncertainties for human sensors are not addressed here in detail due to limited space, the proposed data fusion framework can incorporate these in a fully Bayesian manner [10], [32].

REFERENCES

[1] P. Bladon, P. Day, T. Hughes, and P. Stanley, "High-level fusion using Bayesian networks: Applications in command and control," in Proc. Inf. Fusion Command Support, 2004, pp. 4.4–4.18.

[2] F. Bourgault, "Decentralized control in a Bayesian world," Ph.D. dissertation, Sch. Aerosp., Mech. Mechatronic Eng., Univ. Sydney, N.S.W., Australia, 2005.

[3] T. Fong and I. Nourbakhsh, "Interaction challenges in human–robot space exploration," ACM Interact., vol. 12, no. 2, pp. 42–45, 2005.

[4] A. Bauer, K. Klasing, G. Lidoris, Q. Mühlbauer, F. Rohrmüller, S. Sosnowski, T. Xu, K. Kühnlenz, D. Wollherr, and M. Buss, "The autonomous city explorer: Towards natural human–robot interaction in urban environments," Int. J. Soc. Robot., vol. 1, no. 2, pp. 127–140, 2009.

[5] T. Nakamura, T. Nagai, and N. Iwahashi, "Bag of multimodal LDA models for concept formation," in Proc. IEEE Int. Conf. Robot. Autom., May 2011, pp. 6233–6238.

[6] E. Topp and H. Christensen, "Topological modelling for human augmented mapping," in Proc. Int. Conf. Intell. Robots Syst., Beijing, China, 2006, pp. 2257–2263.

[7] B. Khaleghi, A. Khamis, and F. Karray, "Random finite set theoretic based soft/hard data fusion with application for target tracking," in Proc. Conf. Multisensor Fusion Integ. Intell. Syst., Salt Lake City, UT, 2010, pp. 50–55.

[8] D. Hall and J. Jordan, Human-Centered Information Fusion. Boston, MA: Artech House, 2010.

[9] M. Michalowski, S. Sabanovic, C. DiSalvo, D. Busquets, L. Hiatt, N. Melchior, and R. Simmons, "Socially distributed perception: Grace plays social tag at AAAI 2005," Autonom. Robots, vol. 22, pp. 385–397, 2007.

[10] T. Kaupp, "Probabilistic human–robot information fusion," Ph.D. dissertation, Sch. Aerosp., Mech. Mechatronic Eng., Univ. Sydney, N.S.W., Australia, 2008.

[11] M. Lewis, H. Wang, P. Velagapudi, P. Scerri, and K. Sycara, "Using humans as sensors in robotic search," in Proc. 12th Int. Conf. Inf. Fusion, Seattle, WA, 2009, pp. 1249–1256.

[12] F. Bourgault, A. Chokshi, J. Wang, D. Shah, J. Schoenberg, R. Iyer, F. Cedano, and M. Campbell, "Scalable Bayesian human–robot cooperation in mobile sensor networks," in Proc. Int. Conf. Intell. Robots Syst., 2008, pp. 2342–2349.

[13] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA: MIT Press, 2001.

[14] Y. Bar-Shalom, X. Li, and T. Kirubarajan, Estimation with Applications to Navigation and Tracking. New York: Wiley, 2001.

[15] T. Kaupp, A. Makarenko, F. Ramos, B. Upcroft, S. Williams, and H. Durrant-Whyte, "Adaptive human sensor model in sensor networks," in Proc. 8th Int. Conf. Inf. Fusion, 2005, vol. 1, pp. 748–755.

[16] T. Kaupp, A. Makarenko, S. Kumar, B. Upcroft, and S. Williams, "Operators as information sources in sensor networks," in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., 2005, pp. 936–941.

[17] T. Kaupp, B. Douillard, F. Ramos, A. Makarenko, and B. Upcroft, "Shared environment representation for a human–robot team performing information fusion," J. Field Robot., vol. 24, no. 11, pp. 911–942, 2007.

[18] M. Skubic, D. Perzanowski, S. Blisard, A. Schultz, W. Adams, M. Bugajska, and D. Brock, "Spatial language for human–robot dialogs," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 2, pp. 154–167, May 2004.

[19] A. Huang, S. Tellex,
