

    IEEE TRANSACTIONS ON ROBOTICS, VOL. 29, NO. 1, FEBRUARY 2013 189

    Bayesian Multicategorical Soft Data Fusion for Human–Robot Collaboration

    Nisar R. Ahmed, Member, IEEE, Eric M. Sample, and Mark Campbell, Member, IEEE

    Abstract: This paper considers Bayesian data fusion of conventional robot sensor information with ambiguous human-generated categorical information about continuous world states of interest. First, it is shown that such soft information can be generally modeled via hybrid continuous-to-discrete likelihoods that are based on the softmax function. A new hybrid fusion procedure, called variational Bayesian importance sampling (VBIS), is then introduced to combine the strengths of variational Bayes approximations and fast Monte Carlo methods to produce reliable posterior estimates for Gaussian priors and softmax likelihoods. VBIS is then extended to more general fusion problems that involve complex Gaussian mixture (GM) priors and multimodal softmax likelihoods, leading to accurate GM approximations of highly non-Gaussian fusion posteriors for a wide range of robot sensor data and soft human data. Experiments for hardware-based multitarget search missions with a cooperative human-autonomous robot team show that humans can serve as highly informative sensors through proper data modeling and fusion, and that VBIS provides reliable and scalable Bayesian fusion estimates via GMs.

    Index Terms: Bayesian methods, Gaussian mixtures, human–robot interaction, machine learning, Monte Carlo methods, recursive state estimation, robot sensor fusion, variational Bayes.

    I. INTRODUCTION

    In order to behave intelligently in complex environments, autonomous robots must continuously update their understanding of the world by combining new data from various sources. Despite considerable recent advances in autonomous robot control and perception, human inputs are still required in many practical settings to overcome various actuation/sensing limitations and ensure robustness in the presence of uncertainties. As such, data fusion plays an important role in the application of collaborative human–robot teams to diverse areas such as defense and security [1], search and rescue [2], space exploration [3], and social robotics [4]. However, looking beyond the ability to provide supervisory validation or training data for static abstract phenomena (e.g., categories for object types [5] or places [6]), the potential richness of human sensor data is often overlooked for robotics applications.

    Manuscript received December 18, 2011; revised May 28, 2012; accepted August 18, 2012. Date of publication September 12, 2012; date of current version February 1, 2013. This paper was recommended for publication by Associate Editor C. Stachniss and Editor D. Fox upon evaluation of the reviewers' comments. This work was supported in part by the National Science Foundation Graduate Research Fellowship Program and in part by AFOSR MURI FA9550-08-1-0356.

    The authors are with the Autonomous Systems Laboratory, Cornell University, Ithaca, NY 14850 USA.

    Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TRO.2012.2214556

    The problem considered here is the dynamic fusion of conventional robot sensor data (i.e., hard data) with ambiguous human-generated observations (i.e., soft data) related to uncertain continuous/physical world states of interest, e.g., object location, velocity, mass, temperature, etc. This study is motivated by the fact that maintaining full observability over all states of interest through robotic sensors alone can be challenging in many applications. For instance, as discussed in [7], a robot which is equipped with a 2-D horizontal scanning lidar can track the position and velocity of moving people, but will not have direct access to height, weight, or goal location information that could be used to improve target motion models. More importantly, all target states become unobservable if targets are occluded, confused with false alarms, or beyond sensor range for a long time. By acting as an externally available sensor in such cases, a helpful human agent can furnish the robot with relevant data that substantially reduce uncertainty or inconsistencies in desired state estimates, e.g., due to poor observability or previous fusion of faulty information. However, unlike hard data, soft data are difficult to model from first principles and are not guaranteed to be provided in a consistent manner, since they are highly context-specific and subject to uncertainties via psychocognitive factors (e.g., expertise, stress, fatigue, memory, and perception bias) [8].

    Given these considerations, soft data fusion has been explored in the context of several robotics applications, such as navigation by social interaction [4], [9], cooperative tracking and surveillance [10], and search and rescue [2], [11], [12]. However, formal modeling and fusion of soft data via the standard Bayesian state estimation paradigm [13], [14] have been considered in only a few relatively recent studies. Kaupp et al. developed a Bayesian method to fuse continuous soft range-with-bearing data to tracked objects by modeling human sensors via linear-Gaussian regression models, which were then incorporated into decentralized Kalman filters [15], [16]. The authors of [17] extended this work to include probabilistic models of human visual sensing, which were used to improve data association and object classification accuracy in joint human–robot tracking tasks. Bourgault et al. considered grid-based Bayesian fusion of binary human visual target detection likelihoods for a distributed 2-D search problem [12]. Importantly, however, these existing Bayesian approaches are inadequate to fuse information related through coarse/fuzzy terminology, which is a predominant feature of soft data [8]. Some examples include the following.

    1) “The car is moving quickly around the block; a bike is close behind it.”

    2) “Nothing is behind the building, on top of the roof, or near the truck to the left of me.”

    1552-3098/$31.00 © 2012 IEEE


    3) “The sidewalk is very steep; the nearby obstacle is much lighter than the robot.”

    The main issue at hand here is how such data can be statistically modeled and fused with hard robot data in a rigorous Bayesian manner. Structured codebook modeling strategies have already been successfully used in the development of human-assisted motion planners, which use probabilistic models of symbolic/linguistic motion primitives to infer constraints on robot paths (e.g., “go around the table and between the chairs”) [4], [18], [19]. However, the models that are developed for these planners are geared toward characterizing human motion commands and are thus unsuitable to extract dynamic state information from purely observational soft inputs. A mixed fuzzy-Bayesian modeling approach for soft–hard fusion was proposed by Mahler using random finite sets [20], in which soft linguistic observation codebook likelihoods are modeled via fuzzy set interpretations of virtual linear state measurements (this method was also adopted in [7]). However, such likelihood models cannot describe ambiguous reports with highly non-Gaussian uncertainties (e.g., range-only reports such as “the car is not too far from me”).

    Alternatively, it is proposed here to model such soft observations via multicategorical random variables that are conditionally dependent on the states of interest. As such, terms like “nearby” and “left of” imply uncertain discrete classifications of the continuous state by a human observer. Though less precise than typical continuous hard data (e.g., lidar, sonar, etc.), binary categorical data in the form of negative measurements have already proved quite useful for Bayesian state estimation in robotic mapping [13], localization [21], and object search/tracking [2], [22], [23]. However, dynamic estimation of continuous states from discrete multicategorical data requires approximation of an analytically intractable hybrid Bayesian inference problem. Various solutions exist in the estimation [24]–[26] and machine learning literature [27], [28], but these all have drawbacks that severely limit their suitability for online dynamic data fusion.

    This paper develops a novel recursive Bayesian fusion framework to efficiently combine hard robot data with soft multicategorical observations of dynamic continuous states. Three contributions are made in this regard. First, it is shown here that soft multicategorical observations can be generally modeled as discrete random variables via flexible hybrid likelihood functions that are based on softmax distributions, which are easily learnable from training data and have convenient properties for online state estimation. Second, a new variational Bayesian importance sampling (VBIS) algorithm is developed for reliable fusion of soft multicategorical data. The VBIS algorithm overcomes key limitations of other existing hybrid Bayesian inference algorithms and leads to the rigorous development of compact Gaussian mixture (GM) posterior approximations for general hard–soft fusion applications. Finally, the proposed fusion framework is demonstrated through online multitarget search experiments that involve a cooperative human–robot team operating under various sensing modalities and prior information conditions. The experimental results show that the proposed human sensor likelihood modeling approach, VBIS algorithm, and GM-based recursive fusion framework enable human collaborators to serve as effective information sources for robotic state estimation tasks. This paper builds on preliminary work in [29] and [30] by providing a more thorough explanation and experimental evaluation of the proposed human–robot data fusion framework.

    II. HUMAN–ROBOT DATA FUSION AND SOFT CATEGORICAL DATA MODELING

    A. General Problem Statement

    The Bayesian data fusion approach proposed here models soft (i.e., human-generated) descriptions of continuous states with discrete random variables that represent contextually distinct sets of state categorizations. The dependences of these discrete random variables on the state are modeled directly via flexible continuous-to-discrete hybrid likelihood functions, thus enabling recursive Bayesian estimation of the unknown continuous states from multicategorical soft data.

    For discrete time index $k \in \mathbb{Z}_0^+$, let $X_k \in \mathbb{R}^n$ be the continuous random state vector of interest with prior probability density function (pdf) $p(X_0)$ and transition pdf $p(X_k \mid X_{k-1})$ arising from known stochastic dynamics. Let $\zeta_k$ be a vector of hard robot sensor data, which may contain a mixture of continuous data (e.g., lidar returns) and discrete data (e.g., detection/no detection outputs from a vision-based object detector) with joint conditional observation likelihood $p(\zeta_k \mid X_k)$. Let $D_k$ be an $m$-valued discrete random variable that represents a categorical human observation, where $D_k$ has a conditional likelihood function $P(D_k = j \mid X_k)$ for $j \in \{1, \ldots, m\}$ and $m \in \mathbb{Z}^+$. The $m$ possible realizations of $D_k$ are assumed to be mutually exclusive and exhaustive so that $\sum_{j=1}^{m} P(D_k = j \mid X_k) = 1$. The sequences of all $\zeta_k$ and $D_k$ until time $k$ are denoted as $\zeta_{1:k} \equiv \{\zeta_1, \ldots, \zeta_k\}$ and $D_{1:k} \equiv \{D_1, \ldots, D_k\}$, respectively.

    This paper adopts a recursive Bayesian process to sequentially fuse $\zeta_{1:k}$ and $D_{1:k}$ information at each time step $k$ to update the pdf for $X_k$. Given $\zeta_{1:k-1}$ and $D_{1:k-1}$, the dynamics prediction step propagates the most recent pdf of $X_{k-1}$ forward in time via the Chapman–Kolmogorov equation [14]

    $$p(X_k \mid \zeta_{1:k-1}, D_{1:k-1}) = \int p(X_k \mid X_{k-1})\, p(X_{k-1} \mid \zeta_{1:k-1}, D_{1:k-1})\, dX_{k-1}. \tag{1}$$

    The robot measurement update step fuses the result of (1) with robot-generated information in $\zeta_k$ via Bayes' rule

    $$p(X_k \mid \zeta_{1:k}, D_{1:k-1}) = \frac{p(\zeta_k \mid X_k)\, p(X_k \mid \zeta_{1:k-1}, D_{1:k-1})}{\int p(\zeta_k \mid X_k)\, p(X_k \mid \zeta_{1:k-1}, D_{1:k-1})\, dX_k}. \tag{2}$$

    Finally, the human measurement update step fuses (2) with human-generated information in $D_k$ via Bayes' rule

    $$p(X_k \mid \zeta_{1:k}, D_{1:k}) = \frac{P(D_k \mid X_k)\, p(X_k \mid \zeta_{1:k}, D_{1:k-1})}{\int P(D_k \mid X_k)\, p(X_k \mid \zeta_{1:k}, D_{1:k-1})\, dX_k}. \tag{3}$$


    The main problem then is to determine the posterior pdf $p(X_k \mid \zeta_{1:k}, D_{1:k})$ (i.e., the filtering density), which represents the uncertainty in $X_k$ given all information up to time $k$.¹

    It is assumed without loss of generality that the pdfs in (1) and (2) can be, respectively, estimated by the prediction and measurement update steps of conventional filters, such as the (extended/unscented) Kalman filter [14], particle filter [24], or Gaussian sum filter [31].

    This paper focuses primarily on the human measurement update defined by (3) for an observation $D_k = j$; conditioning on $\zeta_{1:k}$ and $D_{1:k-1}$ is hereafter suppressed so that

    $$p(X_k) \equiv p(X_k \mid \zeta_{1:k}, D_{1:k-1}) \tag{4}$$

    $$p(X_k \mid D_k) \equiv p(X_k \mid \zeta_{1:k}, D_{1:k-1}, D_k = j) \tag{5}$$

    are the Bayesian prior and posterior in (3), respectively. Substituting these expressions into (3) gives

    $$p(X_k \mid D_k) = \frac{P(D_k \mid X_k)\, p(X_k)}{\int P(D_k \mid X_k)\, p(X_k)\, dX_k} = \frac{p(X_k, D_k)}{P(D_k)} \tag{6}$$

    where $p(X_k, D_k)$ is the joint pdf, and $P(D_k)$ is the marginal observation likelihood.

    For any given continuous state $X_k$, the number of possible realizations of $D_k$ can be quite large and must be suitably tailored for each practical application. Hence, just as raw lidar data or camera images must be processed to generate meaningful $\zeta_k$ data, soft observations are assumed to be processed by an application-dependent interpreter to generate contextually recognizable $D_k$ data. As in most human–robot interaction applications, such an interpreter could be based on a predefined communication protocol that relies on a dictionary of known descriptor models and contextual reference values, to ensure consistent communication [4], [19]. It is assumed for simplicity that the $m$ possible values of $D_k$ represent all desired human categorizations of $X_k$. However, $D_k$ can also be a vector whose elements are discrete random variables that represent different types of categories over arbitrary subsets of $X_k$ (e.g., separate range-only bins and bearing-only bins), in which case (3) is performed sequentially for each element of $D_k$.²

    Since $X_k$ is continuous and $D_k$ discrete, (6) defines a hybrid Bayesian inference problem [33], for which two key issues must be addressed:³ 1) how to specify an appropriate human sensor likelihood model $P(D_k \mid X_k)$, and 2) how to subsequently evaluate (6) for any given $p(X_k)$.

    B. Basic and Extended Softmax Models for Human Sensors

    For each $j \in \{1, \ldots, m\}$, $P(D_k = j \mid X_k)$ must map $X_k = x$ to the interval $[0, 1]$ such that $\sum_{j=1}^{m} P(D_k = j \mid X_k = x) = 1$.

    ¹If $\zeta_k = \emptyset$ or $D_k = \emptyset$, then (2) or (3) is skipped, accordingly.

    ²The vector model also allows binary categories to be defined, as in “nearby” versus “not nearby” and “next to” versus “not next to.” This offers an alternative to lumping “nearby” and “next to” into exclusive realizations of the same random variable, so that different likelihoods for similar labels are obtained as a function of $X_k$. However, the interpreter must then ensure that contradictory realizations within $D_k$ (i.e., where elements have joint likelihood of zero) are either avoided or handled via Bayesian conflict resolution [32].

    ³As shown in Section V, the techniques that are developed here for $D_k$ can be applied to categorical $\zeta_k$ data as well.

    While many functions satisfy this criterion, this study exclusively considers likelihoods that are defined via the softmax function

    $$P(D_k = j \mid X_k) = \frac{e^{w_j^T x + b_j}}{\sum_{h=1}^{m} e^{w_h^T x + b_h}} \tag{7}$$

    where $w_j, w_h \in \mathbb{R}^n$ and $b_j, b_h \in \mathbb{R}$ are, respectively, vector weights and scalar biases for classes $j, h \in \{1, \ldots, m\}$. The softmax function (also known as the multinomial logistic function) is widely used in statistical pattern recognition [34] and is naturally well suited to modeling hybrid continuous-to-discrete mappings in complex stochastic systems with state-dependent switching behavior [33], [35]. An interesting feature of (7) is that the log-odds ratio between any categories $j$ and $c$ for a given $X_k = x$ yields a linear hyperplane

    $$\log \frac{P(D_k = j \mid X_k)}{P(D_k = c \mid X_k)} = (w_j - w_c)^T x + (b_j - b_c) \tag{8}$$

    which implies that the probabilistic boundaries between categories for a given likelihood ratio are also linear and completely specified by the parameter sets $W = \{w_1, \ldots, w_m\}$ and $B = \{b_1, \ldots, b_m\}$. Note that the elements of $W$ control the steepness of the probability surface between categories and the locations of the class boundaries, while the elements of $B$ enable shifts from the origin. The authors of [35] prove that boundaries defined via (8) always lead to a complete convex decomposition of $\mathbb{R}^n$ so that $X_k$ can always be fully partitioned among the $m$ classes of $D_k$.
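    A minimal NumPy sketch of (7), followed by a numerical check of the log-odds identity (8). The weights and biases are random placeholders rather than a model from the paper, and `softmax_likelihood` is a hypothetical helper name.

```python
import numpy as np


def softmax_likelihood(x, W, b):
    """P(D_k = j | X_k = x) for every class j, following (7).

    W is an (m, n) array whose rows are w_j, b is an (m,) array of biases b_j,
    and x is an (n,) state vector.
    """
    logits = W @ x + b
    logits = logits - logits.max()  # subtracting a constant leaves (7) unchanged
    e = np.exp(logits)
    return e / e.sum()


rng = np.random.default_rng(0)
m, n = 4, 2
W = rng.normal(size=(m, n))  # placeholder parameters
b = rng.normal(size=m)
x = np.array([0.5, -1.0])
p = softmax_likelihood(x, W, b)

# Eq. (8): the log-odds between classes j and c is affine in x.
j, c = 0, 2
log_odds = np.log(p[j] / p[c])
affine = (W[j] - W[c]) @ x + (b[j] - b[c])
print(p.sum(), log_odds, affine)
```

    The max-subtraction step is a standard numerical safeguard; it cancels in the ratio (7), so it does not alter the model.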

    Fig. 1(a) shows one possible softmax likelihood model for a human providing one of 16 soft location labels (in terms of categorical ranges and bearings) to indicate the relative 2-D position $X_k = [X, Y]^T$ of an object relative to some arbitrary origin. This example shows how the model in (8) represents categorical ambiguities as a function of $X_k$; softer weights lead to fuzzier probability contours between class labels (in range directions, for this example), while steeper weights lead to nearly deterministic probabilities over geometrically convex regions defining classes (across bearing directions). $W$ and $B$ can be learned from labeled training data using convex optimization procedures that are based on maximum likelihood or maximum a posteriori estimation [34].
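    The text does not prescribe a particular solver for learning $W$ and $B$. The sketch below shows one simple possibility, not the authors' procedure: plain gradient descent on the negative log-posterior of a softmax model with an assumed zero-mean Gaussian prior (variance `prior_var`) on every parameter. The calibration data are synthetic and hypothetical: normalized ranges labeled "next to", "nearby", or "far from" by fixed thresholds.

```python
import numpy as np


def fit_softmax_map(X, y, m, prior_var=100.0, lr=0.5, iters=3000):
    """MAP fit of softmax parameters by gradient descent (convex objective).

    X: (N, n) states, y: (N,) integer labels in {0, ..., m-1}.
    Returns W (m, n), b (m,), and the per-iteration objective values.
    """
    N, n = X.shape
    Phi = np.hstack([X, np.ones((N, 1))])   # constant feature carries the bias
    theta = np.zeros((n + 1, m))            # column j stacks [w_j; b_j]
    Y = np.eye(m)[y]                        # one-hot labels
    history = []
    for _ in range(iters):
        logits = Phi @ theta
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        nll = -np.mean(np.log(P[np.arange(N), y]))
        penalty = 0.5 * np.sum(theta ** 2) / (prior_var * N)  # Gaussian prior term
        history.append(nll + penalty)
        grad = Phi.T @ (P - Y) / N + theta / (prior_var * N)
        theta -= lr * grad
    return theta[:n].T, theta[n], history


# Synthetic calibration data: normalized ranges with threshold-based labels.
rng = np.random.default_rng(1)
r = rng.uniform(0.0, 1.0, size=300)
labels = np.digitize(r, [0.2, 0.6])  # 0: "next to", 1: "nearby", 2: "far from"
W, b, hist = fit_softmax_map(r[:, None], labels, m=3)
pred = np.argmax(r[:, None] @ W.T + b, axis=1)
print("objective:", hist[0], "->", hist[-1], "training accuracy:", np.mean(pred == labels))
```

    Scaling the range feature to $[0, 1]$ keeps the objective's curvature small enough that the fixed step `lr = 0.5` decreases the objective at every iteration; a production fit would more likely use a quasi-Newton solver.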

    Equation (7) can be generalized by introducing hidden variables to induce nonconvex/multimodal categorical partitions of $X_k$. One such generalization is the multimodal softmax (MMS) model [36], which represents each observable class $j \in \{1, \ldots, m\}$ as a collection of $s_j$ hidden subclasses dependent on $X_k$ that are mutually exclusive and exhaustive, where $s_j \geq 1$, and $\sum_{j=1}^{m} s_j = S$ is the total number of subclasses. Let $R$ represent the hidden subclass variable, which can take values $r \in \{1, \ldots, S\}$,⁴ and define $D_k$ to be conditionally independent of $X_k$ given $R$ so that $P(D_k = j, R = r \mid X_k) = P(D_k = j \mid R = r)\, P(R = r \mid X_k)$. Furthermore, define $\sigma(j)$ to be the set of all $s_j$ subclasses of class $j$, where $\sigma(j) \cap \sigma(c) = \emptyset$ for $j \neq c$. If $P(D_k = j \mid R = r) = \mathbb{I}(r \in \sigma(j))$ (the indicator function) and $P(R = r \mid X_k)$ is defined via the softmax model, then marginalization of $R$ from $P(D_k, R \mid X_k)$ gives

    $$P(D_k = j \mid X_k) = \sum_{r=1}^{S} P(D_k = j \mid r)\, P(r \mid X_k) = \frac{\sum_{r \in \sigma(j)} e^{w_r^T x + b_r}}{\sum_{c=1}^{S} e^{w_c^T x + b_c}}. \tag{9}$$

    ⁴Assume without loss of generality that the subclasses are indexed sequentially in class order.

    Fig. 1. (a) Probability surfaces for example softmax likelihood model, where class labels take on a discrete range in {“Next To,” “Nearby,” “Far From”} and/or a canonical bearing {“N,” “NE,” “E,” “SE,” ..., “NW”}. (b) Probability surfaces for example MMS range-only model, where labels with similar range categories from (a) are treated as subclasses that define one geometrically convex class (“Next To” with $s_1 = 1$) and two nonconvex ones (“Nearby” with $s_2 = 6$ and “Far From” with $s_3 = 8$). (c) and (d) Calibration training data and learned MMS range-only probabilities for $D_k =$ “Nearby” for two different human subjects.

    Hence, the MMS likelihood for $D_k = j$ given $X_k$ is the sum of all $s_j$ subclass softmax likelihoods that are associated with class $j$. Given an appropriate subclass configuration $[s_1, \ldots, s_m]$, (9) can model an arbitrary continuous-to-discrete likelihood function using an embedded softmax model to produce piecewise linear class boundaries. Fig. 1(b) shows a simple example of an MMS model that is derived from the basic softmax model in Fig. 1(a). In this example, the MMS subclass weights are directly obtained from the model in Fig. 1(a), as any basic softmax model can be trivially converted to an MMS model. However, it is also generally possible to estimate MMS model parameters directly from training data using maximum likelihood or Bayesian learning techniques, when a basic softmax model is unavailable [36]. Fig. 1(c) and (d) shows estimated MMS range-only models for two different human sensors using maximum likelihood learning with actual data. The labeled $(X_k, D_k)$ training data points shown in these plots were acquired through an experimental calibration procedure that requires human subjects to provide $D_k$ observations under controlled conditions, where $X_k$ is known exactly. This principled statistical procedure is very similar to the one described by Kaupp [10] to model continuous range-with-bearing human observations, except that discrete multicategorical data are recorded instead of continuous data, and nonlinear optimization techniques are used for offline softmax/MMS model identification instead of linear regression.
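    The MMS likelihood (9) amounts to a subclass softmax followed by a sum over each class's subclasses. The 1-D example below is hypothetical: class 0 ("next to") has one subclass, while class 1 ("nearby") has two subclasses on either side of the origin, so its probability is high near $x = \pm 3$ but low at $x = 0$. That nonconvex class is something a two-class basic softmax model in 1-D could not represent. The helper name `mms_likelihood` and all parameter values are illustrative assumptions.

```python
import numpy as np


def mms_likelihood(x, W, b, sigma):
    """Eq. (9): P(D_k = j | X_k = x) for each observable class j.

    W: (S, n) subclass weights, b: (S,) subclass biases,
    sigma: list of index arrays, sigma[j] = subclasses belonging to class j.
    """
    logits = W @ x + b
    logits = logits - logits.max()
    sub = np.exp(logits)
    sub /= sub.sum()                              # subclass softmax P(R = r | X_k = x)
    return np.array([sub[idx].sum() for idx in sigma])


W = np.array([[0.0], [-2.0], [2.0]])              # subclasses: center, left, right
b = np.array([0.0, -3.0, -3.0])
sigma = [np.array([0]), np.array([1, 2])]         # class 0: "next to"; class 1: "nearby"

for x in (-3.0, 0.0, 3.0):
    print(x, mms_likelihood(np.array([x]), W, b, sigma))
```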

    C. Hybrid Bayesian Inference for Soft Data Fusion

    Although softmax-based functions are well suited to modeling $P(D_k \mid X_k)$, they unfortunately do not lead to closed-form posteriors $p(X_k \mid D_k)$ for any choice of $p(X_k)$. For instance, substituting (7) into (6) for any $p(X_k)$ yields

    $$p(X_k \mid D_k) = \frac{1}{C}\, p(X_k)\, \frac{\exp(w_j^T x + b_j)}{\sum_{h=1}^{m} \exp(w_h^T x + b_h)} \tag{10}$$

    where

    $$C = \int p(X_k)\, \frac{\exp(w_j^T x + b_j)}{\sum_{h=1}^{m} \exp(w_h^T x + b_h)}\, dX_k. \tag{11}$$

    Equation (10) cannot be represented in closed form since the integral for the normalization constant $C$ has no analytical solution for general $p(X_k)$. Furthermore, even when $p(X_k)$ is a well-behaved pdf such as a uniform or Gaussian pdf, the softmax denominator in (10) cannot be absorbed along with the numerator and prior into a known parametric pdf family. Therefore, (6) must be approximated, as in all hybrid Bayesian inference problems that involve continuous-to-discrete dependences [33].
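    To see concretely what (11) requires, the sketch below evaluates $C$ for a 1-D Gaussian prior and a hypothetical three-class softmax model in two brute-force numerical ways: a fine Riemann sum and a Monte Carlo average over prior samples. Because the classes are exhaustive, the values of $C$ over all $j$ must sum to one, which gives a simple sanity check. The prior, model parameters, and helper name `softmax_rows` are assumptions for illustration.

```python
import numpy as np


def softmax_rows(x, w, b):
    """Eq. (7) for scalar states x of shape (N,): returns (N, m) class probabilities."""
    logits = np.outer(x, w) + b
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


mu, var = 1.0, 4.0                                # hypothetical prior N(mu, var)
w = np.array([-1.5, 0.0, 1.5])                    # hypothetical 3-class model
b = np.array([0.0, 1.0, 0.0])

# Riemann sum of (11) on a wide, fine grid (one value of C per class j).
grid = np.linspace(mu - 12 * np.sqrt(var), mu + 12 * np.sqrt(var), 20001)
dx = grid[1] - grid[0]
prior = np.exp(-0.5 * (grid - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
C_grid = (prior[:, None] * softmax_rows(grid, w, b)).sum(axis=0) * dx

# Monte Carlo estimate: average the likelihood over samples drawn from the prior.
rng = np.random.default_rng(0)
samples = rng.normal(mu, np.sqrt(var), size=200_000)
C_mc = softmax_rows(samples, w, b).mean(axis=0)
print(C_grid, C_mc)
```

    Both approaches work in one dimension but offer no closed form, and their cost grows quickly with $n$, which motivates the approximations discussed next.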

    Although standard EKF/UKF updates are not applicable, grid-based [2], [13] or Monte Carlo particle approximations [13], [23], [24] of (6) could be used. Grids naturally support recursive Bayesian fusion with arbitrary priors and likelihoods, although they scale poorly with state dimension $n$, do not provide a compact posterior representation, and do not mesh easily with typical filters for $\zeta_k$ data (e.g., EKFs/UKFs). Particle filter approximations overcome the latter problem, but do not provide a compact approximation if many samples are needed. In principle, particles could be compressed into a single Gaussian pdf for filtering [25], although this leads to significant information loss when (6) is highly non-Gaussian or multimodal. While particles could also be compressed to flexible GM pdfs via online EM learning [25], [26], this is prone to poor local maxima and high computational expense. Particle approximations also require special care to ensure accuracy and mitigate undesirable phenomena such as sample degeneracy. For instance, the performance of the standard bootstrap particle filter (BPF) [24] can degrade significantly if $n$ is large or the observation likelihood is small, e.g., for a surprising observation [33].
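    The degeneracy problem for surprising observations can be reproduced in a few lines. The sketch below treats samples from a hypothetical prior $\mathcal{N}(0, 1)$ as bootstrap particles, weights them by a hypothetical binary softmax likelihood under which the second class dominates only for $x > 4$, and reports the Kish effective sample size $(\sum_i \omega_i)^2 / \sum_i \omega_i^2$ of the resulting weights $\omega_i$. The model, sample size, and helper name `effective_sample_size` are illustrative assumptions, not results from the paper.

```python
import numpy as np


def effective_sample_size(weights):
    """Kish effective sample size of (unnormalized) importance weights."""
    return weights.sum() ** 2 / np.sum(weights ** 2)


rng = np.random.default_rng(0)
N = 10_000
particles = rng.normal(0.0, 1.0, size=N)          # bootstrap proposal = prior N(0, 1)

# Hypothetical binary softmax likelihood (7): class 2 needs x > 4 to dominate.
w = np.array([0.0, 5.0])
b = np.array([0.0, -20.0])
logits = np.outer(particles, w) + b
logits -= logits.max(axis=1, keepdims=True)
lik = np.exp(logits)
lik /= lik.sum(axis=1, keepdims=True)

ess_expected = effective_sample_size(lik[:, 0])    # report D_k = 1 (unsurprising)
ess_surprising = effective_sample_size(lik[:, 1])  # report D_k = 2 (surprising)
print(f"ESS for D_k = 1: {ess_expected:.0f} of {N}; for D_k = 2: {ess_surprising:.1f} of {N}")
```

    For the surprising report, nearly all prior samples receive negligible weight, so the posterior is effectively represented by a handful of particles.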

    Another possible approach to hybrid Bayesian inference comes from variational Bayes (VB) methods, which attempt to maximize the similarity between analytically intractable posteriors and well-behaved posterior approximation pdfs that are defined through freely optimizable parameters [34]. Murphy proposed a local VB lower bound approximation to (6) for the special case of Gaussian priors and $m = 2$ binary logistic likelihood functions [27]; this approximation uses the fact that the posterior pdf is well approximated by a Gaussian pdf and admits a lower bound to $C$ through a convex lower bound to the logistic likelihood function proposed in [37]. While this VB approach leads to a scalable, deterministic, and accurate Gaussian approximation of the true posterior with guarantees on $C$, it is limited to $P(D_k \mid X_k)$ for the special case of $m = 2$. Furthermore, the VB posterior leads to an optimistic posterior covariance estimate, which is highly undesirable in recursive state estimation [14]. Bouchard [28] proposes to generalize the VB method to softmax likelihoods with $m \geq 2$, but only considers the dual problem to infer $(W, B)$ from $(X_k, D_k)$ training data for simple softmax models and thus does not generalize Murphy's method to approximate (6), present solutions to the persistent optimistic covariance issue, or consider MMS models for multimodal posteriors.

    These issues are tackled next through new hybrid inference approximations that not only generalize VB approximations to $m \geq 2$ softmax likelihood functions, but also address the optimistic VB posterior covariance issue via novel application of fast Monte Carlo importance sampling (IS), generalize to inference with multimodal posterior distributions (induced by non-Gaussian priors and MMS likelihoods), and guarantee convergence to unique solutions (i.e., no poor local minima). These approximations naturally lead to a fusion framework based on compact GM pdf approximations, which are especially desirable for human–robot fusion applications since they 1) lead to computational costs that scale well with $n$ and number of categories $m$ and 2) greatly facilitate online storage, communication, and fusion with hard $\zeta_k$ data.

    III. BASELINE FUSION: GAUSSIAN-SOFTMAX INFERENCE

    A. Baseline Variational Bayes Approximation

    Assume a Gaussian prior $p(X_k) = \mathcal{N}(\mu, \Sigma)$ with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, and let $P(D_k \mid X_k)$ be given by (7) for $m \geq 2$. The local VB approximation derived here uses the fact that the analytically intractable joint pdf $p(X_k, D_k)$ in (6) can be well approximated by an unnormalized Gaussian lower bound pdf; this in turn leads to a VB Gaussian posterior approximation $\hat{p}(X_k \mid D_k)$ upon renormalization that also guarantees a lower bound to $C$. This proposed VB inference approach generalizes the method [27] derived for the special case of $m = 2$.

    Let $f(D_k, X_k)$ be an unnormalized Gaussian function that approximates the softmax likelihood $P(D_k \mid X_k)$. The joint pdf and normalization constant (11) are approximated as

    $$p(X_k, D_k) \approx \hat{p}(X_k, D_k) = p(X_k)\, f(D_k, X_k) \tag{12}$$

    $$C \approx \hat{C} = \int \hat{p}(X_k, D_k)\, dX_k. \tag{13}$$

    Note that $\hat{p}(X_k, D_k)$ is an unnormalized Gaussian, since it is the product of a Gaussian pdf and an unnormalized Gaussian function. This permits $\hat{C}$ to be evaluated in closed form as an approximation to the marginal likelihood of the discrete observation, $C = P(D_k = j)$, in (6).
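    For reference, the closed form alluded to here follows from completing the square. Writing an unnormalized Gaussian in the canonical (information) form $\exp\{g + h^T x - \frac{1}{2} x^T K x\}$ with $K$ positive definite, which is the form used for $f$ and $\hat{p}$ below, the standard identity is:

```latex
\begin{aligned}
g + h^T x - \tfrac{1}{2} x^T K x
  &= g + \tfrac{1}{2} h^T K^{-1} h - \tfrac{1}{2}\,(x - K^{-1}h)^T K\,(x - K^{-1}h), \\
\int_{\mathbb{R}^n} \exp\!\left\{ g + h^T x - \tfrac{1}{2} x^T K x \right\} dx
  &= \exp\!\left\{ g + \tfrac{1}{2} h^T K^{-1} h \right\} (2\pi)^{n/2}\, |K|^{-1/2}.
\end{aligned}
```

    Applied to the canonical parameters of $\hat{p}(X_k, D_k)$, this gives $\log \hat{C}$ as an explicit function of those parameters.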

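    For $m \geq 2$, $f(D_k, X_k)$ is derived here via the upper bound to the problematic softmax denominator in (7) proposed in [28], which uses a variational product of $m$ unnormalized Gaussians. Specifically, for any set of scalars $\alpha$, $\xi_c$, and $y_c$ for $c \in \{1, \ldots, m\}$, [28] proves that

    $$\log \sum_{c=1}^{m} e^{y_c} \leq \alpha + \sum_{c=1}^{m} \left[ \frac{y_c - \alpha - \xi_c}{2} + \lambda(\xi_c)\left[(y_c - \alpha)^2 - \xi_c^2\right] + \log\left(1 + e^{\xi_c}\right) \right] \tag{14}$$

    where $\lambda(\xi_c) = \frac{1}{2\xi_c}\left[\frac{1}{1 + e^{-\xi_c}} - \frac{1}{2}\right]$ and $y_c = w_c^T x + b_c$.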
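    The bound (14) is easy to check numerically. The sketch below codes its right-hand side, with the hypothetical helper `lam` implementing $\lambda(\xi_c)$ and using its limit $1/8$ at $\xi_c = 0$, and compares it with $\log \sum_c e^{y_c}$ for random inputs. It also illustrates that choosing $\xi_c = |y_c - \alpha|$ makes each logistic term exact, leaving $\alpha + \sum_c \log(1 + e^{y_c - \alpha})$. All test values are illustrative.

```python
import numpy as np


def lam(xi):
    """lambda(xi) in (14): (1 / (2 xi)) * (1 / (1 + exp(-xi)) - 1/2); equals 1/8 at xi = 0."""
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) < 1e-8
    safe = np.where(small, 1.0, xi)                # avoid division by zero
    return np.where(small, 0.125, (1.0 / (1.0 + np.exp(-safe)) - 0.5) / (2.0 * safe))


def bound_rhs(y, alpha, xi):
    """Right-hand side of (14), an upper bound on log(sum_c exp(y_c))."""
    return alpha + np.sum((y - alpha - xi) / 2.0
                          + lam(xi) * ((y - alpha) ** 2 - xi ** 2)
                          + np.log1p(np.exp(xi)))


def logsumexp(y):
    """Numerically stable log(sum(exp(y)))."""
    top = np.max(y)
    return top + np.log(np.sum(np.exp(y - top)))


rng = np.random.default_rng(0)
y = rng.normal(size=5)
alpha = 0.4
xi_opt = np.abs(y - alpha)                          # per-term optimum, cf. (20)
print(logsumexp(y), bound_rhs(y, alpha, xi_opt))
```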

    The variables $\alpha$ and $\xi_c$ are free variational parameters; given $y_c$, $\alpha$ and $\xi_c$ are selected to minimize the upper bound in (14), thus providing the tightest possible upper bounding approximation to the denominator of (7). Assume for now that $\alpha$ and $\xi_c$ are known (the selection of $\alpha$ and $\xi_c$ is considered in the next section). From (7), it follows that

    $$\log P(D_k = j \mid X_k) = w_j^T x + b_j - \log \sum_{c=1}^{m} e^{w_c^T x + b_c}.$$

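    After replacing the second term on the right-hand side with the bound in (14), subsequent simplification gives

    $$f(D_k = j, X_k) = \exp\left\{ g_j + h_j^T x - \frac{1}{2} x^T K_j x \right\}$$

    where

    $$g_j = \frac{1}{2}\Big[b_j - \sum_{c \neq j} b_c\Big] + \alpha\left(\frac{m}{2} - 1\right) + \sum_{c=1}^{m} \left[\frac{\xi_c}{2} + \lambda(\xi_c)\left[\xi_c^2 - (\alpha - b_c)^2\right] - \log\left(1 + e^{\xi_c}\right)\right]$$

    $$h_j = \frac{1}{2}\Big[w_j - \sum_{c \neq j} w_c\Big] + 2 \sum_{c=1}^{m} \lambda(\xi_c)(\alpha - b_c)\, w_c$$

    $$K_j = 2 \sum_{c=1}^{m} \lambda(\xi_c)\, w_c w_c^T \tag{15}$$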

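    and where $f(D_k, X_k) \leq P(D_k \mid X_k)$ follows from (14). Since the prior can also be expressed as

    $$p(X_k) = \exp\left\{ g_p + h_p^T x - \frac{1}{2} x^T K_p x \right\}$$

    where

    $$g_p = -\frac{1}{2}\left(\log|2\pi\Sigma| + \mu^T K_p \mu\right), \quad h_p = K_p \mu, \quad K_p = \Sigma^{-1} \tag{16}$$

    substitution of (16) and (15) into (12) gives the unnormalized Gaussian joint pdf approximation

    $$\hat{p}(X_k, D_k) = \exp\left\{ g_l + h_l^T x - \frac{1}{2} x^T K_l x \right\}$$

    $$g_l = g_p + g_j, \quad h_l = h_p + h_j, \quad K_l = K_p + K_j. \tag{17}$$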
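    A hedged NumPy transcription of (15)–(17) for given values of $\alpha$ and $\xi_c$. The helper names `lam`, `softmax_factor`, and `gaussian_canonical`, as well as all parameter values, are hypothetical. The test confirms numerically, for random placeholder parameters, that $\log f(D_k = j, X_k) \le \log P(D_k = j \mid X_k)$, as implied by (14).

```python
import numpy as np


def lam(xi):
    """lambda(xi) from (14), with its limit 1/8 at xi = 0."""
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) < 1e-8
    safe = np.where(small, 1.0, xi)
    return np.where(small, 0.125, (1.0 / (1.0 + np.exp(-safe)) - 0.5) / (2.0 * safe))


def softmax_factor(j, W, b, alpha, xi):
    """Canonical parameters (g_j, h_j, K_j) of f(D_k = j, X_k) in (15).

    W: (m, n) rows w_c; b: (m,) biases; alpha: scalar; xi: (m,) variational parameters.
    """
    m = W.shape[0]
    L = lam(xi)
    others = np.arange(m) != j
    g = (0.5 * (b[j] - b[others].sum()) + alpha * (m / 2.0 - 1.0)
         + np.sum(xi / 2.0 + L * (xi ** 2 - (alpha - b) ** 2) - np.log1p(np.exp(xi))))
    h = 0.5 * (W[j] - W[others].sum(axis=0)) + 2.0 * (L * (alpha - b)) @ W
    K = 2.0 * (W.T * L) @ W                         # sum_c 2 lambda(xi_c) w_c w_c^T
    return g, h, K


def gaussian_canonical(mu, Sigma):
    """(16): canonical parameters (g_p, h_p, K_p) of N(mu, Sigma)."""
    Kp = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * Sigma)
    return -0.5 * (logdet + mu @ Kp @ mu), Kp @ mu, Kp


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                         # placeholder softmax model
b = rng.normal(size=3)
alpha = 0.3                                         # placeholder variational parameters
xi = rng.uniform(0.5, 2.0, size=3)
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

gj, hj, Kj = softmax_factor(1, W, b, alpha, xi)
gp, hp, Kp = gaussian_canonical(mu, Sigma)
gl, hl, Kl = gp + gj, hp + hj, Kp + Kj              # joint approximation (17)
```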


    Normalization of (17) gives the desired variational Gaussian posterior pdf approximation for $m \geq 2$

    $$\hat{p}(X_k \mid D_k) = \mathcal{N}(\mu_{VB}, \Sigma_{VB}) \tag{18}$$

    where

    $$\Sigma_{VB} = K_l^{-1}, \quad \mu_{VB} = K_l^{-1} h_l. \tag{19}$$

The approximate posterior mean and covariance updates in (19) for discrete measurements closely resemble the corresponding continuous measurement updates for the Kalman information filter [14]. With this resemblance in mind, an examination of $K_j$ and $h_j$ suggests that the softmax weight vectors $w_j$ determine the average information content about $X_k$ contained in each category $j$. This is intuitively reasonable. As shown in Fig. 1, large-magnitude weights indicate sharp log-odds boundaries between classes in (8), that is, less ambiguity and greater separability between discrete classes as a function of $X_k$. This leads to more informative updates for $X_k$, since $P(D_k \mid X_k)$ squashes $p(X_k)$ more strongly via (6). Note also that $\hat{\Sigma}_{VB}$ is independent of the actual discrete observation $D_k = j$, just as covariance/information matrix updates for the Kalman filter are independent of observed continuous measurements.
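As a minimal illustration of (15)–(19), the Python sketch below computes $g_j$, $h_j$, and $K_j$ for a scalar state ($n = 1$) with $\alpha$ and $\xi_c$ held fixed, and then forms the information-form update. It assumes the usual Jaakkola–Jordan definition $\lambda(\xi) = \frac{1}{2\xi}\big[(1+e^{-\xi})^{-1} - \frac{1}{2}\big]$ for the bound in (14). The function names are illustrative only.

```python
import math


def lam(xi):
    """lambda(xi) = (sigmoid(xi) - 1/2) / (2 xi); its limit at xi = 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return (1.0 / (1.0 + math.exp(-xi)) - 0.5) / (2.0 * xi)


def softplus(x):
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bound_coeffs(w, b, j, alpha, xi):
    """Scalar g_j, h_j, K_j of (15) for softmax weights w, biases b, class j."""
    m = len(w)
    lams = [lam(x) for x in xi]
    K = 2.0 * sum(l * wc * wc for l, wc in zip(lams, w))
    h = 0.5 * (w[j] - sum(w[c] for c in range(m) if c != j)) \
        + 2.0 * sum(l * (alpha - bc) * wc for l, bc, wc in zip(lams, b, w))
    g = 0.5 * (b[j] - sum(b[c] for c in range(m) if c != j)) \
        + alpha * (m / 2.0 - 1.0) \
        + sum(x / 2.0 + l * (x * x - (alpha - bc) ** 2) - softplus(x)
              for x, l, bc in zip(xi, lams, b))
    return g, h, K


def vb_update(mu_p, var_p, w, b, j, alpha, xi):
    """Information-form update (16)-(19): add prior and bound terms, then invert."""
    _, h_j, K_j = bound_coeffs(w, b, j, alpha, xi)
    K_l = 1.0 / var_p + K_j        # K_p = Sigma^{-1}
    h_l = mu_p / var_p + h_j       # h_p = K_p mu
    return h_l / K_l, 1.0 / K_l    # (mu_VB, Sigma_VB)


def log_softmax(w, b, j, x):
    """Exact log P(D_k = j | x) for comparison with the bound."""
    ys = [wc * x + bc for wc, bc in zip(w, b)]
    ymax = max(ys)
    return ys[j] - ymax - math.log(sum(math.exp(y - ymax) for y in ys))
```

`K_j` never depends on `j`, so the returned variance is the same whichever category is reported. This is the scalar version of the observation-independence of $\hat{\Sigma}_{VB}$ noted above; only the mean changes with $D_k$.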

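Variational Parameter Optimization: Analytical minimization of the right-hand side of (14) with respect to the free variational parameters $\alpha$ and $\xi_c$ gives

$$\xi_c^2 = y_c^2 + \alpha^2 - 2\alpha y_c \tag{20}$$

$$\alpha = \frac{\frac{m-2}{4} + \sum_{c=1}^{m} \lambda(\xi_c)\, y_c}{\sum_{c=1}^{m} \lambda(\xi_c)}. \tag{21}$$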

However, these formulas cannot be used to compute (18) directly, since $y_c$ depends on $X_k$, which is unobserved. Therefore, following the same strategy as [27] for the $m = 2$ case, the variational parameters are chosen to minimize the expected value of (14) with respect to the posterior. This is equivalent to maximizing the approximate marginal log-likelihood of the observation $D_k = j$:

$$\log \hat{C} = \log \int \hat{p}(X_k, D_k)\, dX_k \tag{22}$$

where $\log \hat{C} \le \log C$. Equation (22) can be expressed in closed form via standard Gaussian identities. However, maximizing (22) directly with respect to $\alpha$ and $\xi_c$ involves cumbersome calculation of highly nonlinear gradient and Hessian terms. The expectation–maximization (EM) algorithm [34] can instead be invoked to iteratively optimize $\alpha$ and $\xi_c$ via expected values of (20) and (21), while alternately updating $\hat{p}(X_k \mid D_k)$ via simple closed-form expressions. The EM procedure is given in Algorithm 1. At each E step, the $y_c$ terms in (20) and (21) are replaced by their expected values under the current $\hat{p}(X_k \mid D_k)$ estimate:

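$$\langle y_c \rangle = w_c^T \hat{\mu}_{VB} + b_c \tag{23}$$

$$\langle y_c^2 \rangle = w_c^T\big(\hat{\Sigma}_{VB} + \hat{\mu}_{VB}\hat{\mu}_{VB}^T\big) w_c + 2 w_c^T \hat{\mu}_{VB}\, b_c + b_c^2. \tag{24}$$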

Since (20) and (21) are coupled, an extra iterative resubstitution loop is needed for convergence of $\xi_c$ and $\alpha$ ($n_{lc} = 15$ iterations were sufficient for this paper's studies).

It is straightforward to show that $p(X_k \mid D_k)$ is log-concave, which means that the exact baseline Gaussian-softmax posterior is unimodal. Hence, Algorithm 1 satisfies the necessary and sufficient condition derived in [38] to guarantee monotonic convergence to a unique set of variational parameters for the local VB lower bound Gaussian approximation. Convergence can be gauged by evaluating the change in (22) after each M step, where

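$$\begin{aligned}
\log \hat{C} = {} & \langle y_j \rangle - \alpha + \sum_{c=1}^{m} \Big[ \tfrac{1}{2}\big(\alpha + \xi_c - \langle y_c \rangle\big) - \lambda(\xi_c)\big(\langle y_c^2 \rangle - 2\alpha\langle y_c \rangle + \alpha^2 - \xi_c^2\big) - \log\big(1 + e^{\xi_c}\big) \Big] \\
& + \frac{n}{2} - \frac{1}{2}\Big[ \log\frac{|\Sigma|}{|\hat{\Sigma}_{VB}|} + \mathrm{tr}\big(\Sigma^{-1}\hat{\Sigma}_{VB}\big) + (\mu - \hat{\mu}_{VB})^T \Sigma^{-1} (\mu - \hat{\mu}_{VB}) \Big]
\end{aligned} \tag{25}$$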

and most of the required terms are already used in the E and M steps. However, it is often more convenient to monitor convergence of $\hat{\mu}_{VB}$ between iterations and evaluate the lower bound (25) at the end, if desired.
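The E and M steps can be sketched for a scalar state as follows. This is an assumed reading of Algorithm 1, whose listing is not reproduced here. The initial $\xi_c$, the convergence test on the posterior mean and variance, and all names are illustrative choices. `log_C_hat` evaluates (25) with $n = 1$. Because (25) equals $\mathbb{E}_q[\log f] - \mathrm{KL}(q \,\|\, p)$, it stays below $\log C$ for any Gaussian $q$.

```python
import math


def lam(xi):
    """lambda(xi) = (sigmoid(xi) - 1/2) / (2 xi), with limit 1/8 at xi = 0."""
    if abs(xi) < 1e-8:
        return 0.125
    return (1.0 / (1.0 + math.exp(-xi)) - 0.5) / (2.0 * xi)


def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def posterior(mu_p, var_p, w, b, j, alpha, xi):
    """Closed-form Gaussian update (15)-(19) for scalar X_k and fixed alpha, xi."""
    m = len(w)
    lams = [lam(x) for x in xi]
    K_j = 2.0 * sum(l * wc * wc for l, wc in zip(lams, w))
    h_j = 0.5 * (w[j] - sum(w[c] for c in range(m) if c != j)) \
        + 2.0 * sum(l * (alpha - bc) * wc for l, bc, wc in zip(lams, b, w))
    K_l = 1.0 / var_p + K_j
    return (mu_p / var_p + h_j) / K_l, 1.0 / K_l


def moments(mu, var, w, b):
    """E step: <y_c> and <y_c^2> under N(mu, var), eqs. (23)-(24)."""
    Ey = [wc * mu + bc for wc, bc in zip(w, b)]
    Ey2 = [wc * wc * (var + mu * mu) + 2.0 * wc * mu * bc + bc * bc
           for wc, bc in zip(w, b)]
    return Ey, Ey2


def vb_em(mu_p, var_p, w, b, j, alpha=0.0, n_lc=15, tol=1e-6, max_iter=500):
    m = len(w)
    xi = [1.0] * m                      # arbitrary initial guess
    mu, var = posterior(mu_p, var_p, w, b, j, alpha, xi)
    for it in range(1, max_iter + 1):
        Ey, Ey2 = moments(mu, var, w, b)
        for _ in range(n_lc):           # resubstitute the coupled (20)-(21)
            xi = [math.sqrt(max(e2 - 2.0 * alpha * e1 + alpha ** 2, 0.0))
                  for e1, e2 in zip(Ey, Ey2)]
            lams = [lam(x) for x in xi]
            alpha = ((m - 2) / 4.0 + sum(l * e1 for l, e1 in zip(lams, Ey))) / sum(lams)
        mu_new, var_new = posterior(mu_p, var_p, w, b, j, alpha, xi)
        done = abs(mu_new - mu) < tol and abs(var_new - var) < tol
        mu, var = mu_new, var_new
        if done:
            break
    return mu, var, alpha, xi, it


def log_C_hat(mu_p, var_p, w, b, j, alpha, xi, mu, var):
    """Lower bound (25) on log C for n = 1."""
    Ey, Ey2 = moments(mu, var, w, b)
    val = Ey[j] - alpha
    for c in range(len(w)):
        val += 0.5 * (alpha + xi[c] - Ey[c]) \
            - lam(xi[c]) * (Ey2[c] - 2.0 * alpha * Ey[c] + alpha ** 2 - xi[c] ** 2) \
            - softplus(xi[c])
    kl = 0.5 * (math.log(var_p / var) + var / var_p + (mu - mu_p) ** 2 / var_p - 1.0)
    return val - kl
```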

    B. Improved VB Approximation With Importance Sampling

Fig. 2. 1-D Bayesian update example for standard normal Gaussian prior (green) and binary softmax likelihood (blue), showing true posterior (magenta), VB softmax lower bound (black dash), and approximate joint pdf (red dash) for (a) small (soft) softmax weights and (b) large (steep) softmax weights. Renormalized posteriors are shown in (c) and (d), respectively, with corresponding $C$ and $\hat{C}$ values.

Fig. 2(a) and (b) show that (17) is generally a close lower bound approximation of the true joint pdf. As shown in Fig. 2(c) and (d), a key benefit of the VB approximation is that, upon renormalization, $\hat{\mu}_{VB}$ closely approximates the true mean $\mu_{\mathrm{true\,post}}$ of (6). Loosely speaking, this effect stems from the fact that Algorithm 1 returns $\xi_c$ and $\alpha$ values that maximize the softmax lower bound (15) on average. Since $\xi_c$ and $\alpha$ can be uniquely determined from $x_k$ via (20) and (21), the optimized $\xi_c$ and $\alpha$ tend to correspond to $X_k$ values near the posterior average (i.e., mean).⁵ However, Fig. 2(c) and (d) also show that, since $\hat{C} \le C$, the approximate posterior (18) obtained by dividing (17) by $\hat{C}$ no longer lower bounds the true posterior $p(X_k \mid D_k)$. In fact, even if $\hat{p}(X_k, D_k)$ and $p(X_k, D_k)$ are quite similar, multiplication of (17) by $\hat{C}^{-1} \ge C^{-1}$ forces (18) to be more concentrated around its peak than (6). Therefore, $\hat{\Sigma}_{VB}$ is optimistic relative to the true posterior covariance $\Sigma_{\mathrm{true\,post}}$.⁶ The accuracy of $\hat{\mu}_{VB}$ can be outweighed by the optimism in $\hat{\Sigma}_{VB}$, since this can lead to severe overconfidence and inconsistencies during recursive Bayesian fusion. The bound $\hat{C} \le C$ also produces a small bias in $\hat{\mu}_{VB}$ relative to $\mu_{\mathrm{true\,post}}$. However, the unimodality of $p(X_k \mid D_k)$ and the fact that $\hat{\mu}_{VB}$ is close to $\mu_{\mathrm{true\,post}}$ can be exploited by another fast estimation procedure to significantly improve $\hat{\mu}_{VB}$ and $\hat{\Sigma}_{VB}$ in (18). Monte Carlo IS [39] is particularly well suited to this end, since arbitrary moments of (6) can be quickly estimated using an importance distribution $q(X_k)$ that roughly corresponds to (6). Specifically, given $N_s$ samples $\{x^i\}_{i=1}^{N_s} \in X_k$ drawn from $q(X_k)$, IS approximates the expectation of an arbitrary function $z(X_k)$ with respect to $p(X_k \mid D_k)$ as

$$\langle z(X_k) \rangle \approx \sum_{i=1}^{N_s} \omega_i\, z(x^i), \qquad \omega_i \propto \frac{p(x^i)\, P(D_k \mid x^i)}{q(x^i)} \tag{26}$$

where $\omega_i$ is the importance weight for sample $i$. The desired estimates correspond to $z = X_k$ and $z = (X_k - \langle X_k \rangle)(X_k - \langle X_k \rangle)^T$. Note that (26) uses the fact that $p(X_k \mid D_k)$ only needs to be known up to a normalizing constant, so the joint pdf $p(x^i, D_k) = p(x^i)P(D_k \mid x^i)$ can be used to compute $\omega_i$. As is standard practice, the $\omega_i$ are renormalized to sum to 1 [39]. In theory, $q(X_k)$ can be any pdf that is easy to sample from and ensures proper support coverage of $p(X_k \mid D_k)$ (i.e., $p(X_k \mid D_k) > 0 \Rightarrow q(X_k) > 0$). In practice, however, IS is only reliable when $q(X_k)$ is sufficiently close to $p(X_k \mid D_k)$. Since the true posterior (6) is unimodal and has a mean close to $\hat{\mu}_{VB}$, it is natural to specify $q(X_k)$ as a unimodal pdf whose mean is parameterized by $\hat{\mu}_{VB}$. The prior covariance $\Sigma$ can also be used to constrain the size/shape of $q(X_k)$ to ensure adequate coverage of $p(X_k \mid D_k)$. This is justified because conditioning on $D_k$ reduces the uncertainty in the (unimodal) posterior relative to the (unimodal) prior, so that $(\Sigma - \Sigma_{\mathrm{true\,post}})$ is expected to be positive definite (as in the conventional KF) [14].

⁵A more technically precise explanation of this effect is given in [38] for the special case of the binary logistic VB lower bound.

⁶That is, $(\Sigma_{\mathrm{true\,post}} - \hat{\Sigma}_{VB})$ will be positive semidefinite.

These considerations lead to the VBIS algorithm, proposed here to draw upon the strengths of both VB and IS. An outline of VBIS is shown in Algorithm 2. The VB estimate from Algorithm 1 is first used to define $q(X_k)$. This proposal is then applied to (26) to estimate $\hat{\mu}_{VBIS}$ and $\hat{\Sigma}_{VBIS}$ for the approximation $\hat{p}(X_k \mid D_k) = \mathcal{N}(\hat{\mu}_{VBIS}, \hat{\Sigma}_{VBIS})$. This work uses

$$q(X_k) = \mathcal{N}(\hat{\mu}_{VB}, \Sigma) \tag{27}$$

since this pdf is easy to sample from, permits convenient calculation of $\omega_i$, and performs well in practice. Other, more sophisticated unimodal pdfs based on $\hat{\mu}_{VB}$ and $\Sigma$ could serve as $q(X_k)$, e.g., heavy-tailed Laplace pdfs or mixture model pdfs. However, compared with (27), the benefits of such alternatives can be outweighed by the cost of sampling $x^i$ and evaluating $\omega_i$, especially if $n \ge 2$ (e.g., Bessel functions are needed to evaluate a Laplace pdf with covariance $\Sigma$).
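A minimal sketch of the IS correction in (26) with the proposal (27), again for a scalar state, is shown below. The VB mean is passed in as an input; in practice it would come from Algorithm 1. The prior variance sets the proposal width, and the weights are computed in the log domain and then renormalized. The ESS returned at the end is the usual $1/\sum_i \omega_i^2$ diagnostic. All names are hypothetical.

```python
import math
import random


def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def is_moments(log_prior, log_lik, q_mu, q_var, n_samples, rng):
    """Self-normalized IS estimate (26) of the posterior mean/variance, plus ESS."""
    xs = [rng.gauss(q_mu, math.sqrt(q_var)) for _ in range(n_samples)]
    # log of unnormalized weights p(x) P(D_k | x) / q(x)
    logw = [log_prior(x) + log_lik(x) - normal_logpdf(x, q_mu, q_var) for x in xs]
    top = max(logw)                      # subtract the max before exponentiating
    ws = [math.exp(lw - top) for lw in logw]
    total = sum(ws)
    ws = [wi / total for wi in ws]       # renormalize to sum to 1
    mean = sum(wi * x for wi, x in zip(ws, xs))
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(ws, xs))
    ess = 1.0 / sum(wi * wi for wi in ws)
    return mean, var, ess


def vbis_step(mu_vb, mu_p, var_p, log_lik, n_samples=200, seed=0):
    """VBIS importance step with q = N(mu_VB, prior variance), as in (27)."""
    log_prior = lambda x: normal_logpdf(x, mu_p, var_p)
    return is_moments(log_prior, log_lik, mu_vb, var_p, n_samples, random.Random(seed))
```

Centering $q$ on the (slightly biased) VB mean while keeping the wider prior variance lets the samples cover the posterior even when $\hat{\Sigma}_{VB}$ is overconfident.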

    C. Likelihood Weighted Importance Sampling

Another possible IS strategy to compute $\hat{\mu}$ and $\hat{\Sigma}$ is to bypass VB altogether in Algorithm 2 and simply set $q(X_k) = p(X_k)$, so that $\omega_i \propto P(D_k \mid x^i)$. This approach, popularly known as likelihood weighted importance sampling (LWIS) [33], [40], also defines the measurement update step of the standard BPF [24]. It works well if $p(X_k)$ and $p(X_k \mid D_k)$ are similar. LWIS is faster and nominally more computationally convenient than VBIS, but it suffers if $P(D_k \mid X_k)$ is highly peaked relative to $p(X_k)$, or if $D_k$ is surprising with respect to $p(X_k)$ (i.e., the prior and posterior are not close) [33]. In such cases, $\omega_i \approx 0$ for many samples, leading to inconsistent LWIS estimates. LWIS is presented here as a common benchmark algorithm to estimate complex non-Gaussian densities.
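The brittleness of LWIS can be seen in a few lines of Python. The sketch below assumes a standard normal prior and a hypothetical binary sigmoid likelihood `sigmoid_lik`. It samples the prior directly and weights each sample by $P(D_k \mid x^i)$. Moving the likelihood's transition point into the prior's tail makes the ESS collapse.

```python
import math
import random


def lwis_moments(prior_mu, prior_var, lik, n_samples, rng):
    """LWIS: sample the prior, weight each sample by the likelihood P(D_k | x)."""
    xs = [rng.gauss(prior_mu, math.sqrt(prior_var)) for _ in range(n_samples)]
    ws = [lik(x) for x in xs]
    total = sum(ws)
    if total == 0.0:
        raise ValueError("all LWIS weights vanished")
    ws = [wi / total for wi in ws]
    mean = sum(wi * x for wi, x in zip(ws, xs))
    var = sum(wi * (x - mean) ** 2 for wi, x in zip(ws, xs))
    ess = 1.0 / sum(wi * wi for wi in ws)
    return mean, var, ess


def sigmoid_lik(center, steepness=4.0):
    """Hypothetical binary softmax likelihood favoring x > center."""
    return lambda x: 1.0 / (1.0 + math.exp(-steepness * (x - center)))


# Unsurprising vs. surprising observation under the same N(0, 1) prior.
_, _, ess_easy = lwis_moments(0.0, 1.0, sigmoid_lik(0.0), 2000, random.Random(0))
_, _, ess_hard = lwis_moments(0.0, 1.0, sigmoid_lik(3.0), 2000, random.Random(0))
```

With the transition at $x = 3$, only the few prior samples in the far tail receive appreciable weight. The ESS therefore drops to a small fraction of its value for the unsurprising case, mirroring the behavior described above.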


Fig. 3. Synthetic 1-D fusion problem using exact and approximate inference methods. (a) Human observation softmax likelihood curves for $P(D_k = j \mid X_k)$. (b)–(d) Posterior approximation results for human observations that are progressively more surprising relative to $p(X_k)$ (five sample posterior results shown for LWIS and VBIS Gaussian approximations in each case).

TABLE I
RESULTS FOR 1-D FUSION PROBLEM IN FIG. 3

    D. Numerical 1-D Example

Fig. 3(a) gives a hypothetical 1-D softmax likelihood model for a soft human observation $D_k$, with $m = 5$ categories relating to $X_k$, the location of a static target relative to a static robot. The prior $p(X_k)$ at some fixed time step $k$ is shown in gray for three different scenarios in (b)–(d), in which $\zeta_k = \emptyset$ and the update relies solely on $D_k$. Fig. 3(b)–(d) shows the most likely $D_k$ in each case relative to the true target location $x_{\mathrm{true}}$ (black star). Moving from (b) to (d), the prior becomes less accurate (i.e., more surprising/inconsistent) compared with $x_{\mathrm{true}}$, e.g., due to an inaccurate/highly uncertain target dynamics model.

Fusion results are shown for exact numerical integration, VB, VBIS with $N_s = 200$, and LWIS with $N_s = 200$. The true mean and variance $(\mu, \sigma^2)$ of the exact (non-Gaussian) posterior are shown in Table I, along with the corresponding estimates and MATLAB computation times for each approximation over 50 runs. The number of EM iterations for VB and VBIS is also shown.⁷ The effective sample size (ESS) is provided for VBIS and LWIS as a measure of sample efficiency, and hence of the closeness of $q(X_k)$ to $p(X_k \mid D_k)$ [39].

In each case, $\hat{\mu}_{VB}$ is very close to $\mu$ with a small bias, while $\hat{\sigma}^2_{VB} < \sigma^2$. Both $(\hat{\mu}_{VBIS}, \hat{\sigma}^2_{VBIS})$ and $(\hat{\mu}_{LWIS}, \hat{\sigma}^2_{LWIS})$ are accurate in (b), since $D_k$ is unsurprising with respect to $p(X_k)$. However, LWIS becomes steadily worse in (c) and (d) as $p(X_k)$ and $D_k$ disagree more, whereas VBIS always maintains a good approximation with only 200 samples. The poor performance of LWIS in (c) and (d) is reflected by its diminishing ESS and the inconsistent nature of $\hat{\mu}_{LWIS}$ and $\hat{\sigma}^2_{LWIS}$. LWIS improves in (c) with larger $N_s$, although this has limited impact in (d). Setting $N_s = 10000$ matches the computation time for VBIS but still yields worse performance (ESS = 200, $\hat{\mu}_{LWIS} = 0.70 \pm 0.75$, $\hat{\sigma}^2_{LWIS} = 0.47 \pm 0.08$) than VBIS with $N_s = 200$.

⁷Using a random initial guess for $\alpha$ in (21) and a tolerance of 1e-3 on $\hat{C}$.

IV. GENERALIZED FUSION: NON-GAUSSIAN PRIOR AND MULTIMODAL LIKELIHOOD INFERENCE

The assumptions that $p(X_k)$ is Gaussian and that $P(D_k \mid X_k)$ is well modeled by a basic softmax likelihood (7) with convexly separable classes can easily be violated in practical human–robot fusion scenarios. The prior $p(X_k)$ can be non-Gaussian through multimodal initial beliefs, if $X_k$ evolves with non-Gaussian/nonlinear dynamics (e.g., unobservable dynamic mode changes), or if updates via $\zeta_k$ involve non-Gaussian likelihoods [2], [23]. Equation (7) can also be inadequate to model $P(D_k \mid X_k)$; e.g., soft distance observations are better modeled by nonconvex categorical MMS likelihoods (see Section II). Fortunately, VBIS can be extended for recursive Bayesian fusion in such scenarios using GM pdf approximations.

In the sequel, assume that $p(X_k)$ in (6) is given by an $M$-term GM

$$p(X_k) = \sum_{u=1}^{M} P(u)\, p(X_k \mid u) = \sum_{u=1}^{M} c_u\, \mathcal{N}(\mu_u, \Sigma_u). \tag{28}$$

The hidden discrete variable $U$ takes values $u \in \{1, \ldots, M\}$; $\mu_u \in \mathbb{R}^n$ and $\Sigma_u \in \mathbb{R}^{n \times n}$ are, respectively, the $u$th component mean and covariance; and the component weights $c_u \in \mathbb{R}_0^+$ satisfy $\sum_{u=1}^{M} c_u = 1$. The universal approximation property of GMs was used in [31] to derive recursive non-Gaussian Bayesian state estimators for continuous sensor data via parallel banks of KFs/EKFs. This idea was later extended to incorporate parallel banks of UKFs [26] and PFs [25]. Due to their beneficial statistical properties and high flexibility, such GM filtering algorithms have since proven useful for many robotic Bayesian sensor fusion applications (see [17] and [41]). Thus, any such GM filter can be assumed here to approximate (1) and (2) in the form of (28), where $M$ is automatically determined by the GM filter to balance computational speed and estimation accuracy. Furthermore, it is assumed that the human sensor likelihood $P(D_k \mid X_k)$ is given by the MMS model in (9).

A. Variational Bayesian Importance Sampling With Gaussian Mixture Priors and Multimodal Softmax Likelihoods

An approximation to $p(X_k \mid D_k)$ is derived by first considering the joint pdf given $D_k$

$$\begin{aligned}
p(D_k, X_k, U, R) &= P(D_k, R \mid X_k, U)\, p(X_k, U) \\
&= P(D_k, R \mid X_k)\, p(X_k, U) = P(D_k \mid R)\, P(R \mid X_k)\, p(X_k \mid U)\, P(U)
\end{aligned} \tag{29}$$

where the first line follows from Bayes' rule, and the second line follows from the conditional independence properties of the MMS model (see Section II). Recall from Section II that 1) $R$ is a hidden subclass variable with values $r \in \{1, \ldots, S\}$, where each subclass is deterministically mapped to a single class label $j \in \{1, \ldots, m\}$ for the observation $D_k$, and 2) $\sigma(j)$ denotes the set of $s_j$ subclasses mapping to $j$, where $P(D_k = j \mid r) = I(r \in \sigma(j))$. From the law of total probability, the posterior $p(X_k \mid D_k)$ is

$$p(X_k \mid D_k) = \sum_{u=1}^{M} \sum_{r \in \sigma(j)} p(X_k \mid u, r, D_k)\, P(u, r \mid D_k). \tag{30}$$

Using Bayes' rule and the joint pdf (29), the first term in the summand of (30) can be written as

$$p(X_k \mid u, r, D_k) = \frac{P(D_k \mid r)\, P(r \mid X_k)\, p(X_k \mid u)\, P(u)}{\int P(D_k \mid r)\, P(r \mid X_k)\, p(X_k \mid u)\, P(u)\, dX_k}. \tag{31}$$

Canceling the terms that are independent of $X_k$ gives

$$p(X_k \mid u, r, D_k) = \frac{P(r \mid X_k)\, p(X_k \mid u)}{\int P(r \mid X_k)\, p(X_k \mid u)\, dX_k} \tag{32}$$

which is the conditional posterior given $D_k = j$, mixing component $u$, and subclass $r \in \sigma(j)$. Note that the numerator in (32) is the product of a Gaussian $p(X_k \mid u) = \mathcal{N}(\mu_u, \Sigma_u)$ and a softmax likelihood $P(r \mid X_k)$, while the denominator is the marginal subclass-$r$ softmax observation likelihood under Gaussian component $u$. Therefore, (32) is a unimodal conditional pdf that can be well approximated by a Gaussian using the VBIS procedure in Algorithm 2, so that

$$p(X_k \mid u, r, D_k) \approx \hat{p}(X_k \mid D_k, u, r) = \mathcal{N}(\hat{\mu}_{ur}, \hat{\Sigma}_{ur}). \tag{33}$$

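Next, the second term in the summand of (30), $P(u, r \mid D_k)$, is

$$P(u, r \mid D_k) = \frac{P(u, r, D_k)}{P(D_k)} \tag{34}$$

$$= \frac{P(u, r, D_k)}{\sum_{u=1}^{M} \sum_{r \in \sigma(j)} P(u, r, D_k)} = \frac{1}{C}\, P(u, r, D_k) \tag{35}$$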

where the numerator can be derived from (29) as

$$\begin{aligned}
P(u, r, D_k = j) &= \int p(X_k \mid u)\, P(r \mid X_k)\, P(D_k = j \mid r)\, P(u)\, dX_k \\
&= P(u) \int p(X_k \mid u)\, P(r \mid X_k)\, dX_k
\end{aligned} \tag{36}$$

where $P(u) = c_u$ from (28), and the last line follows from $P(D_k = j \mid r) = 1$ for $r \in \sigma(j)$, by the definition of the MMS model. The integral in (36) is also the denominator in (32):

$$P(r \mid u) = \int p(X_k \mid u)\, P(r \mid X_k)\, dX_k = C_{ur}. \tag{37}$$

Substituting these expressions into (36) and then (35) gives

$$P(u, r \mid D_k) = \frac{1}{C}\, c_u C_{ur}. \tag{38}$$

Equation (37) is analytically intractable, but it can be estimated in two ways. First, since VBIS (Algorithm 2) is used to estimate (32), (37) can be directly approximated by a corresponding VB lower bound $\hat{C}_{ur} \le C_{ur}$ obtained via (25) in Algorithm 1. In this case, the nominal conditioning on $D_k = j$ in (25) is replaced by joint conditioning on $U = u$ and $R = r$. Individual $\hat{\mu}_{ur}$, $\hat{\Sigma}_{ur}$, $\xi_{c,ur}$, and $\alpha_{ur}$ estimates are then used in (25) for each possible $u$ and $r$ pairing to compute $\log \hat{C}_{ur}$. Second, (37) can be estimated via direct sampling as

$$\hat{P}_s(r \mid u) = \frac{1}{N_u} \sum_{l=1}^{N_u} P(r \mid X_k = x^l) \tag{39}$$

where $\{x^l\}_{l=1}^{N_u}$ is a set of $N_u$ samples drawn directly from the $u$th prior component $\mathcal{N}(\mu_u, \Sigma_u)$. The first approach could bias the posterior approximation if the bound $\hat{C}_{ur} \le C_{ur}$ is too loose. However, the relative variance of $\hat{P}_s(r \mid u)$ (its variance divided by $P(r \mid u)^2$) scales roughly inversely with $P(r \mid u)$ and $N_u$. Consequently, (39) can fall below the lower bound $\hat{C}_{ur}$ if $P(r \mid u)$ is very small (i.e., $P(r \mid u) \lesssim 0.01$) and $N_u$ is too small. Thus, to obtain a reasonable estimate, $\hat{C}_{ur}$ is used to floor (39) as a consistency check for fixed $N_u$:

$$\tilde{P}(r \mid u) \equiv \max\big[\exp(\log \hat{C}_{ur}),\, \hat{P}_s(r \mid u)\big] \approx P(r \mid u). \tag{40}$$

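Hence, (38) becomes

$$P(u, r \mid D_k) \approx \frac{1}{\tilde{C}}\, c_u\, \tilde{P}(r \mid u) \equiv \gamma_{ur} \tag{41}$$

where

$$\tilde{C} = \sum_{u=1}^{M} \sum_{r \in \sigma(j)} c_u\, \tilde{P}(r \mid u). \tag{42}$$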

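Finally, combining (32) and (42) into (30) yields a GM approximation $\hat{p}(X_k \mid D_k) \approx p(X_k \mid D_k)$,

$$\hat{p}(X_k \mid D_k) = \sum_{u=1}^{M} \sum_{r \in \sigma(j)} \gamma_{ur}\, \mathcal{N}(\hat{\mu}_{ur}, \hat{\Sigma}_{ur}) \tag{43}$$

$$= \sum_{h=1}^{K} \gamma_h\, \mathcal{N}(\hat{\mu}_h, \hat{\Sigma}_h) \tag{44}$$

with $K = s_j M$ Gaussian components.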
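The weight computation (39)–(42) can be sketched for a scalar state as below. The subclass likelihood is assumed to be a plain softmax over all $S$ subclasses. The VB bounds $\log \hat{C}_{ur}$ are taken as given inputs; they would come from running Algorithm 1 for each $(u, r)$ pair. All names are hypothetical.

```python
import math
import random


def subclass_probs(w, b, x):
    """Softmax probabilities P(r | x) over all S subclasses (scalar x)."""
    ys = [wr * x + br for wr, br in zip(w, b)]
    top = max(ys)
    e = [math.exp(y - top) for y in ys]
    s = sum(e)
    return [v / s for v in e]


def gm_posterior_weights(prior, w, b, sigma_j, log_c_hat, n_u, rng):
    """Mixture weights (41)-(42) for every (u, r) pair with r in sigma(j).

    prior:      list of (c_u, mu_u, var_u) components of the GM prior (28)
    sigma_j:    subclass indices mapped to the observed class j
    log_c_hat:  dict {(u, r): log C_hat_ur}, assumed to come from Algorithm 1
    """
    raw = {}
    for u, (c_u, mu_u, var_u) in enumerate(prior):
        xs = [rng.gauss(mu_u, math.sqrt(var_u)) for _ in range(n_u)]
        probs = [subclass_probs(w, b, x) for x in xs]
        for r in sigma_j:
            p_s = sum(p[r] for p in probs) / n_u               # (39)
            p_tilde = max(math.exp(log_c_hat[(u, r)]), p_s)    # floor, (40)
            raw[(u, r)] = c_u * p_tilde                        # numerator of (41)
    c_tilde = sum(raw.values())                                # (42)
    return {key: v / c_tilde for key, v in raw.items()}
```

Each returned weight multiplies the Gaussian produced by VBIS for the same $(u, r)$ pair, giving the $s_j M$ mixands of (43).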


B. Likelihood Weighted Importance Sampling and Variational Bayes Gaussian Mixture Fusion

Algorithm 3 summarizes the generalized VBIS fusion algorithm. If VBIS in step 4 of Algorithm 3 is replaced by LWIS for component $h$, and $\tilde{P}(r \mid u) = \hat{P}_s(r \mid u)$ is used in step 5 instead, an LWIS-based GM approximation to (30) is obtained. Likewise, a VB GM approximation is obtained by using only Algorithm 1 in step 4 (i.e., ignoring the IS correction) and setting $\tilde{P}(r \mid u) = \hat{C}_{ur}$ in step 5 (i.e., ignoring step 3). The next example shows that the VBIS procedure in Algorithm 3 improves considerably on both alternatives. Note that the VB, VBIS, and LWIS baseline Gaussian approximations from Section III for Gaussian priors and softmax likelihoods are special cases of the corresponding GM approximations for GM priors and MMS likelihoods, with $M = 1$ and $s_j = 1$ for all $j \in \{1, \ldots, m\}$.

1) Numerical 1-D Example: Fig. 4 modifies the previous 1-D human–robot fusion example in Fig. 3 so that $p(X_k)$ is now an $M = 4$ component multimodal GM (gray), and $D_k$ now takes the form of a coarse range-only observation with $m = 3$ nonconvex categories ("Next To," "Nearby," and "Far From"). The figure shows the results of fusing the (surprising) human observation $D_k$ = "Far From" via numerical integration to obtain the exact multimodal posterior pdf (magenta). It also shows the full 8-component GM posterior approximations obtained with VB and with 100 trials each of VBIS (Algorithm 3, $N_u = N_s = 500$) and LWIS (500 samples).

Fig. 4. Synthetic 1-D fusion problem with GM prior and range-only MMS likelihood model for $P(D_k = j \mid X_k)$, derived by grouping the five softmax classes in Fig. 3(a) into three MMS classes; sample 8-mixand GM posterior approximations shown for $D_k$ = "Far From" (likelihood in red dash), along with run time and KLD statistics.

Due to its brittleness to surprising measurements, LWIS clearly fails to approximate the minor posterior modes on the positive $X_k$ axis and struggles to approximate the major posterior modes on the negative $X_k$ axis. The VB GM approximation (which required 11–23 EM steps per component) approximates all posterior modes considerably better. However, it still significantly underestimates all component variances, as well as the largest component weight on the left. In contrast, VBIS provides a very high-fidelity GM approximation to the exact posterior. Fig. 4 also shows the resulting computation times (using unoptimized MATLAB code) and the Kullback–Leibler divergences (KLDs) between the true posterior $p(X_k \mid D_k)$ (from numerical integration) and each GM approximation $\hat{p}(X_k \mid D_k)$, where the KLD is given by

$$\mathrm{KL}[p \,\|\, \hat{p}] = \int p(X_k \mid D_k) \log \frac{p(X_k \mid D_k)}{\hat{p}(X_k \mid D_k)}\, dX_k \tag{45}$$

A smaller KLD indicates that $\hat{p}(X_k \mid D_k)$ loses less information from $p(X_k \mid D_k)$ and is therefore a better approximation to the true posterior. Clearly, LWIS loses the most information on average, while VBIS loses the least. Repeating LWIS with 1500 samples matches the time required for VBIS with 500 samples, but it only reduces the LWIS KLD by about half. In addition, the VBIS KLD increases to 0.23 ± 0.20 if the direct-sampling estimate $\hat{P}_s(r \mid u)$ of $C_{ur}$ is used alone in step 5 of Algorithm 3 (i.e., if $\hat{C}_{ur}$ from VB is ignored). This is because $\hat{P}_s(r \mid u)$ underestimates the weights for the minor GM posterior modes on the positive $X_k$ axis. The VB bounds $\hat{C}_{ur}$ therefore help improve the posterior GM weight estimates in (40).
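For 1-D problems such as the one in Fig. 4, (45) can be approximated by quadrature. The sketch below uses a simple midpoint rule on a truncated interval. This is an assumption, since the grid and interval used for the reported KLDs are not stated. `gm_pdf` evaluates a GM given as (weight, mean, variance) triples.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gm_pdf(x, gm):
    """Evaluate a 1-D GM given as a list of (weight, mean, variance)."""
    return sum(c * normal_pdf(x, mu, var) for c, mu, var in gm)


def kl_divergence_grid(p, q, lo, hi, n):
    """Midpoint-rule approximation of KL[p || q] in (45) over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        px, qx = p(x), q(x)
        if px > 0.0:
            if qx <= 0.0:
                return math.inf      # q misses part of the support of p
            total += px * math.log(px / qx) * dx
    return total
```

As a sanity check, the grid value for two Gaussians should match the closed form; for example, KL[N(0, 1) ‖ N(1, 2)] = ½ ln 2.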

    C. Practicalities

1) Parallelization: The nested for loops that contain steps 2–6 of Algorithm 3 can be parallelized into $s_j M$ independent VBIS updates. As such, parallelized GM filtering strategies for $\zeta_k$ fusion can be readily adapted to incorporate soft categorical measurements via (3) using Algorithm 3. In particular, if GM filters are used to approximate (1) and (2), the complete hybrid Bayesian fusion cycle can be implemented as a bank of parallel Gaussian filters. These filters are combined to produce a final GM posterior approximation $p(X_k \mid \zeta_{1:k}, D_{1:k})$ at each time step $k$.

Fig. 5. Experimental setup. (a) Pioneer 3-DX robot used for the experiment, featuring Vicon markers for accurate pose estimation; a Hokuyo URG-04LX lidar sensor for obstacle avoidance; an onboard Mini ATX-based computer with a 2.00 GHz Intel Core 2 processor, 2 GB of RAM, and WiFi networking; and a Unibrain Fire-i OEM Board camera. (b) Base field map used in all search missions, showing locations of six opaque obstacle walls and two generic landmarks. (c) Human–robot fusion GUI, which runs on a 2.66 GHz Intel Core 2 Duo workstation with 2 GB of RAM.

2) Mixture Condensation: The number of mixands in $p(X_k \mid \zeta_{1:k}, D_{1:k})$ grows at each time step $k$ if either $s_j > 1$ in Algorithm 3 or (1) and (2) marginalize out discrete random variables (e.g., via GM process/measurement noise models). Standard GM compression methods should thus be applied to maintain tractability while minimizing a suitable information loss metric with respect to the full GM posterior approximation [42], [43].
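As one example of such compression, the sketch below repeatedly merges pairs of 1-D mixands using the standard moment-preserving merge rule, so the overall GM mean and variance are unchanged. The greedy pair-selection cost is an assumed joining-style criterion (a weighted squared separation of the means, normalized by the overall variance), not a transcription of [42] or [43].

```python
def merge(a, b):
    """Moment-preserving merge of two weighted 1-D components (w, mean, var)."""
    (wa, ma, va), (wb, mb, vb) = a, b
    w = wa + wb
    m = (wa * ma + wb * mb) / w
    v = (wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)) / w
    return (w, m, v)


def gm_moments(gm):
    """Overall mean and variance of a 1-D GM."""
    wsum = sum(c for c, _, _ in gm)
    mean = sum(c * mu for c, mu, _ in gm) / wsum
    var = sum(c * (v + (mu - mean) ** 2) for c, mu, v in gm) / wsum
    return mean, var


def reduce_gm(gm, max_components):
    """Greedily merge the cheapest pair until at most max_components remain."""
    gm = list(gm)
    _, total_var = gm_moments(gm)
    while len(gm) > max_components:
        best = None
        for i in range(len(gm)):
            for k in range(i + 1, len(gm)):
                (wi, mi, _), (wk, mk, _) = gm[i], gm[k]
                # assumed joining-style cost: weighted squared mean separation
                cost = wi * wk / (wi + wk) * (mi - mk) ** 2 / total_var
                if best is None or cost < best[0]:
                    best = (cost, i, k)
        _, i, k = best
        merged = merge(gm[i], gm[k])
        gm = [g for idx, g in enumerate(gm) if idx not in (i, k)] + [merged]
    return gm
```

Each pass scans all pairs, so this naive version is roughly cubic in the initial mixture size. That cost is one reason to prune low-weight components before merging.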

3) Component Gating for Skipping Updates: If $\hat{P}_s(r \mid u) \approx 1$ from (39), then $\hat{\mu}_{ur}$ and $\hat{\Sigma}_{ur}$ will be very close to $\mu_u$ and $\Sigma_u$. Step 4 of Algorithm 3 can thus be modified to apply a gating threshold after step 3 to determine whether the posterior component for the pair $(u, r)$ requires EM iterations for the VBIS approximation. If $\hat{P}_s(r \mid u)$ exceeds the threshold, alternative component updates via LWIS or prior equivalence (i.e., $\hat{\mu}_{ur} = \mu_u$ and $\hat{\Sigma}_{ur} = \Sigma_u$) are used, and step 5 becomes $\tilde{P}(r \mid u) = \hat{P}_s(r \mid u)$. Otherwise, steps 4 and 5 are carried out with VBIS as usual. The threshold should be set close to 1 (e.g., 0.9999) to ensure that only components that are definitely not worth updating by VBIS are conservatively skipped.
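The gate can be expressed as a small wrapper around steps 4 and 5. In this sketch, `vbis_update` is a hypothetical callable standing in for the per-pair VBIS update. Only the prior-equivalence alternative is shown; the LWIS alternative is omitted.

```python
def gated_component_update(p_s, prior_component, threshold, vbis_update):
    """Steps 4-5 of Algorithm 3 with the component gate (sketch).

    p_s:             direct-sampling estimate P_s(r|u) from step 3, eq. (39)
    prior_component: (mu_u, Sigma_u) of the prior mixand
    vbis_update:     hypothetical callable running VBIS for this (u, r) pair;
                     returns ((mu_ur, Sigma_ur), P_tilde(r|u))
    """
    if p_s >= threshold:
        # Gate: prior equivalence, mu_ur = mu_u and Sigma_ur = Sigma_u,
        # and step 5 uses P_tilde(r|u) = P_s(r|u).
        return prior_component, p_s
    return vbis_update(prior_component)
```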

V. COOPERATIVE MULTITARGET SEARCH EXPERIMENTS

As discussed in [11], [12], and [17], human–robot information fusion is particularly relevant for cooperative target search applications such as coordinated search and rescue, large-scale surveillance, and urban reconnaissance. To provide practical insight on the utility of the proposed soft human information fusion approach, an experimental application to cooperative indoor target search missions was conducted with a real human–robot team.⁸

    A. Problem Setup

A single human agent and a single autonomous mobile robot were tasked with finding and correctly identifying five hidden targets as quickly as possible under a fixed time constraint. Fig. 5(b) shows the base map of the 5 m × 10.5 m indoor area used for the search mission experiments, which featured several movable obstacle walls and two generic landmarks. The walls are placed such that the human (who remained seated off field at a computer) could see only a small portion of the search area by direct line of sight. The five targets were static orange traffic cones labeled with unique ID numbers (1 through 5). They were hidden at various locations that differed across four separate search missions.

⁸Similar experiments with 16 different human users were conducted in [44] to examine sensitivity to $P(D_k \mid X_k)$. Although not discussed here, the results from that study corroborate this paper's findings on the utility of the proposed fusion approach.

Each target location $X_t \in \mathbb{R}^2$ is modeled by a GM prior

$$p(X_t) = \sum_{u=1}^{M_t} c_{tu}\, \mathcal{N}(\mu_{tu}, \Sigma_{tu}) \tag{46}$$

where the number of targets is known a priori for $t \in \{1, \ldots, 5\}$ and $p(X_1) = p(X_2) = \cdots = p(X_5)$ at mission start; these priors are detailed in Section V-C. Each $p(X_t)$ is updated over time using one or both of the following information sources: 1) $\zeta_{1:k}$, the set of all detection/no detection observations made by the robot's visual target detector, and 2) $D_{1:k}$, the set of all soft target location data provided by the human.

Fig. 6. (a) Example GM target location prior. (b)–(d) Base MMS models for prepositions. (e) MMS model for camera detection likelihood. (f)–(i) Posterior GMs from VBIS after fusing $D_k$ in (b)–(d) with GM prior in (a). (j) Posterior GM from LWIS for $\zeta_k$ = "No Detection" report with GM prior in (a).

Fig. 5(a) shows the Pioneer 3-DX autonomous mobile robot used in the experiment. The robot is equipped with a camera and vision-processing software that detects orange traffic cones up to a 1 m range with a 42.5° field of view at 2 Hz. The robot moves at a constant speed of 0.3 m/s with a known map of the search area and highly accurate pose data from Vicon motion tracking. The robot autonomously navigates toward intermediate search points (i.e., goal locations) based on the updated combined undetected target posterior GM pdf

$$p(X^{\mathrm{comb}}_k) = \sum_{t \in T_k} \frac{1}{|T_k|}\, p(X_t \mid \zeta_{1:k}, D_{1:k}) \tag{47}$$

where $T_k$ is the set of undetected targets at time $k$. As in [2], the target pdfs are used to autonomously plan search paths using a simple suboptimal greedy strategy. Equation (47) is first discretized to select the highest value (nonobstacle) grid cell, which defines the robot's next search point. The robot then creates and follows a path using the D* algorithm to ensure that this point lies at the center of the target detector likelihood function, shown in Fig. 6(e). The robot immediately repeats this planning procedure whenever it either reaches its current search point without detecting anything or receives new information from the human (described below). While other search strategies could be used, this approach works well and is tied to searches that are based on model predictive control [2], [23]. Comparisons with other search methods are beyond the scope of this study.
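The greedy selection step can be sketched as follows. The sketch assumes 2-D Gaussian mixands with full covariances, a hypothetical rectangular grid of width `width`, height `height`, and cell size `cell`, and an `is_obstacle` predicate standing in for the known map. Path planning with D* is not shown.

```python
import math


def gauss2_pdf(x, y, mu, cov):
    """Bivariate normal density with 2x2 covariance ((sxx, sxy), (sxy, syy))."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mu[0], y - mu[1]
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def combined_pdf(x, y, target_gms):
    """Eq. (47): equal-weight average of the GMs of all undetected targets."""
    return sum(c * gauss2_pdf(x, y, mu, cov)
               for gm in target_gms for c, mu, cov in gm) / len(target_gms)


def next_search_point(target_gms, width, height, cell, is_obstacle):
    """Greedy step: center of the highest-valued non-obstacle grid cell."""
    best, best_val = None, -1.0
    for i in range(int(round(width / cell))):
        for k in range(int(round(height / cell))):
            x, y = (i + 0.5) * cell, (k + 0.5) * cell
            if is_obstacle(x, y):
                continue
            val = combined_pdf(x, y, target_gms)
            if val > best_val:
                best, best_val = (x, y), val
    return best
```

When a target is detected, its GM is simply dropped from `target_gms`, which renormalizes (47) over the remaining $|T_k|$ targets.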

The human remains seated at a computer station facing the field [coordinates x = 0.8 m, y = 3.3 m in Fig. 5(b)] and communicates with the robot through the graphical user interface (GUI) shown in Fig. 5(c). The human has two tasks: 1) classifying detections by the robot as either false alarms or actual targets, and 2) voluntarily modifying the target GM pdfs via soft information messages $D_k$. For task 1, the robot streams processed camera images at 1 Hz to the GUI and pauses to report visual target detections. If the human declares a false alarm, the robot notes the object's location to prevent reacquisition. Otherwise, the robot localizes the target via laser and camera data, and the GM for the identified target $t$ is removed from (47). For task 2 (the focus of this study), the human can use direct observations of the field and the robot's camera feed to send messages that update (47) (detailed below). The human also has access to a 2-D surface plot of (47) overlaid on a labeled map of the search area, so that consistent contextual information is available for fusion. The human can only send information and cannot directly command the robot. However, the robot automatically replans whenever a new $D_k$ is fused, since the maximum of (47) can change significantly.

    B. Online Measurement Updates

Each $p(X_t \mid \zeta_{1:k}, D_{1:k})$ is recursively updated online via (2) and (3); (1) is not needed since the targets are all static. The $\zeta_k$ updates are skipped for false alarms, which are assumed to be filtered out perfectly by the human, while $D_k$ updates occur as human messages arrive spontaneously.

1) Robot Visual Detection Model and $\zeta_k$ Updates: The robot's target detector likelihood $P(\zeta_k \mid X_t)$ is a hybrid probabilistic mapping from $X_t$ to a discrete observation $\zeta_k \in \{\text{No Detection}, \text{Detection}\}$. As such, $P(\zeta_k \mid X_t)$ is well approximated by the 2-D MMS model shown in Fig. 6(e), which assigns the "No Detection" class a high probability outside the vision cone. The parameters for this model were learned offline and shifted online to account for the robot's pose and known occlusions (e.g., walls). Since $P(\zeta_k \mid X_t)$ is an MMS model, the inference methods in Section IV yield a GM approximation to (2). LWIS GM fusion with 1000 samples per component update and a component gating threshold of 0.9999 gave sufficiently accurate results, due to the robot's slow motion. Fig. 6(j) shows an example LWIS GM fusion update with the nominal MMS camera model. It illustrates the posterior scattering effect induced by negative information from "No Detection" updates [2], [22].

TABLE II
HRI GUI CODEBOOK CHOICES

2) Human Observation Models and $D_k$ Updates: The human sent structured three-field messages of the form $D_k$ = "(Existence) is (Preposition) (Reference)" one at a time. Any combination of the predefined codebook entries shown in Table II could be selected in the GUI via mouse. Existence allows positive/negative soft observations to be sent, assuming each target's ID is unavailable until detection (the data association problem due to this ambiguity is addressed below). Reference determines each observation's spatial reference point. Preposition determines the MMS model used to modify each target GM, given the Existence and Reference fields. This study used three categorical ranges and two categorical bearings, giving 90 distinct realizations of $D_k$.

    Base MMS models for Preposition entries were learned offline from training data provided by the single human user who performed all missions in this study. Fig. 6(b)–(e) shows the resulting models, whose origins all correspond to a nominal (0, 0) Reference Location position in $X_t$ space. The weights of these base models are shifted and rotated online to match the desired Reference origin and orientation. Negative (i.e., "Nothing is...") observations with respect to a Preposition class $j$ are handled through pseudopositive measurement updates with respect to every other class in the corresponding MMS model (i.e., all classes other than $j$). All $D_k$ updates are performed online with VBIS GM fusion (Algorithm 3, using $N_a = N_s = 500$ and a component gate of 0.9999).

    Fig. 7. True target locations and initial GM priors for Mission 4, showing (a) uniform and (b) "bad" search priors. The uniform GM prior in (a) is the same in all four search missions, and the bad priors for Missions 1–3 are qualitatively similar to (b).

    TABLE III
    EXPERIMENTAL SEARCH MISSION MATRIX

    3) Data Association: Data association issues arise if $D_k$ is

    not target-specific. For example, "Something is..." could apply to any one target, while "Nothing is..." applies to all targets. This ambiguity is handled here through GM-based probabilistic data association (PDA) [45]. Under the hypothesis that $D_k$ describes target $t$, (3) is computed to give $\tilde{p}(X_t \mid \zeta_{1:k}, D_{1:k})$. Under every hypothesis in which $D_k$ does not describe $t$, the prefusion prior GM $p(X_t \mid \zeta_{1:k}, D_{1:k-1})$ is used. Marginalizing out the association hypothesis gives the updated GM for target $t$:

    $$p(X_t \mid \zeta_{1:k}, D_{1:k}) = \rho(\gamma)\, \tilde{p}(X_t \mid \zeta_{1:k}, D_{1:k}) + \left[1 - \rho(\gamma)\right] p(X_t \mid \zeta_{1:k}, D_{1:k-1}) \tag{48}$$

    Here $\gamma = 1$ if $D_k$ contains positive ("Something is...") data and $\gamma = 0$ if it contains negative ("Nothing is...") data. The term $\rho(\gamma)$ is the probability of the hypothesis that $D_k$ describes target $t$, with $\rho(0) = 1$ and $\rho(1) = 1/|T_k|$, where $|T_k|$ is the number of undetected targets at time $k$. For simplicity, the probability of an erroneous or false $D_k$ is assumed to be zero.⁹

    Fig. 8. Search performance metrics. (a) and (b) Search mission times (sec) under uniform and bad priors. (c) and (d) Number of targets found per mission under uniform and bad priors.
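    The marginalization in (48) amounts to reweighting two GMs and concatenating them. The following is a minimal sketch that assumes each GM is stored as a list of (weight, mean, covariance) tuples. The helper names are illustrative and are not taken from the original implementation.

```python
import numpy as np

# Illustrative GM representation (assumption): list of (weight, mean, covariance).

def association_prob(positive, n_undetected):
    """rho(0) = 1 for negative data; rho(1) = 1/|T_k| for positive data."""
    return 1.0 / n_undetected if positive else 1.0

def pda_mix(fused_gm, prior_gm, rho):
    """Combine hypothesis-conditioned GMs as in (48).

    fused_gm: GM obtained by fusing D_k under the hypothesis that it describes target t
    prior_gm: prefusion GM (hypothesis that D_k does not describe target t)
    rho:      probability that D_k describes target t
    """
    mixed = [(rho * w, m, P) for (w, m, P) in fused_gm]
    mixed += [((1.0 - rho) * w, m, P) for (w, m, P) in prior_gm]
    return mixed

prior_gm = [(1.0, np.zeros(2), 4.0 * np.eye(2))]
fused_gm = [(0.7, np.array([2.0, 1.0]), 0.5 * np.eye(2)),
            (0.3, np.array([-1.0, 3.0]), 0.5 * np.eye(2))]

rho = association_prob(positive=True, n_undetected=5)
posterior = pda_mix(fused_gm, prior_gm, rho)
print([round(w, 3) for (w, _, _) in posterior])  # [0.14, 0.06, 0.8]
```

    With five undetected targets, the prefusion prior keeps 80% of the weight after a positive message. This shows how PDA dilutes the information carried by positive messages.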

    4) Mixture Compression: After the $\zeta_k$ and $D_k$ updates, each $p(X_t \mid \zeta_{1:k}, D_{1:k})$ is compressed to $M = 15$ mixands with Salmond's joining method [42], which preserves the overall GM mean and covariance. The method takes $O(M_o^2)$ time for $M_o$ initial mixands, so only the 100 highest-weighted components of $p(X_t \mid \zeta_{1:k}, D_{1:k})$ are used in each merging operation.¹⁰
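    The joining step can be sketched as greedy pairwise merging with a moment-preserving merge rule, which leaves the overall GM mean and covariance unchanged. The pair-selection distance below follows the commonly cited form of Salmond's joining criterion: a $w_i w_j / (w_i + w_j)$-weighted Mahalanobis distance between component means under the overall mixture covariance. The brute-force pair search and the function names are simplifications assumed for illustration.

```python
import numpy as np

def merge_pair(w1, m1, P1, w2, m2, P2):
    """Moment-preserving merge of two Gaussian mixands."""
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    d1, d2 = m1 - m, m2 - m
    P = (w1 * (P1 + np.outer(d1, d1)) + w2 * (P2 + np.outer(d2, d2))) / w
    return w, m, P

def gm_moments(gm):
    """Overall mean and covariance of a GM given as (weight, mean, cov) tuples."""
    W = sum(w for w, _, _ in gm)
    mu = sum(w * m for w, m, _ in gm) / W
    cov = sum(w * (P + np.outer(m - mu, m - mu)) for w, m, P in gm) / W
    return mu, cov

def salmond_join(gm, max_mixands):
    """Greedily join the closest pair until at most max_mixands remain (sketch)."""
    gm = list(gm)
    _, cov_total = gm_moments(gm)
    cov_inv = np.linalg.inv(cov_total)   # overall covariance is invariant under merges
    while len(gm) > max_mixands:
        best, best_d = None, np.inf
        for i in range(len(gm)):
            for j in range(i + 1, len(gm)):
                wi, mi, _ = gm[i]
                wj, mj, _ = gm[j]
                dm = mi - mj
                d = (wi * wj / (wi + wj)) * (dm @ cov_inv @ dm)
                if d < best_d:
                    best, best_d = (i, j), d
        i, j = best
        merged = merge_pair(*gm[i], *gm[j])
        gm = [c for k, c in enumerate(gm) if k not in (i, j)] + [merged]
    return gm

# Usage: compress a random 40-component GM to 15 mixands.
rng = np.random.default_rng(0)
gm = [(1.0 / 40, 3.0 * rng.normal(size=2), 0.2 * np.eye(2)) for _ in range(40)]
reduced = salmond_join(gm, 15)
mu0, cov0 = gm_moments(gm)
mu1, cov1 = gm_moments(reduced)
print(len(reduced), np.allclose(mu0, mu1), np.allclose(cov0, cov1))  # 15 True True
```

    Each pass scans all pairs, so the cost grows quickly with the number of input mixands. Truncating the input to the 100 highest-weighted components bounds that cost.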

    C. Target Priors and Fusion Scenarios

    Four sets of search missions were conducted under the three types of sensor fusion modalities and two types of initial target GM priors shown in Table III, for a total of 24 search missions. The same four missions (each characterized by a different set of true target locations) are used to study all cells of Table III. Fig. 7(a) and (b) show the true target locations and priors for Mission 4 (other maps and priors are not shown due to limited space). The pseudouniform GM prior was the same for all four missions. The "bad" GM priors were highly inconsistent with the true target locations in each mission, reflecting worst-case search scenarios in which a priori information is badly flawed. To simulate realistic target discovery with the same human operator, positive $D_k$ messages were not sent until the human had actually observed the targets during the mission. Each mission ended if the robot had not found all targets after 15 min (900 s). This challenging time constraint was chosen after extensive testing showed it to be the minimum time required for the robot's greedy search to find all targets in all missions without $D_k$ updates.

    ⁹Nonzero probabilities can generally be incorporated into (48) to maintain a GM fusion pdf [32], [45].

    ¹⁰This typically led to little information loss at each step, since each target's GM can only have 30–255 components following a $\zeta_k$ or $D_k$ update and since GM weights above 0.01 are always concentrated in 100 mixands.

    Fig. 9. Two-norm of MAP-estimated errors of each target's location over time based on GM in Mission 4 scenarios. Black markers on the time axis denote instances where $D_k$ is fused, and the red dashed line denotes the 0.5 m error mark (error traces end if the target is detected). Search with Robot Only, Human Only, and Human With Robot fusion is shown from left to right. Uniform and bad GM search priors are shown in the top and bottom rows, respectively.

    D. Results: Overall Search Performance

    The overall search performance of the human–robot team is measured here by the search completion time and the number of targets detected in each search mission. Fig. 8 shows the results for these two metrics over all 24 search missions. Human With Robot sensing clearly offers the best overall search performance, since all five targets were found in every mission within 8.5–13 min. Human Only sensing found more targets than Robot Only sensing, which had the worst overall performance, but the mission completion times for these two conditions were about the same. The number of targets detected under each of these two conditions drops slightly when moving from uniform to bad GM priors. In contrast, the prior type did not significantly affect performance with Human With Robot sensing.

    The Robot Only results underscore the nontrivial nature of the search problem and the inadequacy of the greedy search strategy when only $\zeta_k$ is fused. With $D_k$ available, the robot was more proficient at detecting targets through the greedy search, although its performance was not necessarily optimal in any sense. The improvement from human input can be explained by comparing how informative each GM $p(X_t \mid D_{1:k}, \zeta_{1:k})$ typically is over time under the various fusion and prior conditions. Fig. 9 shows time traces of the MAP-estimated target position error $\epsilon_t = X_t^{\text{true}} - \hat{X}_t$, where $\hat{X}_t = \arg\max_{X_t} p(X_t \mid D_{1:k}, \zeta_{1:k})$, for all Mission 4 runs. All $D_k$ fusion instances are also marked, to show the typical frequency of voluntary human messaging and its influence on each target's posterior estimate.
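    The text does not state how $\hat{X}_t$ is obtained from each GM. One common option, assumed here, is a fixed-point (mean-shift-type) iteration started from every component mean. Setting the gradient of the GM density to zero gives $x = \big(\sum_i r_i \Sigma_i^{-1}\big)^{-1} \sum_i r_i \Sigma_i^{-1} \mu_i$ with $r_i = w_i \mathcal{N}(x; \mu_i, \Sigma_i)$, and the fixed point with the highest density is kept. The error norm plotted in Fig. 9 follows directly from this estimate. The example GM and true position are hypothetical.

```python
import numpy as np

def gauss_pdf(x, m, P):
    """Multivariate normal density N(x; m, P)."""
    d = x - m
    k = m.size
    return np.exp(-0.5 * d @ np.linalg.solve(P, d)) / np.sqrt((2 * np.pi) ** k * np.linalg.det(P))

def gm_pdf(x, gm):
    return sum(w * gauss_pdf(x, m, P) for w, m, P in gm)

def gm_map(gm, iters=200, tol=1e-10):
    """Approximate the GM mode by fixed-point iteration from each component mean."""
    best_x, best_p = None, -np.inf
    for _, start, _ in gm:
        x = start.astype(float).copy()
        for _ in range(iters):
            A = np.zeros((x.size, x.size))
            b = np.zeros(x.size)
            for w, m, P in gm:
                r = w * gauss_pdf(x, m, P)       # responsibility-like weight at x
                P_inv = np.linalg.inv(P)
                A += r * P_inv
                b += r * P_inv @ m
            x_new = np.linalg.solve(A, b)
            converged = np.linalg.norm(x_new - x) < tol
            x = x_new
            if converged:
                break
        p = gm_pdf(x, gm)
        if p > best_p:
            best_x, best_p = x, p
    return best_x

# Hypothetical two-component target GM and true position.
gm = [(0.3, np.array([0.0, 0.0]), np.eye(2)),
      (0.7, np.array([4.0, 1.0]), 0.5 * np.eye(2))]
x_hat = gm_map(gm)
x_true = np.array([4.2, 0.9])
err = np.linalg.norm(x_true - x_hat)      # two-norm of the MAP error
print(x_hat, err)
```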

    The $\hat{X}_t$ estimates derived from the target GMs with $D_k$ fusion are less precise than, say, estimates derived from conventional lidar data. Fig. 9 nevertheless shows that fusing soft human information in $D_k$ substantially improves the robot's estimated beliefs over time, even when a target spotted by the human is far from the robot's own sensor range. Indeed, with $D_k$ fused, $\epsilon_t$


    E. Results: Diversity of Soft Human Sensor Inputs

    The volume and variety of human messages generally increased under either Human Only or bad prior conditions (detailed results are omitted here due to limited space). Across all search scenarios there were also many more positive messages (1376) than negative messages (296). This is because the contributions of positive ("Something is...") messages were downweighted by the PDA correction in (48), which favors the prior GM (i.e., the nonassociation hypothesis) for positive data. As a result, the human often had to resend the same positive $D_k$ message two or three times to convince the robot that something was in fact somewhere. Negative ("Nothing is...") messages were much less frequent, partly because they are not downweighted by PDA. This dilution of positive information could potentially be avoided with alternatives to PDA data association, such as multiple hypothesis tracking. However, evaluating such alternatives is outside the scope of this study.

    F. Further Insights: Complementary Team Behavior

    As noted in [2], a simple greedy search strategy generally produces inefficient back-and-forth search paths over the map, as a direct consequence of the scattering effect from $\zeta_k$ updates [see Fig. 6(j)]. In Robot Only scenarios, the robot therefore often jumped from one part of the search map to another without searching thoroughly around its goal points, which slowed information gain. Since (47) also diminished around a missed $X_t^{\text{true}}$ after a missed detection, the robot could not recover a missed target until it had greedily searched the rest of the map. As Fig. 10 illustrates, in Human With Robot scenarios the human operator could quickly correct missed detections. The operator did this by sending relevant soft information that forced the robot to greedily re-examine areas around the actual target locations.

    Human Only fusion led to more target detections than Robot Only fusion, but one or two targets remained undetected in certain missions, and completion times did not improve consistently (especially with bad priors). This is largely due to the coarse nature of the soft $D_k$ codebook: (47) could not be updated precisely enough to nudge the robot toward a target that was just outside of detection range. As Fig. 11 illustrates, in Human Only missions the human spent considerable time sending many extra messages to coax the robot into a better viewing position. These were targets that were right in front of the robot but just outside of detection range, especially those not close to landmarks, such as Target 1 in Mission 4. Fig. 9 also shows the resulting high volume of $D_k$ messages, especially in the bad prior missions, and these scenarios frustrated the human in some cases. In Human With Robot cases, by contrast, scattering from $\zeta_k$ = "No Detection" observations helped shift (47) closer to any targets just outside of detection range. This automatically refined the target GM pdfs after $D_k$ fusion. It also led to smoother interaction between the human and the robot, as shown by the significantly improved mission times and the lower message volume and frequency compared with Human Only missions.

    Fig. 10. Human With Robot fusion sequence showing human correction of a missed target detection via $D_k$ updates. Sequence length is under 1 min.

    Fig. 11. Human Only fusion sequence showing the effects of limited codebook precision without $\zeta_k$ updates. Sequence length is almost 4 min.

    These results for the two fusion conditions indicate that the simple codebook used here to generate $D_k$ messages produces useful but ultimately limited information for localizing the targets. To enable reliable target localization without fusing more precise $\zeta_k$ data, the codebook could be refined to include more diverse or contextually precise $D_k$ preposition and reference primitives. The only reference points allowed in $D_k$ are discrete landmark or wall locations and the robot's current location. It is therefore not surprising that the softmax/MMS likelihoods induced by the three range-only and two bearing-only preposition classes can be too imprecise for awkwardly located targets, such as those in a corner of the search map away from any walls or landmarks.

    Fig. 12. Logarithm of KLDs at each time step for final/remaining target posterior pdfs under each fusion condition in Mission 4, under uniform (left column) and bad priors (right column). (Standard deviations over ten Monte Carlo trials are omitted from the $D_k$ fusion cases for clarity.)
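    To make this imprecision concrete, the sketch below builds a one-dimensional softmax likelihood over three range classes as a function of distance $r$ from a reference point. The class names, weights, and offsets are hypothetical. The paper's MMS models are learned from data in 2-D and use multiple subclasses per class. The sketch also shows the pseudopositive treatment of a negative observation, where the likelihood of "not class $j$" is the sum over all other classes, i.e., $1 - P(j \mid r)$.

```python
import numpy as np

CLASSES = ["near", "next_to", "far"]   # hypothetical range classes
W = np.array([-3.0, 0.0, 3.0])         # hypothetical slopes in r
B = np.array([3.0, 0.0, -9.0])         # hypothetical offsets

def softmax_likelihood(r):
    """P(class | r) for each class; r is a distance in meters (scalar or array)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    logits = np.outer(r, W) + B                   # shape (len(r), n_classes)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def negative_likelihood(r, j):
    """'Nothing is <class j>' as a pseudopositive update: sum over all other classes."""
    p = softmax_likelihood(r)
    return np.delete(p, j, axis=1).sum(axis=1)

r = np.linspace(0.0, 5.0, 6)
p = softmax_likelihood(r)
neg = negative_likelihood(r, CLASSES.index("next_to"))
print(np.round(p, 2))
print(np.round(neg, 2))
```

    Under these hypothetical parameters, a "next_to" message confines the target only to a band roughly 1–3 m from the reference point, and one message cannot localize it more finely than that.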

    G. Results: Accuracy of Approximate Gaussian Mixture Posteriors

    To assess the accuracy of online fusion of $\zeta_k$ data via LWIS and of $D_k$ data via VBIS, the KLD of each GM posterior $p(X_t \mid \zeta_{1:k}, D_{1:k})$ at every time step $k$ was computed offline for all search missions. The reference was a recursive grid-based ground-truth fusion posterior at 0.1 m × 0.1 m grid resolution. To further assess the contribution of VBIS to $D_k$ fusion, KLDs were also computed offline for a separate set of GM posteriors obtained by using LWIS GM fusion for both $\zeta_{1:k}$ and $D_{1:k}$. These LWIS updates used 1000 samples per component update, to match the total number of samples used for VBIS $D_k$ updates. The KLDs for both the online LWIS-VBIS and the offline LWIS-only GM approximations were evaluated over ten independent Monte Carlo runs to account for random sampling effects. To remove the effects of the robot's closed-loop greedy planner from the offline fusion results, the offline runs used the same recorded robot trajectories, $\zeta_k$ and $D_k$ data, and mixture management methods (PDA, Salmond's method) as online LWIS $\zeta_k$ and VBIS $D_k$ fusion.
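    The text does not spell out the direction of the KLD or how the GM is discretized. The sketch below assumes $\mathrm{KL}(p_{\text{grid}} \,\|\, q_{\text{GM}})$, with the GM density evaluated at the centers of 0.1 m × 0.1 m cells and both pmfs renormalized over the grid. Two single Gaussians stand in for the ground-truth and approximate posteriors. Their exact KLD is $0.5 \times 0.3^2 = 0.045$ nats, which the grid sum should reproduce.

```python
import numpy as np

def gauss_pdf_grid(X, Y, m, P):
    """Bivariate normal density evaluated on grid arrays X, Y."""
    P_inv = np.linalg.inv(P)
    dx, dy = X - m[0], Y - m[1]
    q = P_inv[0, 0] * dx**2 + 2.0 * P_inv[0, 1] * dx * dy + P_inv[1, 1] * dy**2
    return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(np.linalg.det(P)))

def gm_on_grid(gm, X, Y):
    """GM density (list of (weight, mean, cov)) evaluated on the grid."""
    return sum(w * gauss_pdf_grid(X, Y, m, P) for w, m, P in gm)

def kld_grid(p_grid, q_grid, eps=1e-300):
    """KL(p || q) in nats for two densities discretized on the same grid."""
    p = p_grid / p_grid.sum()
    q = q_grid / q_grid.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

res = 0.1                                     # 0.1 m x 0.1 m cells, as in the text
xs = np.arange(-5.0, 5.0, res) + res / 2.0    # cell centers
X, Y = np.meshgrid(xs, xs, indexing="ij")

truth = gm_on_grid([(1.0, np.array([0.0, 0.0]), np.eye(2))], X, Y)   # stand-in ground truth
approx = gm_on_grid([(1.0, np.array([0.3, 0.0]), np.eye(2))], X, Y)  # stand-in GM estimate
kl = kld_grid(truth, approx)
print(kl, np.log(kl))                         # KLD and the log KLD plotted in Fig. 12
```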

    Although the details are not shown here, only small baseline KLD losses arose during Robot Only fusion under uniform and bad priors (i.e., losses due to LWIS fusion of $\zeta_k$ alone, plus artifacts of GM compression). This shows that GMs can approximate the exact target location posteriors reasonably well over the course of a full 15-min search mission. The log KLDs in Fig. 12(a) and (c) and Fig. 12(b) and (d) show that the online VBIS and offline LWIS $D_k$ updates generally offer comparable accuracy alongside LWIS $\zeta_k$ fusion. The LWIS $D_k$ fusion results here benefit greatly from the PDA-based positive ("Something is...") updates in (48), since a substantial portion of the prior appears in the GM posterior (i.e., the prior GM retains weight $1 - 1/|T_k| \in [0.5, 0.8]$ when $|T_k| > 1$). The KLDs typically spike with $D_k$ updates. The largest upward spikes tend to appear after about 10 min (600 s), due to increased sensitivity to information losses accumulated from baseline $\zeta_k$ LWIS fusion. For the Human Only cases in particular, after many $D_k$ messages the tails and several small components of the true posteriors become very difficult to approximate with only 15-component GMs. The KLDs for both methods are noticeably smaller for Human With Robot fusion, because $\zeta_k$ reduces the number of $D_k$ messages needed to modify the pdfs and thus limits the overall complexity of the true posteriors. Nevertheless, these spikes are often less than 1.5 log nats for VBIS $D_k$ fusion before 600 s. Larger KLD spikes typically occur after this time, but they are still often less severe than those for LWIS. Indeed, with bad priors the VBIS KLDs are either statistically comparable with or significantly smaller than the corresponding LWIS KLDs.¹² Fig. 12(d) shows one such discrepancy in accuracy at about $k = 100$, where LWIS misses a major GM posterior mode but VBIS does not.

    H. Computation and Implementation Considerations

    Although more reliable, the use of EM makes VBIS more expensive to implement than LWIS. In these experiments, VBIS required approximately 7 ms on average per GM component update using managed C# code, while LWIS required approximately 2 ms.¹³ VBEM can converge slowly if it is initialized far from the final solution. Several code optimization strategies could address this, such as parallelization of Algorithm 3, clustering of VBEM initializations across similar GM components, and use of unmanaged pointer arithmetic, but none were implemented here. They were not required for the present application, since VBIS did not cause appreciable delays during online operation.

    An important advantage of GM posterior approximations is that they are much more compact than the offline-computed ground-truth discrete grids. A 15-component GM for one target at a single time step requires 720 bytes (double precision), while the grid requires 44 064 bytes, about 61 times as much memory. Hence, for a full 900-s search mission, the target's full posterior time history recorded at 1 Hz requires 0.65 MB with a GM, versus 39.66 MB with a grid. This discrepancy is

    even larger if $X_k$ is augmented to include additional states (e.g., vertical displacement and velocities). Such storage costs are highly relevant for applications in which pdfs over multiple time steps must be stored or communicated, such as decentralized data fusion sensor networks [17]. Developing sophisticated yet computationally affordable online GM compression methods that avoid excessive posterior information loss in realistic fusion scenarios (e.g., with hundreds or thousands of mixands) is still an active area of estimation research.

    ¹²Determined using Kruskal–Wallis tests with p = 0.01 on the time-averaged log KLD values.

    ¹³These times did not increase significantly for a component gate of 1.
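    The storage figures can be checked with a few lines of arithmetic. The storage layout is not given in the text. The 720-byte figure is reproduced if each mixand of the 2-D position state stores a weight, a 2-element mean, and the three unique entries of its symmetric 2 × 2 covariance, all as 8-byte doubles, which is the assumption used here. The grid size is taken directly from the text.

```python
# Storage comparison for one target's posterior, using the figures in the text.
# Assumption: each 2-D mixand stores 1 weight + 2 mean entries + 3 unique
# covariance entries, each as an 8-byte double.
BYTES_PER_DOUBLE = 8
N_MIXANDS = 15
DOUBLES_PER_MIXAND = 1 + 2 + 3

gm_bytes = N_MIXANDS * DOUBLES_PER_MIXAND * BYTES_PER_DOUBLE   # 720
grid_bytes = 44_064                                             # grid size from the text
steps = 900                                                     # 900 s recorded at 1 Hz

ratio = grid_bytes / gm_bytes
gm_history_mb = gm_bytes * steps / 1e6       # MB taken as 10^6 bytes
grid_history_mb = grid_bytes * steps / 1e6
print(gm_bytes, round(ratio, 1), gm_history_mb, grid_history_mb)
```

    The grid-to-GM ratio works out to about 61, and the history sizes round to 0.65 MB and 39.66 MB.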

    Finally, it is worth considering whether a standard particle filtering approach would be adequate for human–robot fusion in place of the GM methods discussed here. Additional offline fusion performance analyses of the multitarget search trials were performed with the common BPF [24], using different sample sizes (500–10 000 particles) and resampling schemes. Unlike the GM filtering approaches considered here, the BPF approximates (46)–(48) with weighted samples, drawn initially from the prescribed GM priors at $k = 0$, and performs all Bayesian updates via likelihood-weighted IS. Full details are omitted due to limited space, but the BPF generally performed worse than the GM filters at all sample sizes in terms of robustness, consistency, and estimation accuracy. For instance, in the Human With Robot Mission 4 trial under the benign "Uniform" prior, the BPF's final MAP error norm for Target 3, $\|\epsilon_3\|$, is always about 4 m, whereas that of the VBIS+LWIS GM filter is always about 0.4 m. This behavior can be traced to particle degeneracies that arise in the BPF from likelihood-weighted IS, and to the BPF's inability to explore new $X_k$ values outside its initial sample set. The proposed GM filter avoids both issues, and it also provides a more compact and completely continuous approximation of the fusion posterior.
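    Both failure modes can be reproduced with a toy sketch. This is not the BPF configuration used in the offline analyses. It assumes a static 2-D target, a hypothetical Gaussian likelihood repeatedly centered at the same point, systematic resampling, and no roughening or jitter. Because resampling only copies existing particles, the number of distinct particle locations can never increase, and the effective sample size collapses after the first sharp update.

```python
import numpy as np

rng = np.random.default_rng(1)

def bpf_update(particles, weights, likelihood):
    """Likelihood-weighted IS update followed by systematic resampling."""
    w = weights * likelihood(particles)
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                       # effective sample size
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n    # systematic resampling points
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), n - 1)
    return particles[idx], np.full(n, 1.0 / n), ess

# Static target: particles drawn once from a uniform prior over a 10 m x 10 m area.
particles = rng.uniform(0.0, 10.0, size=(500, 2))
weights = np.full(500, 1.0 / 500)
target = np.array([7.3, 2.1])                        # hypothetical observation center

def lik(x):
    """Hypothetical sharp Gaussian likelihood (std 0.3 m) around the target."""
    return np.exp(-0.5 * np.sum((x - target) ** 2, axis=1) / 0.3 ** 2)

ess_hist, uniq_hist = [], []
for k in range(5):
    particles, weights, ess = bpf_update(particles, weights, lik)
    ess_hist.append(ess)
    uniq_hist.append(len(np.unique(particles, axis=0)))  # distinct locations left
print([round(e, 1) for e in ess_hist], uniq_hist)
```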

    VI. CONCLUSION

    This paper derived a computationally efficient and accurate approximation to the recursive hybrid Bayesian inference problem involved in the dynamic fusion of soft categorical human observations with conventional hard robot sensor data. The proposed VBIS fusion method combines the strengths of fast stand-alone variational Bayes and Monte Carlo IS inference approximations to obtain consistent Gaussian posteriors in the baseline case of Gaussian state priors with softmax likelihood functions. VBIS was then extended to derive GM posterior approximations for GM priors with MMS likelihood models in order to handle more general recursive hybrid data fusion problems. Experimental multitarget search results for a real human–robot team showed that soft categorical observations from human sensors, although subject to limited precision and potential data association ambiguities, can still be highly useful and informative for recursive Bayesian estimation problems that feature a high degree of uncertainty or inconsistency. The results also provide valuable practical insight into the reliability of the proposed VBIS GM approximations under a variety of fusion conditions, vis-à-vis LWIS GM and grid-based ground-truth approximations. Soft categorical human sensor observations can be exploited in many different dynamic data fusion domains and are particularly convenient in situations where humans must share information quickly but do not have enough time to precisely estimate states of interest (e.g., the precise distance and bearing to a target in meters and degrees, respectively). Although the important issues of estimating error/false alarm and likelihood model uncertainties for human sensors are not addressed here in detail due to limited space, the proposed data fusion framework can incorporate these in a fully Bayesian manner [10], [32].

    REFERENCES

    [1] P. Bladon, P. Day, T. Hughes, and P. Stanley, "High-level fusion using Bayesian networks: Applications in command and control," in Proc. Inf. Fusion Command Support, 2004, pp. 4.4–4.18.

    [2] F. Bourgault, "Decentralized control in a Bayesian world," Ph.D. dissertation, Sch. Aerosp., Mech. Mechatronic Eng., Univ. Sydney, N.S.W., Australia, 2005.

    [3] T. Fong and I. Nourbakhsh, "Interaction challenges in human–robot space exploration," ACM Interact., vol. 12, no. 2, pp. 42–45, 2005.

    [4] A. Bauer, K. Klasing, G. Lidoris, Q. Mühlbauer, F. Rohrmüller, S. Sosnowski, T. Xu, K. Kühnlenz, D. Wollherr, and M. Buss, "The autonomous city explorer: Towards natural human–robot interaction in urban environments," Int. J. Soc. Robot., vol. 1, no. 2, pp. 127–140, 2009.

    [5] T. Nakamura, T. Nagai, and N. Iwahashi, "Bag of multimodal LDA models for concept formation," in Proc. IEEE Int. Conf. Robot. Autom., May 2011, pp. 6233–6238.

    [6] E. Topp and H. Christensen, "Topological modelling for human augmented mapping," in Proc. Int. Conf. Intell. Robots Syst., Beijing, China, 2006, pp. 2257–2263.

    [7] B. Khaleghi, A. Khamis, and F. Karray, "Random finite set theoretic based soft/hard data fusion with application for target tracking," in Proc. Conf. Multisensor Fusion Integ. Intell. Syst., Salt Lake City, UT, 2010, pp. 50–55.

    [8] D. Hall and J. Jordan, Human-Centered Information Fusion. Boston, MA: Artech House, 2010.

    [9] M. Michalowski, S. Sabanovic, C. DiSalvo, D. Busquets, L. Hiatt, N. Melchior, and R. Simmons, "Socially distributed perception: Grace plays social tag at AAAI 2005," Autonom. Robots, vol. 22, pp. 385–397, 2007.

    [10] T. Kaupp, "Probabilistic human–robot information fusion," Ph.D. dissertation, Sch. Aerosp., Mech. Mechatronic Eng., Univ. Sydney, N.S.W., Australia, 2008.

    [11] M. Lewis, H. Wang, P. Velgapudi, P. Scerri, and K. Sycara, "Using humans as sensors in robotic search," in Proc. 12th Int. Conf. Inf. Fusion, Seattle, WA, 2009, pp. 1249–1256.

    [12] F. Bourgault, A. Chokshi, J. Wang, D. Shah, J. Schoenberg, R. Iyer, F. Cedano, and M. Campbell, "Scalable Bayesian human–robot cooperation in mobile sensor networks," in Proc. Int. Conf. Intell. Robots Syst., 2008, pp. 2342–2349.

    [13] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA: MIT Press, 2001.

    [14] Y. Bar-Shalom, X. Li, and T. Kirubarajan, Estimation with Applications to Navigation and Tracking. New York: Wiley, 2001.

    [15] T. Kaupp, A. Makarenko, F. Ramos, B. Upcroft, S. Williams, and H. Durrant-Whyte, "Adaptive human sensor model in sensor networks," in Proc. 8th Int. Conf. Inf. Fusion, 2005, vol. 1, pp. 748–755.

    [16] T. Kaupp, A. Makarenko, S. Kumar, B. Upcroft, and S. Williams, "Operators as information sources in sensor networks," in Proc. IEEE/RSJ Int. Conf. Intell. Robots and Syst., 2005, pp. 936–941.

    [17] T. Kaupp, B. Douillard, F. Ramos, A. Makarenko, and B. Upcroft, "Shared environment representation for a human–robot team performing information fusion," J. Field Robot., vol. 24, no. 11, pp. 911–942, 2007.

    [18] M. Skubic, D. Perzanowski, S. Blisard, A. Schultz, W. Adams, M. Bugajska, and D. Brock, "Spatial language for human–robot dialogs," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 2, pp. 154–167, May 2004.

    [19] A. Huang, S. Tellex,