A Simple Algorithm for Adaptive Decision Fusion


Qiang Zhu, Xiaoxun Zhu and Moshe Kam
Data Fusion Laboratory, ECE Department, Drexel University, Philadelphia, PA 19104

Abstract

Design of parallel binary decision fusion systems is often performed under the assumption that the decision integrator (the data fusion center, DFC) possesses perfect knowledge of the local-detector (LD) statistics. In most studies, other statistical parameters are also assumed to be known, namely the a priori probabilities of the hypotheses, and the transition probabilities of the DFC-LD channels. Under these circumstances, the DFC's sufficient statistic is a weighted sum of the local decisions. When these statistics are unknown, we propose to tune the weights on-line, guided by correct examples or by past experience. We develop a supervised training scheme that employs correct input-output examples to train the DFC. This scheme is then made into an unsupervised learning technique by replacing the examples with a self-assessment of the DFC, based on its own past decisions. In both cases the DFC minimizes the squared error between the actual and the desirable values of its discriminant function. When supervised, the DFC obtains the desirable value from the supervisor. When unsupervised, the DFC estimates the desirable value from its last decision. This estimation includes rejection of data that is deemed unreliable.

I. Introduction

Decision fusion architectures and decentralized hypothesis testing have been studied since the early 1980s (e.g., [2], [9], [10] and [14]), but only recently has adaptive decision fusion become a topic of interest (e.g., [1], [7], [8] and [16]). The basic parallel decision fusion architecture is illustrated in Fig. 1. A bank of local detectors observes a common volume of surveillance. Based on its observation vector, y_i in R^k, the ith detector uses a local decision rule, g_i: R^k -> {-1, 1}, to decide whether to accept the null hypothesis H_0 (u_i = -1), or to accept the alternative hypothesis H_1 (u_i = 1). The local observations are assumed to be statistically independent, conditioned on the hypothesis. The decision vector {u_i}_{i=1}^N is then transmitted to a data fusion center (indexed i = 0) that uses a global decision rule g_0: {-1, 1}^N -> {-1, 1} to make a global decision, u_0.
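For concreteness, here is a minimal numerical sketch (not part of the original paper) of the parallel architecture just described. It assumes Gaussian observations and identical threshold local-decision rules g_i; the function names and parameter values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_decision(x, threshold=0.5):
    """Local decision rule g_i: accept H1 (+1) if the observation exceeds
    the threshold, otherwise accept H0 (-1)."""
    return 1 if x > threshold else -1

def simulate_round(n_detectors=5, p1=0.5, snr=1.0):
    """One surveillance round: draw the true hypothesis, let each local
    detector observe signal-plus-noise, and collect the +/-1 decisions."""
    h1 = rng.random() < p1                       # true hypothesis
    mean = snr if h1 else 0.0                    # H1 shifts the observation mean
    observations = mean + rng.standard_normal(n_detectors)
    u = np.array([local_decision(x) for x in observations])
    return h1, u

true_h1, decisions = simulate_round()
print("H1 true:", bool(true_h1), " local decisions u:", decisions)
# The decision vector u is what the data fusion center (DFC) receives; fusing
# it into a single global decision u_0 is the subject of the rest of the paper.
```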

Acknowledgments: This study was supported by the National Science Foundation under a Presidential Young Investigator Award (ECS 057587) and a REU supplement.

Fig. 1: Decentralized detection structure

Simultaneous optimal design of {g_i}_{i=0}^N for a global Bayesian objective function was shown to be difficult (Reibman and Nolte [11], Thomopoulos et al. [15]). We use here the much simpler objective of designing g_0 optimally when {g_i}_{i=1}^N are fixed. Even for this case, which was completely solved by Chair and Varshney [2], the DFC still needs complete statistical characterization of the local detectors and of the a priori probabilities of the hypotheses. When these are not available, one can resort either to robust (often minimax) techniques ([3], [5], [13]) or to adaptive techniques.

Existing adaptive fusion techniques

The first study in adaptive fusion was conducted by Atteson et al. [1]. They have developed local and global decision rules for decision makers (local or global) that are ignorant of the a priori probabilities and of the local-detector performance probabilities. A stochastic-approximation-like technique was proposed to estimate the unknown probabilities from past (local and global) decisions. This approach was formalized and improved by Naim and Kam [7]. In both studies, the type (but not the parameters) of the observation distribution function was assumed to be known.

Reibman [9] studied a system where local-detector statistics deviate from known values due to the effect of the LD-DFC channels. The DFC estimates the actual statistics from the available local decisions. An adaptive scheme updates the density function of the channel transitions. The prior density function at each time instant is set to the a posteriori density function computed at the previous step. The algorithm can be applied for supervised learning (when the actual hypothesis is known), and for unsupervised learning (by using the most recent fusion decision as the hypothesis).


Wissinger and Athans [16] proposed an iterative stochastic-approximation-based algorithm to satisfy the necessary conditions for optimality, and to calculate the resulting optimal thresholds for all decision makers (local and global). The necessary conditions were satisfied by solving (for each decision maker) a sequence of one-dimensional stochastic approximation subproblems, each of which is coupled to similar subproblems throughout the network. The topology of a three-member V structure was investigated as an example. Communication between the LDs was assumed, to allow simultaneous solution of the design equations.

Pados et al. [8] have studied adaptive decision fusion for the Neyman-Pearson criterion. They have derived a class of adaptive learning algorithms for pertinent network parameters, minimizing the Kullback-Leibler distance between the distributions of the observed and the desired output. Their work includes a formal convergence study. A related methodology was presented in [12].

We note that in most of the studies on adaptive fusion ([1], [7], [8] and [16]) stochastic approximation techniques were used. Unsupervised training algorithms were proposed in [1], [7] and [10].

II. Objective of the Presented Study

In our study, the performance characteristics of the local detectors and the DFC are unknown, and so are the a priori probabilities of the hypotheses. Moreover, no statistics or observation distribution functions are assumed. The objective is to construct a decision rule that will be near-optimal in the Bayesian sense.

Chair and Varshney [2] have studied the design of the optimal Bayesian DFC when all statistics are available. They have minimized the Bayesian risk

    B = \sum_{i,j=0}^{1} C_{ij} P_j P(u_0 = i \mid H_j),    (1)

where P_j is the a priori probability of hypothesis H_j; P(u_0 = i | H_j) is the probability that the DFC accepts H_i given that H_j is true; and C_ij is the cost associated with accepting H_i when hypothesis H_j is true. For C_00 = C_11 = 0 and C_01 = C_10 = 1, the Bayesian risk becomes the probability of error, P_e. In the sequel, our objective function is P_e (the generalization to a general Bayesian risk is straightforward).

The optimal Bayesian discriminant function g_0(u) is defined as

    g_0(u) = P(H_1 \mid u) - P(H_0 \mid u),    (2)

where

    P(u \mid H_1) = \prod_{i=1}^{N} P_{Mi}^{(1-u_i)/2} (1 - P_{Mi})^{(1+u_i)/2},    (3)

    P(u \mid H_0) = \prod_{i=1}^{N} P_{Fi}^{(1+u_i)/2} (1 - P_{Fi})^{(1-u_i)/2},    (4)

and P_{Fi} and P_{Mi} are the false-alarm and miss probabilities of the ith local detector. The optimal decision rule that minimizes the probability of error is of the form

    u_0 = \mathrm{sign}\left( w_0 + \sum_{i=1}^{N} w_i u_i \right),    (5)

where

    w_i = \frac{1}{2} \log \frac{(1 - P_{Fi})(1 - P_{Mi})}{P_{Fi} P_{Mi}}, \quad i = 1, 2, \ldots, N; \qquad
    w_0 = \log \frac{P_1}{P_0} + \frac{1}{2} \sum_{i=1}^{N} \log \frac{(1 - P_{Mi}) P_{Mi}}{(1 - P_{Fi}) P_{Fi}}.

We collect the optimal weights in a 1 x (N+1) vector

    W_{opt} = [w_0 \; w_1 \; \ldots \; w_N].    (6)

The relation between the discriminant function g_0(u) in (2) and W_{opt} applied to the augmented decision vector u = [1, u_1, ..., u_N]^T is

    g_0(u) = \phi(W_{opt} u),    (7)

where \phi(x) is the logistic function scaled to the interval (-1, 1),

    \phi(x) = \frac{1 - e^{-x}}{1 + e^{-x}} = \tanh(x/2).    (8)

The objective of our weight-tuning rules is to design the weight vector W such that \phi(W u) is a "best" mean-squared error (MSE) approximation of the optimal discriminant function g_0(u). The performance index that we minimize is, therefore,

    \varepsilon = E\{[\phi(W u) - g_0(u)]^2\}.    (9)
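To make the fixed-statistics baseline concrete, the following sketch (not code from the paper; the detector probabilities are made up for illustration) computes the weights of (5)-(6) and the fused decision for a given decision vector. With these weights the statistic W_opt u equals the log posterior odds, so the discriminant g_0(u) of (7) is obtained as well.

```python
import numpy as np

def optimal_weights(p1, p_f, p_m):
    """Weights of the linear sufficient statistic w_0 + sum_i w_i * u_i,
    chosen so that the statistic equals log[P(H1|u)/P(H0|u)] (eqs. (5)-(6))."""
    p_f, p_m = np.asarray(p_f, float), np.asarray(p_m, float)
    w = 0.5 * np.log((1 - p_f) * (1 - p_m) / (p_f * p_m))
    w0 = np.log(p1 / (1 - p1)) + 0.5 * np.sum(np.log((1 - p_m) * p_m / ((1 - p_f) * p_f)))
    return np.concatenate(([w0], w))

def fuse(weights, u):
    """Global decision u_0 = sign(W [1, u_1, ..., u_N]^T), plus the
    discriminant g_0(u) = tanh(statistic / 2) from eqs. (7)-(8)."""
    stat = weights @ np.concatenate(([1.0], u))
    g0 = np.tanh(stat / 2.0)
    return (1 if stat >= 0 else -1), g0

# Illustrative (made-up) prior and local false-alarm / miss probabilities.
W = optimal_weights(p1=0.4, p_f=[0.1, 0.2, 0.05], p_m=[0.15, 0.1, 0.3])
u0, g0 = fuse(W, np.array([1.0, -1.0, 1.0]))
print("fused decision:", u0, " discriminant g_0(u):", round(float(g0), 3))
```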

We shall use two iterative methods to minimize ε. These are: (i) a supervised training technique, where an infallible 'teacher' supplies the correct DFC decision for a given data vector u; and (ii) an unsupervised training algorithm using 'bootstrap' self-tuning.

III. MSE Approximation of the Discriminant Function - Supervised Learning

Like most supervised-learning decision networks, our network will operate in a training phase and a retrieval phase. In the training phase, we are supplied with examples of local-decision vectors, along with the corresponding correct global outputs. A discriminant function of the form φ(W u) is created for the DFC. An iterative weight-tuning procedure is used to minimize the MSE between this function and g_0(u). Once all available training information was used (or near-optimal


classification of the training set was achieved), the network becomes operational. In the retrieval phase, new vectors of local decisions [u_1 u_2 ... u_N]^T are provided to the DFC. The DFC determines which hypothesis they represent, based on the approximated discriminant function.

Let d be the desired output for a given local-decision vector u. The output d is provided by an infallible teacher, such that d = 1 if H_1 is true and d = -1 if H_0 is true. Hence P(d = 1 | u) = P(H_1 | u), P(d = -1 | u) = P(H_0 | u), and

    E(d \mid u) = \sum_{d} d \, P(d \mid u) = P(H_1 \mid u) - P(H_0 \mid u) = g_0(u).    (10)

(Here, E(·) is the expected-value operator.) From (10) we conclude that d is an unbiased estimate of the Bayesian discriminant function, g_0(u). Therefore we replace g_0(u) in (9) by d, to obtain the objective function

    \hat{\varepsilon} = E\{[d - \phi(W u)]^2\}.    (11)

Finally, our supervised training strategy becomes

    W^{*} = \arg\min_{W} E\{[d - \phi(W u)]^2\}.    (12)

The weight-tuning procedure during training is depicted in Fig. 2. For a local decision vector u, the DFC evaluates the value of the discriminant function φ(W u). It then compares it with the desired value d and adjusts the weights using gradient descent.

Fig. 2: Design of optimal decentralized decision fusion system using supervised training

The gradient-descent algorithm that (locally) minimizes (12) iteratively is

    W_0 = [0 \; 1 \; 1 \; \ldots \; 1];
    W_{k+1} = W_k + \mu_k \, [d_k - \phi(W_k u_k)] \, \phi'(W_k u_k) \, u_k^{T},    (13)

where k is the time index, W_k is the weight vector at time k, u_k is the input pattern vector, and μ_k is the learning rate (between 0 and 1). W_0 is the initial weight vector, corresponding in our algorithm to a majority rule and to an unbiased threshold w_0 = 0.

    Although the solution of (12) does not directly minimizethe probability of error, it does offer (locally) a bestapproximation to the Bayesian discriminant in the MSEsense.
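A minimal sketch of the supervised update follows (not the paper's code). It assumes the (-1, 1)-scaled logistic of (8) as the approximator and an illustrative constant learning rate; because no real teacher is available here, a stand-in majority-vote label is used just to make the loop runnable, whereas in the paper d comes from an infallible supervisor.

```python
import numpy as np

def phi(x):
    """Bipolar logistic discriminant approximator (range (-1, 1))."""
    return np.tanh(x / 2.0)

def supervised_step(W, u, d, mu=0.1):
    """One gradient-descent step on the squared error [d - phi(W u)]^2,
    where u = [1, u_1, ..., u_N] and d in {-1, +1} is the teacher label."""
    y = phi(W @ u)
    grad_phi = 0.5 * (1.0 - y ** 2)            # derivative of tanh(x/2)
    return W + mu * (d - y) * grad_phi * u     # move phi(W u) toward d

# Illustrative training loop with made-up local decisions and a stand-in teacher.
rng = np.random.default_rng(1)
N = 3
W = np.array([0.0] + [1.0] * N)                # initial weights: majority rule, zero bias
for _ in range(200):
    u = np.concatenate(([1.0], rng.choice([-1.0, 1.0], size=N)))
    d = 1.0 if u[1:].sum() > 0 else -1.0       # stand-in teacher label (majority of LDs)
    W = supervised_step(W, u, d)
print("tuned weights:", np.round(W, 3))
```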

IV. MSE Approximation of the Discriminant Function - Unsupervised Learning

When the desired output is unavailable, the supervised training scheme becomes impractical. For this situation we devise an unsupervised learning scheme that employs bootstrapping adaptation. A hyperbolic tangent function with slope β is used to generate an estimated decision for each local decision vector (Fig. 3). The selection of tanh was inspired by the form of the optimal discriminant function in equation (7).

Fig. 3: Design of optimal decentralized decision fusion system using unsupervised training

The value of d̂ = tanh(βW u), which takes values between -1 and 1, can be considered a soft decision of the DFC, as opposed to its final hard binary decision (-1 or 1). To generate a sequence of reliable decisions as the training pattern, we use the absolute value of d̂ as a reliability index for the decision. More precisely, when |W u| is sufficiently large, d̂ ≈ sign(W u) (little ambiguity); and when |W u| is close to 0, d̂ ≈ 0 (large ambiguity). In the latter case, there is incentive to ignore (during training) those decision vectors whose d̂ is close to zero, as they represent inconclusive data.


The gradient-descent based algorithm for unsupervised learning is

    W_0 = [0 \; 1 \; 1 \; \ldots \; 1];
    W_{k+1} = W_k + \mu_k \, [\hat{d}_k - \phi(W_k u_k)] \, \phi'(W_k u_k) \, u_k^{T}, \quad \text{if } |\tanh(\beta W_k u_k)| \ge \delta;
    W_{k+1} = W_k, \quad \text{otherwise},    (14)

where d̂_k = tanh(βW_k u_k).


The algorithm uses two designer-specified parameters. The parameter δ ∈ (0, 1) is a reliability threshold, used to reject bad data. If |d̂| = |tanh(βW u)| < δ, the training pattern will not be used for training. The parameter β controls the transition region of the soft decision from -1 to +1. The soft decision is closer to the binary hard decision for larger values of β.
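Under the same assumptions as the supervised sketch above, here is a corresponding sketch of the bootstrap update in (14): the soft decision d̂ = tanh(βW u) replaces the teacher label, and patterns with |d̂| < δ are rejected as inconclusive. The β, δ and μ values, and the random input data, are illustrative.

```python
import numpy as np

def phi(x):
    """Bipolar logistic discriminant approximator (range (-1, 1))."""
    return np.tanh(x / 2.0)

def unsupervised_step(W, u, beta=4.0, delta=0.3, mu=0.1):
    """One bootstrap update: form the soft decision d_hat = tanh(beta * W u),
    skip the pattern if |d_hat| < delta (inconclusive), otherwise use d_hat
    in place of the missing teacher label."""
    s = W @ u
    d_hat = np.tanh(beta * s)                  # soft decision in (-1, 1)
    if abs(d_hat) < delta:                     # reliability test: reject ambiguous data
        return W
    y = phi(s)
    grad_phi = 0.5 * (1.0 - y ** 2)            # derivative of tanh(x/2)
    return W + mu * (d_hat - y) * grad_phi * u

# Illustrative run starting from the majority rule, with made-up local decisions.
rng = np.random.default_rng(2)
N = 3
W = np.array([0.0] + [1.0] * N)
for _ in range(200):
    u = np.concatenate(([1.0], rng.choice([-1.0, 1.0], size=N)))
    W = unsupervised_step(W, u)
print("self-tuned weights:", np.round(W, 3))
```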


Extensive simulations demonstrate that when δ is too small (δ


V. Conclusions

We have presented a simple design procedure for a binary distributed decision fusion system. The local-decision rules in this system are fixed, but the performance of the LDs and the a priori probabilities are unavailable to the DFC. Rather than estimating the unknown probabilities, we have estimated directly the weights of the sufficient statistic. When correct examples were available, we used a minimum-MSE algorithm to shape the discriminant function (guided by the desirable decisions). When training examples were not available, we used soft decisions to guide a bootstrapping on-line weight-tuning process. This procedure depends on the steepness of the discriminant-function approximator (β), and on the criterion that was used to reject bad data (δ).

We have evaluated the tuned systems in numerical simulations, using the probability of decision error as a performance criterion. In all cases, we have used initial weight values that correspond to a majority rule. With supervised learning, especially at low signal-to-noise ratios (SNRs), the algorithm improved the system's performance greatly, and often the resulting probability of error was equal to that of the optimal omniscient system. Unsupervised learning was, of course, less successful, but a significant reduction in the probability of error was still obtained (when compared to the performance of the majority rule). Additional investigation is needed to determine conditions for convergence of the tuning rules, and to estimate the domain of attraction of the optimal solution.

References

[1] Atteson, K., Schrier, M., Lipson, G. and Kam, M. (1988): Distributed Decision-making with Learning Threshold Elements, Proceedings of the 27th Conference on Decision and Control, pp. 804-805.
[2] Chair, Z. and Varshney, P. K. (1986): Optimal Data Fusion in Multiple Sensor Detection Systems, IEEE Transactions on Aerospace and Electronic Systems, Vol. 22, No. 1, pp. 98-101.
[3] Drakopoulos, E. and Lee, C. C. (1992): Decision Rules for Distributed Decision Networks with Uncertainties, IEEE Transactions on Automatic Control, Vol. 37, No. 1, pp. 5-14.
[4] Duda, R. O. and Hart, P. E. (1973): Pattern Classification and Scene Analysis, New York: Wiley.
[5] Geraniotis, E. and Chau, Y. A. (1988): Distributed Detection of Weak Signals from Multiple Sensors with Correlated Observations, Proceedings of the 27th Conference on Decision and Control, pp. 2501-2506.
[6] Kam, M., Chang, W. and Zhu, Q. (1991): Hardware Complexity of Binary Distributed Detection Systems with Isolated Local Detectors, IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, No. 3, pp. 565-571.
[7] Naim, A. and Kam, M. (1994): On-Line Estimation of Probabilities for Distributed Bayesian Detection, Automatica, Vol. 30, No. 7.
[8] Pados, D., Papantoni-Kazakos, P., Kazakos, D. and Koyiantis, A. (1993): Information Measures for Learning in Neural Binary Classifiers, Proceedings of the 32nd Conference on Decision and Control, pp. 852-857.
[9] Reibman, A. R. (1989): Distributed Detection Using an Adaptive Fusion Processor, Proceedings of the 28th Conference on Decision and Control, pp. 1309-1314.
[10] Reibman, A. R. and Nolte, L. W. (1987): Design and Performance Comparison of Distributed Detection Networks, IEEE Transactions on Aerospace and Electronic Systems, Vol. 23, No. 6, pp. 789-797.
[11] Reibman, A. R. and Nolte, L. W. (1987): Optimal Detection and Performance of Distributed Sensor Systems, IEEE Transactions on Aerospace and Electronic Systems, Vol. 23, No. 1, pp. 24-30.
[12] Nedeljkovic, V. (1993): A Novel Multilayer Neural Networks Training Algorithm that Minimizes the Probability of Classification Error, IEEE Transactions on Neural Networks, Vol. 4, No. 4, pp. 650-659.
[13] Sayiner, N. and Viswanathan, R. (1989): Distributed Detection in Jamming Environment, Proceedings of the 23rd Conference on Information Sciences and Systems.
[14] Tenney, R. R. and Sandell, N. R. Jr. (1981): Distributed Detection Networks, IEEE Transactions on Aerospace and Electronic Systems, Vol. 17, pp. 501-510.
[15] Thomopoulos, S. C., Viswanathan, R. and Bougoulias, D. K. (1989): Optimal Distributed Decision Fusion, IEEE Transactions on Aerospace and Electronic Systems, Vol. 25, No. 5, pp. 761-765.
[16] Wissinger, J. and Athans, M. (1993): A Nonparametric Training Algorithm for Decentralized Binary Hypothesis Testing Networks, Proceedings of the 1993 American Control Conference, pp. 176-177.