
3022 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020

Training Data Assisted Anomaly Detection of Multi-Pixel Targets In Hyperspectral Imagery

Jun Liu, Senior Member, IEEE, Yutong Feng, Weijian Liu, Senior Member, IEEE, Danilo Orlando, Senior Member, IEEE, and Hongbin Li, Fellow, IEEE

Abstract—In this paper, we investigate the anomaly detection problem for widespread targets with known spatial patterns under a local Gaussian model when training data are available. Three adaptive detectors are proposed based on the principles of the generalized likelihood ratio test, the Rao test, and the Wald test, respectively. We prove that these tests are statistically equivalent to each other. In addition, analytical expressions for the probability of false alarm and the probability of detection of the proposed detectors are obtained and verified through Monte Carlo simulations. It is shown that these detectors have a constant false alarm rate with respect to the covariance matrix. Finally, numerical examples using synthetic and real hyperspectral data demonstrate that these training data assisted detectors have better detection performance than their counterparts without training data.

Index Terms—Anomaly detection, hyperspectral images, constant false alarm rate, generalized likelihood ratio test, Rao test, Wald test.

I. INTRODUCTION

HYPERSPECTRAL imaging sensors collect data in hundreds of contiguous spectral bands for each pixel of a scene [1]–[3]. The fine spectral resolution makes it possible to identify objects or features in the scene based on their spectral signatures. One application of hyperspectral data is target detection, which has been widely investigated and found to be very useful in, e.g., defense-and-surveillance, search-and-rescue, and mine exploration systems [2], [4], [5]. From a detection-theoretic point of view, target detection is a binary hypothesis test. A typical solution is the Neyman-Pearson (NP) approach [2], which maximizes the detection probability for a preassigned false alarm probability. In recent years, spectral signature-based target detection has been studied by many researchers [3], [6]–[8]. Most of these works assume that the target spectral signatures are known, and the goal is to detect a signal with known signature but unknown amplitude in the background clutter. Many solutions have been developed for similar problems in radar target detection, such as Kelly's generalized likelihood ratio test (GLRT) [9], the adaptive matched filter [10], the adaptive coherence estimator test [11], and the adaptive Rao test [12]. However, in real applications, atmospheric fluctuations and other complicating factors may strongly influence the detection results [2], [13], [14]. This has led to the consideration of anomaly detection methods, which are designed to distinguish unusual targets (anomalies) from the background without target signature references [1], [15].

Manuscript received August 1, 2019; revised March 6, 2020 and April 12, 2020; accepted April 13, 2020. Date of publication April 29, 2020; date of current version May 29, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Elias Aboutanios. This work was supported in part by the Youth Innovation Promotion Association CAS under Grant CX2100060053, in part by the National Natural Science Foundation of China under Grant 61871469, in part by the Fundamental Research Funds for the Central Universities under Grant WK2100000006, and in part by the Key Research Program of the Frontier Sciences, Chinese Academy of Sciences under Grant QYZDY-SSW-JSC035. (Corresponding author: Weijian Liu.)

Jun Liu and Yutong Feng are with the Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China (e-mail: [email protected]; [email protected]).

Weijian Liu is with the Wuhan Electronic Information Institute, Wuhan 430019, China (e-mail: [email protected]).

Danilo Orlando is with the Università degli Studi “Niccolò Cusano”, Rome 00166, Italy (e-mail: [email protected]).

Hongbin Li is with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSP.2020.2991311

In hyperspectral imaging applications, anomaly detection can be considered a particular case of target detection. It tries to detect rare objects (anomalies) whose spectral signatures differ from those of their surroundings when no a priori information about the target spectral signatures is available. Anomalies can be defined with respect to a model of the background. Local spectral anomalies are observations that deviate from the neighboring clutter background, while global spectral anomalies are pixels whose signatures are spectrally distinct from the image-wide clutter background [8], [16]. Both approaches have advantages and disadvantages (for a more detailed discussion, see [1], [8]). The local model is useful for characterizing the background clutter; however, it may be susceptible to false alarms caused by isolated anomalies. Global anomaly detection algorithms are less likely to generate this type of false alarm, but they may be incapable of properly detecting isolated targets. In this paper, we focus on the anomaly detection problem based on the local approach.

For mathematical tractability, most local methods for hyperspectral data rely on a Gaussian model, which assumes that the background clutter obeys a real-valued multivariate Gaussian distribution with an unknown covariance matrix. For real hyperspectral data, the Gaussian assumption was found suitable after the data are processed by a spatial sliding-window approach [17], [18]. Based on the local Gaussian model, the well-known Reed-Xiaoli detector (RXD) was derived in [18] for detecting multi-pixel targets with known spatial patterns [8], [18]. If the spatial pattern is neglected (i.e., only one anomalous pixel is to be detected), the original RXD test variable reduces to the Mahalanobis distance, which is also widely used for anomaly detection [1], [8], [15], [19]. The RXD has been successfully applied in many real hyperspectral applications and is considered a benchmark anomaly detector for hyperspectral data. When the anomaly to be detected spans multiple pixels, the original RXD estimates the covariance matrix directly from the pixels under test (PUTs) [17], [18], [20]. That is, no additional training data are used in the multi-pixel case. In practice, additional training data are often available; they can be collected using a guard window method [3], [8], [21]–[23], and employing training data is standard practice for solving detection problems in hyperspectral imagery [1], [2], [15].

1053-587X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

In this paper, the anomaly detection problem under a local Gaussian model is investigated. We consider the detection of widespread targets with a known spatial pattern in hyperspectral images when training data are available. The corresponding GLRT, Rao test, and Wald test are derived. Interestingly, we find that the three tests coincide. In addition, the detection and false alarm probabilities of the proposed detectors are obtained and validated by numerical simulations. These expressions show that the proposed detectors have the constant false alarm rate (CFAR) property. Finally, experiments conducted on synthetic and real hyperspectral images show that the proposed detectors outperform the RXD.

The remainder of this paper is organized as follows.¹ Section II provides the signal models and formulates the detection problem. In Section III, we design the GLRT, Rao test, and Wald test, and prove that the three tests coincide. Analytical performance is evaluated in Section IV. Numerical simulations and experiments on synthetic hyperspectral data are carried out in Section V. Finally, the paper is summarized in Section VI.

II. PROBLEM FORMULATION

We consider the anomaly detection problem for widespread targets in hyperspectral images when training data are available. The problem can be formulated as a binary hypothesis test that decides between the null hypothesis $H_0$ and the alternative hypothesis $H_1$:

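$$
H_0:\ \begin{cases}
\mathbf{r}_n = \mathbf{x}_n, & n = 1,\dots,N,\\
\mathbf{r}_k = \mathbf{x}_k, & k = 1,\dots,K,
\end{cases} \tag{1}
$$

and

$$
H_1:\ \begin{cases}
\mathbf{r}_n = \mathbf{x}_n + \mathbf{s}\,a(n), & n = 1,\dots,N,\\
\mathbf{r}_k = \mathbf{x}_k, & k = 1,\dots,K,
\end{cases} \tag{2}
$$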

¹In the following, scalars, vectors, and matrices are represented by lightfaced lowercase, boldfaced lowercase, and boldfaced uppercase letters, respectively. $\ln(\cdot)$ denotes the natural logarithm. $E[\cdot]$ and $\mathrm{cov}[\cdot]$ are the statistical expectation and covariance matrix, respectively. The notation $\triangleq$ means "defined as". $\sim$ denotes "is distributed as". $\mathcal{N}(\boldsymbol{\mu},\mathbf{R})$ represents a real-valued Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathbf{R}$. $\chi^2_n$ denotes a real central chi-squared distribution with $n$ degrees of freedom, while $\chi^2_n(\zeta)$ denotes a real noncentral chi-squared distribution with $n$ degrees of freedom and noncentrality parameter $\zeta$. $\mathbf{A}^{-1}$ and $|\mathbf{A}|$ stand for the inverse and determinant of matrix $\mathbf{A}$, respectively. $\partial f/\partial\mathbf{a}$ denotes the partial derivative of the scalar function $f$ with respect to the vector $\mathbf{a}$. $\mathrm{vec}(\mathbf{C})$ denotes the vectorization of the matrix $\mathbf{C}$. The symbol $(\cdot)^T$ stands for the transpose operation. $\mathbf{0}_{M,N}$ denotes the $M\times N$ matrix of zeros. $\mathbf{I}_N$ stands for the $N\times N$ identity matrix. $\mathbb{R}$ denotes the set of real numbers, and $\mathbb{R}^{M\times N}$ denotes the set of $M\times N$ real matrices. Finally, $\Gamma(\cdot)$ denotes the gamma function.

where

• $\mathbf{r}_n\in\mathbb{R}^{M\times 1}$, $n=1,\dots,N$, are the observed spectral data containing potential targets (i.e., anomalies), and $\mathbf{r}_k\in\mathbb{R}^{M\times 1}$, $k=1,\dots,K$, are the training data;

• $\mathbf{s}\in\mathbb{R}^{M\times 1}$ denotes the spectral signature, and $a(n)$, $n=1,\dots,N$, are the signal amplitudes;

• $M$ and $N$ denote the number of spectral bands and the number of PUTs, respectively, and $K$ is the number of available training data, with $K+N>M$;

• the training data $\mathbf{r}_k\in\mathbb{R}^{M\times 1}$, $k=1,\dots,K$, and the residual background vectors $\mathbf{x}_n\in\mathbb{R}^{M\times 1}$, $n=1,\dots,N$, are assumed to be independent and identically distributed (i.i.d.) samples from a real-valued zero-mean multivariate Gaussian distribution with covariance matrix $\mathbf{M}$. That is,

$$
\begin{cases}
\mathbf{x}_n \sim \mathcal{N}(\mathbf{0}_{M,1},\mathbf{M}), & n=1,\dots,N,\\
\mathbf{r}_k \sim \mathcal{N}(\mathbf{0}_{M,1},\mathbf{M}), & k=1,\dots,K.
\end{cases} \tag{3}
$$

The covariance matrix $\mathbf{M}$ is assumed to be unknown and needs to be estimated.

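By defining

$$
\begin{cases}
\mathbf{R} = [\mathbf{r}_1\ \mathbf{r}_2\ \cdots\ \mathbf{r}_N]\in\mathbb{R}^{M\times N},\\
\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N]\in\mathbb{R}^{M\times N},\\
\mathbf{R}_L = [\mathbf{r}_1\ \mathbf{r}_2\ \cdots\ \mathbf{r}_K]\in\mathbb{R}^{M\times K},\\
\mathbf{X}_L = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_K]\in\mathbb{R}^{M\times K},\\
\mathbf{a} = [a(1)\ a(2)\ \cdots\ a(N)]^T\in\mathbb{R}^{N\times 1},
\end{cases}
$$

we can recast the binary hypothesis test in the compact form

$$
\begin{cases}
H_0: \mathbf{R} = \mathbf{X},\ \mathbf{R}_L = \mathbf{X}_L,\\
H_1: \mathbf{R} = \mathbf{X} + \mathbf{s}\mathbf{a}^T,\ \mathbf{R}_L = \mathbf{X}_L,
\end{cases} \tag{4}
$$

where the vector $\mathbf{a}$ stands for the known spatial pattern of the targets, and the spectral signature $\mathbf{s}$ is unknown.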
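To make the model in (4) concrete, the following Python/NumPy sketch draws one realization of $(\mathbf{R},\mathbf{R}_L)$ under either hypothesis. The function name `simulate_scene` and the Cholesky-based sampling are our own illustrative choices, not part of the original work. The covariance used in the usage lines follows the $\rho^{|i-j|}$ structure described later in Section V-A.

```python
import numpy as np


def simulate_scene(cov, s, a, K, h1, rng):
    """Draw (R, R_L) from the model in (4) (illustrative sketch).

    cov : (M, M) background covariance matrix (bold M in the text)
    s   : (M,) spectral signature; a : (N,) known spatial pattern
    K   : number of training vectors; h1 : if True, add s a^T to the PUTs
    """
    M, N = cov.shape[0], a.shape[0]
    L = np.linalg.cholesky(cov)                # cov = L @ L.T
    X = L @ rng.standard_normal((M, N))        # columns x_n ~ N(0, cov), i.i.d.
    R_L = L @ rng.standard_normal((M, K))      # target-free training data
    R = X + np.outer(s, a) if h1 else X        # H1: R = X + s a^T
    return R, R_L


rng = np.random.default_rng(0)
M, N, K = 6, 9, 31
idx = np.arange(M)
cov = 0.8 ** np.abs(idx[:, None] - idx[None, :])
R, R_L = simulate_scene(cov, np.ones(M), np.ones(N), K, h1=True, rng=rng)
print(R.shape, R_L.shape)   # (6, 9) (6, 31)
```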

Note that when $K=0$ (i.e., no training data are available), the model above is exactly the one considered by Reed and Yu in [18], where the GLRT criterion was used to derive the well-known RXD, which takes the form

$$
T_{\mathrm{RXD}} = \frac{(\mathbf{R}\mathbf{a})^T\left(\mathbf{R}\mathbf{R}^T\right)^{-1}(\mathbf{R}\mathbf{a})}{\mathbf{a}^T\mathbf{a}} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_{\mathrm{RXD}}, \tag{5}
$$

where $\lambda_{\mathrm{RXD}}$ is a detection threshold. It should be pointed out that the RXD requires $M<N$, which might be restrictive in practice.
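A direct NumPy transcription of (5) is sketched below. The function name is hypothetical. The sketch solves a linear system rather than forming $(\mathbf{R}\mathbf{R}^T)^{-1}$ explicitly, and it rejects $N\le M$. For $N<M$ the matrix $\mathbf{R}\mathbf{R}^T$ is singular, and for $N=M$ (with $\mathbf{R}$ invertible) the statistic equals 1 for every data set.

```python
import numpy as np


def rxd_statistic(R, a):
    """RXD of (5): (Ra)^T (R R^T)^{-1} (Ra) / (a^T a).

    R : (M, N) pixels under test as columns; a : (N,) spatial pattern.
    """
    M, N = R.shape
    if N <= M:
        raise ValueError("the RXD needs more pixels under test than bands (N > M)")
    Ra = R @ a
    w = np.linalg.solve(R @ R.T, Ra)          # w = (R R^T)^{-1} R a
    return float(Ra @ w / (a @ a))


rng = np.random.default_rng(1)
t = rxd_statistic(rng.standard_normal((6, 9)), np.ones(9))
print(0.0 <= t <= 1.0)   # True
```

The statistic always lies in $[0,1]$. This is because it is the squared norm of the projection of $\mathbf{a}/\|\mathbf{a}\|$ onto the row space of $\mathbf{R}$.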

To the best of our knowledge, the detection problem in (4) for the case of $K>0$ has not been considered before. The purpose of this work is to address the anomaly detection problem by exploiting training data.

III. DETECTOR DESIGN

In this section, we derive the GLRT, the Rao test, and the Wald test for the detection problem in (4). As will be shown at the end of this section, these tests coincide with each other.

A. GLRT

In the GLRT procedure, we replace the unknown parameters in the likelihood ratio by their maximum likelihood (ML) estimates under each of the two hypotheses [9], [24], [25], that is,

$$
T_{\mathrm{GLRT}} = \frac{\max_{\{\mathbf{s},\mathbf{M}\}} f_1(\mathbf{R},\mathbf{R}_L)}{\max_{\{\mathbf{M}\}} f_0(\mathbf{R},\mathbf{R}_L)} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_G, \tag{6}
$$

where $\lambda_G$ is a detection threshold, and $f_0(\mathbf{R},\mathbf{R}_L)$ and $f_1(\mathbf{R},\mathbf{R}_L)$ are the joint probability density functions (PDFs) of $\mathbf{R}$ and $\mathbf{R}_L$ under hypotheses $H_0$ and $H_1$, respectively.

According to the model in (4), the joint PDF $f_0(\mathbf{R},\mathbf{R}_L)$ under hypothesis $H_0$ can be written as

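$$
f_0(\mathbf{R},\mathbf{R}_L) = (2\pi)^{-\frac{M(N+K)}{2}}\,|\mathbf{M}|^{-\frac{N+K}{2}} \exp\left\{-\frac{1}{2}\mathrm{tr}\left[\mathbf{M}^{-1}\left(\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T\right)\right]\right\}. \tag{7}
$$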

Thus, the ML estimate of the unknown covariance matrix $\mathbf{M}$ under hypothesis $H_0$ can be calculated as

$$
\mathbf{M}_0 = \frac{\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T}{N+K}. \tag{8}
$$

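Similarly, the joint PDF $f_1(\mathbf{R},\mathbf{R}_L)$ under hypothesis $H_1$ takes the form

$$
f_1(\mathbf{R},\mathbf{R}_L) = (2\pi)^{-\frac{M(N+K)}{2}}\,|\mathbf{M}|^{-\frac{N+K}{2}} \exp\left\{-\frac{1}{2}\mathrm{tr}\left[\mathbf{M}^{-1}\left(\mathbf{R}-\mathbf{s}\mathbf{a}^T\right)\left(\mathbf{R}-\mathbf{s}\mathbf{a}^T\right)^T+\mathbf{M}^{-1}\mathbf{R}_L\mathbf{R}_L^T\right]\right\}. \tag{9}
$$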

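After some algebra, one can verify that the ML estimates of $\mathbf{M}$ and $\mathbf{s}$ under hypothesis $H_1$ are

$$
\mathbf{M}_1 = \frac{\left(\mathbf{R}-\mathbf{s}_1\mathbf{a}^T\right)\left(\mathbf{R}-\mathbf{s}_1\mathbf{a}^T\right)^T+\mathbf{R}_L\mathbf{R}_L^T}{N+K}, \tag{10}
$$

and

$$
\mathbf{s}_1 = \frac{\mathbf{R}\mathbf{a}}{\mathbf{a}^T\mathbf{a}}, \tag{11}
$$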

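respectively.

Substituting these ML estimates into the joint PDFs $f_0(\mathbf{R},\mathbf{R}_L)$ and $f_1(\mathbf{R},\mathbf{R}_L)$ in (7) and (9), respectively, yields

$$
f_0(\mathbf{R},\mathbf{R}_L) = (2\pi)^{-\frac{M(N+K)}{2}}\,|\mathbf{M}_0|^{-\frac{N+K}{2}}\exp\left(-\frac{M(N+K)}{2}\right), \tag{12}
$$

and

$$
f_1(\mathbf{R},\mathbf{R}_L) = (2\pi)^{-\frac{M(N+K)}{2}}\,|\mathbf{M}_1|^{-\frac{N+K}{2}}\exp\left(-\frac{M(N+K)}{2}\right). \tag{13}
$$

Thus, plugging (12) and (13) into the GLRT defined in (6) and removing the common exponent $(N+K)/2$ (i.e., raising both sides to the power $2/(N+K)$) results in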

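$$
T_{\mathrm{GLRT}} = \frac{|\mathbf{M}_0|}{|\mathbf{M}_1|} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_{\mathrm{GLRT}}, \tag{14}
$$

where $\lambda_{\mathrm{GLRT}} = \lambda_G^{2/(N+K)}$. Finally, substituting (11) into (14) yields the explicit test

$$
T_{\mathrm{GLRT}} = \frac{\left|\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T\right|}{\left|\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T-\dfrac{\mathbf{R}\mathbf{a}\mathbf{a}^T\mathbf{R}^T}{\mathbf{a}^T\mathbf{a}}\right|} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_{\mathrm{GLRT}}. \tag{15}
$$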

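Note that when training data are not available (i.e., $K=0$), the term $\mathbf{R}_L\mathbf{R}_L^T$ disappears. By the matrix determinant lemma, the GLRT in (15) with $K=0$ then equals $1/(1-T_{\mathrm{RXD}})$, which is a monotonically increasing function of the statistic in (5); hence it reduces to the RXD.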
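The following NumPy sketch evaluates (15) through log-determinants and checks the $K=0$ reduction numerically. The helper names are ours. The identity $T_{\mathrm{GLRT}} = 1/(1-T_{\mathrm{RXD}})$ used in the check follows from the matrix determinant lemma $|\mathbf{S}-\mathbf{u}\mathbf{u}^T| = |\mathbf{S}|(1-\mathbf{u}^T\mathbf{S}^{-1}\mathbf{u})$ with $\mathbf{u}=\mathbf{R}\mathbf{a}/\sqrt{\mathbf{a}^T\mathbf{a}}$.

```python
import numpy as np


def glrt_statistic(R, R_L, a):
    """Determinant-ratio GLRT of (15); R_L may have zero columns (K = 0)."""
    S = R @ R.T + R_L @ R_L.T                   # (N + K) M_0, cf. (8)
    Ra = R @ a
    S1 = S - np.outer(Ra, Ra) / (a @ a)         # (N + K) M_1 with s_1 from (11)
    # slogdet is safer than det when the determinants are very large or small.
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_S1 = np.linalg.slogdet(S1)
    return float(np.exp(logdet_S - logdet_S1))


def rxd_statistic(R, a):
    Ra = R @ a
    return float(Ra @ np.linalg.solve(R @ R.T, Ra) / (a @ a))


rng = np.random.default_rng(2)
M, N = 6, 9
R, a = rng.standard_normal((M, N)), np.ones(N)
t_glrt = glrt_statistic(R, np.empty((M, 0)), a)    # K = 0: no training data
t_rxd = rxd_statistic(R, a)
print(np.isclose(t_glrt, 1.0 / (1.0 - t_rxd)))     # True
```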

B. Rao Test

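Define a parameter vector

$$
\boldsymbol{\theta} = \left[\boldsymbol{\theta}_r^T,\boldsymbol{\theta}_s^T\right]^T = \left[\mathbf{s}^T,\mathrm{vec}^T(\mathbf{M})\right]^T, \tag{16}
$$

where $\boldsymbol{\theta}_r = \mathbf{s}$ is the relevant parameter and $\boldsymbol{\theta}_s = \mathrm{vec}(\mathbf{M})$ is the nuisance parameter. The real-valued Rao test with nuisance parameters is given by [26]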

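$$
T_{\mathrm{Rao}} = \left.\frac{\partial\ln f_1(\mathbf{R},\mathbf{R}_L)}{\partial\boldsymbol{\theta}_r}\right|^T_{\boldsymbol{\theta}=\boldsymbol{\theta}_0}\left[\mathbf{I}^{-1}(\boldsymbol{\theta}_0)\right]_{\theta_r\theta_r}\left.\frac{\partial\ln f_1(\mathbf{R},\mathbf{R}_L)}{\partial\boldsymbol{\theta}_r}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_0} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_R, \tag{17}
$$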

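where $\lambda_R$ is the detection threshold, $\boldsymbol{\theta}_0 = [\boldsymbol{\theta}_{r0}^T,\boldsymbol{\theta}_{s0}^T]^T$ is the ML estimate of $\boldsymbol{\theta}$ under hypothesis $H_0$, and $\mathbf{I}(\boldsymbol{\theta})$ is the Fisher information matrix (FIM), defined as

$$
\mathbf{I}(\boldsymbol{\theta}) = E\left[\frac{\partial\ln f_1(\mathbf{R},\mathbf{R}_L)}{\partial\boldsymbol{\theta}}\frac{\partial\ln f_1(\mathbf{R},\mathbf{R}_L)}{\partial\boldsymbol{\theta}^T}\right], \tag{18}
$$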

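which can be partitioned into the following form

$$
\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix}
\mathbf{I}_{\theta_r\theta_r}(\boldsymbol{\theta}) & \mathbf{I}_{\theta_r\theta_s}(\boldsymbol{\theta})\\
\mathbf{I}_{\theta_s\theta_r}(\boldsymbol{\theta}) & \mathbf{I}_{\theta_s\theta_s}(\boldsymbol{\theta})
\end{bmatrix}. \tag{19}
$$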

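Thus, the term $[\mathbf{I}^{-1}(\boldsymbol{\theta})]_{\theta_r\theta_r}$ in (17) can be expressed as

$$
\left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{\theta_r\theta_r} = \left[\mathbf{I}_{\theta_r\theta_r}(\boldsymbol{\theta}) - \mathbf{I}_{\theta_r\theta_s}(\boldsymbol{\theta})\,\mathbf{I}^{-1}_{\theta_s\theta_s}(\boldsymbol{\theta})\,\mathbf{I}_{\theta_s\theta_r}(\boldsymbol{\theta})\right]^{-1}. \tag{20}
$$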

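Taking the logarithm of (9) and differentiating with respect to $\boldsymbol{\theta}_r$ and $\boldsymbol{\theta}_r^T$, we have the following two identities:

$$
\frac{\partial\ln f_1(\mathbf{R},\mathbf{R}_L)}{\partial\boldsymbol{\theta}_r} = \mathbf{M}^{-1}\left(\mathbf{R}-\mathbf{s}\mathbf{a}^T\right)\mathbf{a}, \tag{21}
$$

and

$$
\frac{\partial\ln f_1(\mathbf{R},\mathbf{R}_L)}{\partial\boldsymbol{\theta}_r^T} = \mathbf{a}^T\left(\mathbf{R}-\mathbf{s}\mathbf{a}^T\right)^T\mathbf{M}^{-1}. \tag{22}
$$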

Plugging (21) and (22) into (18) results in

$$
\mathbf{I}_{\theta_r\theta_r}(\boldsymbol{\theta}) = \mathbf{a}^T\mathbf{a}\,\mathbf{M}^{-1}. \tag{23}
$$

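In a similar manner, one can readily verify that $\mathbf{I}_{\theta_r\theta_s}(\boldsymbol{\theta})$ in (19) is a null matrix. As a consequence, we have

$$
\left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{\theta_r\theta_r} = \mathbf{I}^{-1}_{\theta_r\theta_r}(\boldsymbol{\theta}) = \frac{\mathbf{M}}{\mathbf{a}^T\mathbf{a}}. \tag{24}
$$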

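Note that the ML estimate of $\boldsymbol{\theta}$ under hypothesis $H_0$ is

$$
\boldsymbol{\theta}_0 = \left[\mathbf{0}^T_{M,1},\mathrm{vec}^T(\mathbf{M}_0)\right]^T, \tag{25}
$$

where $\mathbf{M}_0$ is the ML estimate of the covariance matrix $\mathbf{M}$ under hypothesis $H_0$, given in (8).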

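Combining (8), (21), (22), (24), and (25) with (17) and dropping the constant scalar yields the Rao test

$$
T_{\mathrm{Rao}} = \frac{(\mathbf{R}\mathbf{a})^T\left(\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T\right)^{-1}(\mathbf{R}\mathbf{a})}{\mathbf{a}^T\mathbf{a}} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_{\mathrm{Rao}}, \tag{26}
$$

where $\lambda_{\mathrm{Rao}} = \lambda_R/(N+K)$.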
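Evaluating (26) requires only one linear solve with the matrix $\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T$. In the sketch below (function name hypothetical), this matrix is invertible as long as $N+K>M$. The test therefore remains usable even when $N\le M$, where the RXD in (5) does not apply.

```python
import numpy as np


def rao_statistic(R, R_L, a):
    """Rao test of (26): (Ra)^T (R R^T + R_L R_L^T)^{-1} (Ra) / (a^T a)."""
    S = R @ R.T + R_L @ R_L.T                    # (N + K) M_0, cf. (8)
    Ra = R @ a
    return float(Ra @ np.linalg.solve(S, Ra) / (a @ a))


rng = np.random.default_rng(3)
M, N, K = 6, 4, 10                               # N < M, but N + K > M
R, R_L = rng.standard_normal((M, N)), rng.standard_normal((M, K))
t = rao_statistic(R, R_L, np.ones(N))
print(0.0 <= t < 1.0)                            # True
```

When $K=0$, the same function reproduces the RXD statistic of (5).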


C. Wald Test

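The real-valued Wald test is given by [26]

$$
T_{\mathrm{Wald}} = \boldsymbol{\theta}_{r1}^T\left(\left[\mathbf{I}^{-1}(\boldsymbol{\theta}_1)\right]_{\theta_r\theta_r}\right)^{-1}\boldsymbol{\theta}_{r1} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_W, \tag{27}
$$

where $\lambda_W$ denotes a detection threshold, $\boldsymbol{\theta}_1 = [\boldsymbol{\theta}_{r1}^T,\boldsymbol{\theta}_{s1}^T]^T$ is the ML estimate of $\boldsymbol{\theta}$ under hypothesis $H_1$, and $\left([\mathbf{I}^{-1}(\boldsymbol{\theta})]_{\theta_r\theta_r}\right)^{-1}$ is the inverse of the matrix in (20).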

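Following a derivation similar to that of the FIM for the Rao test, one can verify that

$$
\left(\left[\mathbf{I}^{-1}(\boldsymbol{\theta}_1)\right]_{\theta_r\theta_r}\right)^{-1} = \mathbf{I}_{\theta_r\theta_r}(\boldsymbol{\theta}_1). \tag{28}
$$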

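Moreover, the ML estimate of $\boldsymbol{\theta}$ under hypothesis $H_1$ is given by

$$
\boldsymbol{\theta}_1 = \left[\mathbf{s}_1^T,\mathrm{vec}^T(\mathbf{M}_1)\right]^T, \tag{29}
$$

where $\mathbf{s}_1$ and $\mathbf{M}_1$ are the ML estimates of the parameters $\mathbf{s}$ and $\mathbf{M}$ under hypothesis $H_1$, given in (11) and (10), respectively.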

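Plugging (10), (11), (23), (28), and (29) into (27) and absorbing the constant scalar into the threshold, we obtain the Wald test

$$
T_{\mathrm{Wald}} = \frac{(\mathbf{R}\mathbf{a})^T\left(\mathbf{R}\mathbf{P}_a^{\perp}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T\right)^{-1}(\mathbf{R}\mathbf{a})}{\mathbf{a}^T\mathbf{a}} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_{\mathrm{Wald}}, \tag{30}
$$

where $\lambda_{\mathrm{Wald}} = \lambda_W/(N+K)$ and $\mathbf{P}_a^{\perp}$ is the projection operator

$$
\mathbf{P}_a^{\perp} = \mathbf{I}_N - \mathbf{a}\left(\mathbf{a}^T\mathbf{a}\right)^{-1}\mathbf{a}^T. \tag{31}
$$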
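A NumPy sketch of (30)-(31) follows; the names are ours. It builds $\mathbf{P}_a^{\perp}$ explicitly for readability. Because $\mathbf{R}\mathbf{P}_a^{\perp}\mathbf{R}^T = \mathbf{R}\mathbf{R}^T-\mathbf{R}\mathbf{a}\mathbf{a}^T\mathbf{R}^T/(\mathbf{a}^T\mathbf{a})$, a rank-one downdate would avoid forming the $N\times N$ matrix altogether.

```python
import numpy as np


def projection_perp(a):
    """P_a^perp = I_N - a (a^T a)^{-1} a^T, cf. (31)."""
    return np.eye(a.shape[0]) - np.outer(a, a) / (a @ a)


def wald_statistic(R, R_L, a):
    """Wald test of (30)."""
    S_perp = R @ projection_perp(a) @ R.T + R_L @ R_L.T
    Ra = R @ a
    return float(Ra @ np.linalg.solve(S_perp, Ra) / (a @ a))


rng = np.random.default_rng(4)
M, N, K = 6, 9, 31
R, R_L = rng.standard_normal((M, N)), rng.standard_normal((M, K))
a = np.ones(N)
P = projection_perp(a)
print(np.allclose(P @ P, P), np.allclose(P @ a, 0.0))   # True True
print(wald_statistic(R, R_L, a) >= 0.0)                 # True
```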

In summary, we have derived the GLRT, Rao test, and Wald test for the detection problem in (4). Although the three detectors appear structurally different, we prove in Appendix A that they are statistically equivalent; therefore, the GLRT, Rao test, and Wald test coincide for the detection problem in (4). In addition, the proposed detector requires more computation than the conventional RXD, since it also processes the $K$ training vectors; this is the price paid for the improved detection performance.
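The equivalence can also be checked numerically. Write $\mathbf{S}=\mathbf{R}\mathbf{R}^T+\mathbf{R}_L\mathbf{R}_L^T$, $\mathbf{u}=\mathbf{R}\mathbf{a}/\sqrt{\mathbf{a}^T\mathbf{a}}$, and $t=T_{\mathrm{Rao}}$. The matrix determinant lemma and the Sherman-Morrison formula then give $T_{\mathrm{GLRT}}=1/(1-t)$ and $T_{\mathrm{Wald}}=t/(1-t)$, and both are increasing in $t$ on $[0,1)$. This short argument is our own sanity check and is not necessarily the route taken in Appendix A.

```python
import numpy as np


def three_statistics(R, R_L, a):
    """Return (GLRT (15), Rao (26), Wald (30)) for one data set."""
    aa = a @ a
    Ra = R @ a
    S = R @ R.T + R_L @ R_L.T
    S_perp = S - np.outer(Ra, Ra) / aa            # = R P_a^perp R^T + R_L R_L^T
    glrt = np.exp(np.linalg.slogdet(S)[1] - np.linalg.slogdet(S_perp)[1])
    rao = Ra @ np.linalg.solve(S, Ra) / aa
    wald = Ra @ np.linalg.solve(S_perp, Ra) / aa
    return float(glrt), float(rao), float(wald)


rng = np.random.default_rng(5)
for _ in range(5):
    M, N, K = 6, 9, int(rng.integers(1, 40))
    R, R_L = rng.standard_normal((M, N)), rng.standard_normal((M, K))
    g, r, w = three_statistics(R, R_L, np.ones(N))
    # GLRT = 1 / (1 - Rao) and Wald = Rao / (1 - Rao): monotone maps of one another
    assert np.isclose(g, 1.0 / (1.0 - r)) and np.isclose(w, r / (1.0 - r))
print("equivalent up to monotone transformations")
```

In particular, $T_{\mathrm{GLRT}} = 1 + T_{\mathrm{Wald}}$. Thresholds for one test therefore translate directly into thresholds for the others.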

IV. ANALYTICAL PERFORMANCE

In this section, we derive analytical expressions for the probability of false alarm (PFA) and the probability of detection (PD) of the proposed detectors. Owing to the equivalence of the three proposed tests, we consider the Wald test and derive its analytical performance.

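To obtain analytical expressions for the PFA and PD of the Wald test in (30), we first derive an explicit form for the Wald test. We start by noting that

$$
\begin{cases}
\mathrm{cov}[\mathbf{r}_n|H_i] = \mathbf{M}, & i = 0,1,\\
\mathrm{cov}[\mathbf{r}_k|H_i] = \mathbf{M}, & i = 0,1,
\end{cases} \tag{32}
$$

$$
\begin{cases}
E[\mathbf{R}|H_0] = \mathbf{0}_{M,N},\\
E[\mathbf{R}|H_1] = \mathbf{s}\mathbf{a}^T,
\end{cases} \tag{33}
$$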

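and $E[\mathbf{X}_L|H_i] = \mathbf{0}_{M,K}$, $i=0,1$. We perform a whitening procedure on $\mathbf{R}$ and $\mathbf{R}_L$ by letting

$$
\mathbf{Z} = [\mathbf{z}_1,\mathbf{z}_2,\dots,\mathbf{z}_N] = \mathbf{M}^{-\frac{1}{2}}\mathbf{R}, \tag{34}
$$

and

$$
\mathbf{Z}_L = [\mathbf{z}_1,\mathbf{z}_2,\dots,\mathbf{z}_K] = \mathbf{M}^{-\frac{1}{2}}\mathbf{R}_L, \tag{35}
$$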

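respectively. Then, we have

$$
\begin{cases}
\mathrm{cov}[\mathbf{z}_n|H_i] = \mathbf{I}_M, & i = 0,1,\\
\mathrm{cov}[\mathbf{z}_k|H_i] = \mathbf{I}_M, & i = 0,1.
\end{cases} \tag{36}
$$

In addition,

$$
\begin{cases}
E[\mathbf{Z}|H_0] = \mathbf{0}_{M,N},\\
E[\mathbf{Z}|H_1] = \mathbf{M}^{-\frac{1}{2}}\mathbf{s}\mathbf{a}^T,
\end{cases} \tag{37}
$$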

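and $E[\mathbf{Z}_L|H_i] = \mathbf{0}_{M,K}$, $i=0,1$. Since $\mathbf{R}=\mathbf{M}^{\frac{1}{2}}\mathbf{Z}$ and $\mathbf{R}_L=\mathbf{M}^{\frac{1}{2}}\mathbf{Z}_L$, the factors $\mathbf{M}^{\frac{1}{2}}$ cancel in (30), and the Wald test takes the form

$$
T_{\mathrm{Wald}} = \frac{\mathbf{a}^T\mathbf{Z}^T\left(\mathbf{Z}\mathbf{P}_a^{\perp}\mathbf{Z}^T+\mathbf{Z}_L\mathbf{Z}_L^T\right)^{-1}\mathbf{Z}\mathbf{a}}{\mathbf{a}^T\mathbf{a}}. \tag{38}
$$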
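The cancellation behind (38) is a special case of a more general property. The statistic is unchanged when both data matrices are multiplied by any invertible $M\times M$ matrix $\mathbf{A}$, which is why its null distribution cannot depend on $\mathbf{M}$. The NumPy check below uses a random $\mathbf{A}$ and a diagonal one; both are our illustrative choices.

```python
import numpy as np


def wald_statistic(R, R_L, a):
    """Wald test of (30), written with a rank-one downdate instead of P_a^perp."""
    Ra = R @ a
    S_perp = R @ R.T + R_L @ R_L.T - np.outer(Ra, Ra) / (a @ a)
    return float(Ra @ np.linalg.solve(S_perp, Ra) / (a @ a))


rng = np.random.default_rng(6)
M, N, K = 6, 9, 20
Z, Z_L = rng.standard_normal((M, N)), rng.standard_normal((M, K))   # whitened data
A = rng.standard_normal((M, M)) + 3.0 * np.eye(M)                  # invertible (a.s.)
a = np.ones(N)
print(np.isclose(wald_statistic(A @ Z, A @ Z_L, a), wald_statistic(Z, Z_L, a)))  # True
```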

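As derived in Appendix A, the Wald test can be written as

$$
T_{\mathrm{Wald}} = \mathbf{v}_1^T\boldsymbol{\Psi}^{-1}\mathbf{v}_1, \tag{39}
$$

where $\boldsymbol{\Psi}$ is defined in (60). After the transformations, the covariance matrices of $\mathbf{v}_n$, $n=1,2,\dots,N$, and $\mathbf{v}_k$, $k=1,2,\dots,K$, are the same as those of $\mathbf{z}_n$, $n=1,2,\dots,N$, and $\mathbf{z}_k$, $k=1,2,\dots,K$, whereas the mean values change under hypothesis $H_1$. That is,

$$
\begin{cases}
\mathrm{cov}[\mathbf{v}_n|H_i] = \mathbf{I}_M, & i = 0,1,\\
\mathrm{cov}[\mathbf{v}_k|H_i] = \mathbf{I}_M, & i = 0,1,
\end{cases} \tag{40}
$$

and $E[\mathbf{v}_1|H_1] = E[\mathbf{z}_1\mathbf{U}^T|H_1] = \mathbf{M}^{-\frac{1}{2}}\mathbf{s}(\mathbf{a}^T\mathbf{a})^{\frac{1}{2}}$,

$$
\begin{cases}
E[\mathbf{v}_n|H_1] = \mathbf{0}_{M,1}, & n = 2,3,\dots,N,\\
E[\mathbf{v}_k|H_1] = \mathbf{0}_{M,1}, & k = 1,2,\dots,K.
\end{cases} \tag{41}
$$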

For further simplification, the test statistic in (39) can be recast as follows (see Appendix B for the detailed derivations):

$$
T_{\mathrm{Wald}} = \frac{\nu}{\tau} \underset{H_0}{\overset{H_1}{\gtrless}} \lambda_{\mathrm{Wald}}, \tag{42}
$$

where

• the random variables $\nu$ and $\tau$ are independent of each other and are defined in (76) and (79), respectively;

• the distributions of the numerator $\nu$ under hypotheses $H_0$ and $H_1$ are

$$
\nu \sim \begin{cases}
\chi^2_M, & \text{under } H_0,\\
\chi^2_M(\sigma), & \text{under } H_1,
\end{cases} \tag{43}
$$

where the noncentrality parameter $\sigma$ is the generalized signal-to-noise ratio (GSNR), given by

$$
\sigma = E[\mathbf{v}_1^T|H_1]\,E[\mathbf{v}_1|H_1] = \left(\mathbf{s}^T\mathbf{M}^{-1}\mathbf{s}\right)\|\mathbf{a}\|^2; \tag{44}
$$

• the denominator $\tau$ obeys a central chi-squared distribution with $N+K-M$ degrees of freedom under both hypotheses $H_0$ and $H_1$, that is, $\tau\sim\chi^2_{N+K-M}$.

In the following, we derive the PFA and PD for the test in (42). According to [27, p. 52, Corollary 2], the PDF of the Wald test under hypothesis $H_1$ is

$$
f(T_{\mathrm{Wald}}|H_1) = \frac{x^{\frac{M-2}{2}}\,e^{-\frac{\sigma}{2}}\,{}_1F_1\left(\frac{N+K}{2};\frac{M}{2};\frac{\sigma x}{2(1+x)}\right)}{B\left(\frac{N+K-M}{2},\frac{M}{2}\right)(1+x)^{\frac{N+K}{2}}}, \tag{45}
$$

where $B(a,b)$ denotes the beta function and ${}_1F_1(a;b;x)$ is the confluent hypergeometric function. If no signal is present, i.e., $\sigma=0$, (45) reduces to the PDF of the Wald test under hypothesis $H_0$, namely,

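$$
f(T_{\mathrm{Wald}}|H_0) = \frac{x^{\frac{M-2}{2}}\,{}_1F_1\left(\frac{N+K}{2};\frac{M}{2};0\right)}{B\left(\frac{N+K-M}{2},\frac{M}{2}\right)(1+x)^{\frac{N+K}{2}}} = \frac{x^{\frac{M-2}{2}}}{B\left(\frac{N+K-M}{2},\frac{M}{2}\right)(1+x)^{\frac{N+K}{2}}}, \tag{46}
$$

where the second equality uses ${}_1F_1(a;b;0)=1$.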

Hence, in terms of the PDFs in (45) and (46), the PFA and PD can be calculated as

$$
\mathrm{PFA} = \int_{\lambda_{\mathrm{Wald}}}^{+\infty} f(T_{\mathrm{Wald}}|H_0)\,dx, \tag{47}
$$

$$
\mathrm{PD} = \int_{\lambda_{\mathrm{Wald}}}^{+\infty} f(T_{\mathrm{Wald}}|H_1)\,dx, \tag{48}
$$

respectively. Note that the PFA in (47) does not depend on the noise parameters; thus, the proposed detectors have the CFAR property.
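The integrals (47) and (48) can also be evaluated numerically. The pure-Python sketch below (our own, with hypothetical names) substitutes $u=x/(1+x)$, which maps $[\lambda_{\mathrm{Wald}},\infty)$ onto $[\lambda_{\mathrm{Wald}}/(1+\lambda_{\mathrm{Wald}}),1)$. This turns (45) into $e^{-\sigma/2}\,{}_1F_1\left(\frac{N+K}{2};\frac{M}{2};\frac{\sigma u}{2}\right)u^{\frac{M}{2}-1}(1-u)^{\frac{N+K-M}{2}-1}/B\left(\frac{N+K-M}{2},\frac{M}{2}\right)$, which is bounded for $\lambda_{\mathrm{Wald}}>0$ and $N+K-M\ge 2$. The sketch sums ${}_1F_1$ from its power series and applies Simpson's rule. Setting $\sigma=0$ gives the PFA of (47).

```python
import math


def hyp1f1(a, b, z, tol=1e-15, max_terms=100_000):
    """Confluent hypergeometric 1F1(a; b; z) via its power series (z >= 0)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if k > z and term < tol * total:
            break
    return total


def tail_probability(lam, M, NK, sigma, intervals=2000):
    """P(T_Wald > lam) from (45); sigma = 0 gives the PFA of (47).

    M: number of bands, NK: N + K (assumes NK - M >= 2), lam > 0,
    intervals: even number of Simpson sub-intervals.
    """
    n = NK - M
    log_beta = math.lgamma(n / 2) + math.lgamma(M / 2) - math.lgamma(NK / 2)

    def integrand(u):   # (45) after the change of variable u = x / (1 + x)
        return (math.exp(-sigma / 2 - log_beta) * hyp1f1(NK / 2, M / 2, sigma * u / 2)
                * u ** (M / 2 - 1) * (1.0 - u) ** (n / 2 - 1))

    y = lam / (1.0 + lam)
    h = (1.0 - y) / intervals
    total = integrand(y) + integrand(1.0)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * integrand(y + i * h)
    return total * h / 3.0
```

For example, `tail_probability(lam, M=6, NK=18, sigma=0.0)` gives the PFA for $M=6$, $N=9$, $K=9$ at threshold `lam`.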

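Interestingly, for particular cases the PFA and PD can be derived in finite-sum form by using the results in [28]. Specifically, when $M$ is even, the PFA can be obtained as

$$
\mathrm{PFA} = \sum_{i=1}^{M/2}\frac{\Gamma\left(\frac{N+K}{2}-i\right)}{\Gamma\left(\frac{M}{2}-i+1\right)\Gamma\left(\frac{N+K-M}{2}\right)}\,\lambda_{\mathrm{Wald}}^{\frac{M}{2}-i}\left(1+\lambda_{\mathrm{Wald}}\right)^{i-\frac{N+K}{2}}. \tag{49}
$$

When $N+K-M$ is even, the PD can be derived as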

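$$
\mathrm{PD} = 1-\lambda_{\mathrm{Wald}}^{\frac{M}{2}-1}\left(1+\lambda_{\mathrm{Wald}}\right)^{1-\frac{N+K}{2}}\sum_{j=1}^{\frac{N+K-M}{2}}\frac{\Gamma\left(\frac{N+K}{2}\right)\lambda_{\mathrm{Wald}}^{j}}{\Gamma\left(\frac{M}{2}+j\right)\Gamma\left(\frac{N+K-M}{2}-j+1\right)}\exp\left[-\frac{\sigma}{2(1+\lambda_{\mathrm{Wald}})}\right]\sum_{m=0}^{j-1}\frac{1}{m!}\left[\frac{\sigma}{2(1+\lambda_{\mathrm{Wald}})}\right]^m. \tag{50}
$$

These finite-sum expressions are computationally more efficient than the integral forms for performance evaluation.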
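A pure-Python transcription of (49) and (50) is sketched below, together with a bisection search that inverts (49) for a desired PFA. The function names are hypothetical. The terms are accumulated in the log domain via `math.lgamma` so that the gamma functions do not overflow for large $N+K$. The asserts enforce the parity conditions stated above.

```python
import math


def pfa_finite_sum(lam, M, NK):
    """PFA of (49); valid for even M. NK stands for N + K, lam > 0."""
    assert M % 2 == 0, "(49) requires an even number of bands M"
    half_m = M // 2
    log_c = math.lgamma((NK - M) / 2)
    total = 0.0
    for i in range(1, half_m + 1):
        log_term = (math.lgamma(NK / 2 - i) - math.lgamma(half_m - i + 1) - log_c
                    + (half_m - i) * math.log(lam) + (i - NK / 2) * math.log1p(lam))
        total += math.exp(log_term)
    return total


def pd_finite_sum(lam, M, NK, sigma):
    """PD of (50); valid for even N + K - M. sigma is the GSNR of (44)."""
    assert (NK - M) % 2 == 0, "(50) requires N + K - M to be even"
    p = (NK - M) // 2
    z = sigma / (2.0 * (1.0 + lam))
    log_prefactor = (M / 2 - 1) * math.log(lam) + (1 - NK / 2) * math.log1p(lam)
    total = 0.0
    partial = 0.0        # running sum_{m=0}^{j-1} z^m / m!
    term = 1.0           # z^(j-1) / (j-1)!
    for j in range(1, p + 1):
        partial += term
        term *= z / j
        log_coef = (math.lgamma(NK / 2) - math.lgamma(M / 2 + j)
                    - math.lgamma(p - j + 1) + j * math.log(lam))
        total += math.exp(log_prefactor + log_coef - z) * partial
    return 1.0 - total


def threshold_for_pfa(target, M, NK, lo=1e-12, hi=1e6, iters=200):
    """Invert (49) by bisection on a log scale (the PFA decreases in lam)."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if pfa_finite_sum(mid, M, NK) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# M = 6, N = 9, K = 9 satisfies both parity conditions (N + K = 18).
# sigma = 10**1.5 corresponds to 15 dB if GSNR_dB = 10*log10(sigma) (assumption).
lam = threshold_for_pfa(1e-5, M=6, NK=18)
print(lam, pd_finite_sum(lam, M=6, NK=18, sigma=10 ** 1.5))
```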

V. SIMULATIONS AND PERFORMANCE EVALUATIONS

In this section, numerical simulations are conducted to show the performance of the proposed decision schemes.

A. Simulated Data

The simulations are conducted on Gaussian noise vectors of dimension $M=6$, and the number of PUTs is $N=9$. The noise covariance matrix $\mathbf{M}$ is chosen as $\mathbf{M}_{i,j} = \rho^{|i-j|}$ with $\rho=0.8$. The GSNR is defined in (44). Without loss of generality, the spectral signature vector is selected as $\mathbf{s} = [1,1,1,1,1,1]^T$, and the signal pattern vector is set to $\mathbf{a} = \alpha[1,1,\dots,1]^T$, where $\alpha$ is a positive scalar used to control the GSNR.
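The simulation setup above can be reproduced with the NumPy sketch below. This section does not state how the GSNR is converted to dB, so the sketch assumes $\mathrm{GSNR}_{\mathrm{dB}} = 10\log_{10}\sigma$ with $\sigma$ from (44). `alpha_for_gsnr` is a hypothetical helper.

```python
import numpy as np

M, N, rho = 6, 9, 0.8
idx = np.arange(M)
cov = rho ** np.abs(idx[:, None] - idx[None, :])   # M_ij = rho^|i-j|
s = np.ones(M)                                      # s = [1, 1, 1, 1, 1, 1]^T


def alpha_for_gsnr(gsnr_db, cov, s, N):
    """Return alpha such that a = alpha * 1_N gives the requested GSNR in (44).

    Assumes GSNR[dB] = 10 * log10(sigma), sigma = (s^T M^{-1} s) ||a||^2.
    """
    q = s @ np.linalg.solve(cov, s)                 # s^T M^{-1} s
    sigma = 10.0 ** (gsnr_db / 10.0)
    return float(np.sqrt(sigma / (q * N)))          # ||alpha 1_N||^2 = alpha^2 N


alpha = alpha_for_gsnr(15.0, cov, s, N)
a = alpha * np.ones(N)
```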

In the following figures, the lines show the results obtained from the derived theoretical expressions, while the markers show the results from Monte Carlo (MC) counting techniques. To evaluate the detection probabilities and to set the detection threshold (for the desired PFA), we use $10^4$ and $10^7$ MC trials, respectively.
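The MC procedure can be sketched in three steps. First, draw the Wald statistic (30) from the model in (4). Second, set the threshold at the empirical $(1-\mathrm{PFA})$ quantile of the $H_0$ draws. Third, count exceedances under $H_1$. This vectorized NumPy version is our own illustration. The trial counts in the usage lines are deliberately much smaller than the $10^4$ and $10^7$ used for the figures, so the sketch can only resolve moderate PFAs.

```python
import numpy as np


def wald_batch(R, R_L, a):
    """Wald statistic (30) for a batch: R is (T, M, N), R_L is (T, M, K)."""
    Ra = R @ a                                                   # (T, M)
    S_perp = (R @ np.swapaxes(R, 1, 2) + R_L @ np.swapaxes(R_L, 1, 2)
              - Ra[:, :, None] * Ra[:, None, :] / (a @ a))       # R P R^T + R_L R_L^T
    w = np.linalg.solve(S_perp, Ra[:, :, None])[:, :, 0]
    return np.einsum("tm,tm->t", Ra, w) / (a @ a)


def simulate_wald(trials, cov, s, a, K, h1, rng):
    """MC draws of T_Wald under H0 (h1=False) or H1 (h1=True)."""
    M, N = cov.shape[0], a.shape[0]
    L = np.linalg.cholesky(cov)
    R = L @ rng.standard_normal((trials, M, N))
    R_L = L @ rng.standard_normal((trials, M, K))
    if h1:
        R = R + np.outer(s, a)                                   # add s a^T to every trial
    return wald_batch(R, R_L, a)


def mc_threshold(h0_stats, pfa):
    """Empirical threshold: the (1 - pfa) quantile of the H0 statistics."""
    return float(np.quantile(h0_stats, 1.0 - pfa))


def mc_pd(h1_stats, threshold):
    """Fraction of H1 trials whose statistic exceeds the threshold."""
    return float(np.mean(h1_stats > threshold))


rng = np.random.default_rng(7)
M, N, K = 6, 9, 9
idx = np.arange(M)
cov = 0.8 ** np.abs(idx[:, None] - idx[None, :])
s, a = np.ones(M), 0.5 * np.ones(N)
thr = mc_threshold(simulate_wald(100_000, cov, s, a, K, False, rng), pfa=1e-2)
pd = mc_pd(simulate_wald(10_000, cov, s, a, K, True, rng), thr)
print(f"threshold={thr:.3f}, PD={pd:.3f}")
```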

Fig. 1. PFA versus threshold for different values of K = 3, 9, 31, 123 (N = 9, M = 6). The lines denote the theoretical expressions, and the markers denote the results obtained from Monte Carlo simulations.

Fig. 2. PD versus the GSNR for M = 6, N = 9. The PFA is set to $10^{-5}$. The lines denote the theoretical expressions, and the markers denote the results obtained from Monte Carlo simulations.

In Fig. 1, we illustrate the false alarm regulation of the proposed detectors for K = 3, 9, 31, 123. The agreement between the lines and markers confirms the theoretical result in (47). Furthermore, to compare the performance of the proposed detectors with that of the RXD (i.e., K = 0), Fig. 2 plots the detection probability as a function of the GSNR for K = 0, 3, 9, 31, 123, with the false alarm probability set to $10^{-5}$. The theoretical results (lines) match the MC results (markers) very well. In addition, for every value of GSNR considered, the detection probabilities of the proposed detectors are higher than those of the RXD.

In Fig. 3, we plot the receiver operating characteristic (ROC) curves of the RXD (i.e., K = 0) and the proposed detectors for K = 0, 3, 9, 31, 123, with the GSNR fixed at 15 dB. The proposed detectors outperform the RXD when training data are employed. Moreover, the detection performance of the proposed detectors improves as the number of training data K increases, since the background covariance matrix is estimated more accurately.

Fig. 3. ROC curves for the fixed GSNR = 15 dB (M = 6, N = 9). The lines denote the theoretical expressions, and the markers denote the results obtained from Monte Carlo simulations.

Fig. 4. PD for PFA = $10^{-5}$ versus K = 3, 9, 17, 35, 71, 123, 139 when GSNR = 15 dB and 18 dB (M = 6, N = 9). The lines denote the theoretical expressions, and the symbols denote the results obtained from Monte Carlo simulations.

The detection probability as a function of the number of training data K for different GSNRs is presented in Fig. 4, where the PFA is fixed at $10^{-5}$. As the number of training data increases, the performance of the proposed detectors improves.

B. Synthetic Hyperspectral Image

Synthetic hyperspectral data are used to assess the performance of the proposed detectors. The hyperspectral data are images of Pavia City, Italy,² collected by the ROSIS-03 sensor and displayed in Fig. 5. These images consist of 610 × 340 pixels and 103 contiguous spectral bands. We extract a small part of the images, marked by a white rectangle in Fig. 5, to perform the experiments. The extracted part consists of 32 × 32 pixels. To avoid the well-known problems caused by high dimensionality [15], we choose the first 6 contiguous bands, i.e., M = 6.

²The PaviaU hyperspectral data can be downloaded at www.ehu.eus/ccwintco/uploads/e/ee/PaviaU.mat

Fig. 5. The false color image of the ROSIS-03 data.

Fig. 6. Spatial windows adopted in the experiment: covariance estimation window (outer solid rectangle), guard window (dashed rectangle), and PUTs (inner solid rectangle).

Since raw hyperspectral data are often non-Gaussian [18], [29], [30], one simple way to address the problem is to perform local demeaning with a sliding window [8], [23], [31]. The classical "Q-Q plot" Gaussianity check, applied to the 32 × 32 image in the first band, indicates that the residual images are approximately Gaussian.
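A minimal sketch of sliding-window local demeaning followed by a numerical stand-in for the Q-Q check. The window size `win` and the use of the Q-Q correlation coefficient as a Gaussianity score are assumptions, since the text specifies neither; `scipy.ndimage.uniform_filter` computes the local mean and `scipy.stats.probplot` returns the normal Q-Q fit. The input cube is synthetic, not the Pavia data.

```python
import numpy as np
from scipy import ndimage, stats


def local_demean(cube, win=5):
    """Subtract a win x win sliding-window mean from every band of a (rows, cols, bands) cube."""
    cube = np.asarray(cube, dtype=float)
    local_mean = ndimage.uniform_filter(cube, size=(win, win, 1), mode="reflect")
    return cube - local_mean


def qq_correlation(x):
    """Correlation of the normal Q-Q plot of x; values close to 1 indicate near-Gaussian data."""
    (_osm, _osr), (_slope, _intercept, r) = stats.probplot(np.ravel(x), dist="norm")
    return r


rng = np.random.default_rng(0)
rows, cols, bands = 32, 32, 6
# Synthetic stand-in for a 32 x 32 x 6 subimage: a smooth spatial trend plus Gaussian noise.
trend = np.linspace(0.0, 10.0, rows)[:, None, None] * np.ones((rows, cols, bands))
cube = trend + rng.normal(size=(rows, cols, bands))
residual = local_demean(cube, win=5)
print(qq_correlation(residual[:, :, 0]))
```

A visual Q-Q plot, as used in the text, conveys the same information as this single correlation score but also shows where (e.g., in the tails) any departure from Gaussianity occurs.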

Training data are generally obtained with a guard-window method [8], [23], illustrated in Fig. 6. The covariance estimation window (the outer solid rectangle in Fig. 6) collects training data in a small neighborhood of the PUTs, and the guard window (the dashed rectangle in Fig. 6) excludes potential target pixels near the PUTs. As a consequence, the pixels between the guard window and the covariance estimation window are adopted as the training


Fig. 7. Experimental results of the RXD and the proposed detectors for different values of GSNR (M = 6, N = 9, K = 32, $P_{\mathrm{FA}} = 10^{-5}$). Pixels with values below the threshold are set to zero. (a) Anomaly location. (b) and (c) Detection results of the RXD and the proposed detectors when GSNR = 16.19 dB. (d) and (e) Detection results of the RXD and the proposed detectors when GSNR = 22.21 dB. (f) and (g) Detection results of the RXD and the proposed detectors when GSNR = 24.15 dB.

data. The windows slide pixel by pixel as the location of the PUTs changes. In our experiments, the PUTs are 3 × 3 pixels, the covariance estimation window is 9 × 9 pixels, and the guard window is 8 × 8 pixels. Thus, the number of secondary (training) data is K = 32.
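A minimal sketch of the guard-window training-data selection for one PUT position. It assumes concentric square windows and takes the training set to be the pixels inside the outer window but outside an inner window. With a 9 × 9 outer window, the stated count K = 32 equals 9² − 7², i.e., a one-pixel-wide ring (a plain set difference with an 8 × 8 window would leave 9² − 8² = 17 pixels), so `inner=7` is used here; the same reading gives 25² − 23² = 96 for the windows of Section V-C. The function names are hypothetical, and the row-major pixel order of `put_matrix` must match the ordering of the pattern vector a.

```python
import numpy as np


def put_matrix(cube, r0, c0, size=3):
    """M x N matrix R of the size x size PUT block centered at (r0, c0), row-major pixel order."""
    h = size // 2
    block = cube[r0 - h:r0 + h + 1, c0 - h:c0 + h + 1, :]
    return block.reshape(-1, cube.shape[2]).T


def ring_training_matrix(cube, r0, c0, outer=9, inner=7):
    """M x K matrix R_L of pixels inside the outer window but outside the inner (guard) one.

    Both windows are assumed square, odd-sized, and centered at (r0, c0); the caller
    must keep the outer window inside the image.
    """
    ho, hi = outer // 2, inner // 2
    cols = [cube[r0 + dr, c0 + dc, :]
            for dr in range(-ho, ho + 1)
            for dc in range(-ho, ho + 1)
            if max(abs(dr), abs(dc)) > hi]  # keep only pixels outside the inner window
    return np.stack(cols, axis=1)


rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 6))   # stand-in residual cube, M = 6
R = put_matrix(cube, 16, 16)
R_L = ring_training_matrix(cube, 16, 16)
print(R.shape, R_L.shape)             # (6, 9) (6, 32)
```

Sliding the windows pixel by pixel amounts to calling both helpers for every admissible center (r0, c0).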

To analyze the performance of the proposed detectors, we extract one pixel from a different area (i.e., the white roof area) of the hyperspectral data and implant it into the test area in all 6 bands of the residual images. This implantation approach is widely used in the community; see, e.g., [15], [32], [33]. We form a known 3 × 3 (i.e., N = 9) pattern, given by

$$\begin{bmatrix} 0 & 1 & 0\\ 1 & 1 & 1\\ 0 & 1 & 0 \end{bmatrix}.$$

Fig. 8. (a) False color image of the Nuance Cri hyperspectral data. (b) Test result of the RXD. (c) Test result of the proposed detector.

That is, in our experiments the signal pattern vector is $\mathbf a = \alpha\mathbf a_0$, where $\mathbf a_0 = [0, 1, 0, 1, 1, 1, 0, 1, 0]^T$ and the scalar $\alpha$ controls the GSNR. Note that the background covariance matrix $\mathbf M$ of the residual subimages is unknown in advance. In practice, we replace the unknown covariance matrix $\mathbf M$ by its ML estimate $\hat{\mathbf M}$. Thus, the GSNR can be written as

$$\mathrm{GSNR} = \alpha^2\|\mathbf a_0\|^2\left(\mathbf s^T\hat{\mathbf M}^{-1}\mathbf s\right). \tag{51}$$

We consider three GSNR values: 1) GSNR = 16.19 dB; 2) GSNR = 22.21 dB; 3) GSNR = 24.15 dB.
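A sketch of how $\alpha$ can be chosen to reach each target GSNR in (51). It assumes the implanted target enters the $M\times N$ residual PUT matrix as $\alpha\mathbf s\mathbf a_0^T$, which is consistent with the mean of $\mathbf v_1$ in (75). Here `s` stands in for the extracted roof-pixel spectrum and `M_hat` for the ML covariance estimate; both are random stand-ins, and the helper names are hypothetical.

```python
import numpy as np


def alpha_for_gsnr(gsnr_db, a0, s, M_hat):
    """Scale alpha so that alpha^2 * ||a0||^2 * s^T M_hat^{-1} s equals the target GSNR, cf. (51)."""
    quad = s @ np.linalg.solve(M_hat, s)  # s^T M_hat^{-1} s
    return np.sqrt(10.0 ** (gsnr_db / 10.0) / ((a0 @ a0) * quad))


def implant(R_res, s, a0, alpha):
    """Add the target alpha * s a0^T to an M x N residual PUT matrix (assumed signal model)."""
    return R_res + alpha * np.outer(s, a0)


a0 = np.array([0, 1, 0, 1, 1, 1, 0, 1, 0], dtype=float)  # 3 x 3 cross pattern, N = 9

rng = np.random.default_rng(1)
B = rng.normal(size=(6, 6))
M_hat = B @ B.T + 6 * np.eye(6)  # stand-in for the ML covariance estimate
s = rng.normal(size=6)           # stand-in for the extracted roof spectrum
for gsnr_db in (16.19, 22.21, 24.15):
    alpha = alpha_for_gsnr(gsnr_db, a0, s, M_hat)
    check = 10 * np.log10(alpha**2 * (a0 @ a0) * (s @ np.linalg.solve(M_hat, s)))
    print(gsnr_db, round(alpha, 4), round(check, 2))
```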

For comparison, both the RXD and the proposed detectors are assessed in our experiments. First, we use (47) to determine the theoretical threshold for a given $P_{\mathrm{FA}} = 10^{-5}$. Then, after the test statistic is calculated for each pixel, we set pixels with values below the threshold to zero. The results are shown in Fig. 7, where Fig. 7(a) shows the location of the implanted anomaly. Figs. 7(c), (e), and (g) demonstrate that the proposed detectors successfully detect the anomaly when the GSNR is 16.19 dB, 22.21 dB, and 24.15 dB, respectively. In contrast, Figs. 7(b), (d), and (f) show that, among the three GSNR values, the RXD properly detects the anomalous target only at the highest one, i.e., 24.15 dB. Hence, the proposed detectors outperform the RXD owing to their use of training data.
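A per-PUT sketch of the processing described above, written for the Rao form (54) because it needs only one linear solve. The threshold mapping assumes that (47) coincides with the F-distribution tail implied by Appendix B and uses $T_{\mathrm{Rao}} = T_{\mathrm{Wald}}/(1+T_{\mathrm{Wald}})$ from (63); the statistic map is built from pure-noise stand-in data rather than the Pavia residuals, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def rao_statistic(R, R_L, a):
    """T_Rao in (54): (Ra)^T (R R^T + R_L R_L^T)^{-1} (Ra) / (a^T a)."""
    YYt = R @ R.T + R_L @ R_L.T  # (52)
    Ra = R @ a
    return Ra @ np.linalg.solve(YYt, Ra) / (a @ a)


def rao_threshold(pfa, M, N, K):
    """Map an F-based Wald threshold to the Rao scale via T_Rao = T_Wald / (1 + T_Wald)."""
    d2 = N + K - M
    lam_wald = stats.f.isf(pfa, M, d2) * M / d2  # assumes T_Wald = nu / tau as in Appendix B
    return lam_wald / (1.0 + lam_wald)


def zero_below(stat_map, lam):
    """Keep values at or above the threshold and set the rest to zero, as in Fig. 7."""
    return np.where(stat_map >= lam, stat_map, 0.0)


rng = np.random.default_rng(0)
M, N, K = 6, 9, 32
lam = rao_threshold(1e-5, M, N, K)
a0 = np.array([0, 1, 0, 1, 1, 1, 0, 1, 0], dtype=float)
stat_map = np.array([[rao_statistic(rng.normal(size=(M, N)), rng.normal(size=(M, K)), a0)
                      for _ in range(4)] for _ in range(4)])
print(lam, zero_below(stat_map, lam))
```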

C. Real Hyperspectral Data

We next conduct experiments on the Nuance Cri hyperspectral data, which comprise 400 × 400 pixels and 46 contiguous spectral bands with wavelengths from 650 to 1100 nm [6], [34]. The background of the Nuance Cri hyperspectral image is grass, and 10 rocks in the image are regarded as anomalies to be detected.


Fig. 9. The spatial pattern of the marked target.

First, we apply the local demeaning procedure to the Nuance Cri hyperspectral data. Next, we determine the spatial pattern vector $\mathbf a$ from the ground truth of the anomalies. For example, the ground truth of the anomaly marked by the white rectangle in Fig. 8(a) can be described by a matrix with 0 and 1 elements, shown in Fig. 9. The spatial pattern vector $\mathbf a$ is then obtained by reshaping this matrix into a column vector (in this case, N = 108 and M = 46). Note that the spatial pattern vectors of the 10 anomalies differ, and each of them can be obtained in the same way. In our experiments, training data are selected with the guard-window method described in Section V-B. We set the sizes of the guard window and of the covariance estimation window to 24 × 24 and 25 × 25, respectively. Hence, the training data size is K = 96. For the Nuance Cri hyperspectral data, we set $P_{\mathrm{FA}} = 10^{-4}$.
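A sketch of the reshaping step. Whether the ground-truth matrix is vectorized row by row or column by column is not stated, so `order` is left as a parameter; what matters is that it matches the order in which the PUT pixels are stacked as columns of $\mathbf R$. The helper name is hypothetical and the 4 × 5 mask is purely illustrative, not the Fig. 9 ground truth.

```python
import numpy as np


def pattern_from_mask(mask, order="C"):
    """Reshape a 0/1 ground-truth mask into the spatial pattern vector a (length N = mask.size).

    order must match the pixel order used to build the columns of R;
    row-major ("C") is an assumption here.
    """
    return np.asarray(mask, dtype=float).reshape(-1, order=order)


# The 3 x 3 cross of Section V-B reproduces a0 = [0, 1, 0, 1, 1, 1, 0, 1, 0]^T.
cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
print(pattern_from_mask(cross))

# A hypothetical irregular 4 x 5 mask: N = 20 entries, 12 of them equal to 1.
mask = np.array([[0, 1, 1, 0, 0],
                 [1, 1, 1, 1, 0],
                 [0, 1, 1, 1, 1],
                 [0, 0, 1, 1, 0]])
a = pattern_from_mask(mask)
print(a.shape, int(a.sum()))  # (20,) 12
```

Note that N counts every entry of the matrix, including zeros, because the whole rectangular window around the anomaly forms the PUT.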

For comparison, the results of the RXD and the proposed detectors are shown in Fig. 8(b) and Fig. 8(c), respectively. The proposed detector successfully detects all the anomalies, whereas the RXD, which uses no training data, detects none of them. This result can be explained by the fact that the former exploits the training data for covariance matrix estimation.

VI. CONCLUSION

We have considered the anomaly detection problem for widespread targets (anomalies) with a given signal pattern in hyperspectral images when training data are available. The GLRT, Rao test, and Wald test have been proposed, and we have proved that they coincide with each other. In addition, we have analyzed the statistical properties of the proposed detectors and obtained theoretical expressions for their PFA and PD. These expressions reveal that the proposed detectors possess the CFAR property with respect to the noise covariance matrix. Numerical examples based on synthetic and real data show that, by employing training data, the proposed detectors outperform the conventional RXD.

APPENDIX A
EQUIVALENCE OF THE GLRT, RAO TEST AND WALD TEST

In this appendix, we prove that the GLRT, the Rao test, and the Wald test derived for the detection problem (4) coincide with each other.

A. Equivalence of the GLRT and Rao test

First, we prove that the GLRT and the Rao test are statistically equivalent. Define

$$\mathbf Y\mathbf Y^T = \mathbf R\mathbf R^T + \mathbf R_L\mathbf R_L^T \in \mathbb R^{M\times M}. \tag{52}$$

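Thus, the GLRT and the Rao test in (15) and (26) can be rewritten as

$$T_{\mathrm{GLRT}} = \frac{\left|\mathbf Y\mathbf Y^T\right|}{\left|\mathbf Y\mathbf Y^T - \dfrac{\mathbf R\mathbf a\mathbf a^T\mathbf R^T}{\mathbf a^T\mathbf a}\right|} \overset{H_1}{\underset{H_0}{\gtrless}} \lambda_{\mathrm{GLRT}} \tag{53}$$

and

$$T_{\mathrm{Rao}} = \frac{(\mathbf R\mathbf a)^T\left(\mathbf Y\mathbf Y^T\right)^{-1}(\mathbf R\mathbf a)}{\mathbf a^T\mathbf a} \overset{H_1}{\underset{H_0}{\gtrless}} \lambda_{\mathrm{Rao}}, \tag{54}$$

respectively. To further simplify (53), we factor the determinant of the $M\times M$ matrix $\mathbf Y\mathbf Y^T$ out of the denominator. This yields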

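$$T_{\mathrm{GLRT}} = \frac{\left|\mathbf Y\mathbf Y^T\right|}{\left|\mathbf Y\mathbf Y^T\right|\left|\mathbf I_M - \dfrac{(\mathbf Y\mathbf Y^T)^{-1}\mathbf R\mathbf a(\mathbf R\mathbf a)^T}{\mathbf a^T\mathbf a}\right|} = \frac{1}{1 - \dfrac{(\mathbf R\mathbf a)^T(\mathbf Y\mathbf Y^T)^{-1}(\mathbf R\mathbf a)}{\mathbf a^T\mathbf a}} = \frac{1}{1 - T_{\mathrm{Rao}}}, \tag{55}$$

where the second equality follows from $\left|\mathbf I_M - \mathbf x\mathbf y^T\right| = 1 - \mathbf y^T\mathbf x$. Since $1/(1 - T_{\mathrm{Rao}})$ is a monotonically increasing function of $T_{\mathrm{Rao}}$ on $[0, 1)$, the GLRT is equivalent to the Rao test.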

B. Equivalence of the Wald and Rao test

Next, we show that the Wald test is equivalent to the Rao test. Similar to the derivations in (34)–(38), we apply a whitening procedure to $\mathbf R$ and $\mathbf R_L$. After whitening, the Wald test can be written as (38). Since $\mathbf a^T\mathbf a$ is a positive scalar, we can normalize the vector $\mathbf a$ as

$$\bar{\mathbf a} = \frac{\mathbf a}{(\mathbf a^T\mathbf a)^{1/2}}. \tag{56}$$

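The Wald test then becomes

$$T_{\mathrm{Wald}} = \bar{\mathbf a}^T\mathbf Z^T\left(\mathbf Z\mathbf P_{\bar{\mathbf a}}^{\perp}\mathbf Z^T + \mathbf Z_L\mathbf Z_L^T\right)^{-1}\mathbf Z\bar{\mathbf a}, \tag{57}$$

where $\mathbf P_{\bar{\mathbf a}}^{\perp} = \mathbf I_N - \bar{\mathbf a}\bar{\mathbf a}^T$ is the projection onto the orthogonal complement of $\bar{\mathbf a}$.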

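Define an $N\times N$ orthonormal matrix $\mathbf U = [\bar{\mathbf a}, \mathbf G^T]^T$, where $\mathbf G$ is an $(N-1)\times N$ matrix with orthonormal rows such that $\bar{\mathbf a}^T\mathbf G^T = \mathbf 0_{1,N-1}$. Using the transformation $\mathbf U$, we obtain

$$\begin{cases} \bar{\mathbf a}^T\mathbf U^T = [1, 0, \ldots, 0],\\ \mathbf V \triangleq \mathbf Z\mathbf U^T = [\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_N],\\ \mathbf V_L \triangleq \mathbf Z_L = [\tilde{\mathbf v}_1, \tilde{\mathbf v}_2, \ldots, \tilde{\mathbf v}_K],\\ \mathbf Q \triangleq \mathbf U\mathbf P_{\bar{\mathbf a}}^{\perp}\mathbf U^T = \begin{bmatrix} 0 & \mathbf 0_{1,N-1}\\ \mathbf 0_{N-1,1} & \mathbf I_{N-1} \end{bmatrix}. \end{cases} \tag{58}$$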

As a consequence, the Wald test reduces to

$$T_{\mathrm{Wald}} = \mathbf v_1^T\left(\mathbf V\mathbf Q\mathbf V^T + \mathbf V_L\mathbf V_L^T\right)^{-1}\mathbf v_1. \tag{59}$$

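To simplify (59) further, we partition $\mathbf V$ as $\mathbf V = [\mathbf v_1, \bar{\mathbf V}]$, where $\bar{\mathbf V} = [\mathbf v_2, \ldots, \mathbf v_N]$. We then have the identity

$$\boldsymbol\Psi \triangleq \mathbf V\mathbf Q\mathbf V^T + \mathbf V_L\mathbf V_L^T = \bar{\mathbf V}\bar{\mathbf V}^T + \mathbf V_L\mathbf V_L^T, \tag{60}$$

and the Wald test can be expressed as (39), i.e., $T_{\mathrm{Wald}} = \mathbf v_1^T\boldsymbol\Psi^{-1}\mathbf v_1$.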


Applying a similar approach, one can readily verify that the Rao test in (26) can be formulated as

$$T_{\mathrm{Rao}} = \mathbf v_1^T\left(\mathbf v_1\mathbf v_1^T + \boldsymbol\Psi\right)^{-1}\mathbf v_1. \tag{61}$$

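According to the matrix inversion lemma [35, p. 18], we have

$$\left(\mathbf v_1\mathbf v_1^T + \boldsymbol\Psi\right)^{-1} = \left(\mathbf I_M - \frac{\boldsymbol\Psi^{-1}\mathbf v_1\mathbf v_1^T}{1 + \mathbf v_1^T\boldsymbol\Psi^{-1}\mathbf v_1}\right)\boldsymbol\Psi^{-1}. \tag{62}$$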

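Finally, substituting (62) into the Rao test in (61) yields

$$T_{\mathrm{Rao}} = \frac{\mathbf v_1^T\boldsymbol\Psi^{-1}\mathbf v_1}{1 + \mathbf v_1^T\boldsymbol\Psi^{-1}\mathbf v_1} = \frac{T_{\mathrm{Wald}}}{1 + T_{\mathrm{Wald}}}. \tag{63}$$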

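Thus, the Rao test is equivalent to the Wald test.

In conclusion, the GLRT, the Rao test, and the Wald test coincide with each other: by (55) and (63), each statistic is a monotonically increasing function of the others, and such transformations, with correspondingly mapped thresholds, do not change the statistical properties of the tests.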

APPENDIX B
EQUIVALENT TRANSFORMATION OF THE WALD TEST

In this appendix, we derive an equivalent form of the Wald test in (39) to facilitate the analysis of its statistical properties.

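Rewrite the matrix $\boldsymbol\Psi$ as $\boldsymbol\Psi = \mathbf W\mathbf W^T$, where

$$\mathbf W = [\mathbf v_2, \ldots, \mathbf v_N, \tilde{\mathbf v}_1, \ldots, \tilde{\mathbf v}_K] \in \mathbb R^{M\times(N+K-1)}. \tag{64}$$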

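Thus, we can rewrite the Wald test in (39) as

$$T_{\mathrm{Wald}} = \mathbf v_1^T(\mathbf W\mathbf W^T)^{-1}\mathbf v_1 = \|\mathbf v_1\|^2\Delta, \tag{65}$$

where $\boldsymbol\gamma = \mathbf v_1/\|\mathbf v_1\| \in \mathbb R^{M\times 1}$ is a normalized vector and $\Delta = \boldsymbol\gamma^T(\mathbf W\mathbf W^T)^{-1}\boldsymbol\gamma$.

Now, by conditioning on the vector $\mathbf v_1$, we can treat $\boldsymbol\gamma$ as a constant normalized vector. Thus, we can define an orthonormal matrix $\mathbf U_1 \in \mathbb R^{M\times M}$ such that [9], [18]

$$\mathbf U_1\boldsymbol\gamma = [1, 0, \ldots, 0]^T \in \mathbb R^{M\times 1}. \tag{66}$$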

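Applying the transformation $\mathbf U_1$ to the matrix $\mathbf W$, we define

$$\mathbf D = \mathbf U_1\mathbf W = \mathbf U_1[\mathbf v_2, \ldots, \mathbf v_N, \tilde{\mathbf v}_1, \ldots, \tilde{\mathbf v}_K] \in \mathbb R^{M\times(N+K-1)}. \tag{67}$$

Hence, we can rewrite the term $\Delta$ as

$$\Delta = [1, 0, \ldots, 0](\mathbf D\mathbf D^T)^{-1}[1, 0, \ldots, 0]^T. \tag{68}$$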

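We partition the matrix $\mathbf D$ into two parts, $\mathbf D = [\mathbf d_1, \bar{\mathbf D}^T]^T$, where $\mathbf d_1 \in \mathbb R^{(N+K-1)\times 1}$ and $\bar{\mathbf D} \in \mathbb R^{(M-1)\times(N+K-1)}$. Thus, the term $(\mathbf D\mathbf D^T)^{-1}$ can be written as

$$(\mathbf D\mathbf D^T)^{-1} = \begin{bmatrix} \mathbf d_1^T\mathbf d_1 & \mathbf d_1^T\bar{\mathbf D}^T\\ \bar{\mathbf D}\mathbf d_1 & \bar{\mathbf D}\bar{\mathbf D}^T \end{bmatrix}^{-1} \triangleq \begin{bmatrix} b_{11} & \mathbf b_{21}^T\\ \mathbf b_{21} & \mathbf B_{22} \end{bmatrix}, \tag{69}$$

where $b_{11}$ is a scalar, $\mathbf b_{21}$ is a column vector of dimension $M-1$, and $\mathbf B_{22}$ is an $(M-1)\times(M-1)$ matrix.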

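Applying the partitioned matrix inversion theorem [35, p. 17], the term $\Delta$ reduces to

$$\Delta = b_{11} = \left(\mathbf d_1^T\mathbf d_1 - \mathbf d_1^T\bar{\mathbf D}^T(\bar{\mathbf D}\bar{\mathbf D}^T)^{-1}\bar{\mathbf D}\mathbf d_1\right)^{-1} = \frac{1}{\mathbf d_1^T\mathbf P^{\perp}\mathbf d_1}, \tag{70}$$

where $\mathbf P^{\perp} \triangleq \mathbf I_{N+K-1} - \bar{\mathbf D}^T(\bar{\mathbf D}\bar{\mathbf D}^T)^{-1}\bar{\mathbf D}$ is a projection matrix such that $\mathrm{tr}(\mathbf P^{\perp}) = N+K-M$. By the properties of projection matrices, one can verify that $\mathbf P^{\perp}$ has $N+K-M$ unit eigenvalues and $M-1$ zero eigenvalues. Thus, $\mathbf P^{\perp}$ can be diagonalized as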

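$$\boldsymbol\Lambda = \mathbf U_2^T\mathbf P^{\perp}\mathbf U_2 = \begin{bmatrix} \mathbf I_{N+K-M} & \mathbf 0_{N+K-M,M-1}\\ \mathbf 0_{M-1,N+K-M} & \mathbf 0_{M-1,M-1} \end{bmatrix}, \tag{71}$$

where the orthonormal matrix $\mathbf U_2$ is the modal matrix of $\mathbf P^{\perp}$ [26]. We now proceed by defining

$$\boldsymbol\xi \triangleq \boldsymbol\Lambda^{1/2}\mathbf U_2^T\mathbf d_1 = [\xi_1, \ldots, \xi_{N+K-1}]^T \in \mathbb R^{(N+K-1)\times 1}. \tag{72}$$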

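Clearly, the last $M-1$ elements of $\boldsymbol\xi$ are zero. Temporarily fixing $\mathbf P^{\perp}$, we have

$$\frac{1}{\Delta} = \mathbf d_1^T\mathbf P^{\perp}\mathbf d_1 = \boldsymbol\xi^T\boldsymbol\xi = \sum_{j=1}^{N+K-M}\xi_j^2. \tag{73}$$

Thus, after substituting (73) into (65), the Wald test becomes

$$T_{\mathrm{Wald}} = \frac{\mathbf v_1^T\mathbf v_1}{\boldsymbol\xi^T\boldsymbol\xi}. \tag{74}$$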

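In the following, we show that the Wald test in (74) can be expressed as a ratio of two independent chi-squared distributed random variables. First, note that the distributions of $\mathbf v_1$ under hypotheses $H_0$ and $H_1$ are

$$\begin{cases} H_0: \mathbf v_1 \sim \mathcal N(\mathbf 0_{M,1}, \mathbf I_M),\\ H_1: \mathbf v_1 \sim \mathcal N\left(\mathbf M^{-1/2}\mathbf s(\mathbf a^T\mathbf a)^{1/2}, \mathbf I_M\right). \end{cases} \tag{75}$$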

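Hence, the numerator $\mathbf v_1^T\mathbf v_1$ is distributed as

$$\nu \triangleq \mathbf v_1^T\mathbf v_1 \sim \begin{cases} \chi^2_M, & \text{under } H_0,\\ \chi^2_M(\sigma), & \text{under } H_1, \end{cases} \tag{76}$$

where the noncentrality parameter $\sigma$ is given by

$$\sigma = \mathrm E[\mathbf v_1^T|H_1]\,\mathrm E[\mathbf v_1|H_1] = (\mathbf s^T\mathbf M^{-1}\mathbf s)\|\mathbf a\|^2. \tag{77}$$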

Next, notice that the column vectors of $\mathbf W$ (i.e., $\mathbf v_2, \ldots, \mathbf v_N, \tilde{\mathbf v}_1, \ldots, \tilde{\mathbf v}_K$) are independent of each other, and their mean vectors and covariance matrices are given in (41) and (40), respectively. One can then readily verify that $\mathbf d_1$ has a zero mean vector and an identity covariance matrix under both hypotheses $H_0$ and $H_1$, that is, $\mathrm E[\mathbf d_1|H_i] = \mathbf 0_{N+K-1,1}$ and $\mathrm{cov}[\mathbf d_1|H_i] = \mathbf I_{N+K-1}$, $i = 0, 1$. Hence, conditioned on $\mathbf v_1$ and $\mathbf P^{\perp}$, the variables $\xi_j$, $j = 1, \ldots, N+K-M$, are independent zero-mean, unit-variance Gaussian random variables under both $H_0$ and $H_1$, i.e., their conditional joint PDF is

$$f_{\boldsymbol\xi}(\xi_1, \xi_2, \ldots, \xi_{N+K-M}\,|\,\mathbf v_1, \mathbf P^{\perp}) = \mathcal N(\mathbf 0_{N+K-M,1}, \mathbf I_{N+K-M}). \tag{78}$$

Since this conditional distribution does not depend on $\mathbf v_1$ or $\mathbf P^{\perp}$, $\boldsymbol\xi$ is statistically independent of both. Thus, the denominator in (74) is chi-squared distributed with $N+K-M$ degrees of freedom under both $H_0$ and $H_1$, that is,

$$\tau \triangleq \boldsymbol\xi^T\boldsymbol\xi \sim \chi^2_{N+K-M}. \tag{79}$$

Finally, we can equivalently express the test statistic of the Waldtest in (74) as (42).
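A Monte Carlo sketch of the kind used to validate the theoretical curves. It draws colored Gaussian data, computes $T_{\mathrm{Wald}} = \mathbf v_1^T\boldsymbol\Psi^{-1}\mathbf v_1$ without whitening (the statistic is invariant to it), and compares the empirical $P_{\mathrm{FA}}$ and $P_{\mathrm D}$ with the central and noncentral F tails implied by (76)–(79). The covariance, signature, all-ones pattern, and $P_{\mathrm{FA}} = 0.05$ (chosen to keep the trial count small) are arbitrary assumptions; agreement under an arbitrary covariance illustrates the CFAR property. The helper name `wald_draws` is hypothetical.

```python
import numpy as np
from scipy import stats


def wald_draws(rng, trials, C_chol, N, K, s=None, a=None):
    """Monte Carlo draws of T_Wald = v1^T Psi^{-1} v1 computed from raw (unwhitened) data.

    Noise columns have covariance C_chol @ C_chol.T; under H1 the test matrix
    also contains the signal s a^T.
    """
    M = C_chol.shape[0]
    a = np.ones(N) if a is None else a
    a_bar = a / np.linalg.norm(a)
    P_perp = np.eye(N) - np.outer(a_bar, a_bar)
    R = C_chol @ rng.normal(size=(trials, M, N))    # test cells
    R_L = C_chol @ rng.normal(size=(trials, M, K))  # training cells
    if s is not None:
        R = R + np.outer(s, a)                      # H1: add the signal s a^T
    Ra = R @ a_bar                                  # (trials, M)
    Psi = R @ P_perp @ np.swapaxes(R, 1, 2) + R_L @ np.swapaxes(R_L, 1, 2)
    sol = np.linalg.solve(Psi, Ra[..., None])[..., 0]
    return np.sum(Ra * sol, axis=1)


rng = np.random.default_rng(0)
M, N, K, trials, pfa, sigma = 6, 9, 12, 20000, 0.05, 10.0
B = rng.normal(size=(M, M))
C = B @ B.T + M * np.eye(M)  # arbitrary covariance
C_chol = np.linalg.cholesky(C)
d2 = N + K - M
lam = stats.f.isf(pfa, M, d2) * M / d2  # threshold on nu / tau

pfa_mc = np.mean(wald_draws(rng, trials, C_chol, N, K) > lam)

a = np.ones(N)
s = rng.normal(size=M)
s *= np.sqrt(sigma / ((s @ np.linalg.solve(C, s)) * (a @ a)))  # makes (77) equal sigma
pd_mc = np.mean(wald_draws(rng, trials, C_chol, N, K, s=s, a=a) > lam)
pd_th = stats.ncf.sf(lam * d2 / M, M, d2, sigma)
print(f"PFA: MC {pfa_mc:.4f} vs nominal {pfa}")
print(f"PD:  MC {pd_mc:.4f} vs theory {pd_th:.4f}")
```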


REFERENCES

[1] D. W. J. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaum, and A. D. Stocker, "Anomaly detection from hyperspectral imagery," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 58–69, Jan. 2002.

[2] D. Manolakis and G. Shaw, "Detection algorithms for hyperspectral imaging applications," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 29–43, Jan. 2002.

[3] H. Li and J. H. Michels, "Parametric adaptive signal detection for hyperspectral imaging," IEEE Trans. Signal Process., vol. 54, no. 7, pp. 2704–2715, Jul. 2006.

[4] D. Manolakis, E. Truslow, M. Pieper, T. Cooley, and M. Brueggeman, "Detection algorithms in hyperspectral imaging systems: An overview of practical algorithms," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 24–33, Jan. 2014.

[5] M. T. Eismann, A. D. Stocker, and N. M. Nasrabadi, "Automated hyperspectral cueing for civilian search and rescue," Proc. IEEE, vol. 97, no. 6, pp. 1031–1055, Jun. 2009.

[6] H. Ning, X. Zhang, H. Zhou, and L. Jiao, "Hyperspectral anomaly detection via background and potential anomaly dictionaries construction," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 4, pp. 2263–2276, Apr. 2019.

[7] A. W. Bitar, L.-F. Cheong, and J.-P. Ovarlez, "Sparse and low-rank matrix decomposition for automatic target detection in hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5239–5251, 2019.

[8] S. Matteoli, M. Diani, and G. Corsini, "A tutorial overview of anomaly detection in hyperspectral images," IEEE Aerosp. Electron. Syst. Mag., vol. 25, no. 7, pp. 5–28, Jul. 2010.

[9] E. J. Kelly, "An adaptive detection algorithm," IEEE Trans. Aerosp. Electron. Syst., vol. 22, no. 1, pp. 115–127, Mar. 1986.

[10] F. C. Robey, D. R. Fuhrmann, E. J. Kelly, and R. Nitzberg, "A CFAR adaptive matched filter detector," IEEE Trans. Aerosp. Electron. Syst., vol. 28, no. 1, pp. 208–216, Jan. 1992.

[11] S. Kraut and L. L. Scharf, "The CFAR adaptive subspace detector is a scale-invariant GLRT," IEEE Trans. Signal Process., vol. 47, no. 9, pp. 2538–2541, Sep. 1999.

[12] A. De Maio, "Rao test for adaptive detection in Gaussian interference with unknown covariance matrix," IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3577–3584, Jul. 2007.

[13] P. H. Suen, G. Healey, and D. Slater, "The impact of viewing geometry on material discriminability in hyperspectral images," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1352–1359, Jul. 2001.

[14] G. Healey and D. Slater, "Models and methods for automated material identification in hyperspectral imagery acquired under unknown illumination and atmospheric conditions," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2706–2717, Nov. 1999.

[15] J. Frontera-Pons, M. A. Veganzones, F. Pascal, and J. Ovarlez, "Hyperspectral anomaly detectors using robust estimators," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 2, pp. 720–731, Feb. 2016.

[16] Q. Ling, Y. Guo, Z. Lin, and W. An, "A constrained sparse representation model for hyperspectral anomaly detection," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 4, pp. 2358–2371, 2019.

[17] X. Yu, I. S. Reed, and A. D. Stocker, "Comparative performance analysis of adaptive multispectral detectors," IEEE Trans. Signal Process., vol. 41, no. 8, pp. 2639–2656, Aug. 1993.

[18] I. S. Reed and X. Yu, "Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 10, pp. 1760–1770, Oct. 1990.

[19] C. I. Chang and S. S. Chiang, "Anomaly detection and classification for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 6, pp. 1314–1325, Jun. 2002.

[20] X. Yu, L. E. Hoff, I. S. Reed, A. M. Chen, and L. B. Stotts, "Automatic target detection and recognition in multiband imagery: A unified ML detection and estimation approach," IEEE Trans. Image Process., vol. 6, no. 1, pp. 143–156, Jan. 1997.

[21] N. Acito, G. Corsini, and M. Diani, "Adaptive detection algorithm for full pixel targets in hyperspectral images," IEE Proc.-Vis., Image Signal Process., vol. 152, no. 6, pp. 731–740, Dec. 2005.

[22] H. Kwon and N. M. Nasrabadi, "Kernel RX-algorithm: A nonlinear anomaly detector for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 2, pp. 388–397, Feb. 2005.

[23] N. M. Nasrabadi, "Hyperspectral target detection: An overview of current and future challenges," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 34–44, Jan. 2014.

[24] W. Liu, W. Xie, J. Liu, and Y. Wang, "Adaptive double subspace signal detection in Gaussian background—Part I: Homogeneous environments," IEEE Trans. Signal Process., vol. 62, no. 9, pp. 2345–2357, May 2014.

[25] ——, "Adaptive double subspace signal detection in Gaussian background—Part II: Partially homogeneous environments," IEEE Trans. Signal Process., vol. 62, no. 9, pp. 2358–2369, May 2014.

[26] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory. Upper Saddle River, NJ: Prentice Hall, 1998.

[27] K. S. Miller, Multidimensional Gaussian Distributions. New York: Wiley, 1964.

[28] E. J. Kelly, "Finite-sum expression for signal detection probabilities," Lincoln Laboratory, MIT, Tech. Rep. 566, 1981.

[29] A. Margalit, I. S. Reed, and R. M. Gagliardi, "Adaptive optical target detection using correlated images," IEEE Trans. Aerosp. Electron. Syst., vol. AES-21, no. 3, pp. 394–405, May 1985.

[30] J. Y. Chen and I. S. Reed, "A detection algorithm for optical targets in clutter," IEEE Trans. Aerosp. Electron. Syst., vol. 23, no. 1, pp. 46–59, Jan. 1987.

[31] J. A. Richards and X. Jia, Remote Sensing Digital Image Processing. New York: Springer-Verlag, 1993.

[32] Y. Qu et al., "Hyperspectral anomaly detection through spectral unmixing and dictionary-based low-rank decomposition," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 8, pp. 4391–4405, Aug. 2018.

[33] S. Yang and Z. Shi, "Hyperspectral image target detection improvement based on total variation," IEEE Trans. Image Process., vol. 25, no. 5, pp. 2249–2258, May 2016.

[34] Y. Zhang, B. Du, L. Zhang, and S. Wang, "A low-rank and sparse matrix decomposition-based Mahalanobis distance method for hyperspectral anomaly detection," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1376–1389, 2015.

[35] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.

Jun Liu (Senior Member, IEEE) received the B.S. degree in mathematics from Wuhan University of Technology, Wuhan, China, in 2006, the M.S. degree in mathematics from Chinese Academy of Sciences, China, in 2009, and the Ph.D. degree in electrical engineering from Xidian University, Xi’an, China, in 2012.

From July 2012 to December 2012, he was a Postdoctoral Research Associate with the Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA. From January 2013 to September 2014, he was a Postdoctoral Research Associate with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, USA. From October 2014 to March 2018, he was with Xidian University, Xi’an, China. He is currently an Associate Professor with the Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, China. His research interests include statistical signal processing, image processing, and machine learning. He is currently an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS, and a member of the Editorial Board of Signal Processing (Elsevier).

Yutong Feng received the B.Eng. degree from the Department of Electronic Engineering, Xidian University, in 2018. He is currently pursuing the M.S. degree with the Department of Electronic Engineering and Information Science, University of Science and Technology of China (USTC). His current research interests include multi-channel signal processing, detection algorithms for hyperspectral applications, and multi-channel speech enhancement.


Weijian Liu (Senior Member, IEEE) received the B.S. degree in information engineering and the M.S. degree in signal and information processing, both from Wuhan Radar Academy, Wuhan, China, and the Ph.D. degree in information and communication engineering from National University of Defense Technology, Changsha, China, in 2006, 2009, and 2014, respectively. He is currently a Lecturer at Wuhan Electronic Information Institute. His current research interests include multichannel signal detection and statistical and array signal processing. He is currently an Associate Editor for Circuits, Systems, and Signal Processing.

Danilo Orlando (Senior Member, IEEE) was born in Gagliano del Capo, Italy, on August 9, 1978. He received the Dr. Eng. degree (with honors) in computer engineering and the Ph.D. degree (with maximum score) in information engineering, both from the University of Salento (formerly University of Lecce), Italy, in 2004 and 2008, respectively.

From July 2007 to July 2010, he worked with the University of Cassino (Italy), engaged in a research project on algorithms for track-before-detect of multiple targets in uncertain scenarios. From September to November 2009, he was a Visiting Scientist at the NATO Undersea Research Centre (NURC), La Spezia (Italy). From September 2011 to April 2015, he worked at Elettronica SpA as a System Analyst in the field of Electronic Warfare. In May 2015, he joined Università degli Studi “Niccolò Cusano”, where he is currently Associate Professor. His main research interests are in the field of statistical signal processing and image processing, with more emphasis on adaptive detection and tracking of multiple targets in multisensor scenarios. He has held visiting positions at the Department of Avionics and Systems of ENSICA (now Institut Supérieur de l’Aéronautique et de l’Espace, ISAE), Toulouse (France) in 2007 and at Chinese Academy of Science, Beijing (China) in 2017-2019. He is a Senior Member of IEEE; he has served IEEE TRANSACTIONS ON SIGNAL PROCESSING as Senior Area Editor and is currently Associate Editor for IEEE Open Journal on Signal Processing, EURASIP Journal on Advances in Signal Processing, and MDPI Remote Sensing. He is also author or co-author of about 110 scientific publications in international journals, conferences, and books.

Hongbin Li (Fellow, IEEE) received the B.S. and M.S. degrees from the University of Electronic Science and Technology of China, in 1991 and 1994, respectively, and the Ph.D. degree from the University of Florida, Gainesville, FL, in 1999, all in electrical engineering.

From July 1996 to May 1999, he was a Research Assistant in the Department of Electrical and Computer Engineering at the University of Florida. Since July 1999, he has been with the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ, where he is currently the Charles and Rosanna Batchelor Memorial Chair Professor. He was a Summer Visiting Faculty Member at the Air Force Research Laboratory in the summers of 2003, 2004 and 2009. His general research interests include statistical signal processing, wireless communications, and radars.

Dr. Li received the IEEE Jack Neubauer Memorial Award in 2013 from the IEEE Vehicular Technology Society, Outstanding Paper Award from the IEEE AFICON Conference in 2011, Provost’s Award for Research Excellence in 2019, Harvey N. Davis Teaching Award in 2003, and Jess H. Davis Memorial Research Award in 2001 from Stevens Institute of Technology, and Sigma Xi Graduate Research Award from the University of Florida in 1999. He has been a member of the IEEE SPS Signal Processing Theory and Methods Technical Committee (TC) and the IEEE SPS Sensor Array and Multichannel TC, an Associate Editor for Signal Processing (Elsevier), IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE SIGNAL PROCESSING LETTERS, and IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, as well as a Guest Editor for IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING and EURASIP Journal on Applied Signal Processing. He has been involved in various conference organization activities, including serving as a General Co-Chair for the 7th IEEE Sensor Array and Multichannel Signal Processing (SAM) Workshop, Hoboken, NJ, June 17-20, 2012. Dr. Li is a member of Tau Beta Pi and Phi Kappa Phi.