hyperspectral anomaly detection method based on …akincaliskan.net/publications/spie_2015.pdf ·...

Hyperspectral Anomaly Detection Method Based onAuto-encoder

Emrecan Batı*, Akın Calıskan*, Alper Koz* and A. Aydın Alatan*†

*Center for Image Analysis, Middle East Technical University, Ankara, Turkey;†Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey

ABSTRACT

A major drawback of most of the existing hyperspectral anomaly detection methods is the lack of an efficientbackground representation, which can successfully adapt to the varying complexity of hyperspectral images. Inthis paper, we propose a novel anomaly detection method which represents the hyperspectral scenes of differentcomplexity with the state-of-the-art representation learning method, namely auto-encoder. The proposed methodfirst encodes the spectral image into a sparse code, then decodes the coded image, and finally, assesses the codingerror at each pixel as a measure of anomaly. Predictive Sparse Decomposition Auto-encoder is utilized in theproposed anomaly method due to its efficient joint learning for the encoding and decoding functions. Theperformance of the proposed anomaly detection method is both tested on visible-near infrared (VNIR) and longwave infrared (LWIR) hyperspectral images and compared with the conventional anomaly detection method,namely Reed-Xiaoli (RX) detector.1 The experiments has verified the superiority of the proposed anomalydetection method in terms of receiver operating characteristics (ROC) performance.

Keywords: Hyperspectral image, Anomaly detection, Representation learning, Auto-encoder, RX anomalydetector

1. INTRODUCTION

Hyperspectral target detection studies are carried on evolving in two distinct categories. In anomaly detectionmethods, which can be considered as the first category, the detection is assessed by the spectral differencebetween the hyperspectral pixel under test and its surrounding. On the other hand, spectral signature basedtarget detection methods forming the second category do not compare the pixel with its surrounding but insteadwith a previously available spectral signature. Compared to the signature based methods, the anomaly detectionmethods have become more preferable in many applications where prior information is not available.

Anomaly can be defined as the data or observation which does not fit into the general characteristics of a givendata set. Among anomaly detection methods, Reed-Xioloi (RX) method1 has taken a significant attention in theliterature with its pioneer aspect and simplicity. The method utilizes the Mahalanobis distance metric to describethe discrepancy between the target pixel and the background distribution to differentiate the abnormal objects.The algorithm stands as a benchmark anomaly detector, on which several modifications have been implementedso far.2 Kernel-RX3 and Local-RX4 are two examples of such modifications. Kernel-RX makes use of kerneltrick to implicitly transform data into a higher dimensional space and calculate Mahalanobis distance in thatspace, in single step without computing the transformed data. Local-RX models the background distributionfrom pixels spatially close to the pixel under test instead of using the whole hyperspectral image.

A major challenge in these anomaly detectors is an efficient representation of background. When varyingcomplexity of hyperspectral scenes is considered, using a single Gaussian distribution to model the backgroundas in RX method would not be sufficient. Although the complexity of the background can be alleviated byLocal-RX methods, a drawback of such methods is to decrease the number of samples necessary to learn thebackground. Furthermore, the selection of the suitable kernel for Kernel-RX methods is always not a simple taskfor the scenes composed of many different materials.

Further author information: (Send correspondence to E.B.)E.B.: E-mail: [email protected]

Y

Z

Y

Encoder Decoder

(a)

Input Yi Code Z∗i

Yi

α

∏ ∏∑β

System Energy

Code Energy‖Z∗

i ‖1

Encoderge(Yi,We, D)

Decoder EnergyEncoder Energy

Decodergd(Z∗

i ,Wd)

Code predictionZi

(b)

Figure 1: (a) A simple encoder-decoder architecture of auto-encoder and (b) the block diagram of the minimized

system energy in learning phase, where Y , Z and Y denotes the original pixel, coded pixel and reconstructedpixel, respectively.

In this paper, we first propose to represent the hyperspectral scenes of different complexity with the state-of-the-art representation learning method, namely auto-encoder rather than the Gaussian distribution basedapproaches. Auto-encoders are until now used in representation learning as a building block of the deep learningarchitectures5 with applications for natural language processing,6 text retrieval,7 speech recognition8 and manyothers. Auto-encoders learn to efficiently encode a given input in a nonlinear manner to minimize reconstructionerror. Compared to its linear counterparts9 such as principle component analysis (PCA), independent componentanalysis (ICA) and minimum noise fraction (MNF), auto-encoders have been verified to give better performanceswith its sparse-overcomplete representation.10 In such a representation, the anomalies are expected to yieldgreater reconstruction error due to their rareness. Therefore, the reconstruction error can be regarded as ananomaly metric. Based on this idea, we secondly propose in this paper a novel anomaly detection methoddepending on the proposed auto-encoder based representation of hyperspectral image.

The rest of the paper is organized as follows. The proposed method is explained in detail in Section 2. Wethen present the experimental setup and discuss the experiment results in Section 3. Finally, the conclusions aredrawn in Section 4.

2. PROPOSED ANOMALY DETECTION METHOD USING AUTOENCODER

Proposed method consists of two stages (Figure 1a). In the first stage, described as learning phase, each hy-perspectral pixel, Yi, of the input test image is encoded into a code, Zi, and then decoded to reconstruct theoriginal pixel.

The parameters of the encoding and decoding functions are tuned to minimize the cumulative of the energiesdefined for each spectral pixel. We adopt the auto-encoder using the unified energy based framework11 for theencoding and decoding functions. Specifically, predictive sparse decomposition auto-encoder12 is adapted for themultidimensional hyperspectral data in the developed method. In the second phase, the reconstruction error ateach pixel with the learned parameters is assigned to that pixel as a measure of being anomaly.

Before going into the description of the energy function and the learning algorithm, we first present theutilized notation in the paper in parallel with Figure 1b.

• Yi: ith pixel of the hyperspectral image cube of dimension h× w × p. Note that this corresponds to a onedimensional vector with size p.

• Zi: The output of the encoder function with a size nz for the input pixel Yi.

• Z∗i : The optimal code sought for input pixel Yi.

• Yi: Reconstructed pixel from the code Zi.

• ge(Yi,We, D): Encoder function which transforms the input Yi to the code Zi with the parameters We andD. We corresponds to a matrix of size nz × p and D is a diagonal matrix of size nz × nz. For predictivesparse decomposition auto-encoder,12 ge(Yi,We, D) is selected as

ge(Yi,We, D) = D tanh(WeYi). (1)

• gd(Zi,Wd): Decoder function which reconstructs the pixel Yi from the code Zi with the parameter Wd.Wd ia a matrix of size p × nz. For predictive sparse decomposition auto-encoder,12 gd(Zi,Wd) is selectedas:

gd(Zi,Wd) = WdYi. (2)

The energy function that would be minimized for hyperspectral pixels during the learning phase consists ofthree terms (Figure 1b). The first term contributing to the energy function is the encoder energy which describesthe distance between the obtained code and the optimal code. As the encoder energy gets smaller the derivedcode would be more closer to the optimal code. The second term is defined as the decoder energy which indicatesthe success of reconstruction of the hyperspectral pixel from the code Z∗

i . As the decoder energy gets smaller,the reconstructed pixels would be more closer to the original pixels. The final term code energy is L1 norm ofthe generated code which enforces the sparsity of the code. The overall energy of the system is given as12

E(Yi, Zi,We, D,Wd) = α‖Zi − ge(Yi,We, D)‖22 + ‖Yi − gd(Zi,Wd)‖22 + β‖Zi‖1. (3)

The proposed algorithm to find the optimal code, Z∗i , which minimizes the energy function in (3) is as follows:

1. Initialize randomly the parameters We and D of the encoder function and Wd of the decoder function.

2. For a predefined number of iterations,

• Randomly select a group of k pixels from all the pixels of the hyperspectral image.

• Calculate the code Zi for each hyperspectral pixel, Yi, in the selected group.

• Search Z∗i by minimizing the energy in (3) with respect to Zi while We, D and Wd are fixed. In the

current implementation the search method is selected as gradient descent for the simplicity.

• After the optimal code Z∗i is found, fix Zi to Z∗

i and update We, D and Wd to minimize the averageof the energy in (3) calculated over all the pixels in the selected group. The update is performed byusing the gradient descent method. Wd is normalized to a unit norm after the update so as to avoidthe situation where Wd and Z are multiplied and divided by the same constant to decrease the energyin (3).

3. After the iteration stage, which yields the learned parameters We, D and Wd, calculate code Zi by usingencoding function ge.

4. Assign reconstruction error, ‖Yi −WdZi‖22, to each pixel as a measure of being anomaly for that pixel.

In the proposed algorithm encoding function ge is selected as D tanh(WeY ) and decoding function gd isselected as WdY as suggested by Kavukcuoglu et al.12 The number of iterations are adjusted so as to guaranteean early-stopping before the average energy enters to the saturation. The weighting factor α is adjusted to 1 assuggested by Kavukcuoglu et al.12 We choose the β value experimentally as 0.2. The code length nz is selectedas a design parameter to judge the performance of the algorithm.

(a) (b)

Figure 2: (a)RGB representation of VNIR hyperspectral test image and (b) Gray-scale representation of theLWIR hyperspectral image.

3. EXPERIMENTAL RESULTS AND COMPARISONS

The experiments are conducted with two images, one acquired by visible-infrared (VNIR) camera and the otherby long-wave-infrared (LWIR) camera (Figure 2). The VNIR camera has a spectral range of 400-1000 nm andLWIR camera has a spectral range of 7800-11500 nm. The captured hyperspectral VNIR and LWIR images areof the size 885× 1500× 182 and 180× 266× 121, respectively. The scene for the VNIR image contains differentfabrics and construction supplies which are placed on a rural area. The scene is captured from a height of 500 m.The scene for the LWIR image contains a piece of carpet, parquet, polyethylene foam, raincoat, gray aluminumand black aluminum captured from a distance of 10 m on the ground. The images have ground truth data fortesting their performances.

The performance of the proposed algorithm is first evaluated with respect to the size of the code, Z. Based onthis evaluation a suitable value of Z is selected to compare the proposed algorithm with the base line algorithmsin the literature. The methods for comparisons are selected as the variants of Reed-Xioloi (RX) algorithm,namely Global-RX algorithm1 and dual window Local-RX algorithm.4 The comparisons are performed overReceiver Operating Characteristic (ROC) curves which indicates true positive rate (TPR) with respect to thefalse positive rate (FPR). TPR and FPR are defined as

TPR =TP

Nanom, (4)

FPR =FP

Nbg. (5)

where TP represents the anomaly pixels detected given a certain threshold; Nanom represents the total number ofthe anomaly pixels in the image; FP represents the number of pixels detected as anomaly from the background;and Nbg is the number of background pixels.

Figure 3 illustrates ROC curves for the proposed algorithm with different code sizes, nz. As the code sizeincreases, the performance of the algorithm increases as well. This can be reasoned by the more expressivecapacity of the auto-encoder with the larger code size. The outputs of the proposed algorithm for the code sizeof 8 for VNIR hyperspectral image and for code size of 64 for LWIR image is given in Figure 4 and LWIR images.

Figure 5a illustrates generated code vectors, Zi, belonging to anomaly pixels and code vectors correspondingto background pixels in a two dimensional graph for the code size, nz = 2, in order to explore the distributionof the code vectors in their represented space. The same illustration is also given for the code size, nz = 3, in athree dimensional graph given in Figure 5b. Compatible to the concept of anomaly, the pixels belonging to theanomaly are concentrated on the far edges of the graph, which indicates the success of the proposed auto-encoderin filtering the anomalies out.

Figure 6 presents the comparison of the proposed algorithm with Global-RX algorithm1 and dual windowLocal-RX algorithm.4 While the proposed algorithm gives the best ROC performance, it is followed by the

(a) (b)

Figure 3: The ROC curves with respect to different code sizes (a) for VNIR hyperspectral image and (b) forLWIR hyperspectral image.

(a) (b)

Figure 4: The outputs of the proposed algorithm (a) for the code size of 8 for VNIR hyperspectral image and(b) for the code size of 64 for LWIR hyperspectral image.

dual window Local-RX. Global-RX yields the worst performance among the three techniques. As the proposedalgorithm can cope with the scene variability with its adaptable characteristics and tunable code size, it catchesthe dominant and major spectral signatures in the hyperspectral data quite well, while leaving the remainingpixels as anomalies.

4. CONCLUSIONS

Considering that an efficient background representation is one of the major challenges in the existing anomalydetection methods, we propose an auto-encoder based anomaly detection method which can represent the scenesof different complexity with its inherited learning capabilities. The experimental results first verify that theselection of the code size has a direct influence on the performance of the method. Secondly, the superiority ofthe proposed anomaly detection method over the baseline RX variants is revealed in terms of receiver operatingcharacteristics (ROC) performance.

A basic trade-off in the proposed learning based anomaly detection method is the level of the learning versusthe capability of the reconstruction error to represent the anomalies. In other words, as the auto-encoder beginsto memorize, instead of learning the scene, it might be expected that the anomalies would also begin to dissipate.Therefore, the code size, the encoder and decoder functions and other parameters related to the learning capacityof the auto-encoder have emerged as critical selections in the proposed method. The future work will focus on

(a) (b)

Figure 5: Illustration of generated code vectors (a) for code size of 2 and (b) for code size of 3. Red and greencolors correspond to background and anomaly pixels, respectively.

(a) (b)

Figure 6: ROC curves for the proposed algorithm with Global-RX algorithm and dual window Local-RX algo-rithm (a) for VNIR image and (b) for LWIR image.

a detailed analysis of these factors as well as the comparisons with the other state-of-the-art anomaly detectionmethods.

ACKNOWLEDGMENTS

This research was partially supported by HAVELSAN Inc. and by Scientific Research Project Center of METUunder the project id BAP-08-11-2015-005.

REFERENCES

[1] Reed, I. S. and Yu, X., “Adaptive multiple-band cfar detection of an optical pattern with unknown spectraldistribution,” Acoustics, Speech and Signal Processing, IEEE Transactions on 38, 1760–1770 (Oct 1990).

[2] Borghys, D., Kasen, I., Achard, V., and Perneel, C., “Comparative evaluation of hyperspectral anomalydetectors in different types of background,” 83902J–83902J–12 (May 2012).

[3] Kwon, H. and Nasrabadi, N., “Kernel rx-algorithm: a nonlinear anomaly detector for hyperspectral im-agery,” Geoscience and Remote Sensing, IEEE Transactions on 43, 388–397 (Feb 2005).

[4] Kwon, H., Der, S. Z., and Nasrabadi, N. M., “Adaptive anomaly detection using subspace separation forhyperspectral imagery,” Optical Engineering 42(11), 3342–3351 (2003).

[5] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H., Montreal, U. D., and Quebec, M., “Greedy layer-wisetraining of deep networks,” in [In NIPS ], MIT Press (2007).

[6] Chandar A P, S., Lauly, S., Larochelle, H., Khapra, M., Ravindran, B., Raykar, V. C., and Saha, A.,“An autoencoder approach to learning bilingual word representations,” in [Advances in Neural InformationProcessing Systems 27 ], Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K., eds.,1853–1861, Curran Associates, Inc. (2014).

[7] Ranzato, M. and Szummer, M., “Semi-supervised learning of compact document representations with deepnetworks,” in [Proceedings of the 25th international conference on Machine learning ], 792–799, ACM (2008).

[8] Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen,P., Sainath, T. N., et al., “Deep neural networks for acoustic modeling in speech recognition: The sharedviews of four research groups,” Signal Processing Magazine, IEEE 29(6), 82–97 (2012).

[9] Licciardi, G., Khan, M., Chanussot, J., Montanvert, A., Condat, L., and Jutten, C., “Fusion of hyperspectraland panchromatic images using multiresolution analysis and nonlinear pca band reduction,” EURASIPJournal on Advances in Signal Processing 2012(1) (2012).

[10] Poultney, C., Chopra, S., Cun, Y. L., et al., “Efficient learning of sparse representations with an energy-basedmodel,” in [Advances in neural information processing systems ], 1137–1144 (2006).

[11] Ranzato, M., Boureau, Y., Chopra, S., and LeCun, Y., “A unified energy-based framework for unsupervisedlearning,” in [Proc. Conference on AI and Statistics (AI-Stats) ], (2007).

[12] Kavukcuoglu, K., Ranzato, M., and LeCun, Y., “Fast inference in sparse coding algorithms with applicationsto object recognition,” CoRR abs/1010.3467 (2010).

hyperspectral anomaly detection method based on …akincaliskan.net/publications/spie_2015.pdf ·...

Documents