


IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 8, NO. 6, JUNE 2015

Band Selection Using Improved Sparse Subspace Clustering for Hyperspectral Imagery Classification

Weiwei Sun, Liangpei Zhang, Senior Member, IEEE, Bo Du, Senior Member, IEEE, Weiyue Li, and Yenming Mark Lai

Abstract—An improved sparse subspace clustering (ISSC) method is proposed to select an appropriate band subset for hyperspectral imagery (HSI) classification. The ISSC assumes that band vectors are sampled from a union of low-dimensional orthogonal subspaces and that each band can be sparsely represented as a linear or affine combination of other bands within its subspace. First, the ISSC represents band vectors with sparse coefficient vectors by solving the l2-norm optimization problem using the least squares regression (LSR) algorithm. The sparse and block diagonal structure of the coefficient matrix from LSR leads to correct segmentation of band vectors. Second, an angular similarity measurement is presented and utilized to construct the similarity matrix. Third, the distribution compactness (DC) plot algorithm is used to estimate an appropriate size of the band subset. Finally, spectral clustering is implemented to segment the similarity matrix, and the desired ISSC band subset is found. Four groups of experiments on three widely used HSI datasets are performed to test the performance of ISSC in selecting bands for classification. In addition, the following six state-of-the-art band selection methods are used for comparison: linear constrained minimum variance-based band correlation constraint (LCMV-BCC), affinity propagation (AP), spectral information divergence (SID), maximum-variance principal component analysis (MVPCA), sparse representation-based band selection (SpaBS), and sparse nonnegative matrix factorization (SNMF). Experimental results show that the ISSC has the second shortest computational time and also outperforms the other six methods in classification accuracy when using an appropriate band number obtained by the DC plot algorithm.

Manuscript received October 17, 2014; revised January 25, 2015; accepted March 13, 2015. Date of publication April 13, 2015; date of current version July 30, 2015. This work was supported in part by the National Natural Science Foundation under Grant 41401389, Grant 41431175, and Grant 61471274, in part by the Research Project of Zhejiang Educational Committee under Grant Y201430436, in part by the Ningbo Natural Science Foundation under Grant 2014A610173, in part by the Discipline Construction Project of Ningbo University under Grant ZX2014000400, in part by the Normal Project of Shanghai Normal University under Grant SK201525, in part by the Key Laboratory of Mining Spatial Information Technology of NASMG under Grant KLM201309, and in part by the K. C. Wong Magna Fund in Ningbo University.

W. Sun is with the State Key Laboratory for Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430039, China, and also with the College of Architectural Engineering, Civil Engineering and Environment, Ningbo University, Ningbo 315211, China (e-mail: [email protected]).

L. Zhang is with the State Key Laboratory for Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430039, China (e-mail: [email protected]).

B. Du is with the School of Computer, Wuhan University, Wuhan 430039, China (e-mail: [email protected]).

W. Li is with the Institute of Urban Studies, Shanghai Normal University, Shanghai 200234, China (e-mail: [email protected]).

Y. M. Lai is with the Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, Austin, TX 78712 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTARS.2015.2417156

Index Terms—Band selection, classification, hyperspectral imagery (HSI), improved sparse subspace clustering (ISSC).

I. INTRODUCTION

OWING to its ability to collect tens to hundreds of contiguous bands of spectral responses from the visible to the near-infrared wavelengths, hyperspectral imagery (HSI) is powerful in recognizing ground objects with subtle spectral divergences through classification [1]. The classification results of HSI datasets are now widely used in many practical applications, such as ocean monitoring [2], land cover mapping [3], [4], precision farming [5], [6], and mine exploitation [7], [8]. Unfortunately, the numerous bands and strong intra-band correlations also cause serious problems for classification [9], [10]. In particular, the "curse of dimensionality" means that the HSI dataset requires many more training samples to achieve an accurate classification result, whereas collecting so many training samples is expensive and time-consuming [11], [12]. Therefore, dimensionality reduction is an alternative way to conquer these problems.

Dimensionality reduction of HSI datasets can typically be divided into two categories: band selection (i.e., feature selection) and feature extraction [13]. Band selection selects an appropriate band subset from the original band set of the HSI dataset, while feature extraction preserves important spectral features through mathematical transformations. In this paper, we focus on the band selection category of dimensionality reduction, since the selected band combination alleviates the "curse of dimensionality" while inheriting the original spectral meanings of the HSI dataset.

Previous work on band selection can be roughly divided into two classes: 1) the maximum information or minimum correlation (MIMC) scheme and 2) the maximum inter-class separability (MIS) scheme. MIMC selects an appropriate band subset in which each single-band image has the maximum information or minimum correlation with other bands. The MIMC scheme typically uses three main criteria: the entropy criterion, the intra-band correlation criterion, and the cluster criterion. Entropy criterion algorithms collect an appropriate band subset by maximizing the overall amount of information using entropy-like measurements [14], [15]. Intra-band correlation criterion algorithms select the band subset having minimum intra-band correlations; examples include the mutual information-based algorithm [16] and the constrained band selection algorithm based on constrained energy minimization (CBS-CEM) [17].



Cluster criterion algorithms also consider intra-band correlations and select a representative band from each band cluster using certain clustering algorithms; examples include the hierarchical clustering algorithm using the mutual information measurement [18] and the affinity propagation (AP) algorithm applied to noise-removed bands obtained via wavelet shrinkage [19].

In contrast, the MIS scheme selects an appropriate band subset that maximizes the separability of different ground objects in the image scene. The MIS scheme is typically implemented using one of the following: the distance measurement criterion, the feature transformation criterion, or the realistic application criterion. Distance measurement criterion algorithms maximize the inter-class differences using a distance-like measurement such as the spectral information divergence (SID), the transformed divergence (TD), or the Mahalanobis distance [20]. Feature transformation criterion algorithms select an appropriate band subset by analyzing the inter-class separability of ground objects in a low-dimensional feature space found through feature transformations; examples include the linear prediction algorithm [21] and the complex network algorithm [22]. Realistic application criterion algorithms select an appropriate band subset by maximizing or minimizing an objective function defined for realistic applications of the HSI dataset; typical examples are the band selection algorithm using high-order moments [23] and the supervised band selection algorithm using known class spectral signatures [24].

In recent years, the study of sparsity in HSI datasets has attracted much interest in the remote sensing community. Sparsity theory states that each band vector or spectrum vector can be sparsely represented using only a few nonzero coefficients in a suitable basis or dictionary [25]. Sparse representations of a band vector can then reveal certain underlying structures, such as the clustering structure within the HSI dataset, and can also drastically reduce the computational burden of processing HSI datasets [26]. Accordingly, researchers have begun to study the band selection problem using sparsity-based (SB) schemes and have proposed algorithms such as the sparse representation-based band selection (SpaBS) algorithm [27], the sparse nonnegative matrix factorization (SNMF) algorithm [28], the collaborative sparse model (CSM) algorithm [29], and the sparse support vector machine (SSVM) algorithm [30].

In this paper, we address the band selection problem using an idea inspired by sparse subspace clustering (SSC). We present a band selection method using an improved version of SSC, which we call improved sparse subspace clustering (ISSC). In particular, our motivation is to ameliorate the SSC technique through ISSC and to apply ISSC to HSI datasets in order to solve the band selection problem. Our contributions are as follows. First, we are the first to explore band selection from the SSC perspective. The ISSC method assumes that all bands (i.e., band vectors) of the HSI dataset are drawn from a union of low-dimensional orthogonal subspaces rather than a single uniform subspace, and that each band can be sparsely represented as a linear or affine combination of other bands [31]. These two assumptions differ from the assumptions of current SB algorithms and clustering algorithms. Second, our proposed ISSC method improves the SSC method with the following three modifications. The ISSC uses the l2-norm to avoid the "too sparse" coefficient vector solution obtained with the l1-norm and to ensure that the coefficient matrix is sparse and block diagonal. Our proposed angular similarity measurement replaces the l1-directed graph construction (DGC) measurement in SSC and represents the total similarity between two sparse coefficient band vectors better than isolated coefficient values do. The distribution compactness (DC) plot algorithm automatically estimates the size of the band subset and eliminates the errors of the artificial estimation in SSC. These improvements ensure good performance in selecting an appropriate band subset.

This paper is organized as follows. Section II reviews classical sparse subspace clustering. Section III presents the band selection method using the proposed ISSC. Section IV analyzes the performance of ISSC in band selection for classification on three widely used HSI datasets. Section V states conclusions and outlines our future work.

II. A REVIEW OF SPARSE SUBSPACE CLUSTERING

In this section, we review the classical SSC method. We choose to use a noise-free dataset rather than a noisy dataset to illustrate the principles more clearly. SSC, proposed by Elhamifar and Vidal, assumes that each data point lies in a union of subspaces corresponding to the several classes or clusters to which the dataset belongs; therefore, each point can be sparsely represented by other points from this union of subspaces. The SSC then uses the sparse representations of the data points to cluster the points into separate subspaces [31]. SSC has been used in a wide variety of applications such as motion segmentation and face clustering [32]. It typically consists of three stages: 1) finding sparse representations of each data point through convex optimization; 2) learning a similarity matrix (i.e., a weight matrix); and 3) clustering the similarity matrix using spectral clustering [33].

Assume a high-dimensional dataset without noise, $\mathbf{Y} = \{\mathbf{y}_i\}_{i=1}^{N} \in \mathbb{R}^{D \times N}$, actually lies in a union of linear subspaces $\{C_l\}_{l=1}^{k}$ with dimensions $\{d_l\}_{l=1}^{k}$, where $D$ is the dimension of the high-dimensional space and $k$ is the number of subspaces or clusters. Specifically, for a band dataset in the hyperspectral field, all band vectors constitute the high-dimensional dataset, the number of bands $N$ corresponds to the number of data points, and the number of pixels $D$ determines the dimensionality of the high-dimensional space. We assume each point $\mathbf{y}_i \in \mathbb{R}^{D \times 1}$ lies in exactly one of the $k$ linear subspaces $C_l$. Hence, each linear subspace $C_l$ contains a cluster of $N_l$ unique data points $\mathbf{Y}_l = \{\mathbf{y}_j^l\}_{j=1}^{N_l} \in \mathbb{R}^{D \times N_l}$, with $N_l > d_l$ for each subspace $C_l$.


This placement of $N$ points into the $k$ subspaces also implies $\sum_{l=1}^{k} N_l = N$. Accordingly, each data point $\mathbf{y}_i \in \mathbb{R}^{D \times 1}$ belonging to $C_l$ can be represented as

$$\mathbf{y}_i = \mathbf{y}_1^l\alpha_1^l + \mathbf{y}_2^l\alpha_2^l + \cdots + \mathbf{y}_j^l\alpha_j^l + \cdots + \mathbf{y}_{N_l}^l\alpha_{N_l}^l = \mathbf{Y}_l\boldsymbol{\alpha}^l, \quad \alpha_j^l = 0 \ \text{if}\ \mathbf{y}_i = \mathbf{y}_j^l \tag{1}$$

where $\boldsymbol{\alpha}^l = [\alpha_1^l\ \alpha_2^l\ \cdots\ \alpha_{N_l}^l]^T \in \mathbb{R}^{N_l \times 1}$ and $\mathbf{Y}_l$ are the coefficient vector and dictionary of $\mathbf{y}_i$, respectively. The constraint $\alpha_j^l = 0$ if $\mathbf{y}_i = \mathbf{y}_j^l$ in (1) avoids the trivial solution of reconstructing the point $\mathbf{y}_i$ as a linear combination of itself. If we combine all the cluster dictionaries $\{\mathbf{Y}_l\}_{l=1}^{k}$, the point $\mathbf{y}_i$ can be represented using

$$\mathbf{y}_i = \mathbf{Y}_1\boldsymbol{\alpha}^1 + \mathbf{Y}_2\boldsymbol{\alpha}^2 + \cdots + \mathbf{Y}_k\boldsymbol{\alpha}^k = [\mathbf{Y}_1\ \mathbf{Y}_2\ \cdots\ \mathbf{Y}_k]\begin{bmatrix}\boldsymbol{\alpha}^1\\ \boldsymbol{\alpha}^2\\ \vdots\\ \boldsymbol{\alpha}^k\end{bmatrix} = \boldsymbol{\Phi}\mathbf{A}_i \tag{2}$$

where $\boldsymbol{\Phi} = [\mathbf{Y}_1\ \mathbf{Y}_2\ \cdots\ \mathbf{Y}_k] \in \mathbb{R}^{D \times N}$ is the arrangement of the $k$ cluster dictionaries and $\mathbf{A}_i \in \mathbb{R}^{N \times 1}$ is the coefficient vector over all data points. Assume that $\mathbf{Y} = \boldsymbol{\Phi}\boldsymbol{\Gamma}$ with $\boldsymbol{\Gamma}\boldsymbol{\Gamma}^T = \boldsymbol{\Gamma}^T\boldsymbol{\Gamma} = \mathbf{I}$, where $\boldsymbol{\Gamma}$ is the permutation matrix of all cluster dictionaries. Combining all $N$ data points into a matrix by placing each point in an individual column, (2) can be transformed into

$$\mathbf{Y} = \mathbf{Y}\mathbf{Z}, \quad \operatorname{diag}(\mathbf{Z}) = \mathbf{0} \tag{3}$$

where $\mathbf{Z} = \boldsymbol{\Gamma}^T[\mathbf{A}_1\ \mathbf{A}_2\ \cdots\ \mathbf{A}_N] \in \mathbb{R}^{N \times N}$ is the coefficient matrix of all data points and $\operatorname{diag}(\mathbf{Z})$ is the vector of diagonal entries of $\mathbf{Z}$. The constraint $\operatorname{diag}(\mathbf{Z}) = \mathbf{0}$ eliminates the trivial solution in which each data point is simply a linear combination of itself. The ideal solution of (3) represents each point $\mathbf{y}_i$ with a set of data points from a single subspace, so that the number of nonzero entries in its coefficient vector coincides with the dimensionality of that subspace. The ideal solution guarantees that the coefficient matrix $\mathbf{Z}$ is sparse and block diagonal, which benefits the correct segmentation of all data points into separate subspaces.

The solution $\mathbf{Z}$ of (3) can be regarded as the optimization problem of minimizing the following objective function:

$$\hat{\mathbf{Z}} = \arg\min \|\mathbf{Z}\|_q, \quad \text{subject to}\ \mathbf{Y} = \mathbf{Y}\mathbf{Z}\ \text{and}\ \operatorname{diag}(\mathbf{Z}) = \mathbf{0} \tag{4}$$

where $\|\mathbf{Z}\|_q$ represents the $l_q$-norm of $\mathbf{Z}$, defined as $\|\mathbf{Z}\|_q = \big(\sum_{i=1}^{N}\sum_{j=1}^{N} |Z_{ij}|^q\big)^{1/q}$. The $l_0$-norm minimization counts the number of nonzero entries in $\mathbf{Z}$ and can be solved using the smoothed $l_0$ algorithm [34]. The $l_1$-norm can be efficiently minimized using convex programming algorithms such as the interior point method [35], the basis pursuit algorithm [36], and the alternating directions algorithm [37].

The sparse matrix $\mathbf{Z}$ is then utilized to construct the similarity matrix $\mathbf{W}$ for inferring the segmentation of all data points into different subspaces. The matrix $\mathbf{W}$ can be regarded as an undirected weighted graph, where each entry $W_{ij}$ represents the weight of the edge between the pairwise points $\mathbf{y}_i$ and $\mathbf{y}_j$. The SSC method utilizes the $l_1$-directed graph construction (DGC) measurement to guarantee the symmetry of the weights between pairwise points $\mathbf{y}_i$ and $\mathbf{y}_j$. The DGC measurement is computed as follows:

$$W_{ij} = \frac{|Z_{ij}| + |Z_{ji}|}{2} \tag{5}$$

where $|Z_{ij}|$ is the absolute value of the entry $Z_{ij}$ of $\mathbf{Z}$. After constructing the similarity matrix $\mathbf{W}$, the SSC uses spectral clustering [33] to cluster all data points into their underlying subspaces.
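To make stage 2) concrete, the DGC symmetrization in (5) amounts to averaging the absolute coefficient matrix with its transpose. A minimal NumPy sketch (our own illustration, not the authors' code):

```python
import numpy as np

def dgc_similarity(Z):
    """DGC measurement of (5): W_ij = (|Z_ij| + |Z_ji|) / 2.
    Z is the N x N sparse coefficient matrix; the result is symmetric."""
    A = np.abs(Z)
    return 0.5 * (A + A.T)
```

The symmetric W produced this way can be handed directly to any standard spectral clustering routine.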

III. BAND SELECTION OF HSI DATASET USING ISSC

In this section, the band selection method using ISSC is described. Section III-A presents sparse representations of each band vector obtained by solving the l2-norm problem. Section III-B describes the construction of the similarity matrix using our proposed angular similarity measurement. Section III-C explains the estimation of an appropriate number of band clusters using the DC plot algorithm and shows how to select an appropriate band subset using spectral clustering. Finally, Section III-D summarizes our band selection method using ISSC.

A. Sparse Representations of Band Vectors Using L2-Norm

Consider a collection of HSI band vectors $\mathbf{Y} = \{\mathbf{y}_i\}_{i=1}^{N} \in \mathbb{R}^{D \times N}$ drawn from a union of orthogonal subspaces $\{C_l\}_{l=1}^{k}$, where $D$ is the dimension of the high-dimensional space and is equal to the number of pixels in the image scene, $N$ is the number of bands with $N \ll D$, and $k$ is the number of band clusters (i.e., the number of underlying subspaces). Each band vector $\mathbf{y}_i$ is contaminated with Gaussian noise and is sparsely represented by other band vectors as follows:

$$\mathbf{y}_i = \mathbf{Y}\mathbf{Z}_i + \mathbf{e}, \quad Z_{ii} = 0 \tag{6}$$

where $\mathbf{Z}_i = [Z_{i,1}\ Z_{i,2}\ \cdots\ Z_{i,N}]^T$ is the coefficient vector of the band vector $\mathbf{y}_i$ and $\mathbf{e}$ is an error term with a bounded norm. The error in (6) results from noise in the band vectors and from the approximation error of the sparse representation by other band vectors. We combine all the band vectors by placing them column by column to achieve the following matrix format:

$$\mathbf{Y} = \mathbf{Y}\mathbf{Z} + \mathbf{E}, \quad \operatorname{diag}(\mathbf{Z}) = \mathbf{0} \tag{7}$$

where $\mathbf{Z} = [\mathbf{Z}_1\ \mathbf{Z}_2\ \cdots\ \mathbf{Z}_N]$ and $\mathbf{E} = [\mathbf{e}_1\ \mathbf{e}_2\ \cdots\ \mathbf{e}_N]$ are the sparse coefficient matrix and the error matrix of all band vectors, respectively.


Like (4), (7) can be solved by optimizing the following problem:

$$\hat{\mathbf{Z}} = \arg\min \|\mathbf{Z}\|_q, \quad \text{subject to}\ \|\mathbf{Y} - \mathbf{Y}\mathbf{Z}\| \le \varepsilon\ \text{and}\ \operatorname{diag}(\mathbf{Z}) = \mathbf{0} \tag{8}$$

where $\varepsilon$ is the norm bound of the error. When choosing $q = 1$, the optimization problem (8) can be transformed into the well-known lasso problem [38]

$$\hat{\mathbf{Z}} = \arg\min \|\mathbf{Y} - \mathbf{Y}\mathbf{Z}\|_2^2 + \beta\|\mathbf{Z}\|_1, \quad \text{subject to}\ \operatorname{diag}(\mathbf{Z}) = \mathbf{0} \tag{9}$$

where $\|\mathbf{Z}\|_1 = \sum_{i=1}^{N}\sum_{j=1}^{N} |Z_{ij}|$ denotes the $l_1$-norm of the matrix $\mathbf{Z}$ and $\beta > 0$ is a scalar regularization parameter that balances the weight of the error term against the sparsity of the coefficient matrix. The assumption of underlying subspaces in all band vectors guarantees that the $l_1$-norm achieves a sparse and block diagonal coefficient matrix for further segmentation. However, the extremely high intra-band correlations may cause the sparse representation of a band vector to select only one band at random [39], which would produce a "too sparse" solution of problem (9) that segments within-cluster bands into different subspaces. Therefore, we relax the sparsity term and solve (7) by optimizing the following $l_2$-norm problem:

$$\hat{\mathbf{Z}} = \arg\min \|\mathbf{Y} - \mathbf{Y}\mathbf{Z}\|_2^2 + \beta\|\mathbf{Z}\|_2^2, \quad \text{subject to}\ \operatorname{diag}(\mathbf{Z}) = \mathbf{0} \tag{10}$$

where $\|\mathbf{Z}\|_2$ represents the $l_2$-norm (the Frobenius norm) of the matrix $\mathbf{Z}$, $\|\mathbf{Z}\|_2 = \big(\sum_{i=1}^{N}\sum_{j=1}^{N} Z_{ij}^2\big)^{1/2}$. It has been proved that both the orthogonal subspace assumption on band vectors and the $l_2$-norm of $\mathbf{Z}$ guarantee that the optimal solution of problem (10) is sparse and block diagonal even if the number of bands $N$ is smaller than the dimension $D$ of the high-dimensional space (i.e., insufficient data sampling with $N \ll D$) [40]. Moreover, the optimal solution of problem (10) is proven to have a grouping effect on correlated band vectors, which groups highly correlated band vectors together [41]. Therefore, the optimal solution of problem (10) can minimize the inter-cluster affinities of band vectors and capture the correlation structure of band vectors from the same orthogonal subspace for correct clustering.

In this paper, we utilize the least squares regression (LSR) algorithm [41] to solve the optimization problem (10). Let $\mathbf{Y}_i^* = \mathbf{Y}\backslash\mathbf{y}_i = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_{i-1}\ \mathbf{y}_{i+1}\ \cdots\ \mathbf{y}_N]$ be the remaining columns of $\mathbf{Y}$ after removing the column $\mathbf{y}_i$, let $\mathbf{E}_i = (\mathbf{Y}_i^{*T}\mathbf{Y}_i^* + \beta\mathbf{I})^{-1}$, and let $\mathbf{Y}\boldsymbol{\Gamma} = [\mathbf{Y}_i^*\ \mathbf{y}_i]$, where $\boldsymbol{\Gamma}$ is a permutation matrix with $\boldsymbol{\Gamma}\boldsymbol{\Gamma}^T = \boldsymbol{\Gamma}^T\boldsymbol{\Gamma} = \mathbf{I}$. The LSR factorizes the matrix $[\boldsymbol{\Gamma}^T(\mathbf{Y}^T\mathbf{Y} + \beta\mathbf{I})\boldsymbol{\Gamma}]^{-1}$ to achieve the optimal solution $\hat{\mathbf{Z}}$, using the Woodbury formula [42]:

$$[\boldsymbol{\Gamma}^T(\mathbf{Y}^T\mathbf{Y} + \beta\mathbf{I})\boldsymbol{\Gamma}]^{-1} = \begin{bmatrix}\mathbf{Y}_i^{*T}\mathbf{Y}_i^* + \beta\mathbf{I} & \mathbf{Y}_i^{*T}\mathbf{y}_i\\ \mathbf{y}_i^T\mathbf{Y}_i^* & \mathbf{y}_i^T\mathbf{y}_i + \beta\end{bmatrix}^{-1} = \begin{bmatrix}(\mathbf{Y}_i^{*T}\mathbf{Y}_i^* + \beta\mathbf{I})^{-1} & \mathbf{0}\\ \mathbf{0} & 0\end{bmatrix} + \frac{1}{\gamma_i}\begin{bmatrix}[\hat{\mathbf{Z}}]_i[\hat{\mathbf{Z}}]_i^T & -[\hat{\mathbf{Z}}]_i\\ -[\hat{\mathbf{Z}}]_i^T & 1\end{bmatrix} \tag{11}$$

where $[\hat{\mathbf{Z}}]_i = \mathbf{E}_i\mathbf{Y}_i^{*T}\mathbf{y}_i = (\mathbf{Y}_i^{*T}\mathbf{Y}_i^* + \beta\mathbf{I})^{-1}\mathbf{Y}_i^{*T}\mathbf{y}_i$ is the $i$th column of the desired optimal solution $\hat{\mathbf{Z}}$ and $\gamma_i = \mathbf{y}_i^T\mathbf{y}_i + \beta - \mathbf{y}_i^T\mathbf{Y}_i^*(\mathbf{Y}_i^{*T}\mathbf{Y}_i^* + \beta\mathbf{I})^{-1}\mathbf{Y}_i^{*T}\mathbf{y}_i$ is the corresponding Schur complement. We then substitute the identity $\boldsymbol{\Gamma}^T(\mathbf{Y}^T\mathbf{Y} + \beta\mathbf{I})^{-1}\boldsymbol{\Gamma} = [\boldsymbol{\Gamma}^T(\mathbf{Y}^T\mathbf{Y} + \beta\mathbf{I})\boldsymbol{\Gamma}]^{-1}$, which follows from the orthogonality of $\boldsymbol{\Gamma}$, into (11), and the optimal solution is achieved as $\hat{\mathbf{Z}} = -(\mathbf{Y}^T\mathbf{Y} + \beta\mathbf{I})^{-1}\big(\operatorname{diag}\big((\mathbf{Y}^T\mathbf{Y} + \beta\mathbf{I})^{-1}\big)\big)^{-1}$ with $\operatorname{diag}(\hat{\mathbf{Z}}) = \mathbf{0}$.
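The closed form above translates directly into a few lines of linear algebra. Below is a minimal NumPy sketch of the LSR solution (the function name is ours; for large N, a Cholesky solve would be preferable to the explicit inverse):

```python
import numpy as np

def lsr_coefficients(Y, beta):
    """Closed-form LSR solution of problem (10):
    Z = -(Y^T Y + beta*I)^{-1} (diag((Y^T Y + beta*I)^{-1}))^{-1},
    with the diagonal of Z set to zero.  Y is D x N (bands in columns)."""
    N = Y.shape[1]
    G_inv = np.linalg.inv(Y.T @ Y + beta * np.eye(N))
    Z = -G_inv / np.diag(G_inv)   # scale column j by 1 / G_inv[j, j]
    np.fill_diagonal(Z, 0.0)
    return Z
```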

B. Angular Similarity Measurement Between Sparse Coefficient Band Vectors

After solving problem (10), the ISSC constructs the similarity matrix $\mathbf{W}$ from the sparse coefficient matrix $\mathbf{Z}$ of all band vectors. The similarity matrix can be regarded as an undirected weighted graph $G = (\mathbf{V}, \mathbf{W})$, where $V_{ij} \in \mathbf{V}$ represents the edge between the pairwise bands $\mathbf{y}_i$ and $\mathbf{y}_j$, and the weight $W_{ij} \in \mathbf{W}$ measures the similarity between the sparse representations of the pairwise band vectors $\mathbf{y}_i$ and $\mathbf{y}_j$. The DGC similarity measurement in SSC assumes that the nonzero entries of a sparse coefficient vector $\mathbf{Z}_i$ reflect the closeness or similarity to other bands from the same subspace, since the ideal similarity matrix has edges only between pairwise band vectors from the same subspace [33]. Some authors have proposed related measurements to improve on the DGC similarity measurement, such as the sparsity-induced similarity [43] and the nonnegative sparsity-induced similarity [44]. These measurements utilize individual sparse coefficients to compute the weights between two bands. However, the local similarity represented by isolated sparse coefficients cannot describe the true similarity between two bands, because sparse coefficients from real-world HSI datasets are sensitive to noise and outliers [45].

We present an angular similarity (AS) measurement using the angular information between two sparse coefficient band vectors. The AS measurement assumes that the sparse representations of two similar bands from the same subspace should have a small angle between them, since both are sparsely represented by a similar combination of other bands. For each pair of sparse coefficient vectors $\mathbf{Z}_i$ and $\mathbf{Z}_j$, which represent the coefficient vectors of $\mathbf{y}_i$ and $\mathbf{y}_j$, respectively, the AS measurement is defined as follows:

$$W_{\mathrm{AS}ij} = \left(\frac{\mathbf{Z}_i \cdot \mathbf{Z}_j}{\|\mathbf{Z}_i\|_2 \times \|\mathbf{Z}_j\|_2}\right)^2 \tag{12}$$

where $\cdot$ denotes the inner product of two vectors and $\|\cdot\|_2$ is the norm of a vector. The squaring operation in (12) guarantees a positive value for each $W_{\mathrm{AS}ij}$ and increases the separability of sparse band vectors from different subspaces. By evaluating (12) over all combinations of column and row indexes, the similarity matrix $\mathbf{W}$ is found for further clustering.

C. Band Clustering With an Appropriate Cluster Number

Once the similarity matrix is constructed, the spectral clustering algorithm is utilized to segment the weighted graph into $k$ clusters. However, a difficult problem is how to estimate an appropriate number $k$ of band clusters, because $k$ significantly affects the clustering result of the band vectors. In general, the cluster number $k$ is determined by arbitrary estimation. However, a too small $k$ would divide highly correlated bands into different subspaces and render an error-prone band subset, whereas a too large $k$ would bring about too much computational burden for the subsequent classification of the HSI dataset. To address these problems, a variety of methods have been proposed to help estimate an appropriate $k$; they can be classified into two main classes, posterior methods and prior methods. A posterior method tests all possible numbers of clusters and then selects an appropriate cluster number using a defined criterion such as the information-theoretic criterion [46], the gap statistic criterion [47], or the minimum description length criterion [48]. Posterior methods are computationally expensive because all candidate numbers of clusters have to be tested explicitly. In contrast, a prior method estimates the cluster number before implementing spectral clustering. Prior methods include the eigen-gap heuristic algorithm [49] and the edge-based algorithm [50].

Since the similarity matrix $\mathbf{W}$ is ideally sparse and block diagonal, we introduce a DC plot algorithm [51] to estimate the appropriate number $k$ of band clusters. The algorithm uses a nonparametric estimate of the probability density function of the band dataset and determines the appropriate number $k$ of band clusters by analyzing the distribution compactness of the sparse representations of band vectors in the kernel space constructed by the AS measurement. The DC measurement of all clusters uses an eigenvalue decomposition of the similarity matrix as follows:

$$\mathrm{DC} = \int_{\mathbf{Z}} p(\mathbf{Z})\,d\mathbf{Z} \approx \mathbf{1}_N^T\mathbf{W}\mathbf{1}_N = \sum_{i=1}^{N} \lambda_i\{\mathbf{1}_N^T\mathbf{u}_i\}^2 \tag{13}$$

where $\mathbf{1}_N$ is the $N \times 1$ vector of all ones, $\mathbf{W}$ is the similarity matrix, and $\lambda_i$ and $\mathbf{u}_i$ are the $i$th eigenvalue and eigenvector of $\mathbf{W}$, respectively. The appropriate cluster number is obtained by analyzing the plot of $\log(\lambda_i\{\mathbf{1}_N^T\mathbf{u}_i\}^2)$ against $i$. The logarithm rather than the raw value of $\lambda_i\{\mathbf{1}_N^T\mathbf{u}_i\}^2$ is used to better expose large shifts of the plot at $i = 1$ and to smooth the DC data. The logarithmic plot is additionally smoothed using an averaging filter of size 3, since the log-likelihood function is sensitive to variance in the data. The index corresponding to the "elbow" of the logarithmic plot is then selected as the appropriate number $k$.
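A sketch of the DC plot computation follows. The paper reads the cluster number off the elbow of the plot, so the automatic elbow rule below (largest drop between consecutive smoothed values) is our own stand-in heuristic:

```python
import numpy as np

def estimate_k_dc(W, filt=3):
    """DC plot of (13): plot log(lambda_i * (1^T u_i)^2) against i
    (eigenvalues sorted in descending order) and locate the elbow."""
    lam, U = np.linalg.eigh(W)                  # W is symmetric
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    ones = np.ones(W.shape[0])
    dc = lam * (ones @ U) ** 2
    log_dc = np.log(np.clip(dc, 1e-12, None))
    kernel = np.ones(filt) / filt               # size-3 average filter
    log_dc = np.convolve(log_dc, kernel, mode='same')
    return int(np.argmin(np.diff(log_dc))) + 1  # index of the sharpest drop
```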

Given the appropriate cluster number $k$ of band vectors, spectral clustering then implements band selection in the following four stages: 1) the symmetric normalized Laplacian matrix $\mathbf{L}_{\mathrm{sym}} = \mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2}$ is built from the similarity matrix $\mathbf{W}$, where $\mathbf{D}$ is the diagonal degree matrix of $\mathbf{W}$; 2) the first $k$ eigenvectors $\mathbf{U}_k = [\mathbf{u}_1\ \mathbf{u}_2\ \cdots\ \mathbf{u}_k] \in \mathbb{R}^{N \times k}$ are found through a singular value decomposition of the Laplacian matrix $\mathbf{L}_{\mathrm{sym}}$, and each row vector $\mathbf{H}_i$ of $\mathbf{U}_k$ is normalized to unit norm using $H_{ij} = u_{ij}/\big(\sum_k u_{ik}^2\big)^{1/2}$; 3) the row vectors of the normalized $\mathbf{U}_k$ are clustered into $k$ clusters using the K-means algorithm, and the corresponding band vectors are segmented into their underlying subspaces; and 4) in each cluster $\{C_l\}_{l=1}^{k}$, the band whose corresponding row vector in $\mathbf{U}_k$ is closest, in terms of Euclidean distance, to the mean vector (i.e., the centroid) of its cluster is chosen as an element of the band subset for the HSI dataset, and hence the appropriate band subset from ISSC is achieved.

Fig. 1. Band selection using ISSC.
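Stages 1)-4) can be sketched compactly with NumPy and scikit-learn's K-means; this is our rendering of the procedure above, assuming W comes from the AS measurement:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_bands_spectral(W, k, eps=1e-12):
    """Normalized spectral clustering on W, then pick, per cluster,
    the band whose embedded row is closest to the cluster centroid."""
    d = W.sum(axis=1)
    L_sym = W / (np.sqrt(np.outer(d, d)) + eps)       # D^{-1/2} W D^{-1/2}
    lam, U = np.linalg.eigh(L_sym)
    Uk = U[:, -k:]                                    # k leading eigenvectors
    H = Uk / (np.linalg.norm(Uk, axis=1, keepdims=True) + eps)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(H)
    selected = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        centroid = H[idx].mean(axis=0)
        selected.append(idx[np.argmin(np.linalg.norm(H[idx] - centroid, axis=1))])
    return sorted(int(b) for b in selected)
```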

D. Summary of Band Selection Using ISSC

In the above three sections, we provided three improvements over classical SSC when implementing band selection on an HSI dataset. 1) We replace the $l_q$-norm problem in (8) with the $l_2$-norm problem (10) to achieve sparse representations of each band vector using the LSR algorithm. The $l_2$-norm guarantees that the coefficient matrix is both sparse and block diagonal and also minimizes the between-cluster similarity for correct segmentation of band vectors. 2) The AS measurement improves on the similarity matrix constructed with the DGC measurement. The AS measurement considers the overall similarity between two sparse coefficient band vectors rather than only the similarity between isolated sparse coefficients. 3) We use the DC plot algorithm to estimate an appropriate number $k$ of band clusters and avoid errors caused by variance in the data. The process of band selection using the ISSC is shown in Fig. 1 and is implemented in the following five steps (a consolidated code sketch follows the list).

1) The HSI imagery is transformed from a data cube into a two-dimensional (2-D) band dataset $\mathbf{Y} \in \mathbb{R}^{D \times N}$, where $D$ and $N$ are the dimension of the band vectors (i.e., the number of pixels) and the number of bands, respectively.

2) Sparse representations of the band vectors are constructed with the LSR algorithm by solving the optimization problem (10), yielding the sparse coefficient matrix $\mathbf{Z}$.


TABLE I
CONTRAST IN COMPUTATIONAL COMPLEXITY BETWEEN ISSC AND SIX OTHER BAND SELECTION METHODS

3) The similarity matrix $\mathbf{W}$ is found with the AS measurement in (12), where each entry of $\mathbf{W}$ represents the similarity between a pair of sparse coefficient band vectors.

4) The appropriate number $k$ of band clusters is estimated using the DC plot algorithm in (13). Spectral clustering is then used to segment the similarity matrix into separate clusters.

5) The bands nearest to the centroids of their clusters then form the desired ISSC band subset.
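Tying the five steps together, a minimal end-to-end driver built from the sketches in Sections III-A to III-C (function names are ours; a real pipeline might subsample pixels before forming the D x N matrix):

```python
import numpy as np

def issc_band_selection(cube, beta=0.1):
    """End-to-end ISSC sketch. `cube` is a (rows, cols, bands) HSI array;
    returns the indices of the selected band subset."""
    rows, cols, n_bands = cube.shape
    Y = cube.reshape(rows * cols, n_bands).astype(float)  # step 1: D x N matrix
    Z = lsr_coefficients(Y, beta)                         # step 2: LSR coefficients
    W = angular_similarity(Z)                             # step 3: AS similarity
    k = estimate_k_dc(W)                                  # step 4: DC plot estimate
    return select_bands_spectral(W, k)                    # steps 4-5: cluster and pick
```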

In ISSC, the computational complexity of the sparse representation of all band vectors using LSR is $O(N^2D)$, the complexity of constructing the angular similarity matrix is $O(N^2)$, and the complexity of clustering bands with the appropriate number is $O(kNt)$, where $D$ and $N$ are the dimension of the band vectors and the number of bands, respectively, and $k$ and $t$ are the number of band clusters and iterations, respectively. Therefore, the total complexity of ISSC roughly equals $O(N^2D + N^2 + kNt)$. Considering that $O(N^2D)$ dominates $O(N^2)$ and $O(kNt)$, the complexity of ISSC is approximately $O(N^2D)$. We compare the complexity of ISSC with six other state-of-the-art band selection methods: two MIMC approaches, linear constrained minimum variance-based band correlation constraint (LCMV-BCC) [17] and AP [52]; two MIS approaches, SID [20] and maximum-variance principal component analysis (MVPCA) [53]; and two SB approaches, SpaBS [27] and SNMF [30]. The comparison of the computational complexity of all seven methods is listed in Table I, where $K$ denotes the sparsity level of SpaBS. Since $N^2 < D$, $N^2D < D^2 \ll D^2N$, and $K < k \ll N \ll D$, we deduce that the computational complexity of ISSC is lower than that of AP, MVPCA, SpaBS, and LCMV-BCC but higher than that of SID. Moreover, SpaBS has the highest computational complexity among all the methods, whereas LCMV-BCC has the second highest. In addition, the computational complexity of AP is higher than those of SID, MVPCA, SNMF, and ISSC. In summary, ISSC has one of the lowest computational complexities among the seven band selection methods.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, four groups of experiments on three widely used HSI datasets are implemented to test our proposed ISSC method in selecting an appropriate band subset. Section IV-A describes the three HSI datasets: Indian Pines, Pavia University (PaviaU), and Urban. Section IV-B lists the detailed results of the four groups of experiments, and Section IV-C analyzes and discusses the experimental results from Section IV-B.

Fig. 2. Image of Indian Pines dataset.

A. Descriptions of Three HSI Datasets

The first dataset is Indian Pines from the Multispectral Image Data Analysis System Group at Purdue University (https://engineering.purdue.edu/~biehl/MultiSpec/aviris_documentation.html). The dataset was acquired by NASA on June 12, 1992 using the AVIRIS sensor from JPL. The dataset has 20 m spatial resolution and 10 nm spectral resolution covering a spectral range of 200-2400 nm. A subset of the image scene of size 145 x 145 pixels, depicted in Fig. 2, is used in our experiment and covers an area 6 miles west of West Lafayette, Indiana. The dataset was preprocessed with radiometric corrections and bad-band removal, after which 200 bands remained with calibrated data values proportional to radiance. Sixteen classes of ground objects exist in the image scene, and the ground truth of training and testing samples for each class is listed in Table II.

The second dataset is the Pavia University (PaviaU) dataset taken from the Computational Intelligence Group at the Basque University (http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes). The dataset was acquired by the ROSIS sensor with 1.3 m spatial resolution and 115 bands. After removing low signal-to-noise ratio (SNR) bands, 103 bands were left for further analysis. The small subset of the larger dataset shown in Fig. 3 is used in our experiments. The test dataset contains 350 x 340 pixels and covers the area of Pavia University. The image scene has nine classes of ground objects including shadows, and the ground truth information of training and testing samples in each class is listed in Table III.

The third dataset is the Urban dataset acquired from the U.S. Army Geospatial Center (www.tec.army.mil/hypercube). The dataset was collected by a HYDICE sensor with 10 nm spectral resolution and 2 m spatial resolution. The low-SNR band sets [1-4, 76, 87, 101-111, 136-153, 198-210] were eliminated from the initial 210 bands, leaving the final 162 bands. Fig. 4 shows a small image subset of size 307 x 307 pixels selected from the larger image. The small dataset covers an area at Copperas Cove near Fort Hood, TX, USA, and has 22 classes of ground objects in the image scene. Table IV shows the ground truth information for training and testing samples in each class.


TABLE II
GROUND TRUTH OF TRAINING AND TESTING SAMPLES IN EACH CLASS FOR INDIAN PINES DATASET

Fig. 3. Image of PaviaU dataset.

B. Experimental Results

We conduct four groups of experiments on the three HSI datasets above to test the performance of our ISSC method in selecting an appropriate band subset for classification. Six state-of-the-art band selection methods are used to make holistic comparisons with our method: two MIMC approaches, LCMV-BCC and AP; two MIS approaches, SID and MVPCA; and two SB approaches, SpaBS and SNMF. First, we quantify the band selection performance of ISSC and compare the results with those of the other six methods; this experiment assesses the performance of ISSC in band selection before classification. Second, we compare the computational time of ISSC against the six other band selection methods when varying the size k of the band subset; this experiment investigates the computational performance of ISSC. Third, we compare the classification accuracies of ISSC against those of the six other methods. Two widely used classifiers are adopted in this experiment, the K-nearest neighbor (KNN) [54] and support vector machine (SVM) [55] classifiers, and the overall classification accuracy (OCA) and average classification accuracy (ACA) are used to measure the classification performance of all seven methods. The KNN classifier uses the Euclidean distance, and the SVM classifier uses the radial basis function (RBF) kernel with the variance parameter and the penalization factor obtained via cross-validation. For each dataset, we repeatedly subsample the training and testing samples ten times. Finally, we investigate the relationship between the scalar parameter β in ISSC and the classification accuracies; this experiment helps to determine a proper β when using ISSC in real-world applications of HSI classification. Unless otherwise noted, the experimental results below are the averages of ten different and independent runs.

1) Quantitative Evaluation of the ISSC Band Subset: This experiment evaluates the band selection results obtained before classification from ISSC and the six other methods using three quantitative measures. We use the average information entropy (AIE) to measure the information amount and evaluate the richness of spectral information in the band subset. We compute the average correlation coefficient (ACC) to estimate the intra-band correlations within the band subset. The average relative entropy (ARE) (also called the average Kullback-Leibler divergence, AKLD) is used to measure the inter-separabilities of the selected bands and assess the distinguishability within the band subset for classification. We use these three quantitative measures because they reflect the three desired characteristics of an appropriate band subset: high information amount, low intra-band correlations, and high inter-separabilities. In the experiment, the appropriate size k of the band subset (i.e., the number of bands in the subset) is estimated using the DC plot algorithm and is then set as the dimension of the band subset for all the methods: k is 12 for the Indian Pines dataset, 10 for the PaviaU dataset, and 20 for the Urban dataset. In the SNMF method, the parameter α that controls the entry size of the dictionary matrix and the parameter γ that determines the sparseness of the coefficient matrix are determined using cross-validation, and the optimal α and γ giving the best result are selected. For the Indian Pines dataset, the α and γ of SNMF are chosen as 3.0 and 0.05, respectively, and the scalar parameter β in ISSC is chosen as 0.1 after cross-validation. For the PaviaU dataset, the α and γ are 4.0 and 0.001, respectively, and the β is 0.001. The α, γ, and β for the Urban dataset are 3.5, 0.01, and 0.05, respectively. The number of iterations t for dictionary learning in SpaBS is manually set to 5 for all three datasets. Table V lists detailed information about the parameters of the seven methods on the three HSI datasets.

Table VI illustrates the quantitative evaluation results of all the methods on the three HSI datasets. For the Indian Pines dataset, the ISSC has the highest AIE and ARE, whereas its ACC is the second lowest, behind only that of SNMF.


TABLE III
GROUND TRUTH OF TRAINING AND TESTING SAMPLES IN EACH CLASS FOR PAVIAU DATASET

Fig. 4. Image of Urban dataset.

SID and MVPCA perform worse due to their higher ACC and lower AIE and ARE. For the PaviaU dataset, the ISSC performs best among all the methods on all three quantitative measures. When the other methods are compared with ISSC excluded, SID and MVPCA show higher ACC and lower AIE and ARE; therefore, the band subsets of these two methods perform more poorly than those of the other methods. On the Urban dataset, the ISSC has the highest AIE and the lowest ACC, and its ARE is second only to that of SNMF. The SID performance on the Urban dataset is similar to its performance on the PaviaU dataset: it again performs worse than the other methods on the three quantitative measures.

2) Computational Performance of ISSC: This experiment tests the computational speed of ISSC and the six other methods when varying the size k of the band subset. For the Indian Pines dataset, k is set between 6 and 24 with a step interval of 6; for the PaviaU dataset, k is set between 5 and 25 with a step interval of 5; and for the Urban dataset, k is set between 10 and 40 with a step interval of 10. All other parameters of the seven methods are the same as their counterparts in the previous experiment. Table V details the parameter configurations of all the methods.

We run the experiment on a Windows 7 computer with an Intel i7-4700 quad-core processor and 16 GB of RAM. The ISSC and the six other methods are implemented in MATLAB 2014a. Table VII compares the computational times of the seven methods on the three HSI datasets. For each HSI dataset, the computational times of all the methods gradually increase with increasing k. SpaBS takes the longest computational time among all the methods. LCMV-BCC has a shorter computational time than SpaBS but is still slower than the other five methods. The computational time of AP is longer than those of MVPCA, SID, SNMF, and ISSC, and the computational time of SNMF is longer than those of MVPCA, ISSC, and SID. MVPCA has a longer computational time than SID and ISSC. The computational time of SID is shorter than that of ISSC and is the shortest among all the methods. These computational times coincide with the analysis of computational complexity in Section III-D. In increasing order of computational time, the seven methods rank as follows: SID, ISSC, MVPCA, SNMF, AP, LCMV-BCC, and SpaBS.

3) Classification Performance of ISSC: This experiment measures the classification performance of the ISSC method. Our aim is to make a holistic evaluation of classification performance by varying the size k of the band subset rather than using a single predefined band number. As in the above experiments, we compare classification accuracies using the OCA and ACA, and for each dataset we repeatedly subsample the training and testing samples ten times to obtain accurate classification accuracies. In the experiment, the size k of the band subset for the Indian Pines dataset varies from 2 to 45 with a step interval of 2, and the sizes k for the PaviaU and Urban datasets vary between 2 and 50 with a step interval of 2. The neighbor size k1 in the KNN classifier is set to 3, and the threshold of total distortion in the SVM classifier is set to 0.01. Using cross-validation, the α and γ of SNMF are chosen as 3.0 and 0.1 for the Indian Pines dataset, 4.0 and 1.5 for the PaviaU dataset, and 3.5 and 0.05 for the Urban dataset, respectively. Other parameters are the same as their counterparts in the above experiments. Table V details the parameter configurations of all the methods.
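For the evaluation protocol, OCA is the overall accuracy and ACA the mean of the per-class accuracies. A minimal scikit-learn sketch with the classifier settings quoted above (the RBF hyperparameters, tuned by cross-validation in the paper, are left at library defaults here):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def evaluate_band_subset(X_train, y_train, X_test, y_test):
    """OCA and ACA of a band subset under the two classifiers used in
    the paper: KNN (k1 = 3, Euclidean) and SVM with an RBF kernel."""
    classifiers = {'KNN': KNeighborsClassifier(n_neighbors=3),
                   'SVM': SVC(kernel='rbf')}
    scores = {}
    for name, clf in classifiers.items():
        pred = clf.fit(X_train, y_train).predict(X_test)
        scores[name] = {'OCA': accuracy_score(y_test, pred),
                        'ACA': balanced_accuracy_score(y_test, pred)}
    return scores
```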

Fig. 5 illustrates the OCA results of all seven methods using the SVM and KNN classifiers on the three datasets. We do not list the ACA results because the ACA curves are similar to the OCA curves. For each dataset and each classifier, the OCA is small for band numbers k less than 5 and rises with increasing k. The OCA changes slowly after a certain threshold of the band number k, and most curves become almost flat with slight fluctuations. SID always has the lowest value among all the methods for each classifier and each HSI dataset. For each dataset using the KNN and SVM classifiers, ISSC outperforms the other six methods after a certain threshold k. The OCA results of ISSC are the best among all seven methods.

Moreover, we found that the threshold k for the ISSC curves of each dataset is located around the appropriate band number estimated by the DC plot algorithm.


TABLE IV
GROUND TRUTH OF TRAINING AND TESTING SAMPLES IN EACH CLASS FOR URBAN DATASET

TABLE V
LISTS OF PARAMETERS IN ALL THE EXPERIMENTS ON THE THREE HSI DATASETS

TABLE VI
CONTRAST IN QUANTITATIVE EVALUATION OF BAND SUBSETS FROM ALL SEVEN METHODS ON THE THREE DATASETS


TABLE VII
COMPUTATIONAL TIMES OF SEVEN BAND SELECTION METHODS USING DIFFERENT CHOICES OF K ON THREE HSI DATASETS

Fig. 5. OCA results of all the seven methods on the three HSI datasets. (a), (c), and (e): SVM; (b), (d), and (f): KNN.

Therefore, using the appropriate band number k from the DC plot algorithm, we make detailed comparisons of the OCA and ACA results of all the methods on the three datasets. As in experiment 1), the appropriate band number k in the Indian Pines dataset is 12, the k in the PaviaU dataset is 10, and the k in the Urban dataset is 20. Table VIII shows that the ISSC has the best classification accuracies for all three datasets using both classifiers, that AP behaves better than SID and MVPCA in classification, and that the SID has the worst performance of all the methods. These observations further support the results in Fig. 5.

4) Effect of the Scalar Parameter on the Sensitivity of Classification Accuracy: This experiment explores the effect of the scalar parameter β in ISSC on the OCA and ACA results for the three HSI datasets when varying β from smaller to larger values.


TABLE VIII
CLASSIFICATION ACCURACIES OF ALL THE METHODS USING AN APPROPRIATE SIZE OF BAND SUBSET ON THREE DATASETS

TABLE IX
EFFECT FROM THE SCALAR PARAMETER IN ISSC ON CLASSIFICATION ACCURACIES OF THE THREE HSI DATASETS

Since the spectral values have similar magnitudes in all three datasets, we vary the scalar parameter β in all three datasets from 0.0001 to 100, choosing the test candidates from the set [0.0001, 0.001, 0.01, 0.1, 0.5, 1, 5, 10, 50, 100]. We did not investigate the effect of β on classification with a fixed step interval because the range of the parameter is too large for such a detailed analysis. As in experiment 3), the sizes of the ISSC band subset on the three datasets are estimated using the DC plot algorithm: the band number k is 12 for the Indian Pines dataset, 10 for the PaviaU dataset, and 20 for the Urban dataset. Other parameter configurations are the same as their counterparts in experiment 3), and the detailed parameter settings are listed in Table V.

Table IX shows the OCA and ACA results of ISSC using different classifiers and different HSI datasets. For all three datasets, we observe that as the scalar parameter β increases, the classification accuracies of ISSC gradually decrease with small fluctuations. Moreover, the OCA and ACA decrease quickly for β between 0.0001 and 1, whereas the decline is much slower for β from 1 to 100. This implies that the scalar parameter β has a great effect on the classification accuracy of the ISSC band subset for an HSI dataset.

C. Analysis and Discussion

The above four groups of experiments on the three HSI datasets test the performance of our ISSC method in classification. ISSC is compared against the six popular band selection methods LCMV-BCC, AP, SID, MVPCA, SpaBS, and SNMF. The three quantitative measures of AIE, ACC, and ARE show that the ISSC has the best performance among all seven methods. The ISSC assumes that all bands lie in a union of subspaces and that each band can be sparsely represented by other bands from its subspace. The sparse and block diagonal coefficient matrix found by solving the l2-norm optimization problem with LSR shows a grouping effect. Furthermore, the optimization solution guarantees subspace segmentation of the band vectors. The ISSC band subset has high inter-band separability and low intra-band correlations, and each band image has a high information amount. The ISSC thus satisfies the demands of band subset selection and hence is an appropriate method for selecting an appropriate band subset for classification. In contrast, SID and MVPCA perform worse in the quantitative evaluations and are poor choices for band selection.

The experiment on computational performance shows that the ISSC has shorter computational times than LCMV-BCC, AP, MVPCA, SpaBS, and SNMF, while SID has the shortest computational times of all. The speed advantage of SID comes from computing only the diagonal entries of the similarity matrix. AP has longer computational times than SID because it computes the entire similarity matrix of band vectors rather than only the diagonal elements. The longer computational times of MVPCA are due to the principal component analysis (PCA) transformation of the HSI dataset.

Page 12: Band Selection Using Improved Sparse Subspace Clustering for … · 2015-10-12 · matrix); and 3) clustering the similarity matrix using spectral clustering [33]. Assume a high-dimensional

SUN et al.: BAND SELECTION USING ISSC FOR HSI CLASSIFICATION 2795

analysis (PCA) transformation of HSI dataset. The lowestcomputational speeds in SpaBS results from the huge com-putational complexity of dictionary learning using the K-SVDalgorithm [56].

The experiment on classification performance compares the classification accuracies of ISSC against those of the six other methods (LCMV-BCC, AP, SID, MVPCA, SpaBS, and SNMF). The results show that SID has the worst classification performance even though its computational times are the shortest, which coincides with the conclusions of its quantitative evaluation. When comparing the ACA and OCA of ISSC against those of the other methods, ISSC performs best when using a cluster size obtained from the DC plot algorithm. This implies that the DC plot algorithm finds an appropriate band number for selecting a band subset and guarantees good performance of ISSC in HSI classification. Finally, the experiment on the effect of the scalar parameter β on classification sensitivity shows that the ACA and OCA of ISSC decrease with increasing β. A smaller scalar parameter β is therefore appropriate for ISSC when selecting a band subset, since a smaller β produces higher classification accuracies and shorter computational times.
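The DC plot procedure itself is detailed in [49] and [52]; as a rough, heavily hedged stand-in (an elbow-style compactness curve, not the authors' exact algorithm), one can trace within-cluster dispersion of the spectral embedding against candidate cluster counts and look for the knee:

    import numpy as np
    from sklearn.cluster import KMeans

    def compactness_curve(embedding, k_values):
        # embedding: (n_bands, d) spectral embedding of the similarity matrix.
        # Returns within-cluster dispersion for each candidate cluster count;
        # the knee of this curve suggests an appropriate band number k.
        curve = []
        for k in k_values:
            km = KMeans(n_clusters=k, n_init=10).fit(embedding)
            curve.append(km.inertia_)  # sum of squared distances to centers
        return np.array(curve)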

V. CONCLUSION AND FUTURE WORK

This paper proposes the ISSC method to select an appropriate band subset from the HSI dataset. ISSC assumes that each band is drawn from a union of low-dimensional subspaces and can be sparsely represented by other bands in its subspace. Our algorithm constructs a similarity matrix from the sparse coefficient vectors of the bands. The appropriate band subset is then selected from band clusters of the similarity matrix, with an appropriate band number found by the DC plot algorithm. Four groups of experiments are designed and implemented to thoroughly investigate the performance of ISSC. First, the band subset chosen by ISSC is quantitatively evaluated using the AIE, ACC, and ARE measures, and its performance is compared against that of the six popular methods LCMV-BCC, AP, MVPCA, SID, SpaBS, and SNMF. The results show that the ISSC band subset carries more information and has lower intra-band correlations and higher intra-band separabilities. Second, the experiment on computational performance illustrates that ISSC has a shorter computational time than the other five methods (LCMV-BCC, AP, MVPCA, SpaBS, and SNMF). Third, the experiment on classification performance shows that the classification accuracy of ISSC, measured by ACA and OCA, is better than that of the six other methods when using an appropriate band number found by the DC plot algorithm. In short, ISSC achieves the best classification performance with the second shortest computational time. In contrast, SID proves worse in both the quantitative evaluation results and classification accuracy. Finally, the experiment on the effect of the scalar parameter β on the sensitivity of classification accuracy shows that choosing a small scalar parameter β leads to accurate classification. In the future, we will test our ISSC method on more HSI datasets to further understand its performance in real-world applications. Moreover, we will try the l0-norm for ISSC, hoping to further improve its classification performance.

ACKNOWLEDGMENT

The authors would like to thank the editor and referees for their suggestions, which improved this paper.

REFERENCES

[1] B. Du and L. Zhang, "A discriminative metric learning based anomaly detection method," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11, pp. 6844–6857, Nov. 2014.

[2] R. A. Garcia, P. R. Fearns, and L. I. McKinna, "Detecting trend and seasonal changes in bathymetry derived from HICO imagery: A case study of Shark Bay, Western Australia," Remote Sens. Environ., vol. 147, pp. 186–205, 2014.

[3] L. Demarchi et al., "Assessing the performance of two unsupervised dimensionality reduction techniques on hyperspectral APEX data for high resolution urban land-cover mapping," ISPRS J. Photogramm. Remote Sens., vol. 87, pp. 166–179, 2014.

[4] M. Alonzo, B. Bookhagen, and D. A. Roberts, "Urban tree species mapping using hyperspectral and lidar data fusion," Remote Sens. Environ., vol. 148, pp. 70–83, 2014.

[5] W. D. Hively et al., "Use of airborne hyperspectral imagery to map soil properties in tilled agricultural fields," Appl. Environ. Soil Sci., vol. 2011, pp. 1–13, 2011.

[6] I. Herrmann et al., "Ground-level hyperspectral imagery for detecting weeds in wheat fields," Precis. Agric., vol. 14, no. 6, pp. 637–659, 2013.

[7] R. J. Murphy and S. T. Monteiro, "Mapping the distribution of ferric iron minerals on a vertical mine face using derivative analysis of hyperspectral imagery (430–970 nm)," ISPRS J. Photogramm. Remote Sens., vol. 75, pp. 29–39, 2013.

[8] N. Zabcic et al., "Using airborne hyperspectral data to characterize the surface pH and mineralogy of pyrite mine tailings," Int. J. Appl. Earth Observ. Geoinf., vol. 32, pp. 152–162, 2014.

[9] D. Scott, "The curse of dimensionality and dimension reduction," in Multivariate Density Estimation: Theory, Practice, and Visualization. Hoboken, NJ, USA: Wiley, 1992, pp. 195–217.

[10] A. Plaza et al., "Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466–479, Mar. 2005.

[11] P. H. Hsu, "Feature extraction of hyperspectral images using wavelet and matching pursuit," ISPRS J. Photogramm. Remote Sens., vol. 62, no. 2, pp. 78–92, 2007.

[12] M. Pal and G. M. Foody, "Feature selection for classification of hyperspectral data by SVM," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2297–2307, May 2010.

[13] W. Sun et al., "Nonlinear dimensionality reduction via the ENH-LTSA method for hyperspectral image classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 2, pp. 375–388, Feb. 2014.

[14] E. Arzuaga-Cruz, L. O. Jimenez-Rodriguez, and M. Velez-Reyes, "Unsupervised feature extraction and band subset selection techniques based on relative entropy criteria for hyperspectral data analysis," AeroSense, vol. 2003, pp. 462–473, 2003.

[15] P. Bajcsy and P. Groves, "Methodology for hyperspectral band selection," Photogramm. Eng. Remote Sens., vol. 70, no. 7, pp. 793–802, 2004.

[16] B. Guo et al., "Band selection for hyperspectral image classification using mutual information," IEEE Geosci. Remote Sens. Lett., vol. 3, no. 4, pp. 522–526, Oct. 2006.

[17] C.-I. Chang and S. Wang, "Constrained band selection for hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1575–1585, Jun. 2006.

[18] A. Martínez-Usó et al., "Clustering-based hyperspectral band selection using information measures," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 4158–4171, Dec. 2007.

[19] S. Jia et al., "Unsupervised band selection for hyperspectral imagery classification without manual band removal," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 531–543, Apr. 2012.

[20] P. Mausel, W. Kramber, and J. Lee, "Optimum band selection for supervised classification of multispectral data," Photogramm. Eng. Remote Sens., vol. 56, pp. 55–60, 1990.


[21] Q. Du and H. Yang, "Similarity-based unsupervised band selection for hyperspectral image analysis," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 4, pp. 564–568, Oct. 2008.

[22] W. Xia, B. Wang, and L. Zhang, "Band selection for hyperspectral imagery: A new approach based on complex networks," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 5, pp. 1229–1233, Sep. 2013.

[23] Q. Du, "Band selection and its impact on target detection and classification in hyperspectral image analysis," in Proc. IEEE Workshop Adv. Tech. Anal. Remotely Sens. Data, Greenbelt, MD, USA, Oct. 27–28, 2003, pp. 374–377.

[24] H. Yang et al., "An efficient method for supervised hyperspectral band selection," IEEE Geosci. Remote Sens. Lett., vol. 8, no. 1, pp. 138–142, Jan. 2011.

[25] Y. Zhang, B. Du, and L. Zhang, "A sparse representation-based binary hypothesis model for target detection in hyperspectral images," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 1346–1354, Mar. 2015.

[26] Y. Chen, N. M. Nasrabadi, and T. D. Tran, "Hyperspectral image classification using dictionary-based sparse representation," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.

[27] Y. Chen, N. M. Nasrabadi, and T. D. Tran, "Sparse representation for target detection in hyperspectral imagery," IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 629–640, Jun. 2011.

[28] S. Li and H. Qi, "Sparse representation based band selection for hyperspectral images," in Proc. 18th IEEE Int. Conf. Image Process. (ICIP), Brussels, Belgium, Sep. 11–14, 2011, pp. 2693–2696.

[29] J. M. Li and Y. T. Qian, "Clustering-based hyperspectral band selection using sparse nonnegative matrix factorization," J. Zhejiang Univ. Sci. C, vol. 12, no. 7, pp. 542–549, 2011.

[30] Q. Du, J. M. Bioucas-Dias, and A. Plaza, "Hyperspectral band selection using a collaborative sparse model," in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Munich, Germany, Jul. 22–27, 2012, pp. 3054–3057.

[31] S. Chepushtanova, C. Gittins, and M. Kirby, "Band selection in hyperspectral imagery using sparse support vector machines," in Proc. SPIE DSS Conf., 2014, p. 90881F.

[32] E. Elhamifar and R. Vidal, "Sparse subspace clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR'09), Miami, FL, USA, Jun. 20–26, 2009, pp. 2790–2797.

[33] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, Nov. 2013.

[34] U. Von Luxburg, "A tutorial on spectral clustering," Statist. Comput., vol. 17, no. 4, pp. 395–416, 2007.

[35] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten, "Fast sparse representation based on smoothed L0 norm," in Independent Component Analysis and Signal Separation. New York, NY, USA: Springer, 2007, pp. 389–396.

[36] K. Koh, S.-J. Kim, and S. P. Boyd, "An interior-point method for large-scale L1-regularized logistic regression," J. Mach. Learn. Res., vol. 8, no. 8, pp. 1519–1555, 2007.

[37] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33–61, 1999.

[38] J. Yang and Y. Zhang, "Alternating direction algorithms for L1-problems in compressive sensing," SIAM J. Sci. Comput., vol. 33, no. 1, pp. 250–278, 2011.

[39] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Roy. Stat. Soc. B, vol. 58, no. 1, pp. 267–288, 1996.

[40] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," J. Roy. Stat. Soc. B, vol. 67, no. 2, pp. 301–320, 2005.

[41] S. Wang et al., "Efficient subspace segmentation via quadratic programming," in Proc. 25th AAAI Conf. Artif. Intell., San Francisco, CA, USA, Aug. 7–11, 2011, pp. 519–524.

[42] C.-Y. Lu et al., "Robust and efficient subspace segmentation via least squares regression," in Proc. 12th Eur. Conf. Comput. Vis. Part VII, Florence, Italy, Oct. 7–13, 2012, pp. 347–360.

[43] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD, USA: Johns Hopkins University Press, 2012.

[44] S. Yan and H. Wang, "Semi-supervised learning by sparse representation," in Proc. SIAM Int. Conf. Data Min. (SDM'09), Sparks, NV, USA, Apr. 30–May 2, 2009, pp. 792–801.

[45] Y. Gao, A. Choudhary, and G. Hua, "A nonnegative sparsity induced similarity measure with application to cluster analysis of spam images," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Dallas, TX, USA, Mar. 14–19, 2010, pp. 5594–5597.

[46] S. Wu, X. Feng, and W. Zhou, "Spectral clustering of high-dimensional data exploiting sparse representation vectors," Neurocomputing, vol. 135, pp. 229–239, 2014.

[47] S. Still and W. Bialek, "How many clusters? An information-theoretic perspective," Neural Comput., vol. 16, no. 12, pp. 2483–2506, 2004.

[48] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the number of clusters in a data set via the gap statistic," J. Roy. Stat. Soc. B, vol. 63, no. 2, pp. 411–423, 2001.

[49] M. Honarkhah and J. Caers, "Classifying existing and generating new training image patterns in kernel space," in Proc. 21st Stanford Center Reservoir Forecast. Affiliate Meeting (SCRF), Stanford University, CA, USA, 2008, pp. 1–38.

[50] N. Bassiou, V. Moschou, and C. Kotropoulos, "Speaker diarization exploiting the eigengap criterion and cluster ensembles," IEEE Trans. Audio Speech Lang. Process., vol. 18, no. 8, pp. 2134–2144, Nov. 2010.

[51] R. Patil and K. Jondhale, "Edge based technique to estimate number of clusters in k-means color image segmentation," in Proc. 3rd IEEE Int. Conf. Comput. Sci. Inf. Technol. (ICCSIT), Beijing, China, Jul. 9–11, 2010, pp. 117–121.

[52] M. Honarkhah and J. Caers, "Stochastic simulation of patterns using distance-based pattern modeling," Math. Geosci., vol. 42, no. 5, pp. 487–517, 2010.

[53] Y. Qian, F. Yao, and S. Jia, "Band selection for hyperspectral imagery using affinity propagation," IET Comput. Vis., vol. 3, no. 4, pp. 213–222, Dec. 2009.

[54] C.-I. Chang et al., "A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2631–2641, Nov. 1999.

[55] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967.

[56] I. Steinwart and A. Christmann, Support Vector Machines. Berlin, Germany: Springer-Verlag, 2008.

[57] R. Rubinstein, M. Zibulevsky, and M. Elad, "Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit," CS Technion, vol. 40, pp. 1–15, 2008.

Weiwei Sun received the B.S. degree in surveying and mapping and the Ph.D. degree in cartography and geographic information engineering from Tongji University, Shanghai, China, in 2007 and 2013, respectively.

From 2011 to 2012, he was with the Department of Applied Mathematics, University of Maryland, College Park, College Park, MD, USA, working as a Visiting Scholar. He is currently an Assistant Professor with Ningbo University, Ningbo, China, and is also working as a Postdoc with the State Key Laboratory for Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, Wuhan, China. He has authored more than 20 journal papers. His research interests include hyperspectral image processing with manifold learning, anomaly detection, and target recognition of remote sensing imagery using compressive sensing.

Liangpei Zhang (M'06–SM'08) received the B.S. degree in physics from Hunan Normal University, Changsha, China, in 1982, the M.S. degree in optics from the Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China, in 1988, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 1998.

He is currently the Head of the Remote Sensing Division, State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University. He is also a "Chang-Jiang Scholar" Chair Professor appointed by the Ministry of Education of China. He is currently a Principal Scientist for the China State Key Basic Research Project (2011–2016) appointed by the Ministry of National Science and Technology of China to lead the remote sensing program in China. He has authored more than 310 research papers. He is the holder of five patents. His research interests include hyperspectral remote sensing, high-resolution remote sensing, image processing, and artificial intelligence.


Dr. Zhang is an Executive Member (Board of Governors) of the China National Committee of the International Geosphere–Biosphere Programme, an Executive Member of the China Society of Image and Graphics, etc. He regularly serves as a Co-Chair of the series SPIE Conferences on Multispectral Image Processing and Pattern Recognition, the Conference on Asia Remote Sensing, and many other conferences. He edits several conference proceedings, issues, and geoinformatics symposiums. He also serves as an Associate Editor of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, International Journal of Ambient Computing and Intelligence, International Journal of Image and Graphics, International Journal of Digital Multimedia Broadcasting, Journal of Geo-spatial Information Science, and Journal of Remote Sensing.

Bo Du (M'11–SM'15) received the B.S. degree in engineering from Wuhan University, Wuhan, China, in 2005, and the Ph.D. degree in photogrammetry and remote sensing from the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China, in 2010.

He is currently an Associate Professor with the School of Computer, Wuhan University. His research interests include pattern recognition, hyperspectral image processing, and signal processing.

Weiyue Li received the B.S. degree in geography science from Shandong Normal University, Jinan, China, in 2006, the M.S. degree in photogrammetry and remote sensing from Liaoning Technical University, Huludao, China, in 2010, and the Ph.D. degree in cartography and geography information engineering from Tongji University, Shanghai, China, in 2014.

He is working as an Assistant Researcher with the Institute of Urban Studies, Shanghai Normal University, Shanghai, China. His research interests include feature extraction of hyperspectral imagery and LiDAR data, and the hazard analysis of landslides.

Yenming Mark Lai received the Ph.D. degree in applied mathematics from the University of Maryland, College Park, College Park, MD, USA, in 2014.

He is working as a Postdoc with the Institute for Computational Engineering and Sciences (ICES), University of Texas at Austin, Austin, TX, USA. His research interests include manifold learning of hyperspectral imagery and compressive sensing.