an improved svm method p‐svm for classification of remotely sensed data

This article was downloaded by: [University of Tennessee, Knoxville]
On: 30 April 2013, At: 15:03
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Remote Sensing
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tres20

An improved SVM method P‐SVM for classification of remotely sensed data
R. Zhang (a, b) & J. Ma (a)
(a) State Key Laboratory of Remote Sensing Science, Jointly Sponsored by the Institute of Remote Sensing Applications, Chinese Academy of Sciences, and Beijing Normal University, Beijing, 100101, PR China
(b) Graduate University, Chinese Academy of Sciences, Beijing, 100049, PR China
Published online: 20 Sep 2008.

To cite this article: R. Zhang & J. Ma (2008): An improved SVM method P‐SVM for classification of remotely sensed data, International Journal of Remote Sensing, 29:20, 6029-6036

To link to this article: http://dx.doi.org/10.1080/01431160802220151




Letter

An improved SVM method P-SVM for classification of remotely sensed data

R. ZHANG*†‡ and J. MA†

†State Key Laboratory of Remote Sensing Science, Jointly Sponsored by the Institute of Remote Sensing Applications, Chinese Academy of Sciences, and Beijing Normal University, Beijing, 100101, PR China

‡Graduate University, Chinese Academy of Sciences, Beijing, 100049, PR China

(Received 13 October 2007; in final form 20 May 2008)

A support vector machine (SVM) is a mathematical tool based on the structural risk minimization principle. It seeks a separating hyperplane in a high-dimensional feature space, which allows it to solve problems that are not linearly separable in the original input space. SVM has been applied within the remote sensing community to multispectral and hyperspectral imagery analysis. However, the standard SVM has some technical disadvantages: for instance, the solution of an SVM learning problem is scale sensitive, and the process is time-consuming. A novel Potential SVM (P-SVM) algorithm has been proposed to overcome these shortcomings and has shown some improvements over the standard SVM. In this letter, the P-SVM algorithm is introduced into the classification of multispectral and high-spatial-resolution remotely sensed data, and it is applied to ASTER and ADS40 imagery, respectively. Experimental results indicate that P-SVM is competitive with the standard SVM in classification accuracy for remotely sensed data, while requiring less total processing time.

1. Introduction

A support vector machine (SVM) is a machine learning algorithm based on the statistical learning theory proposed by Vapnik (1995), and it is used in data classification and analysis. SVM classifies binary data by determining, from the training data, the separating hyperplane in a high-dimensional feature space that maximizes the margin between the two classes. The SVM approach has been used in pattern recognition tasks such as handwritten character recognition, text classification and medical image analysis, and it has also been applied in the remote sensing field. SVM has been used to classify ASTER data and compared with the maximum likelihood classifier (MLC) (Zhu and Blumberg 2002), and to estimate classification accuracy for remotely sensed imagery (Keuchel et al. 2003). It has been used in land cover change detection (Nemmour and Chibani 2006) and in the classification of hyperspectral data (Melgani and Bruzzone 2004, Pal and Mather 2005). In addition, SVM has been used for the automatic extraction of man-made objects from high-spatial-resolution remote sensing images (Inglada 2007).

For remote sensing data classification, SVM can often achieve higher accuracy than some traditional classification algorithms, such as MLC, decision trees (DT) and artificial neural networks (ANN). However, the SVM algorithm still has some technical and conceptual limitations. Firstly, because the final predictor depends on how the training data have been scaled, the solution of an SVM learning problem is scale sensitive. Secondly, the computational cost of SVM classification is high, so classification is time-consuming, especially for large images. Moreover, the kernel function used in the SVM approach must satisfy the Mercer condition, which means that the kernel has to be positive semi-definite. To overcome these disadvantages of the standard SVM, Hochreiter and Obermayer (2006) proposed a novel SVM method, the Potential SVM (P-SVM). The new approach defines a new objective function and new constraints, and it has been applied to biomedical data analysis such as protein classification and variable selection for genetic data.

*Corresponding author.

International Journal of Remote Sensing

Vol. 29, No. 20, 20 October 2008, 6029–6036

International Journal of Remote Sensing ISSN 0143-1161 print/ISSN 1366-5901 online © 2008 Taylor & Francis

http://www.tandf.co.uk/journals DOI: 10.1080/01431160802220151

In this letter, the P-SVM approach is introduced into remote sensing classification. Experiments were performed on ASTER multispectral data and ADS40 airborne digital sensor imagery of Beijing, and the classification accuracy and time cost of P-SVM and standard SVM are compared.

2. Theory of P-SVM

The P-SVM algorithm includes several improvements over SVM. It uses a new objective function that overcomes the scale sensitivity of the standard SVM, and new constraints are introduced to enforce a small empirical error. Compared with the standard SVM, P-SVM usually finds fewer support vectors (Hochreiter and Obermayer 2006). In general, classification time is dominated by kernel evaluations and is therefore proportional to the number of support vectors; hence, fewer support vectors lead to shorter processing time. The optimization problem is decomposed with a modified version of the sequential minimal optimization (SMO) algorithm proposed by Platt (1998). The mathematical formulation of P-SVM is outlined briefly below.

Consider a two-class classification task with k training samples $(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)$, where $y_i \in \{+1, -1\}$. The objective function and constraints of the primal P-SVM classification problem can be written as equations (1) and (2):

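$$\min_{w,\,\xi^{+},\,\xi^{-}} \; \frac{1}{2}\left\| X_{\phi}^{T} w \right\|^{2} + C\,\mathbf{1}^{T}\left(\xi^{+} + \xi^{-}\right) \tag{1}$$

$$\text{s.t.} \quad K^{T}\left(X_{\phi}^{T} w - y\right) + \xi^{+} \ge 0, \quad K^{T}\left(X_{\phi}^{T} w - y\right) - \xi^{-} \le 0, \quad 0 \le \xi^{+}, \xi^{-}, \tag{2}$$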

where $\xi^{+}$ and $\xi^{-}$ are slack variables, $X_{\phi}$ is the matrix of training data mapped into the high-dimensional feature space, K is the matrix of kernel function values and C is the penalty parameter.

To obtain the dual problem, the Lagrangian L is introduced, where $\alpha^{+} \ge 0$, $\alpha^{-} \ge 0$, $\mu^{+} \ge 0$ and $\mu^{-} \ge 0$ are the Lagrange multipliers and $\alpha = \alpha^{+} - \alpha^{-}$:

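$$L = \frac{1}{2} w^{T} X_{\phi} X_{\phi}^{T} w + C\,\mathbf{1}^{T}\left(\xi^{+} + \xi^{-}\right) - \left(\alpha^{+}\right)^{T}\left(K^{T}\left(X_{\phi}^{T} w - y\right) + \xi^{+}\right) + \left(\alpha^{-}\right)^{T}\left(K^{T}\left(X_{\phi}^{T} w - y\right) - \xi^{-}\right) - \left(\mu^{+}\right)^{T}\xi^{+} - \left(\mu^{-}\right)^{T}\xi^{-} \tag{3}$$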
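As a short worked step (a sketch of the standard Lagrangian argument; the complete derivation is given by Hochreiter and Obermayer 2006), setting the derivatives of (3) with respect to w and the slack variables to zero gives

$$\frac{\partial L}{\partial w} = X_{\phi} X_{\phi}^{T} w - X_{\phi} K \alpha = 0,$$

$$\frac{\partial L}{\partial \xi^{+}} = C\,\mathbf{1} - \alpha^{+} - \mu^{+} = 0, \qquad \frac{\partial L}{\partial \xi^{-}} = C\,\mathbf{1} - \alpha^{-} - \mu^{-} = 0.$$

Because $\mu^{+}, \mu^{-} \ge 0$, the last two conditions give $0 \le \alpha^{+}, \alpha^{-} \le C$, and therefore $-C \le \alpha = \alpha^{+} - \alpha^{-} \le C$, which is the box constraint (5) of the dual problem.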


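The objective function and constraints of the dual optimization problem are then obtained:

$$\min_{\alpha} \; \frac{1}{2}\,\alpha^{T} K^{T} K \alpha - y^{T} K \alpha \tag{4}$$

$$\text{s.t.} \quad -C \le \alpha \le C, \tag{5}$$

where the penalty parameter C bounds each component of $\alpha$ from above and below.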
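The letter solves (4) and (5) with a modified SMO algorithm, which is not reproduced here. As a minimal sketch with a swapped-in technique, the code below uses projected gradient descent, a standard method for box-constrained quadratic programmes. It relies on the identity $\frac{1}{2}\alpha^{T}K^{T}K\alpha - y^{T}K\alpha = \frac{1}{2}\|K\alpha - y\|^{2} - \frac{1}{2}y^{T}y$, so the dual is a box-constrained least-squares fit of $K\alpha$ to y. The function names are hypothetical, and the fixed step size and tolerance are illustrative assumptions.

```python
import numpy as np


def psvm_dual_projected_gradient(K, y, C, n_iter=5000, tol=1e-10):
    """Minimise 0.5 * a^T K^T K a - y^T K a subject to -C <= a <= C.

    Projected gradient descent (NOT the modified SMO used in the letter):
    take a gradient step of size 1/L, where L is the largest eigenvalue of
    K^T K, then clip every component back into the box [-C, C].
    """
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    Q = K.T @ K                      # Hessian of the dual objective
    q = K.T @ y                      # linear term of the dual objective
    L = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient
    step = 1.0 / L if L > 0 else 1.0
    a = np.zeros(K.shape[1])
    for _ in range(n_iter):
        grad = Q @ a - q
        a_new = np.clip(a - step * grad, -C, C)  # projection onto the box (5)
        if np.max(np.abs(a_new - a)) < tol:      # stop when the iterate settles
            return a_new
        a = a_new
    return a


def dual_objective(K, y, a):
    """Value of the dual objective (4) for a given alpha."""
    K = np.asarray(K, dtype=float)
    return 0.5 * a @ (K.T @ K) @ a - np.asarray(y, dtype=float) @ K @ a
```

With K equal to the identity, the box-constrained optimum is simply y clipped to [-C, C], which makes a convenient sanity check.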

The final predictor, or discriminant function, of the P-SVM solution is:

$$f(x) = \operatorname{sgn}\left( \sum_{i \in \text{SV}} \alpha_i K(x_i, x) + b \right), \tag{6}$$

where b is the bias, $b = \frac{1}{m}\sum_{i=1}^{m} y_i$, and the $x_i$ are the support vectors used in the predictor.
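Equation (6) can be sketched as follows for the RBF kernel used later in the experiments. This is an illustration, not the authors' code: the function names are hypothetical, the sum runs over the support vectors passed in explicitly, and we assume that m in the bias formula is the number of training labels, which the text does not define.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Matrix of K(a, b) = exp(-gamma * ||a - b||^2) for rows a of A, b of B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off


def psvm_predict(X_new, sv, alpha_sv, y_train, gamma):
    """Equation (6): sgn(sum_i alpha_i K(x_i, x) + b) with b = mean of labels.

    Assumption: b is the mean over all m training labels y_i.
    """
    b = float(np.mean(y_train))
    scores = rbf_kernel(X_new, sv, gamma) @ np.asarray(alpha_sv, dtype=float) + b
    return np.where(scores >= 0, 1, -1)  # sgn, with a score of 0 mapped to +1
```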

In standard SVM, the kernel function must be a Mercer kernel. The original Mercer theorem can be found in Cristianini and Shawe-Taylor (2000) and is not restated in detail in this letter. Its conclusion is that the Mercer conditions are equivalent to requiring that, for any finite subset of the input space X, the corresponding kernel matrix is positive semi-definite. Kernel functions satisfying this theorem are often called Mercer kernels. In contrast to SVM, P-SVM can handle non-Mercer kernels directly. The proof is not given in this letter; a theoretical justification is provided in the original paper (Hochreiter and Obermayer 2006).
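The positive semi-definiteness condition above can be checked numerically on a finite sample: if the kernel matrix of any finite set of points has a negative eigenvalue, the kernel is not a Mercer kernel. The sketch below contrasts the RBF kernel with K(x, z) = |x − z|, which is chosen here only as a simple non-Mercer example (it is not the sine kernel discussed later). Passing the check on one sample does not prove that a kernel is Mercer; failing it does prove that it is not.

```python
import numpy as np


def min_eigenvalue(gram):
    """Smallest eigenvalue of a symmetric kernel (Gram) matrix."""
    return float(np.linalg.eigvalsh(np.asarray(gram, dtype=float)).min())


def is_psd(gram, tol=1e-10):
    """True if the matrix is positive semi-definite up to round-off."""
    return min_eigenvalue(gram) >= -tol


x = np.array([0.0, 1.0, 2.0])
diff = x[:, None] - x[None, :]
rbf_gram = np.exp(-0.5 * diff**2)  # RBF kernel: a Mercer kernel
dist_gram = np.abs(diff)           # |x - z|: zero trace, so some eigenvalue < 0
```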

3. Experiments and discussion

To assess the applicability and potential of the new algorithm for remotely sensed data, it was applied to the ASTER SWIR and ADS40 datasets. The experiments compared P-SVM and standard SVM in terms of classification accuracy, training and classification time, and number of support vectors.

The procedure for both the P-SVM and the standard SVM experiments comprised sample selection, training and classification. All samples were pure pixels extracted randomly from the images, and the training and testing samples were selected independently. The class definitions, descriptions and numbers of samples for each experiment are listed in tables 1 and 5. Before training, the hyperparameters of P-SVM and SVM were determined by grid search and cross validation. In a grid search, candidate hyperparameter values are taken sequentially from a range at a fixed interval, and the final values are selected by cross validation.
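Grid search with k-fold cross validation, as described above, can be sketched generically. The letter does not give its search ranges or interval, so `EXAMPLE_GRID` is an illustrative assumption; `fit` and `predict` are hypothetical callbacks standing in for P-SVM or SVM training and prediction. The default k = 5 matches the 5-fold cross validation used in this letter.

```python
import itertools

import numpy as np

# Illustrative exponential grid for an RBF-kernel classifier (an assumption,
# not the ranges used in the letter).
EXAMPLE_GRID = {
    "C": [2.0**k for k in range(-5, 16, 2)],
    "gamma": [2.0**k for k in range(-15, 4, 2)],
}


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_val_accuracy(fit, predict, X, y, params, k=5):
    """Mean held-out accuracy over k folds for one hyperparameter setting."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx], **params)
        scores.append(np.mean(predict(model, X[test_idx]) == y[test_idx]))
    return float(np.mean(scores))


def grid_search(fit, predict, X, y, grid, k=5):
    """Evaluate every combination in `grid`; keep the best cross-validated one."""
    names = list(grid)
    best_params, best_score = None, -np.inf
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(fit, predict, X, y, params, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `fit` would train a classifier with the given C and gamma; the exhaustive loop is what makes the grid search expensive when the grid is fine.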

Table 1. Description of samples extracted from ASTER SWIR data.

Class no.  Class name     Description                      Training  Testing
1          Water          Rivers and lakes                      202      410
2          Vegetation     Forests                               121      269
3          Crops          With coverage                         208      352
4          Cut crops      Without coverage                      224      349
5          Built-up area  City and towns                        221      567
6          Bare land      Bare soil and uncultivated land       184      387
7          Asphalt roads  Main roads, over 50 m in width         66      102

Remote Sensing Letters 6031


In this letter, 5-fold cross validation was used, and the optimal hyperparameters were selected automatically by the program. For comparison with standard SVM, the P-SVM experiment employed an RBF kernel function, $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}$. Although the P-SVM algorithm can directly handle non-Mercer kernels such as the sine kernel (Hochreiter and Obermayer 2006), the parameters of this kernel are complex and difficult to determine. Therefore, the properties of non-Mercer kernels and their applicability to remotely sensed data are not addressed in this letter and are left for future work. The SVM family of algorithms was originally designed for binary classification and cannot directly handle multi-class problems. However, this issue can be resolved by training several SVMs in a one-against-all or one-against-one scheme. The most common multi-class approach is one-against-all, which has been shown to be robust and accurate (Rifkin and Klautau 2004). The one-against-all approach is used in this letter: besides being robust and accurate, it gives direct access to the support vectors of each class. The standard SVM implementation used in this letter is LibSVM (Chang and Lin 2001). The PC used in the experiments was equipped with an Intel E6600 dual-core processor and 2 GB of memory, running Windows XP.
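The letter does not spell out how the binary outputs are combined in its one-against-all scheme. A common rule, assumed in the sketch below, trains one binary classifier per class (that class labelled +1, all others −1) and assigns each sample to the class whose classifier returns the largest real-valued output, i.e. the argument of sgn in equation (6). The function names are hypothetical.

```python
import numpy as np


def one_vs_all_targets(y, positive_class):
    """Binary +1/-1 labels for the classifier of `positive_class`."""
    return np.where(np.asarray(y) == positive_class, 1, -1)


def one_vs_all_predict(decision_values, classes):
    """Assign each sample to the class whose binary classifier scores highest.

    decision_values has shape (n_samples, n_classes); column j holds the
    real-valued output (before sgn) of the classifier trained for classes[j].
    """
    decision_values = np.asarray(decision_values, dtype=float)
    return np.asarray(classes)[np.argmax(decision_values, axis=1)]
```

Taking the arg-max of raw outputs, rather than the signs, means that every pixel receives a label even when all binary classifiers return a negative score.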

3.1 ASTER SWIR data experiment

In this experiment, P-SVM and standard SVM were applied to ASTER SWIR data acquired on 2 July 2004 over the city of Beijing, China. The image has six bands with a spatial resolution of 30 m, and a subset of 1033 × 1112 pixels was used in the experiment. The training and testing samples are defined in table 1. The number of support vectors, training time and classification time are listed in table 2, and some of the points used as support vectors are listed in table 3. Confusion matrices were used to evaluate classification accuracy (table 4).

According to table 2, the training time of P-SVM is longer than that of SVM. Because the objective function and constraints of P-SVM are more complex than those of SVM, more time is needed to solve the dual optimization problem and find the support vectors. The classification time of P-SVM and SVM was 46 s versus 103 s in this experiment. The shorter time taken by P-SVM is mainly attributable to its smaller number of support vectors (121 versus 201). Table 3 lists some of the points used as support vectors; their placement differs between P-SVM and SVM. P-SVM tends to select support vectors in the region around the normal vector of the hyperplane, whereas standard SVM selects support vectors near the hyperplane (Hochreiter and Obermayer 2006). In practice, the support vectors depend strongly on the specific data and may vary between datasets. For the ASTER data in this experiment, the distributions of support vectors in P-SVM and SVM are quite different, but the overall accuracy and Kappa coefficient obtained by the two approaches are similar. According to table 4, the overall accuracy of P-SVM is slightly higher than that of SVM (88.71% versus 87.77%).

Table 2. Number of support vectors and time.

Approach  Training time  Classification time  Support vectors per class (total)
P-SVM     11 s           46 s                 16, 18, 18, 16, 16, 17, 20 (121)
SVM       2 s            103 s                15, 38, 11, 33, 14, 39, 51 (201)
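To make concrete the argument that classification time grows with the number of support vectors, the sketch below counts the kernel evaluations needed to classify the 1033 × 1112 ASTER subset with the support-vector counts of table 2. It is an idealized model, not a measurement: it assumes that every pixel is compared once with every support vector of every one-against-all classifier, and that no kernel values are cached or shared between classifiers. The names are hypothetical.

```python
# Support vectors per one-against-all classifier, copied from table 2.
SV_PSVM = [16, 18, 18, 16, 16, 17, 20]
SV_SVM = [15, 38, 11, 33, 14, 39, 51]
N_PIXELS = 1033 * 1112  # size of the ASTER subset used in section 3.1


def kernel_evaluations(sv_counts, n_pixels):
    """Kernel evaluations needed to classify every pixel once, assuming each
    pixel is compared with every support vector of every binary classifier
    and nothing is cached between classifiers."""
    return n_pixels * sum(sv_counts)
```

Under these assumptions, P-SVM needs 121/201, or about 0.60, of the kernel evaluations required by the standard SVM.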



For clarity, the user’s and producer’s accuracies are shown in figure 1. The water class is classified with high accuracy by both approaches. For the crops and bare land classes, the two approaches obtain similar user’s accuracies. P-SVM yields higher user’s accuracy for the vegetation and asphalt roads classes, especially for asphalt roads (84.81% versus 52.21%). However, the standard SVM is slightly more accurate than P-SVM for the cut crops and built-up area classes. In terms of producer’s accuracy, P-SVM is more accurate for the vegetation, cut crops and asphalt roads classes, whereas SVM achieves higher accuracy for the crops and bare land classes. From table 4 and figure 1, no conclusion can be drawn about which approach is more accurate. The results only indicate that P-SVM achieves accuracy at least comparable to that of the standard SVM, and that it is a sensible choice for remotely sensed data classification because its total processing time is lower.

Table 3. Points used as support vectors (indices among the 1226 training points; partial list).

Class no.  Approach  Support vector index (partial list)
1          P-SVM     83, 107, 267, 356, 371, 508, 509, 512, 514, 528, 742, 789, 1121, 1130, 1159
           SVM       83, 268, 370, 557, 672, 708, 731, 743, 769, 781, 979, 992, 1016, 1121, 1133
3          P-SVM     66, 260, 270, 398, 489, 505, 564, 716, 730, 742, 803, 972, 975, 1132, 1196
           SVM       325, 400, 405, 448, 460, 461, 573, 591, 847, 971, 974
5          P-SVM     36, 82, 137, 304, 368, 404, 420, 568, 686, 692, 698, 776, 918, 929, 1132
           SVM       400, 405, 457, 460, 743, 897, 899, 900, 913, 971, 974, 1193, 1222, 1223
7          P-SVM     128, 267, 270, 272, 273, 369, 404, 730, 768, 806, 902, 1132, 1137, 1163, 1171
           SVM       209, 211, 219, 227, 230, 247, 251, 253, 254, 307, 308, 897, 900, 904, 913

Table 4. Confusion matrices and kappa coefficients for P-SVM and SVM classification of ASTER data. Rows: classified class; columns: reference class. UA: user’s accuracy; PA: producer’s accuracy.

P-SVM: overall accuracy = 88.71%, kappa coefficient = 0.8650

Class no.        1      2      3      4      5      6      7   UA (%)
1              409      0      0      0      0      0      0   100
2                1    245      0      0      0      0     26   90.07
3                0      0    301      0     36      0      0   89.32
4                0      0     51    283      0     60      0   71.83
5                0     14      0      0    529      0      9   95.83
6                0      0      0     66      0    327      0   83.21
7                0     10      0      0      2      0     67   84.81
PA (%)       99.76  91.08  85.51  81.09  93.30  84.50  65.69

SVM: overall accuracy = 87.77%, kappa coefficient = 0.8541

Class no.        1      2      3      4      5      6      7   UA (%)
1              409      0      0      0      0      0      0   100
2                1    214      0      0      0      0     43   82.95
3                0      0    315      0     36      0      0   89.74
4                0      0     37    276      0     51      0   75.82
5                0      3      0      0    529      0      0   99.44
6                0      0      0     73      0    336      0   82.15
7                0     52      0      0      2      0     59   52.21
PA (%)       99.76  79.55  89.49  79.08  93.30  86.82  57.84
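The accuracy measures in tables 4 and 7 follow directly from the confusion matrices. The sketch below computes overall accuracy, the kappa coefficient (with chance agreement estimated from the row and column totals), and user’s and producer’s accuracies, assuming the orientation of the tables: rows are the classified class and columns the reference class. The function name is hypothetical; the matrix is the P-SVM matrix of table 4.

```python
import numpy as np


def accuracy_report(cm):
    """Overall accuracy, kappa, user's and producer's accuracy.

    cm[i, j] = number of test pixels classified as class i whose reference
    class is j (rows: classification, columns: reference).
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / n
    # Chance agreement estimated from the row and column marginals.
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
    kappa = (overall - expected) / (1.0 - expected)
    users = diag / cm.sum(axis=1)      # 1 - commission error, per class
    producers = diag / cm.sum(axis=0)  # 1 - omission error, per class
    return overall, kappa, users, producers


# P-SVM confusion matrix for the ASTER data (table 4).
PSVM_ASTER = [
    [409, 0, 0, 0, 0, 0, 0],
    [1, 245, 0, 0, 0, 0, 26],
    [0, 0, 301, 0, 36, 0, 0],
    [0, 0, 51, 283, 0, 60, 0],
    [0, 14, 0, 0, 529, 0, 9],
    [0, 0, 0, 66, 0, 327, 0],
    [0, 10, 0, 0, 2, 0, 67],
]
```

Running `accuracy_report(PSVM_ASTER)` reproduces the overall accuracy of 88.71% and the kappa coefficient of 0.8650 reported in table 4.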

3.2 ADS40 data experiment

The ADS40 airborne digital sensor image acquired on 11 May 2005 over the northern area of Beijing was used to evaluate the classification accuracy of P-SVM on ultra-high-spatial-resolution data. The image has three bands with a spatial resolution of about 0.1 m, and a subset of 1952 × 1980 pixels was used in the experiment. The training and testing samples are defined in table 5. The number of support vectors, training time and classification time are listed in table 6, and confusion matrices were used to evaluate classification accuracy (table 7).

This experiment on ADS40 data gives results similar to those of the previous one. The training time of the P-SVM approach was longer than that of standard SVM, but SVM spent more time on classification. Hence, P-SVM needs less time for training and classification combined (34 s + 348 s = 382 s versus 3 s + 541 s = 544 s); again, the fewer support vectors (228 versus 342) lead to reduced time. According to the confusion matrices (table 7), neither approach is uniformly more accurate than the other. P-SVM performs slightly better than SVM in terms of overall accuracy and kappa coefficient (90.24% and 0.8819 versus 88.23% and 0.8572), but it produces larger errors of commission in the trees and asphalt roads classes. For errors of omission, P-SVM is at least as accurate as SVM for every class except trees. The experimental results indicate that the P-SVM approach is competitive, often performs more accurately than the standard SVM approach, and needs less total time for training and classification.

Figure 1. Comparison of accuracy of P-SVM and SVM classification. (a) User’s accuracy, (b) producer’s accuracy.

Table 5. Description of samples extracted from ADS40 data.

Class no.  Class               Description                              Training  Testing
1          Water               Ponds                                         655      873
2          Grassland           Grassland                                     672      928
3          Trees               Trees                                         314      686
4          Asphalt roads       Highways, more than 50 m in width             237      466
5          Paved roads         Internal roads, less than 50 m in width       323      488
6          Concrete buildings  Housing on side of roads                      459     1140

Table 6. Number of support vectors and time.

Approach  Training time  Classification time  Support vectors per class (total)
P-SVM     34 s           348 s                28, 43, 45, 16, 59, 37 (228)
SVM       3 s            541 s                41, 30, 30, 44, 129, 68 (342)

Table 7. Confusion matrices and kappa coefficients for P-SVM and SVM classification of ADS40 data. Rows: classified class; columns: reference class. UA: user’s accuracy; PA: producer’s accuracy.

P-SVM: overall accuracy = 90.24%, kappa coefficient = 0.8819

Class no.        1      2      3      4      5      6   UA (%)
1              823      0      0      0      0      0   100
2                0    928     14      0      0      0   98.51
3                0      0    663      0      0     84   88.76
4               50      0      8    466      0     77   77.54
5                0      0      0      0    457    182   71.52
6                0      0      1      0     31    797   96.14
PA (%)       94.27    100  96.65    100  93.65  69.91

SVM: overall accuracy = 88.23%, kappa coefficient = 0.8572

Class no.        1      2      3      4      5      6   UA (%)
1              823      0      0      0      0      0   100
2                1    928     18      0      0    141   85.29
3                0      0    667      0      0      0   100
4                1      0      0    466     51     70   79.25
5                0      0      0      0    404    175   69.78
6               48      0      1      0     33    754   90.19
PA (%)       94.27    100  97.23    100  82.79  66.14

4. Conclusion

The P-SVM algorithm is a novel classification approach for remotely sensed data, and its applicability was demonstrated through two experiments on different remote sensing datasets. Although firm conclusions cannot be drawn from such limited experiments and datasets, the results indicate that the new approach is at least competitive with, and often more accurate than, the standard SVM approach, while requiring less total time. The use of non-Mercer kernels in remote sensing will be explored in future research.

Acknowledgements

This study was supported financially by the Knowledge Innovation Program of the Chinese Academy of Sciences (No. kzcx2-yw-313-3) and the National High Technology Research and Development Program of China (No. 2007AA12Z157, No. 2006AA12Z130). We are grateful to the referees for their helpful comments.

References

CHANG, C.C. and LIN, C.J., 2001, LIBSVM: a library for support vector machines. Available online at: http://www.csie.ntu.edu.tw/~cjlin/libsvm (accessed 20 July 2007).



CRISTIANINI, N. and SHAWE-TAYLOR, J., 2000, An Introduction to Support Vector Machines and other Kernel-based Learning Methods (Cambridge: Cambridge University Press).

HOCHREITER, S. and OBERMAYER, K., 2006, Support vector machines for dyadic data. Neural Computation, 18, pp. 1472–1510.

INGLADA, J., 2007, Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features. ISPRS Journal of Photogrammetry and Remote Sensing, 62, pp. 236–248.

KEUCHEL, J., NAUMANN, S., HEILER, M. and SIEGMUND, A., 2003, Automatic land cover analysis for Tenerife by supervised classification using remotely sensed data. Remote Sensing of Environment, 86, pp. 530–541.

MELGANI, F. and BRUZZONE, L., 2004, Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 42, pp. 1778–1790.

NEMMOUR, H. and CHIBANI, Y., 2006, Multiple support vector machines for land cover change detection: an application for mapping urban extensions. ISPRS Journal of Photogrammetry and Remote Sensing, 61, pp. 125–133.

PAL, M. and MATHER, P.M., 2005, Support vector machines for classification in remote sensing. International Journal of Remote Sensing, 26, pp. 1007–1011.

PLATT, J., 1998, Sequential minimal optimization: a fast algorithm for training support vector machines. In Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C.J.C. Burges and A.J. Smola (Eds), pp. 169–182 (Cambridge, MA: MIT Press).

RIFKIN, R. and KLAUTAU, A., 2004, In defense of one-vs-all classification. Journal of Machine Learning Research, 5, pp. 101–141.

VAPNIK, V.N., 1995, The Nature of Statistical Learning Theory (New York: Springer-Verlag).

ZHU, G. and BLUMBERG, D.G., 2002, Classification using ASTER data and SVM algorithms. The case study of Beer Sheva, Israel. Remote Sensing of Environment, 80, pp. 233–240.
