International Journal of Remote Sensing
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tres20
An improved SVM method P-SVM for classification of remotely sensed data
R. Zhang a b & J. Ma a
a State Key Laboratory of Remote Sensing Science, Jointly Sponsored by the Institute of Remote Sensing Applications, Chinese Academy of Sciences, and Beijing Normal University, Beijing, 100101, PR China
b Graduate University, Chinese Academy of Sciences, Beijing, 100049, PR China
Published online: 20 Sep 2008.
To cite this article: R. Zhang & J. Ma (2008): An improved SVM method P-SVM for classification of remotely sensed data, International Journal of Remote Sensing, 29:20, 6029-6036
To link to this article: http://dx.doi.org/10.1080/01431160802220151
Letter
An improved SVM method P-SVM for classification of remotely sensed data
R. ZHANG*†‡ and J. MA†
†State Key Laboratory of Remote Sensing Science, Jointly Sponsored by the Institute of
Remote Sensing Applications, Chinese Academy of Sciences, and Beijing Normal
University, Beijing, 100101, PR China
‡Graduate University, Chinese Academy of Sciences, Beijing, 100049, PR China
(Received 13 October 2007; in final form 20 May 2008)
A support vector machine (SVM) is a mathematical tool based on the
structural risk minimization principle. It seeks a hyperplane in a high
dimensional feature space in order to solve linearly inseparable problems. SVM has
been applied within the remote sensing community to multispectral and
hyperspectral imagery analysis. However, the standard SVM has some technical
disadvantages. For instance, the solution of an SVM learning problem is scale
sensitive, and the process is time-consuming. The Potential SVM (P-SVM)
algorithm was proposed to overcome these shortcomings of the standard SVM, and it has
shown some improvements. In this letter, the P-SVM algorithm is introduced
into the classification of multispectral and high-spatial resolution remotely sensed data,
and it is applied to ASTER imagery and ADS40 imagery respectively.
Experimental results indicate that P-SVM is competitive with the standard
SVM algorithm in terms of classification accuracy for remotely sensed data,
and that it needs less time.
1. Introduction
A support vector machine (SVM) is a machine learning algorithm based on statistical
learning theory proposed by Vapnik (1995), and used in data classification and
analysis. SVM classifies binary data by determining the separating hyperplane in a
high dimensional feature space, where the maximum margin between the two classes is
obtained from the training data. The SVM approach has been used in pattern
recognition, such as handwriting character recognition, text classification and medical
imaging analysis. The approach has also been applied in the remote sensing field.
SVM was used to classify ASTER data (Zhu and Blumberg 2002), in comparison with
the maximum likelihood classifier (MLC) approach, and also to estimate the accuracy
of classification for remotely sensed imagery (Keuchel et al. 2003). It has been used in
land cover change detection (Nemmour and Chibani 2006) and applied to
classification of hyper-spectral data (Melgani and Bruzzone 2004, Pal and Mather
2005). In addition, SVM has been used in the automatic extraction of man-made
objects from high-spatial resolution remote sensing images (Inglada 2007).
For remote sensing data classification, SVM can often achieve higher accuracy
compared with some traditional classification algorithms, such as MLC, decision
*Corresponding author. Email: [email protected]
International Journal of Remote Sensing
Vol. 29, No. 20, 20 October 2008, 6029–6036
International Journal of Remote SensingISSN 0143-1161 print/ISSN 1366-5901 online # 2008 Taylor & Francis
http://www.tandf.co.uk/journalsDOI: 10.1080/01431160802220151
tree (DT), and artificial neural networks (ANN). However, the SVM algorithm still
contains some technical and conceptual restrictions. Firstly, because the final
predictor depends on how the training data have been scaled, the solution of an SVM
learning problem is scale sensitive. Secondly, the computational cost of SVM
classification is large; it is therefore time-consuming, especially when the image
is large. Moreover, the kernel function used in the SVM approach must meet the
Mercer condition, which means the kernel has to be positive semi-definite. To
overcome the disadvantages of the standard SVM mentioned above, Hochreiter and
Obermayer (2006) proposed a novel SVM method—Potential SVM (P-SVM). The
new approach defines a new objective function and constraints and it has been
applied to biomedical data analysis such as protein classification and variable
selection for genetic data.
In this letter, the P-SVM approach is introduced into remote sensing
classification. Experiments were performed on ASTER multi-spectral data and ADS40
airborne digital sensor imagery of Beijing. The classification accuracy and the
time cost of P-SVM and the standard SVM are compared.
2. Theory of P-SVM
The P-SVM algorithm includes several improvements over SVM. It uses a novel
objective function to overcome the problem of scale sensitivity in the standard SVM,
and newly introduced constraints are employed to control the empirical error.
Compared with the standard SVM, the number of support vectors found by P-SVM
is usually smaller (Hochreiter and Obermayer 2006). In general, the
classification time is dominated by the kernel evaluations and is ultimately
proportional to the number of support vectors; hence, a smaller number of
support vectors leads to reduced processing time. A modified sequential minimal
optimization (SMO) algorithm, based on the original SMO algorithm proposed by
Platt (1998), is implemented to decompose the optimization problem. In the
following, the mathematical formulation of P-SVM is outlined briefly.
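Consider a two-class classification task in which $(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k)$,
with $y_i \in \{+1, -1\}$, denote the $k$ training samples. The objective function and
constraints of the primal problem for P-SVM classification can be written as
equations (1) and (2):
$$\min_{w,\,\xi^{+},\,\xi^{-}} \; \frac{1}{2}\left\| X_{\phi}^{T} w \right\|^{2} + C\left(\xi^{+} + \xi^{-}\right) \tag{1}$$
$$\text{s.t.}\quad K^{T}\left(X_{\phi}^{T} w - y\right) + \xi^{+} \geq 0,\quad K^{T}\left(X_{\phi}^{T} w - y\right) - \xi^{-} \leq 0,\quad 0 \leq \xi^{+},\,\xi^{-}, \tag{2}$$
where $\xi^{+}$, $\xi^{-}$ are slack variables, $X_{\phi}$ is the mapping in high dimensional feature
space, $K$ is the kernel function and $C$ is the penalty parameter.
To obtain the dual problem, the Lagrange function $L$ is introduced, where $\alpha^{+} \geq 0$,
$\alpha^{-} \geq 0$, $\mu^{+} \geq 0$, $\mu^{-} \geq 0$ denote the Lagrange multipliers and $\alpha = \alpha^{+} - \alpha^{-}$:
$$L = \frac{1}{2} w^{T} X_{\phi} X_{\phi}^{T} w + C\left(\xi^{+} + \xi^{-}\right) - \left(\alpha^{+}\right)^{T}\left[K^{T}\left(X_{\phi}^{T} w - y\right) + \xi^{+}\right] + \left(\alpha^{-}\right)^{T}\left[K^{T}\left(X_{\phi}^{T} w - y\right) - \xi^{-}\right] - \left(\mu^{+}\right)^{T}\xi^{+} - \left(\mu^{-}\right)^{T}\xi^{-} \tag{3}$$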
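The objective function and constraints in dual optimization form are then obtained:
$$\min_{\alpha} \; \frac{1}{2}\alpha^{T} K^{T} K \alpha - y^{T} K \alpha \tag{4}$$
$$\text{s.t.}\quad -C \leq \alpha \leq C, \tag{5}$$
where the parameter $C$ bounds the magnitude of each component of $\alpha$.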
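The dual in equations (4) and (5) is a quadratic problem with simple box constraints, so its structure can be illustrated with very little code. The sketch below is not the modified SMO algorithm used in the letter; it swaps in plain projected gradient descent purely for illustration, and the function name `solve_psvm_dual`, the step-size rule and the iteration count are assumptions.

```python
def solve_psvm_dual(K, y, C, iterations=500):
    """Minimise 0.5 * a^T K^T K a - y^T K a subject to -C <= a_i <= C.

    K is a list of rows (kernel matrix), y the label vector. Illustrative
    projected gradient descent, not the SMO variant used in the letter.
    """
    n_rows, n_cols = len(K), len(K[0])
    # Q = K^T K is positive semi-definite; c = K^T y.
    Q = [[sum(K[r][i] * K[r][j] for r in range(n_rows)) for j in range(n_cols)]
         for i in range(n_cols)]
    c = [sum(K[r][i] * y[r] for r in range(n_rows)) for i in range(n_cols)]
    # trace(Q) is an upper bound on the largest eigenvalue of a PSD matrix,
    # so 1 / trace(Q) is a safe, if conservative, step size.
    trace = sum(Q[i][i] for i in range(n_cols))
    step = 1.0 / trace if trace > 0 else 1.0
    alpha = [0.0] * n_cols
    for _ in range(iterations):
        # Gradient of the objective: Q a - c.
        grad = [sum(Q[i][j] * alpha[j] for j in range(n_cols)) - c[i]
                for i in range(n_cols)]
        # Gradient step followed by projection onto the box [-C, C].
        alpha = [max(-C, min(C, alpha[i] - step * grad[i]))
                 for i in range(n_cols)]
    return alpha
```

With an identity kernel matrix and labels (+1, -1), the unconstrained minimiser is the label vector itself; choosing C below 1 clips each component to the box, which is the role equation (5) gives to C.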
The final predictor, or discriminant function, of the SVM solution is:
$$f(x) = \operatorname{sgn}\left(\sum_{\text{support vectors}} \alpha_{i} K(x_{i} \cdot x) + b\right), \tag{6}$$
where $b$ is the bias, $b = \frac{1}{m}\sum_{i=1}^{m} y_{i}$, and $x_{i}$ is a support vector used in the predictor.
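A minimal sketch of the predictor in equation (6) may help make the roles of the support vectors, the multipliers and the bias concrete. It assumes the RBF kernel that the letter uses in its experiments; the function names, the parameter `gamma` and the choice to map a score of exactly zero to +1 are assumptions for illustration.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def bias_from_labels(labels):
    """Bias b = (1/m) * sum(y_i), as stated for equation (6)."""
    return sum(labels) / len(labels)


def predict(x, support_vectors, alphas, b, gamma):
    """Return sgn(sum_i alpha_i * K(x_i, x) + b) as +1 or -1."""
    score = sum(alpha * rbf_kernel(sv, x, gamma)
                for sv, alpha in zip(support_vectors, alphas)) + b
    # Assumption: a score of exactly zero is assigned to the +1 class.
    return 1 if score >= 0 else -1
```

The cost of one call to `predict` is one kernel evaluation per support vector, which is why a smaller set of support vectors shortens classification time.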
In standard SVM, the kernel function must be a Mercer kernel. The original
Mercer theorem can be found in Cristianini and Shawe-Taylor (2000) and it will not
be given in detail in this letter. In practical terms, the conditions
of the Mercer theorem are equivalent to requiring that, for any finite subset of the input
space X, the corresponding kernel matrix is positive semi-definite. Kernel functions
satisfying this theorem are often called Mercer kernels. In contrast to SVM, P-SVM
can handle non-Mercer kernels directly. The proof is not
given in this letter; the theoretical justification is provided by
Hochreiter and Obermayer (2006).
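As a small illustration of the positive semi-definiteness requirement, the sketch below computes the eigenvalues of a 2 × 2 kernel matrix built from two one-dimensional points. The RBF form follows the kernel used in the experiments of this letter; the sine kernel form `sin(theta * |u - v|)`, the parameter values and all function names are assumptions made for illustration only. A single positive semi-definite matrix does not prove that a kernel is a Mercer kernel, but one negative eigenvalue is enough to show that it is not.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [b, d]] in ascending order."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def gram_2x2(kernel, x1, x2):
    """Entries (a, b, d) of the kernel matrix of `kernel` on points x1, x2."""
    return kernel(x1, x1), kernel(x1, x2), kernel(x2, x2)


def rbf(u, v, gamma=0.5):
    return math.exp(-gamma * (u - v) ** 2)


def sine(u, v, theta=1.0):
    # Assumed illustrative form of a sine kernel; not taken from the letter.
    return math.sin(theta * abs(u - v))


rbf_low, rbf_high = symmetric_2x2_eigenvalues(*gram_2x2(rbf, 0.0, 1.0))
sine_low, sine_high = symmetric_2x2_eigenvalues(*gram_2x2(sine, 0.0, 1.0))
```

For these two points the RBF matrix has both eigenvalues positive, whereas the assumed sine kernel gives a zero diagonal and a non-zero off-diagonal entry, so its smaller eigenvalue is negative.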
3. Experiments and discussion
To assess the applicability and potential of P-SVM for remotely sensed data, the algorithm
was applied to the ASTER SWIR and ADS40 datasets. The experiments
compared P-SVM and standard SVM in terms of classification accuracy,
time for training and classification, and number of support vectors.
The procedure for the P-SVM and standard SVM experiments includes selection
of samples, training and classification. All the samples were pure pixels extracted
randomly from the images, and the samples for training and testing were selected
independently. The class definitions, descriptions and the number of samples for
each experiment are listed in tables 1 and 5. Before training, the hyperparameters
used in P-SVM and SVM were determined by grid search and cross validation. In grid
search, candidate hyperparameter values are taken in turn from a
range at a fixed interval, and the final values are selected by cross validation.
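The grid search and cross validation procedure described above can be sketched as follows. This is a generic outline under stated assumptions rather than the program used in the letter: `fit_and_score` is a hypothetical caller-supplied function that trains a classifier on the training split and returns its accuracy on the held-out split, and the contiguous folds assume the samples have already been shuffled.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(samples, labels, grid, fit_and_score, k=5):
    """Return (best_params, best_mean_score) over every point of the grid.

    `grid` maps hyperparameter names to lists of candidate values.
    """
    folds = k_fold_indices(len(samples), k)
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(samples)) if i not in held]
            scores.append(fit_and_score(
                [samples[i] for i in train], [labels[i] for i in train],
                [samples[i] for i in held_out], [labels[i] for i in held_out],
                params))
        # Mean validation score over the k folds for this grid point.
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

For an RBF-kernel classifier the grid would typically hold candidate values for the penalty parameter C and the kernel width, for example `{"C": [...], "gamma": [...]}`; the actual ranges and intervals used in the experiments are not stated in the letter.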
Table 1. Description of samples extracted from ASTER SWIR data.
Class no.  Class name     Description                      Training  Testing
1          Water          Rivers and lakes                 202       410
2          Vegetation     Forests                          121       269
3          Crops          With coverage                    208       352
4          Cut crops      Without coverage                 224       349
5          Built-up area  City and towns                   221       567
6          Bare land      Bare soil and uncultivated land  184       387
7          Asphalt roads  Main roads, over 50 m in width   66        102
In this letter, 5-fold cross validation was used and the optimal hyperparameters
were acquired by the program. To allow comparison with the standard SVM, the P-SVM
experiment employed an RBF kernel function $K(x_i, x_j) = e^{-\gamma \| x_i - x_j \|^{2}}$. Although the
P-SVM algorithm can handle non-Mercer kernels such as the sine kernel
(Hochreiter and Obermayer 2006) directly in calculation, the parameters of this
kernel are complex and difficult to determine. Therefore, the properties of
non-Mercer kernels and their applicability to remotely sensed data are not covered in
this letter, and they need further exploration in future work. SVM-type
algorithms were originally designed for binary classification problems, and they
cannot directly handle the multi-class problem. However, this issue can be resolved
by training several SVMs simultaneously in a one-against-all or one-against-one
scheme. The most common multi-class approach is one-against-all, which has
been demonstrated to be robust and accurate (Rifkin and Klautau 2004). The
one-against-all approach to multi-class SVM is used in this letter. Not only is it a robust
and accurate approach, but it also gives direct access to the support vectors. The
implementation of standard SVM used in the letter is LIBSVM (Chang and Lin 2001). The
PC used in the experiments was equipped with an Intel E6600 dual-core processor
and 2 GB memory, and the operating system was Windows XP.
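The one-against-all scheme mentioned above can be outlined in a few lines. The sketch assumes a hypothetical `train_binary(samples, signs)` function that returns a real-valued decision function, and it assumes that a pixel is assigned to the class with the largest decision value; the letter does not state how the binary outputs were combined, so that rule is an assumption.

```python
def train_one_against_all(samples, labels, train_binary):
    """Train one binary classifier per class (that class = +1, the rest = -1).

    `train_binary` is a caller-supplied (hypothetical) trainer returning a
    function that maps a sample to a real-valued decision value.
    """
    models = {}
    for cls in sorted(set(labels)):
        signs = [1 if label == cls else -1 for label in labels]
        models[cls] = train_binary(samples, signs)
    return models


def predict_one_against_all(models, x):
    """Assign x to the class whose binary decision value is largest."""
    return max(models, key=lambda cls: models[cls](x))
```

With seven land-cover classes this produces seven binary classifiers, each with its own set of support vectors, which is consistent with the per-class support vector counts reported in the tables.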
3.1 ASTER SWIR data experiment
In this experiment, P-SVM and standard SVM were applied to ASTER SWIR data
acquired on 2 July 2004. The study area is the city of Beijing, China. The image has
six bands, the spatial resolution is 30 m, and a subset of 1033 × 1112 pixels was used
in the experiment. The samples used in training and testing are defined in table 1.
The number of support vectors, training time, and classification time are listed in
table 2, and the points used as support vectors are listed in table 3. Confusion
matrices were used to evaluate the accuracy of classification (table 4).
According to table 2, the training time of P-SVM is longer than that of SVM.
Because the objective function and constraints of P-SVM are more complex than
those of SVM, more time is needed to solve the dual form of the optimization problem
and find the support vectors. The classification time of P-SVM was 46 s
versus 103 s for SVM in this experiment. The shorter time taken by P-SVM is mainly attributed to
the smaller number of support vectors. In table 3, some of the points used as support
vectors are listed. The placement of the support vectors is different for P-SVM and
SVM. P-SVM tends to find support vectors in the area around the normal vector of
the hyperplane, while standard SVM tries to find support vectors around the
hyperplane (Hochreiter and Obermayer 2006). In practice, the placement of the
support vectors depends strongly on the specific data, and it may vary between
datasets. For the ASTER data in this experiment, the distribution of support vectors
in P-SVM and SVM is quite different, but the overall accuracy and kappa
coefficient obtained by the two approaches are similar. According to table 4, the overall
accuracy of P-SVM is slightly greater than that of SVM.
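The accuracy measures quoted from table 4 follow directly from the confusion matrix. The sketch below recomputes them for the P-SVM matrix of the ASTER experiment; the orientation (rows as classified class, columns as reference class) is inferred from the position of the user's and producer's accuracy margins in table 4, and the function name `accuracy_report` is an assumption.

```python
def accuracy_report(matrix):
    """Summarise a confusion matrix (rows: classified as, columns: reference)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]
    observed = sum(diagonal) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return {
        "overall": observed,
        "kappa": (observed - expected) / (1 - expected),
        "users": [d / r for d, r in zip(diagonal, row_sums)],
        "producers": [d / c for d, c in zip(diagonal, col_sums)],
    }


# P-SVM confusion matrix for the ASTER experiment, as given in table 4.
PSVM_ASTER = [
    [409, 0, 0, 0, 0, 0, 0],
    [1, 245, 0, 0, 0, 0, 26],
    [0, 0, 301, 0, 36, 0, 0],
    [0, 0, 51, 283, 0, 60, 0],
    [0, 14, 0, 0, 529, 0, 9],
    [0, 0, 0, 66, 0, 327, 0],
    [0, 10, 0, 0, 2, 0, 67],
]
report = accuracy_report(PSVM_ASTER)
```

Rounded to the precision used in table 4, `report["overall"]` is 88.71% and `report["kappa"]` is 0.8650, matching the published values. Errors of commission and omission are the complements of the user's and producer's accuracies respectively.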
Table 2. Number of support vectors and time.
Approach  Training time  Classification time  SV per class (total)
P-SVM     11 s           46 s                 16, 18, 18, 16, 16, 17, 20 (121)
SVM       2 s            103 s                15, 38, 11, 33, 14, 39, 51 (201)
For clarity, user’s accuracy and producer’s accuracy are shown in figure 1. The
water class achieves high accuracy with both approaches. For the crops class and bare
land class, the two approaches obtain similar user’s accuracy. The P-SVM approach
produces higher user’s accuracy for the vegetation and asphalt roads classes, especially
for asphalt roads. However, the standard SVM approach performs slightly more
accurately than P-SVM for the cut crops class and built-up area class. The P-SVM
approach performs more accurately for the vegetation, cut crops, and asphalt roads
classes in producer’s accuracy, whereas the SVM approach achieves greater accuracy
Table 3. Points used as support vectors (in total 1226 points).
Class no.  Approach  Support vector index (part)
1          P-SVM     83, 107, 267, 356, 371, 508, 509, 512, 514, 528, 742, 789, 1121, 1130, 1159
           SVM       83, 268, 370, 557, 672, 708, 731, 743, 769, 781, 979, 992, 1016, 1121, 1133
3          P-SVM     66, 260, 270, 398, 489, 505, 564, 716, 730, 742, 803, 972, 975, 1132, 1196
           SVM       325, 400, 405, 448, 460, 461, 573, 591, 847, 971, 974
5          P-SVM     36, 82, 137, 304, 368, 404, 420, 568, 686, 692, 698, 776, 918, 929, 1132
           SVM       400, 405, 457, 460, 743, 897, 899, 900, 913, 971, 974, 1193, 1222, 1223
7          P-SVM     128, 267, 270, 272, 273, 369, 404, 730, 768, 806, 902, 1132, 1137, 1163, 1171
           SVM       209, 211, 219, 227, 230, 247, 251, 253, 254, 307, 308, 897, 900, 904, 913
Table 4. Confusion matrices and kappa coefficients for P-SVM and SVM classification of ASTER data.

P-SVM: overall accuracy = 88.71%, kappa coefficient = 0.8650
Class no.                1      2      3      4      5      6      7      User’s accuracy (%)
1                        409    0      0      0      0      0      0      100
2                        1      245    0      0      0      0      26     90.07
3                        0      0      301    0      36     0      0      89.32
4                        0      0      51     283    0      60     0      71.83
5                        0      14     0      0      529    0      9      95.83
6                        0      0      0      66     0      327    0      83.21
7                        0      10     0      0      2      0      67     84.81
Producer’s accuracy (%)  99.76  91.08  85.51  81.09  93.30  84.50  65.69

SVM: overall accuracy = 87.77%, kappa coefficient = 0.8541
Class no.                1      2      3      4      5      6      7      User’s accuracy (%)
1                        409    0      0      0      0      0      0      100
2                        1      214    0      0      0      0      43     82.95
3                        0      0      315    0      36     0      0      89.74
4                        0      0      37     276    0      51     0      75.82
5                        0      3      0      0      529    0      0      99.44
6                        0      0      0      73     0      336    0      82.15
7                        0      52     0      0      2      0      59     52.21
Producer’s accuracy (%)  99.76  79.55  89.49  79.08  93.30  86.82  57.84
for the crops and bare land classes in producer’s accuracy. From table 4 and figure 1, a
conclusion cannot be drawn about which approach is more accurate for
classification. The result only indicates that the new P-SVM approach achieves
accuracy comparable to that of the standard SVM, and that it is a sensible choice for
remotely sensed data classification because its processing time is lower.
3.2 ADS40 data experiment
The ADS40 airborne digital sensor image, acquired on 11 May 2005 over
the northern part of Beijing, was used to evaluate the classification accuracy of
P-SVM for ultra-high spatial resolution data. The image has three bands, the spatial
resolution is about 0.1 m, and a subset of 1952 × 1980 pixels was used in the
experiment. The samples used in training and testing are defined in table 5. The
number of support vectors, training time, and classification time are listed in
table 6. Confusion matrices were used to evaluate the accuracy of classification
(table 7).
This experiment on ADS40 data indicates similar results to the previous one. The
training time of the P-SVM approach was longer than that of standard SVM, but
Figure 1. Comparison of accuracy of P-SVM and SVM classification. (a) User’s accuracy, (b) producer’s accuracy.
Table 5. Description of samples extracted from ADS40 data.
Class no.  Class               Description                             Training  Testing
1          Water               Ponds                                   655       873
2          Grassland           Grassland                               672       928
3          Trees               Trees                                   314       686
4          Asphalt roads       Highways, more than 50 m in width       237       466
5          Paved roads         Internal roads, less than 50 m in width 323       488
6          Concrete buildings  Housing on side of roads                459       1140

Table 6. Number of support vectors and time.
Approach  Training time  Classification time  SV per class (total)
P-SVM     34 s           348 s                28, 43, 45, 16, 59, 37 (228)
SVM       3 s            541 s                41, 30, 30, 44, 129, 68 (342)
the SVM spent more time in classification. Hence, P-SVM needs less time when training
and classification are combined. Fewer support vectors lead to reduced time.
According to the confusion matrices (table 7), neither approach is uniformly more accurate than the
other. P-SVM performs slightly more accurately than SVM in terms of overall
accuracy and kappa coefficient, but it produces larger errors of commission (lower
user’s accuracy) in the trees class and asphalt roads class. For error of omission,
P-SVM is at least as accurate as SVM for every class except trees. Experimental results indicate that the P-SVM
approach is competitive, often performs more accurately than the standard SVM
approach, and the total time spent on training and classification is less.
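The time comparison can be checked with simple arithmetic on the figures reported in tables 2 and 6. The short sketch below only restates those published numbers; the dictionary layout and the function name `compare` are assumptions for illustration.

```python
# (training seconds, classification seconds, total support vectors),
# copied from tables 2 (ASTER) and 6 (ADS40).
TIMINGS = {
    "ASTER": {"P-SVM": (11, 46, 121), "SVM": (2, 103, 201)},
    "ADS40": {"P-SVM": (34, 348, 228), "SVM": (3, 541, 342)},
}


def compare(dataset):
    """Return total times and P-SVM / SVM ratios for one dataset."""
    p_train, p_classify, p_sv = TIMINGS[dataset]["P-SVM"]
    s_train, s_classify, s_sv = TIMINGS[dataset]["SVM"]
    return {
        "psvm_total_s": p_train + p_classify,
        "svm_total_s": s_train + s_classify,
        "classification_time_ratio": p_classify / s_classify,
        "support_vector_ratio": p_sv / s_sv,
    }
```

The totals are 57 s against 105 s for ASTER and 382 s against 544 s for ADS40. The classification time ratio and the support vector ratio are roughly 0.45 and 0.60 for ASTER and roughly 0.64 and 0.67 for ADS40, so the reduction in classification time broadly tracks the reduction in support vectors without matching it exactly.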
4. Conclusion
The P-SVM algorithm is a novel classification approach for remotely sensed data,
and it was shown to be applicable through two experiments on different
remote sensing datasets. Even though firm conclusions cannot be drawn from the
limited experiments and datasets, the results indicate that the new approach is at
least competitive, often performs more accurately than the standard SVM approach,
and needs less total time. The use of non-Mercer kernels in remote sensing will
be explored in future research.
Acknowledgements
This study was supported financially by the Knowledge Innovation Program of the
Chinese Academy of Sciences (No.kzcx2-yw-313-3), and the National High
Technology Research and Development Program of China (No. 2007AA12Z157,
No. 2006AA12Z130). We are grateful to the referees for their helpful comments.
References
CHANG, C.C. and LIN, C.J., 2001, LIBSVM: a library for support vector machines. Available
online at: http://www.csie.ntu.edu.tw/~cjlin/libsvm (accessed 20 July 2007).
Table 7. Confusion matrices and kappa coefficients for P-SVM and SVM classification of ADS40 data.

P-SVM: overall accuracy = 90.24%, kappa coefficient = 0.8819
Class no.                1      2      3      4      5      6      User’s accuracy (%)
1                        823    0      0      0      0      0      100
2                        0      928    14     0      0      0      98.51
3                        0      0      663    0      0      84     88.76
4                        50     0      8      466    0      77     77.54
5                        0      0      0      0      457    182    71.52
6                        0      0      1      0      31     797    96.14
Producer’s accuracy (%)  94.27  100    96.65  100    93.65  69.91

SVM: overall accuracy = 88.23%, kappa coefficient = 0.8572
Class no.                1      2      3      4      5      6      User’s accuracy (%)
1                        823    0      0      0      0      0      100
2                        1      928    18     0      0      141    85.29
3                        0      0      667    0      0      0      100
4                        1      0      0      466    51     70     79.25
5                        0      0      0      0      404    175    69.78
6                        48     0      1      0      33     754    90.19
Producer’s accuracy (%)  94.27  100    97.23  100    82.79  66.14
CRISTIANINI, N. and SHAWE-TAYLOR, J., 2000, An Introduction to Support Vector Machines
and other Kernel-based Learning Methods (Cambridge: Cambridge University Press).
HOCHREITER, S. and OBERMAYER, K., 2006, Support vector machines for dyadic data. Neural
Computation, 18, pp. 1472–1510.
INGLADA, J., 2007, Automatic recognition of man-made objects in high resolution optical
remote sensing images by SVM classification of geometric image features. ISPRS
Journal of Photogrammetry and Remote Sensing, 62, pp. 236–248.
KEUCHEL, J., NAUMANN, S., HEILER, M. and SIEGMUND, A., 2003, Automatic land cover
analysis for Tenerife by supervised classification using remotely sensed data. Remote
Sensing of Environment, 86, pp. 530–541.
MELGANI, F. and BRUZZONE, L., 2004, Classification of hyperspectral remote sensing images
with support vector machines. IEEE Transactions on Geoscience and Remote Sensing,
42, pp. 1778–1790.
NEMMOUR, H. and CHIBANI, Y., 2006, Multiple support vector machines for land cover
change detection: An application for mapping urban extensions. ISPRS Journal of
Photogrammetry and Remote Sensing, 61, pp. 125–133.
PAL, M. and MATHER, P.M., 2005, Support vector machines for classification in remote
sensing. International Journal of Remote Sensing, 26, pp. 1007–1011.
PLATT, J., 1998, Sequential minimal optimization: a fast algorithm for training support vector
machines. In Advances in Kernel Methods: Support Vector Learning,
B. Schölkopf, C.J.C. Burges and A.J. Smola (Eds), pp. 169–182 (Cambridge, MA:
MIT Press).
RIFKIN, R. and KLAUTAU, A., 2004, In defense of one-vs-all classification. Journal of Machine
Learning Research, 5, pp. 101–141.
VAPNIK, V.N., 1995, The Nature of Statistical Learning Theory (New York: Springer-Verlag).
ZHU, G. and BLUMBERG, D.G., 2002, Classification using ASTER data and SVM algorithms.
The case study of Beer Sheva, Israel. Remote Sensing of Environment, 80, pp. 233–240.