A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines



DESCRIPTION

Support vector machines (SVMs) often contain a large number of support vectors, which slows down the evaluation of their decision functions. In addition, this can cause overfitting, where the resulting SVM adapts itself to the noise in the training set rather than to the true underlying data distribution and will probably fail to correctly classify unseen examples. To obtain faster and more accurate SVMs, many methods have been proposed to prune support vectors from trained SVMs. In this paper, we propose a multi-objective genetic algorithm to reduce the complexity of support vector machines and, by reducing overfitting, to improve generalization accuracy. Experiments on four benchmark datasets show that the proposed evolutionary approach can effectively reduce the number of support vectors included in the decision functions of SVMs without sacrificing their classification accuracy.

TRANSCRIPT

Page 1: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines


A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Mohamed Abdel Hady, Wessam Herbawi, Friedhelm Schwenker

Institute of Neural Information Processing, University of Ulm, Germany

{mohamed.abdel-hady}@uni-ulm.de

November 4, 2011

Page 2: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Support Vector Machine

[Figure: positively (y = +1) and negatively (y = −1) labeled training examples in feature space, separated by the maximum-margin hyperplane {x | ⟨w, ϕ(x)⟩ + b = 0}, with the margin hyperplanes {x | ⟨w, ϕ(x)⟩ + b = −1} and {x | ⟨w, ϕ(x)⟩ + b = +1}, the weight vector w, and slack variables ε1, ε2, ε3, ε4 marking examples that violate the margin.]

Page 3: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Support Vector Machine

To obtain the optimal hyperplane, one solves the following convex quadratic optimization problem with respect to the weight vector w and the bias b:

\min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \varepsilon_i \qquad (1)

subject to the constraints

y_i(\langle w, \phi(x_i) \rangle + b) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0 \quad \text{for } i = 1, \ldots, n. \qquad (2)

The regularization parameter C controls the trade-off between maximizing the margin 1/\|w\| and minimizing the sum of the slack variables of the training examples,

\varepsilon_i = \max\bigl(0, \; 1 - y_i(\langle w, \phi(x_i) \rangle + b)\bigr) \quad \text{for } i = 1, \ldots, n. \qquad (3)

A training example x_i is correctly classified if 0 ≤ ε_i < 1 and is misclassified when ε_i ≥ 1.
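To make the slack computation concrete, here is a minimal NumPy sketch with hypothetical decision values f(x_i) (not from the slides) that evaluates Eq. (3) and the classification condition above:

```python
import numpy as np

# Hypothetical decision values f(x_i) = <w, phi(x_i)> + b and labels y_i
f = np.array([1.7, 0.4, -0.2, -1.1])
y = np.array([1.0, 1.0, 1.0, -1.0])

# Eq. (3): slack variables eps_i = max(0, 1 - y_i * f(x_i))
eps = np.maximum(0.0, 1.0 - y * f)
print(eps)        # [0.  0.6 1.2 0. ]

# 0 <= eps_i < 1: correctly classified; eps_i >= 1: misclassified
print(eps < 1.0)  # [ True  True False  True]
```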

Page 4: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Support Vector Machine

The problem is converted into its equivalent dual problem, using standard Lagrangian techniques, whose number of variables equals the number of training examples:

\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \qquad (4)

subject to the constraints

\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C \quad \text{for } i = 1, \ldots, n, \qquad (5)

where the coefficients α*_i are the optimal solution of the dual problem and k is the kernel function. Hence, the decision function used to classify an unseen example x can be written as

f(x) = \sum_{i=1}^{n_{sv}} \alpha_i^* y_i k(x, x_i) + b^*. \qquad (6)

The training examples x_i with α*_i > 0 are called support vectors, and the number of support vectors is denoted by n_{sv} ≤ n.
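As an illustration of Eq. (6), a small NumPy sketch of the resulting decision function, assuming a Gaussian kernel (the array names are ours, not from the slides):

```python
import numpy as np

def gaussian_kernel(x, xi, gamma):
    # k(x, x_i) = exp(-gamma * ||x - x_i||^2), cf. Eq. (10)
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def decision_function(x, sv, alpha_y, b, gamma):
    # Eq. (6): f(x) = sum_{i=1}^{nsv} alpha*_i y_i k(x, x_i) + b*
    # sv:      (nsv, d) support vectors
    # alpha_y: (nsv,)   products alpha*_i * y_i
    # b:       bias b*
    k = np.array([gaussian_kernel(x, xi, gamma) for xi in sv])
    return float(np.dot(alpha_y, k) + b)
```

The predicted label of x is sgn(f(x)); since evaluating f costs one kernel computation per support vector, reducing n_{sv} directly reduces classification time.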

Page 5: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

SVM Pruning

The classification time complexity of the SVM classifier scales with the number of support vectors, O(n_{sv}).

To reduce the complexity of the SVM, the number of support vectors should be reduced.

To reduce the overfitting (over-training) of the SVM, the number of support vectors should be reduced.

Indirect methods: reduce the number of training examples {(x_i, y_i) : i = 1, …, n} [Pedrajas, IEEE TNN 2009].

Direct methods: the multi-objective evolutionary SVM proposed in this paper is the first evolutionary algorithm that reformulates SVM pruning as a combinatorial multi-objective optimization problem.

Page 6: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Genetic Algorithm for Support Vector Selection

[Flow diagram: the genetic algorithm passes support vector indices to the SVM, which evaluates the simplified decision function; the fitness of the individuals in the population is then computed from two values, the number of support vectors and the training error, before the GA operators (selection, crossover, and mutation) produce the next population.]

Page 7: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Representation (Encoding)

For support vector selection a binary encoding is appropriate. Here, the t-th candidate solution in a population is an n_{sv}-dimensional bit vector s_t ∈ {0, 1}^{n_{sv}}. The j-th support vector is included in the decision function if s_{tj} = 1 and excluded when s_{tj} = 0. For instance, for a problem with 7 support vectors, the t-th individual of the population could be represented as s_t = (1, 0, 0, 1, 1, 1, 0) or s_t = (0, 1, 0, 1, 1, 0, 1).

Then, for each solution with bit vector s_t, only the summation over the n'_{sv} selected support vectors is performed to define the reduced decision function f_{reduced}, which is used in Eq. (9) to evaluate the fitness of solution s_t:

f_{\text{reduced}}(x_i, s_t) = \sum_{j=1}^{n_{sv}} s_{tj} \, \alpha_j^* y_j K_{ij} + b^*. \qquad (7)
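A minimal sketch of Eq. (7), assuming the kernel row K_i (the values K_ij for all n_{sv} support vectors) has been precomputed:

```python
import numpy as np

def f_reduced(K_row, alpha_y, b, s):
    # Eq. (7): f_reduced(x_i, s_t) = sum_j s_tj * alpha*_j * y_j * K_ij + b*
    # K_row:   (nsv,) kernel values K_ij between x_i and all support vectors
    # alpha_y: (nsv,) products alpha*_j * y_j
    # s:       (nsv,) bit vector s_t selecting support vectors
    return float(np.dot(s * alpha_y, K_row) + b)

# One of the example individuals from above: keep support vectors 1, 4, 5, 6
s_t = np.array([1, 0, 0, 1, 1, 1, 0])
```

Since the kernel matrix between the training examples and the original support vectors can be computed once, evaluating a candidate bit vector costs only a masked dot product.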

Page 8: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Selection Criteria (Objectives)

The selection criteria (objectives) determine the quality of each candidate solution in the population. We want to design classifiers with high generalization ability.

There is a trade-off between SVM complexity and its training error (the number of misclassified examples on the set of n training examples).

The following two objective functions are used to measure the fitness of a solution s_t:

f_1(s_t) = n'_{sv} = \sum_{j=1}^{n_{sv}} s_{tj} \qquad (8)

and

f_2(s_t) = \sum_{i=1}^{n} \mathbf{1}\bigl(y_i \ne \operatorname{sgn}(f_{\text{reduced}}(x_i, s_t))\bigr), \qquad (9)

where f_{reduced} is the reduced decision function defined in Eq. (7), sgn is the sign function with values −1 and +1, and 1(·) is the indicator function. It is easy to achieve zero training error when all training examples are support vectors, but such a solution is not likely to generalize well (it is prone to overfitting).
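Both objectives can then be evaluated per candidate in a few vectorized lines. A sketch, where K, alpha_y, and b denote the precomputed kernel matrix, coefficient products, and bias:

```python
import numpy as np

def objectives(s, K, alpha_y, b, y):
    # K: (n, nsv) kernel matrix between training examples and support vectors
    # y: (n,)     training labels in {-1, +1}
    f1 = int(s.sum())                      # Eq. (8): n'_sv, number of selected SVs
    f_red = K @ (s * alpha_y) + b          # Eq. (7) for all n training examples
    # Eq. (9): count misclassified examples; np.sign returns 0 at exactly 0,
    # so ties on the hyperplane count as errors here
    f2 = int(np.sum(y != np.sign(f_red)))
    return f1, f2
```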

Page 9: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Experimental Setup

soft-margin L1-SVMs with Gaussian kernel function

k(x, x_i) = \exp(-\gamma \|x - x_i\|^2) \qquad (10)

with γ = 1/d and the regularization term C = 1.

Four benchmark datasets from the UCI Benchmark Repository: ionosphere, diabetes, sick, and german credit, where the number of features (d) is 34, 8, 29, and 20, respectively.

All features are normalized to have zero mean and unit variance.

Each dataset is divided randomly into two subsets: 10% are used as the test set Dtest, while the remaining 90% are used as training examples Dtrain. Thus, the sizes of the training sets (n) are 315, 691, 3394, and 900, and the sizes of the test sets (m) are 36, 77, 378, and 100, respectively.

At the beginning of the experiment, a soft-margin L1-norm SVM is constructed on Dtrain using the SMO algorithm.

The training error f_2(s_t) of each individual solution s_t (support vector subset) is evaluated on Dtrain, where CE(train) = f_2(s_t)/n. After each run of the MOGA, we evaluate the average test set error CE(test) of each solution in the final set of Pareto-optimal solutions using Dtest.
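The slides do not name an implementation, but an equivalent setup can be sketched with scikit-learn, whose SVC is trained with an SMO-type solver (libsvm) and whose gamma="auto" corresponds to γ = 1/d; load_dataset is a hypothetical loader:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_dataset("ionosphere")  # hypothetical loader returning features/labels

# Random 90% / 10% split into D_train and D_test
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1)

# Normalize all features to zero mean and unit variance
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Soft-margin L1-SVM with Gaussian kernel, gamma = 1/d and C = 1
svm = SVC(kernel="rbf", gamma="auto", C=1.0).fit(X_tr, y_tr)

# The quantities the pruning GA operates on
sv      = svm.support_vectors_   # (nsv, d) support vectors
alpha_y = svm.dual_coef_[0]      # alpha*_i * y_i per support vector
b       = svm.intercept_[0]      # bias b*
```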

Page 10: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Experimental Results

For the application of NSGA-II, we choose a population size of 100; the other parameters of NSGA-II are (p_c = 0.9, p_mut = 1/n_{sv}, η_c = 20, η_mut = 20). The two objectives given in Eq. (8) and Eq. (9) are optimized.

For each dataset, ten optimization runs of the MOGA are carried out, each lasting for 10000 generations (see the sketch below).
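A hedged sketch of such a run with DEAP: the slides' η parameters suggest NSGA-II's standard SBX/polynomial operators, so the two-point crossover and bit-flip mutation below are a common binary-encoding substitute, not necessarily the authors' exact operators; objectives, K, alpha_y, b, and y_tr come from the sketches above:

```python
import random
import numpy as np
from deap import base, creator, tools

# Two minimization objectives: number of SVs, Eq. (8), and training error, Eq. (9)
creator.create("FitnessMin2", base.Fitness, weights=(-1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessMin2)

nsv = 101  # size of the unpruned ionosphere SVM, for example

toolbox = base.Toolbox()
toolbox.register("bit", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.bit, nsv)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate",
                 lambda ind: objectives(np.array(ind), K, alpha_y, b, y_tr))
toolbox.register("mate", tools.cxTwoPoint)                # applied with pc = 0.9
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0 / nsv)
toolbox.register("select", tools.selNSGA2)                # environmental selection

pop = toolbox.population(n=100)
# Generational loop (10000 generations in the slides): evaluate offspring
# produced by mate/mutate, then select the next population with selNSGA2.
```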

Pareto-optimal solutions after pruning compared to the unpruned SVM (each solution is written as a triple [nsv, n·CE(train), m·CE(test)]):

dataset        before            after (range of Pareto-optimal solutions)
ionosphere     [101, 4, 10]      [0, 202, 23] to [15, 3, 5]
diabetes       [399, 126, 14]    [0, 450, 50] to [101, 125, 18]
sick           [503, 88, 12]     [0, 208, 23] to [92, 83, 13]
german credit  [820, 20, 27]     [8, 259, 26] to [283, 57, 22]

Page 11: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Pareto Fronts

[Figure: Pareto fronts for ionosphere, diabetes, sick, and german credit. Each panel plots the classification error against the number of support vectors, showing CE(train) and CE(test) after pruning together with CE(train) and CE(test) of the unpruned SVM.]

Page 12: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Experimental Results

For many solutions on ionosphere and german credit, we can see the effect of overfitting: the generalization ability of the SVM classifier improved after pruning while the training error got worse.

A typical MOO heuristic is to select a solution (support vector subset) that corresponds to an interesting part of the Pareto front.

Page 13: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Attainment Surfaces

[Figure: 1st, 5th, and 10th attainment surfaces over the ten MOGA runs for ionosphere, diabetes, sick, and german credit, each plotted against the unpruned SVM (before pruning).]

Page 14: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Experimental Results

The attainment curves have a maximum complexity of 22, 132, 171, and 300 for ionosphere, diabetes, sick, and german credit, respectively. That is, the evolutionary pruning approach achieved complexity reductions of 78.2%, 66.9%, 66%, and 63.4% for the four datasets, respectively, without sacrificing the training error.

Page 15: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Conclusion

Support vector selection is a multi-objective optimization problem. We have described a genetic algorithm to reduce the computational complexity of support vector machines by reducing the number of support vectors comprised in their decision functions.

The resulting Pareto fronts visualize the trade-off between SVM complexity and training error, guiding the support vector selection.

For some datasets, the experimental results show that the test set classification accuracy is improved after pruning without sacrificing the training set accuracy. Thus, post-pruning of SVMs achieves the same effect as post-pruning of decision trees: it reduces overfitting.

Page 16: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Future Work

We plan to extend the proposed approach to regression tasks, which suffer from the same problem of a large number of support vectors in the decision functions of support vector regression machines.

In addition, we will conduct further experiments with other types of kernel functions, as only Gaussian kernels were used in the presented experiments. We expect the percentage of complexity reduction to be kernel-dependent.

Page 17: A Multi-Objective Genetic Algorithm for Pruning Support Vector Machines

Thanks for your attention

Questions?
