Mathematical Programming in Support Vector Machines
Olvi L. Mangasarian
University of Wisconsin - Madison
High Performance Computation for Engineering Systems Seminar
MIT October 4, 2000
What is a Support Vector Machine?
An optimally defined surface
Typically nonlinear in the input space
Linear in a higher dimensional space
Implicitly defined by a kernel function
What are Support Vector Machines Used For?
Classification
Regression & Data Fitting
Supervised & Unsupervised Learning

(Will concentrate on classification)
Example of Nonlinear Classifier: Checkerboard Classifier
Outline of Talk
Generalized support vector machines (SVMs): a completely general kernel allows complex classification (no Mercer condition!)
Smooth support vector machines (SSVM): smooth the SVM and solve it by a fast Newton method
Lagrangian support vector machines (LSVM): a very fast, simple iterative scheme; one matrix inversion, no LP, no QP
Reduced support vector machines (RSVM): handle large datasets with nonlinear kernels
Generalized Support Vector Machines
2-Category Linearly Separable Case

[Figure: the point sets $A+$ and $A-$ separated by the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$, with normal $w$.]
Generalized Support Vector Machines
Algebra of the 2-Category Linearly Separable Case

Given $m$ points in $n$-dimensional space, represented by an $m \times n$ matrix $A$. Membership of each point $A_i$ in class $+1$ or $-1$ is specified by an $m \times m$ diagonal matrix $D$ with $+1$ and $-1$ entries.

Separate by two bounding planes $x'w = \gamma \pm 1$:

$$A_i w \ge \gamma + 1 \ \text{ for } D_{ii} = +1, \qquad A_i w \le \gamma - 1 \ \text{ for } D_{ii} = -1.$$

More succinctly:

$$D(Aw - e\gamma) \ge e,$$

where $e$ is a vector of ones.
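As a concrete illustration (a minimal sketch, not from the talk; the data values are invented), the matrix $D$ and the bounding-plane condition translate directly into MATLAB:

```matlab
% Build D from a label vector d in {+1,-1}^m and test whether a given
% plane (w, gamma) satisfies D(A*w - e*gamma) >= e.
m = 6; n = 2;
A = [2 2; 3 1; 2 3; -2 -1; -3 -2; -1 -3];   % m points as rows
d = [1; 1; 1; -1; -1; -1];                  % class of each point
D = diag(d); e = ones(m,1);
w = [1; 1]; gamma = 0;                      % candidate separating plane
separated = all(D*(A*w - e*gamma) >= e)     % returns 1 here
```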
Generalized Support Vector Machines
Maximizing the Margin between Bounding Planes

[Figure: the bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$ around $A+$ and $A-$; the margin between them is $\frac{2}{\|w\|_2}$.]
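The margin value in the figure is just the distance between the two parallel planes:

$$\text{margin} \;=\; \frac{(\gamma + 1) - (\gamma - 1)}{\|w\|_2} \;=\; \frac{2}{\|w\|_2},$$

so minimizing $\|w\|_2$, as in the formulation on the next slide, maximizes the margin.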
Generalized Support Vector Machines
The Linear Support Vector Machine Formulation

Solve the following mathematical program for some $\nu > 0$:

$$\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.$$

The nonnegative slack variable $y$ is zero if and only if:

The convex hulls of $A+$ and $A-$ do not intersect
$\nu$ is sufficiently large
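For concreteness, here is a minimal sketch (mine, not the talk's) of solving this program directly with quadprog from MATLAB's Optimization Toolbox; the stacking $z = [w;\ \gamma;\ y]$ is an arbitrary choice. The SSVM and LSVM algorithms later in the talk avoid a general-purpose QP solver entirely.

```matlab
% min nu*e'*y + 1/2*w'*w  s.t.  D(A*w - e*gamma) + y >= e,  y >= 0
[m,n] = size(A); e = ones(m,1);
Hq = blkdiag(eye(n), 0, zeros(m));     % quadratic term: 1/2*w'*w only
f  = [zeros(n+1,1); nu*e];             % linear term: nu*e'*y
Aineq = -[D*A, -D*e, eye(m)];          % -(D(A*w - e*gamma) + y) <= -e
bineq = -e;
lb = [-inf(n+1,1); zeros(m,1)];        % y >= 0; w and gamma free
z = quadprog(Hq, f, Aineq, bineq, [], [], lb);
w = z(1:n); gamma = z(n+1); y = z(n+2:end);
```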
Breast Cancer Diagnosis Application
97% Tenfold Cross-Validation Correctness
780 Samples: 494 Benign, 286 Malignant
Another Application: Disputed Federalist Papers
(Bosch & Smith 1998)

56 Hamilton, 50 Madison, 12 Disputed
Generalized Support Vector Machine Motivation
(Nonlinear Kernel Without Mercer Condition)
Linear SVM, with linear separating surface $x'w = \gamma$:

$$\min_{w,\gamma,y}\ \nu e'y + \|w\|_1 \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.$$

Set $w = A'Du$. The resulting linear surface is $x'A'Du = \gamma$, and the program becomes:

$$\min_{u,\gamma,y}\ \nu e'y + \|u\|_1 \quad \text{s.t.}\quad D(AA'Du - e\gamma) + y \ge e, \quad y \ge 0.$$

Replace $AA'$ by an arbitrary nonlinear kernel $K(A,A')$. The resulting nonlinear surface is $K(x',A')Du = \gamma$:

$$\min_{u,\gamma,y}\ \nu e'y + \|u\|_1 \quad \text{s.t.}\quad D(K(A,A')Du - e\gamma) + y \ge e, \quad y \ge 0.$$
SSVM: Smooth Support Vector Machine
(SVM as an Unconstrained Minimization Problem)

Changing to the 2-norm and measuring the margin in $(w,\gamma)$ space:

$$\min_{w,\gamma,y}\ \frac{\nu}{2}\,y'y + \frac{1}{2}\,(w'w + \gamma^2) \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.$$

At a solution, $y = \big(e - D(Aw - e\gamma)\big)_+$, where $(\cdot)_+$ replaces negative components by zeros, so the constraints can be eliminated and the problem becomes an unconstrained minimization in $(w,\gamma)$ whose objective contains the nonsmooth plus function.
Smoothing the Plus Function: Integrate the Sigmoid Function
SSVM: The Smooth Support Vector Machine
Smoothing the Plus Function

Integrating the sigmoid approximation to the step function:

$$s(x,\alpha) = \frac{1}{1+\varepsilon^{-\alpha x}},$$

gives a smooth, excellent approximation to the plus function:

$$p(x,\alpha) = x + \frac{1}{\alpha}\log\big(1+\varepsilon^{-\alpha x}\big), \qquad \alpha > 0.$$

(Here $\varepsilon$ denotes the base of natural logarithms.)

Replacing the plus function in the nonsmooth SVM by the smooth approximation gives our SSVM:

$$\min_{w,\gamma}\ \Phi_\alpha(w,\gamma) := \frac{\nu}{2}\,\big\|p\big(e - D(Aw - e\gamma),\alpha\big)\big\|_2^2 + \frac{1}{2}\,\|(w,\gamma)\|_2^2.$$
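A quick numerical check (not from the slides) of how tightly $p$ tracks the plus function $(x)_+ = \max(x,0)$: the gap peaks at $x = 0$ with value $\log(2)/\alpha$, so it shrinks like $1/\alpha$.

```matlab
p = @(x,alpha) x + log(1+exp(-alpha*x))/alpha;   % smooth plus function
x = linspace(-2,2,401);
for alpha = [1 5 10]
  fprintf('alpha = %2d: max |p - plus| = %.4f\n', ...
          alpha, max(abs(p(x,alpha) - max(x,0))));   % ~ log(2)/alpha
end
```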
Newton: Minimize a sequence of quadratic approximations to the strongly convex objective function, i.e., solve a sequence of linear equations in $n+1$ variables. (Small-dimensional input space.)

Armijo: Shorten the distance between successive iterates so as to generate a sufficient decrease in the objective function. (In computational reality, not needed!)

Global quadratic convergence: Starting from any point, the iterates are guaranteed to converge to the unique solution at a quadratic rate, i.e., errors get squared. (Typically 6 to 8 iterations, without an Armijo step.)
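A bare-bones sketch of the Newton iteration for $\Phi_\alpha$ (the function name, starting point, and stopping test are mine; no Armijo step, matching the remark above). It uses $p'(x,\alpha) = s(x,\alpha)$, which follows from the integral definition of $p$.

```matlab
function [w,gamma] = ssvm_newton(A,D,nu,alpha,itmax)
% Minimize Phi_alpha(w,gamma) by Newton's method: each step solves one
% (n+1)x(n+1) linear system, as described above.
[m,n] = size(A); e = ones(m,1);
w = zeros(n,1); gamma = 0;
J = [-D*A, D*e];                        % Jacobian of r = e - D(A*w - e*gamma)
for it = 1:itmax
  r = e - D*(A*w - e*gamma);
  s = 1./(1+exp(-alpha*r));             % sigmoid = dp/dr
  p = r + log(1+exp(-alpha*r))/alpha;   % smooth plus function p(r,alpha)
  g = nu*(J'*(p.*s)) + [w; gamma];      % gradient of Phi_alpha
  h = s.^2 + alpha*p.*s.*(1-s);         % curvature weights
  Hess = nu*(J'*(h.*J)) + eye(n+1);     % h.*J uses implicit expansion
  step = -(Hess\g);                     % Newton direction
  w = w + step(1:n); gamma = gamma + step(n+1);
  if norm(g) <= 1e-8, break; end
end
```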
SSVM with a Nonlinear Kernel
Nonlinear Separating Surface in Input Space: $K(x',A')Du = \gamma$
Examples of Kernels
Generate Nonlinear Separating Surfaces in Input Space

$A \in R^{m \times n}$, $a \in R^m$, $\mu \in R$, $d$ an integer.

Polynomial kernel (the $d$-th power taken componentwise):

$$(AA' + \mu a a')^{d}$$

Neural network kernel (the step function $\tau(\cdot): R \to \{0,1\}$ applied componentwise):

$$\tau(AA' + \mu a a')$$

Gaussian (radial basis) kernel:

$$\varepsilon^{-\mu \|A_i - A_j\|^2}, \qquad i,j = 1,\dots,m.$$
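All three kernels are easy to realize in MATLAB; here is a vectorized Gaussian kernel in the rectangular form $K(A,B): R^{m\times n} \times R^{n\times l} \to R^{m\times l}$ used later in the talk (the function name gaussker is mine):

```matlab
function K = gaussker(A,B,mu)
% Gaussian kernel: K(i,j) = exp(-mu*||A(i,:)' - B(:,j)||^2)
% A is m-by-n, B is n-by-l, so K is m-by-l (rectangular kernels allowed).
sqA = sum(A.^2,2);                  % m-by-1 squared row norms
sqB = sum(B.^2,1);                  % 1-by-l squared column norms
K = exp(-mu*(sqA + sqB - 2*A*B));   % implicit expansion to m-by-l
```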
LSVM: Lagrangian Support Vector Machine
Dual of SVM

Taking the dual of the SVM formulation:

$$\min_{w,\gamma,y}\ \frac{\nu}{2}\,y'y + \frac{1}{2}\,(w'w + \gamma^2) \quad \text{s.t.}\quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0,$$

gives the following simple dual problem:

$$\min_{0 \le u \in R^m}\ \frac{1}{2}\,u'\Big(\frac{I}{\nu} + D(AA' + ee')D\Big)u - e'u.$$

The variables $(w,\gamma,y)$ of SSVM are related to $u$ by:

$$w = A'Du, \qquad y = \frac{u}{\nu}, \qquad \gamma = -e'Du.$$
LSVM: Lagrangian Support Vector Machine
Dual SVM as a Symmetric Linear Complementarity Problem

Defining the two matrices:

$$H = D[A\ \ {-e}], \qquad Q = \frac{I}{\nu} + HH',$$

reduces the dual SVM to:

$$\min_{0 \le u \in R^m}\ f(u) := \frac{1}{2}\,u'Qu - e'u.$$

The optimality condition for this dual SVM is the LCP:

$$0 \le u \ \perp\ Qu - e \ge 0,$$

which, by Implicit Lagrangian theory, is equivalent to:

$$Qu - e = \big((Qu - e) - \alpha u\big)_+, \qquad \alpha > 0.$$
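This identity also gives a one-line optimality check for any candidate $u$ (a sketch; Q, u, e, and alpha are assumed to be in scope):

```matlab
pl  = @(x) (abs(x)+x)/2;                           % plus function (x)_+
res = norm((Q*u - e) - pl((Q*u - e) - alpha*u));   % zero iff u solves the LCP
```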
LSVM Algorithm
Simple & Linearly Convergent – One Small Matrix Inversion

$$u^{i+1} = Q^{-1}\big(e + ((Qu^i - e) - \alpha u^i)_+\big), \qquad i = 0,1,\dots,$$

where $0 < \alpha < \frac{2}{\nu}$.

Key idea: the Sherman-Morrison-Woodbury formula allows the inversion of an extremely large $m \times m$ matrix $Q$ by merely inverting a much smaller $(n+1) \times (n+1)$ matrix:

$$\Big(\frac{I}{\nu} + HH'\Big)^{-1} = \nu\Big(I - H\Big(\frac{I}{\nu} + H'H\Big)^{-1}H'\Big).$$
LSVM Algorithm – Linear Kernel
11 Lines of MATLAB Code

```matlab
function [it, opt, w, gamma] = svml(A,D,nu,itmax,tol)
% lsvm with SMW for min 1/2*u'*Q*u-e'*u s.t. u=>0,
% Q=I/nu+H*H', H=D[A -e]
% Input: A, D, nu, itmax, tol; Output: it, opt, w, gamma
% [it, opt, w, gamma] = svml(A,D,nu,itmax,tol);
[m,n]=size(A); alpha=1.9/nu; e=ones(m,1); H=D*[A -e]; it=0;
S=H*inv(speye(n+1)/nu+H'*H);              % SMW: only an (n+1)x(n+1) inverse
u=nu*(1-S*(H'*e)); oldu=u+1;
while it<itmax & norm(oldu-u)>tol
  z=(1+pl(((u/nu+H*(H'*u))-alpha*u)-1));  % z = e + ((Q*u-e)-alpha*u)_+
  oldu=u;
  u=nu*(z-S*(H'*z));                      % u = Q^{-1}*z via SMW
  it=it+1;
end;
opt=norm(u-oldu); w=A'*D*u; gamma=-e'*D*u;

function pl = pl(x); pl = (abs(x)+x)/2;   % plus function (x)_+
```
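A hypothetical usage sketch (data, sizes, and parameter values invented for illustration), assuming the code above is saved as svml.m:

```matlab
% Two shifted Gaussian clouds with labels in {+1,-1}.
m = 1000; n = 10;
A = [randn(m/2,n)+0.5; randn(m/2,n)-0.5];
d = [ones(m/2,1); -ones(m/2,1)];
D = spdiags(d,0,m,m);                      % diagonal +/-1 label matrix
[it,opt,w,gamma] = svml(A,D,1,100,1e-5);   % nu=1, itmax=100, tol=1e-5
correctness = mean(sign(A*w - gamma) == d) % fraction on the right side
```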
LSVM Algorithm – Linear Kernel
Computational Results

2 million random points in 10-dimensional space: classified in 6.7 minutes, in 6 iterations, to 1e-5 accuracy, on a 250 MHz UltraSPARC II with 2 gigabytes of memory. CPLEX ran out of memory on the same problem.

32562 points in 123-dimensional space (UCI Adult dataset): classified in 141 seconds and 55 iterations to 85% correctness on a 400 MHz Pentium II with 2 gigabytes of memory. SVMlight classified the same dataset in 178 seconds and 4497 iterations.
LSVM – Nonlinear Kernel
Formulation

For the nonlinear kernel:

$$K(A,B): R^{m\times n} \times R^{n\times l} \to R^{m\times l},$$

the separating nonlinear surface is given by:

$$K\left([x'\ \ {-1}],\ \begin{bmatrix} A' \\ -e' \end{bmatrix}\right)Du = 0,$$

where $u$ is the solution of the dual problem:

$$\min_{0 \le u \in R^m}\ f(u) := \frac{1}{2}\,u'Qu - e'u,$$

with $Q$ redefined as:

$$G = [A\ \ {-e}], \qquad Q = \frac{I}{\nu} + DK(G,G')D.$$
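A hedged sketch of this nonlinear iteration built on the gaussker helper above (parameter values are illustrative). Note that $Q$ is now a full $m \times m$ matrix, so Sherman-Morrison-Woodbury no longer helps and $m$ must stay moderate; RSVM, below, addresses exactly this.

```matlab
[m,n] = size(A); e = ones(m,1);
mu = 1; nu = 1; alpha = 1.9/nu; itmax = 100;
G = [A -e];
Q = eye(m)/nu + D*gaussker(G,G',mu)*D;     % m-by-m: no SMW shortcut here
u = Q\e; oldu = u + 1; it = 0;
while it < itmax && norm(oldu-u) > 1e-5
  oldu = u;
  u = Q\(e + max((Q*u - e) - alpha*u, 0)); % same LSVM iteration as before
  it = it + 1;
end
% classify a test point x (column vector) on the surface K([x' -1],G')Du = 0:
% sign(gaussker([x' -1], G', mu)*D*u)
```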
LSVM Algorithm – Nonlinear Kernel Application
100 Iterations, 58 Seconds on a Pentium II, 95.9% Accuracy
Reduced Support Vector Machines (RSVM)
Large Nonlinear Kernel Classification Problems
Key idea: use a rectangular kernel $K(A, \bar{A}')$, where $\bar{A}'$ is a small random sample of $A'$. Typically $\bar{A}$ has 1% to 10% of the rows of $A$.

Two important consequences:

RSVM can solve very large problems
The nonlinear separator depends only on $\bar{A}$

(The small square kernel $K(\bar{A},\bar{A}')$, i.e., a conventional SVM trained on the sample alone, gives lousy results; see the checkerboard comparison below.)

$$\min_{\bar{u},\gamma,y}\ \frac{\nu}{2}\,y'y + \frac{1}{2}\,(\bar{u}'\bar{u} + \gamma^2) \quad \text{s.t.}\quad D\big(K(A,\bar{A}')\bar{D}\bar{u} - e\gamma\big) + y \ge e, \quad y \ge 0.$$

Separating surface: $K(x', \bar{A}')\bar{D}\bar{u} = \gamma$.
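A small sketch of the reduction step (the sampling rate and names are mine), using the gaussker helper from earlier:

```matlab
% Sample 10% of the rows of A, then form the rectangular kernel K(A,Abar').
[m,n] = size(A);
mbar = ceil(0.1*m);
idx  = randperm(m); idx = idx(1:mbar);
Abar = A(idx,:); Dbar = D(idx,idx);   % sampled rows and their labels
K = gaussker(A, Abar', mu);           % m-by-mbar rectangular kernel
% The formulation above is then solved with K*Dbar in place of the full
% m-by-m kernel, so only mbar multipliers ubar enter the problem.
```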
Conventional SVM Result on Checkerboard Using 50 Random Points Out of 1000
RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000
RSVM on Large Classification Problems
Standard Error over 50 Runs = 0.001 to 0.002
RSVM Time = 1.24 × (Random Points Time)
Conclusion
Mathematical programming plays an essential role in SVMs:

Theory
New formulations: generalized SVMs
New algorithm-generating concepts: smoothing (SSVM), implicit Lagrangian (LSVM)

Algorithms
Fast: SSVM
Massive: LSVM, RSVM
Future Research
Theory
Concave minimization
Concurrent feature & data selection
Multiple-instance problems
SVMs as complementarity problems

Algorithms
Multicategory classification algorithms
Kernel methods in nonlinear programming
Chunking for massive classification: $10^8$ points
Talk & Papers Available on Web
www.cs.wisc.edu/~olvi