Smooth Support Vector Machinesfor Classification and Regression
Yuh-Jye Lee
National Taiwan University of Science and Technology
International Summer Workshop on the Economics, Financialand Managerial Applications of Computational Intelligence
August 16~20, 2004
Fundamental Problems in Data Mining

Supervised learning:
- Classification problems
- Regression problems
- Feature selection: too many features can degrade generalization performance (the curse of dimensionality)

Unsupervised learning:
- Clustering algorithms
- Association rules
Binary Classification Problem
(A Fundamental Problem in Data Mining)

- Find a decision function (rule) to discriminate between two categories of data.
- Supervised learning in Machine Learning
- Discriminant Analysis in Statistics: Fisher Linear Discriminant
- Methods: Decision Tree, Neural Network, k-NN, Support Vector Machines, etc.
- Successful applications: marketing, bioinformatics, fraud detection
Bankruptcy Prediction
Binary classification of firms: solvent vs. bankrupt

The data are financial indicators from middle-market capitalization firms in Benelux. Of a total of 422 firms, 74 went bankrupt and 348 were solvent. The variables used in the model as explanatory inputs are 40 financial indicators, such as liquidity, profitability and solvency measurements.

T. Van Gestel, B. Baesens, J. A. K. Suykens, M. Espinoza, D. Baestaens, J. Vanthienen and B. De Moor, "Bankruptcy Prediction with Least Squares Support Vector Machine Classifiers", International Conference on Computational Intelligence and Financial Engineering, 2003.
Binary Classification Problem: Linearly Separable Case

[Figure: the two classes $A^+$ (solvent) and $A^-$ (bankrupt) separated by the plane $x'w + b = 0$, with bounding planes $x'w + b = +1$ and $x'w + b = -1$ and normal vector $w$]
Why Use Support Vector Machines?
Powerful tools for Data Mining

- The SVM classifier is an optimally defined surface
- SVMs have a good geometric interpretation
- SVMs can be generated very efficiently
- Can be extended from the linear to the nonlinear case:
  - Typically nonlinear in the input space
  - Linear in a higher-dimensional "feature" space
  - Implicitly defined by a kernel function
- Have a sound theoretical foundation, based on Statistical Learning Theory
Support Vector Machines: Maximizing the Margin between Bounding Planes

[Figure: bounding planes $x'w + b = +1$ and $x'w + b = -1$ around the classes $A^+$ and $A^-$, with normal vector $w$; the margin between them is $\frac{2}{\|w\|_2}$]
Algebra of the Classification Problem: Linearly Separable Case

- Given $\ell$ points in the $n$-dimensional real space $R^n$, represented by an $\ell \times n$ matrix $A$.
- Membership of each point $A_i$ in the classes $A^+$ or $A^-$ is specified by an $\ell \times \ell$ diagonal matrix $D$: $D_{ii} = -1$ if $A_i \in A^-$ and $D_{ii} = +1$ if $A_i \in A^+$.
- Separate $A^-$ and $A^+$ by two bounding planes such that:
  $$A_i w + b \ge +1 \text{ for } D_{ii} = +1, \qquad A_i w + b \le -1 \text{ for } D_{ii} = -1$$
- More succinctly: $D(Aw + eb) \ge e$, where $e = [1, 1, \ldots, 1]' \in R^{\ell}$.
Support Vector Classification (Linearly Separable Case)

Let $S = \{(x^1, y_1), (x^2, y_2), \ldots, (x^l, y_l)\}$ be a linearly separable training sample, represented by the matrices
$$A = \begin{bmatrix} (x^1)' \\ (x^2)' \\ \vdots \\ (x^l)' \end{bmatrix} \in R^{l \times n}, \qquad D = \begin{bmatrix} y_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & y_l \end{bmatrix} \in R^{l \times l}$$
Robust Linear Programming: Preliminary Approach to SVM

$$\min_{w, b, \xi} \; e'\xi \quad \text{s.t.} \quad D(Aw + eb) + \xi \ge e, \quad \xi \ge 0 \tag{LP}$$

where $\xi$ is a nonnegative slack (error) vector. The term $e'\xi$, the 1-norm measure of the error vector, is called the training error. For the linearly separable case, at the solution of (LP): $\xi = 0$.
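The (LP) above can be handed to any linear-programming solver. Here is a minimal sketch in Python; the toy data and the use of SciPy's `linprog` are my own illustration, not part of the slides:

```python
import numpy as np
from scipy.optimize import linprog

# Toy linearly separable data: rows of A are points, y holds the labels +/-1
# (the diagonal of D).
A = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l, n = A.shape

# Variables z = [w (n), b (1), xi (l)]; minimize e'xi.
c = np.concatenate([np.zeros(n + 1), np.ones(l)])

# D(Aw + eb) + xi >= e  <=>  -y_i (A_i w + b) - xi_i <= -1
A_ub = np.hstack([-(y[:, None] * A), -y[:, None], -np.eye(l)])
b_ub = -np.ones(l)

bounds = [(None, None)] * (n + 1) + [(0, None)] * l  # w, b free; xi >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
w, b, xi = res.x[:n], res.x[n], res.x[n + 1:]
print("training error e'xi =", xi.sum())  # 0 for separable data
```

For this separable toy set the optimal slack is zero, as the slide states.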
[Figure: points of the two classes with misclassified points $x^i$, $x^j$ and their nonzero slack values $\xi_i$, $\xi_j$, measured from the bounding planes]
Two Different Measures of Training Error

2-Norm Soft Margin:
$$\min_{(w,b,\xi) \in R^{n+1+l}} \; \frac{1}{2}\|w\|_2^2 + \frac{C}{2}\|\xi\|_2^2 \quad \text{s.t.} \quad D(Aw + eb) + \xi \ge e$$

1-Norm Soft Margin:
$$\min_{(w,b,\xi) \in R^{n+1+l}} \; \frac{1}{2}\|w\|_2^2 + C e'\xi \quad \text{s.t.} \quad D(Aw + eb) + \xi \ge e, \quad \xi \ge 0$$

The margin is maximized by minimizing the reciprocal of the margin.
1-Norm Soft Margin Dual Formulation

The Lagrangian for the 1-norm soft margin:
$$L(w, b, \xi, \alpha, r) = \frac{1}{2}w'w + Ce'\xi + \alpha'[e - D(Aw + eb) - \xi] - r'\xi$$
where $\alpha \ge 0$ and $r \ge 0$. Setting the partial derivatives with respect to the primal variables to zero gives:
$$w - A'D\alpha = 0, \qquad e'D\alpha = 0, \qquad Ce - \alpha - r = 0$$
Dual Maximization Problem for 1-Norm Soft Margin

Dual:
$$\max_{\alpha \in R^l} \; e'\alpha - \frac{1}{2}\alpha' D A A' D \alpha \quad \text{s.t.} \quad e'D\alpha = 0, \quad 0 \le \alpha \le Ce$$

The corresponding KKT complementarity conditions:
$$0 \le \alpha \;\perp\; D(Aw + eb) + \xi - e \ge 0, \qquad 0 \le \xi \;\perp\; Ce - \alpha \ge 0$$
Slack Variables for the 1-Norm Soft Margin ($w^* = A'D\alpha^*$)

- Nonzero slack can only occur when $\alpha_i^* = C$.
- The points for which $0 < \alpha_i^* < C$ lie on the bounding planes; this will help us find $b^*$.
- The contribution of an outlier to the decision rule is at most $C$.
- The trade-off between accuracy and regularization is directly controlled by $C$.
Tuning Procedure

[Figure: model-selection curve; testing set correctness drops once the model starts overfitting]

The final parameter value is the one with the maximum testing set correctness!
SVM as an Unconstrained Minimization Problem

$$\min_{w, b, \xi} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2) \quad \text{s.t.} \quad D(Aw + eb) + \xi \ge e, \quad \xi \ge 0 \tag{QP}$$

At the solution of (QP): $\xi = (e - D(Aw + eb))_+$, where $(\cdot)_+ = \max\{\cdot, 0\}$.

Hence (QP) is equivalent to the nonsmooth SVM:
$$\min_{w, b} \; \frac{C}{2}\|(e - D(Aw + eb))_+\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$$

- Changes (QP) into an unconstrained mathematical program
- Reduces the number of variables from $n+1+m$ to $n+1$
Smooth the Plus Function: Integrate

- Step function $x_*$ and its smooth approximation, the sigmoid function $\frac{1}{1 + e^{-\beta x}}$ (shown with $\beta = 5$)
- Plus function $x_+$ and its smooth approximation, the $p$-function $p(x, \beta)$ (shown with $\beta = 5$), obtained by integrating the sigmoid:
$$p(x, \beta) := x + \frac{1}{\beta}\log(1 + e^{-\beta x})$$
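A quick numerical check of this smoothing; this is a sketch, and `np.logaddexp` is simply a numerically stable way to evaluate the same formula, since $x + \frac{1}{\beta}\log(1 + e^{-\beta x}) = \frac{1}{\beta}\log(1 + e^{\beta x})$:

```python
import numpy as np

def plus(x):
    """Plus function (x)_+ = max(x, 0)."""
    return np.maximum(x, 0.0)

def p(x, beta):
    """Smooth plus function p(x, beta) = x + (1/beta) log(1 + exp(-beta x)),
    evaluated stably as logaddexp(0, beta x) / beta."""
    return np.logaddexp(0.0, beta * x) / beta

xs = np.linspace(-3, 3, 7)
for beta in (1, 5, 100):
    gap = np.max(np.abs(p(xs, beta) - plus(xs)))
    print(f"beta={beta:>3}: max |p - plus| = {gap:.6f}")
```

The largest deviation occurs at $x = 0$ and equals $\log(2)/\beta$, so the approximation tightens as $\beta$ grows.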
SSVM: Smooth Support Vector Machine

Replacing the plus function $(\cdot)_+$ in the nonsmooth SVM by the smooth $p(\cdot, \beta)$ gives our SSVM:
$$\min_{(w, b) \in R^{n+1}} \; \frac{C}{2}\|p(e - D(Aw + eb), \beta)\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2)$$

- Here $p(\cdot, \beta)$ is an accurate smooth approximation of $(\cdot)_+$, obtained by integrating the sigmoid function of neural networks (sigmoid = smoothed step).
- The solution of SSVM converges to the solution of the nonsmooth SVM as $\beta$ goes to infinity (typically, $\beta = 5$).
Newton-Armijo Method: Quadratic Approximation of SSVM

- The sequence $\{(w^i, b^i)\}$, generated by solving a quadratic approximation of SSVM, converges to the unique solution $(w^*, b^*)$ of SSVM at a quadratic rate.
- At each iteration we solve a linear system of $n+1$ equations in $n+1$ variables.
- Complexity depends on the dimension of the input space.
- Converges in 6 to 8 iterations.
- A stepsize might need to be selected.
Newton-Armijo Algorithm

Start with any $(w^0, b^0) \in R^{n+1}$. Having $(w^i, b^i)$, stop if $\nabla\Phi_\beta(w^i, b^i) = 0$; else:

(i) Newton Direction: solve
$$\nabla^2\Phi_\beta(w^i, b^i)\, d^i = -\nabla\Phi_\beta(w^i, b^i)$$

(ii) Armijo Stepsize: $(w^{i+1}, b^{i+1}) = (w^i, b^i) + \lambda_i d^i$, with $\lambda_i \in \{1, \frac{1}{2}, \frac{1}{4}, \ldots\}$ chosen by the Armijo rule.

The iterates converge globally and quadratically to the unique solution in a finite number of steps.
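The whole SSVM pipeline above fits in a short routine. The sketch below derives the gradient and Hessian directly from the smooth objective; the toy data, the parameter values, and the stopping tolerance are illustrative assumptions, not values from the slides:

```python
import numpy as np

def p(x, beta):
    # Smooth plus function p(x, beta) = x + (1/beta) log(1 + exp(-beta x)).
    return np.logaddexp(0.0, beta * x) / beta

def ssvm_newton_armijo(A, y, C=10.0, beta=5.0, tol=1e-8, max_iter=50):
    """Minimize (C/2)||p(e - D(Aw + eb), beta)||^2 + (1/2)(||w||^2 + b^2)."""
    l, n = A.shape
    E = y[:, None] * np.hstack([A, np.ones((l, 1))])   # E u = D(Aw + eb)
    u = np.zeros(n + 1)                                 # u = (w, b)

    def phi(u):
        return 0.5 * C * np.sum(p(1.0 - E @ u, beta) ** 2) + 0.5 * u @ u

    for _ in range(max_iter):
        r = 1.0 - E @ u
        s = p(r, beta)
        sig = 1.0 / (1.0 + np.exp(-beta * r))           # p'(r, beta): a sigmoid
        g = -C * (E.T @ (s * sig)) + u                   # gradient of phi
        if np.linalg.norm(g) < tol:
            break
        h = sig ** 2 + beta * s * sig * (1.0 - sig)     # curvature weights
        H = C * (E.T * h) @ E + np.eye(n + 1)           # Hessian (positive definite)
        d = np.linalg.solve(H, -g)                       # (i) Newton direction
        lam, f0 = 1.0, phi(u)
        while phi(u + lam * d) > f0 + 1e-4 * lam * (g @ d):
            lam *= 0.5                                   # (ii) Armijo stepsize
        u = u + lam * d
    return u[:n], u[n]

# Toy usage on a separable 2-D problem (illustrative).
A = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -2.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = ssvm_newton_armijo(A, y)
print(np.sign(A @ w + b))   # should match the labels y
```

Note how the unconstrained smooth formulation pays off here: each iteration is just one $(n+1)\times(n+1)$ linear solve, exactly as the slide claims.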
Why Use a Stepsize?

Without a stepsize, the Newton iteration may fail to converge to the optimum solution! For example, for
$$f(x) = -\frac{1}{6}x^6 + \frac{1}{4}x^4 + 2x^2,$$
the pure Newton step, which minimizes the quadratic model
$$g(x) = f(x^i) + f'(x^i)(x - x^i) + \frac{1}{2}f''(x^i)(x - x^i)^2$$
at each iteration, does not converge to the optimal solution.
Comparisons of SSVM with Other SVMs

Tenfold test set correctness (%) / CPU time in seconds; best correctness in bold (red in the original slide). SSVM solves linear equations, SVM $\|\cdot\|_2^2$ a QP, and SVM $\|\cdot\|_1$ an LP.

| Dataset (size $m \times n$) | SVM $\|\cdot\|_2^2$ (QP) | SVM $\|\cdot\|_1$ (LP) | SSVM (linear eqns.) |
|---|---|---|---|
| Ionosphere (351 × 34) | 89.17 / 128.15 | 86.10 / 42.41 | **89.63** / 3.69 |
| WPBC, 60 months (110 × 22) | 61.83 / 4.91 | 66.23 / 3.72 | **68.18** / 1.03 |
| WPBC, 24 months (155 × 32) | 82.02 / 12.50 | 71.08 / 6.25 | **83.47** / 2.32 |
| Pima Indians (768 × 8) | 77.07 / 1138.0 | 74.47 / 286.59 | **78.12** / 1.54 |
| BUPA Liver (345 × 6) | 69.86 / 124.23 | 64.03 / 19.94 | **70.33** / 1.05 |
| Cleveland Heart (297 × 13) | 72.12 / 67.55 | 84.55 / 18.71 | **86.13** / 1.63 |
Two-spiral Dataset (94 White Dots & 94 Red Dots)
[Figure: a nonlinear map $\phi$ sends points from the input space $X$ to the feature space $F$, where they become linearly separable]
Dual Representation of SVM
(Key of Kernel Methods: $w = A'D\alpha^* = \sum_{i=1}^{\ell} y_i \alpha_i^* A_i'$; remember $A_i' = x^i$)

The hypothesis is determined by $(\alpha^*, b^*)$:
$$h(x) = \mathrm{sgn}(\langle x, A'D\alpha^*\rangle + b^*) = \mathrm{sgn}\Big(\sum_{i=1}^{l} y_i \alpha_i^* \langle x^i, x\rangle + b^*\Big) = \mathrm{sgn}\Big(\sum_{\alpha_i^* > 0} y_i \alpha_i^* \langle x^i, x\rangle + b^*\Big)$$
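In matrix form the recovery of $w$ is a one-liner; the dual solution `alpha` below is a made-up illustration, not the output of an actual solver:

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 0.0], [-1.0, -1.0]])   # rows A_i' = x^i
y = np.array([1.0, 1.0, -1.0])                          # diagonal of D
alpha = np.array([0.0, 0.4, 0.4])                       # hypothetical alpha*

w = A.T @ (y * alpha)              # w = A'D alpha* = sum_i y_i alpha_i* x^i
h = lambda x, b: np.sign(x @ w + b)  # h(x) = sgn(<x, w> + b)
print(w, h(np.array([3.0, 1.0]), b=-1.0))
```

Only the points with $\alpha_i^* > 0$ (the second and third rows here) contribute to $w$: these are the support vectors, which is why the last sum in the hypothesis runs only over them.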
Linear Machine in Feature Space

Let $\phi : X \to F$ be a nonlinear map from the input space to some feature space. The classifier will be in the form (primal):
$$f(x) = \Big(\sum_{i=1}^{?} w_i \phi_i(x)\Big) + b$$

Make it in the dual form:
$$f(x) = \Big(\sum_{i=1}^{l} \alpha_i y_i \langle \phi(x^i) \cdot \phi(x)\rangle\Big) + b$$
Kernel: Represent Inner Product in Feature Space

Definition: A kernel is a function $K : X \times X \to R$ such that for all $x, z \in X$
$$K(x, z) = \langle \phi(x) \cdot \phi(z)\rangle,$$
where $\phi : X \to F$. The classifier will become:
$$f(x) = \Big(\sum_{i=1}^{l} \alpha_i y_i K(x^i, x)\Big) + b$$
A Simple Example of Kernel

Polynomial kernel of degree 2: $K(x, z) = \langle x, z\rangle^2$. Let
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},\; z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \in R^2$$
and let the nonlinear map $\phi : R^2 \to R^3$ be defined by
$$\phi(x) = \begin{bmatrix} x_1^2 \\ x_2^2 \\ \sqrt{2}\, x_1 x_2 \end{bmatrix}.$$
Then $\langle \phi(x), \phi(z)\rangle = \langle x, z\rangle^2 = K(x, z)$.

There are many other nonlinear maps, $\psi(x)$, that satisfy the relation $\langle \psi(x), \psi(z)\rangle = \langle x, z\rangle^2 = K(x, z)$.
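This identity is easy to verify numerically; the particular $x$ and $z$ below are arbitrary test values:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map R^2 -> R^3 from the slide.
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
lhs = phi(x) @ phi(z)          # inner product in feature space
rhs = (x @ z) ** 2             # kernel value K(x, z) = <x, z>^2
print(lhs, rhs)                # both equal 1.0 (up to rounding)
```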
Power of the Kernel Technique

Consider a nonlinear map $\phi : R^n \to R^p$ that consists of distinct features of all the monomials of degree $d$. Then
$$p = \binom{n + d - 1}{d}.$$
For example: $n = 10$, $d = 10$ gives $p = 92378$.

Is this necessary? We only need to know $\langle \phi(x), \phi(z)\rangle$! This can be achieved with $K(x, z) = \langle x, z\rangle^d$.
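The count can be checked in a couple of lines; `math.comb` is the standard-library binomial coefficient:

```python
from math import comb

def num_monomials(n, d):
    # Number of distinct monomials of degree d in n variables: C(n+d-1, d).
    return comb(n + d - 1, d)

print(num_monomials(10, 10))  # 92378 -- far too many features to build explicitly
```

The kernel trick avoids ever materializing this $p$-dimensional vector: one inner product in $R^n$ followed by a power gives the same value.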
More Examples of Kernels
$K(A, B) : R^{m \times n} \times R^{n \times l} \longmapsto R^{m \times l}$

For $A \in R^{m \times n}$, $a \in R^m$, $\mu \in R$, and $d$ an integer:

- Polynomial kernel: $(AA' + \mu a a')^d_{\bullet}$ (the power taken componentwise); linear kernel $AA'$: $\mu = 0$, $d = 1$
- Gaussian (radial basis) kernel: $K(A, A')_{ij} = e^{-\mu\|A_i - A_j\|_2^2}$, $i, j = 1, \ldots, m$

The $ij$-entry of $K(A, A')$ represents the "similarity" of the data points $A_i$ and $A_j$.
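A vectorized evaluation of the Gaussian kernel matrix; this is a sketch, and the sample points and the width $\mu = 0.5$ are arbitrary:

```python
import numpy as np

def gaussian_kernel(A, B, mu):
    """K(A, B')_ij = exp(-mu * ||A_i - B_j||^2) for row-point matrices A, B."""
    # Expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a'b to avoid explicit loops.
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * sq)

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel(A, A, mu=0.5)
print(K[0, 0], K[0, 1])   # 1.0 (self-similarity) and exp(-0.5)
```

Identical points get similarity 1 and the similarity decays with squared distance, matching the "similarity" reading of the $ij$-entry.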
Kernel Technique: Based on Mercer's Condition (1909)

- The value of the kernel function represents the inner product of two training points in feature space.
- Kernel functions merge two steps:
  1. Map the input data from the input space to the feature space (which might be infinite-dimensional).
  2. Compute the inner product in the feature space.
Nonlinear SVM Motivation

Linear SVM (linear separator: $x'w + b = 0$):
$$\min_{w, b, \xi} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|w\|_2^2 + b^2) \quad \text{s.t.} \quad D(Aw + eb) + \xi \ge e, \quad \xi \ge 0 \tag{QP}$$

By (QP) "duality", $w = A'D\alpha$. Maximizing the margin in the "dual space" gives:
$$\min_{\alpha, b, \xi} \; \frac{C}{2}\|\xi\|_2^2 + \frac{1}{2}(\|\alpha\|_2^2 + b^2) \quad \text{s.t.} \quad D(AA'D\alpha + eb) + \xi \ge e, \quad \xi \ge 0$$

Dual SSVM, with separator $x'A'D\alpha + b = 0$:
$$\min_{\alpha, b} \; \frac{C}{2}\|p(e - D(AA'D\alpha + eb), \beta)\|_2^2 + \frac{1}{2}(\|\alpha\|_2^2 + b^2)$$
Nonlinear Smooth SVM

Replace $AA'$ by a nonlinear kernel $K(A, A')$:
$$\min_{\alpha, b} \; \frac{C}{2}\|p(e - D(K(A, A')D\alpha + eb), \beta)\|_2^2 + \frac{1}{2}(\|\alpha\|_2^2 + b^2)$$

- Use the Newton-Armijo algorithm to solve the problem.
- Each iteration solves $m+1$ linear equations in $m+1$ variables.
- Nonlinear classifier: $K(x', A')D\alpha + b = 0$, which depends on the entire dataset.
Difficulties with Nonlinear SVM for Large Problems

- The nonlinear kernel $K(A, A') \in R^{l \times l}$ is fully dense.
- Computational complexity depends on the number of examples: the complexity of nonlinear SSVM is about $O((l+1)^3)$.
- Need to generate and store $O(l^2)$ entries: long CPU time to compute the dense kernel matrix, and risk of running out of memory while storing it.
- The separating surface depends on almost the entire dataset, so the entire dataset must be stored even after the problem is solved.
Solving the SVM with a Massive Dataset

Standard optimization techniques require that the data be held in memory, which limits the SVM to datasets of a few thousand points.

- Solution I: SMO (Sequential Minimal Optimization)
  - Solve the sub-optimization problem defined by the working set (size = 2)
  - Increase the objective function iteratively
- Solution II: RSVM (Reduced Support Vector Machine)
Support Vector Regression (Linear Case: $f(x) = x'w + b$)

Given the training set $S = \{(x^i, y_i) \mid x^i \in R^n, y_i \in R, i = 1, \ldots, l\}$, find a linear function $f(x) = x'w + b$, where $(w, b)$ is determined by solving a minimization problem that guarantees the smallest overall empirical error made by $f$.

Motivated by SVM:
- $\|w\|_2$ should be as small as possible
- Some tiny errors should be discarded
$\varepsilon$-Insensitive Loss Function

The loss made by the estimation function $f$ at the data point $(x^i, y_i)$ is given by the $\varepsilon$-insensitive loss function:
$$|y_i - f(x^i)|_\varepsilon = \max\{0, |y_i - f(x^i)| - \varepsilon\}$$
That is,
$$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases}$$
If $\xi \in R^n$, then $|\xi|_\varepsilon \in R^n$ is defined componentwise: $(|\xi|_\varepsilon)_i = |\xi_i|_\varepsilon$, $i = 1, \ldots, n$.
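Componentwise, this loss is a one-line clip; the residual values below are an arbitrary illustration:

```python
import numpy as np

def eps_insensitive(xi, eps):
    # Componentwise |xi|_eps = max(0, |xi| - eps).
    return np.maximum(0.0, np.abs(xi) - eps)

print(eps_insensitive(np.array([-0.3, 0.05, 0.25]), eps=0.1))
# losses: 0.2, 0.0, 0.15 -- residuals inside the eps-tube cost nothing
```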
$\varepsilon$-Insensitive Linear Regression

[Figure: data points around the regression line $f(x) = x'w + b$ with an $\varepsilon$-tube; points outside the tube incur errors $y_j - f(x^j) - \varepsilon$ and $f(x^k) - y_k - \varepsilon$]

Find $(w, b)$ with the smallest overall error.
$\varepsilon$-Insensitive Support Vector Regression Model

Motivated by SVM:
- $\|w\|_2$ should be as small as possible
- Some tiny errors should be discarded

$$\min_{(w,b,\xi) \in R^{n+1+m}} \; \frac{1}{2}\|w\|_2^2 + C e'|\xi|_\varepsilon$$
where $|\xi|_\varepsilon \in R^m$ and $(|\xi|_\varepsilon)_i = \max\{0, |A_i w + b - y_i| - \varepsilon\}$.
Reformulated $\varepsilon$-SVR as a Constrained Minimization Problem

$$\min_{(w,b,\xi,\xi^*) \in R^{n+1+2m}} \; \frac{1}{2}w'w + Ce'(\xi + \xi^*)$$
subject to
$$y - Aw - eb \le e\varepsilon + \xi, \qquad Aw + eb - y \le e\varepsilon + \xi^*, \qquad \xi, \xi^* \ge 0$$

This is a minimization problem with $n+1+2m$ variables and $2m$ constraints, which enlarges the problem size and the computational complexity of solving it.
Five Widely Used Loss Functions
SV Regression by Minimizing the Quadratic $\varepsilon$-Insensitive Loss

We minimize $\|(w, b)\|_2^2$ at the same time (Occam's razor: the simplest is the best). We have the following (nonsmooth) problem:
$$\min_{(w,b,\xi) \in R^{n+1+l}} \; \frac{1}{2}(\|w\|_2^2 + b^2) + \frac{C}{2}\,\||\xi|_\varepsilon\|_2^2$$
where $(|\xi|_\varepsilon)_i = |y_i - (w'x^i + b)|_\varepsilon$. This formulation has the advantage of strong convexity.
$\varepsilon$-Insensitive Loss Function

$$|x|_\varepsilon = (x - \varepsilon)_+ + (-x - \varepsilon)_+$$
Quadratic $\varepsilon$-Insensitive Loss Function

$$|x|_\varepsilon^2 = \big((x - \varepsilon)_+ + (-x - \varepsilon)_+\big)^2 = (x - \varepsilon)_+^2 + (-x - \varepsilon)_+^2$$
since $(x - \varepsilon)_+ \cdot (-x - \varepsilon)_+ = 0$.
Use the $p_\varepsilon^2$-Function to Replace the Quadratic $\varepsilon$-Insensitive Function

$$p_\varepsilon^2(x, \beta) = \big(p(x - \varepsilon, \beta)\big)^2 + \big(p(-x - \varepsilon, \beta)\big)^2$$
where $p$ is defined by
$$p(x, \beta) = x + \frac{1}{\beta}\log(1 + e^{-\beta x})$$

[Figure: the $p$-function with $\beta = 10$, $p(x, 10)$, $x \in [-3, 3]$]
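The quality of this smooth replacement can be checked numerically; this is a sketch, with the grid chosen arbitrarily and the parameters mirroring the figures' $\varepsilon = 1$, $\beta = 5$:

```python
import numpy as np

def p(x, beta):
    # Smooth plus function, evaluated stably.
    return np.logaddexp(0.0, beta * x) / beta

def p2eps(x, eps, beta):
    # Smooth quadratic eps-insensitive loss from the slide.
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

def abs2eps(x, eps):
    # Exact quadratic eps-insensitive loss |x|_eps^2.
    return np.maximum(0.0, np.abs(x) - eps) ** 2

xs = np.linspace(-3, 3, 601)
gap = np.max(np.abs(p2eps(xs, 1.0, 5.0) - abs2eps(xs, 1.0)))
print(gap)  # small already at beta = 5; shrinks further as beta grows
```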
[Figure: $|x|_\varepsilon^2$ versus $p_\varepsilon^2(x, \beta)$ for $\varepsilon = 1$, $\beta = 5$]
$\varepsilon$-Insensitive Smooth Support Vector Regression

$$\min_{(w,b) \in R^{n+1}} \Phi_{\varepsilon,\beta}(w, b) := \frac{1}{2}(w'w + b^2) + \frac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2(A_i w + b - y_i, \beta) \;\approx\; \frac{1}{2}(w'w + b^2) + \frac{C}{2}\sum_{i=1}^{m} |A_i w + b - y_i|_\varepsilon^2$$

This is a strongly convex minimization problem without any constraints. The objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve this problem.
Nonlinear $\varepsilon$-SVR

Based on the duality theorem and the KKT optimality conditions, $w = A'\alpha$ with $\alpha \in R^m$, so
$$y \approx Aw + eb \;\Longrightarrow\; y \approx AA'\alpha + eb.$$
In the nonlinear case:
$$y \approx K(A, A')\alpha + eb$$
Nonlinear SVR

Let $A \in R^{m \times n}$ and $B \in R^{n \times l}$, with $K(A, B) : R^{m \times n} \times R^{n \times l} \Longrightarrow R^{m \times l}$, so that $K(A_i, A') \in R^{1 \times m}$.

$$\min_{(\alpha,b) \in R^{m+1}} \; \frac{1}{2}\|\alpha\|_2^2 + C\sum_{i=1}^{m} |K(A_i, A')\alpha + b - y_i|_\varepsilon$$

Nonlinear regression function: $f(x) = K(x', A')\alpha + b$
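Once $(\alpha, b)$ are in hand, evaluating $f$ at a new point needs only one row of kernel values. Everything below (the data, the coefficients, and the Gaussian kernel width) is a hypothetical illustration:

```python
import numpy as np

def gaussian_row(x, A, mu):
    # K(x', A') as a length-m row of Gaussian kernel values.
    return np.exp(-mu * np.sum((A - x) ** 2, axis=1))

def f(x, A, alpha, b, mu=1.0):
    # Nonlinear regression function f(x) = K(x', A') alpha + b.
    return gaussian_row(x, A, mu) @ alpha + b

A = np.array([[0.0], [1.0], [2.0]])   # toy training points (assumed)
alpha = np.array([0.5, -0.2, 0.1])    # hypothetical trained coefficients
val = f(np.array([1.0]), A, alpha, b=0.3)
print(val)
```

Note that, as the earlier slide warned, evaluating $f$ requires keeping the whole training matrix $A$ around.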
Nonlinear Smooth Support Vector $\varepsilon$-Insensitive Regression

$$\min_{(\alpha,b) \in R^{m+1}} \; \frac{1}{2}(\alpha'\alpha + b^2) + \frac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2(K(A_i, A')\alpha + b - y_i, \beta) \;\approx\; \frac{1}{2}(\alpha'\alpha + b^2) + \frac{C}{2}\sum_{i=1}^{m} |K(A_i, A')\alpha + b - y_i|_\varepsilon^2$$
Numerical Results

- Training and testing sets generated by the slice method.
- A Gaussian kernel is used to generate the nonlinear $\varepsilon$-SVR in all experiments.
- The reduced kernel technique is utilized when the training dataset is bigger than 1000 points.
- Error measure: 2-norm relative error $\dfrac{\|y - \hat{y}\|_2}{\|y\|_2}$, where $y$ are the observations and $\hat{y}$ the predicted values.
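The error measure in code, with arbitrary example vectors:

```python
import numpy as np

def rel_error(y, y_hat):
    # 2-norm relative error ||y - y_hat||_2 / ||y||_2.
    return np.linalg.norm(y - y_hat) / np.linalg.norm(y)

print(rel_error(np.array([1.0, 2.0, 2.0]), np.array([1.0, 2.0, 2.5])))
# 0.5 / 3 = 0.1666...
```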
101 Data Points in $R \times R$

- $f(x) = 0.5\,\mathrm{sinc}(10\pi x) + \text{noise}$, $x \in [-1, 1]$, 101 points
- Noise: mean $= 0$, $\sigma = 0.04$
- Nonlinear SSVR with kernel $e^{-\mu\|x^i - x^j\|_2^2}$
- Parameters: $C = 50$, $\mu = 5$, $\varepsilon = 0.02$
- Training time: 0.3 sec.
First Artificial Dataset

$f(x) = 0.5\,\mathrm{sinc}(\pi x / 30) + \text{noise}$, with random noise of mean 0 and standard deviation 0.04.

| | $\varepsilon$-SSVR | LIBSVM |
|---|---|---|
| Training time | 0.016 sec. | 0.015 sec. |
| Error | 0.059 | 0.068 |
481 Data Points in $R^2 \times R$

[Figures: original function vs. estimated function]

- Noise: mean $= 0$, $\sigma = 0.4$
- Parameters: $C = 50$, $\mu = 1$, $\varepsilon = 0.5$
- Training time: 9.61 sec.
- Mean absolute error (MAE) over the 49×49 mesh points: 0.1761
[Figures: estimated function vs. original function]

- Noise: mean $= 0$, $\sigma = 0.4$
- Using the reduced kernel: $K(A, \bar{A}') \in R^{28900 \times 300}$
- Parameters: $C = 10000$, $\mu = 1$, $\varepsilon = 0.2$
- Training time: 22.58 sec.
- MAE over the 49×49 mesh points: 0.0513
Real Datasets
Linear $\varepsilon$-SSVR: Tenfold Numerical Results
Nonlinear $\varepsilon$-SSVR: Tenfold Numerical Results (1/2)
Nonlinear $\varepsilon$-SSVR: Tenfold Numerical Results (2/2)
Conclusion

- We introduced SVMs for classification and regression; SVMs can be extended from linear to nonlinear via the kernel trick.
- We applied the smoothing technique to SVMs to propose smooth SVMs for classification and regression.
- We also described the Newton-Armijo algorithm for solving smooth SVMs, which has been shown to converge globally and quadratically, in a finite number of steps, to the solution.
- The numerical results show the effectiveness and correctness of linear and nonlinear smooth SVMs.
- Our smooth SVM formulation only needs to solve a system of linear equations iteratively, instead of solving a convex quadratic program.