data mining - svmricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · dr. jean-michel richer data...
TRANSCRIPT
![Page 1: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/1.jpg)
Data Mining - SVM
Dr. Jean-Michel RICHER
Dr. Jean-Michel RICHER Data Mining - SVM 1 / 55
![Page 2: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/2.jpg)
Outline
1. Introduction
2. Linear regression
3. Support Vector Machine
4. Principle
5. Mathematical point of view
6. Primal and Dual formulation
7. Example
Dr. Jean-Michel RICHER Data Mining - SVM 2 / 55
![Page 3: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/3.jpg)
1. Introduction
Dr. Jean-Michel RICHER Data Mining - SVM 3 / 55
![Page 4: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/4.jpg)
What we will cover
What we will coverremainder of linear regressionSupport Vector Machine
I principleI primal and dual formulationI soft margin and slack variablesI kernelI iris dataset in python
Dr. Jean-Michel RICHER Data Mining - SVM 4 / 55
![Page 5: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/5.jpg)
2. Linear regression - aremainder
Dr. Jean-Michel RICHER Data Mining - SVM 5 / 55
![Page 6: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/6.jpg)
Linear regression principle
PrincipleGiven a set of individuals X = {x1, x2, . . . , xn} where eachxi = (x1
i , . . . , xdi ) and a vector y of size n:
find a function f (x) = w .x + w0 = y + ε such that the yiare close to f (x)
X y f (xi)
x11 . . . xd
1 y1 y ′i...
. . ....
......
x1n . . . xd
n yn y ′n︸ ︷︷ ︸input data
Dr. Jean-Michel RICHER Data Mining - SVM 6 / 55
![Page 7: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/7.jpg)
Linear regression principle - Example
Consider the following example where n = 5 and d = 1:
X y1.00 1.002.00 2.003.00 1.304.00 3.755.00 2.25
Dr. Jean-Michel RICHER Data Mining - SVM 7 / 55
![Page 8: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/8.jpg)
Linear regression principle - Example
Consider the following example where n = 5 and d = 1:
In linear regression, theobservations (color points)are assumed to be theresult of random deviations(green) from an underlyingrelationship (black)
Dr. Jean-Michel RICHER Data Mining - SVM 8 / 55
![Page 9: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/9.jpg)
Linear regression principle
FormulaOrdinary least squares (OLS) is the simplest and most com-mon estimator that minimizes the sum of squared residuals
RSS =n∑
i=1
(yi − f (xi))2
we can obtain w :
w = (X T .X)−1)X T y = (∑
(xixTi ))−1(
∑xiyi)
Dr. Jean-Michel RICHER Data Mining - SVM 9 / 55
![Page 10: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/10.jpg)
Linear regression principle
FormulaIf d = 1, given
X the mean of X , y the mean of y ,σX , σY the standard deviation of X and y
R the correlation between X and Ywe can compute
the slope w1 = R × σYσX
the intercept w0 = y −w1 × X
Dr. Jean-Michel RICHER Data Mining - SVM 10 / 55
![Page 11: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/11.jpg)
Linear regression - slope and intercept
With our example:
X y σX σY R
3 2.06 1.581 1.072 0.627
w1 = (0.627)(1.072)/1.581 = 0.425w0 = 2.06− (0.425)(3) = 0.785
Dr. Jean-Michel RICHER Data Mining - SVM 11 / 55
![Page 12: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/12.jpg)
Linear regression - prediction
y ′ = f (xi) = 0.425× xi + 0.785
X y y ′ y − y ′ (y − y ′)2
1.00 1.00 1.210 -0.210 0.0442.00 2.00 1.635 0.365 0.1333.00 1.30 2.060 -0.760 0.5784.00 3.75 2.485 1.265 1.6005.00 2.25 2.910 -0.660 0.436
So the total sum of errors is 2.791
Dr. Jean-Michel RICHER Data Mining - SVM 12 / 55
![Page 13: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/13.jpg)
Linear regression - but
So linear regression is good, but, see Anscombe’s quartet
Dr. Jean-Michel RICHER Data Mining - SVM 13 / 55
![Page 14: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/14.jpg)
Linear regression principle
Anscombe’s quartet (Francis Anscombe, 1973)four datasets that have nearly identical simple descriptivestatistics, yet appear very different when graphed
Mean of x 9variance of x 11
Mean of y 7.50variance of y 4.125Correlation 0.816
Linear regression line y = 3.00 + 0.500xCoefficient of determination 0.67
often used to illustrate the importance of looking at a setof data graphically before starting to analyze it
Dr. Jean-Michel RICHER Data Mining - SVM 14 / 55
![Page 15: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/15.jpg)
Linear regression principle
Anscombe’s quartet
Dr. Jean-Michel RICHER Data Mining - SVM 15 / 55
![Page 16: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/16.jpg)
3. Support Vector Machine
Dr. Jean-Michel RICHER Data Mining - SVM 16 / 55
![Page 17: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/17.jpg)
SVM principle
SVM PrincipleSVM are called large-margin classifier:
separate two classes or morethey look for an hyperplane with a large (wide)margin in order to make it:
I less sensitive to perturbationsI more stable for prediction
the initial primal formulation is transformed into a dualformulation easier to solveslack variables help for misclassified elementsnon linear examples can be transformed into linearexamples using a kernel function
Dr. Jean-Michel RICHER Data Mining - SVM 17 / 55
![Page 18: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/18.jpg)
SVM principle
Large margin principleSeparate students in a classroom:
class 1: students close to the windowclass 2: students close to the door
middle of tables middle of the room
Dr. Jean-Michel RICHER Data Mining - SVM 18 / 55
![Page 19: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/19.jpg)
SVM
PrincipleGiven a set of individuals X = {x1, x2, . . . , xn} where eachxi = (x1
i , . . . , xdi ) and a vector y of size n of classes {−1,+1}:
find a function f (x) = w .x + w0 such that{if yi = +1, f (xi) ≥ +1if yi = −1, f (xi) ≤ −1
or simply:yi × f (xi) ≥ 1
f (x) = 0 is the hyperplane (or separation line)
Dr. Jean-Michel RICHER Data Mining - SVM 19 / 55
![Page 20: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/20.jpg)
SVM - Graphically explained
w
f(x) = +1
f(x) = 0
f(x) = −1
positiveexamples
negativeexamples
margin
supportvectors
1||w ||
d(xi) =w xi + w0
||w ||
Dr. Jean-Michel RICHER Data Mining - SVM 20 / 55
![Page 21: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/21.jpg)
SVM
Important !
Support Vectorsindividuals (objects) closest to the separatinghyperplanevery few compared to n
Aim of SVMorientate the hyperplane in order it is as far as possible fromthe SV of both classes
Dr. Jean-Michel RICHER Data Mining - SVM 21 / 55
![Page 22: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/22.jpg)
SVM on a mathematical point of view
FormulationMax 2
||w ||
subject to yi(w xi + w0) ≥ 1, ∀i(1)
or Min 12 ||w ||
2
subject to yi(w xi + w0)− 1 ≥ 0, ∀i(2)
Minimizing ||w || is equivalent to minimizing 1/2||w ||2 but inthis case we can perform Quadratic Programming (QP)optimization
Dr. Jean-Michel RICHER Data Mining - SVM 22 / 55
![Page 23: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/23.jpg)
SVM on a mathematical point of view
Quadratic programming (QP)is the process of solving a special type ofmathematical optimization problemoptimize (minimize or maximize) a quadratic function(x2)subject to linear constraints
Dr. Jean-Michel RICHER Data Mining - SVM 23 / 55
![Page 24: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/24.jpg)
SVM on a mathematical point of view
how to solve the problem ?the problem of finding the optimal hyper plane is anoptimization problem and can be solved byoptimization techniquesthe problem is difficult to solve this way but we canrewrite itwe use Lagrange multipliers to get this problem into aform that can be solved analytically.
Dr. Jean-Michel RICHER Data Mining - SVM 24 / 55
![Page 25: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/25.jpg)
SVM on a mathematical point of view
Joseph-Louis Lagrangeborn Guiseppe LodovicoLagrangia (1736 - 1813) was afranco-italian mathematician andastronomermade significant contributions tothe fields of analysis, numbertheory, and both classical andcelestial mechanicsin 1787, at age 51, moved fromBerlin to Paris and became amember of the French Academyof Sciencesremained in France until the end ofhis life
Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55
![Page 26: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/26.jpg)
Lagrangian multipliers
Principleyou want to minimize or maximize f (x) subject tog(x) = 0under certain conditions (quadratic, linearconstraints)define the function
L(x , α) = f (x) + αg(x)
where α ≥ 0 ∈ R is called the lagrangian multiplier
Dr. Jean-Michel RICHER Data Mining - SVM 26 / 55
![Page 27: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/27.jpg)
Lagrangian multipliers
Resolutiona solution of L(x , α) is a point of gradient 0so compute and solve
∂L(x ,α)∂x = 0
∂L(x ,α)∂α = g(x) = 0
or reuse in L(x , α)
Dr. Jean-Michel RICHER Data Mining - SVM 27 / 55
![Page 28: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/28.jpg)
Method of Lagrange - example
Statement of the exampleSuppose you want to put a fence around some fieldwhich as a form of a rectangle (x , y) and you want tomaximize the area knowing that you have P meters offence: {
Max x × ysuch that P = 2x + 2y
then f (x , y) = xy and g(x) = P − 2x − 2y = 0{Max xy
such that P − 2x − 2y = 0
Dr. Jean-Michel RICHER Data Mining - SVM 28 / 55
![Page 29: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/29.jpg)
Method of Lagrange - example
Lagrange formulation
L(x , y , α) = xy + α(P − 2x − 2y)
the derivatives give us
∂L(x , y , α)
∂x= y − 2α = 0
∂L(x , y , α)
∂y= x − 2α = 0
∂L(x , y , α)
∂α= P − 2x − 2y = 0
Dr. Jean-Michel RICHER Data Mining - SVM 29 / 55
![Page 30: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/30.jpg)
Method of Lagrange - example
Resolutionthe first two constraints give us y = 2α = x , so x = y
in other words, the area is a squareand the last one that P = 4x = 4y
consequently α = P/8 because α = x/2 = y/2
Dr. Jean-Michel RICHER Data Mining - SVM 30 / 55
![Page 31: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/31.jpg)
Back to the SVM formulation
Primal formulation
Min 12 ||w ||
2
such that yi(wxi + w0)− 1 ≥ 0, ∀i
Lagrangian of the primal
LP(w ,w0, α) =12||w ||2 − α
[∑i
yi(wxi + w0)− 1
]with α ≥ 0
Dr. Jean-Michel RICHER Data Mining - SVM 31 / 55
![Page 32: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/32.jpg)
Back to the SVM formulation
Lagrangian of the primal
LP(w ,w0, α) =12||w ||2 − α
[∑i
yi(wxi + w0)− 1
]
LP(w ,w0, α) =12||w ||2 −
∑j
αj
[∑i
yi(wxi + w0)− 1
]
LP(w ,w0, α) =12||w ||2 −
∑i
∑j
αiyi(wxi + w0) +n∑j
αj (3)
Dr. Jean-Michel RICHER Data Mining - SVM 32 / 55
![Page 33: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/33.jpg)
Back to the SVM formulation
Lagrangian of the primalwe want to find w and w0 which minimizes and α that max-imizes (3):
∂LP(w ,w0, α)
∂w= w −
∑i
αiyixi = 0 (4)
∂LP(w ,w0, α)
∂w0=∑
i
αiyi = 0 (5)
Dr. Jean-Michel RICHER Data Mining - SVM 33 / 55
![Page 34: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/34.jpg)
Back to the SVM formulation
Dual formulationBy substituting (4) and (5) into (3), we get:
Maxα
LD(α) =∑
j αj − 12∑
i,j αiαjyiyjxixj
such that αi ≥ 0, ∀i∑i αiyi = 0
or Maxα
LD(α) =∑
j αj − 12∑
i,j αiHijαj
such that αi ≥ 0, ∀i∑i αiyi = 0
with Hij = yiyjxixj
(6)
Need only the dot product of x
Dr. Jean-Michel RICHER Data Mining - SVM 34 / 55
![Page 35: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/35.jpg)
Back to the SVM formulation
Dual resolutionfrom (6) with a QP solver we can find α and thus w
w0 can be obtained from the Support Vectors
w0 = ys −∑m∈S
αmymxmxs
where S is the set of indices of SV, such that αi > 0
Dr. Jean-Michel RICHER Data Mining - SVM 35 / 55
![Page 36: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/36.jpg)
Extension of SVMs
What ifthe data are not classified to be linearly separable ?
I we use slack variableswe don’t have a linear support ?
I we use a kernel function
Dr. Jean-Michel RICHER Data Mining - SVM 36 / 55
![Page 37: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/37.jpg)
Slack variable
Not fully linearly separablewe need to relax the constraintsintroduce slack variables to keep a wide margin
w xi + w0 ≥ +1− ξi for yi = +1w xi + w0 ≤ −1 + ξi for yi = −1ξi ≥ 0, ∀i
Dr. Jean-Michel RICHER Data Mining - SVM 37 / 55
![Page 38: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/38.jpg)
Slack variable
positiveexamples
negativeexamples
Violates the large margin principle !
Dr. Jean-Michel RICHER Data Mining - SVM 38 / 55
![Page 39: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/39.jpg)
Slack variable
w
f(x) = +1
f(x) = 0
f(x) = −1
positiveexamples
negativeexamples
margin1||w ||
ξ||w ||
Dr. Jean-Michel RICHER Data Mining - SVM 39 / 55
![Page 40: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/40.jpg)
Slack variable
Formulation with slack variablesWe have a soft margin: Min 1
2 ||w ||2 + C
∑ni=1 ξi
subject to yi(w xi + w0)− 1 + ξi ≥ 0, ∀i(7)
The parameter C controls the trade-off between the slackvariable penalty and the size of the margin
Dr. Jean-Michel RICHER Data Mining - SVM 40 / 55
![Page 41: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/41.jpg)
Kernels
Why a kernel ?Many problems are not linearly separable in thespace of the input X
k(xi , xj) = φ(xi)φ(xj)
Dr. Jean-Michel RICHER Data Mining - SVM 41 / 55
![Page 42: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/42.jpg)
Kernel
A problem where individuals are not linearly separable inX with d = 2
Dr. Jean-Michel RICHER Data Mining - SVM 42 / 55
![Page 43: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/43.jpg)
Kernel
But individuals are linearly separable in X + y (consider that0 on z represents the class of the negative individuals)
or use polar coordinates (r , θ)
Dr. Jean-Michel RICHER Data Mining - SVM 43 / 55
![Page 44: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/44.jpg)
SVM in Python
Import svm from sklearn package:
from sklearn import svm
svc = svm.SVC(kernel='linear', C=1,
gamma='auto').fit(X, y)
You can choose different kernels : linear, poly, rbf, sigmoid
Dr. Jean-Michel RICHER Data Mining - SVM 44 / 55
![Page 45: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/45.jpg)
SVM in Python
Kernel functionslinear: 〈x , x ′〉polynomial: (γ〈x , x ′〉+ r)d
I d is specified by keyword degreeI and r by coef0
rbf (Radial Basis Function): exp(−γ||x − x ′||2)I γ is specified by keyword gamma, must be greater than
0.sigmoid: tanh(γ〈x , x ′〉+ r))
I where r is specified by coef0
Dr. Jean-Michel RICHER Data Mining - SVM 45 / 55
![Page 46: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/46.jpg)
7. Example
Dr. Jean-Michel RICHER Data Mining - SVM 46 / 55
![Page 47: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/47.jpg)
Example in python
IRISwe use the IRIS example of sklearn150 individuals (50 Setosa, 50 Versicolour, 50 Virginica)use the SVC or SVR implementation
Dr. Jean-Michel RICHER Data Mining - SVM 47 / 55
![Page 48: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/48.jpg)
Example in python
IRISfrom sklearn import svm
svc = svm.SVC(kernel='linear', C=1, gamma='auto')
# svc = svm.SVC(kernel='rbf', C=100, gamma=100)
# svc = svm.SVR(kernel='rbf', C=1, gamma=1)
svc.fit(X, y)
y_predict = svc.predict(X)
Dr. Jean-Michel RICHER Data Mining - SVM 48 / 55
![Page 49: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/49.jpg)
Example in python - Linear SVM
Linear svm, C = 1
7 mispredicted
Dr. Jean-Michel RICHER Data Mining - SVM 49 / 55
![Page 50: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/50.jpg)
Example in python - RBF SVM
RBF C = 1
6 mispredicted
Dr. Jean-Michel RICHER Data Mining - SVM 50 / 55
![Page 51: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/51.jpg)
Example in python - RBF SVM
RBF C = 100
6 mispredicted
Dr. Jean-Michel RICHER Data Mining - SVM 51 / 55
![Page 52: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/52.jpg)
Example in python - RBF SVM
RBF C = 100, γ = 100
1 mispredicted
Dr. Jean-Michel RICHER Data Mining - SVM 52 / 55
![Page 53: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/53.jpg)
Example in python - WEKA
WEKAuse LibSVM (install it using package manager in Tools)use default parameters (C-SVC, rbf)set C (cost) to 10 and gamma to 10in order to get 150 correctly classified instances
Dr. Jean-Michel RICHER Data Mining - SVM 53 / 55
![Page 54: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/54.jpg)
7. End
Dr. Jean-Michel RICHER Data Mining - SVM 54 / 55
![Page 55: Data Mining - SVMricher/dm/data_mining_5_svm.pdf · 2018. 5. 25. · Dr. Jean-Michel RICHER Data Mining - SVM 25 / 55. Lagrangian multipliers Principle you want to minimize or maximize](https://reader036.vdocuments.site/reader036/viewer/2022071611/614ab62a12c9616cbc6997bf/html5/thumbnails/55.jpg)
University of Angers - Faculty of Sciences
UA - Angers2 Boulevard Lavoisier 49045 AngersCedex 01Tel: (+33) (0)2-41-73-50-72
Dr. Jean-Michel RICHER Data Mining - SVM 55 / 55