Support Vector Machines (UCF Computer Science, CAP5610 Lecture 07)
TRANSCRIPT
CAP 5610: Machine Learning
Instructor: Guo-Jun QI
Support Vector Machines
Linear Classifier

Naive Bayes
- Assumes each attribute is drawn from a Gaussian distribution with the same variance
- Generative model: the mean and variance are estimated with a closed-form solution

Logistic regression
- Directly maximizes the log likelihood to fit the model to the training data
- Discriminative model: no closed-form solution; a gradient ascent method is used
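The contrast above can be sketched in code. This is a minimal example of my own (not from the lecture; the synthetic data and scikit-learn usage are assumptions), fitting both models to two equal-variance Gaussian classes:

```python
# Sketch: generative (Naive Bayes) vs. discriminative (logistic regression)
# linear classifiers on synthetic Gaussian data. Not from the lecture.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two Gaussian classes with equal variance, matching the Naive Bayes assumption.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

nb = GaussianNB().fit(X, y)          # generative: closed-form mean/variance estimates
lr = LogisticRegression().fit(X, y)  # discriminative: iterative optimization

print(nb.score(X, y), lr.score(X, y))
```

Both reach similar accuracy here; they differ in how the linear boundary is obtained, not in its functional form.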
Drawback
- Neither model offers a geometric intuition for what makes a good linear classifier in high-dimensional space.
SVM
- Supervised learning methods used for classification and regression
- A special property: simultaneously minimize the classification error and maximize the geometric margin, i.e., a maximum margin classifier
- Excellent theory and strong empirical performance
Outline
- Linear SVM – hard margin
- Linear SVM – soft margin
- Non-linear SVM
- Application
Linear Classifiers
f(x, w, b) = sign(w · x + b), with label y in {+1, -1} and parameters w, b
How would you classify this data?
[Figure: labeled points, with w · x + b > 0 on one side of a candidate line and w · x + b < 0 on the other]
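The decision rule above can be written as a tiny function. A toy illustration of my own (the weight vector and bias are hypothetical, not from the lecture):

```python
# Toy linear decision rule f(x, w, b) = sign(w . x + b). Not from the lecture.
import numpy as np

def f(x, w, b):
    """Classify x as +1 or -1 by which side of the hyperplane it falls on."""
    return 1 if np.dot(w, x) + b > 0 else -1

w = np.array([1.0, 1.0])  # hypothetical weight vector
b = -1.0                  # hypothetical bias

print(f(np.array([2.0, 2.0]), w, b))  # w . x + b = 3 > 0, so +1
print(f(np.array([0.0, 0.0]), w, b))  # w . x + b = -1 < 0, so -1
```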
Linear Classifiers
f(x, w, b) = sign(w · x + b)
Any of these separating lines would be fine...
...but which is best?
Linear Classifiers
f(x, w, b) = sign(w · x + b)
How would you classify this data?
[Figure: one candidate line misclassifies a point into the +1 class]
Classifier Margin
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
Maximum Margin
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM, or LSVM).
Support vectors are the data points that the margin pushes up against.
1. Maximizing the margin makes intuitive sense.
2. It implies that only the support vectors matter; the other training examples can be discarded without affecting the training result.
Maximum Margin
- Keeping only the support vectors does not change the maximum margin classifier.
- The classifier is robust to small changes (noise) in the non-support vectors.
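This property is easy to check empirically. A sketch of my own (the data and the large-C "hard margin" approximation are assumptions, not from the lecture): refit an SVM on its support vectors alone and compare the hyperplanes.

```python
# Sketch: refitting a (near) hard-margin linear SVM on only its support
# vectors reproduces the same classifier. Not from the lecture.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]], dtype=float)
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C approximates hard margin
sv = clf.support_                            # indices of the support vectors
clf2 = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])

print(np.allclose(clf.coef_, clf2.coef_, atol=1e-3))         # same w
print(np.allclose(clf.intercept_, clf2.intercept_, atol=1e-3))  # same b
```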
Basics of SVM Math

w/||w|| is:
- perpendicular to the line w · x + b = 0
- of unit length

The margin between two parallel lines w · x + b1 = 0 and w · x + b2 = 0 is

    |w · (x1 - x2)| / ||w|| = |b1 - b2| / ||w||

where x1 and x2 are points on the two lines:
    w · x1 + b1 = 0
    w · x2 + b2 = 0
    so w · (x1 - x2) = b2 - b1.
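The margin formula can be verified numerically. A check of my own (the particular w, b1, b2 are hypothetical):

```python
# Numeric check of the margin |b1 - b2| / ||w|| between the parallel lines
# w . x + b1 = 0 and w . x + b2 = 0. Not from the lecture.
import numpy as np

w = np.array([3.0, 4.0])  # ||w|| = 5
b1, b2 = -1.0, 9.0

margin = abs(b1 - b2) / np.linalg.norm(w)  # 10 / 5 = 2

# Cross-check with explicit points on each line: x = -b * w / ||w||^2 is the
# point where the line w . x + b = 0 meets the direction w.
x1 = -b1 * w / np.dot(w, w)
x2 = -b2 * w / np.dot(w, w)
print(margin, np.linalg.norm(x1 - x2))  # both 2.0
```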
Linear SVM Mathematically

Decision rule:
- Positive examples: w · x+ + b ≥ +1
- Negative examples: w · x- + b ≤ -1
For points x+ and x- lying on the margin, these hold with equality; subtracting the two equations gives w · (x+ - x-) = 2.
Linear SVM Mathematically

What we know:
- w · x+ + b = +1
- w · x- + b = -1
- w · (x+ - x-) = 2

Margin width:
    M = (x+ - x-) · w/||w|| = 2/||w||
Linear SVM Mathematically

Goal:
1) Correctly classify all training data:
   w · xi + b ≥ +1 if yi = +1
   w · xi + b ≤ -1 if yi = -1
   equivalently, yi(w · xi + b) ≥ 1 for all i
2) Maximize the margin M = 2/||w||, which is the same as minimizing Φ(w) = (1/2) wᵀw.

We can formulate a quadratic optimization problem and solve for w and b:

    Minimize (1/2) wᵀw
    subject to yi(w · xi + b) ≥ 1 for all i
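This primal problem is small enough to hand to a generic solver. A sketch of my own (the two-point data set and the SciPy formulation are assumptions, not part of the lecture):

```python
# Sketch: the primal hard-margin QP, minimize (1/2) w^T w subject to
# y_i (w . x_i + b) >= 1, solved with a general-purpose optimizer.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [2.0, 2.0]])
y = np.array([-1.0, 1.0])

def objective(v):  # v = [w1, w2, b]
    return 0.5 * np.dot(v[:2], v[:2])

# One inequality constraint per training point: y_i (w . x_i + b) - 1 >= 0.
cons = [{"type": "ineq",
         "fun": lambda v, i=i: y[i] * (np.dot(v[:2], X[i]) + v[2]) - 1}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
print(w, b, 2 / np.linalg.norm(w))  # w, b, and the margin width 2/||w||
```

For these two points the analytic answer is w = (0.5, 0.5), b = -1, and the margin 2/||w|| equals the distance between the points.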
Solving the Optimization Problem

We need to optimize a quadratic function subject to linear constraints. Use Lagrange multipliers: a multiplier αi is associated with every constraint yi(w · xi + b) ≥ 1. This yields the dual problem:

Find α1…αN such that
    Q(α) = Σαi - ½ ΣΣ αiαj yiyj xiᵀxj
is maximized and
    (1) Σαiyi = 0
    (2) αi ≥ 0 for all i

Reference: Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 1998.
The Optimization Problem Solution

The solution has the form:
    w = Σαiyixi
    b = yk - wᵀxk for any xk such that αk > 0

Each αi must satisfy the Karush-Kuhn-Tucker conditions:
    αi [yi(wᵀxi + b) - 1] = 0 for every i
- If αi > 0, then yi(wᵀxi + b) = 1, so xi lies on the margin.
- If yi(wᵀxi + b) > 1, then αi = 0.
Each non-zero αi indicates that the corresponding xi is a support vector.
Maximum Margin

w and b depend only on the support vectors, via the active constraints:
    yk(wᵀxk + b) - 1 = 0
The Optimization Problem Solution

To classify a new test point x, we use
    f(x) = wᵀx + b = Σαiyi xiᵀx + b

Recall the dual problem: find α1…αN such that Q(α) = Σαi - ½ ΣΣ αiαj yiyj xiᵀxj is maximized subject to (1) Σαiyi = 0 and (2) αi ≥ 0 for all i.