Download - Support Vector Machines
![Page 1: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/1.jpg)
Support Vector Machines
Refer to Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials
![Page 2: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/2.jpg)
Support Vector Machines: Slide 2
Linear Classifiers
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
How would you classify this data?
![Page 3: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/3.jpg)
Support Vector Machines: Slide 3
Linear Classifiers
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
How would you classify this data?
![Page 4: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/4.jpg)
Support Vector Machines: Slide 4
Linear Classifiers
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
How would you classify this data?
![Page 5: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/5.jpg)
Support Vector Machines: Slide 5
Linear Classifiers
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
How would you classify this data?
![Page 6: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/6.jpg)
Support Vector Machines: Slide 6
Linear Classifiers
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
Any of these would be fine..
..but which is best?
![Page 7: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/7.jpg)
Support Vector Machines: Slide 7
Classifier Margin
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
![Page 8: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/8.jpg)
Support Vector Machines: Slide 8
Maximum Margin
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (Called an LSVM)Linear SVM
![Page 9: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/9.jpg)
Support Vector Machines: Slide 9
Maximum Margin
f x
yest
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (Called an LSVM)
Support Vectors are those datapoints that the margin pushes up against
Linear SVM
![Page 10: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/10.jpg)
Support Vector Machines: Slide 10
Why Maximum Margin?
denotes +1
denotes -1
f(x,w,b) = sign(w. x - b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (Called an LSVM)
Support Vectors are those datapoints that the margin pushes up against
1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction) this gives us least chance of causing a misclassification.
3. LOOCV is easy since the model is immune to removal of any non-support-vector datapoints.
4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very very well.
![Page 11: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/11.jpg)
Support Vector Machines: Slide 11
Estimate the Margin
• What is the distance expression for a point x to a line wx+b= 0?
denotes +1
denotes -1 x wx +b = 0
![Page 12: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/12.jpg)
Support Vector Machines: Slide 12
Estimate the Margin
• What is the expression for margin?
denotes +1
denotes -1 wx +b = 0
Margin
![Page 13: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/13.jpg)
Support Vector Machines: Slide 13
Maximize Margindenotes
+1
denotes -1wx +b =
0
Margin
![Page 14: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/14.jpg)
Support Vector Machines: Slide 14
Maximize Margindenotes
+1
denotes -1wx +b =
0
Margin
• Min-max problem game problem
![Page 15: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/15.jpg)
Support Vector Machines: Slide 15
Maximize Margindenotes
+1
denotes -1wx +b =
0
Margin
Strategy:
![Page 16: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/16.jpg)
Support Vector Machines: Slide 16
Maximum Margin Linear Classifier
• How to solve it?
![Page 17: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/17.jpg)
Support Vector Machines: Slide 17
Learning via Quadratic Programming
• QP is a well-studied class of optimization algorithms to maximize a quadratic function of some real-valued variables subject to linear constraints.
![Page 18: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/18.jpg)
Support Vector Machines: Slide 18
Quadratic Programming
2maxarg
uuud
u
Rc
TT Find
nmnmnn
mm
mm
buauaua
buauaua
buauaua
...
:
...
...
2211
22222121
11212111
)()(22)(11)(
)2()2(22)2(11)2(
)1()1(22)1(11)1(
...
:
...
...
enmmenenen
nmmnnn
nmmnnn
buauaua
buauaua
buauaua
And subject to
n additional linear inequality constraints
e a
dd
ition
al
linear
eq
uality
co
nstra
ints
Quadratic criterion
Subject to
![Page 19: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/19.jpg)
Support Vector Machines: Slide 19
Quadratic Programming
2maxarg
uuud
u
Rc
TT Find
Subject to
nmnmnn
mm
mm
buauaua
buauaua
buauaua
...
:
...
...
2211
22222121
11212111
)()(22)(11)(
)2()2(22)2(11)2(
)1()1(22)1(11)1(
...
:
...
...
enmmenenen
nmmnnn
nmmnnn
buauaua
buauaua
buauaua
And subject to
n additional linear inequality constraints
e a
dd
ition
al
linear
eq
uality
co
nstra
ints
Quadratic criterion
There exist algorithms for
finding such constrained
quadratic optima much
more efficiently and
reliably than gradient
ascent.
(But they are very fiddly…you
probably don’t want to
write one yourself)
![Page 20: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/20.jpg)
Support Vector Machines: Slide 20
Uh-oh!
denotes +1
denotes -1
This is going to be a problem!
What should we do?
![Page 21: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/21.jpg)
Support Vector Machines: Slide 21
Uh-oh!
denotes +1
denotes -1
This is going to be a problem!
What should we do?
Idea 1:
Find minimum w.w, while minimizing number of training set errors.
Problemette: Two things to minimize makes for an ill-defined optimization
![Page 22: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/22.jpg)
Support Vector Machines: Slide 22
Uh-oh!
denotes +1
denotes -1
This is going to be a problem!
What should we do?
Idea 1.1:
Minimize
w.w + C (#train errors)
There’s a serious practical problem that’s about to make us reject this approach. Can you guess what it is?
Tradeoff parameter
![Page 23: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/23.jpg)
Support Vector Machines: Slide 23
Uh-oh!
denotes +1
denotes -1
This is going to be a problem!
What should we do?
Idea 1.1:
Minimize
w.w + C (#train errors)
There’s a serious practical problem that’s about to make us reject this approach. Can you guess what it is?
Tradeoff parameterCan’t be expressed as a Quadratic
Programming problem.
Solving it may be too slow.
(Also, doesn’t distinguish between disastrous errors and near
misses)
So… any
other
ideas?
![Page 24: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/24.jpg)
Support Vector Machines: Slide 24
Uh-oh!
denotes +1
denotes -1
This is going to be a problem!
What should we do?
Idea 2.0:
Minimize w.w + C (distance of error points to their correct place)
![Page 25: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/25.jpg)
Support Vector Machines: Slide 25
Support Vector Machine (SVM) for Noisy Data
• Any problem with the above formulism?
denotes +1
denotes -1
1
2
3
![Page 26: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/26.jpg)
Support Vector Machines: Slide 26
Support Vector Machine (SVM) for Noisy Data
• Balance the trade off between margin and classification errors
denotes +1
denotes -1
1
2
3
![Page 27: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/27.jpg)
Support Vector Machines: Slide 27
Support Vector Machine for Noisy Data
How do we determine the appropriate value for c ?
![Page 28: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/28.jpg)
Support Vector Machines: Slide 28
An Equivalent QP: Determine b
A linear programming problem !
Fix w
![Page 29: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/29.jpg)
Support Vector Machines: Slide 29
Suppose we’re in 1-dimension
What would SVMs do with this data?
x=0
![Page 30: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/30.jpg)
Support Vector Machines: Slide 30
Suppose we’re in 1-dimension
Not a big surprise
Positive “plane” Negative “plane”
x=0
![Page 31: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/31.jpg)
Support Vector Machines: Slide 31
Harder 1-dimensional dataset
That’s wiped the smirk off SVM’s face.
What can be done about this?
x=0
![Page 32: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/32.jpg)
Support Vector Machines: Slide 32
Harder 1-dimensional datasetRemember how
permitting non-linear basis functions made linear regression so much nicer?
Let’s permit them here too
x=0 ),( 2kkk xxz
![Page 33: Support Vector Machines](https://reader034.vdocuments.site/reader034/viewer/2022050805/56814c51550346895db963ac/html5/thumbnails/33.jpg)
Support Vector Machines: Slide 33
Harder 1-dimensional datasetRemember how
permitting non-linear basis functions made linear regression so much nicer?
Let’s permit them here too
x=0 ),( 2kkk xxz