Support Vector Machines
Piyush Kumar
Perceptrons revisited
Class 1 : (+1)
Class 2 : (-1)
Is this unique?
Which one is the best?
• Perceptron output: sign(w · x + b). Any separating hyperplane gives zero training error, so the perceptron does not prefer one over another.
Perceptrons: What went wrong?
• Slow convergence
• Can overfit
• Can't do complicated functions easily
• Theoretical guarantees are not as strong
Perceptron: The first NN
• Proposed by Frank Rosenblatt in 1956
• Neural net researchers accused Rosenblatt of promising 'too much'
• Numerous variants
• Also helps to study LP (linear programming)
• One of the simplest neural networks
From Perceptrons to SVMs
• Margins
• Linearization
• Kernels
Support Vector Machines
Margin
Classification Margin
• Distance from an example x to the separator is r = (wᵀx + b) / ||w|| (see the numeric sketch below).
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the width of separation between the classes.
[Figure: the separating hyperplane wᵀx, the distances r from the closest examples, and the margin ρ]
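A quick numeric illustration (mine, not from the slides) of these quantities, with a made-up hyperplane (w, b) and data:

```python
# Minimal sketch: signed distance of each example to the hyperplane w . x + b = 0,
# and the width of separation between the two classes (hypothetical w, b, data).
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([+1, +1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

r = y * (X @ w + b) / np.linalg.norm(w)        # positive iff correctly classified
rho = r[y == +1].min() + r[y == -1].min()      # gap between the closest examples of each class
print("distances:", r)
print("margin rho:", rho)
```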
Support Vector Machines
• Maximizing the margin is good according to intuition and PAC theory.
• Implies that only support vectors are important; other training examples are ignorable.
• Leads to simpler classifiers, and hence perhaps better generalization (simple = large margin)
Let’s start some math…
m samples: $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x \in \mathbb{R}^n$ and $y = \pm 1$ are the labels for the data.

Can we find a hyperplane that separates the two classes (labeled by y)? I.e.

$x_j \cdot w > 0$ : for all $j$ such that $y_j = +1$

$x_j \cdot w < 0$ : for all $j$ such that $y_j = -1$
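As a quick illustration (mine, with made-up data and a hypothetical candidate w), the condition above can be checked in one line using the labels:

```python
# Minimal sketch: does a candidate w separate the labeled samples?
# I.e. x_j . w > 0 when y_j = +1 and x_j . w < 0 when y_j = -1.
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])  # made-up samples
y = np.array([+1, +1, -1, -1])
w = np.array([1.0, 1.0])                                            # hypothetical normal vector

separates = bool(np.all(y * (X @ w) > 0))   # both conditions at once, via the labels
print(separates)                            # True for this toy data
```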
Further assumption 1 (which we will relax later!)
Let's assume that the hyperplane that we are looking for passes through the origin.
Further assumption 2
• Let's assume that we are looking for a halfspace that contains a set of points
Relax now!!
Let's Relax FA 1 now
• "Homogenize" the coordinates by adding a new coordinate to the input (see the sketch below).
• Think of it as moving all the red and blue points into one higher dimension.
• From 2D to 3D, it is just the x-y plane shifted to z = 1. This takes care of the "bias", i.e. our assumption that the halfspace can pass through the origin.
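A minimal NumPy sketch of this homogenization (my own illustration, together with the label-mirroring from FA 2; the data is made up):

```python
# Minimal sketch: FA 1 (homogenize: append a constant coordinate 1) and
# FA 2 (mirror by the labels so every constraint reads x . w > 0).
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])  # made-up 2D data
y = np.array([+1, +1, -1, -1])

Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # 2D -> 3D: every point lifted onto the plane z = 1
Xm = y[:, None] * Xh                           # mirrored points; a halfspace containing Xm separates X

# A hyperplane through the origin in the lifted space, v = (w1, w2, b), corresponds to
# the affine hyperplane w1*x + w2*y + b = 0 in the original 2D space.
print(Xh)
print(Xm)
```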
Further Assumption 3
• Assume all points lie on a unit sphere!
• If they are not after applying the transformations for FA 1 and FA 2, make them so (normalize them).
Relax now!
What did we want?
• Maximize the margin.
• What does it mean in the new space?
What’s the new optimization problem?
• Max |ρ|
• subject to: x_i . w >= ρ for all i
• (Note that we have gotten rid of the y's by mirroring the negative examples around the origin.)
• Here w is a unit vector: ||w|| = 1.
Same Problem
• Min 1/ρ
• subject to xi.((1/ρ)w) >= 1
• Let v = (1/ρ) w
• Then the constraint becomes xi.v >= 1.
• Objective: Min 1/ρ = Min ||(1/ρ)w|| = Min ||v||, which is the same as Min ||v||^2
New formulation
Min ||v||^2 subject to: v . x_i >= 1 for all i
Using MATLAB, this is a piece of cake to solve. Decision boundary: sign(v . x) (equivalently sign(w . x)).
Only for support vectors does v . x_i = 1 hold.
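The slides solve this in MATLAB; as a rough stand-in (my own sketch, not the course's code), the same hard-margin problem can be handed to a generic solver such as SciPy's SLSQP. The data and variable names below are made up.

```python
# Minimal sketch: solve  min ||v||^2  subject to  v . x_i >= 1  on homogenized,
# label-mirrored toy data (hypothetical; not the lecture's MATLAB code).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
pos = rng.normal(loc=[+2.0, +2.0], size=(20, 2))   # class +1
neg = rng.normal(loc=[-2.0, -2.0], size=(20, 2))   # class -1
X = np.vstack([pos, neg])
y = np.hstack([np.ones(20), -np.ones(20)])

Xh = np.hstack([X, np.ones((X.shape[0], 1))])      # FA 1: homogenize (bias coordinate)
Xm = y[:, None] * Xh                               # FA 2: mirror so constraints read v . x_i >= 1

res = minimize(fun=lambda v: v @ v,                # objective ||v||^2
               x0=np.zeros(Xh.shape[1]),
               constraints=[{'type': 'ineq',       # SciPy 'ineq' means fun(v) >= 0
                             'fun': lambda v: Xm @ v - 1.0}],
               method='SLSQP')
v = res.x

print("v =", v)
print("predictions:", np.sign(Xh @ v))             # decision rule sign(v . x) on lifted inputs
print("margin rho =", 1.0 / np.linalg.norm(v))     # since v = (1/rho) w with ||w|| = 1
```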
Support Vector Machines
• Linear Learning Machines like perceptrons.
• Map non-linearly to higher dimension to overcome the linearity constraint.
• Select between hyperplanes: use the margin as the criterion (this is what perceptrons don't do). From learning theory, maximum margin is good.
Another Reformulation
Unlike perceptrons, SVMs have a unique solution, but are harder to solve.

The quadratic program (QP):
$\min_v \ \|v\|^2 \quad \text{subject to} \quad v \cdot x_i \ge 1, \ \ i = 1, \ldots, m$
Support Vector Machines
• There are very simple algorithms to solve SVMs (as simple as perceptrons).
• If you are interested in learning those, come and talk to me. (Out of reach for this course.)
Another twist: Linearization
• If the data is separable by, say, a sphere, how would you use an SVM to separate it? (Ellipsoids?)
Linearization a.k.a Feature Expansion
Delaunay!??
Lift the points to a paraboloid in one higher dimension. For instance, if the data is in 2D, (x, y) -> (x, y, x^2 + y^2). (See the sketch below.)
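A small illustration (mine, not the slides'): after this lift, points inside a circle of radius R and points outside it are separated by the plane z = R², i.e. by a linear separator.

```python
# Minimal sketch: the paraboloid lift (x, y) -> (x, y, x^2 + y^2).
# Points inside / outside a circle of radius R around the origin end up
# below / above the plane z = R**2, so a linear separator works in 3D.
import numpy as np

def lift(P):
    return np.hstack([P, (P ** 2).sum(axis=1, keepdims=True)])

R = 1.5
inside = np.array([[0.2, 0.3], [-0.5, 0.1], [0.8, -0.9]])    # ||p|| < R
outside = np.array([[2.0, 0.5], [-1.8, 1.2], [0.1, -2.5]])   # ||p|| > R

print(lift(inside)[:, 2] < R ** 2)    # [ True  True  True]
print(lift(outside)[:, 2] > R ** 2)   # [ True  True  True]
```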
Linearization
• Note that replacing x by Φ(x), the decision boundary changes from w.x = 0 to w.Φ(x) = 0.
• This gives us non-linear separators when Φ is non-linear (as in the last example), compared to the linear separators we had before.
• Another feature expansion example:
  – (x, y) -> (x^2, xy, y^2, x, y)
  – What kind of separators are there? (Linear separators in this feature space are conic sections in the original plane; see the sketch below.)
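A short sketch of that feature map (mine, with hypothetical weights): a linear rule w · Φ(x, y) + b = 0 in this 5-dimensional space is the conic w1·x² + w2·xy + w3·y² + w4·x + w5·y + b = 0 in the original plane.

```python
# Minimal sketch: the quadratic feature expansion (x, y) -> (x^2, x*y, y^2, x, y).
import numpy as np

def phi(P):
    x, y = P[:, 0], P[:, 1]
    return np.column_stack([x ** 2, x * y, y ** 2, x, y])

w = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # hypothetical weights: x^2 + y^2 ...
b = -1.0                                  # ... - 1 = 0, i.e. the unit circle
P = np.array([[0.5, 0.5], [1.5, 0.0]])
print(np.sign(phi(P) @ w + b))            # [-1.  1.]: inside vs. outside the circle
```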
Linearization
• The more features, the more power.
• There is a danger of overfitting.
• When there are a lot of features (sometimes even infinitely many), we can use the "kernel trick" to solve the optimization problem faster.
• Let's look back at optimization for a moment again…
Lagrange Multipliers
Lagrangian function
At optimum
More precisely
The optimization Problem Revisited
Removing v
Support Vectors
v is a linear combination of 'some examples', the support vectors. More than likely, if we see too many support vectors, we are overfitting. Simple and short classifiers are preferable.
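For reference, the formulas referred to here were on the slide images; the following is a reconstruction of the standard derivation in the deck's through-the-origin notation (with the conventional 1/2 in front of the objective, which does not change the minimizer):

$$
L(v, \alpha) = \tfrac{1}{2}\|v\|^2 - \sum_i \alpha_i \,(v \cdot x_i - 1), \qquad \alpha_i \ge 0 .
$$

Setting $\partial L / \partial v = 0$ gives $v = \sum_i \alpha_i x_i$: at the optimum, $\alpha_i > 0$ only for examples with $v \cdot x_i = 1$, i.e. the support vectors.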
Substitution
Gram Matrix
The decision surface Recovered
What is Gram Matrix reduction good for?
• The kernel trick.
• Even if the number of features is infinite, G might still be small and hence the optimization problem solvable.
• We can compute G without explicitly computing the feature expansion, at least sometimes (by redefining the dot product in the feature space) — see the sketch below.
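A small numerical check (my own sketch, not from the slides): for the degree-2 polynomial kernel K(x, z) = (1 + x·z)² in 2D, the Gram matrix computed from the kernel equals the one computed from an explicit 6-dimensional feature map, so the features never have to be built.

```python
# Minimal sketch: Gram matrix from explicit features vs. from the kernel (1 + x.z)^2.
import numpy as np

def phi(P):  # explicit feature map whose dot product equals the degree-2 polynomial kernel
    x, y = P[:, 0], P[:, 1]
    s2 = np.sqrt(2.0)
    return np.column_stack([np.ones_like(x), s2 * x, s2 * y, x ** 2, y ** 2, s2 * x * y])

X = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3]])   # made-up data

G_features = phi(X) @ phi(X).T            # m x m Gram matrix from explicit features
G_kernel = (1.0 + X @ X.T) ** 2           # the same matrix straight from the kernel
print(np.allclose(G_features, G_kernel))  # True
```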
Recall
The kernel Matrix
• The trick that the ML community uses for linearization is to use a kernel function that redefines the distances (dot products) between points.
• Example: the Gaussian kernel
  $K(x, z) = e^{-\|x - z\|^2 / 2}$
• The feature map no longer needs to be explicitly evaluated. As long as we can compute the kernel value between two mapped points, that's enough.
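As an illustration (mine, with made-up data and multipliers), the Gaussian kernel above yields a Gram matrix, and the learned classifier can be evaluated on a new point using kernel values only:

```python
# Minimal sketch: Gaussian kernel K(x, z) = exp(-||x - z||^2 / 2), its Gram matrix,
# and a kernelized decision value sum_i alpha_i * y_i * K(x_i, x) for a new point.
import numpy as np

def K(A, B):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    return np.exp(-d2 / 2.0)

X = np.array([[1.0, 1.0], [2.0, 0.0], [-1.0, -1.0], [-2.0, 0.5]])  # made-up training points
y = np.array([+1, +1, -1, -1])
alpha = np.array([0.7, 0.0, 0.9, 0.0])    # hypothetical multipliers (nonzero = support vectors)

G = K(X, X)                               # Gram / kernel matrix used by the optimizer
x_new = np.array([[0.8, 0.9]])
score = (alpha * y) @ K(X, x_new)[:, 0]   # decision value for the new point
print(G.shape, np.sign(score))            # classify with the sign of the score
```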
Example Kernels
The decision Surface?
A demo using libsvm
• Some implementations of SVM:
  – libsvm
  – svmlight
  – svmtorch
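For a quick demo along these lines (not the course's original demo), scikit-learn's SVC, which wraps libsvm, can be used on made-up data:

```python
# Minimal sketch of an SVM demo with scikit-learn's SVC (a wrapper around libsvm).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # +1 outside the unit circle

clf = SVC(kernel='rbf', C=1.0, gamma='scale')            # Gaussian kernel
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```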
Checkerboard Dataset
k-Nearest Neighbor Algorithm
LSVM on Checkerboard
Conclusions
• SVMs are a step towards improving perceptrons.
• They use a large margin for good generalization.
• In order to make large feature expansions, we can use the Gram matrix formulation of the optimization problem (or use kernels).
• SVMs are popular classifiers because they achieve good accuracy on real-world data.