uva cs 4501: machine learning lecture 16: support vector ...q support vector machine (svm) ü...
TRANSCRIPT
![Page 1: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/1.jpg)
UVACS4501:MachineLearning
Lecture16:SupportVectorMachine
Dr.Yanjun Qi
UniversityofVirginiaDepartmentofComputerScience
![Page 2: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/2.jpg)
Wherearewe?èFivemajorsectionsofthiscourse
q Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheoryq Graphicalmodels
4/3/18 2
Dr.YanjunQi/UVA
![Page 3: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/3.jpg)
Today
qSupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü MulticlassSVM
4/3/18 3
Dr.YanjunQi/UVA
![Page 4: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/4.jpg)
HistoryofSVM• SVMisinspiredfromstatisticallearningtheory[3]• SVMwasfirstintroducedin1992[1]
• SVMbecomespopularbecauseofitssuccessinhandwrittendigitrecognition(1994)– 1.1%testerror rateforSVM.– Thesameastheerrorratesofacarefullyconstructedneuralnetwork,LeNet 4.
• Section5.11in[2]orthediscussionin[3]fordetails
• Regardedasanimportantexampleof“kernelmethods”,arguablythehottestareainmachinelearning20yearsago
[1] B.E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory 5 144-152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82, 1994.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.
4/3/18 4
Dr.YanjunQi/UVA
Theoreticallysound/Impactful
![Page 5: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/5.jpg)
ApplicationsofSVMs
• ComputerVision• TextCategorization• Ranking(e.g.,Googlesearches)• HandwrittenCharacterRecognition• Timeseriesanalysis• Bioinformatics• ……….
àLotsofverysuccessfulapplications!!!
4/3/18
Dr.YanjunQi/UVA
5
![Page 6: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/6.jpg)
Handwrittendigitrecognition
4/3/18
Dr.YanjunQi/UVA
6
![Page 7: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/7.jpg)
ADatasetforbinary
classification
• Data/points/instances/examples/samples/records:[rows]• Features/attributes/dimensions/independent
variables/covariates/predictors/regressors:[columns,exceptthelast]• Target/outcome/response/label/dependentvariable:specialcolumntobe
predicted[lastcolumn]
4/3/18 7
Output as Binary Class:
only two possibilities
Dr.YanjunQi/UVA
![Page 8: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/8.jpg)
AffineHyperplanes
• https://en.wikipedia.org/wiki/Hyperplane• Anyhyperplanecanbegivenin coordinates asthesolutionofasinglelinear(algebraic)equation ofdegree 1.
4/3/18
Dr.YanjunQi/UVA
8
![Page 9: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/9.jpg)
Review:VectorProduct,Orthogonal,andNorm
Fortwovectorsx andy,
xTy
iscalledthe(inner)vectorproduct.
Thesquarerootoftheproductofavectorwithitself,
iscalledthe2-norm (|x|2),canalsowriteas|x|
x andy arecalledorthogonal if
xTy=0
4/3/18 9
Dr.YanjunQi/UVA
![Page 10: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/10.jpg)
Today
qSupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü MulticlassSVM
4/3/18 10
Dr.YanjunQi/UVA
![Page 11: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/11.jpg)
LinearClassifiersfx yest
denotes+1denotes-1
Howwouldyouclassifythisdata?
4/3/18 11
Dr.YanjunQi/UVA
![Page 12: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/12.jpg)
LinearClassifiersfx yest
denotes+1denotes-1
Howwouldyouclassifythisdata?
4/3/18 12
Dr.YanjunQi/UVA
![Page 13: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/13.jpg)
LinearClassifiersfx yest
denotes+1denotes-1
Howwouldyouclassifythisdata?
4/3/18 13
Dr.YanjunQi/UVA
![Page 14: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/14.jpg)
LinearClassifiersfx yest
denotes+1denotes-1
Howwouldyouclassifythisdata?
4/3/18 14
Dr.YanjunQi/UVA
![Page 15: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/15.jpg)
LinearClassifiersfx yest
denotes+1denotes-1
Anyofthesewouldbefine..
..butwhichisbest?
4/3/18 15
Dr.YanjunQi/UVA
![Page 16: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/16.jpg)
ClassifierMarginfx yest
denotes+1denotes-1 Definethemargin of
alinearclassifierasthewidththattheboundarycouldbeincreasedbybeforehittingadatapoint.
4/3/18 16
Dr.YanjunQi/UVA
![Page 17: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/17.jpg)
MaximumMarginfx yest
denotes+1denotes-1 Themaximum
marginlinearclassifier isthelinearclassifierwiththe,maximummargin.ThisisthesimplestkindofSVM(CalledanLSVM)
LinearSVM4/3/18 17
Dr.YanjunQi/UVA
![Page 18: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/18.jpg)
MaximumMarginfx yest
denotes+1denotes-1 Themaximum
marginlinearclassifier isthelinearclassifierwiththe,maximummargin.ThisisthesimplestkindofSVM(CalledanLSVM)
Support Vectorsarethosedatapointsthatthemarginpushesupagainst
LinearSVM4/3/18 18
Dr.YanjunQi/UVA
x2
x1
![Page 19: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/19.jpg)
MaximumMarginfx yest
denotes+1denotes-1 Themaximum
marginlinearclassifier isthelinearclassifierwiththe,maximummargin.ThisisthesimplestkindofSVM(CalledanLSVM)
Support Vectorsarethosedatapointsthatthemarginpushesupagainst
LinearSVM
f(x,w,b) =sign(wTx +b)
4/3/18 19
Dr.YanjunQi/UVA
x2
x1
![Page 20: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/20.jpg)
Max margin classifiers
• Instead of fitting all points, focus on boundary points
• Learn a boundary that leads to the largest margin from both sets of points
From all the possible boundary lines, this leads to the largest margin on both sides
4/3/18 20
Dr.YanjunQi/UVA
x2
x1
![Page 21: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/21.jpg)
Max margin classifiers
• Instead of fitting all points, focus on boundary points
• Learn a boundary that leads to the largest margin from points on both sides
D
DWhy MAX margin?
• Intuitive, ‘makes sense’
• Some theoretical support
• Works well in practice
4/3/18 21
Dr.YanjunQi/UVA
x2
x1
![Page 22: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/22.jpg)
Max margin classifiers
• Instead of fitting all points, focus on boundary points
• Learn a boundary that leads to the largest margin from points on both sides
D
DThat is why Also known as linear support vector machines (SVMs)
These are the vectors supporting the boundary
4/3/18 22
Dr.YanjunQi/UVA
x2
x1
![Page 23: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/23.jpg)
Max-margin&DecisionBoundary
Class -1
Class 1
4/3/18
Dr.YanjunQi/UVA
23
W is a p-dim vector; b is a
scalar
x2
x1
![Page 24: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/24.jpg)
Max-margin&DecisionBoundary• Thedecisionboundaryshouldbeasfarawayfrom
thedataofbothclassesaspossible
Class -1
Class 1
W is a p-dim vector; b is a
scalar
4/3/18
Dr.YanjunQi/UVA
24
![Page 25: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/25.jpg)
Specifying a max margin classifier
Classify as +1 if wTx+b >= 1
Classify as -1 if wTx+b <= - 1
Undefined if -1 <wTx+b < 1
Class +1 plane
boundary
Class -1 plane
4/3/18 25
Dr.YanjunQi/UVA
![Page 26: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/26.jpg)
Specifying a max margin classifier
Classify as +1 if wTx+b >= 1
Classify as -1 if wTx+b <= - 1
Undefined if -1 <wTx+b < 1
Is the linear separation assumption realistic?
We will deal with this shortly, but lets assume it for now
4/3/18 26
Dr.YanjunQi/UVA
![Page 27: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/27.jpg)
Today
q SupervisedClassificationq SupportVectorMachine(SVM)
ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü MulticlassSVM
4/3/18 27
Dr.YanjunQi/UVA
![Page 28: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/28.jpg)
Maximizing the marginClassify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1
• Lets define the width of the margin by M
• How can we encode our goal of maximizing M in terms of our parameters (w and b)?
• Lets start with a few obsevrations
4/3/18 28
Dr.YanjunQi/UVA
![Page 29: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/29.jpg)
MarginM
4/3/18
Dr.YanjunQi/UVA
29
![Page 30: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/30.jpg)
4/3/18
Dr.YanjunQi/UVA
30
• wT x+ + b = +1
• wT x- + b = -1
• M = | x+ - x- | = ?
![Page 31: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/31.jpg)
Maximizing the margin: observation-1
• Observation 1: the vector w is orthogonal to the +1 plane
• Why?
Corollary: the vector w is orthogonal to the -1 plane 4/3/18 31
Dr.YanjunQi/UVA
![Page 32: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/32.jpg)
Maximizing the margin: observation-1
Classify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1
• Observation 1: the vector w is orthogonal to the +1 plane
• Why?
Let u and v be two points on the +1 plane, then for the vector defined by u and v we have wT(u-v) = 0
Corollary: the vector w is orthogonal to the -1 plane 4/3/18 32
Dr.YanjunQi/UVA
![Page 33: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/33.jpg)
Maximizing the margin: observation-1
Classify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1
• Observation 1: the vector w is orthogonal to the +1 plane
• Why?
Let u and v be two points on the +1 plane, then for the vector defined by u and v we have wT(u-v) = 0
Corollary: the vector w is orthogonal to the -1 plane 4/3/18 33
Dr.YanjunQi/UVA
![Page 34: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/34.jpg)
Observation1è Review: Orthogonal
4/3/18 34
Dr.YanjunQi/UVA
![Page 35: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/35.jpg)
Maximizing the margin: observation-1
• Observation1:thevectorwisorthogonaltothe+1plane
Class 1
Class 2
M
4/3/18
Dr.YanjunQi/UVA
35
![Page 36: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/36.jpg)
Maximizing the margin: observation-2
Classify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1
• Observation 1: the vector w is orthogonal to the +1 and -1 planes
• Observation 2: if x+ is a point on the +1 plane and x- is the closest point to x+ on the -1 plane then
x+ = λ w + x-
Since w is orthogonal to both planes we need to ‘travel’ some distance along w to get from x+ to x-
4/3/18 36
Dr.YanjunQi/UVA
![Page 37: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/37.jpg)
4/3/18
Dr.YanjunQi/UVA
37
• wT x+ + b = +1
• wT x- + b = -1
• M = | x+ - x- | = ?
• x+ = λ w + x-
Putting it together
![Page 38: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/38.jpg)
Putting it together
• wT x+ + b = +1
• wT x- + b = -1
• x+ = λ w + x-
• | x+ - x- | = M
We can now define M in terms of w and b
4/3/18 38
Dr.YanjunQi/UVA
![Page 39: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/39.jpg)
4/3/18
Dr.YanjunQi/UVA
39
![Page 40: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/40.jpg)
Putting it together
• wT x+ + b = +1
• wT x- + b = -1
• x+ = λw + x-
• | x+ - x- | = M
We can now define M in terms of w and b
wT x+ + b = +1
=>
wT (λw + x-) + b = +1=>
wTx- + b + λwTw = +1
=>
-1 + λwTw = +1
=>
λ = 2/wTw
4/3/18 40
Dr.YanjunQi/UVA
![Page 41: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/41.jpg)
Putting it together
• wT x+ + b = +1
• wT x- + b = -1
• x+ = λ w + x-
• | x+ - x- | = M
• λ = 2/wTw
We can now define M in terms of w and b
M = |x+ - x-|
=>
=>
M =| λw |= λ | w |= λ wTw
€
M = 2 wTwwTw
=2wTw
4/3/18 41
Dr.YanjunQi/UVA
![Page 42: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/42.jpg)
Finding the optimal parameters
€
M =2wTw
We can now search for the optimal parameters by finding a solution that:
1. Correctly classifies all points
2. Maximizes the margin (or equivalently minimizes wTw)
Several optimization methods can be used: Gradient descent, OR SMO (see extra slides)
4/3/18 42
Dr.YanjunQi/UVA
![Page 43: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/43.jpg)
4/3/18
Dr.YanjunQi/UVA
43
![Page 44: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/44.jpg)
4/3/18
Dr.YanjunQi/UVA
44
Thegradientpointsinthedirection ofthegreatestrateofincreaseofthefunctionanditsmagnitude istheslopeofthegraphinthatdirection
![Page 45: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/45.jpg)
Today
q SupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü PracticalGuide
4/3/18 45
Dr.YanjunQi/UVA
![Page 46: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/46.jpg)
Optimization Step i.e. learning optimal parameter for SVM
€
M =2wTw
4/3/18 46
Dr.YanjunQi/UVA
![Page 47: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/47.jpg)
Optimization Step i.e. learning optimal parameter for SVM
€
M =2wTw
Min (wTw)/2 subject to the following constraints:
4/3/18 47
Dr.YanjunQi/UVA
![Page 48: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/48.jpg)
Optimization Step i.e. learning optimal parameter for SVM
€
M =2wTw
Min (wTw)/2 subject to the following constraints:
For all x in class + 1
wTx+b >= 1
For all x in class - 1
wTx+b <= -1
} A total of n constraints if we have n training samples
4/3/18 48
Dr.YanjunQi/UVA
![Page 49: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/49.jpg)
Optimization Reformulation
Min (wTw)/2 subject to the following constraints:
For all x in class + 1
wTx+b >= 1
For all x in class - 1
wTx+b <= -1
} A total of n constraints if we have n input samples
4/3/18 49
Dr.YanjunQi/UVA
![Page 50: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/50.jpg)
Optimization Reformulation
Min (wTw)/2 subject to the following constraints:
For all x in class + 1
wTx+b >= 1
For all x in class - 1
wTx+b <= -1
} A total of n constraints if we have n input samples
4/3/18 50
Dr.YanjunQi/UVA
argminw,b
wi2
i=1p∑
subject to ∀x i ∈Dtrain : yi x i ⋅w + b( ) ≥1
![Page 51: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/51.jpg)
Optimization Reformulation
Min (wTw)/2 subject to the following constraints:
For all x in class + 1
wTx+b >= 1
For all x in class - 1
wTx+b <= -1
}A total of n constraints if we have n input samples
4/3/18 51
Dr.YanjunQi/UVA
argminw,b
wi2
i=1p∑
subject to ∀x i ∈Dtrain : yi wT x i + b( ) ≥1
Quadratic Objective
Quadratic programming i.e., - Quadratic objective - Linear constraints
![Page 52: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/52.jpg)
Today
q SupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecase(softSVM)ü Optimizationwithdualformü Nonlineardecisionboundaryü PracticalGuide
4/3/18 52
Dr.YanjunQi/UVA
![Page 53: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/53.jpg)
Linearly Non separable case• So far we assumed that a linear hyperplane can perfectly separate the points
• But this is not usually the case
- noise, outliers How can we convert this to a QP problem?
- Minimize training errors?
min wTw
min #errors
Hard to solve (two minimization problems)
4/3/18 53
Dr.YanjunQi/UVA
![Page 54: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/54.jpg)
Linearly Non separable case• So far we assumed that a linear plane can perfectly separate the points
• But this is not usally the case
- noise, outliersHow can we convert this to a QP problem?
- Minimize training errors?
min wTw
min #errors
- Penalize training errors:
min wTw+C*(#errors)
Hard to solve (two minimization problems)
Hard to encode in a QP problem
4/3/18 54
Dr.YanjunQi/UVA
![Page 55: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/55.jpg)
Linearly Non separable case• Instead of minimizing the number of misclassified points we can minimize the distance between these points and their correct plane
-1 plane
+1 plane
jk
The new optimization problem is:
!!minw
wTw2 + Cε i
i=1
n
∑
4/3/18 55
Dr.YanjunQi/UVA
ε ε
![Page 56: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/56.jpg)
Linearly Non separable case• Instead of minimizing the number of misclassified points we can minimize the distance between these points and their correct plane
-1 plane
+1 plane
jk
The new optimization problem is:
subject to the following inequality constraints:For all xi in class + 1
wTxi+b >= 1- i
For all xi in class - 1
wTxi+b <= -1+ i
minw
wTw2 +C ε i
i=1
n
∑
Wait. Are we missing something?
4/3/18 56
Dr.YanjunQi/UVA
ε
εε ε
![Page 57: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/57.jpg)
Final optimization for linearly non-separable case
The new optimization problem is:
subject to the following inequality constraints:
minw
wTw2 +C ε i
i=1
n
∑
For all i
} A total of n constraints
} Another n constraints
4/3/18 57
Dr.YanjunQi/UVA
!!ε i ≥0
![Page 58: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/58.jpg)
4/3/18
Dr.YanjunQi/UVA
58
Two optimization problems:
For the separable and non separable cases
For all x in class + 1
wTx+b >= 1
For all x in class - 1
wTx+b <=-1
minw
wTw2 +C ε i
i=1
n
∑
For all i
€
minwwTw2
!!ε i ≥0
![Page 59: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/59.jpg)
HingeLossforSoftSVM
4/3/18
Dr.YanjunQi/UVA
59
argminw,b
wTw2
+C max(0,1− yi wT x i + b( ))i=1
n
∑
subject to:
yi wT x i + b( ) ≥1− ε i
ε i ≥ 0
minw
wTw2 +C ε i
i=1
n
∑
For all i
ε i ≥0
soft
argminw,b
wi2
i=1p∑
subject to ∀x i ∈Dtrain : yi wT x i + b( ) ≥1
vs.HardSVM
![Page 60: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/60.jpg)
4/3/18
Dr.YanjunQi/UVA
60
ModelSelection,findrightC
Selecttheright
penaltyparameter
C
![Page 61: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/61.jpg)
4/3/18
Dr.YanjunQi/UVA
61
ModelSelection,findrightC
AlargevalueofCmeansthatmisclassificationsarebad- resultinginsmallermarginsandlesstrainingerror(butmoreexpectedtrueerror).
AsmallCresultsinmoretrainingerror,hopefullybettertrueerror.
![Page 62: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/62.jpg)
Today
q SupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)üLinearlynon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü PracticalGuide
4/3/18 62
Dr.YanjunQi/UVA
![Page 63: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/63.jpg)
Two optimization problems: For the separable and non separable cases
Min (wTw)/2 minw
wTw2 +C ε i
i=1
n
∑
For all i
• Instead of solving these QPs directly we will solve a dual formulation of the SVM optimization problem
• The main reason for switching to this type of representation is that it would allow us to use a neat trick that will make our lives easier (and the run time faster)
4/3/18 63
Dr.YanjunQi/UVA
!!ε i ≥0
![Page 64: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/64.jpg)
Optimization Review: ConstrainedOptimization
4/3/18
Dr.YanjunQi/UVA
64
minu u2
s.t. u>=b
b Global min
Allowed min
b Global min
Allowed min
Case1:
Case2:
![Page 65: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/65.jpg)
Optimization Review: ConstrainedOptimization
4/3/18
Dr.YanjunQi/UVA
65
minu u2
s.t. u>=b
b Global min
Allowed min
b Global min
Allowed min
Case1:
Case2:
![Page 66: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/66.jpg)
Optimization Review: ConstrainedOptimization
4/3/18
Dr.YanjunQi/UVA
66
minu u2
s.t. u>=b
b Global min
Allowed min
b Global min
Allowed min
Case1:
Case2:
![Page 67: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/67.jpg)
Optimization Review: Ingredients
• Objectivefunction• Variables• Constraints
Findvaluesofthevariablesthatminimizeormaximizetheobjectivefunctionwhilesatisfyingtheconstraints
674/3/18
Dr.YanjunQi/UVA
![Page 68: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/68.jpg)
Dr.YanjunQi/UVA
Optimization Review: Lagrangian Duality(Extra)
• ThePrimalProblem
Primal:
ThegeneralizedLagrangian:
theα's(αι≥0)arecalledtheLagarangianmultipliersLemma:
Are-writtenPrimal:
minws.t.
f0(w)fi(w)≤0,i =1,…,k
L(w ,α )= f0(w)+ α i fi(w)
i=1
k
∑
maxα ,α i≥0L(w ,α )=
f0(w) ifw satisfiesprimalconstraints∞ o/w
⎧⎨⎪
⎩⎪
minwmaxα ,α i≥0L(w ,α ) ©EricXing@CMU,2006-20084/3/18
“MethodofLagrangemultipliers”converttoahigher-dimensional problem
![Page 69: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/69.jpg)
Dr.YanjunQi/UVA
69
Optimization Review: Lagrangian Duality,cont.(Extra)
• RecallthePrimalProblem:
• TheDualProblem:
• Theorem(weakduality):
• Theorem(strongduality):Iff thereexistasaddlepointof
wehave
minwmaxα ,α i≥0L(w ,α )
maxα ,α i≥0minwL(w ,α )
d* =maxα ,α i≥0minwL(w ,α ) ≤ minwmaxα ,α i≥0L(w ,α )= p
*
** pd = L(w ,α )
4/3/18
![Page 70: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/70.jpg)
An alternative representation of the SVM QP
• We will start with the linearly separable case
• Instead of encoding the correct classification rule and constraint we will use Lagrange multiplies to encode it as part of the our minimization problem
Min (wTw)/2
s.t.
(wTxi+b)yi >= 1
4/3/18 70
Dr.YanjunQi/UVA
Recall that Lagrange multipliers can be applied to turn the following problem:
Lprimal(w ,b,α )=
12 w
2− α i yi(w ⋅x i +b)−1( )
i=1
N
∑
![Page 71: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/71.jpg)
Dr.YanjunQi/UVA
71
***( )
TheDualProblem(Extra)
• WeminimizeL withrespecttow andb first:
Notethat(*) implies:
• Plus(***)backtoL ,andusing(**),wehave:
maxα i≥0minw ,bL(w ,b,α )
!! ∇wL(w ,b,α )! =w− α i yixi =0
i=1
train
∑ ,
!! ∇bL(w ,b,α )! = α i yi =0
i=1
train
∑ ,
!!w = α i yixi
i=1
train
∑
*()
!!! L(w ,b,α )= α i
i=1∑ − 12 α iα j yi y j(x iTx j )
i , j=1∑
**( )
4/3/18
![Page 72: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/72.jpg)
Solvingforw thatgivesmaximummargin:1. Combineobjectivefunctionandconstraintsintonew
objectivefunction,usingLagrangemultipliers \alphai
2. TominimizethisLagrangian,wetakederivativesofw andbandsetthemto0:
Summary:DualforSVM(Extra)
!!!Lprimal =
12 w
2− α i yi(w ⋅x i +b)−1( )
i=1
N
∑
4/3/18
Dr.YanjunQi/UVA
72
![Page 73: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/73.jpg)
3. Substitutingandrearranginggivesthedual oftheLagrangian:
whichwetrytomaximize(notminimize).
4. Oncewehavethe\alphai,wecansubstituteintopreviousequationstogetw andb.
5. Thisdefinesw andb aslinearcombinationsofthetrainingdata.
Summary:DualforSVM(Extra)
Ldual = α i
i=1
N
∑ − 12 i∑ α iα j yi y jx i ⋅x j
j∑
4/3/18
Dr.YanjunQi/UVA
73!!w = α i yixi
i=1
train
∑
![Page 74: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/74.jpg)
Summary: Dual SVM for linearly separable case
Dual formulation
maxα αi −i∑ 1
2αiα jyiyj
i,j∑ xi
Txj
αiyi = 0i∑
αi ≥ 0 ∀i
4/3/18 74
Dr.YanjunQi/UVA
EasierthanoriginalQP,moreefficientalgorithmsexisttofindαi
![Page 75: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/75.jpg)
Dual SVM for linearly separable case – Training
Our dual target function: maxα αi −i∑ 1
2αiα jyiyj
i,j∑ xi
Txj
αiyi = 0i∑
αi ≥ 0 ∀i
Dot product for all training samples
Dr.YanjunQi/UVA
4/3/18 75
![Page 76: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/76.jpg)
Dr.YanjunQi/UVA
76
Supportvectors:non-zero αi• onlyafewαi's canbenonzero!!
α i yi(w ⋅x i +b)−1( ) =0,i =1,…,n
Wecallthetrainingdatapointswhoseαi's arenonzerothesupportvectors (SV)
4/3/18
![Page 77: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/77.jpg)
4/3/18
Dr.YanjunQi/UVA
77
α6=1.4
Class 1
Class 2
α1=0.8
α2=0
α3=0
α4=0
α5=0
α7=0
α8=0.6
α9=0
α10=0
!!! α i yi(w ⋅x i +b)−1( ) =0,!!!!i =1,…,n
![Page 78: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/78.jpg)
4/3/18
Dr.YanjunQi/UVA
• onlyafewαi canbenonzero!!
!!! α i yi(w ⋅x i +b)−1( ) =0,!!!!i =1,…,n
α6=1.4
Class 1
Class 2
α1=0.8
α2=0
α3=0
α4=0
α5=0α7=0
α8=0.6
α9=0
α10=0
![Page 79: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/79.jpg)
Dual SVM - interpretation
€
w = α ixiyii∑
For i that are 0, no influence
4/3/18 79
Dr.YanjunQi/UVA
α
![Page 80: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/80.jpg)
Dual SVM for linearly separable case –
Testing To evaluate a new sample xtswe need to compute:
yts! = sign(wTxts +b)= sign( α iyi
i=1..n∑ xi
Txts +b)
Dot product with (“all” ??) training samples
4/3/18 80
Dr.YanjunQi/UVA
yts! = sign α i yi x i
Txts( )i∈SupportVectors
∑ +b⎛
⎝⎜⎞
⎠⎟
For \alphai that are 0,
no influence
![Page 81: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/81.jpg)
Dual formulation for linearly non-separable case
Dual target function:
!!
maxα α i −i∑ 1
2 α iα jyi y ji,j∑ xi
Txj
α iyi =0i∑C >α i ≥0,∀i
The only difference is that the \alpha are now bounded
4/3/18 81
Dr.YanjunQi/UVA
Hyperparameter Cshouldbetunedthrough k-foldsCV
Thisisverysimilartotheoptimizationproblem inthelinearseparablecase,exceptthatthereisanupperbound C onαi now
Onceagain,efficientalgorithmexisttofindαi
![Page 82: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/82.jpg)
Dual formulation for linearly non-separable case
Dual target function:
!!
maxα α i −i∑ 1
2 α iα jyi y ji,j∑ xi
Txj
α iyi =0i∑C >α i ≥0,∀i
To evaluate a new sample xtswe need to compute:
wTxts +b= α iyi
i∈supportV∑ xi
Txts +b
The only difference is that the \alpha are now bounded
4/3/18 82
Dr.YanjunQi/UVA
Hyperparameter Cshouldbetunedthrough k-foldsCV
Thisisverysimilartotheoptimizationproblem inthelinearseparablecase,exceptthatthereisanupperbound C onαi now
Onceagain,efficientalgorithmexisttofindαi
![Page 83: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/83.jpg)
Support Vector Machine
classification
Kernel Trick Func K(xi, xj)
Margin + Hinge Loss
QP with Dual form
Dual Weights
Task
Representation
Score Function
Search/Optimization
Models, Parameters
€
w = α ixiyii∑
argminw,b
wi2
i=1p∑ +C εi
i=1
n
∑
subject to ∀xi ∈ Dtrain : yi xi ⋅w+b( ) ≥1−εi
K(x, z) :=Φ(x)TΦ(z)
83
maxα α i −i∑ 1
2 α iα jyi y ji,j∑ xi
Txj
α iyi =0i∑ , α i ≥0 ∀i
![Page 84: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model](https://reader030.vdocuments.site/reader030/viewer/2022040304/5e9933f80858193cd76cb464/html5/thumbnails/84.jpg)
References• BigthankstoProf.Ziv Bar-JosephandProf.EricXing@CMUforallowingmetoreusesomeofhisslides
• ElementsofStatisticalLearning,byHastie,Tibshirani andFriedman
• Prof.AndrewMoore@CMU’sslides• TutorialslidesfromDr.Tie-YanLiu,MSRAsia• APracticalGuidetoSupportVectorClassificationChih-WeiHsu,Chih-ChungChang,andChih-JenLin,2003-2010
• TutorialslidesfromStanford“ConvexOptimizationI—Boyd&Vandenberghe
4/3/18 84
Dr.YanjunQi/UVACS