

CAP 5610: Machine Learning

Instructor: Guo-Jun QI

Support Vector Machines

Linear Classifier

Naive Bayes

Assumes each attribute is drawn from a Gaussian distribution with the same variance.

Generative model: the mean and variance are estimated with a closed-form solution.

Logistic regression

Directly maximizes the log-likelihood to fit the model to the training data.

Discriminative model: no closed-form solution; a gradient ascent method is used.
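To make the recap concrete, here is a minimal sketch of fitting logistic regression by gradient ascent on the log-likelihood. The toy data, learning rate, and iteration count are illustrative choices, not from the lecture.

```python
import numpy as np

# Minimal gradient-ascent fit of logistic regression (toy data;
# the learning rate and iteration count are arbitrary choices).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # labels in {0, 1}

w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(y = 1 | x)
    # Gradient of the average log-likelihood: mean of (y_i - p_i) x_i
    w += lr * X.T @ (y - p) / len(y)
    b += lr * np.mean(y - p)

print("learned w, b:", w, b)
```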


Drawback

Both lack a geometric intuition for what makes a good linear classifier in high-dimensional space.


SVM

Supervised learning methods used for classification and regression.

A special property: SVMs simultaneously minimize the classification error and maximize the geometric margin, hence the name maximum margin classifier.

Excellent theory and good performance.


Outline

Linear SVM – hard margin

Linear SVM – soft margin

Non-linear SVM

Application


Linear Classifiers

f: x → y_est, with labels y denoting +1 or -1 (shown as two kinds of points in the figure).

f(x, w, b) = sign(w · x + b), with parameters w and b.

The boundary is the line w · x + b = 0: points with w · x + b > 0 are classified +1, and points with w · x + b < 0 are classified -1.

How would you classify this data?


Any of these separating lines would be fine... but which is best? A poorly placed line can even misclassify training points into the +1 class.
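A minimal NumPy sketch of this decision rule; the parameters w, b and the test points below are made-up values, not from the slides.

```python
import numpy as np

# Linear decision rule: f(x, w, b) = sign(w . x + b).
# w, b, and the test points are illustrative values.
w = np.array([1.0, -2.0])
b = 0.5

def classify(X, w, b):
    """Return +1/-1 labels for the rows of X."""
    scores = X @ w + b                  # w . x + b for each point
    return np.where(scores > 0, 1, -1)  # +1 side vs. -1 side

X = np.array([[3.0, 1.0], [0.0, 2.0], [-1.0, -1.0]])
print(classify(X, w, b))  # [ 1 -1  1]
```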

Classifier Margin

Define the margin of a linear classifier f(x, w, b) = sign(w · x + b) as the width by which the boundary could be increased before hitting a data point.

Maximum Margin

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).

Support vectors are the data points that the margin pushes up against.

1. Maximizing the margin makes sense intuitively.

2. It implies that only the support vectors are important; the other training examples can be discarded without affecting the training result. Keeping only the support vectors does not change the maximum margin classifier, which makes it robust to small changes (noise) in the non-support vectors.

Basics of SVM Math

w/||w|| is perpendicular to the line w · x + b = 0 and has unit length.

The margin between two parallel lines w · x + b1 = 0 and w · x + b2 = 0 is

|b1 - b2| / ||w|| = |w · (x1 - x2)| / ||w||,

where x1 and x2 are points on the two lines:

w · x1 + b1 = 0
w · x2 + b2 = 0
so w · (x1 - x2) = b2 - b1
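A quick numeric check of this distance formula; the values of w, b1, b2 below are arbitrary.

```python
import numpy as np

# Margin formula |b1 - b2| / ||w|| for the parallel lines
# w.x + b1 = 0 and w.x + b2 = 0 (illustrative values).
w = np.array([3.0, 4.0])   # ||w|| = 5
b1, b2 = 1.0, -9.0

# Closed-form distance between the two lines:
print(abs(b1 - b2) / np.linalg.norm(w))  # 10 / 5 = 2.0

# Cross-check: take a point x1 on the first line, step along the
# unit normal w/||w|| until w.x + b2 = 0, and measure the step.
x1 = np.array([0.0, -b1 / w[1]])  # satisfies w.x1 + b1 = 0
u = w / np.linalg.norm(w)
t = -(w @ x1 + b2) / (w @ u)      # solve w.(x1 + t*u) + b2 = 0
print(abs(t))                     # same distance, 2.0
```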

Decision rule:

Positive examples: w · x+ + b ≥ +1
Negative examples: w · x- + b ≤ -1

For the support vectors on the margin these hold with equality; subtracting the two equations gives w · (x+ - x-) = 2.

Linear SVM Mathematically

What we know:

w · x+ + b = +1
w · x- + b = -1
w · (x+ - x-) = 2

M = margin width = (x+ - x-) · w/||w|| = 2/||w||

Linear SVM Mathematically

Goal:

1) Correctly classify all training data:

w · xi + b ≥ +1 if yi = +1
w · xi + b ≤ -1 if yi = -1
i.e., yi(w · xi + b) ≥ 1 for all i

2) Maximize the margin M = 2/||w||, which is the same as minimizing Φ(w) = ½ wTw.

We can formulate a quadratic optimization problem and solve for w and b:

Minimize ½ wTw subject to yi(w · xi + b) ≥ 1 for all i
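As a sketch, this quadratic program can be handed to a generic convex solver. The example below uses cvxpy (an assumption; the lecture does not prescribe a solver) on made-up, linearly separable toy data.

```python
import numpy as np
import cvxpy as cp

# Hard-margin primal: minimize (1/2) w^T w
# subject to y_i (w . x_i + b) >= 1 for all i.
# Toy separable data; cvxpy is an assumed dependency.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w + b) >= 1])
prob.solve()

print("w =", w.value, " b =", b.value)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w.value))
```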

Solving the Optimization Problem

We need to optimize a quadratic function subject to linear constraints. Introduce a Lagrange multiplier αi for each constraint yi(w · xi + b) ≥ 1; this leads to the dual problem:

Find α1…αN such that

Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized, and
(1) Σαiyi = 0
(2) αi ≥ 0 for all i

Refer: Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 1998.
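A matching sketch of the dual in cvxpy (same assumptions and toy data as the primal sketch above). Note that the quadratic term ΣΣαiαjyiyjxiTxj equals ||Σαiyixi||², which keeps the objective in a form the solver recognizes as convex.

```python
import numpy as np
import cvxpy as cp

# Dual: maximize Q(a) = sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j x_i.x_j
# subject to sum_i a_i y_i = 0 and a_i >= 0.  Toy data; cvxpy assumed.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Xy = X * y[:, None]                    # row i is y_i * x_i
a = cp.Variable(len(y), nonneg=True)   # alpha_i >= 0
# The quadratic term equals ||sum_i a_i y_i x_i||^2 = ||Xy^T a||^2.
Q = cp.sum(a) - 0.5 * cp.sum_squares(Xy.T @ a)
cp.Problem(cp.Maximize(Q), [y @ a == 0]).solve()

print("alpha =", np.round(a.value, 4))  # nonzero alphas mark support vectors
```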

The Optimization Problem Solution

The solution has the form:

w = Σαiyixi,  b = yk - wTxk for any xk such that αk > 0

αi must satisfy the Karush-Kuhn-Tucker (KKT) conditions:

αi [yi(wTxi + b) - 1] = 0, for any i

If αi > 0, then yi(wTxi + b) - 1 = 0, i.e., xi lies on the margin.
If yi(wTxi + b) > 1, then αi = 0.

Each nonzero αi indicates that the corresponding xi is a support vector.
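These formulas can be sanity-checked with an off-the-shelf solver. The sketch below uses scikit-learn's SVC (an assumed dependency; a very large C approximates the hard margin) and rebuilds w and b from the dual coefficients.

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (illustrative); large C approximates hard margin.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ stores alpha_i * y_i for the support vectors, so
# w = sum_i alpha_i y_i x_i uses only the support vectors:
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
print("w from duals:", w, " w from solver:", clf.coef_.ravel())

# b = y_k - w^T x_k for any support vector x_k (alpha_k > 0):
k = clf.support_[0]  # index of one support vector
print("b from KKT:", y[k] - w @ X[k], " intercept:", clf.intercept_[0])
```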

Maximum Margin

w and b depend only on the support vectors, through the active constraints yk(wTxk + b) - 1 = 0.

The Optimization Problem Solution

To classify a new test point x, we use

f(x) = wTx + b = ΣαiyixiTx + b

Recall the dual: find α1…αN such that Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized, subject to (1) Σαiyi = 0 and (2) αi ≥ 0 for all i.
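The useful property of this form is that f depends on the training data only through the inner products xiTx. A quick NumPy identity check (all values below are arbitrary; the identity holds for any α, y, X, b):

```python
import numpy as np

# f(x) = sum_i alpha_i y_i (x_i . x) + b  equals  w . x + b
# with w = sum_i alpha_i y_i x_i; all values here are arbitrary.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.5, 0.0, 0.0, 0.0])  # zeros = non-support vectors
b = 0.2
x = rng.normal(size=3)                       # new test point

f_dual = (alpha * y) @ (X @ x) + b           # expansion over training points
w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
print(np.isclose(f_dual, w @ x + b))         # True
```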
