

CAP 5610: Machine Learning

Instructor: Guo-Jun QI

Support Vector Machines

Linear Classifier

Naive Bayes

Assumes each attribute is drawn from a Gaussian distribution with the same variance.

Generative model: the mean and variance are estimated with a closed-form solution.

Logistic regression

Directly maximizes the log-likelihood to fit the model to the training data.

Discriminative model: no closed-form solution; a gradient ascent method is used.
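To make the recap concrete, here is a minimal sketch of fitting logistic regression by gradient ascent on the log-likelihood. The toy data, learning rate, and iteration count are illustrative choices, not from the lecture.

```python
import numpy as np

# Minimal gradient-ascent fit of logistic regression (toy data;
# the learning rate and iteration count are arbitrary choices).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # labels in {0, 1}

w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(y = 1 | x)
    # Gradient of the average log-likelihood: mean of (y_i - p_i) x_i
    w += lr * X.T @ (y - p) / len(y)
    b += lr * np.mean(y - p)

print("learned w, b:", w, b)
```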


Drawback

Both lack a geometric intuition for what makes a good linear classifier in high-dimensional space.


SVM

Supervised learning methods used for classification and regression.

A special property: SVMs simultaneously minimize the classification error and maximize the geometric margin, hence the name maximum margin classifier.

Excellent theory and good performance.


Outline

Linear SVM – hard margin

Linear SVM – soft margin

Non-linear SVM

Application


Linear Classifiers

f: x → y_est, with labels y denoting +1 or -1 (shown as two kinds of points in the figure).

f(x, w, b) = sign(w · x + b), with parameters w and b.

The boundary is the line w · x + b = 0: points with w · x + b > 0 are classified +1, and points with w · x + b < 0 are classified -1.

How would you classify this data?


Any of these separating lines would be fine... but which is best? A poorly placed line can even misclassify training points into the +1 class.
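A minimal NumPy sketch of this decision rule; the parameters w, b and the test points below are made-up values, not from the slides.

```python
import numpy as np

# Linear decision rule: f(x, w, b) = sign(w . x + b).
# w, b, and the test points are illustrative values.
w = np.array([1.0, -2.0])
b = 0.5

def classify(X, w, b):
    """Return +1/-1 labels for the rows of X."""
    scores = X @ w + b                  # w . x + b for each point
    return np.where(scores > 0, 1, -1)  # +1 side vs. -1 side

X = np.array([[3.0, 1.0], [0.0, 2.0], [-1.0, -1.0]])
print(classify(X, w, b))  # [ 1 -1  1]
```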

Classifier Margin

Define the margin of a linear classifier f(x, w, b) = sign(w · x + b) as the width by which the boundary could be increased before hitting a data point.

Maximum Margin

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).

Support vectors are the data points that the margin pushes up against.

1. Maximizing the margin makes sense intuitively.

2. It implies that only the support vectors are important; the other training examples can be discarded without affecting the training result. Keeping only the support vectors does not change the maximum margin classifier, which makes it robust to small changes (noise) in the non-support vectors.

Basics of SVM Math

w/||w|| is perpendicular to the line w · x + b = 0 and has unit length.

The margin between two parallel lines w · x + b1 = 0 and w · x + b2 = 0 is

|b1 - b2| / ||w|| = |w · (x1 - x2)| / ||w||,

where x1 and x2 are points on the two lines:

w · x1 + b1 = 0
w · x2 + b2 = 0
so w · (x1 - x2) = b2 - b1
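A quick numeric check of this distance formula; the values of w, b1, b2 below are arbitrary.

```python
import numpy as np

# Margin formula |b1 - b2| / ||w|| for the parallel lines
# w.x + b1 = 0 and w.x + b2 = 0 (illustrative values).
w = np.array([3.0, 4.0])   # ||w|| = 5
b1, b2 = 1.0, -9.0

# Closed-form distance between the two lines:
print(abs(b1 - b2) / np.linalg.norm(w))  # 10 / 5 = 2.0

# Cross-check: take a point x1 on the first line, step along the
# unit normal w/||w|| until w.x + b2 = 0, and measure the step.
x1 = np.array([0.0, -b1 / w[1]])  # satisfies w.x1 + b1 = 0
u = w / np.linalg.norm(w)
t = -(w @ x1 + b2) / (w @ u)      # solve w.(x1 + t*u) + b2 = 0
print(abs(t))                     # same distance, 2.0
```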

Decision rule:

Positive examples: w · x+ + b ≥ +1
Negative examples: w · x- + b ≤ -1

For the support vectors on the margin these hold with equality; subtracting the two equations gives w · (x+ - x-) = 2.

Linear SVM Mathematically

What we know:

w · x+ + b = +1
w · x- + b = -1
w · (x+ - x-) = 2

M = margin width = (x+ - x-) · w/||w|| = 2/||w||

Linear SVM Mathematically

Goal:

1) Correctly classify all training data:

w · xi + b ≥ +1 if yi = +1
w · xi + b ≤ -1 if yi = -1
i.e., yi(w · xi + b) ≥ 1 for all i

2) Maximize the margin M = 2/||w||, which is the same as minimizing Φ(w) = ½ wTw.

We can formulate a quadratic optimization problem and solve for w and b:

Minimize ½ wTw subject to yi(w · xi + b) ≥ 1 for all i
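As a sketch, this quadratic program can be handed to a generic convex solver. The example below uses cvxpy (an assumption; the lecture does not prescribe a solver) on made-up, linearly separable toy data.

```python
import numpy as np
import cvxpy as cp

# Hard-margin primal: minimize (1/2) w^T w
# subject to y_i (w . x_i + b) >= 1 for all i.
# Toy separable data; cvxpy is an assumed dependency.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w + b) >= 1])
prob.solve()

print("w =", w.value, " b =", b.value)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w.value))
```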

Solving the Optimization Problem

We need to optimize a quadratic function subject to linear constraints. Introduce a Lagrange multiplier αi for each constraint yi(w · xi + b) ≥ 1; this leads to the dual problem:

Find α1…αN such that

Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized, and
(1) Σαiyi = 0
(2) αi ≥ 0 for all i

Refer: Christopher J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 1998.
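A matching sketch of the dual in cvxpy (same assumptions and toy data as the primal sketch above). Note that the quadratic term ΣΣαiαjyiyjxiTxj equals ||Σαiyixi||², which keeps the objective in a form the solver recognizes as convex.

```python
import numpy as np
import cvxpy as cp

# Dual: maximize Q(a) = sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j x_i.x_j
# subject to sum_i a_i y_i = 0 and a_i >= 0.  Toy data; cvxpy assumed.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Xy = X * y[:, None]                    # row i is y_i * x_i
a = cp.Variable(len(y), nonneg=True)   # alpha_i >= 0
# The quadratic term equals ||sum_i a_i y_i x_i||^2 = ||Xy^T a||^2.
Q = cp.sum(a) - 0.5 * cp.sum_squares(Xy.T @ a)
cp.Problem(cp.Maximize(Q), [y @ a == 0]).solve()

print("alpha =", np.round(a.value, 4))  # nonzero alphas mark support vectors
```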

The Optimization Problem Solution

The solution has the form:

w = Σαiyixi,  b = yk - wTxk for any xk such that αk > 0

αi must satisfy the Karush-Kuhn-Tucker (KKT) conditions:

αi [yi(wTxi + b) - 1] = 0, for any i

If αi > 0, then yi(wTxi + b) - 1 = 0, i.e., xi lies on the margin.
If yi(wTxi + b) > 1, then αi = 0.

Each nonzero αi indicates that the corresponding xi is a support vector.
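These formulas can be sanity-checked with an off-the-shelf solver. The sketch below uses scikit-learn's SVC (an assumed dependency; a very large C approximates the hard margin) and rebuilds w and b from the dual coefficients.

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (illustrative); large C approximates hard margin.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_ stores alpha_i * y_i for the support vectors, so
# w = sum_i alpha_i y_i x_i uses only the support vectors:
w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
print("w from duals:", w, " w from solver:", clf.coef_.ravel())

# b = y_k - w^T x_k for any support vector x_k (alpha_k > 0):
k = clf.support_[0]  # index of one support vector
print("b from KKT:", y[k] - w @ X[k], " intercept:", clf.intercept_[0])
```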

Maximum Margin

w and b depend only on the support vectors, through the active constraints yk(wTxk + b) - 1 = 0.

The Optimization Problem Solution

To classify a new test point x, we use

f(x) = wTx + b = ΣαiyixiTx + b

Recall the dual: find α1…αN such that Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized, subject to (1) Σαiyi = 0 and (2) αi ≥ 0 for all i.
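The useful property of this form is that f depends on the training data only through the inner products xiTx. A quick NumPy identity check (all values below are arbitrary; the identity holds for any α, y, X, b):

```python
import numpy as np

# f(x) = sum_i alpha_i y_i (x_i . x) + b  equals  w . x + b
# with w = sum_i alpha_i y_i x_i; all values here are arbitrary.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.5, 0.0, 0.0, 0.0])  # zeros = non-support vectors
b = 0.2
x = rng.normal(size=3)                       # new test point

f_dual = (alpha * y) @ (X @ x) + b           # expansion over training points
w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
print(np.isclose(f_dual, w @ x + b))         # True
```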
