Support Vector Machines


Page 1: Support Vector Machines

Support Vector Machines

Refer to Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials

Page 2: Support Vector Machines


Linear Classifiers

[Figure: two classes of datapoints, one class denoting +1 and the other −1; an input x passes through the classifier f to produce an estimated label y_est]

f(x, w, b) = sign(w · x − b)

How would you classify this data?
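A minimal sketch of this classifier in code (NumPy assumed; the weights and points below are invented for illustration):

```python
# A linear classifier: y_est = f(x, w, b) = sign(w . x - b).
import numpy as np

def f(x, w, b):
    """Classify a point x with weight vector w and threshold b."""
    return np.sign(np.dot(w, x) - b)

# Toy example (values chosen arbitrarily for illustration):
w = np.array([1.0, 2.0])
b = 0.5
print(f(np.array([1.0, 1.0]), w, b))    #  1.0  (w.x - b =  2.5)
print(f(np.array([-1.0, -1.0]), w, b))  # -1.0  (w.x - b = -3.5)
```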


Page 6: Support Vector Machines


Linear Classifiers

[Figure: the same two-class data with several candidate linear boundaries drawn through it]

f(x, w, b) = sign(w · x − b)

Any of these would be fine…

…but which is best?

Page 7: Support Vector Machines


Classifier Margin

[Figure: a linear boundary through the two-class data, widened on both sides until it touches a datapoint]

f(x, w, b) = sign(w · x − b)

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
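A minimal sketch of measuring that width for a given boundary (NumPy assumed; it uses the standard point-to-line distance that a later slide asks for, and the data is invented for illustration):

```python
# Margin of the boundary w . x - b = 0 over a dataset: the perpendicular
# distance from the boundary to the closest datapoint.
import numpy as np

def margin(X, w, b):
    """X: (n, d) array of datapoints. Returns the smallest
    distance |w . x - b| / ||w|| over the rows of X."""
    return np.min(np.abs(X @ w - b)) / np.linalg.norm(w)

# Toy example (numbers invented for illustration):
X = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, -1.0]])
print(margin(X, w=np.array([1.0, 1.0]), b=0.0))  # the closest point decides
```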

Page 8: Support Vector Machines


Maximum Margin

[Figure: the linear boundary with the widest possible margin between the two classes]

f(x, w, b) = sign(w · x − b)

The maximum margin linear classifier is the linear classifier with the, um, maximum margin.

This is the simplest kind of SVM, called a Linear SVM (LSVM).

Page 9: Support Vector Machines


Maximum Margin

[Figure: the maximum-margin boundary; the datapoints that the margin pushes up against are highlighted]

f(x, w, b) = sign(w · x − b)

The maximum margin linear classifier is the linear classifier with the, um, maximum margin.

This is the simplest kind of SVM (Called an LSVM)

Support Vectors are those datapoints that the margin pushes up against


Page 10: Support Vector Machines


Why Maximum Margin?

[Figure: the maximum-margin linear classifier and its support vectors, as on the previous slide]

f(x, w, b) = sign(w · x − b)

1. Intuitively this feels safest.

2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction) this gives us least chance of causing a misclassification.

3. Leave-one-out cross-validation (LOOCV) is easy, since the model is immune to removal of any non-support-vector datapoints.

4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.

5. Empirically it works very, very well.

Page 11: Support Vector Machines


Estimate the Margin

• What is the distance expression for a point x to a line w · x + b = 0?

[Figure: the two-class data, the line w · x + b = 0, and a point x at some distance from it]
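The question is answered by the standard point-to-hyperplane distance (stated here because the slide's equation was lost in extraction):

$$ d(\mathbf{x}) = \frac{|\mathbf{w} \cdot \mathbf{x} + b|}{\lVert \mathbf{w} \rVert_2} $$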

Page 12: Support Vector Machines


Estimate the Margin

• What is the expression for margin?

[Figure: the line w · x + b = 0 with the margin band around it]
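Using the distance expression from the previous slide, the margin is the distance from the boundary to the closest datapoint (a standard statement, reconstructed here):

$$ \text{margin} = \min_{k} \frac{|\mathbf{w} \cdot \mathbf{x}_k + b|}{\lVert \mathbf{w} \rVert_2} $$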

Page 13: Support Vector Machines


Maximize Margin

[Figure: the line w · x + b = 0 and its margin band, to be made as wide as possible]
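Written out in standard form (the slide's equations were lost in extraction), the goal is to pick the boundary whose closest datapoint is as far away as possible:

$$ \max_{\mathbf{w}, b} \; \min_{k} \frac{|\mathbf{w} \cdot \mathbf{x}_k + b|}{\lVert \mathbf{w} \rVert_2} \qquad \text{subject to } y_k (\mathbf{w} \cdot \mathbf{x}_k + b) > 0 \text{ for all } k $$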

Page 14: Support Vector Machines


Maximize Margin

[Figure: as on the previous slide]

• A min-max problem (a game-like problem): maximize over (w, b) the minimum distance over the datapoints.

Page 15: Support Vector Machines


Maximize Margin

[Figure: as on the previous slide]

Strategy:
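The standard strategy (reconstructing the equations lost from this slide): the scale of (w, b) is arbitrary, so fix it by requiring the closest datapoints to satisfy |w · x_k + b| = 1. The margin band then has width

$$ \frac{2}{\lVert \mathbf{w} \rVert_2}, $$

so maximizing the margin is equivalent to minimizing w · w.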

Page 16: Support Vector Machines


Maximum Margin Linear Classifier

• How to solve it?
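Combining the canonical scaling with the requirement that every point be classified correctly gives the standard hard-margin program:

$$ \min_{\mathbf{w}, b} \; \frac{1}{2} \, \mathbf{w} \cdot \mathbf{w} \qquad \text{subject to} \quad y_k (\mathbf{w} \cdot \mathbf{x}_k + b) \ge 1, \;\; k = 1, \dots, R $$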

Page 17: Support Vector Machines


Learning via Quadratic Programming

• Quadratic programming (QP) is a well-studied class of optimization problems: maximize (or minimize) a quadratic function of some real-valued variables subject to linear constraints.
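As a concrete illustration (not from the slides), here is a minimal sketch of handing the hard-margin program above to an off-the-shelf QP solver. It assumes the CVXOPT library; the function name and the tiny ridge on b are choices made here.

```python
# Hard-margin linear SVM as a QP, solved with CVXOPT's solvers.qp,
# which minimizes (1/2) u'Pu + q'u subject to G u <= h.
import numpy as np
from cvxopt import matrix, solvers

def train_linear_svm(X, y):
    """X: (n, d) float array; y: (n,) array of +1/-1 labels.
    Returns (w, b) such that y_k (w . x_k + b) >= 1 for all k."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Optimization variable u = (w, b); quadratic criterion (1/2) w . w.
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = np.eye(d)
    P[d, d] = 1e-8          # tiny ridge on b so P stays invertible (a hedge)
    q = np.zeros(d + 1)
    # n linear inequality constraints  y_k (w . x_k + b) >= 1,
    # rewritten in CVXOPT's form as  -y_k [x_k, 1] u <= -1.
    G = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    h = -np.ones(n)
    solvers.options['show_progress'] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol['x']).ravel()
    return u[:d], u[d]
```

In practice one usually solves the dual form instead, but the primal makes the QP structure explicit.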

Page 18: Support Vector Machines


Quadratic Programming

Find

$$ \arg\max_{\mathbf{u}} \; c + \mathbf{d}^T \mathbf{u} + \frac{\mathbf{u}^T R \mathbf{u}}{2} \qquad \text{(quadratic criterion)} $$

subject to n additional linear inequality constraints

$$ \begin{aligned} a_{11} u_1 + a_{12} u_2 + \dots + a_{1m} u_m &\le b_1 \\ a_{21} u_1 + a_{22} u_2 + \dots + a_{2m} u_m &\le b_2 \\ &\;\;\vdots \\ a_{n1} u_1 + a_{n2} u_2 + \dots + a_{nm} u_m &\le b_n \end{aligned} $$

and subject to e additional linear equality constraints

$$ \begin{aligned} a_{(n+1)1} u_1 + a_{(n+1)2} u_2 + \dots + a_{(n+1)m} u_m &= b_{n+1} \\ a_{(n+2)1} u_1 + a_{(n+2)2} u_2 + \dots + a_{(n+2)m} u_m &= b_{n+2} \\ &\;\;\vdots \\ a_{(n+e)1} u_1 + a_{(n+e)2} u_2 + \dots + a_{(n+e)m} u_m &= b_{n+e} \end{aligned} $$

Page 19: Support Vector Machines


Quadratic Programming


There exist algorithms for finding such constrained quadratic optima much more efficiently and reliably than gradient ascent. (But they are very fiddly… you probably don't want to write one yourself.)

Page 20: Support Vector Machines


Uh-oh!

[Figure: data in which the +1 and −1 classes overlap, so no linear boundary separates them]

This is going to be a problem!

What should we do?

Page 21: Support Vector Machines


Uh-oh!


This is going to be a problem!

What should we do?

Idea 1:

Find the minimum w · w while also minimizing the number of training-set errors.

Problemette: two things to minimize makes for an ill-defined optimization.

Page 22: Support Vector Machines


Uh-oh!


This is going to be a problem!

What should we do?

Idea 1.1:

Minimize w · w + C (#train errors), where C is a tradeoff parameter.

There's a serious practical problem that's about to make us reject this approach. Can you guess what it is?

Page 23: Support Vector Machines


Uh-oh!


This is going to be a problem!

What should we do?

Idea 1.1:

Minimize w · w + C (#train errors), where C is a tradeoff parameter.

There's a serious practical problem that's about to make us reject this approach. Can you guess what it is? It can't be expressed as a quadratic programming problem, and solving it may be too slow. (Also, it doesn't distinguish between disastrous errors and near misses.)

So… any other ideas?

Page 24: Support Vector Machines


Uh-oh!


This is going to be a problem!

What should we do?

Idea 2.0:

Minimize w · w + C (distance of error points to their correct place)
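Written out with slack variables ε_k measuring those distances, Idea 2.0 is the standard soft-margin program (reconstructed here in the usual notation):

$$ \min_{\mathbf{w}, b, \boldsymbol{\varepsilon}} \; \frac{1}{2} \, \mathbf{w} \cdot \mathbf{w} + C \sum_{k=1}^{R} \varepsilon_k \qquad \text{subject to} \quad y_k (\mathbf{w} \cdot \mathbf{x}_k + b) \ge 1 - \varepsilon_k, \;\; \varepsilon_k \ge 0 $$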

Page 25: Support Vector Machines


Support Vector Machine (SVM) for Noisy Data

• Any problem with the above formulation?

[Figure: overlapping two-class data with slack distances ε₁, ε₂, ε₃ marking how far the three error points sit from their correct side]

Page 26: Support Vector Machines


Support Vector Machine (SVM) for Noisy Data

• Balance the trade-off between the margin and the classification errors

[Figure: as on the previous slide]

Page 27: Support Vector Machines


Support Vector Machine for Noisy Data

How do we determine the appropriate value for C?
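The usual answer, though it is not spelled out on the slide, is cross-validation: try several values of C and keep the one that generalizes best. A minimal sketch assuming scikit-learn, with toy data invented for illustration:

```python
# Picking C by 5-fold cross-validation with scikit-learn's grid search.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(100, 2)                        # toy data, for illustration
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy labels

search = GridSearchCV(
    SVC(kernel='linear'),                    # soft-margin linear SVM
    param_grid={'C': [0.01, 0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)                   # the C that generalized best
```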

Page 28: Support Vector Machines


An Equivalent QP: Determine b

Fix w: determining b alone is then a linear programming problem!
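To spell that out (my reading of this slide, stated as an assumption): with w held fixed, the objective and constraints are linear in b and the slacks, so b solves the linear program

$$ \min_{b, \boldsymbol{\varepsilon}} \; \sum_{k=1}^{R} \varepsilon_k \qquad \text{subject to} \quad y_k (\mathbf{w} \cdot \mathbf{x}_k + b) \ge 1 - \varepsilon_k, \;\; \varepsilon_k \ge 0 $$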

Page 29: Support Vector Machines


Suppose we’re in 1-dimension

What would SVMs do with this data?

[Figure: 1-dimensional datapoints laid out along the x-axis, with the origin marked at x = 0]

Page 30: Support Vector Machines


Suppose we’re in 1-dimension

Not a big surprise

[Figure: the maximum-margin split of the 1-D data; the positive "plane" and negative "plane" reduce to points on the axis, with x = 0 marked]

Page 31: Support Vector Machines


Harder 1-dimensional dataset

That’s wiped the smirk off SVM’s face.

What can be done about this?

[Figure: positive and negative points interleaved along the x-axis around x = 0; no single threshold separates them]

Page 32: Support Vector Machines


Harder 1-dimensional dataset

Remember how permitting non-linear basis functions made linear regression so much nicer? Let's permit them here too:

$$ \mathbf{z}_k = (x_k, \; x_k^2) $$
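A minimal sketch of the lift (NumPy assumed; the dataset and the separating line are invented for illustration):

```python
# Lifting 1-D points into (x, x^2): interleaved classes become
# linearly separable in the lifted space.
import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, -1, -1, 1, 1])   # negatives in the middle: no single
                                     # threshold on x separates the classes

Z = np.column_stack([x, x**2])       # z_k = (x_k, x_k^2)

# In z-space the horizontal line x^2 = 2.5 separates them:
w, b = np.array([0.0, 1.0]), -2.5
print(np.sign(Z @ w + b))            # [1, 1, -1, -1, 1, 1]; matches y
```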
