Cupcakes versus muffins
Support vector machines

Arthur J. Parzygnat
University of Connecticut SIGMA
January 26, 2018




Outline

1 Cupcakes and muffins
   What's the difference?

2 Hyperplanes
   Parametrizing hyperplanes
   Marginal hyperplanes
   The definition of a support vector machine

3 Existence and uniqueness of SVMs
   Enlarging margins
   Existence of SVMs

4 A simple example of an SVM


Cupcakes and muffins What’s the difference?

What’s the difference?


Hyperplanes Parametrizing hyperplanes

Parametrizing hyperplanes

Lemma 1

Let H ⊆ R^n be a hyperplane. Then there exists a vector ~w ∈ R^n \ {~0} and a real number c ∈ R such that H is the solution set of 〈~w, ~x〉 − c = 0.

Proof.

Let ~h ∈ H. Then H − ~h is an (n − 1)-dimensional subspace of R^n. Hence, (H − ~h)⊥ is a one-dimensional subspace spanned by some unit vector û. Because span{û} is perpendicular to H, there exists an a ∈ R such that aû ∈ H. Then H is the solution set of 〈û, ~x〉 − a = 0.

[Figure: the hyperplane H, a point ~h ∈ H, the unit normal û, and the point aû ∈ H.]


Hyperplanes Marginal hyperplanes

Marginal planes

Note that a hat over a vector signifies that it is a unit vector (has magnitude 1). There are many ways to describe hyperplanes, and it may not be obvious why the description by a vector and a number is preferable. Furthermore, notice that this parametrization has redundancies: for example, (λ~w, λc) and (~w, c) describe the same hyperplane for all λ ∈ R \ {0}.

Definition 2

Let ~w ∈ R^n \ {~0} and c ∈ R with associated hyperplane H given by the solution set of 〈~w, ~x〉 − c = 0. The marginal planes H+ and H− associated to (~w, c) are the solution sets of 〈~w, ~x〉 − c = 1 and 〈~w, ~x〉 − c = −1, respectively, i.e.

H± := {~x ∈ R^n : 〈~w, ~x〉 − c = ±1}.


Hyperplanes Marginal hyperplanes

Marginal planes example

Example 3

In R², if ~w = (1/3)[3, 1]ᵀ and c = 4, then the marginal planes look like the figure on the right, since they are described by

H : (1/3)(3x + y) − 4 = 0
H+ : (1/3)(3x + y) − 4 = 1
H− : (1/3)(3x + y) − 4 = −1

[Figure: the parallel lines H−, H, H+ with the normal vector ~w.]

Notice that the vector (c/‖~w‖²)~w = (6/5)[3, 1]ᵀ lies on the plane H.
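These numbers can be checked directly in a few lines; a minimal sketch (the variable and function names are my own):

```python
import math

# Example 3: w = (1/3)*(3, 1) and c = 4, so w = (1, 1/3).
w = (1.0, 1.0 / 3.0)
c = 4.0

def side(w, c, x):
    """Evaluate <w, x> - c at the point x."""
    return w[0] * x[0] + w[1] * x[1] - c

# The claimed point (c / ||w||^2) * w = (6/5)*(3, 1) = (18/5, 6/5):
norm_sq = w[0] ** 2 + w[1] ** 2
p = (c / norm_sq * w[0], c / norm_sq * w[1])
print(p)              # close to (3.6, 1.2)
print(side(w, c, p))  # close to 0, so p lies on H
```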


Hyperplanes Marginal hyperplanes

The distance between marginal hyperplanes

Lemma 4

Let (~w, c) ∈ (R^n \ {~0}) × R describe a hyperplane H. The perpendicular distance between H and H± is 1/‖~w‖.

Proof.

Let ~x+ ∈ H+ and ~x ∈ H. The orthogonal distance between H and H+ is given by

〈~x+ − ~x, ~w/‖~w‖〉 = (1/‖~w‖)(〈~w, ~x+〉 − 〈~w, ~x〉) = (1/‖~w‖)((1 + c) − c) = 1/‖~w‖

by the definition of H and H+ in terms of (~w, c). A similar calculation holds for H−.

[Figure: ~x ∈ H, ~x+ ∈ H+, and the projection of ~x+ − ~x onto ~w/‖~w‖.]
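Lemma 4 can also be verified numerically for a sample hyperplane; a sketch (it uses the explicit points (c/‖~w‖²)~w ∈ H and ((c+1)/‖~w‖²)~w ∈ H+, an easy consequence of the definitions):

```python
import math

# A numeric check of Lemma 4 for one arbitrary hyperplane (w, c).
w = (3.0, 1.0)
c = 4.0
norm = math.hypot(w[0], w[1])

# (c/||w||^2) w lies on H and ((c+1)/||w||^2) w lies on H+.
x  = (c / norm**2 * w[0], c / norm**2 * w[1])
xp = ((c + 1) / norm**2 * w[0], (c + 1) / norm**2 * w[1])

# The orthogonal distance <x+ - x, w/||w||> should equal 1/||w||.
dist = ((xp[0] - x[0]) * w[0] + (xp[1] - x[1]) * w[1]) / norm
print(abs(dist - 1 / norm) < 1e-12)  # True
```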


Hyperplanes Marginal hyperplanes

The margin and margin width

Definition 5

Let (~w, c) ∈ (R^n \ {~0}) × R describe a hyperplane H. The convex region between H+ and H− is called the margin of (~w, c). The orthogonal distance between H+ and H−, which is given by 2/‖~w‖, is called the margin width of (~w, c).

Even though (~w, c) can be scaled to (λ~w, λc) to give the same H for any λ ∈ R \ {0}, notice that the marginal planes are different. This is because the margin width has scaled by a factor of 1/|λ|.


Hyperplanes Marginal hyperplanes

Parametrizing hyperplanes and their margins

Lemma 6

An ordered pair of distinct parallel hyperplanes (H−, H+) in R^n determines a unique (~w, c) ∈ (R^n \ {~0}) × R whose marginal planes agree with H− and H+.

Proof.

Let ~x+ ∈ H+ and pick û ∈ (H+ − ~x+)⊥ such that if λû ∈ H− and μû ∈ H+, then λ < μ (i.e. û is perpendicular to H+ and points from H− to H+). Also, let ~x− ∈ H−. Since the orthogonal separation between the planes H+ and H− is 〈~x+ − ~x−, û〉, set

~w := (2/〈~x+ − ~x−, û〉) û   and   c := 〈û, ~x+ + ~x−〉 / 〈û, ~x+ − ~x−〉.

Then (~w, c) has H+ and H− as its marginal planes.

[Figure: parallel planes H− and H+, points ~x± on them, and the difference ~x+ − ~x−.]
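The construction in this proof can be checked numerically; a sketch in R² with example lines of my own choosing:

```python
import math

# A numeric sketch of Lemma 6's construction:
# H- : x + 2y = 0 and H+ : x + 2y = 5, with unit normal u pointing from H- to H+.
u = (1 / math.sqrt(5), 2 / math.sqrt(5))
x_plus  = (5.0, 0.0)    # a point of H+
x_minus = (0.0, 0.0)    # a point of H-

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

diff = (x_plus[0] - x_minus[0], x_plus[1] - x_minus[1])
gap  = dot(diff, u)                      # orthogonal separation of the planes

w = (2 / gap * u[0], 2 / gap * u[1])
c = dot(u, (x_plus[0] + x_minus[0], x_plus[1] + x_minus[1])) / gap

# (w, c) should have H+ and H- as its marginal planes:
print(dot(w, x_plus) - c, dot(w, x_minus) - c)   # close to 1.0 and -1.0
```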


Hyperplanes Marginal hyperplanes

Parametrizing hyperplanes and their margins

Corollary 7

There is a 1-1 correspondence between the set of (ordered) pairs of parallel hyperplanes (the marginal hyperplanes) and the set (R^n \ {~0}) × R.

Henceforth, we will abuse notation a bit: when we say "hyperplane," we will always mean an element of (R^n \ {~0}) × R unless otherwise stated.


Hyperplanes The definition of a support vector machine

Separating hyperplanes

Definition 8

Let (X, X+, X−) denote a non-empty set X of vectors in R^n that is separated into the two (disjoint) non-empty sets X+ and X−. Such a collection of sets is called a training data set. A hyperplane H ⊆ R^n, described by (~w, c) ∈ (R^n \ {~0}) × R, separates (X, X+, X−) iff

〈~w, ~x+〉 − c > 0  &  〈~w, ~x−〉 − c < 0

for all ~x+ ∈ X+ and for all ~x− ∈ X−. In this case, H is said to be a separating hyperplane for (X, X+, X−). H marginally separates (X, X+, X−) iff

〈~w, ~x+〉 − c ≥ 1  &  〈~w, ~x−〉 − c ≤ −1

for all ~x+ ∈ X+ and for all ~x− ∈ X−.
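The two conditions in Definition 8 translate directly into code; a minimal sketch (the data and function names are my own):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def separates(w, c, X_plus, X_minus):
    """<w, x+> - c > 0 and <w, x-> - c < 0 for all training points."""
    return (all(dot(w, x) - c > 0 for x in X_plus) and
            all(dot(w, x) - c < 0 for x in X_minus))

def marginally_separates(w, c, X_plus, X_minus):
    """<w, x+> - c >= 1 and <w, x-> - c <= -1 for all training points."""
    return (all(dot(w, x) - c >= 1 for x in X_plus) and
            all(dot(w, x) - c <= -1 for x in X_minus))

X_plus, X_minus = [(0.0, 1.0)], [(-1.0, -2.0), (2.0, -1.0)]
print(separates((0.0, 0.5), 0.0, X_plus, X_minus))             # True
print(marginally_separates((0.0, 0.5), 0.0, X_plus, X_minus))  # False
```

The second call is False because the positive point sits strictly between the marginal planes, even though it is on the correct side of H.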


Hyperplanes The definition of a support vector machine

Support vector machine

Definition 9

Let (X, X+, X−) denote a training data set in R^n. Let S_X ⊆ (R^n \ {~0}) × R denote the set of hyperplanes that marginally separate (X, X+, X−). Let f : S_X → R be the function defined by

S_X ∋ (~w, c) ↦ f(~w, c) := 2/‖~w‖,

i.e. the margin width. A support vector machine (SVM) for (X, X+, X−) is a maximum of f, i.e. an SVM is a marginally separating hyperplane (~w, c) ∈ S_X such that 1/‖~w′‖ ≤ 1/‖~w‖ for every other marginally separating hyperplane (~w′, c′) ∈ S_X.


Hyperplanes The definition of a support vector machine

Support vector machine

Example 10

An SVM is a hyperplane that maximizes the margin.

[Figure: a separating hyperplane H with marginal planes H± whose margin is as wide as possible for the plotted + and − points.]


Hyperplanes The definition of a support vector machine

The support vectors of a marginally separating hyperplane

How will we know when we have found an SVM? The key to determining marginally separating hyperplanes will be the possible support vectors. Our goal will then be to maximize the margin over all possible support vectors.

Definition 11

Let (X, X+, X−) be a training data set and let (~w, c) be a marginally separating hyperplane for this set. The elements of X ∩ H± are called support vectors for (~w, c). The set of support vectors is denoted by H^supp_X. The notation H^supp_X± := H^supp_X ∩ H± will also be used to denote the sets of positive and negative support vectors.


Hyperplanes The definition of a support vector machine

Examples of support vectors

Example 12

Two examples of support vectors have been circled for two different marginally separating hyperplanes for the same training data set.

[Figure: two copies of the same + and − points with different hyperplanes H and marginal planes H±; in each, the circled points lying on H± are the support vectors.]


Existence and uniqueness of SVMs Enlarging margins

“Enlarging the margins” Lemma Statement

Lemma 13

Let (X, X+, X−) be a training data set and let (~w, c) be a marginally separating hyperplane for this set. Then there exists a unique marginally separating hyperplane (~v, d) such that

v̂ = ŵ  &  2/‖~v‖ = min_{~x+ ∈ X+, ~x− ∈ X−} 〈~x+ − ~x−, ŵ〉.

In other words, if the marginal planes do not contain any of the training data set, then the separating hyperplane can be translated and the margin width can be enlarged until the margin touches both the positive and negative training data sets. We use the minimum orthogonal distance between the positive and negative training data sets to define the new margin width.


Existence and uniqueness of SVMs Enlarging margins

“Enlarging the margins” Lemma Picture

[Figure: the original marginal planes H± around H, and the enlarged marginal planes K± around K, translated and widened until they touch the + and − points.]


Existence and uniqueness of SVMs Enlarging margins

“Enlarging the margins” Lemma Proof

Proof.

Notice that ((c − 1)/‖~w‖)ŵ ∈ H−, (c/‖~w‖)ŵ ∈ H, and ((c + 1)/‖~w‖)ŵ ∈ H+ provide explicit vectors in each of these three hyperplanes. Set m+ to be the remaining minimum orthogonal distance between H+ and X+, and set m− to be the remaining minimum orthogonal distance between H− and X−, namely

m± := min_{~x± ∈ X±} θ(~x±)(〈~x±, ŵ〉 − (c ± 1)/‖~w‖),

where θ : X → R is the function defined by

X ∋ ~x ↦ θ(~x) := +1 if ~x ∈ X+, −1 if ~x ∈ X−.


Existence and uniqueness of SVMs Enlarging margins

“Enlarging the margins” Lemma Proof

Proof.

Therefore, the planes K± containing the vectors ((c ± 1)/‖~w‖ ± m±)ŵ that are perpendicular to ŵ intersect X± but contain no points of X in the interior of their margin. By Lemma 6, there exists a (~v, d) ∈ (R^n \ {~0}) × R that describes these marginal separating hyperplanes. In fact, they can be given explicitly by

~v := (2 / (2/‖~w‖ + m+ + m−)) ŵ

and

d := 〈~v, ((c + 1)/‖~w‖ + m+)ŵ〉 − 1 = (2c + ‖~w‖(m+ − m−)) / (2 + ‖~w‖(m+ + m−)).
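The explicit formulas for (~v, d) can be checked numerically; a sketch with a small example of my own (chosen so the arithmetic is exact in floating point):

```python
import math

# Data and a starting hyperplane whose margins miss the data (my own example).
X_plus, X_minus = [(0.0, 2.0)], [(0.0, -2.0)]
w, c = (0.0, 2.0), 0.0     # marginally separates, but H± don't touch the points

norm  = math.hypot(w[0], w[1])
w_hat = (w[0] / norm, w[1] / norm)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

# m± as defined in the proof:
m_plus  = min(dot(x, w_hat) - (c + 1) / norm for x in X_plus)
m_minus = min(-(dot(x, w_hat) - (c - 1) / norm) for x in X_minus)

scale = 2 / (2 / norm + m_plus + m_minus)
v = (scale * w_hat[0], scale * w_hat[1])
d = (2 * c + norm * (m_plus - m_minus)) / (2 + norm * (m_plus + m_minus))

# The enlarged marginal planes K± now touch the data:
print(dot(v, X_plus[0]) - d, dot(v, X_minus[0]) - d)   # 1.0 -1.0
```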


Existence and uniqueness of SVMs Existence of SVMs

Existence of SVMs

Theorem 14

Let (X, X+, X−) be a training data set for which there exists a separating hyperplane. Then there exists a unique SVM for (X, X+, X−).

Proof.

By Lemma 13, it suffices to maximize the margin function f on the subset S^supp_X ⊆ S_X consisting of marginally separating hyperplanes that have both positive and negative support vectors, namely on

S^supp_X := {(~w, c) ∈ S_X : H± ∩ X± ≠ ∅}.

The goal is therefore to maximize the margin function, which is a function of (~w, c), subject to the constraints

θ(~x)(〈~w, ~x〉 − c) − 1 = 0  for all ~x ∈ H^supp_X.


Existence and uniqueness of SVMs Existence of SVMs

Proof of existence of SVMs

Proof.

Maximizing the margin function is equivalent to minimizing the function

(R^n \ {~0}) × R ∋ (~w, c) ↦ g(~w, c) := (1/2)‖~w‖² − Σ_{~x ∈ X} α_{~x} (θ(~x)(〈~w, ~x〉 − c) − 1).

Here, α_{~x} = 0 for all ~x ∈ X \ H^supp_X (remember, H^supp_X is the set of support vectors for the given hyperplane) and α_{~x} needs to be determined for all ~x ∈ H^supp_X. This condition guarantees that the function g equals 2/f² when restricted to S^supp_X (but notice that it does not equal 2/f² on the larger domain S_X of all marginally separating hyperplanes). The α_{~x} are called Lagrange multipliers. The extrema of g occur at points (~w, c) for which the derivative of g vanishes with respect to these coordinates:

∂g/∂~w |_{(~w,c)} = 0  &  ∂g/∂c |_{(~w,c)} = 0.


Existence and uniqueness of SVMs Existence of SVMs

Proof of existence of SVMs

Proof.

The first of these two equations gives

~w = Σ_{~x ∈ X} α_{~x} θ(~x) ~x,   (1)

while the second equation gives

Σ_{~x ∈ X} α_{~x} θ(~x) = 0,   (2)

which is a condition that the Lagrange multipliers have to satisfy. Plugging these results back into the function g gives

g(~w, c) = Σ_{~x ∈ X} α_{~x} − (1/2) Σ_{~x,~y ∈ X} α_{~x} α_{~y} θ(~x) θ(~y) 〈~x, ~y〉.
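That substituting (1) and (2) into g really yields this expression can be spot-checked numerically; a sketch with data of my own:

```python
import random

# A numeric sanity check of the dual expression for g.
random.seed(0)
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
th  = [1, 1, -1, -1]
a   = [0.25, 0.75, 0.5, 0.5]   # chosen so that sum(a_i * th_i) = 0, i.e. (2) holds
c   = 0.37                      # arbitrary; it drops out because of (2)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# (1): w = sum_x alpha_x theta(x) x
w = tuple(sum(ai * ti * x[k] for ai, ti, x in zip(a, th, pts)) for k in (0, 1))

g_primal = 0.5 * dot(w, w) - sum(ai * (ti * (dot(w, x) - c) - 1)
                                 for ai, ti, x in zip(a, th, pts))
g_dual = sum(a) - 0.5 * sum(ai * aj * ti * tj * dot(x, y)
                            for ai, ti, x in zip(a, th, pts)
                            for aj, tj, y in zip(a, th, pts))
print(abs(g_primal - g_dual) < 1e-9)   # True
```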


Existence and uniqueness of SVMs Existence of SVMs

Proof of existence of SVMs

Proof.

Setting ∂g/∂α_{~x} |_{(~w,c)} = 0 for each ~x ∈ H^supp_X gives additional conditions that the Lagrange multipliers have to satisfy:

θ(~x)(〈~w, ~x〉 − c) = 1

for each ~x ∈ H^supp_X. Plugging the earlier result (1) for ~w into this constraint gives

Σ_{~y ∈ X} α_{~y} θ(~y) 〈~y, ~x〉 − θ(~x) = c   (3)

for each ~x ∈ H^supp_X. However, there is one subtle point: we do not know what H^supp_X is. Nevertheless, there is still an optimization procedure left over, based on the different possible choices of H^supp_X, i.e. on each pair of nonempty subsets of X+ and X−.


Existence and uniqueness of SVMs Existence of SVMs

Proof of existence of SVMs

Proof.

For each choice of H^supp_X, one has the linear system

Σ_{~y ∈ X} θ(~y) α_{~y} = 0
{ Σ_{~y ∈ X} θ(~y) 〈~y, ~x〉 α_{~y} − c = θ(~x) }_{~x ∈ H^supp_X}   (4)

in the variables {α_{~x} : ~x ∈ H^supp_X} ∪ {c}, obtained from equations (2) and (3). This describes a linear system of |H^supp_X| + 1 equations in |H^supp_X| + 1 variables. At least one of these systems is consistent because we have assumed that the training data set can be separated. Hence, a solution to the SVM problem exists, since it is given by maximizing 1/‖~w‖ over the (finite) set of pairs of nonempty subsets of X+ and X−.
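System (4) can be assembled and solved mechanically for any candidate support set; a sketch using exact rational arithmetic (the function name and toy data are my own):

```python
from fractions import Fraction

def solve_system4(support, theta):
    """Solve system (4) for one candidate support set.
    support: list of support vectors; theta: their labels (+1 or -1).
    Returns (alphas, c). A small Gauss-Jordan solver; assumes the
    system is consistent with a unique solution."""
    def dot(a, b):
        return sum(Fraction(ai) * bi for ai, bi in zip(a, b))
    n = len(support)
    # Row for (2): sum_y theta(y) alpha_y = 0.
    rows = [[Fraction(t) for t in theta] + [Fraction(0), Fraction(0)]]
    # One row per support vector x, for (3):
    # sum_y theta(y) <y, x> alpha_y - c = theta(x).
    for i, x in enumerate(support):
        rows.append([theta[j] * dot(support[j], x) for j in range(n)]
                    + [Fraction(-1), Fraction(theta[i])])
    m = n + 1                      # unknowns: alpha_1..alpha_n, then c
    for col in range(m):           # Gauss-Jordan elimination
        piv = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [u - f * v for u, v in zip(rows[r], rows[col])]
    sol = [rows[r][m] for r in range(m)]
    return sol[:n], sol[n]

# Toy support set: one point on each side of the x-axis.
alphas, c = solve_system4([(0, 1), (0, -1)], [1, -1])
print(alphas[0], alphas[1], c)   # 1/2 1/2 0
```

Here (1) then gives ~w = (1/2)(0, 1) − (1/2)(0, −1) = (0, 1), the expected normal.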


A simple example of an SVM

Example of an SVM

Consider the training data set given by

X+ := {~x+ := [0, 1]ᵀ}  &  X− := {~x¹− := [−1, −2]ᵀ, ~x²− := [2, −1]ᵀ}.

[Figure: the three training points in the plane, one + and two −.]

The inner products are given by

〈~x+, ~x+〉 = 1, 〈~x+, ~x¹−〉 = −2, 〈~x+, ~x²−〉 = −1,
〈~x¹−, ~x¹−〉 = 5, 〈~x¹−, ~x²−〉 = 0, 〈~x²−, ~x²−〉 = 5.


A simple example of an SVM

Example: case i) H^supp_X = {~x+, ~x¹−, ~x²−}

For this case, (4) reads

α_{~x+} − α_{~x¹−} − α_{~x²−} = 0
α_{~x+} + 2α_{~x¹−} + α_{~x²−} − c = 1
−2α_{~x+} − 5α_{~x¹−} − c = −1
−α_{~x+} − 5α_{~x²−} − c = −1

and the solution to this linear system is

α_{~x+} = 5/16,  α_{~x¹−} = 1/8,  α_{~x²−} = 3/16,  c = −1/4.

The associated vector ~w and margin are therefore given by

~w = (5/16)[0, 1]ᵀ − (1/8)[−1, −2]ᵀ − (3/16)[2, −1]ᵀ = (1/4)[−1, 3]ᵀ  ⟹  2/‖~w‖ = 8/√10.
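The stated solution can be verified by direct substitution; a sketch (the fractions are dyadic, so floating point is exact here):

```python
import math

# Direct check of the case i) numbers.
a_p, a_1, a_2, c = 5/16, 1/8, 3/16, -1/4

print(a_p - a_1 - a_2)         # 0.0   (row 1 of the system)
print(a_p + 2*a_1 + a_2 - c)   # 1.0   (row 2)
print(-2*a_p - 5*a_1 - c)      # -1.0  (row 3)
print(-a_p - 5*a_2 - c)        # -1.0  (row 4)

# w = a_p*(0,1) - a_1*(-1,-2) - a_2*(2,-1) and the margin 2/||w||:
w = (a_p*0 - a_1*(-1) - a_2*2, a_p*1 - a_1*(-2) - a_2*(-1))
print(w)                       # (-0.25, 0.75), i.e. (1/4)*(-1, 3)
print(math.isclose(2 / math.hypot(*w), 8 / math.sqrt(10)))   # True
```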


A simple example of an SVM

Example: case i) H^supp_X = {~x+, ~x¹−, ~x²−}

The hyperplanes are then given by the solutions to the three equations

H : (1/4)(−x + 3y) + 1/4 = 0
H+ : (1/4)(−x + 3y) + 1/4 = 1
H− : (1/4)(−x + 3y) + 1/4 = −1

[Figure: H and H± for case i), with all three training points on the marginal planes.]


A simple example of an SVM

Example: case ii) H^supp_X = {~x+, ~x¹−}

For this case, α_{~x²−} = 0 and (4) reads

α_{~x+} − α_{~x¹−} = 0
α_{~x+} + 2α_{~x¹−} − c = 1
−2α_{~x+} − 5α_{~x¹−} − c = −1

and the solution to this linear system is

α_{~x+} = 1/5,  α_{~x¹−} = 1/5,  c = −2/5.

The associated vector ~w and margin are therefore given by

~w = (1/5)[0, 1]ᵀ − (1/5)[−1, −2]ᵀ = (1/5)[1, 3]ᵀ  ⟹  2/‖~w‖ = 10/√10.


A simple example of an SVM

Example: case ii) H^supp_X = {~x+, ~x¹−}

The hyperplanes are then given by the solutions to the three equations

H : (1/5)(x + 3y) + 2/5 = 0
H+ : (1/5)(x + 3y) + 2/5 = 1
H− : (1/5)(x + 3y) + 2/5 = −1

[Figure: H and H± for case ii); the point ~x²− lies inside the margin.]

Even though the margin is larger here, we must reject this solution because it is not a marginally separating hyperplane for the training data set.
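The rejection can be seen in one line: the left-out point ~x²− violates the marginal-separation inequality. A sketch:

```python
# Why case ii) is rejected: the left-out negative point is inside the margin.
w, c = (1/5, 3/5), -2/5          # the case ii) candidate: w = (1/5)*(1, 3)
x2 = (2, -1)                     # the left-out point x^2_-

value = w[0]*x2[0] + w[1]*x2[1] - c
print(value)         # about 0.2, but a negative point needs <w, x> - c <= -1
print(value <= -1)   # False, so (w, c) is not marginally separating
```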


A simple example of an SVM

Example: case iii) H^supp_X = {~x+, ~x²−}

For this case, α_{~x¹−} = 0 and (4) reads

α_{~x+} − α_{~x²−} = 0
α_{~x+} + α_{~x²−} − c = 1
−α_{~x+} − 5α_{~x²−} − c = −1

and the solution to this linear system is

α_{~x+} = 1/4,  α_{~x²−} = 1/4,  c = −1/2.

The associated vector ~w and margin are therefore given by

~w = (1/4)[0, 1]ᵀ − (1/4)[2, −1]ᵀ = (1/2)[−1, 1]ᵀ  ⟹  2/‖~w‖ = 2√2 = 4√5/√10.


A simple example of an SVM

Example: case iii) H^supp_X = {~x+, ~x²−}

The hyperplanes are then given by the solutions to the three equations

H : (1/2)(−x + y) + 1/2 = 0
H+ : (1/2)(−x + y) + 1/2 = 1
H− : (1/2)(−x + y) + 1/2 = −1

[Figure: H and H± for case iii); the point ~x¹− lies inside the margin.]

We must also reject this solution because it is not a marginally separating hyperplane for the training data set. Hence, our first (~w, c) is the SVM.
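As a closing check, one can test all three candidates against the definition of marginal separation; a sketch (candidate values copied from the three cases above):

```python
# Among the three candidate (w, c), only case i) marginally separates the data.
X_plus, X_minus = [(0, 1)], [(-1, -2), (2, -1)]
candidates = {
    "case i":   ((-1/4, 3/4), -1/4),
    "case ii":  ((1/5, 3/5),  -2/5),
    "case iii": ((-1/2, 1/2), -1/2),
}

def marginally_separates(w, c):
    dot = lambda a, b: a[0]*b[0] + a[1]*b[1]
    return (all(dot(w, x) - c >= 1 for x in X_plus) and
            all(dot(w, x) - c <= -1 for x in X_minus))

valid = [name for name, (w, c) in candidates.items() if marginally_separates(w, c)]
print(valid)   # ['case i']
```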


Thank you

Thank you!

Thanks for your attention!

References:

1. Alice Zhao, Support Vector Machines: A Visual Explanation with Sample Python Code (2017). Available at https://youtu.be/N1vOgolbjSc.

2. Patrick Winston, 6.034 Artificial Intelligence, "Lecture 16. Learning: Support Vector Machines." Fall 2010. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA. Available at https://youtu.be/_PwhiWxHK8o.

3. Benjamin Russo. Private communication.

4. Images of muffin from Wikipedia (https://en.wikipedia.org/wiki/Muffin).

5. Images of cupcake from https://littlecupcakes.com.au/product/red-velvet/.
