introduction to the vc-dimension - github pagesweiningchen.github.io/notes/vc_dim.pdf · outline 1...

40
I NTRODUCTION TO THE VC-DIMENSION Wei-Ning Chen December 28, 2018 WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 1 / 40

Upload: others

Post on 07-Aug-2020

7 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

INTRODUCTION TO THE VC-DIMENSION

Wei-Ning Chen

December 28, 2018

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 1 / 40

Page 2: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

OUTLINE

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 2 / 40

Page 3: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

RECAP

DEFINITION (UNIFORM CONVERGENCE)

We say that a hypothesis class H has the uniform convergence property (w.r.t. a domainZ and a loss function `) if there exists a function mUC

H such that for every ε, δ ∈ (0, 1) andfor every probability distribution D over Z, if S is a sample of m ≥ mUC

H (ε, δ) examplesdrawn i.i.d. according to D, then

P(|LS(h)− LD(h)| ≤ ε, ∀h ∈ H) ≥ 1− δ

Equivalently,limm→∞

P(suph∈H|LS(h)− LD(h)| > ε) = 0

Remark: Compare to the difinition of PAC:

P(LD(hS)− infh∈H

LD(h) ≤ ε) ≥ 1− δ

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 3 / 40

Page 4: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

RECAP

THEOREM (NO-FREE-LUNCH)

Let A be any learning algorithm for the task of binary classification with respect to the0− 1 loss over a domain X . Let m be any number smaller than |X |/2, representing atraining set size. Then, there exists a distribution D over X × {0, 1} such that:� There exists a function f : X → {0, 1} with LD(f) = 0.

� P(LD(A(S)) ≥1

8) ≥ 1

7

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 4 / 40

Page 5: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 5 / 40

Page 6: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

LEARNING FROM INFINITE-SIZE HYPOTHESIS CLASS

In chapter 2, we see that every finite hypothesis class H is learnable; moreover, thesample complexity is bounded by

mH(δ, ε) ≤log(|H|)/δ

ε

So, what if |H| =∞?

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 6 / 40

Page 7: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: CONCENTRIC CIRCLE

EXAMPLE

Let X = R2, Y = {0, 1}, and let H be the class of concentric circles in the plane, that is,H = {hr : r ∈ R+}. Prove that H is PAC learnable (assume realizability), and its samplecomplexity is bounded by

mH(ε, δ) ≤log(2δ)

ε.

First, we specify HB.By definition, if h ∈ HB, we have D(h(x) 6= h∗(x)) ≥ ε

h∗ h

X

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 7 / 40

Page 8: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: CONCENTRIC CIRCLE

Equivalently, D(h(x) = h∗(x)) ≤ 1− ε

h∗ h

X

If now H is finite, we can apply union bound:

Dm(⋃

h∈HB

∀i = [m]|h(xi) = h∗(xi)) ≤ |H|(1− ε)m ≤ δ

We can slightly modify the union bound for the case |H| =∞.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 8 / 40

Page 9: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: CONCENTRIC CIRCLE

h∗

hr1

X

h∗hr2

X

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 9 / 40

Page 10: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: CONCENTRIC CIRCLE

h∗

X

h∗

X

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 10 / 40

Page 11: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: CONCENTRIC CIRCLE

h∗

X

h∗

X

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 11 / 40

Page 12: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: CONCENTRIC CIRCLE

Let S ∼ Dm, and rmin = minx∈S

rx, rmax = maxx∈S

rx.

We havePS∼Dm(LD(hS) ≥ ε) ≤ PS∼Dm(rmin ≥ r1 ∪ rmax ≤ r0)

≤ PS∼Dm(rmin ≥ r1) + PS∼Dm(rmax ≤ r0)≤ 2(1− ε)m ≤ 2e−mε ≤ δ

Therefore, for all ε and δ, the sample complexity can be bounded by

m ≤ log(2/δ)

ε

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 12 / 40

Page 13: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

� In chapter 2, we see that every finite hypothesis class H is learnable; moreover, thesample complexity is bounded by

mH(δ, ε) ≤log(|H|)/δ

ε

� Also, we see some examples that even the class is infinite-size, it may still belearnable.

� Therefore, we need a measure of H’s complexity

� In this cahpter, we will formally define the complexity of H (VC dimension) ,and show that

H has uniform convergence property ⇐⇒ VCdim(H) <∞

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 13 / 40

Page 14: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 14 / 40

Page 15: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THE VC-DIMENSION

DEFINITION (RESTRICTION H TO C)

Let H be a class of function from X to {0, 1} and let C = {c1, ..., cm} ⊂ X . The restrictionof H to C is the set off all functions from C to {0, 1} that can be derived from H. That is,

HC = {h(c1), ..., h(cm)) : h ∈ H

DEFINITION (SHATTERING)

A hypothesis class H shatters a finite set C if the restriction of H to C is the set of allfunctions from C to {0, 1}. That is,|HC | = 2|C|.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 15 / 40

Page 16: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THE VC-DIMENSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 16 / 40

Page 17: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THE VC-DIMENSION

COROLLARY (COROLLARY6.4)

Let H be a hypothesis class of functions from X to {0, 1}. Let m be a training set size.Assume that there exists a set C of size 2m that is shattered by H.Then, for any learning algorithm A

P(LD(A(S)) ≥1

8) ≥ 1

7

Remark: This is a direct result from NFL Theorem

Remark2: If for all m, there exists a set C of size 2m that is shattered by H, then H is notPAC learnable

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 17 / 40

Page 18: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THE VC-DIMENSION

DEFINITION (VC-DIMENSION)

The VC-dimension of a hypothesis class H, denoted VCdim(H), is the maximal size of aset C ⊂ X that can be shattered by H. If H can shatter sets of arbitrarily large size, wesay that H has infinite VC-dimension.

Remark: If VCdim(H)=d, it means that

∃C ⊂ X that can be shattered by H,

NOT∀C ⊂ X that can be shattered by H,

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 18 / 40

Page 19: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: THRESHOLD FUNCTIONS

� Let H be the all threshold function on R.

� For an arbitrary set C = {c1}, H shatters C,therefore VDdim(H) ≥ 1.

� For an arbitrary set C = {c1, c2} , H does not shatter C.Therefore, VDdim(H) < 2.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 19 / 40

Page 20: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: INTERVALS

� Let H be the intervals over R; that is, H = {1[a,b](x)|a, b ∈ R}

� It is easy to show that VCdim(H) = 2

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 20 / 40

Page 21: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: AXIS ALIGNED RECTANGLES

� Let H be the the class of axis aligned rectangles:

H = {1[a,b]×[c,d]|a, b, c, d ∈ R}

� VDdim(H) = 4

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 21 / 40

Page 22: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

EXAMPLE: AXIS ALIGNED RECTANGLES

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 22 / 40

Page 23: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 23 / 40

Page 24: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THE FUNDAMENTAL THEOREM OF LEARNING THEORY

THEOREM (THE FUNDAMENTAL THEOREM OF STATISTICAL LEARNING)

Let H be a hypothesis class of functions from a domain X to {0, 1} and let the lossfunction be the 0-1 loss. Then, the following are quivalent:

1 H has the uniform covergence property.2 Any ERM rule is a successful agnostic PAC learner for H.3 H is agnostic PAC learnable.4 H is PAC learnable.5 Any ERM rule is a successful PAC learner for H.6 H has a finite VC-dimension.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 24 / 40

Page 25: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

GROWTH FUNCTION ANS SAUER’S LEMMA

The growth function measures the maximal ”effective? size of H on a set of m examples.

DEFINITION (GROWTH FUNCTION)

Let H be a hypothesis class. Then the growth function of H is defined as

τH(m) = supC⊂X :|C|=m

|HC |

In words, τH(m) is the number of different functions from a set C of size m to {0, 1} thatcan be obtained by restricting H to C.

Remark: if VCdim(H) = d, then for any m ≤ d we have τH(m) = 2m.However, what interesting is the case m ≥ d.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 25 / 40

Page 26: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

GROWTH FUNCTION AND SAUER’S LEMMA

THEOREM (SAUER’S LEMMA)

Let H be a hypothesis with VCdim(H) = d. Then for all m,

τH(m) ≤d∑i=0

(m

i

)≤ (em/d)d

Remark: if VCdim(H) is finite, then the growth function is polynomial in m.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 26 / 40

Page 27: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

UNIFORM CONVERGENCE IN VC CLASS

THEOREM (UNIFORM CONVERGES IN VC CLASS (THEOREM 6.11))

Let H be a class and let τH(m) be its growth function. Then, for every D and every δ

PS∼Dm(|LD(h)− LS(h)| > ε) ≤ δ

where ε can be choose as4 +

√(log(τH(2m)))

δ√2m

. In other words, this theorem tells us that

VCdim(H) <∞⇐⇒ lim`→∞

log τH(`)

`= 0⇐⇒ uniform convergence property holds.

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 27 / 40

Page 28: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THE FUNDAMENTAL THEOREM OF LEARNING THEORY

THEOREM (THE FUNDAMENTAL THEOREM-QUANTITATIVE VERSION)

Let H be a hypothesis class from a domain X to {0, 1} and let the loss function be the 0-1loss. Then, there are absolute constants C1, C2 such that:

1 H has the uniform covergence property with sample complexity

C1d + log(1/δ)

ε2≤ m

UCH ≤ C2

d + log(1/δ)

ε2

2 H is agnostic PAC learnable with sample complexity

C1d + log(1/δ)

ε2≤ mH ≤ C2

d + log(1/δ)

ε2

3 H is PAC learnable with sample complexity

C1d + log(1/δ)

ε≤ mH ≤ C2

d log(1/ε) + log(1/δ)

ε

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 28 / 40

Page 29: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 29 / 40

Page 30: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

GLIVENKO-CANTELLI THEOREM

DEFINITION (EMPIRICAL DISTRIBUTION)

Let X1, ..., Xn be i.i.d. random variables in R with common cdf F (x). The empiricaldistribution function for X1, ..., Xn is given by

Fn(x) =1

n

∑i

1(−∞,x](Xi)

THEOREM (GLIVENKO-CANTELLI THEOREM)

‖Fn − F‖∞ = supx∈R|Fn(x)− F (x)| → 0 almost surely

Remark: ∀x, Fn(x)→ F (x) trivially by LLNWEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 30 / 40

Page 31: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

GLIVENKO-CANTELLI THEOREM

� More generally, consider a space X and σ−field F generated by borel set withprobability measure P and empirical measure Pn

� Then F (x) = P ((−∞, x]), and Fn(x) = Pn((−∞, x])

� We can rewrite supx∈R|Fn(x)− F (x)| as sup

c∈C|Pn(C)− P (C)|,

where C = {(−∞, x]|x ∈ R}

� What happens for the general C ?

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 31 / 40

Page 32: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

GLIVENKO-CANTELLI CLASS

DEFINITION (GC-CLASS)

Let C ⊂ {C|C measurable in X}. Then if

‖Pn − P‖C = supc∈C|Pn(C)− P (C)|

we say class C is a Glivenko-Cantelli class.

DEFINITION (UNIFORMLY GC)

A class is called uniformly Glivenko-Cantelli if the convergence occurs uniformly over allprobability measures P on (X ,F):

supP∈P(S,A)

E‖Pn − P‖C → 0

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 32 / 40

Page 33: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

VC CLASS

DEFINITION (VC CLASS)

A class with finite VC dimension is called a Vapnik-Chervonenkis class or VC class

THEOREM (VAPNIK AND CHERVONENKIS, 1968)

A class of sets Cis uniformly GC if and only if it is a Vapnik-Chervonenkis class

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 33 / 40

Page 34: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 34 / 40

Page 35: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

VC ENTROPY AND GROWTH FUNCTION

In previous lecture, we define the growth function as

τH(m) = supC⊂X :|C|=m

|HC |

Let’s rewrite the growth function as another form:

DEFINITION (VAPNIK)

Let x1, ...xm be m samples from X . Then define the number

NH(x1, ..., xm) = | {h(x1), ..., h(xm)|h ∈ H} | = |H{x1,...,xm}|

ObviouslyτH(m) = sup

C⊂X :|C|=m|HC | = sup

{x1,...,xm}⊂HNH(x1, ..., xm)

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 35 / 40

Page 36: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

VC ENTROPY AND GROWTH FUNCTION

The supremum is taken so that the bound works even in the worst distribution.In general case, we can replace supremum by exapectation w.r.t. a specific distribution,which gives another value:

DEFINITION (VC-ENTROPY, ANNEALED VC-ENTROPY, GROWTH FUNCTION)

Let NH(x1, ..., xm) be defined as previous. Then we defined VC-entropy as

HH(m) = E logNH(x1, ..., xm)

The annealed VC-entropy as

HHann(m) = logENH(x1, ..., xm)

And the growth function (with logarithm) as

GH(m) = log supx1,...,xm

NH(x1, ..., xm)(= log τH(m))

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 36 / 40

Page 37: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

VC ENTROPY AND GROWTH FUNCTION

COROLLARY

HH(m) ≤ HHann(m) ≤ GH(m)

Rmark: The fundamental theorem of learning theory tells us

limm→∞

GH(m)

m= lim

m→∞

log τH(m)

m= 0⇐⇒ lim

m→∞P(sup

h∈H|LS(h)− LD(h)| > ε) = 0

The convergence is uniform for all distribution P in (X ,F)

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 37 / 40

Page 38: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THREE MILESTONES OF LEARNING THEORY

THEOREM (VAPNIK)

1

limm→∞

D(suph∈H|LS(h)− LD(h)| > ε) = 0

is sufficient and necessary that

limm→∞

HH(m)

m= 0

2

limm→∞

D(suph∈H|LS(h)− LD(h)| > ε) ≤ e−cε2m ( fast decay )

is sufficient if

limm→∞

HHann(m)

m= 0

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 38 / 40

Page 39: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

THREE MILESTONES OF LEARNING THEORY

THEOREM (VAPNIK)

3

limm→∞

P (suph∈H|LS(h)− LD(h)| > ε) = 0 ,for all P ∈ (X ,F)

is sufficient and necessary that

limm→∞

GH(m)

m= 0

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 39 / 40

Page 40: Introduction to the VC-Dimension - GitHub Pagesweiningchen.github.io/notes/VC_dim.pdf · OUTLINE 1 RECAP 2 MOTIVATION 3 THE VC-DIMENSION Definitions Examples 4 THE FUNDAMENTAL THEOREM

1 RECAP

2 MOTIVATION

3 THE VC-DIMENSIONDefinitionsExamples

4 THE FUNDAMENTAL THEOREM OF LEARNING THEORY

5 ADVANCED TOPICSGlivenko-Cantelli TheoremVC-entropy and Growth Function

6 EXERCISES AND DISCUSSION

WEI-NING CHEN INTRODUCTION TO THE VC-DIMENSION DECEMBER 28, 2018 40 / 40