Dynamics of Learning VQ and Neural Gas
Aree Witoelar, Michael Biehl
Mathematics and Computing Science, University of Groningen, Netherlands
in collaboration with Barbara Hammer (Clausthal), Anarta Ghosh (Groningen)


Page 1:

Dynamics of Learning VQ and Neural Gas

Aree Witoelar, Michael Biehl
Mathematics and Computing Science, University of Groningen, Netherlands

in collaboration with Barbara Hammer (Clausthal), Anarta Ghosh (Groningen)

Page 2:

Dagstuhl Seminar, 25.03.2007

Outline

Vector Quantization (VQ)

Analysis of VQ Dynamics

Learning Vector Quantization (LVQ)

Summary

Page 3:

Vector Quantization

Objective: representation of (many) data with (few) prototype vectors

Assign data ξμ to nearest prototype vector wj

(by a distance measure, e.g. Euclidean)

grouping data into clusters e.g. for classification

E(W) = Σμ=1..P minj d(wj, ξμ)

each data point contributes its distance to the nearest prototype

Find optimal set W for lowest quantization error
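For concreteness, a minimal numpy sketch of this quantization error, assuming squared Euclidean distance; the function and array names are illustrative, not from the talk:

```python
import numpy as np

def quantization_error(data, prototypes):
    """E(W): each data point contributes its (squared Euclidean)
    distance to the nearest prototype."""
    # pairwise squared distances, shape (P, K)
    d = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    # keep only the distance to the closest prototype for each point
    return d.min(axis=1).sum()

# tiny usage example with random placeholder data
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))       # P = 100 points in N = 5 dimensions
prototypes = rng.normal(size=(3, 5))   # K = 3 prototypes
print(quantization_error(data, prototypes))
```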

Page 4:

Example: Winner Takes All (WTA)

• initialize K prototype vectors

• present a single example

• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example

• stochastic gradient descent with respect to a cost function

• prototypes at areas with high density of data
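A minimal sketch of a single WTA learning step along these lines (assuming squared Euclidean distance; the function name and learning-rate handling are illustrative, not from the talk):

```python
import numpy as np

def wta_step(prototypes, xi, eta):
    """One online Winner-Takes-All update: only the prototype closest
    to the presented example xi is moved towards it."""
    d = ((prototypes - xi) ** 2).sum(axis=1)   # distances to all prototypes
    winner = int(np.argmin(d))                 # index of the winner
    prototypes[winner] += eta * (xi - prototypes[winner])
    return winner
```

Repeating this step over a long random sequence of examples is the stochastic gradient descent on the quantization error mentioned above.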

Page 5:

Problems

Winner Takes All: sensitive to initialization.

“Winner takes most”: update according to “rank”, e.g. Neural Gas. Less sensitive to initialization?

Page 6:

(L)VQ algorithms

• intuitive

• fast, powerful algorithms

• flexible

• limited theoretical background w.r.t. convergence speed, robustness to initial conditions, etc.

Analysis of VQ Dynamics

• exact mathematical description in very high dimensions

• study of typical learning behavior

Page 7:

Model: two Gaussian clusters of high dimensional data

Random vectors ξ ∈ ℝN according to

P(ξ) = Σσ=±1 pσ P(ξ | σ),   P(ξ | σ) ∝ exp( −(ξ − ℓ Bσ)² / (2 υσ) )

i.e. each class σ forms an isotropic Gaussian cluster with mean ℓ Bσ (ℓ: separation, Bσ: unit vector) and variance υσ.

prior prob.: p+, p-

p+ + p- = 1

[Figure: projections of the data onto the (B+, B−) plane show two clusters with priors p+ and p−; the data are separable in this projection but not in projections onto other planes.]

cluster centers: B+, B- ∈ ℝN

variance: υ+, υ-

separation ℓ

separable only in two dimensions: a simple model, but not trivial

classes: σ = {+1,-1}
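A sketch of how such data could be generated numerically, assuming orthonormal center vectors B± and isotropic Gaussian clusters of variance υ± around ℓ·B±; the function and parameter names are illustrative:

```python
import numpy as np

def generate_data(P, N, ell=1.0, p_plus=0.6, var_plus=1.5, var_minus=1.0, seed=0):
    """Draw P examples {xi, sigma} from the two-cluster model in N dimensions."""
    rng = np.random.default_rng(seed)
    B_plus = np.zeros(N); B_plus[0] = 1.0      # orthonormal cluster center directions
    B_minus = np.zeros(N); B_minus[1] = 1.0
    sigma = rng.choice([+1, -1], size=P, p=[p_plus, 1.0 - p_plus])   # class labels
    centers = np.where(sigma[:, None] == +1, ell * B_plus, ell * B_minus)
    variances = np.where(sigma == +1, var_plus, var_minus)
    xi = centers + np.sqrt(variances)[:, None] * rng.normal(size=(P, N))
    return xi, sigma
```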

Page 8:

Online learning

sequence of independent random data {ξμ, σμ}, μ = 1, 2, ..., P

update of the prototype vectors ws ∈ ℝN:

  ws(μ) = ws(μ−1) + (η/N) fs[ranks, cs, σμ, ...] (ξμ − ws(μ−1))

• η/N: learning rate, step size
• fs[...]: strength and direction of the update; it depends on the rank of the prototype (ranks ∈ {1, ..., K}), the prototype class cs ∈ {+1, −1}, the data class σμ, the identity of the “winner”, etc.
• (ξμ − ws(μ−1)): moves the prototype towards the current data point

fs[...] describes the algorithm used.

Page 9:

1. Define a few characteristic quantities of the system:

  Rsσ(μ) = ws(μ) · Bσ   (projections onto the cluster centers)
  Qst(μ) = ws(μ) · wt(μ)   (lengths and overlaps of the prototypes)

  with σ ∈ {+1, −1} and s, t ∈ {1, ..., K}.

2. Derive recursion relations of these quantities for a new input. The random vector ξμ enters only through its projections hs(μ) = ws(μ−1) · ξμ and bσ(μ) = Bσ · ξμ:

  Rsσ(μ) − Rsσ(μ−1) = (η/N) fs[...] ( bσ(μ) − Rsσ(μ−1) )

  Qst(μ) − Qst(μ−1) = (η/N) [ fs[...] ( ht(μ) − Qst(μ−1) ) + ft[...] ( hs(μ) − Qst(μ−1) ) ] + (η/N)² fs[...] ft[...] ( ξμ · ξμ )

  where the last term remains of order 1/N because ξμ · ξμ = O(N).

3. Calculate the averages of these recursions over the random input data.
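In a Monte Carlo simulation the same characteristic quantities can be measured directly from the prototypes; a small sketch with illustrative names (prototypes as a K x N array, B_plus/B_minus unit vectors):

```python
import numpy as np

def order_parameters(prototypes, B_plus, B_minus):
    """R[s, sigma] = w_s . B_sigma  (projections onto the cluster centers)
       Q[s, t]     = w_s . w_t      (lengths and overlaps of the prototypes)"""
    B = np.stack([B_plus, B_minus], axis=1)    # shape (N, 2)
    R = prototypes @ B                         # shape (K, 2)
    Q = prototypes @ prototypes.T              # shape (K, K)
    return R, Q
```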

Page 10:

In the thermodynamic limit N → ∞ ...

• the characteristic quantities Rsσ, Qst self-average with respect to the random sequence of data (fluctuations vanish)

• the projections hs and bσ become correlated Gaussian quantities, completely specified in terms of their first and second conditional moments:

  ⟨hs⟩σ = ℓ Rsσ   ⟨bτ⟩σ = ℓ if τ = σ, 0 else

  ⟨hs ht⟩σ − ⟨hs⟩σ ⟨ht⟩σ = υσ Qst
  ⟨hs bτ⟩σ − ⟨hs⟩σ ⟨bτ⟩σ = υσ Rsτ
  ⟨bρ bτ⟩σ − ⟨bρ⟩σ ⟨bτ⟩σ = υσ if ρ = τ, 0 else

• define a continuous learning time t = μ/N;  μ: discrete (1, 2, ..., P),  t: continuous

Page 11:

4. Derive ordinary differential equations in the continuous learning time t:

  dRsσ/dt = η ( ⟨bσ fs⟩ − Rsσ ⟨fs⟩ )

  dQst/dt = η ( ⟨ht fs⟩ − Qst ⟨fs⟩ + ⟨hs ft⟩ − Qst ⟨ft⟩ ) + η² Σσ pσ υσ ⟨fs ft⟩σ

  (averages ⟨·⟩ over the Gaussian density of the projections hs, bσ)

5. Solve for Rsσ(t), Qst(t):
• dynamics and asymptotic behavior (t → ∞)
• quantization/generalization error
• sensitivity to initial conditions, learning rates, structure of the data
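Step 5 is typically carried out numerically. The concrete right-hand sides depend on the averages ⟨·⟩ for the chosen fs (worked out in the reference cited below); purely as a structural sketch, a simple Euler integrator with placeholder right-hand-side functions dR_dt and dQ_dt that the reader would have to supply:

```python
import numpy as np

def integrate_odes(R0, Q0, dR_dt, dQ_dt, t_max, dt=0.01):
    """Euler integration of dR/dt = dR_dt(R, Q), dQ/dt = dQ_dt(R, Q)
    from t = 0 to t = t_max; returns the trajectory of (t, R, Q)."""
    R, Q = R0.copy(), Q0.copy()
    trajectory = [(0.0, R.copy(), Q.copy())]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        R, Q = R + dt * dR_dt(R, Q), Q + dt * dQ_dt(R, Q)
        trajectory.append((k * dt, R.copy(), Q.copy()))
    return trajectory
```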

Page 12:

Results: VQ, 2 prototypes

[Figure: order parameters Q11, Q22, Q12 and R1+, R1−, R2+, R2− as functions of t = μ/N.]

WTA update: only the winner ws, i.e. the prototype with the smallest distance ds = d(ws, ξμ), is moved,

  ws(μ) = ws(μ−1) + (η/N) (ξμ − ws(μ−1))

while all other prototypes remain unchanged.

Numerical integration of the ODEs

(ws(0) ≈ 0, p+ = 0.6, ℓ = 1.0, υ+ = 1.5, υ− = 1.0, η = 0.01)

[Figure: characteristic quantities (left) and quantization error E(W) (right) as functions of t.]

Page 13:

Projections of the prototypes onto the (B+, B−) plane at t = 50, for 2 and for 3 prototypes (p+ > p−).

[Figure: prototype projections RS+ vs. RS− for both cases.]

Two prototypes move to the stronger cluster.

Page 14:

Neural Gas: a winner-takes-most algorithm (3 prototypes)

  ws(μ) = ws(μ−1) + (η/N) · exp( −ranks / λ(t) ) / C(λ(t)) · (ξμ − ws(μ−1))

the update strength decreases exponentially with the rank of the prototype

λ(t) is large initially and is decreased over time (here λi = 2, λf = 10⁻²); for λ(t) → 0 the update becomes identical to WTA.

[Figure: prototype projections RS+ vs. RS− at t = 0 and t = 50, and quantization error E(W) as a function of t.]
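A sketch of one Neural Gas step of this form, with rank 0 for the winner and a simple sum normalization standing in for C(λ) (an assumption; the talk's exact normalization is not reproduced here):

```python
import numpy as np

def neural_gas_step(prototypes, xi, eta, lam):
    """Winner-takes-most update: every prototype moves towards xi, with a
    strength that decays exponentially with its distance rank."""
    d = ((prototypes - xi) ** 2).sum(axis=1)
    ranks = np.argsort(np.argsort(d))            # 0 = winner, 1 = runner-up, ...
    f = np.exp(-ranks / lam)
    f /= f.sum()                                 # simple normalization (assumption)
    prototypes += eta * f[:, None] * (xi - prototypes)
```

Annealing lam from λi down to λf over the course of learning reproduces the behaviour described above; for λ → 0 only the winner receives an appreciable update and the rule reduces to WTA.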

Page 15:

Sensitivity to initialization

[Figure: prototype projections RS+ vs. RS− at t = 0 and at t = 50 for Neural Gas and for WTA.]

Neural Gas:
• more robust w.r.t. initialization

WTA:
• (eventually) reaches the minimum of E(W)
• depends on initialization: possibly long learning times

[Figure: E(W) as a function of t, showing a “plateau” where ∇HVQ ≈ 0.]

Page 16:

Learning Vector Quantization (LVQ)

Objective: classification of data using prototype vectors

Find optimal set W for lowest generalization error

Assign data {ξ, σ}, ξ ∈ ℝN, to the nearest prototype vector (distance measure, e.g. Euclidean). The prototypes now carry class labels: W = {ws, cs}, ws ∈ ℝN, cs ∈ {+1, −1}.

A data point is misclassified if the class of its nearest prototype wj differs from its own label:

  g(cj, σ) = 1 if cj ≠ σ, 0 else

Page 17:

LVQ1: only the winner ws is updated, towards the data if the classes agree and away from it otherwise,

  ws(μ) = ws(μ−1) + (η/N) cs σμ (ξμ − ws(μ−1))   (ws the winner, cs σμ = ±1)

[Figure: prototype projections RS+ vs. RS− for two prototypes with c = {+1, −1}, and for three prototypes with c = {+1, +1, −1} or c = {+1, −1, −1}.]

To which class should the 3rd prototype be added?

no cost function related to generalization error
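A sketch of a single LVQ1 step as described above (squared Euclidean distance; prototype_labels is assumed to be an array holding the cs ∈ {+1, −1}):

```python
import numpy as np

def lvq1_step(prototypes, prototype_labels, xi, sigma, eta):
    """LVQ1: the winner is moved towards xi if its class label matches the
    data label sigma, and away from xi otherwise."""
    d = ((prototypes - xi) ** 2).sum(axis=1)
    s = int(np.argmin(d))                                  # winner
    direction = 1.0 if prototype_labels[s] == sigma else -1.0
    prototypes[s] += eta * direction * (xi - prototypes[s])
```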

Page 18:

Generalization error

(p+ = 0.6, p− = 0.4, υ+ = 1.5, υ− = 1.0)

[Figure: generalization error εg as a function of t.]

  εg = Σσ=±1 pσ ⟨ g(cj, σ) ⟩σ   (wj: nearest prototype)

i.e. the probability that a new data point is misclassified by its nearest prototype.
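Empirically, εg can be estimated by nearest-prototype classification of freshly drawn test data; a minimal sketch (this Monte Carlo estimate illustrates the definition, it is not the analytical computation used in the talk):

```python
import numpy as np

def generalization_error(prototypes, prototype_labels, xi, sigma):
    """Fraction of the data (xi, sigma) whose nearest prototype
    carries the wrong class label."""
    d = ((xi[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    predicted = np.asarray(prototype_labels)[np.argmin(d, axis=1)]
    return float(np.mean(predicted != sigma))
```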

Page 19:

Optimal decision boundary

[Figure: data density and decision boundaries in the (B+, B−) plane, p+ > p−.]

• equal variance (υ+ = υ−): linear decision boundary
• unequal variance (υ+ > υ−): K = 2 vs. optimal with K = 3

more prototypes: better approximation to the optimal decision boundary, the (hyper)plane where p+ P(ξ | σ = +1) = p− P(ξ | σ = −1)

Page 20:

Asymptotic εg

υ+ > υ− (υ+ = 0.81, υ− = 0.25)

[Figure: asymptotic generalization error εg(t → ∞) as a function of p+.]

c = {+1, +1, −1}:
• Optimal: K = 3 better
• LVQ1: K = 3 better

c = {+1, −1, −1}:
• Optimal: K = 3 equal to K = 2
• LVQ1: K = 3 worse

• best: more prototypes on the class with the larger variance
• more prototypes are not always better for LVQ1

Page 21:

Summary

dynamics of (Learning) Vector Quantization for high-dimensional data

Neural Gas: more robust w.r.t. initialization than WTA
LVQ1: more prototypes are not always better

Outlook

study different algorithms, e.g. LVQ+/-, LFM, RSLVQ
more complex models
multi-prototype, multi-class problems

Reference: M. Biehl, A. Ghosh, and B. Hammer. Dynamics and Generalization Ability of LVQ Algorithms. Journal of Machine Learning Research 8:323-360 (2007). http://jmlr.csail.mit.edu/papers/v8/biehl07a.html

Page 22:

Questions?


Page 24:

Central Limit Theorem

• Let x1, x2, …, xN be independent random numbers drawn from an arbitrary probability distribution with finite mean and variance.

• The distribution of the average of the xj approaches a normal distribution as N becomes large.

Example: a non-normal distribution p(xj). The distribution of the average (1/N) Σj=1..N xj approaches a Gaussian as N grows.

[Figure: distribution of the average for N = 1, 2, 5, 50.]
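A quick numerical illustration of this statement, using a uniform (clearly non-normal) distribution as the example:

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (1, 2, 5, 50):
    # many independent averages of N uniform random numbers
    averages = rng.uniform(0, 1, size=(100_000, N)).mean(axis=1)
    # the spread shrinks like 1/sqrt(N) and the histogram of the
    # averages approaches a Gaussian shape as N grows
    print(f"N={N:2d}  mean={averages.mean():.3f}  std={averages.std():.3f}")
```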

Page 25:

Self Averaging

Monte Carlo simulations over 100 independent runs

Fluctuations decrease with a larger number of degrees of freedom N.

As N → ∞, the fluctuations vanish (the variance becomes zero).
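The same effect can be checked in a small self-contained experiment: run independent WTA simulations of the cluster model for increasing dimension N and watch the run-to-run variance of an order parameter (here Q11 = w1 · w1) shrink. This is a rough illustration with arbitrarily chosen parameters, not the exact setup behind the figure:

```python
import numpy as np

def run_once(N, t_max=20.0, eta=0.5, p_plus=0.6, ell=1.0, seed=0):
    """One WTA learning run in N dimensions; returns Q_11 = w_1 . w_1
    at learning time t = t_max (i.e. after P = t_max * N examples)."""
    rng = np.random.default_rng(seed)
    w = 1e-3 * rng.normal(size=(2, N))                   # two prototypes near the origin
    B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0   # orthonormal cluster centers
    for _ in range(int(t_max * N)):
        s = 0 if rng.random() < p_plus else 1
        xi = ell * B[s] + rng.normal(size=N)             # unit-variance clusters for simplicity
        winner = np.argmin(((w - xi) ** 2).sum(axis=1))
        w[winner] += (eta / N) * (xi - w[winner])        # WTA update, step size eta/N
    return w[0] @ w[0]

for N in (50, 200, 800):
    q = [run_once(N, seed=r) for r in range(10)]         # 10 independent runs per N
    print(f"N={N:4d}  var(Q_11) = {np.var(q):.2e}")
```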

Page 26:

“LVQ+/-”: update the correct and the incorrect winner,

  wj(μ) = wj(μ−1) + (η/N) cj σμ (ξμ − wj(μ−1)),   j ∈ {s, t}

where ds = min{dk} with cs = σμ (closest prototype of the correct class, moved towards the data) and dt = min{dk} with ct ≠ σμ (closest prototype of a wrong class, moved away from the data).

strongly divergent!

p+ >> p- : strong repulsion by stronger class

to overcome divergence: e.g. early stopping (difficult in practice)

stop at εg(t) = εg,min

[Figure: εg(t) as a function of t.]
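A sketch of a single LVQ+/- step as described on this slide (squared Euclidean distance; prototype_labels is assumed to be a numpy array of the cj, with both classes represented among the prototypes):

```python
import numpy as np

def lvq_plus_minus_step(prototypes, prototype_labels, xi, sigma, eta):
    """LVQ+/-: move the closest prototype of the correct class towards xi
    and the closest prototype of a wrong class away from xi."""
    d = ((prototypes - xi) ** 2).sum(axis=1)
    correct = prototype_labels == sigma
    s = int(np.flatnonzero(correct)[np.argmin(d[correct])])     # correct winner
    t = int(np.flatnonzero(~correct)[np.argmin(d[~correct])])   # incorrect winner
    prototypes[s] += eta * (xi - prototypes[s])
    prototypes[t] -= eta * (xi - prototypes[t])
```

In practice one would monitor εg(t) and stop early, since, as noted above, the unmodified dynamics are strongly divergent.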

Page 27:

Comparison of LVQ1 and LVQ+/-

c = {+1, +1, −1}

• υ+ = υ− = 1.0: LVQ1 outperforms LVQ+/- with early stopping
• υ+ = 0.81, υ− = 0.25: LVQ+/- with early stopping outperforms LVQ1 in a certain p+ interval

[Figure: asymptotic εg as a function of p+ for both parameter settings.]

LVQ+/- performance depends on initial conditions