Dynamics and generalization of LVQ, Birmingham, 09-12-05
3) Vector Quantization (VQ)
and Learning Vector Quantization (LVQ)
References
M. Biehl, A. Freking, G. Reents: Dynamics of on-line competitive learning. Europhysics Letters 38 (1997) 73-78.
M. Biehl, A. Ghosh, B. Hammer: Dynamics and generalization ability of LVQ algorithms. Journal of Machine Learning Research 8 (2007) 323-360.
and references in the latter.
Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping of similar data in clusters
assignment of a feature vector ξ to the closest prototype w
(similarity or distance measure, e.g. Euclidean distance)
unsupervised competitive learning
• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example
intuitively clear, plausible procedure:
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function
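The four steps above can be sketched in a few lines of plain Python (a minimal illustration, not taken from the slides; the function name, the cluster location and the fixed learning rate are my own choices):

```python
import random

def vq_step(prototypes, xi, eta):
    """One Winner-Takes-All step: move the closest prototype towards xi."""
    # squared Euclidean distance of xi to every prototype
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, xi)) for w in prototypes]
    s = dists.index(min(dists))                      # identify the winner
    # move the winner even closer towards the example
    prototypes[s] = [w_i + eta * (x_i - w_i) for w_i, x_i in zip(prototypes[s], xi)]
    return s

random.seed(0)
prototypes = [[random.gauss(0, 1) for _ in range(2)] for _ in range(3)]  # K = 3
for _ in range(500):                                 # present single examples
    xi = [random.gauss(2.0, 0.3), random.gauss(0.0, 0.3)]
    vq_step(prototypes, xi, eta=0.05)
# the winning prototype drifts into the high-density region around (2, 0)
```

Repeating the step with independently drawn examples is exactly the stochastic on-line descent referred to above.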
quantization error

H_VQ = Σ_{μ=1}^{P} Σ_{j=1}^{K} d(w_j, ξ^μ) Π_{k≠j} Θ( d(w_k, ξ^μ) - d(w_j, ξ^μ) )

i.e. each data point ξ^μ contributes its distance to the winner w_j; here: (squared) Euclidean distance d(w, ξ) = (ξ - w)²
aim: faithful representation (in general ≠ clustering)
Result depends on
- the number of prototype vectors
- the distance measure / metric used
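Since the product of Θ-functions simply selects the winner, the quantization error is the sum, over all data points, of the distance to the closest prototype. A direct transcription (helper name is mine):

```python
def quantization_error(prototypes, data):
    """H_VQ: sum over data points of the squared Euclidean distance
    to the closest (winning) prototype."""
    H = 0.0
    for xi in data:
        # min over prototypes realizes the Theta-product winner selection
        H += min(sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, xi)) for w in prototypes)
    return H

# two prototypes, three data points in 2-dim; each point lies at
# squared distance 1 from its winner, so H_VQ = 3
H = quantization_error([(0.0, 0.0), (4.0, 0.0)],
                       [(0.0, 1.0), (4.0, 1.0), (5.0, 0.0)])
```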
Learning Vector Quantization (LVQ)
classification: assignment of a vector ξ to the class of the closest prototype w
aim: generalization ability, i.e. classification of novel data after learning from examples
∙ identification of prototype vectors from labelled example data
∙ distance-based classification (e.g. Euclidean, Manhattan, …)
basic heuristic LVQ scheme, LVQ1 [Kohonen]:
• initialize prototype vectors for different classes
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)
update in the N-dim. feature space: w(t+1) = w(t) ± η (ξ(t) - w(t))
→ piecewise linear decision boundaries
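The LVQ1 prescription differs from unsupervised VQ only in the sign of the winner's update, which is set by comparing labels (a sketch; function and variable names are mine, not from the slides):

```python
def lvq1_step(prototypes, labels, xi, xi_label, eta):
    """LVQ1: move the winner towards xi if the labels agree, away otherwise."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, xi)) for w in prototypes]
    s = dists.index(min(dists))                       # the winner
    sign = 1.0 if labels[s] == xi_label else -1.0     # same class: attract
    prototypes[s] = [w_i + sign * eta * (x_i - w_i)
                     for w_i, x_i in zip(prototypes[s], xi)]
    return s

# one prototype per class; an example of class +1 whose winner carries label -1
prototypes = [[0.0, 0.0], [1.0, 0.0]]
labels = [+1, -1]
winner = lvq1_step(prototypes, labels, [1.0, 1.0], +1, eta=0.5)
# winner is prototype 1, which is pushed away from (1, 1) to [1.0, -0.5]
```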
LVQ algorithms
- frequently applied in a variety of practical problems
- plausible, intuitive, flexible
- fast, easy to implement
- often based on heuristic arguments, or on cost functions with unclear relation to generalization
- limited theoretical understanding of
  - dynamics and convergence properties
  - achievable generalization ability
here: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation
Model situation: two clusters of N-dimensional data
random vectors ξ ∈ ℝ^N drawn from a mixture of two Gaussians:

P(ξ) = Σ_{σ=±1} p_σ P(ξ | σ),   P(ξ | σ) = (2π v_σ)^{-N/2} exp[ -(ξ - ℓ B_σ)² / (2 v_σ) ]

prior weights of the classes: p_+, p_- with p_+ + p_- = 1
orthonormal center vectors: B_+, B_- ∈ ℝ^N with (B_±)² = 1, B_+ · B_- = 0
cluster distance ∝ ℓ
independent components with mean ⟨ξ_j⟩_σ = ℓ (B_σ)_j and variance ⟨ξ_j²⟩_σ - ⟨ξ_j⟩_σ² = v_σ
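A sample from this model can be drawn coordinate-wise, since the components are independent Gaussians. A sketch (I pick B_+ = e_1 and B_- = e_2 as a concrete orthonormal pair; the parameter values are the ones quoted on the next slide):

```python
import random

def draw_example(N, ell, p_plus, v_plus, v_minus):
    """Draw (xi, sigma) from the two-cluster model: class sigma = +/-1 with
    prior p_sigma, mean ell * B_sigma and isotropic variance v_sigma,
    where B_+ = e_1 and B_- = e_2 are orthonormal unit vectors."""
    sigma = +1 if random.random() < p_plus else -1
    v = v_plus if sigma == +1 else v_minus
    mean = [0.0] * N
    mean[0 if sigma == +1 else 1] = ell              # ell * B_sigma
    xi = [random.gauss(m, v ** 0.5) for m in mean]
    return xi, sigma

random.seed(1)
xi, sigma = draw_example(N=200, ell=1.0, p_plus=0.4, v_plus=1.44, v_minus=0.64)
# for large N the squared length concentrates around v_sigma * N
```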
high-dimensional data (formally N → ∞)
Figure: example data ξ^μ ∈ ℝ^N (N=200, ℓ=1, p_+=0.4, v_+=1.44, v_-=0.64); the projections y_± = B_± · ξ^μ into the plane of the center vectors B_+, B_- display the cluster structure, whereas the projections x_{1,2} = w_{1,2} · ξ^μ on two independent random directions w_{1,2} do not.
Dynamics of on-line training
sequence of new, independent random examples ξ^μ, σ^μ (μ = 1, 2, 3, …), drawn according to P(ξ | σ^μ) with prior weights p_σ
update of the two prototype vectors w_+, w_-:

w_s^μ = w_s^{μ-1} + (η/N) f_s(d_+^μ, d_-^μ, σ^μ) (ξ^μ - w_s^{μ-1}),   s = ±1

η: learning rate / step size; the modulation function f_s encodes the competition, the direction of the update (towards or away from the current data), etc.
example: LVQ1, original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm:

f_s = Θ(d_{-s}^μ - d_s^μ) s σ^μ   with distances d_s^μ = (ξ^μ - w_s^{μ-1})²
Mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7)

   R_{sσ}^μ = w_s^μ · B_σ,   Q_{st}^μ = w_s^μ · w_t^μ

   projections into the (B_+, B_-)-plane; lengths and relative position of the prototypes
   → the algorithm translates into recursions for {R_{sσ}, Q_{st}}
2. average over the current example ξ^μ, a random vector according to P(ξ | σ) with avg. squared length ⟨ξ²⟩_σ ≈ v_σ N
   in the thermodynamic limit N → ∞, the projections

   x_s = w_s^{μ-1} · ξ^μ,   y_σ = B_σ · ξ^μ

   are correlated Gaussian random quantities, completely specified in terms of first and second moments (indices μ omitted):

   ⟨x_s⟩_σ = ℓ R_{sσ},   ⟨y_τ⟩_σ = ℓ δ_{τσ}
   ⟨x_s x_t⟩_σ - ⟨x_s⟩_σ ⟨x_t⟩_σ = v_σ Q_{st}
   ⟨x_s y_τ⟩_σ - ⟨x_s⟩_σ ⟨y_τ⟩_σ = v_σ R_{sτ}
   ⟨y_ρ y_τ⟩_σ - ⟨y_ρ⟩_σ ⟨y_τ⟩_σ = v_σ δ_{ρτ}
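The reduction from ℝ^{2N} to ℝ^7 is just a handful of dot products; a sketch (dictionary keys and the concrete B_± vectors are my own convention):

```python
def order_parameters(w_plus, w_minus, B_plus, B_minus):
    """Characteristic quantities of the two-prototype system:
    R_{s,sigma} = w_s . B_sigma and Q_{s,t} = w_s . w_t
    (7 numbers in total, since Q_{+-} = Q_{-+})."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    R = {(s, sig): dot(w, B)
         for s, w in ((+1, w_plus), (-1, w_minus))
         for sig, B in ((+1, B_plus), (-1, B_minus))}
    Q = {(+1, +1): dot(w_plus, w_plus),
         (-1, -1): dot(w_minus, w_minus),
         (+1, -1): dot(w_plus, w_minus)}
    return R, Q

B_plus, B_minus = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]     # orthonormal centers
R, Q = order_parameters([0.5, 0.0, 0.0], [0.0, 0.25, 0.0], B_plus, B_minus)
# R[(+1, +1)] = 0.5, R[(+1, -1)] = 0.0, Q[(-1, -1)] = 0.0625
```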
3. self-averaging property of the characteristic quantities
   the averaged recursions close in {R_{sσ}, Q_{st}}:
   - the quantities depend on the random sequence of example data
   - but their fluctuations vanish with N → ∞
   → the learning dynamics is completely described in terms of averages
   computer simulations (LVQ1), mean and variance of R_{++}(α=10) vs. 1/N:
   - mean results approach the theoretical prediction
   - the variance vanishes as N → ∞
4. continuous learning time α = μ/N: the number of examples, counted as learning steps per degree of freedom
   stochastic recursions → deterministic ODE; integration yields the evolution of the projections R_{sσ}(α), Q_{st}(α)
5. learning curve: generalization error ε_g(α) after training with αN examples,
   i.e. the probability for misclassification of a novel example:

   ε_g = p_+ ⟨Θ(d_+ - d_-)⟩_+ + p_- ⟨Θ(d_- - d_+)⟩_-
       = Σ_{σ=±1} p_σ Φ( [Q_{σσ} - Q_{-σ-σ} - 2ℓ (R_{σσ} - R_{-σσ})] / [2 √(v_σ) (Q_{++} - 2Q_{+-} + Q_{--})^{1/2}] )

   with the cumulative Gaussian distribution Φ.
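Given the order parameters, the learning curve follows by evaluating the misclassification probability along the ODE solution. A sketch of that evaluation (Φ is the standard normal CDF; the particular sign convention is my reading of the garbled slide and should be checked against the JMLR paper):

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def eps_g(R, Q, ell, p_plus, v_plus, v_minus):
    """Generalization error eps_g = sum_sigma p_sigma * Phi(arg_sigma),
    expressed in the order parameters R[(s, sigma)], Q[(s, t)]."""
    denom_sq = Q[(+1, +1)] - 2.0 * Q[(+1, -1)] + Q[(-1, -1)]
    out = 0.0
    for sigma, p, v in ((+1, p_plus, v_plus), (-1, 1.0 - p_plus, v_minus)):
        num = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
               - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        out += p * Phi(num / (2.0 * sqrt(v) * sqrt(denom_sq)))
    return out

# sanity check: prototypes sitting exactly on the cluster centers,
# w_+ = ell*B_+ and w_- = ell*B_-, with ell = 1 and unit variances
R = {(+1, +1): 1.0, (+1, -1): 0.0, (-1, +1): 0.0, (-1, -1): 1.0}
Q = {(+1, +1): 1.0, (-1, -1): 1.0, (+1, -1): 0.0}
e = eps_g(R, Q, ell=1.0, p_plus=0.5, v_plus=1.0, v_minus=1.0)
# both class terms reduce to Phi(-1/sqrt(2)) ~ 0.24
```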
LVQ1: "The winner takes it all"
winner w_s: only the winner is updated, according to the class label:

w_s^μ = w_s^{μ-1} + (η/N) Θ(d_{-s}^μ - d_s^μ) s σ^μ (ξ^μ - w_s^{μ-1})

theory and simulation (N=100): p_+=0.8, v_+=4, v_-=9, ℓ=2.0, η=1.0, averaged over 100 indep. runs; initialization w_s(0) ≈ 0
projections Q_{++}, Q_{--}, Q_{+-} and R_{sσ} vs. α
self-averaging property: mean and variances of R_{++}(α=10) scale with 1/N
LVQ1 trajectories in the (B_+, B_-)-plane
Figure: projections R_{s±} of the prototypes w_+, w_- relative to the cluster centers ℓ B_+, ℓ B_-; (•) positions at α = 20, 40, 140; the optimal decision boundary and the asymptotic position of the LVQ1 boundary are marked.
theory and simulation (N=100): p_+=0.8, v_+=4, v_-=9, ℓ=2.0, η=1.0, averaged over 100 indep. runs; initialization w_s(0) ≈ 0
Learning curve
ε_g(α) for η = 2.0, 1.0, 0.2 (p_+ = 0.2, ℓ = 1.0, v_+ = v_- = 1.0)
- suboptimal, non-monotonic behavior for small η
- stationary state: ε_g(α → ∞) grows linearly with η
- well-defined asymptotics for η → 0, α → ∞ with (η α) → ∞
achievable generalization error: ε_g vs. p_+ for v_+ = v_- = 1.0 and for v_+ = 0.25, v_- = 0.81
― LVQ1 vs. the best linear boundary
LVQ2.1 [Kohonen]: here, update both the correct and the wrong winner:

w_s^μ = w_s^{μ-1} + (η/N) s σ^μ (ξ^μ - w_s^{μ-1}),   s = ±1

theory and simulation (N=100): p_+=0.8, ℓ=1, v_+=v_-=1, η=0.5, averages over 100 independent runs
the projections R_{sσ}, Q_{st} grow linearly with α; only certain combinations of them remain finite
problem: instability of the algorithm due to the repulsion of wrong prototypes
→ trivial classification for α → ∞: ε_g = min(p_+, p_-)
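The instability is easy to reproduce: since the wrong-class prototype is repelled on every step, nothing bounds its norm when the classes are unbalanced. A sketch with made-up data (all names and parameter values are mine):

```python
import random

def lvq21_step(prototypes, labels, xi, xi_label, eta):
    """LVQ2.1-like step: update both prototypes on every example,
    attracting the correct-class prototype and repelling the other."""
    for s, w in enumerate(prototypes):
        sign = 1.0 if labels[s] == xi_label else -1.0
        prototypes[s] = [w_i + sign * eta * (x_i - w_i) for w_i, x_i in zip(w, xi)]

random.seed(0)
prototypes = [[0.0, 0.0], [0.0, 0.0]]
labels = [+1, -1]
for _ in range(200):                       # unbalanced classes, p_+ = 0.8
    sigma = +1 if random.random() < 0.8 else -1
    xi = [random.gauss(sigma * 1.0, 1.0), random.gauss(0.0, 1.0)]
    lvq21_step(prototypes, labels, xi, sigma, eta=0.1)

norm = lambda w: sum(x * x for x in w) ** 0.5
# the minority-class prototype is repelled on ~80% of the steps and runs away,
# while the majority-class prototype stays near its cluster
```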
suggested strategy: selection of data in a window close to the current decision boundary
slows down the repulsion, but the system remains unstable
Early stopping: end the training process at minimal ε_g (idealized)
ε_g(α) for η = 2.0, 1.0, 0.5:
- pronounced minimum in ε_g(α), which depends on initialization and cluster geometry
- here, the lowest minimum value is reached for η → 0
ε_g vs. p_+ (v_+ = 0.25, v_- = 0.81): ― LVQ1, __ early stopping
Learning From Mistakes (LFM)
LVQ2.1-type update, but only if the current classification is wrong:

w_s^μ = w_s^{μ-1} + (η/N) s σ^μ Θ(d_{σ^μ}^μ - d_{-σ^μ}^μ) (ξ^μ - w_s^{μ-1})

crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]
projected trajectory relative to ℓ B_+, ℓ B_- with projections R_{s+}, R_{s-} (p_+=0.8, ℓ=3.0, v_+=4.0, v_-=9.0)
learning curves ε_g(α) for η = 2.0, 1.0, 0.5: η-independent asymptotic ε_g (p_+=0.8, ℓ=1.2, v_+=v_-=1.0)
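LFM can be obtained from the LVQ2.1 step by gating it with the current nearest-prototype classification (a sketch; as before, the names are mine):

```python
def lfm_step(prototypes, labels, xi, xi_label, eta):
    """Learning From Mistakes: perform the LVQ2.1-like update of both
    prototypes only if the current nearest-prototype classification of xi
    is wrong; otherwise leave everything unchanged."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, xi)) for w in prototypes]
    winner = dists.index(min(dists))
    if labels[winner] == xi_label:          # correctly classified: no update
        return False
    for s, w in enumerate(prototypes):
        sign = 1.0 if labels[s] == xi_label else -1.0
        prototypes[s] = [w_i + sign * eta * (x_i - w_i) for w_i, x_i in zip(w, xi)]
    return True

# an example of class +1 whose winner carries label -1 triggers an update
prototypes = [[0.0, 0.0], [1.0, 0.0]]
labels = [+1, -1]
updated = lfm_step(prototypes, labels, [0.9, 0.0], +1, eta=0.5)
# both prototypes move: [0.0, 0.0] -> [0.45, 0.0] and [1.0, 0.0] -> [1.05, 0.0]
```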
Comparison: achievable generalization ability
ε_g vs. p_+ for equal cluster variances (v_+ = v_- = 1.0) and for unequal variances (v_+ = 0.25, v_- = 0.81)
― LVQ1, --- LVQ2.1 (early stopping), ·-· LFM, together with the best linear boundary and ― trivial classification
Vector Quantization
competitive learning, w_s is the winner:

w_s^μ = w_s^{μ-1} + (η/N) Θ(d_{-s}^μ - d_s^μ) (ξ^μ - w_s^{μ-1})

class membership is unknown, or identical for all data
numerical integration for w_s(0) ≈ 0 (p_+=0.2, ℓ=1.0, η=1.2)
Figure: ε_g(α) for VQ, LVQ+ and LVQ1; projections R_{++}, R_{+-}, R_{-+}, R_{--} vs. α (up to α = 300)
the system is invariant under exchange of the prototypes → weakly repulsive fixed points
interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ+: two prototypes of the same class, identical labels
- LVQ: different classes, but the labels are not used in training
asymptotics (η → 0, ηα → ∞): ε_g vs. p_+; for p_+ ≈ 0 (p_- ≈ 1):
- low quantization error
- but high generalization error ε_g
Summary
• a model scenario of LVQ training: two clusters, two prototypes; dynamics of on-line training
• comparison of algorithms (within the model):
  LVQ1: original formulation of LVQ, with close to optimal asymptotic generalization
  LVQ2.1: intuitive extension, creates instability → trivial (stationary) classification; + early stopping: potentially good performance, but practical difficulties, depends on initialization
  LFM: crisp limit of Soft Robust LVQ; stable behavior, but far from optimal generalization
  VQ: description of in-class competition
Outlook
• Self-Organizing Maps (SOM): neighborhood preserving SOM, Neural Gas (distance rank based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. the distance measure

  d_λ(w_s, ξ) = Σ_{i=1}^{N} λ_i (w_{s,i} - ξ_i)²

  with relevances λ_i adapted during training
• applications
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules, variational approach, Bayes optimal on-line learning
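The adaptive metric replaces the Euclidean distance by a relevance-weighted one; a minimal sketch (the relevance values here are arbitrary, and their adaptation rule during training is not shown):

```python
def relevance_distance(w, xi, lam):
    """d_lambda(w, xi) = sum_i lambda_i * (w_i - xi_i)^2, with non-negative
    relevances lambda_i (typically normalized to sum_i lambda_i = 1)
    that are themselves adapted during training."""
    return sum(l * (w_i - x_i) ** 2 for l, w_i, x_i in zip(lam, w, xi))

w, xi = [0.0, 0.0], [1.0, 2.0]
# equal relevances reproduce the (scaled) Euclidean distance: 0.5*1 + 0.5*4 = 2.5
d_equal = relevance_distance(w, xi, [0.5, 0.5])
# relevance concentrated on the first feature ignores the second one entirely
d_first = relevance_distance(w, xi, [1.0, 0.0])
```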
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization (VQ)
aim
representation of large amounts
of data by (few) prototype vectors
example
identification and grouping
in clusters of similar data
assignment of feature vector to the closest prototype w
(similarity or distance measure
eg Euclidean distance )
Dynamics and generalization of LVQ Birmingham 09-12- 05
unsupervised competitive learning
bull initialize K prototype vectors
bull present a single example
bull identify the closest prototype ie the so-called winner
bull move the winner even closer towards the example
intuitively clear plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function
Dynamics and generalization of LVQ Birmingham 09-12- 05
quantization error
μj
μk
K
jk
P
1μj
μK
1jVQ ddΘ
2 wξH
μjdprototypes data wj is the winner
here
Euclidean distance
aim faithful representation (in general ne clustering )
Result depends on - the number of prototype vectors - the distance measure metric used
Dynamics and generalization of LVQ Birmingham 09-12- 05
bull identify the closest prototype ie the so-called winner
bull initialize prototype vectors for different classes
bull present a single example
bull move the winner - closer towards the data (same class)
- away from the data (different class)
classification
assignment of a vector to the class of the closest
prototype w
aim generalization ability
classification of novel data
after learning from examples
∙ identification of prototype vectors from labelled example data
∙ distance based classification (eg Euclidean Manhattan hellip)
basic heuristic LVQ scheme LVQ1 [Kohonen]
piecewise linear decision boundaries
Learning Vector Quantization
(t)wξ(t)w1tw (t)w
η
N-dimfeature space
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ algorithms
- frequently applied in a variety
of practical problems
- plausible intuitive flexible
- fast easy to implement
- often based on heuristic arguments
or cost functions with unclear relation to generalization
- limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
here analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- typical properties in a model situation
Dynamics and generalization of LVQ Birmingham 09-12- 05
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
cluster distance prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
indep components with
and variance
ℝN
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
unsupervised competitive learning
bull initialize K prototype vectors
bull present a single example
bull identify the closest prototype ie the so-called winner
bull move the winner even closer towards the example
intuitively clear plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to
the cost function
Dynamics and generalization of LVQ Birmingham 09-12- 05
quantization error
μj
μk
K
jk
P
1μj
μK
1jVQ ddΘ
2 wξH
μjdprototypes data wj is the winner
here
Euclidean distance
aim faithful representation (in general ne clustering )
Result depends on - the number of prototype vectors - the distance measure metric used
Dynamics and generalization of LVQ Birmingham 09-12- 05
bull identify the closest prototype ie the so-called winner
bull initialize prototype vectors for different classes
bull present a single example
bull move the winner - closer towards the data (same class)
- away from the data (different class)
classification
assignment of a vector to the class of the closest
prototype w
aim generalization ability
classification of novel data
after learning from examples
∙ identification of prototype vectors from labelled example data
∙ distance based classification (eg Euclidean Manhattan hellip)
basic heuristic LVQ scheme LVQ1 [Kohonen]
piecewise linear decision boundaries
Learning Vector Quantization
(t)wξ(t)w1tw (t)w
η
N-dimfeature space
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ algorithms
- frequently applied in a variety
of practical problems
- plausible intuitive flexible
- fast easy to implement
- often based on heuristic arguments
or cost functions with unclear relation to generalization
- limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
here analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- typical properties in a model situation
Dynamics and generalization of LVQ Birmingham 09-12- 05
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
cluster distance prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
indep components with
and variance
ℝN
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
quantization error
μj
μk
K
jk
P
1μj
μK
1jVQ ddΘ
2 wξH
μjdprototypes data wj is the winner
here
Euclidean distance
aim faithful representation (in general ne clustering )
Result depends on - the number of prototype vectors - the distance measure metric used
Dynamics and generalization of LVQ Birmingham 09-12- 05
bull identify the closest prototype ie the so-called winner
bull initialize prototype vectors for different classes
bull present a single example
bull move the winner - closer towards the data (same class)
- away from the data (different class)
classification
assignment of a vector to the class of the closest
prototype w
aim generalization ability
classification of novel data
after learning from examples
∙ identification of prototype vectors from labelled example data
∙ distance based classification (eg Euclidean Manhattan hellip)
basic heuristic LVQ scheme LVQ1 [Kohonen]
piecewise linear decision boundaries
Learning Vector Quantization
(t)wξ(t)w1tw (t)w
η
N-dimfeature space
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ algorithms
- frequently applied in a variety
of practical problems
- plausible intuitive flexible
- fast easy to implement
- often based on heuristic arguments
or cost functions with unclear relation to generalization
- limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
here analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- typical properties in a model situation
Dynamics and generalization of LVQ Birmingham 09-12- 05
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
cluster distance prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
indep components with
and variance
ℝN
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Mathematical analysis of the learning dynamics

1. Description in terms of a few characteristic quantities (here: ℝ^{2N} → ℝ^7), namely the projections into the (B_+, B_-)-plane and the lengths and relative position of the prototypes:

R_{sσ}^μ = w_s^μ · B_σ,  Q_{st}^μ = w_s^μ · w_t^μ  (s, t, σ = ±1)

2. Average over the current example ξ^μ, a random vector according to P(ξ|σ) with average squared length ⟨(ξ^μ)²⟩ ≈ v_σ N in the thermodynamic limit N → ∞.

The projections x_s^μ = w_s^{μ-1} · ξ^μ and y_σ^μ = B_σ · ξ^μ are correlated Gaussian random quantities, completely specified in terms of their first and second moments (indices μ omitted):

⟨x_s⟩_σ = ℓ R_{sσ},  ⟨y_τ⟩_σ = ℓ δ_{τσ}
⟨x_s x_t⟩_σ - ⟨x_s⟩_σ ⟨x_t⟩_σ = v_σ Q_{st}
⟨x_s y_τ⟩_σ - ⟨x_s⟩_σ ⟨y_τ⟩_σ = v_σ R_{sτ}
⟨y_ρ y_τ⟩_σ - ⟨y_ρ⟩_σ ⟨y_τ⟩_σ = v_σ δ_{ρτ}
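The order parameters are plain inner products; a quick helper (my own convenience code, with a dictionary keyed by the signs s, t, σ = ±1):

```python
import numpy as np

def order_parameters(w, B):
    """Projections R_{s,sigma} = w_s . B_sigma and overlaps Q_{s,t} = w_s . w_t
    that summarize the 2N prototype components by a few numbers
    (Q is symmetric, so Q_{+-} = Q_{-+} and there are 7 independent values)."""
    R = {(s, t): float(w[s] @ B[t]) for s in (+1, -1) for t in (+1, -1)}
    Q = {(s, t): float(w[s] @ w[t]) for s in (+1, -1) for t in (+1, -1)}
    return R, Q

# example: w_+ = B_+ and w_- = 0.5 B_- give R_{++} = 1 and Q_{--} = 0.25
N = 10
B = {+1: np.eye(N)[0], -1: np.eye(N)[1]}
w = {+1: B[+1].copy(), -1: 0.5 * B[-1]}
R, Q = order_parameters(w, B)
```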
The averaged recursions close in the quantities {R_{sσ}, Q_{st}}.

3. Self-averaging property of the characteristic quantities: R_{sσ}^μ and Q_{st}^μ
- depend on the random sequence of example data,
- but their fluctuations vanish with N → ∞.
The learning dynamics is therefore completely described in terms of averages.

[Figure: computer simulations of LVQ1, mean and variance of R_{++} at α = 10 plotted vs. 1/N; the mean results approach the theoretical prediction and the variance vanishes as N → ∞.]
4. Continuous learning time α = μ/N, the number of examples (learning steps) per degree of freedom: the stochastic recursions become deterministic ODE, and integration yields the evolution of the projections R_{sσ}(α), Q_{st}(α).

5. Learning curve: the generalization error ε_g(α) after training with αN examples, i.e. the probability for misclassification of a novel example,

ε_g = p_+ ⟨Θ(d_+ - d_-)⟩_+ + p_- ⟨Θ(d_- - d_+)⟩_-
    = Σ_{σ=±1} p_σ Φ( [Q_{σσ} - Q_{-σ-σ} - 2ℓ (R_{σσ} - R_{-σσ})] / [2 √(v_σ) √(Q_{++} - 2Q_{+-} + Q_{--})] ),

where Φ denotes the distribution function of the standard normal density.
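The closed-form expression for ε_g follows because d_σ - d_{-σ} is Gaussian for the model data, and it can be cross-checked by Monte Carlo. A sketch (Python; the dictionary-based interface is my choice, not from the talk):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Distribution function of the standard normal density."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def eps_g(R, Q, ell, p, v):
    """Generalization error in terms of the order parameters:
    eps_g = sum_sigma p_sigma * Phi( [Q_ss - Q_-s-s - 2 ell (R_ss - R_-s,s)]
                                     / [2 sqrt(v_sigma (Q_++ - 2 Q_+- + Q_--))] )."""
    cross = Q[(+1, +1)] - 2.0 * Q[(+1, -1)] + Q[(-1, -1)]
    err = 0.0
    for s in (+1, -1):
        num = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        err += p[s] * Phi(num / (2.0 * sqrt(v[s] * cross)))
    return err

# symmetric example configuration: w_+ = 0.8 B_+, w_- = 0.8 B_-,
# with ell = 1, v_+ = v_- = 1, p_+ = p_- = 1/2
R = {(+1, +1): 0.8, (+1, -1): 0.0, (-1, +1): 0.0, (-1, -1): 0.8}
Q = {(+1, +1): 0.64, (+1, -1): 0.0, (-1, +1): 0.0, (-1, -1): 0.64}
value = eps_g(R, Q, 1.0, {+1: 0.5, -1: 0.5}, {+1: 1.0, -1: 1.0})
```

For this symmetric configuration the argument of Φ reduces to -1/√2 for both classes, and a direct Monte Carlo estimate of the nearest-prototype error agrees with `value`.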
LVQ1: "The winner takes it all"

Only the winner w_s is updated, according to the class label:

w_s^μ = w_s^{μ-1} + (η/N) Θ(d_{-s}^μ - d_s^μ) s σ^μ (ξ^μ - w_s^{μ-1})

[Figure: theory and simulation (N = 100) for p_+ = 0.8, v_+ = 4, v_- = 9, ℓ = 2.0, η = 1.0, initialization w_s(0) = 0, averaged over 100 independent runs: the order parameters Q_{++}, Q_{--}, Q_{+-} and R_{sσ} vs. α. Self-averaging property: mean and variances of R_{++} at α = 10 vs. 1/N.]
LVQ1: "The winner takes it all" (trajectories)

[Figure: projected trajectories of the prototypes w_+ and w_- in the (B_+, B_-)-plane (•: positions at intermediate values of α), together with the cluster centers ℓB_+, ℓB_-, the optimal decision boundary, and the asymptotic prototype positions. Theory and simulation (N = 100) for p_+ = 0.8, v_+ = 4, v_- = 9, ℓ = 2.0, η = 1.0, initialization w_s(0) ≈ 0, averaged over 100 independent runs; also shown: Q_{++}, Q_{--}, Q_{+-} and R_{sσ} vs. α.]
Learning curve

[Figure: ε_g(α) for η = 2.0, 1.0, 0.2, with p_+ = 0.2, ℓ = 1.0, v_+ = v_- = 1.0.]
- suboptimal, non-monotonic behavior for small η
- the stationary value ε_g(α → ∞) grows linearly with η
- well-defined asymptotics for η → 0, α → ∞ with (η α) → ∞

Achievable generalization error
[Figure: asymptotic ε_g vs. p_+ for v_+ = v_- = 1.0 and for v_+ = 0.25, v_- = 0.81: LVQ1 compared with the best linear boundary.]
LVQ 2.1 [Kohonen], here: update both the correct and the wrong winner,

w_s^μ = w_s^{μ-1} + (η/N) s σ^μ (ξ^μ - w_s^{μ-1}),  s = ±1

[Figure: theory and simulation (N = 100) for p_+ = 0.8, ℓ = 1, v_+ = v_- = 1, η = 0.5, averages over 100 independent runs: R_{sσ}(α) and Q_{st}(α), which diverge with α; only certain combinations remain finite.]

Problem: instability of the algorithm due to the repulsion of the wrong prototypes; for α → ∞ this yields the trivial classification with

ε_g = min {p_+, p_-}.
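The instability is easy to reproduce in a simulation. A sketch (Python/NumPy; names are mine): with unbalanced priors, the prototype of the weaker class is repelled far more often than it is attracted, so its norm grows without bound, while the other prototype stays bounded.

```python
import numpy as np

def lvq21_step(w, xi, sigma, eta):
    """LVQ 2.1: update both prototypes on every example; the prototype with the
    correct label (s = sigma) moves towards xi, the other one away from it."""
    N = xi.size
    for s in (+1, -1):
        w[s] = w[s] + (eta / N) * s * sigma * (xi - w[s])
    return w

# unbalanced priors as on the slide (p_+ = 0.8): w_- is mostly repelled
rng = np.random.default_rng(3)
N, ell, eta, p_plus = 50, 1.0, 0.5, 0.8
B = {+1: np.eye(N)[0], -1: np.eye(N)[1]}
w = {+1: np.zeros(N), -1: np.zeros(N)}
norms = []
for _ in range(5000):
    sigma = +1 if rng.random() < p_plus else -1
    xi = ell * B[sigma] + rng.standard_normal(N)
    w = lvq21_step(w, xi, sigma, eta)
    norms.append(float(w[-1] @ w[-1]))
```

The repulsive step multiplies w_- by a factor (1 + η/N), so its squared length grows roughly exponentially in the number of examples.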
Suggested strategy: selection of data in a window close to the current decision boundary. This slows down the repulsion, but the system remains unstable.

Early stopping: end the training process at minimal ε_g (idealized).
- pronounced minimum in ε_g(α), which depends on the initialization and the cluster geometry
- here, the lowest minimum value is reached for η → 0
[Figure: ε_g(α) for η = 2.0, 1.0, 0.5; asymptotic ε_g vs. p_+ for v_+ = 0.25, v_- = 0.81, LVQ1 compared with early stopping.]
Learning From Mistakes (LFM): perform the LVQ 2.1 update only if the current classification is wrong,

w_s^μ = w_s^{μ-1} + (η/N) Θ(d_{σ^μ}^μ - d_{-σ^μ}^μ) s σ^μ (ξ^μ - w_s^{μ-1}),

the crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003].

[Figure: projected trajectory in the (B_+, B_-)-plane with the cluster centers ℓB_±, and ε_g(α) for η = 2.0, 1.0, 0.5, with p_+ = 0.8, ℓ = 3.0, v_+ = 4.0, v_- = 9.0. Learning curves for p_+ = 0.8, ℓ = 1.2, v_+ = v_- = 1.0: the asymptotic ε_g is η-independent.]
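The LFM modulation only gates the LVQ 2.1 update by the current classification. A sketch (Python/NumPy; names and the tie-breaking convention are mine, added so that training can start from identical prototypes):

```python
import numpy as np

def lfm_step(w, xi, sigma, eta):
    """Learning From Mistakes: the LVQ 2.1 pair update is performed only when
    the nearest-prototype classification of xi is currently wrong
    (ties count as mistakes, so training can start from w_+ = w_-)."""
    N = xi.size
    d = {s: float(np.sum((xi - w[s]) ** 2)) for s in (+1, -1)}
    if d[sigma] >= d[-sigma]:              # the winner carries the wrong label
        for s in (+1, -1):
            w[s] = w[s] + (eta / N) * s * sigma * (xi - w[s])
    return w

# usage sketch on well-separated two-cluster toy data
rng = np.random.default_rng(4)
N, ell, eta = 50, 2.0, 1.0
B = {+1: np.eye(N)[0], -1: np.eye(N)[1]}
w = {+1: np.zeros(N), -1: np.zeros(N)}
for _ in range(5000):
    sigma = +1 if rng.random() < 0.8 else -1
    xi = ell * B[sigma] + 0.5 * rng.standard_normal(N)
    w = lfm_step(w, xi, sigma, eta)
```

Because updates stop as soon as examples are classified correctly, the prototypes remain bounded, in contrast to plain LVQ 2.1.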
Comparison: achievable generalization ability

[Figure: asymptotic ε_g vs. p_+ for equal cluster variances (v_+ = v_- = 1.0) and for unequal variances (v_+ = 0.25, v_- = 0.81): best linear boundary, LVQ1, LVQ 2.1 with early stopping, LFM, and the trivial classification.]
Vector Quantization

Unsupervised competitive learning: the class membership is unknown, or identical for all data; only the winner w_s is updated, without reference to any label:

w_s^μ = w_s^{μ-1} + (η/N) Θ(d_{-s}^μ - d_s^μ) (ξ^μ - w_s^{μ-1})

[Figure: numerical integration for w_s(0) ≈ 0 (p_+ = 0.2, ℓ = 1.0, η = 1.2): ε_g(α) for VQ, LVQ+, and LVQ1, and the projections R_{++}, R_{+-}, R_{-+}, R_{--} vs. α.]

The system is invariant under exchange of the prototypes; this symmetry gives rise to weakly repulsive fixed points of the dynamics.
Interpretations:
- VQ: unsupervised learning from unlabelled data
- LVQ+: two prototypes of the same class, i.e. identical labels
- LVQ: different classes, but the labels are not used in training

[Figure: asymptotic ε_g (η → 0) vs. p_+.] For strongly unbalanced priors, e.g. p_+ ≈ 0 (p_- ≈ 1): low quantization error, but high generalization error ε_g.
Summary

• a model scenario of LVQ training: two clusters, two prototypes; dynamics of on-line training
• comparison of algorithms (within the model):
  - LVQ1: the original formulation of LVQ, with close to optimal asymptotic generalization
  - LVQ 2.1: intuitive extension, but it creates an instability and a trivial (stationary) classification; with early stopping potentially good performance, yet practical difficulties remain (depends on the initialization)
  - LFM: the crisp limit of Soft Robust LVQ; stable behavior, but far from optimal generalization
  - VQ: description of in-class competition
Outlook

• Self-Organizing Maps (SOM): neighborhood-preserving SOM, Neural Gas (distance-rank based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. the distance measure
  d_λ(w_s, ξ) = Σ_{i=1}^N λ_i (w_{s,i} - ξ_i)², with the relevances λ_i adapted during training
• applications
• multi-class and multi-prototype problems
• optimized procedures: learning-rate schedules, variational approach, Bayes-optimal on-line learning
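The relevance-weighted distance from the outlook is a one-liner; a small sketch (Python/NumPy, my function name):

```python
import numpy as np

def weighted_distance(w, xi, lam):
    """Relevance-weighted squared Euclidean distance
    d_lambda(w, xi) = sum_i lambda_i (w_i - xi_i)^2, as used in Relevance LVQ;
    the relevances lambda_i (lambda_i >= 0, normalized to sum 1) are adapted
    during training alongside the prototypes."""
    return float(np.sum(lam * (w - xi) ** 2))

# a relevance profile concentrated on the first component makes the distance
# ignore differences in all remaining dimensions
w = np.array([0.0, 5.0, -3.0])
xi = np.array([2.0, 0.0, 0.0])
lam = np.array([1.0, 0.0, 0.0])
d = weighted_distance(w, xi, lam)   # only (0 - 2)^2 contributes
```

With uniform relevances λ_i = 1/N the measure reduces to the ordinary squared Euclidean distance divided by N.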
Dynamics and generalization of LVQ Birmingham 09-12- 05
bull identify the closest prototype ie the so-called winner
bull initialize prototype vectors for different classes
bull present a single example
bull move the winner - closer towards the data (same class)
- away from the data (different class)
classification
assignment of a vector to the class of the closest
prototype w
aim generalization ability
classification of novel data
after learning from examples
∙ identification of prototype vectors from labelled example data
∙ distance based classification (eg Euclidean Manhattan hellip)
basic heuristic LVQ scheme LVQ1 [Kohonen]
piecewise linear decision boundaries
Learning Vector Quantization
(t)wξ(t)w1tw (t)w
η
N-dimfeature space
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ algorithms
- frequently applied in a variety
of practical problems
- plausible intuitive flexible
- fast easy to implement
- often based on heuristic arguments
or cost functions with unclear relation to generalization
- limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
here analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- typical properties in a model situation
Dynamics and generalization of LVQ Birmingham 09-12- 05
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
cluster distance prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
indep components with
and variance
ℝN
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ algorithms
- frequently applied in a variety
of practical problems
- plausible intuitive flexible
- fast easy to implement
- often based on heuristic arguments
or cost functions with unclear relation to generalization
- limited theoretical understanding of
- dynamics and convergence properties
- achievable generalization ability
here analysis of LVQ algorithms wrt
- dynamics of the learning process
- performance ie generalization ability
- typical properties in a model situation
Dynamics and generalization of LVQ Birmingham 09-12- 05
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
cluster distance prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
indep components with
and variance
ℝN
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
Model situation two clusters of N-dimensional data
random vectors isin ℝN according to σ)P(p )P(1σ
σ ξξ
2σ
σN2
σ
- v 2
1exp
v 2π
1σ)P( Βξξ mixture of two Gaussians
orthonormal center vectors
B+ B- isin ℝN ( B )2 =1 B+ B- =0
prior weights of classes p+ p-
p+ + p- = 1
B+
B-
(p+)
(p-)
cluster distance prop ℓ ℓ
jj Bσσξ
σσσvξξ
22jj
indep components with
and variance
ℝN
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
high-dimensional data (formally Ninfin)
ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ
B
yξ
( 240)( 160)
projections into the plane of center vectors B+ B-
μ By ξ
μ 2
2xξ
w
projections on two independent random directions w12
μ 11x ξw
Dynamics and generalization of LVQ Birmingham 09-12- 05
Dynamics of on-line training
sequence of new independent random examples 123μσμμ ξ
drawn according to μμσ σPp μ ξ
learning ratestep size
competitiondirection ofupdate etc
change of prototypetowards or away from the current data
example
LVQ1 original formulation [Kohonen]
Winner-Takes-All (WTA) algorithm
μs
μs
μs d d σS f
1-μs
μμμs-
μss
1-μs
μs σSddf
N
ηwξww 21
μs
μμs
μ
d
1σS
wξ
update of two prototype vectors w+ w-
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve

[Figure: $\varepsilon_g(\alpha)$ for $\eta = 2.0,\, 1.0,\, 0.2$; $p_+ = 0.2$, $\ell = 1.0$, $v_+ = v_- = 1.0$]
• suboptimal, non-monotonic behavior for small $\eta$
• stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
• well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$ with $(\eta\, \alpha) \to \infty$

achievable generalization error:
[Figure: $\varepsilon_g$ vs. $p_+$ for $v_+ = v_- = 1.0$ and for $v_+ = 0.25$, $v_- = 0.81$; best linear boundary vs. ― LVQ1]
LVQ 2.1 [Kohonen]: here, both the correct and the wrong winner are updated

$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, s\, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right), \quad s = \pm 1$

the order parameters $R_{s\sigma}(\alpha)$ and $Q_{st}(\alpha)$ diverge with $\alpha$, while only certain combinations of them remain finite

[Figure: $R$ and $Q$ vs. $\alpha \in [0, 10]$, values ranging over roughly $[-6, 6]$; theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $v_+ = v_- = 1$, $\eta = 0.5$, averages over 100 independent runs]

problem: instability of the algorithm due to the repulsion of wrong prototypes
→ trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{ p_+,\, p_- \}$
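The instability can be made visible in a few lines: with unbalanced priors, the prototype of the weaker class is repelled more often than it is attracted and drifts without bound. A sketch of the LVQ2.1 update under illustrative parameters (not the lecture's):

```python
import numpy as np

# LVQ2.1 moves the correct prototype toward every example and the wrong
# one away from it; we watch the norm of the minority-class prototype.
rng = np.random.default_rng(3)
N, eta, ell, p_plus = 50, 0.5, 1.0, 0.8
B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0
w = 1e-3 * rng.normal(size=(2, N))            # w+, w- near 0
labels = np.array([1.0, -1.0])

norms = []
for t in range(40 * N):                        # up to alpha = 40
    sigma = 1.0 if rng.random() < p_plus else -1.0
    xi = ell * (B[0] if sigma > 0 else B[1]) + rng.normal(size=N)
    # both prototypes updated: sign +1 if label matches sigma, -1 otherwise
    for s in (0, 1):
        w[s] += (eta / N) * labels[s] * sigma * (xi - w[s])
    if t % N == 0:
        norms.append(float(np.linalg.norm(w[1])))   # |w-|
print(norms[0], norms[-1])   # the repelled prototype's norm grows strongly
```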
suggested strategy: selection of data in a window close to the current decision boundary
→ slows down the repulsion, but the system remains unstable

Early stopping: end the training process at minimal $\varepsilon_g$ (idealized)
[Figure: $\varepsilon_g(\alpha)$ for $\eta = 2.0,\, 1.0,\, 0.5$]
• pronounced minimum in $\varepsilon_g(\alpha)$; its position depends on initialization and cluster geometry
• here, the lowest minimum value is reached for $\eta \to 0$
[Figure: $\varepsilon_g$ vs. $p_+$ for $v_+ = 0.25$, $v_- = 0.81$; ― LVQ1, __ early stopping]
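Idealized early stopping amounts to monitoring the error during training and keeping the prototypes at its minimum. A sketch using a held-out validation set as a stand-in for $\varepsilon_g$; the setup and all parameter values are illustrative assumptions:

```python
import numpy as np

# Early stopping for LVQ2.1: track validation error, keep the best state.
rng = np.random.default_rng(4)
N, eta, ell, p_plus = 50, 1.0, 1.5, 0.8
B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0

def sample(M):
    sigma = np.where(rng.random(M) < p_plus, 1.0, -1.0)
    centers = np.where((sigma > 0)[:, None], B[0], B[1])
    return ell * centers + rng.normal(size=(M, N)), sigma

def error(w, xi, sigma):
    d = ((xi[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    pred = np.where(d[:, 0] < d[:, 1], 1.0, -1.0)
    return float(np.mean(pred != sigma))

val_xi, val_sigma = sample(4000)              # held-out validation set
w = 1e-3 * rng.normal(size=(2, N))
labels = np.array([1.0, -1.0])
best_err, best_w, errs = 1.0, w.copy(), []
for t in range(30 * N):
    xi_b, sig_b = sample(1)
    xi_t, sig_t = xi_b[0], float(sig_b[0])
    for s in (0, 1):                          # LVQ2.1: update both
        w[s] += (eta / N) * labels[s] * sig_t * (xi_t - w[s])
    if t % 25 == 0:
        e = error(w, val_xi, val_sigma)
        errs.append(e)
        if e < best_err:
            best_err, best_w = e, w.copy()
final_err = error(w, val_xi, val_sigma)
print(best_err, final_err)
```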
Learning From Mistakes (LFM)

$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{\sigma}^\mu - d_{-\sigma}^\mu \right) s\, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right), \quad s = \pm 1$

i.e. the LVQ2.1 update is performed only if the current classification is wrong;
crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]

[Figure: projected trajectory in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane, axes $R_{s+}$, $R_{s-}$, cluster centers at $\ell \mathbf{B}_\pm$; learning curves $\varepsilon_g(\alpha)$ for $p_+ = 0.8$, $\ell = 3.0$, $v_+ = 4.0$, $v_- = 9.0$, $\eta = 2.0,\, 1.0,\, 0.5$]
• $\eta$-independent asymptotic $\varepsilon_g$ ($p_+ = 0.8$, $\ell = 1.2$, $v_+ = v_- = 1.0$)
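The LFM rule described above can be sketched as a single update step; prototype values and examples below are illustrative:

```python
import numpy as np

# Learning From Mistakes: apply the LVQ2.1-type update only when the
# current prototypes misclassify the example.
def lfm_step(w, labels, xi, sigma, eta):
    d = ((xi - w) ** 2).sum(axis=1)
    s = int(np.argmin(d))
    w = w.copy()
    if labels[s] != sigma:                    # update only on a mistake
        N = xi.size
        for t in (0, 1):
            w[t] += (eta / N) * labels[t] * sigma * (xi - w[t])
    return w

w = np.array([[1.0, 0.0], [-1.0, 0.0]])       # prototypes w+, w-
labels = np.array([1.0, -1.0])

# correctly classified example from class +1: no update at all
w1 = lfm_step(w, labels, np.array([0.8, 0.1]), sigma=1.0, eta=0.5)
# misclassified example from class +1: w+ attracted, w- repelled
w2 = lfm_step(w, labels, np.array([-0.4, 0.0]), sigma=1.0, eta=0.5)
```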
Comparison: achievable generalization ability

[Figure: $\varepsilon_g$ vs. $p_+$; left panel: equal cluster variances ($v_+ = v_- = 1.0$); right panel: unequal variances ($v_+ = 0.25$, $v_- = 0.81$); curves: best linear boundary, ― LVQ1, --- LVQ2.1 (early stopping), ·-· LFM, ― trivial classification]
Vector Quantization

competitive learning:
$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^\mu - d_s^\mu \right) \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$, \quad $\mathbf{w}_s$: winner

class membership is unknown, or identical for all data

[Figure: numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$): $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+, LVQ1; $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha \in [0, 300]$]

the system is invariant under exchange of the prototypes → weakly repulsive fixed points
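The unsupervised competitive-learning rule above can be sketched directly: the winner moves toward every example, labels are ignored. Data and parameters are illustrative choices:

```python
import numpy as np

# Unsupervised Winner-Takes-All VQ on a symmetric two-cluster mixture.
rng = np.random.default_rng(5)
N, eta, ell = 50, 1.0, 1.5
B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0
w = 1e-3 * rng.normal(size=(2, N))            # two prototypes near 0

for _ in range(100 * N):                       # alpha = 100
    center = B[0] if rng.random() < 0.5 else B[1]
    xi = ell * center + rng.normal(size=N)
    d = ((xi - w) ** 2).sum(axis=1)
    s = int(np.argmin(d))
    w[s] += (eta / N) * (xi - w[s])            # winner moves toward xi

R = w @ B.T   # projections of the prototypes onto the cluster directions
print(np.round(R, 2))
```

Whether the two prototypes have already specialized to separate clusters at this $\alpha$, or still sit near the common mean (the weakly repulsive fixed point mentioned on the slide), depends on the noise realization; in either case they stay inside the span of the cluster centers.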
interpretations:
• VQ: unsupervised learning, unlabelled data
• LVQ+: two prototypes of the same class, identical labels
• LVQ: different classes, but the labels are not used in training

[Figure: $\varepsilon_g$ vs. $p_+$, asymptotics ($\eta \to 0$, $\alpha \to \infty$)]
for $p_+ \approx 0$, $p_- \approx 1$: low quantization error, but high generalization error $\varepsilon_g$
Summary
• a model scenario of LVQ training: two clusters, two prototypes; dynamics of on-line training
• comparison of algorithms (within the model):
  LVQ 1: the original formulation of LVQ, with close to optimal asymptotic generalization
  LVQ 2.1: an intuitive extension that creates instability and yields trivial (stationary) classification; with early stopping: potentially good performance, but practical difficulties, since the outcome depends on the initialization
  LFM: crisp limit of Soft Robust LVQ; stable behavior, but far from optimal generalization
  VQ: description of the in-class competition
Outlook
• Self-Organizing Maps (SOM): neighborhood-preserving SOM; Neural Gas (distance-rank based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. the distance measure
  $d_\lambda(\mathbf{w}_s, \boldsymbol{\xi}) = \sum_{i=1}^N \lambda_i \left( w_i - \xi_i \right)^2$
  with the relevances $\lambda_i$ adapted during training
• applications
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules, variational approach / Bayes-optimal on-line learning
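The relevance-weighted distance from the Outlook is straightforward to compute; the normalization of the relevances to unit sum is a common convention and an assumption here, as are the example vectors:

```python
import numpy as np

# Relevance-weighted squared distance:
#   d_lambda(w, xi) = sum_i lambda_i * (w_i - xi_i)^2,  lambda_i >= 0.
def d_lambda(w, xi, lam):
    return float(np.sum(lam * (w - xi) ** 2))

w = np.array([1.0, 0.0, 0.0])
xi = np.array([0.0, 0.0, 2.0])

lam_uniform = np.ones(3) / 3.0               # plain (scaled) Euclidean
lam_focus = np.array([1.0, 0.0, 0.0])        # only feature 0 is relevant

print(d_lambda(w, xi, lam_uniform))          # (1 + 0 + 4) / 3 = 5/3
print(d_lambda(w, xi, lam_focus))            # 1.0
```

Adapting the $\lambda_i$ during training lets the classifier suppress irrelevant feature dimensions, as the second call illustrates.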
Dynamics of on-line training

sequence of new, independent random examples $\boldsymbol{\xi}^\mu$, $\mu = 1, 2, 3, \ldots$, drawn according to $P(\boldsymbol{\xi}^\mu) = \sum_{\sigma^\mu = \pm 1} p_{\sigma^\mu}\, P(\boldsymbol{\xi}^\mu \mid \sigma^\mu)$

generic update, moving the prototype towards or away from the current data:
$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)$
with learning rate (step size) $\eta$ and a modulation function $f_s$ encoding the competition, the direction of the update, etc.

example — LVQ1, the original formulation [Kohonen], a Winner-Takes-All (WTA) algorithm:
$\mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left( d_{-s}^\mu - d_s^\mu \right) s\, \sigma^\mu \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right), \quad s = \pm 1$
with $d_s^\mu = \left( \boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1} \right)^2$: update of the two prototype vectors $\mathbf{w}_+$, $\mathbf{w}_-$
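The data model feeding this on-line dynamics can be sampled directly. A sketch assuming orthonormal cluster axes along the first two coordinates; all numeric parameters are illustrative:

```python
import numpy as np

# Draw i.i.d. examples: with probability p_sigma the label is sigma and
# xi ~ N(ell * B_sigma, v_sigma * identity).
def draw_examples(M, N, ell, p_plus, v_plus, v_minus, rng):
    B_plus = np.zeros(N); B_plus[0] = 1.0    # orthonormal cluster axes
    B_minus = np.zeros(N); B_minus[1] = 1.0
    sigma = np.where(rng.random(M) < p_plus, 1, -1)
    centers = np.where((sigma > 0)[:, None], B_plus, B_minus)
    v = np.where(sigma > 0, v_plus, v_minus)[:, None]
    xi = ell * centers + np.sqrt(v) * rng.normal(size=(M, N))
    return xi, sigma

rng = np.random.default_rng(6)
xi, sigma = draw_examples(20_000, 100, ell=2.0, p_plus=0.8,
                          v_plus=4.0, v_minus=9.0, rng=rng)
# class-+ mean along B+ is close to ell; mean squared length per
# dimension is close to p+ v+ + p- v- (plus an O(1/N) correction)
print(xi[sigma > 0, 0].mean())
print((xi ** 2).sum(axis=1).mean() / 100)
```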
Mathematical analysis of the learning dynamics: from the algorithm to recursions

1) description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)

$R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma$, \quad $Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu$
— projections into the $(\mathbf{B}_+, \mathbf{B}_-)$-plane; lengths and relative positions of the prototypes

2) average over the current example

$\boldsymbol{\xi}^\mu$ is a random vector according to $P(\boldsymbol{\xi} \mid \sigma)$, with average $\ell \mathbf{B}_\sigma$ and squared length $\left\langle \boldsymbol{\xi}^2 \right\rangle_\sigma \approx v_\sigma N$

in the thermodynamic limit $N \to \infty$, the projections
$x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu$, \quad $y_\tau^\mu = \mathbf{B}_\tau \cdot \boldsymbol{\xi}^\mu$
become correlated Gaussian random quantities, completely specified in terms of the first and second moments listed above
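The dimensionality reduction in step 1 can be made concrete: however large $N$ is, the theory only ever needs seven numbers. A sketch with an illustrative random prototype configuration:

```python
import numpy as np

# The 2N-dimensional prototype configuration (w+, w-) enters the theory
# only via R_{s,sigma} = w_s . B_sigma and the symmetric Q_{st} = w_s . w_t:
# 4 R entries + 3 independent Q entries = 7 order parameters.
rng = np.random.default_rng(7)
N = 1000
B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0   # B+, B-
w = rng.normal(size=(2, N)) / np.sqrt(N)             # w+, w- (O(1) norm)

R = w @ B.T          # 2x2 matrix, R[s, tau] = w_s . B_tau
Q = w @ w.T          # 2x2 symmetric matrix, Q[s, t] = w_s . w_t

order = np.array([R[0, 0], R[0, 1], R[1, 0], R[1, 1],
                  Q[0, 0], Q[0, 1], Q[1, 1]])
print(order.shape)   # (7,)
```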
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
algorithm recursions
Mathematical analysis of the learning dynamics
11 σtsμt
μs
μstσ
μs
μsσ QBR www
projections into the (B+ B- )-plane
length and relativeposition of prototypes
1 description in terms of a few characteristic quantitities
( here ℝ2N ℝ7 )
2 average over the current example
random vector according to avg lengthσ)|P( μξ 22 vN σσ
ξ
in the thermodynamic limit N μμ
μ1-μs
μs
By
wx
ξ
ξ
correlated Gaussian random quantities
completely specified in terms of first and second moments (wo indices μ)
sσσ
N
1jjsσs R x
jw stσσtσsσt s Qv xx- xx
sσσσsσ s Rv yx- yx σσσσv yy- yy
sσσ y
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
averaged recursions closed in p σ1σ
σ
μsσ
μst R Q
- depend on the random sequence of example data
- their fluctuations vanish with N
learning dynamics is completely described in terms of averages
3 self-averaging property of characteristic quantities
μsσ
μst R Q
1N
(mean and variance)
R++ (α=10) computer simulations (LVQ1)
- mean results approach theoretical prediction- variance vanishes as N
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
4 continuous learning time
N
μ α of examples
of learning stepsper degree of freedom
) α (R ) α (Q sσst integration yields evolution of projections
stochastic recursions deterministic ODE
probability for misclassification of a novel example
ddpddp gε
]2
]
]2
][
2
1
2
1
QQ[Qv
R[R2QQ
QQ[Q v
RR2QQpp
5 learning curve
generalization error εg(α) after training with α N examples
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
initialization ws(0)=0
theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
self-averaging property
(mean and variances)
1N
R++ (α=10)
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning From Mistakes (LFM)
1-μs
μμμσ-
μσ
1-μs
μs Sσdd
N
ημμ wξww
LVQ21 updateonly if the current classification is wrong
crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]
projected trajetory
ℓ B-
ℓ B+
RS+
RS-
εg
p+=08 ℓ=30
v+=40 v-=90
η= 20 10 05
Learning curves
η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10
Dynamics and generalization of LVQ Birmingham 09-12- 05
εg
p+
equal cluster variances
p+
unequal variances
best linear boundary
― LVQ1
--- LVQ21 (early stopping)middot-middot LFM
Comparison achievable generalization ability
v+=025 v-=081v+=v-=10
― trivial classification
Dynamics and generalization of LVQ Birmingham 09-12- 05
Vector Quantization
competitive learning 1-μs
μμS
μS
1-μs
μs dd
N
ηwξww
ws winner
class membership is unknown
or identical for all data
numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )
εg
α
VQ
LVQ+
LVQ1
αα
R++
R+-
R-+
R--
100 200 3000
0
10system is invariant under
exchange of the prototypes
weakly repulsive fixed
points
Dynamics and generalization of LVQ Birmingham 09-12- 05
interpretations
- VQ unsupervised learning unlabelled data
- LVQ two prototypes of the same class identical labels
- LVQ different classes but labels are not used in training
εg
p+
asymptotics (0 )
p+asymp0
p-asymp1
- low quantization error- high gen error εg
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Dynamics and generalization of LVQ Birmingham 09-12- 05
Outlook
bullSelf-Organizing Maps (SOM)
neighborhood preserving SOM Neural Gas (distance rank based)
bull Generalized Relevance LVQ [eg Hammer amp Villmann]
adaptive metrics eg distance measure
N
i
iii w1
2)( sλ ξξwd
training
bullapplications
bull multi-class multi-prototype problems
bull optimized procedures learning rate schedules
variational approach Bayes optimal on-line
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ1 The winner takes it all
winner ws 1
1-μs
μμμS
μS
1-μs
μs Sσdd
N
ηwξww
only the winner is updated according to the class label
w-
w+
ℓ B-
ℓ B+
RS- w+
RS+
Trajectories in the (B+B- )-plane
(bull) =2040140 optimal decision boundary ____ asymptotic position
initialization ws(0)asymp0
theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10
averaged over 100 indep runs
Q++
Q--
Q+-
α
RSσ
tsst
σssσ
Q
BR
ww
w
Dynamics and generalization of LVQ Birmingham 09-12- 05
Learning curve
η= 201002
- suboptimal non-monotonic behavior for small η
εg (αinfin) grows linearly with η- stationary state
η 0 αinfin (η α ) infin
- well-defined asymptotics
η
εgp+ = 02 ℓ=10
v+ = v- = 10
achievable generalization error
εgεg
p+ p+
v+ = v- =10 v+ =025 v-=081
best linear boundary― LVQ1
Dynamics and generalization of LVQ Birmingham 09-12- 05
LVQ 21 [Kohonen] here update correct and wrong winner
1-μs
μ1-μs
μs Sσ
N
ηwξww
αQQRR
Q R R
with
finite remain
Q R R
R Q
Q R
α 102 4 86
6-
0
6theory and simulation (N=100)
p+=08 ℓ=1 v+=v-=1 =05
averages over 100 independent runs
problem instability of the algorithm
due to repulsion of wrong prototypes
trivial classification for αinfin
εg = min p+p- RS+
RS-
Dynamics and generalization of LVQ Birmingham 09-12- 05
suggested strategy
selection of data in a window close to the current decision boundary
slows down the repulsion system remains instable
Early stopping end training process at minimal εg (idealized)
εg
η= 20 10 05
η
- pronounced minimum in εg (α) depends on initialization and cluster geometry
- here lowest minimum value reached for η0
v+ =025 v-=081εg
p+
― LVQ1__ early stopping
Learning From Mistakes (LFM)

LVQ2.1 update, applied only if the current classification is wrong:

w_S^μ = w_S^{μ−1} + (η/N) Θ(d_σ^μ − d_{−σ}^μ) S σ^μ (ξ^μ − w_S^{μ−1})

crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]

[Figure: projected trajectory in the (ℓB+, ℓB−)-plane, with R_S+, R_S− and ε_g; p+ = 0.8, ℓ = 3.0, v+ = 4.0, v− = 9.0; η = 2.0, 1.0, 0.5]

Learning curves: η-independent asymptotic ε_g (p+ = 0.8, ℓ = 1.2, v+ = v− = 1.0)
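A sketch of the LFM rule (illustrative, not from the slides): the LVQ2.1 step is carried out only when the nearest-prototype classification of the current example is wrong, i.e. when Θ(d_σ − d_{−σ}) = 1.

```python
def lfm_step(w_plus, w_minus, xi, sigma, eta):
    """Learning From Mistakes: perform the LVQ2.1 update only if the
    current nearest-prototype classification of xi is wrong.
    Returns True if an update was made."""
    d_plus = sum((a - b) ** 2 for a, b in zip(w_plus, xi))
    d_minus = sum((a - b) ** 2 for a, b in zip(w_minus, xi))
    pred = 1 if d_plus < d_minus else -1
    if pred == sigma:                 # correctly classified: Theta = 0, no update
        return False
    for w, s in ((w_plus, 1), (w_minus, -1)):
        f = eta * s * sigma           # correct prototype attracted, wrong repelled
        for i in range(len(w)):
            w[i] += f * (xi[i] - w[i])
    return True
```

Gating the update on mistakes is what stabilizes the dynamics relative to plain LVQ2.1: once the boundary classifies the bulk of the data correctly, most examples trigger no update at all.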
Comparison: achievable generalization ability

[Figure: ε_g vs. p+ for equal cluster variances (v+ = v− = 1.0) and for unequal variances (v+ = 0.25, v− = 0.81); curves: best linear boundary, ― LVQ1, --- LVQ2.1 (early stopping), ·−· LFM, ― trivial classification]
Vector Quantization

competitive learning:

w_S^μ = w_S^{μ−1} + (η/N) Θ(d_{−S}^μ − d_S^μ) (ξ^μ − w_S^{μ−1}),  with w_S the winner

class membership is unknown, or identical for all data

[Figure: numerical integration for w_S(0) ≈ 0 (p+ = 0.2, ℓ = 1.0, η = 1.2): ε_g(α) for VQ, LVQ+, and LVQ1; order parameters R++, R+−, R−+, R−− for α up to 300]

The system is invariant under exchange of the prototypes; weakly repulsive fixed points.
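The unsupervised competitive-learning step can be sketched analogously (illustrative code, not from the slides): for K prototypes, no labels enter, and the winner is simply pulled towards the example.

```python
def vq_step(prototypes, xi, eta):
    """Unsupervised winner-take-all VQ step: move only the closest
    prototype towards the example; no class labels are involved."""
    dists = [sum((wi - x) ** 2 for wi, x in zip(w, xi)) for w in prototypes]
    s = dists.index(min(dists))
    prototypes[s] = [wi + eta * (x - wi) for wi, x in zip(prototypes[s], xi)]
    return s
```

This is the LVQ1 step with the S σ^μ factor removed, which is why the same dynamics describes VQ, LVQ+ (identical labels), and LVQ with labels ignored.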
interpretations:
- VQ: unsupervised learning, unlabelled data
- LVQ+: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

asymptotics (η → 0, ηα → ∞):

[Figure: asymptotic ε_g vs. p+; for p+ ≈ 0 (p− ≈ 1), both prototypes represent the stronger cluster]

- low quantization error, but high generalization error ε_g
Dynamics and generalization of LVQ Birmingham 09-12- 05
Summary
bulla model scenario of LVQ training
two clusters two prototypes
dynamics of online training
bullcomparison of algorithms (within the model)
LVQ 1 original formulation of LVQ
with close to optimal asymptotic generalization
LVQ 21 intuitive extension creates instability
trivial (stationary) classification
+ stopping potentially good performance
practical difficulties depends on initialization
LFM crisp limit of Soft Robust LVQ stable behavior
far from optimal generalization
VQ description of in-class competition
Outlook

• Self-Organizing Maps (SOM): neighborhood-preserving SOM, Neural Gas (distance-rank based)
• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. the distance measure

  d_λ(w_S, ξ) = Σ_{i=1}^N λ_i (ξ_i − w_{S,i})²,  with training of the relevances λ_i

• applications
• multi-class, multi-prototype problems
• optimized procedures: learning rate schedules, variational approach, Bayes optimal on-line
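The adaptive distance above can be sketched as a relevance-weighted squared Euclidean distance (illustrative code; in Generalized Relevance LVQ the relevances λ_i are themselves adapted during training, which is omitted here).

```python
def relevance_distance(w, xi, lam):
    """GRLVQ-style adaptive metric: d_lambda(w, xi) = sum_i lambda_i (xi_i - w_i)^2,
    with relevance factors lambda_i >= 0 (typically normalized to sum to one).
    Setting all lambda_i = 1 recovers the plain squared Euclidean distance."""
    return sum(l * (x - wi) ** 2 for l, x, wi in zip(lam, xi, w))
```

A dimension with λ_i = 0 is ignored entirely, which is how the adapted metric suppresses irrelevant input features.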