Chapter 8. Principal-Components Analysis
Neural Networks and Learning Machines (Haykin)
Lecture Notes of Self-learning Neural Algorithms
Byoung-Tak Zhang
School of Computer Science and Engineering
Seoul National University
Contents
8.1 Introduction
8.2 Principles of Self-Organization
8.3 Self-Organized Feature Analysis
8.4 Principal-Components Analysis
8.5 Hebbian-Based Maximum Eigenfilter
8.6 Hebbian-Based PCA
8.7 Case Study: Image Decoding
Summary and Discussion
8.1 Introduction
• Supervised learning
  - Learning from labeled examples
• Semisupervised learning
  - Learning from unlabeled and labeled examples
• Unsupervised learning
  - Learning from examples without a teacher
  - Self-organized learning
    · Neurobiological considerations
    · Locality of learning (immediate local behavior of neurons)
  - Statistical learning theory
    · Mathematical considerations
    · Less emphasis on locality of learning
8.2 Principles of Self-Organization (1/2)
• Principle 1: Self-amplification (self-reinforcement)
  - Synaptic modification self-amplifies according to Hebb's postulate of learning:
    1) If two neurons of a synapse are activated simultaneously, then the synaptic strength is selectively increased.
    2) If two neurons of a synapse are activated asynchronously, then the synaptic strength is selectively weakened or eliminated.
  - Four key mechanisms of a Hebbian synapse:
    · Time-dependent mechanism
    · Local mechanism
    · Interactive mechanism
    · Conjunctional or correlational mechanism
  - Hebbian learning rule:
      Δw_kj(n) = η y_k(n) x_j(n)
8.2 Principles of Self-Organization (2/2)
• Principle 2: Competition
  - Limitation of available resources
  - The most vigorously growing (fittest) synapses or neurons are selected at the expense of the others.
  - Synaptic plasticity (adjustability of a synaptic weight)
• Principle 3: Cooperation
  - Modifications in synaptic weights at the neural level and in neurons at the network level tend to cooperate with each other.
  - Lateral interaction among a group of excited neurons
• Principle 4: Structural information
  - The underlying structure (redundancy) in the input signal is acquired by a self-organizing system.
  - Inherent characteristic of the input signal
8.3 Self-Organized Feature Analysis
Figure 8.1: Layout of Linsker's modular self-adaptive model with overlapping receptive fields, a model of the mammalian visual system.
8.4 Principal-Components Analysis (1/8)
Does there exist an invertible linear transformation T such that the truncation of Tx is optimum in the mean-square-error sense?

x : m-dimensional data vector
X : m-dimensional random vector
q : m-dimensional unit vector

Projection:
  A = X^T q = q^T X
Variance of A:
  σ² = E[A²] = E[(q^T X)(X^T q)] = q^T E[X X^T] q = q^T R q
R : m-by-m correlation matrix
  R = E[X X^T]
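As a concrete illustration, R and the variance of the projection A can be estimated directly from samples. Below is a minimal sketch, assuming zero-mean illustrative data and an arbitrarily chosen unit vector q (neither comes from the text):

```python
# Minimal sketch: estimate R = E[X X^T] from zero-mean samples and evaluate the
# variance of the projection A = q^T X as sigma^2 = q^T R q.
import numpy as np

rng = np.random.default_rng(0)
# Illustrative zero-mean data with unequal variances along the coordinate axes.
X_samples = rng.standard_normal((1000, 3)) @ np.diag([2.0, 1.0, 0.5])

R = (X_samples.T @ X_samples) / len(X_samples)   # sample estimate of the correlation matrix

q = np.array([1.0, 0.0, 0.0])                    # a unit vector, ||q|| = 1
sigma2 = q @ R @ q                               # variance of A = q^T X along direction q
print(sigma2)
```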
8.4 Principal-Components Analysis (2/8)

Variance probe:
  ψ(q) = σ² = q^T R q                                          (*)

At an extremum, ψ is unchanged to first order by any small perturbation δq of the unit vector q:
  ψ(q + δq) = ψ(q)

Introducing a scalar factor λ (effectively a Lagrange multiplier for the constraint ||q|| = 1), this condition reduces to the eigenvalue problem:
  R q = λ q

λ_1, λ_2, ..., λ_m : eigenvalues of R
q_1, q_2, ..., q_m : eigenvectors of R
  R q_j = λ_j q_j,   j = 1, 2, ..., m
  λ_1 > λ_2 > ... > λ_j > ... > λ_m
  Q = [q_1, q_2, ..., q_j, ..., q_m]
  R Q = Q Λ

Eigendecomposition:
  i)  Q^T R Q = Λ, that is,
      q_k^T R q_j = λ_j if k = j, and 0 if k ≠ j              (**)
  ii) R = Q Λ Q^T = Σ_{i=1}^{m} λ_i q_i q_i^T   (spectral theorem)

From (*) and (**), the variance probes along the principal directions are
  ψ(q_j) = λ_j,   j = 1, 2, ..., m
• Summary of the eigenstructure of PCA:
  1) The eigenvectors of the correlation matrix R of the random vector X define the unit vectors q_j, representing the principal directions along which the variance probes ψ(q_j) attain their extremal values.
  2) The associated eigenvalues define the extremal values of the variance probes.
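This eigenstructure can be checked numerically. The following is a minimal sketch, assuming a correlation matrix estimated from illustrative data (not data from the text); it verifies that the variance probe along each principal direction equals the corresponding eigenvalue:

```python
# Minimal sketch of the eigenstructure of PCA: eigenvectors of R give the
# principal directions q_j, eigenvalues give the extremal variances psi(q_j) = lambda_j.
import numpy as np

rng = np.random.default_rng(0)
X_samples = rng.standard_normal((1000, 3)) @ np.diag([2.0, 1.0, 0.5])   # illustrative data
R = (X_samples.T @ X_samples) / len(X_samples)

eigvals, eigvecs = np.linalg.eigh(R)             # R is symmetric: R = Q Lambda Q^T
order = np.argsort(eigvals)[::-1]                # sort so that lambda_1 > lambda_2 > ...
Lambda = eigvals[order]
Q = eigvecs[:, order]                            # columns are q_1, ..., q_m

for j in range(Q.shape[1]):
    # psi(q_j) = q_j^T R q_j should equal lambda_j
    assert np.isclose(Q[:, j] @ R @ Q[:, j], Lambda[j])
```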
8.4 Principal-Components Analysis (3/8)
8.4 Principal-Components Analysis (4/8)

Data vector x : a realization of X
a : a realization of A
  a_j = q_j^T x = x^T q_j,   j = 1, 2, ..., m
a_j : the projections of x onto the principal directions (the principal components)

Reconstruction (synthesis) of the original data x:
  a = [a_1, a_2, ..., a_m]^T = [x^T q_1, x^T q_2, ..., x^T q_m]^T = Q^T x
  Q a = Q Q^T x = I x = x
  x = Q a = Σ_{j=1}^{m} a_j q_j
8.4 Principal-Components Analysis (5/8)

Dimensionality reduction:
  λ_1, λ_2, ..., λ_ℓ : the ℓ largest eigenvalues of R

Decoder for x (reconstruction from the first ℓ principal components):
  x̂ = Σ_{j=1}^{ℓ} a_j q_j = [q_1, q_2, ..., q_ℓ] [a_1, a_2, ..., a_ℓ]^T,   ℓ ≤ m

Encoder for x (linear projection from ℝ^m to ℝ^ℓ):
  [a_1, a_2, ..., a_ℓ]^T = [q_1, q_2, ..., q_ℓ]^T x,   ℓ ≤ m
Figure 8.2: Two phases of PCA. (a) Encoding. (b) Decoding.
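A minimal sketch of these two phases, assuming the principal directions are obtained by eigendecomposition of an estimated correlation matrix; the data and the choice ℓ = 2 are illustrative, not from the text:

```python
# Minimal sketch of the two phases of PCA: encoding a data vector x into its
# first l principal components a, and decoding (reconstructing) x_hat from a.
import numpy as np

rng = np.random.default_rng(1)
X_samples = rng.standard_normal((1000, 5)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.1])
R = (X_samples.T @ X_samples) / len(X_samples)

eigvals, eigvecs = np.linalg.eigh(R)
Q = eigvecs[:, np.argsort(eigvals)[::-1]]        # principal directions q_1, ..., q_m

l = 2                                            # keep the l leading directions
Q_l = Q[:, :l]                                   # [q_1, ..., q_l]

x = X_samples[0]
a = Q_l.T @ x                                    # encoder: a_j = q_j^T x
x_hat = Q_l @ a                                  # decoder: x_hat = sum_j a_j q_j
e = x - x_hat                                    # approximation error vector e = x - x_hat
print(np.isclose(e @ x_hat, 0.0))                # e is orthogonal to the reconstruction
```

The orthogonality check anticipates the next slide: the error vector lies entirely in the subspace spanned by the discarded directions q_{ℓ+1}, ..., q_m.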
8.4 Principal-Components Analysis (6/8)

Approximation error vector:
  e = x − x̂
  e = Σ_{i=ℓ+1}^{m} a_i q_i

Figure 8.3: Relationship between the data vector x, its reconstructed version x̂, and the error vector e.
8.4 Principal-Components Analysis (7/8)
Figure 8.4: A cloud of data points; the projection onto Axis 1 has maximum variance and reveals the bimodal structure of the data.
8.4 Principal-Components Analysis (8/8)
Figure 8.5: Digital compression of handwritten digits using PCA.
8.5 Hebbian-Based Maximum Eigenfilter (1/4)

Linear neuron with Hebbian adaptation:
  y = Σ_{i=1}^{m} w_i x_i

The synaptic weight w_i varies with time. The plain Hebbian rule
  w_i(n+1) = w_i(n) + η y(n) x_i(n),   i = 1, 2, ..., m
lets the weights grow without bound; normalizing the weight vector and expanding to first order in η yields the stabilized rule
  w_i(n+1) = w_i(n) + η y(n) [x_i(n) − y(n) w_i(n)]

With the effective input
  x_i'(n) = x_i(n) − y(n) w_i(n)
the update again takes the Hebbian form
  w_i(n+1) = w_i(n) + η y(n) x_i'(n)
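A minimal sketch of this single-neuron rule run on an illustrative zero-mean data stream; the learning rate and iteration budget are assumptions, not values from the text. The weight vector should end up close to the dominant eigenvector q_1 with ||w|| ≈ 1:

```python
# Minimal sketch of the Hebbian-based maximum eigenfilter: a single linear neuron
# whose weight vector converges toward the first principal direction q_1.
import numpy as np

rng = np.random.default_rng(2)
X_stream = rng.standard_normal((5000, 3)) @ np.diag([2.0, 1.0, 0.5])   # zero-mean inputs

eta = 0.01                                       # learning rate (illustrative)
w = rng.standard_normal(3)
w /= np.linalg.norm(w)                           # start from a random unit vector

for x in X_stream:
    y = w @ x                                    # neuron output y = w^T x
    w += eta * y * (x - y * w)                   # w(n+1) = w(n) + eta*y(n)*[x(n) - y(n)*w(n)]

# Compare with the dominant eigenvector of the estimated correlation matrix.
R = (X_stream.T @ X_stream) / len(X_stream)
eigvals, eigvecs = np.linalg.eigh(R)
q1 = eigvecs[:, np.argmax(eigvals)]
print(np.linalg.norm(w), abs(w @ q1))            # ||w|| near 1, |cos(angle to q_1)| near 1
```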
8.5 Hebbian-Based Maximum Eigenfilter (2/4)
Figure 8.6: Signal-flow graph representation of the maximum eigenfilter.
8.5 Hebbian-Based Maximum Eigenfilter (3/4)

Matrix formulation:
  x(n) = [x_1(n), x_2(n), ..., x_m(n)]^T
  w(n) = [w_1(n), w_2(n), ..., w_m(n)]^T
  y(n) = x^T(n) w(n) = w^T(n) x(n)

  w(n+1) = w(n) + η y(n) [x(n) − y(n) w(n)]
         = w(n) + η x^T(n) w(n) [x(n) − w^T(n) x(n) w(n)]
         = w(n) + η [x(n) x^T(n) w(n) − w^T(n) x(n) x^T(n) w(n) w(n)]
8.5 Hebbian-Based Maximum Eigenfilter (4/4)

Asymptotic stability of the maximum eigenfilter:
  w(t) → q_1   as t → ∞

A single linear neuron governed by this self-organizing learning rule adaptively extracts the first principal component of a stationary input:
  x̂(n) = y(n) q_1   as n → ∞

A Hebbian-based linear neuron with the learning rule
  w(n+1) = w(n) + η y(n) [x(n) − y(n) w(n)]
converges with probability 1 to a fixed point such that:
  1) lim_{n→∞} σ²(n) = λ_1
  2) lim_{n→∞} w(n) = q_1,   with lim_{n→∞} ||w(n)|| = 1
8.6 Hebbian-Based PCA (1/3)
Figure 8.7: Feedforward network with a single layer of computational nodes
Single layer of linear neurons:
  m inputs, ℓ outputs, with ℓ < m
  w_ji : synaptic weight from input i to neuron j

  y_j(n) = Σ_{i=1}^{m} w_ji(n) x_i(n),   j = 1, 2, ..., ℓ
8.6 Hebbian-Based PCA (2/3)
Generalized Hebbian Algorithm (GHA)

  y_j(n) = Σ_{i=1}^{m} w_ji(n) x_i(n),   j = 1, 2, ..., ℓ

Weight-update rule:
  Δw_ji(n) = η [ y_j(n) x_i(n) − y_j(n) Σ_{k=1}^{j} w_ki(n) y_k(n) ]

Rewriting:
  Δw_ji(n) = η y_j(n) [ x_i'(n) − w_ji(n) y_j(n) ]
where
  x_i'(n) = x_i(n) − Σ_{k=1}^{j−1} w_ki(n) y_k(n)
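A minimal sketch of the GHA update run over an illustrative data set; the dimensions m = 4 and ℓ = 2, the learning rate, and the number of epochs are assumptions. After training, the weight rows should align, up to sign, with the two leading eigenvectors of R:

```python
# Minimal sketch of the Generalized Hebbian Algorithm (GHA): a single layer of
# l linear neurons whose weight vectors converge toward the l leading principal
# directions of the input correlation matrix.
import numpy as np

rng = np.random.default_rng(3)
X_samples = rng.standard_normal((2000, 4)) @ np.diag([2.0, 1.5, 0.7, 0.2])   # zero-mean data

m, l = 4, 2
eta = 0.005
W = 0.01 * rng.standard_normal((l, m))           # row W[j] is the weight vector of neuron j

for epoch in range(20):
    for x in X_samples:
        y = W @ x                                # y_j = sum_i w_ji x_i
        for j in range(l):
            # Delta w_ji = eta * y_j * ( x_i - sum_{k<=j} w_ki y_k )
            W[j] += eta * y[j] * (x - y[:j + 1] @ W[:j + 1])

# Compare the learned rows with the leading eigenvectors of R (up to sign).
R = (X_samples.T @ X_samples) / len(X_samples)
eigvals, eigvecs = np.linalg.eigh(R)
Q = eigvecs[:, np.argsort(eigvals)[::-1]]
print([round(abs(W[j] @ Q[:, j]), 3) for j in range(l)])   # each value near 1
```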
8.6 Hebbian-Based PCA (3/3)
Figure 8.8: Signal-flow graph of the GHA.

  Δw_ji(n) = η y_j(n) x_i''(n)
  x_i''(n) = x_i'(n) − w_ji(n) y_j(n)
  w_ji(n+1) = w_ji(n) + Δw_ji(n)
  w_ji(n) = z⁻¹[w_ji(n+1)]
8.7 Case Study: Image Decoding (1/3)
Figure 8.9: Signal-flow graph representation of how the reconstructed vector x̂ is computed in the GHA.

  x̂(n) = Σ_{k=1}^{ℓ} y_k(n) q_k
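A minimal sketch of this reconstruction step for the image-coding setting, where each 8 × 8 image patch is flattened to an m = 64 vector; the GHA weight matrix W and the sample patch below are random stand-ins (assumptions), not the weights learned in the case study:

```python
# Minimal sketch of reconstructing an 8x8 image patch from its GHA outputs:
# x_hat = sum_k y_k q_k, with the learned weight rows W[k] standing in for q_k.
import numpy as np

def reconstruct_patch(W, patch):
    """Encode a patch with the GHA weight rows and decode it back to 8x8."""
    x = patch.reshape(64)                        # flatten the 8x8 patch to an m = 64 vector
    y = W @ x                                    # outputs y_k = w_k^T x
    x_hat = W.T @ y                              # x_hat = sum_k y_k * W[k]
    return x_hat.reshape(8, 8)

# Illustrative usage with random stand-ins for trained weights and a patch.
rng = np.random.default_rng(4)
W = rng.standard_normal((8, 64))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # GHA rows converge to unit-norm directions
patch = rng.standard_normal((8, 8))
print(reconstruct_patch(W, patch).shape)         # (8, 8)
```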
8.7 Case Study: Image Decoding (2/3)
Figure 8.10: (a) An image of Lena used in the image-coding experiment. (b) 8 × 8 masks representing the synaptic weights learned by the GHA. (c) Reconstructed image of Lena obtained using the dominant 8 principal components without quantization. (d) Reconstructed image of Lena with an 11-to-1 compression ratio using quantization.
8.7 Case Study: Image Decoding (3/3)
Figure 8.11: Image of peppers
Summary and Discussion
• PCA = dimensionality reduction = compression
• Generalized Hebbian Algorithm (GHA) = a neural algorithm for PCA
• Dimensionality reduction:
  1) Representation of data
  2) Reconstruction of data
• Two views of unsupervised learning:
  1) Bottom-up view
  2) Top-down view
• Nonlinear PCA methods:
  1) Hebbian networks
  2) Replicator networks (autoencoders)
  3) Principal curves
  4) Kernel PCA