Chapter 8. Principal-Components Analysis
Neural Networks and Learning Machines (Haykin)
Lecture Notes of Self-learning Neural Algorithms
Byoung-Tak Zhang
School of Computer Science and Engineering
Seoul National University
Contents
8.1 Introduction
8.2 Principles of Self-Organization
8.3 Self-Organized Feature Analysis
8.4 Principal-Components Analysis
8.5 Hebbian-Based Maximum Eigenfilter
8.6 Hebbian-Based PCA
8.7 Case Study: Image Decoding
Summary and Discussion
8.1 Introduction
• Supervised learning
  - Learning from labeled examples
• Semisupervised learning
  - Learning from unlabeled and labeled examples
• Unsupervised learning
  - Learning from examples without a teacher
  - Self-organized learning
    · Neurobiological considerations
    · Locality of learning (immediate local behavior of neurons)
  - Statistical learning theory
    · Mathematical considerations
    · Less emphasis on locality of learning
8.2 Principles of Self-Organization (1/2)
• Principle 1: Self-amplification (self-reinforcement)
  - Synaptic modification self-amplifies according to Hebb's postulate of learning:
    1) If two neurons of a synapse are activated simultaneously, then the synaptic strength is selectively increased.
    2) If two neurons of a synapse are activated asynchronously, then the synaptic strength is selectively weakened or eliminated.
  - Four key mechanisms of a Hebbian synapse:
    · Time-dependent mechanism
    · Local mechanism
    · Interactive mechanism
    · Conjunctional or correlational mechanism
  - Hebbian learning rule:
      Δw_kj(n) = η y_k(n) x_j(n)
8.2 Principles of Self-Organization (2/2)
• Principle 2: Competition
  - Limitation of available resources
  - The most vigorously growing (fittest) synapses or neurons are selected at the expense of the others.
  - Synaptic plasticity (adjustability of a synaptic weight)
• Principle 3: Cooperation
  - Modifications in synaptic weights at the neural level and in neurons at the network level tend to cooperate with each other.
  - Lateral interaction among a group of excited neurons
• Principle 4: Structural information
  - The underlying structure (redundancy) in the input signal is acquired by a self-organizing system.
  - Inherent characteristic of the input signal
8.3 Self-Organized Feature Analysis
Figure 8.1: Layout of Linsker's modular self-adaptive model with overlapping receptive fields, a model of the mammalian visual system.
8.4 Principal-Components Analysis (1/8)
Does there exist an invertible linear transformation T such that the truncation of Tx is optimum in the mean-square-error sense?

x : m-dimensional data vector
X : m-dimensional random vector
q : m-dimensional unit vector

Projection:
  A = X^T q = q^T X
Variance of A:
  σ² = E[A²] = E[(q^T X)(X^T q)] = q^T E[X X^T] q = q^T R q
R : m-by-m correlation matrix
  R = E[X X^T]
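As a concrete illustration, R and the variance of the projection A can be estimated directly from samples. Below is a minimal sketch, assuming zero-mean illustrative data and an arbitrarily chosen unit vector q (neither comes from the text):

```python
# Minimal sketch: estimate R = E[X X^T] from zero-mean samples and evaluate the
# variance of the projection A = q^T X as sigma^2 = q^T R q.
import numpy as np

rng = np.random.default_rng(0)
# Illustrative zero-mean data with unequal variances along the coordinate axes.
X_samples = rng.standard_normal((1000, 3)) @ np.diag([2.0, 1.0, 0.5])

R = (X_samples.T @ X_samples) / len(X_samples)   # sample estimate of the correlation matrix

q = np.array([1.0, 0.0, 0.0])                    # a unit vector, ||q|| = 1
sigma2 = q @ R @ q                               # variance of A = q^T X along direction q
print(sigma2)
```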
8.4 Principal-Components Analysis (2/8)

Variance probe:
  ψ(q) = σ² = q^T R q                                          (*)

At an extremum, ψ is unchanged to first order by any small perturbation δq of the unit vector q:
  ψ(q + δq) = ψ(q)

Introducing a scalar factor λ (effectively a Lagrange multiplier for the constraint ||q|| = 1), this condition reduces to the eigenvalue problem:
  R q = λ q

λ_1, λ_2, ..., λ_m : eigenvalues of R
q_1, q_2, ..., q_m : eigenvectors of R
  R q_j = λ_j q_j,   j = 1, 2, ..., m
  λ_1 > λ_2 > ... > λ_j > ... > λ_m
  Q = [q_1, q_2, ..., q_j, ..., q_m]
  R Q = Q Λ

Eigendecomposition:
  i)  Q^T R Q = Λ, that is,
      q_k^T R q_j = λ_j if k = j, and 0 if k ≠ j              (**)
  ii) R = Q Λ Q^T = Σ_{i=1}^{m} λ_i q_i q_i^T   (spectral theorem)

From (*) and (**), the variance probes along the principal directions are
  ψ(q_j) = λ_j,   j = 1, 2, ..., m
• Summary of the eigenstructure of PCA:
  1) The eigenvectors of the correlation matrix R of the random vector X define the unit vectors q_j, representing the principal directions along which the variance probes ψ(q_j) attain their extremal values.
  2) The associated eigenvalues define the extremal values of the variance probes.
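This eigenstructure can be checked numerically. The following is a minimal sketch, assuming a correlation matrix estimated from illustrative data (not data from the text); it verifies that the variance probe along each principal direction equals the corresponding eigenvalue:

```python
# Minimal sketch of the eigenstructure of PCA: eigenvectors of R give the
# principal directions q_j, eigenvalues give the extremal variances psi(q_j) = lambda_j.
import numpy as np

rng = np.random.default_rng(0)
X_samples = rng.standard_normal((1000, 3)) @ np.diag([2.0, 1.0, 0.5])   # illustrative data
R = (X_samples.T @ X_samples) / len(X_samples)

eigvals, eigvecs = np.linalg.eigh(R)             # R is symmetric: R = Q Lambda Q^T
order = np.argsort(eigvals)[::-1]                # sort so that lambda_1 > lambda_2 > ...
Lambda = eigvals[order]
Q = eigvecs[:, order]                            # columns are q_1, ..., q_m

for j in range(Q.shape[1]):
    # psi(q_j) = q_j^T R q_j should equal lambda_j
    assert np.isclose(Q[:, j] @ R @ Q[:, j], Lambda[j])
```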
8.4 Principal-Components Analysis (3/8)
8.4 Principal-Components Analysis (4/8)

Data vector x : a realization of X
a : a realization of A
  a_j = q_j^T x = x^T q_j,   j = 1, 2, ..., m
a_j : the projections of x onto the principal directions (the principal components)

Reconstruction (synthesis) of the original data x:
  a = [a_1, a_2, ..., a_m]^T = [x^T q_1, x^T q_2, ..., x^T q_m]^T = Q^T x
  Q a = Q Q^T x = I x = x
  x = Q a = Σ_{j=1}^{m} a_j q_j
8.4 Principal-Components Analysis (5/8)

Dimensionality reduction:
  λ_1, λ_2, ..., λ_ℓ : the ℓ largest eigenvalues of R

Decoder for x (reconstruction from the first ℓ principal components):
  x̂ = Σ_{j=1}^{ℓ} a_j q_j = [q_1, q_2, ..., q_ℓ] [a_1, a_2, ..., a_ℓ]^T,   ℓ ≤ m

Encoder for x (linear projection from ℝ^m to ℝ^ℓ):
  [a_1, a_2, ..., a_ℓ]^T = [q_1, q_2, ..., q_ℓ]^T x,   ℓ ≤ m
Figure 8.2: Two phases of PCA. (a) Encoding. (b) Decoding.
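A minimal sketch of these two phases, assuming the principal directions are obtained by eigendecomposition of an estimated correlation matrix; the data and the choice ℓ = 2 are illustrative, not from the text:

```python
# Minimal sketch of the two phases of PCA: encoding a data vector x into its
# first l principal components a, and decoding (reconstructing) x_hat from a.
import numpy as np

rng = np.random.default_rng(1)
X_samples = rng.standard_normal((1000, 5)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.1])
R = (X_samples.T @ X_samples) / len(X_samples)

eigvals, eigvecs = np.linalg.eigh(R)
Q = eigvecs[:, np.argsort(eigvals)[::-1]]        # principal directions q_1, ..., q_m

l = 2                                            # keep the l leading directions
Q_l = Q[:, :l]                                   # [q_1, ..., q_l]

x = X_samples[0]
a = Q_l.T @ x                                    # encoder: a_j = q_j^T x
x_hat = Q_l @ a                                  # decoder: x_hat = sum_j a_j q_j
e = x - x_hat                                    # approximation error vector e = x - x_hat
print(np.isclose(e @ x_hat, 0.0))                # e is orthogonal to the reconstruction
```

The orthogonality check anticipates the next slide: the error vector lies entirely in the subspace spanned by the discarded directions q_{ℓ+1}, ..., q_m.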
8.4 Principal-Components Analysis (6/8)

Approximation error vector:
  e = x − x̂
  e = Σ_{i=ℓ+1}^{m} a_i q_i

Figure 8.3: Relationship between the data vector x, its reconstructed version x̂, and the error vector e.
8.4 Principal-Components Analysis (7/8)
Figure 8.4: A cloud of data points; the projection onto Axis 1 has maximum variance and reveals the bimodal structure of the data.
8.4 Principal-Components Analysis (8/8)
Figure 8.5: Digital compression of handwritten digits using PCA.
8.5 Hebbian-Based Maximum Eigenfilter (1/4)

Linear neuron with Hebbian adaptation:
  y = Σ_{i=1}^{m} w_i x_i

The synaptic weight w_i varies with time. The plain Hebbian rule
  w_i(n+1) = w_i(n) + η y(n) x_i(n),   i = 1, 2, ..., m
lets the weights grow without bound; normalizing the weight vector and expanding to first order in η yields the stabilized rule
  w_i(n+1) = w_i(n) + η y(n) [x_i(n) − y(n) w_i(n)]

With the effective input
  x_i'(n) = x_i(n) − y(n) w_i(n)
the update again takes the Hebbian form
  w_i(n+1) = w_i(n) + η y(n) x_i'(n)
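A minimal sketch of this single-neuron rule run on an illustrative zero-mean data stream; the learning rate and iteration budget are assumptions, not values from the text. The weight vector should end up close to the dominant eigenvector q_1 with ||w|| ≈ 1:

```python
# Minimal sketch of the Hebbian-based maximum eigenfilter: a single linear neuron
# whose weight vector converges toward the first principal direction q_1.
import numpy as np

rng = np.random.default_rng(2)
X_stream = rng.standard_normal((5000, 3)) @ np.diag([2.0, 1.0, 0.5])   # zero-mean inputs

eta = 0.01                                       # learning rate (illustrative)
w = rng.standard_normal(3)
w /= np.linalg.norm(w)                           # start from a random unit vector

for x in X_stream:
    y = w @ x                                    # neuron output y = w^T x
    w += eta * y * (x - y * w)                   # w(n+1) = w(n) + eta*y(n)*[x(n) - y(n)*w(n)]

# Compare with the dominant eigenvector of the estimated correlation matrix.
R = (X_stream.T @ X_stream) / len(X_stream)
eigvals, eigvecs = np.linalg.eigh(R)
q1 = eigvecs[:, np.argmax(eigvals)]
print(np.linalg.norm(w), abs(w @ q1))            # ||w|| near 1, |cos(angle to q_1)| near 1
```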
8.5 Hebbian-Based Maximum Eigenfilter (2/4)
Figure 8.6: Signal-flow graph representation of the maximum eigenfilter.
8.5 Hebbian-Based Maximum Eigenfilter (3/4)

Matrix formulation:
  x(n) = [x_1(n), x_2(n), ..., x_m(n)]^T
  w(n) = [w_1(n), w_2(n), ..., w_m(n)]^T
  y(n) = x^T(n) w(n) = w^T(n) x(n)

  w(n+1) = w(n) + η y(n) [x(n) − y(n) w(n)]
         = w(n) + η x^T(n) w(n) [x(n) − w^T(n) x(n) w(n)]
         = w(n) + η [x(n) x^T(n) w(n) − w^T(n) x(n) x^T(n) w(n) w(n)]
8.5 Hebbian-Based Maximum Eigenfilter (4/4)

Asymptotic stability of the maximum eigenfilter:
  w(t) → q_1   as t → ∞

A single linear neuron governed by this self-organizing learning rule adaptively extracts the first principal component of a stationary input:
  x̂(n) = y(n) q_1   as n → ∞

A Hebbian-based linear neuron with the learning rule
  w(n+1) = w(n) + η y(n) [x(n) − y(n) w(n)]
converges with probability 1 to a fixed point such that:
  1) lim_{n→∞} σ²(n) = λ_1
  2) lim_{n→∞} w(n) = q_1,   with lim_{n→∞} ||w(n)|| = 1
8.6 Hebbian-Based PCA (1/3)
Figure 8.7: Feedforward network with a single layer of computational nodes
Single layer of linear neurons:
  m inputs, ℓ outputs, with ℓ < m
  w_ji : synaptic weight from input i to neuron j

  y_j(n) = Σ_{i=1}^{m} w_ji(n) x_i(n),   j = 1, 2, ..., ℓ
8.6 Hebbian-Based PCA (2/3)
Generalized Hebbian Algorithm (GHA)

  y_j(n) = Σ_{i=1}^{m} w_ji(n) x_i(n),   j = 1, 2, ..., ℓ

Weight-update rule:
  Δw_ji(n) = η [ y_j(n) x_i(n) − y_j(n) Σ_{k=1}^{j} w_ki(n) y_k(n) ]

Rewriting:
  Δw_ji(n) = η y_j(n) [ x_i'(n) − w_ji(n) y_j(n) ]
where
  x_i'(n) = x_i(n) − Σ_{k=1}^{j−1} w_ki(n) y_k(n)
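A minimal sketch of the GHA update run over an illustrative data set; the dimensions m = 4 and ℓ = 2, the learning rate, and the number of epochs are assumptions. After training, the weight rows should align, up to sign, with the two leading eigenvectors of R:

```python
# Minimal sketch of the Generalized Hebbian Algorithm (GHA): a single layer of
# l linear neurons whose weight vectors converge toward the l leading principal
# directions of the input correlation matrix.
import numpy as np

rng = np.random.default_rng(3)
X_samples = rng.standard_normal((2000, 4)) @ np.diag([2.0, 1.5, 0.7, 0.2])   # zero-mean data

m, l = 4, 2
eta = 0.005
W = 0.01 * rng.standard_normal((l, m))           # row W[j] is the weight vector of neuron j

for epoch in range(20):
    for x in X_samples:
        y = W @ x                                # y_j = sum_i w_ji x_i
        for j in range(l):
            # Delta w_ji = eta * y_j * ( x_i - sum_{k<=j} w_ki y_k )
            W[j] += eta * y[j] * (x - y[:j + 1] @ W[:j + 1])

# Compare the learned rows with the leading eigenvectors of R (up to sign).
R = (X_samples.T @ X_samples) / len(X_samples)
eigvals, eigvecs = np.linalg.eigh(R)
Q = eigvecs[:, np.argsort(eigvals)[::-1]]
print([round(abs(W[j] @ Q[:, j]), 3) for j in range(l)])   # each value near 1
```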
8.6 Hebbian-Based PCA (3/3)
Figure 8.8: Signal-flow graph of the GHA.

  Δw_ji(n) = η y_j(n) x_i''(n)
  x_i''(n) = x_i'(n) − w_ji(n) y_j(n)
  w_ji(n+1) = w_ji(n) + Δw_ji(n)
  w_ji(n) = z⁻¹[w_ji(n+1)]
8.7 Case Study: Image Decoding (1/3)
Figure 8.9: Signal-flow graph representation of how the reconstructed vector x̂ is computed in the GHA.

  x̂(n) = Σ_{k=1}^{ℓ} y_k(n) q_k
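A minimal sketch of this reconstruction step for the image-coding setting, where each 8 × 8 image patch is flattened to an m = 64 vector; the GHA weight matrix W and the sample patch below are random stand-ins (assumptions), not the weights learned in the case study:

```python
# Minimal sketch of reconstructing an 8x8 image patch from its GHA outputs:
# x_hat = sum_k y_k q_k, with the learned weight rows W[k] standing in for q_k.
import numpy as np

def reconstruct_patch(W, patch):
    """Encode a patch with the GHA weight rows and decode it back to 8x8."""
    x = patch.reshape(64)                        # flatten the 8x8 patch to an m = 64 vector
    y = W @ x                                    # outputs y_k = w_k^T x
    x_hat = W.T @ y                              # x_hat = sum_k y_k * W[k]
    return x_hat.reshape(8, 8)

# Illustrative usage with random stand-ins for trained weights and a patch.
rng = np.random.default_rng(4)
W = rng.standard_normal((8, 64))
W /= np.linalg.norm(W, axis=1, keepdims=True)    # GHA rows converge to unit-norm directions
patch = rng.standard_normal((8, 8))
print(reconstruct_patch(W, patch).shape)         # (8, 8)
```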
8.7 Case Study: Image Decoding (2/3)
Figure 8.10: (a) An image of Lena used in the image-coding experiment. (b) 8 × 8 masks representing the synaptic weights learned by the GHA. (c) Reconstructed image of Lena obtained using the dominant 8 principal components without quantization. (d) Reconstructed image of Lena with an 11-to-1 compression ratio using quantization.
8.7 Case Study: Image Decoding (3/3)
Figure 8.11: Image of peppers
Summary and Discussion
• PCA = dimensionality reduction = compression
• Generalized Hebbian Algorithm (GHA) = a neural algorithm for PCA
• Dimensionality reduction:
  1) Representation of data
  2) Reconstruction of data
• Two views of unsupervised learning:
  1) Bottom-up view
  2) Top-down view
• Nonlinear PCA methods:
  1) Hebbian networks
  2) Replicator networks (autoencoders)
  3) Principal curves
  4) Kernel PCA