
Chapter 8. Principal-Components Analysis

Neural Networks and Learning Machines (Haykin)

Lecture Notes of Self-learning Neural Algorithms

Byoung-Tak Zhang
School of Computer Science and Engineering
Seoul National University

Contents

8.1 Introduction
8.2 Principles of Self-Organization
8.3 Self-organized Feature Analysis
8.4 Principal-Components Analysis
8.5 Hebbian-Based Maximum Eigenfilter
8.6 Hebbian-Based PCA
8.7 Case Study: Image Decoding
Summary and Discussion

8.1 Introduction

• Supervised learning
  • Learning from labeled examples
• Semisupervised learning
  • Learning from unlabeled and labeled examples
• Unsupervised learning
  • Learning from examples without a teacher
  • Self-organized learning
    • Neurobiological considerations
    • Locality of learning (immediate local behavior of neurons)
  • Statistical learning theory
    • Mathematical considerations
    • Less emphasis on locality of learning

8.2 Principles of Self-Organization (1/2)

• Principle 1: Self-amplification (self-reinforcement)
  • Synaptic modification self-amplifies according to Hebb's postulate of learning:
    1) If two neurons of a synapse are activated simultaneously, then synaptic strength is selectively increased.
    2) If two neurons of a synapse are activated asynchronously, then synaptic strength is selectively weakened or eliminated.
  • Four key mechanisms of a Hebbian synapse:
    • Time-dependent mechanism
    • Local mechanism
    • Interactive mechanism
    • Conjunctional or correlational mechanism

$$\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$$
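To make Hebb's postulate concrete, here is a minimal sketch (not from the lecture; the toy data, learning rate, and variable names are assumptions) of the plain Hebbian update applied to a single linear neuron:

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.01):
    """Plain Hebbian update for one linear neuron: Delta w_j = eta * y * x_j."""
    return w + eta * y * x

rng = np.random.default_rng(0)
w = 0.1 * rng.normal(size=3)        # small random initial synaptic weights
for _ in range(1000):
    x = rng.normal(size=3)          # presynaptic activity
    y = w @ x                       # postsynaptic activity of a linear neuron
    w = hebbian_update(w, x, y)
print(np.linalg.norm(w))            # the weight norm grows without bound
```

Note that this unmodified rule is unstable (the weight norm keeps growing), which is exactly what the normalized rule of Section 8.5, the Hebbian-based maximum eigenfilter, corrects.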


8.2 Principles of Self-Organization (2/2)

• Principle 2: Competition
  • Limitation of available resources
  • The most vigorously growing (fittest) synapses or neurons are selected at the expense of the others.
  • Synaptic plasticity (adjustability of a synaptic weight)
• Principle 3: Cooperation
  • Modifications in synaptic weights at the neural level and in neurons at the network level tend to cooperate with each other.
  • Lateral interaction among a group of excited neurons
• Principle 4: Structural information
  • The underlying structure (redundancy) in the input signal is acquired by a self-organizing system.
  • Inherent characteristic of the input signal

8.3 Self-organized Feature Analysis

Figure 8.1: Layout of the modular, self-adaptive Linsker model, with overlapping receptive fields. A model of the mammalian visual system.

8.4 Principal-Components Analysis (1/8)

Does there exist an invertible linear transformation T such that the truncation of Tx is optimum in the mean-square-error sense?

x: m-dimensional vector
X: m-dimensional random vector
q: m-dimensional unit vector

Projection of X onto q:
$$A = X^{T}q = q^{T}X$$

Variance of A:
$$\sigma^{2} = E[A^{2}] = E[(q^{T}X)(X^{T}q)] = q^{T}E[XX^{T}]\,q = q^{T}Rq$$

R: m-by-m correlation matrix
$$R = E[XX^{T}]$$
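A brief numerical sketch of the variance probe (the data and names below are assumptions, not lecture code): estimate R from zero-mean samples and evaluate $\sigma^{2} = q^{T}Rq$ for a candidate unit vector q.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5000
cov = np.array([[4.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(m), cov, size=n)   # zero-mean samples of X

R = (X.T @ X) / n                   # sample correlation matrix R = E[X X^T]

q = np.array([1.0, 1.0, 0.0])
q /= np.linalg.norm(q)              # q must be a unit vector

sigma2 = q @ R @ q                  # variance probe: sigma^2 = q^T R q
print(sigma2, np.var(X @ q))        # agrees (approximately) with the projection variance
```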

8.4 Principal-Components Analysis (2/8)

Variance probe:
$$\psi(q) = \sigma^{2} = q^{T}Rq$$

For any small perturbation $\delta q$ of the unit vector $q$, we require $\psi(q + \delta q) = \psi(q)$. Working out this condition and introducing a scalar factor $\lambda$ leads to the eigenvalue problem
$$Rq = \lambda q$$

Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the eigenvalues of $R$, and $q_1, q_2, \ldots, q_m$ the corresponding eigenvectors:
$$Rq_j = \lambda_j q_j, \qquad j = 1, 2, \ldots, m$$
$$\lambda_1 > \lambda_2 > \cdots > \lambda_j > \cdots > \lambda_m$$
$$Q = [q_1, q_2, \ldots, q_j, \ldots, q_m], \qquad RQ = Q\Lambda, \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$$

Eigendecomposition:
i) $Q^{T}RQ = \Lambda$, that is,
$$q_k^{T}Rq_j = \begin{cases} \lambda_j, & k = j \\ 0, & k \neq j \end{cases} \qquad (*)$$
ii) $R = Q\Lambda Q^{T} = \sum_{i=1}^{m} \lambda_i\, q_i q_i^{T}$ (spectral theorem)

From $\psi(q) = q^{T}Rq$ and $(*)$, we see that
$$\psi(q_j) = \lambda_j, \qquad j = 1, 2, \ldots, m$$
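The eigendecomposition above can be checked numerically; a short sketch with NumPy (the toy data are an assumption; eigh is used because R is symmetric):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                           [0.5, 1.0, 0.0],
                                           [0.0, 0.0, 0.3]])
R = (X.T @ X) / len(X)                 # sample correlation matrix

lam, Q = np.linalg.eigh(R)             # eigenvalues (ascending) and eigenvector columns
lam, Q = lam[::-1], Q[:, ::-1]         # reorder so that lambda_1 > lambda_2 > ... > lambda_m
Lambda = np.diag(lam)

print(np.allclose(Q.T @ R @ Q, Lambda))            # i)  Q^T R Q = Lambda
print(np.allclose(R, Q @ Lambda @ Q.T))            # ii) spectral theorem
print(np.allclose([q @ R @ q for q in Q.T], lam))  # psi(q_j) = lambda_j
```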

8.4 Principal-Components Analysis (3/8)

Summary of the eigenstructure of PCA:
1) The eigenvectors of the correlation matrix R for the random vector X define the unit vectors q_j, representing the principal directions along which the variance probes $\psi(q_j)$ have their extremal values.
2) The associated eigenvalues define the extremal values of the variance probes $\psi(q_j)$.

8.4 Principal-Components Analysis (4/8)

Data vector x: a realization of X; a: a realization of A.
$$a_j = q_j^{T}x = x^{T}q_j, \qquad j = 1, 2, \ldots, m$$
a_j: the projections of x onto the principal directions (the principal components).

Reconstruction (synthesis) of the original data x:
$$a = [a_1, a_2, \ldots, a_m]^{T} = [x^{T}q_1, x^{T}q_2, \ldots, x^{T}q_m]^{T} = Q^{T}x$$
$$Qa = QQ^{T}x = Ix = x$$
$$x = Qa = \sum_{j=1}^{m} a_j q_j$$
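A short sketch of analysis (projection) and synthesis (reconstruction) on assumed toy data, again not lecture code:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 4)) * np.array([3.0, 1.5, 0.8, 0.2])  # zero-mean toy data
R = (X.T @ X) / len(X)

lam, Q = np.linalg.eigh(R)
Q = Q[:, ::-1]                 # columns q_1, ..., q_m ordered by decreasing eigenvalue

x = X[0]                       # one data vector (a realization of X)
a = Q.T @ x                    # analysis: principal components a_j = q_j^T x
x_rec = Q @ a                  # synthesis: x = Q a = sum_j a_j q_j
print(np.allclose(x, x_rec))   # True: keeping all m components reconstructs x exactly
```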

8.4 Principal-Components Analysis (5/8)

Dimensionality reduction: let $\lambda_1, \lambda_2, \ldots, \lambda_\ell$ be the $\ell$ largest eigenvalues of R.

Decoder (reconstruction from $\ell$ components):
$$\hat{x} = \sum_{j=1}^{\ell} a_j q_j = [q_1, q_2, \ldots, q_\ell]\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_\ell \end{bmatrix}, \qquad \ell \le m$$

Encoder for x (linear projection from $\mathbb{R}^m$ to $\mathbb{R}^\ell$):
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_\ell \end{bmatrix} = \begin{bmatrix} q_1^{T} \\ q_2^{T} \\ \vdots \\ q_\ell^{T} \end{bmatrix} x, \qquad \ell \le m$$
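A minimal sketch of the two phases with ℓ < m, on the same kind of assumed toy data (not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 4)) * np.array([3.0, 1.5, 0.8, 0.2])
R = (X.T @ X) / len(X)
lam, Q = np.linalg.eigh(R)
Q = Q[:, ::-1]                       # eigenvectors by decreasing eigenvalue

ell = 2                              # keep the ell dominant principal directions
Q_ell = Q[:, :ell]                   # [q_1, ..., q_ell]

x = X[0]
a = Q_ell.T @ x                      # encoder: R^m -> R^ell
x_hat = Q_ell @ a                    # decoder: x_hat = sum_{j=1}^{ell} a_j q_j
print(np.linalg.norm(x - x_hat))     # approximation error grows as ell shrinks
```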

Figure 8.2: Two phases of PCA: (a) Encoding, (b) Decoding.

8.4 Principal-Components Analysis (6/8)

Approximation error vector:
$$e = x - \hat{x}$$
$$e = \sum_{i=\ell+1}^{m} a_i q_i$$
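A quick check of these relations on assumed toy data: the error is exactly the part of x carried by the discarded directions, and it is orthogonal to the reconstruction (compare Figure 8.3).

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 4)) * np.array([3.0, 1.5, 0.8, 0.2])
lam, Q = np.linalg.eigh((X.T @ X) / len(X))
Q, ell, x = Q[:, ::-1], 2, X[0]

x_hat = Q[:, :ell] @ (Q[:, :ell].T @ x)        # reconstruction from ell components
e = x - x_hat                                  # approximation error vector
e_tail = Q[:, ell:] @ (Q[:, ell:].T @ x)       # sum_{i=ell+1}^{m} a_i q_i
print(np.allclose(e, e_tail))                  # the error lives in the discarded directions
print(np.isclose(e @ x_hat, 0.0))              # and is orthogonal to x_hat
```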

Figure 8.3: Relationship between the data vector x, its reconstructed version x̂, and the error vector e.

8.4 Principal-Components Analysis (7/8)

Figure 8.4: A cloud of data points. The projection onto Axis 1 has maximum variance and exhibits a bimodal distribution.

8.4 Principal-Components Analysis (8/8)

Figure 8.5: Digital compression of handwritten digits using PCA.

8.5 Hebbian-Based Maximum Eigenfilter (1/4)

Linear neuron with Hebbian adaptation:
$$y = \sum_{i=1}^{m} w_i x_i$$

The synaptic weight $w_i$ varies with time:
$$w_i(n+1) = w_i(n) + \eta\, y(n)\, x_i(n), \qquad i = 1, 2, \ldots, m$$
Normalizing the updated weight vector and expanding to first order in $\eta$ gives
$$w_i(n+1) = w_i(n) + \eta\, y(n)\big(x_i(n) - y(n)\, w_i(n)\big)$$

With the effective input $x_i'(n) = x_i(n) - y(n)\, w_i(n)$, this becomes
$$w_i(n+1) = w_i(n) + \eta\, y(n)\, x_i'(n)$$

8.5 Hebbian-Based Maximum Eigenfilter (2/4)

Figure 8.6: Signal-flow graph representation of the maximum eigenfilter.

8.5 Hebbian-Based Maximum Eigenfilter (3/4)

Matrix formulation:
$$x(n) = [x_1(n), x_2(n), \ldots, x_m(n)]^{T}$$
$$w(n) = [w_1(n), w_2(n), \ldots, w_m(n)]^{T}$$
$$y(n) = x^{T}(n)\,w(n) = w^{T}(n)\,x(n)$$
$$\begin{aligned}
w(n+1) &= w(n) + \eta\, y(n)\,[\,x(n) - y(n)\,w(n)\,] \\
&= w(n) + \eta\,\big(x^{T}(n)w(n)\big)\big[\,x(n) - \big(w^{T}(n)x(n)\big)w(n)\,\big] \\
&= w(n) + \eta\,\big[\,x(n)x^{T}(n)w(n) - \big(w^{T}(n)x(n)x^{T}(n)w(n)\big)\,w(n)\,\big]
\end{aligned}$$
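A minimal sketch of this single-neuron maximum eigenfilter (Oja's rule) on assumed toy data; the final check corresponds to the convergence result quoted on the next slide:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n_steps, eta = 4, 50000, 0.005
scales = np.array([2.0, 1.0, 0.7, 0.3])   # input std devs, so R = diag(scales**2)

w = rng.normal(size=m)
w /= np.linalg.norm(w)                    # start from a random unit vector
for _ in range(n_steps):
    x = scales * rng.normal(size=m)       # zero-mean input sample
    y = w @ x                             # y(n) = w^T(n) x(n)
    w = w + eta * y * (x - y * w)         # w(n+1) = w(n) + eta*y(n)[x(n) - y(n)w(n)]

q1 = np.eye(m)[0]                         # dominant eigenvector of R for this toy setup
print(np.linalg.norm(w), abs(w @ q1))     # ||w|| -> 1 and w -> +/- q_1 (approximately)
```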

8.5 Hebbian-Based Maximum Eigenfilter (4/4)

Asymptotic stability of the maximum eigenfilter:
$$w(t) \to q_1 \quad \text{as } t \to \infty$$

A single linear neuron governed by this self-organizing learning rule adaptively extracts the first principal component of a stationary input:
$$x(n) = y(n)\, q_1 \quad \text{for } n \to \infty$$

A Hebbian-based linear neuron with the learning rule
$$w(n+1) = w(n) + \eta\, y(n)\,[x(n) - y(n)\, w(n)]$$
converges with probability 1 to a fixed point such that
1) $\lim_{n\to\infty} \sigma^{2}(n) = \lambda_1$
2) $\lim_{n\to\infty} w(n) = q_1$, with $\lim_{n\to\infty} \lVert w(n) \rVert = 1$

8.6 Hebbian-Based PCA (1/3)

Figure 8.7: Feedforward network with a single layer of computational nodes.

Linear neurons; m inputs and ℓ outputs, with ℓ < m.
$w_{ji}$: synaptic weight from input $i$ to neuron $j$
$$y_j(n) = \sum_{i=1}^{m} w_{ji}(n)\, x_i(n), \qquad j = 1, 2, \ldots, \ell$$

8.6 Hebbian-Based PCA (2/3)

Generalized Hebbian Algorithm (GHA):
$$y_j(n) = \sum_{i=1}^{m} w_{ji}(n)\, x_i(n), \qquad j = 1, 2, \ldots, \ell$$

Weight update rule:
$$\Delta w_{ji}(n) = \eta\left( y_j(n)\, x_i(n) - y_j(n) \sum_{k=1}^{j} w_{ki}(n)\, y_k(n) \right)$$

Rewriting as
$$\Delta w_{ji}(n) = \eta\, y_j(n)\left[\, x_i'(n) - w_{ji}(n)\, y_j(n) \,\right]$$
with
$$x_i'(n) = x_i(n) - \sum_{k=1}^{j-1} w_{ki}(n)\, y_k(n)$$
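A compact sketch of the GHA update in matrix form (data, learning rate, and names are assumptions); the lower-triangular term implements the $\sum_{k \le j}$ deflation above:

```python
import numpy as np

def gha_step(W, x, eta=0.002):
    """One GHA step. W has shape (ell, m); row j holds the weights w_j1, ..., w_jm."""
    y = W @ x                                   # y_j(n) = sum_i w_ji(n) x_i(n)
    # Delta w_ji = eta * ( y_j x_i - y_j * sum_{k<=j} w_ki y_k )
    return W + eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

rng = np.random.default_rng(7)
m, ell = 4, 2
scales = np.array([2.0, 1.0, 0.7, 0.3])         # toy input with R = diag(scales**2)
W = 0.1 * rng.normal(size=(ell, m))

for _ in range(50000):
    x = scales * rng.normal(size=m)
    W = gha_step(W, x)

print(np.round(W, 2))   # rows approach (up to sign) the two dominant eigenvectors q_1, q_2
```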

8.6 Hebbian-Based PCA (3/3)

Figure 8.8: Signal-flow graph of the GHA.

$$\Delta w_{ji}(n) = \eta\, y_j(n)\, x_i''(n), \qquad x_i''(n) = x_i'(n) - w_{ji}(n)\, y_j(n)$$
$$w_{ji}(n+1) = w_{ji}(n) + \Delta w_{ji}(n), \qquad w_{ji}(n) = z^{-1}[\,w_{ji}(n+1)\,]$$

8.7 Case Study: Image Decoding (1/3)

Figure 8.9: Signal-flow graph representation of how the reconstructed vector x̂ is computed in the GHA.

$$\hat{x}(n) = \sum_{k=1}^{\ell} y_k(n)\, q_k$$
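As a sketch of how this reconstruction could be applied to image blocks in the spirit of the case study (the 8x8 block setup mirrors Figure 8.10, but the code and the helper name are assumptions):

```python
import numpy as np

def encode_decode_blocks(blocks, W):
    """Encode flattened image blocks with learned GHA weights and reconstruct them.

    blocks: array of shape (n_blocks, m), each row a flattened 8x8 patch (m = 64).
    W:      trained GHA weight matrix of shape (ell, m); rows approximate q_1, ..., q_ell.
    """
    Y = blocks @ W.T            # encoding:  y_k = q_k^T x  for each block
    return Y @ W                # decoding:  x_hat = sum_{k=1}^{ell} y_k q_k

# hypothetical usage with a trained W and an image cut into 8x8 patches:
# patches = image_to_8x8_blocks(img)           # assumed helper, not shown here
# reconstructed = encode_decode_blocks(patches, W)
```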

8.7 Case Study: Image Decoding (2/3)

Figure 8.10: (a) An image of Lena used in the image-coding experiment. (b) 8 x 8 masks representing the synaptic weights learned by the GHA. (c) Reconstructed image of Lena obtained using the dominant 8 principal components without quantization. (d) Reconstructed image of Lena with an 11-to-1 compression ratio using quantization.

8.7 Case Study: Image Decoding (3/3)

Figure 8.11: Image of peppers.

Summary and Discussion

• PCA = dimensionality reduction = compression
• Generalized Hebbian algorithm (GHA) = a neural algorithm for PCA
• Dimensionality reduction
  1) Representation of data
  2) Reconstruction of data
• Two views of unsupervised learning
  1) Bottom-up view
  2) Top-down view
• Nonlinear PCA methods
  1) Hebbian networks
  2) Replicator networks or autoencoders
  3) Principal curves
  4) Kernel PCA
