Information Geometry and Neural Networks

Post on 12-Jan-2016


DESCRIPTION

Information Geometry and Neural Networks. Shun-ichi Amari, RIKEN Brain Science Institute. Orthogonal decomposition of rates and (higher-order) correlations. Synchronous firing and higher correlations. Algebraic singularities caused by multiple stimuli.

TRANSCRIPT

Information Geometry and Neural Networks

Shun-ichi Amari
RIKEN Brain Science Institute

Orthogonal decomposition of rates and (higher-order) correlations

Synchronous firing and higher correlations

Algebraic singularities caused by multiple stimuli

Dynamics of learning in multilayer perceptrons

Information Geometry

Related fields: Systems Theory, Information Theory, Statistics, Neural Networks, Combinatorics, Physics, Information Sciences, Math., AI

Riemannian Manifold, Dual Affine Connections

Manifold of Probability Distributions

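$$S = \{p(x;\mu,\sigma)\}, \qquad p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$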

Information Geometry

$S = \{p(x;\theta)\}$, $\theta = (\mu, \sigma)$

Riemannian metric

Dual affine connections

Manifold of Probability Distributions

$x = 1, 2, 3$, $S = \{p(x)\}$

$p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$

[Figure: the probability simplex with axes $p_1$, $p_2$, $p_3$, a point $p$, and a submanifold $M = \{p(x;\theta)\}$]

Two Structures

Riemannian metric and affine connection

$$ds^2 = \sum_{i,j} g_{ij}\,d\theta_i\,d\theta_j$$

$$D[p:q] = E_p\left[\log\frac{p(x)}{q(x)}\right]$$

$$D[p(x;\theta):p(x;\theta+d\theta)] = \frac{1}{2}\,ds^2$$

Fisher information

$$g_{ij} = E\left[\partial_i \log p\;\partial_j \log p\right], \qquad \partial_i = \frac{\partial}{\partial\theta_i}$$
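The relation between the KL divergence and the Fisher metric can be checked numerically. The sketch below is only an illustration: it assumes the Gaussian family with coordinates $\theta = (\mu, \sigma)$, for which the Fisher matrix is known to be $\mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$, and the function names `kl_gauss` and `fisher_ds2` are made up for this example.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form KL divergence D[N(mu1, s1^2) : N(mu2, s2^2)]."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def fisher_ds2(sigma, dmu, dsigma):
    """ds^2 = g_ij dtheta_i dtheta_j for theta = (mu, sigma).

    Uses the Gaussian Fisher matrix diag(1/sigma^2, 2/sigma^2).
    """
    return (dmu**2 + 2 * dsigma**2) / sigma**2

mu, sigma = 0.0, 1.5
dmu, dsigma = 1e-3, 2e-3

# For a small displacement, twice the KL divergence should match ds^2.
two_kl = 2 * kl_gauss(mu, sigma, mu + dmu, sigma + dsigma)
ds2 = fisher_ds2(sigma, dmu, dsigma)
print(two_kl, ds2)
```

The two printed numbers agree up to terms of third order in the displacement, which is the sense in which the divergence defines the squared line element.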

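Riemannian Structure

$$ds^2 = \sum_{i,j} g_{ij}(\theta)\,d\theta^i d\theta^j = d\theta^T G(\theta)\,d\theta, \qquad G(\theta) = (g_{ij}(\theta))$$

Euclidean: $G = E$ (the identity matrix)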

Affine Connection

covariant derivative: $\nabla_X Y$

geodesic: $\nabla_{\dot X}\dot X = c(t)\,\dot X$, $X = X(t)$ (straight line)

minimal distance: $s = \int \sqrt{g_{ij}(\theta)\,d\theta^i d\theta^j}$

$S = \{p(x_1, x_2)\}$, $x_1, x_2 = 0, 1$

$M = \{q(x_1)\,q(x_2)\}$: Independent Distributions

Neural Firing

[Figure: neurons $x_1, x_2, x_3, \ldots, x_n$]

higher-order correlations

orthogonal decomposition

$p(\mathbf{x}) = p(x_1, x_2, \ldots, x_n)$

$\eta_i = E[x_i]$: firing rate

$v_{ij} = \mathrm{Cov}[x_i, x_j]$: covariance

Information Geometry of Higher-Order Correlations: orthogonal decomposition

Riemannian metric

dual affine connections

Pythagoras theorem

Dual geodesics

$S = \{p(\mathbf{x};\theta)\}$

Correlations of Neural Firing

$p(x_1, x_2)$: $\{p_{00}, p_{10}, p_{01}, p_{11}\}$

$\eta_1 = p_{1\cdot}$, $\eta_2 = p_{\cdot 1}$ (firing rates)

$\theta = \log\dfrac{p_{11}\,p_{00}}{p_{10}\,p_{01}}$ (correlation)

$\{(\eta_1, \eta_2), \theta\}$: orthogonal coordinates

[Figure: two neurons $x_1$, $x_2$]
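As a small illustration of these coordinates, the sketch below computes the two firing rates and the log odds ratio from the four joint probabilities of a neuron pair. The index convention $p_{ab} = \mathrm{Prob}\{x_1 = a, x_2 = b\}$ and the function name `orthogonal_coordinates` are assumptions made for this example.

```python
import math

def orthogonal_coordinates(p00, p10, p01, p11):
    """Return (eta1, eta2, theta) for two binary neurons.

    Assumed convention: p_ab = Prob{x1 = a, x2 = b}.
    """
    eta1 = p10 + p11                              # firing rate of x1
    eta2 = p01 + p11                              # firing rate of x2
    theta = math.log((p11 * p00) / (p10 * p01))   # log odds ratio
    return eta1, eta2, theta

# Independent neurons with rates 0.2 and 0.7: theta vanishes.
e1, e2 = 0.2, 0.7
ind = orthogonal_coordinates((1 - e1) * (1 - e2), e1 * (1 - e2),
                             (1 - e1) * e2, e1 * e2)

# A correlated pair with the same firing rates: theta is positive.
cor = orthogonal_coordinates(0.28, 0.02, 0.52, 0.18)
print(ind, cor)
```

Both distributions share $(\eta_1, \eta_2) = (0.2, 0.7)$ and differ only in $\theta$, which is what makes $\theta$ a measure of correlation that does not depend on the firing rates.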

[Figure: spike trains of neurons $x_1$, $x_2$, $x_3$]

firing rates: correlation or covariance?

$\{p_{00}, p_{01}, p_{10}, p_{11}\}$

$(\eta_1, \eta_2;\ \theta_{12})$

$S = \{p(x_1, x_2)\}$, $x_1, x_2 = 0, 1$

$M = \{q(x_1)\,q(x_2)\}$: Independent Distributions

Pythagoras Theorem

$$D[p:r] = D[p:q] + D[q:r]$$

p, q: same marginals $(\eta_1, \eta_2)$

r, q: same correlations $\theta$

[Figure: points $p$, $q$, $r$, with directions labelled independent and correlations]

$$D[p:r] = \sum_x p(x)\log\frac{p(x)}{r(x)}$$

estimation and testing of correlation

invariant under firing rates
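The Pythagorean relation can be verified numerically for a neuron pair. In this sketch, $q$ is taken to be the independent distribution with the same marginals as $p$, and $r$ is an arbitrary independent distribution, so $q$ and $r$ share $\theta = 0$. The helper names and the state ordering 00, 10, 01, 11 are assumptions for the example.

```python
import math

def kl(p, r):
    """KL divergence D[p : r] for two distributions given as lists."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r) if pi > 0)

def independent(e1, e2):
    """Product distribution over (x1, x2), states ordered 00, 10, 01, 11."""
    return [(1 - e1) * (1 - e2), e1 * (1 - e2), (1 - e1) * e2, e1 * e2]

def project_to_independent(p):
    """Independent distribution with the same marginals as p."""
    p00, p10, p01, p11 = p
    return independent(p10 + p11, p01 + p11)

p = [0.4, 0.1, 0.2, 0.3]          # correlated pair
q = project_to_independent(p)     # same marginals as p, no correlation
r = independent(0.5, 0.25)        # another independent distribution

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(lhs, rhs)
```

The term `kl(p, q)` measures the correlation carried by `p`, and `kl(q, r)` measures the difference in firing rates only.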

[Figure: spike trains of neurons $x_1$, $x_2$, $x_3$]

No pairwise correlations, triplewise correlation

$$p(x_1, x_2, x_3) \neq p(x_1)\,p(x_2)\,p(x_3)$$

$$p(x_1, x_2) = p(x_1)\,p(x_2)$$
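A standard example of this situation, chosen here for illustration and not taken from the slides, is the XOR distribution: $x_1$ and $x_2$ are independent fair coins and $x_3 = x_1 \oplus x_2$. Every pair is independent, yet the triple is not.

```python
from itertools import product

# Joint distribution: x1, x2 fair and independent, x3 = x1 XOR x2.
p = {x: (0.25 if x[2] == (x[0] ^ x[1]) else 0.0)
     for x in product((0, 1), repeat=3)}

def marginal(p, idx):
    """Marginal distribution of the variables listed in idx."""
    out = {}
    for x, px in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + px
    return out

def is_product(p, n):
    """True if p over n binary variables factorizes into its single marginals."""
    singles = [marginal(p, (i,)) for i in range(n)]
    for x, px in p.items():
        prod = 1.0
        for i, xi in enumerate(x):
            prod *= singles[i][(xi,)]
        if abs(px - prod) > 1e-12:
            return False
    return True

pairwise_independent = all(is_product(marginal(p, pair), 2)
                           for pair in [(0, 1), (0, 2), (1, 2)])
triple_independent = is_product(p, 3)
print(pairwise_independent, triple_independent)
```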

Pythagoras Decomposition of KL Divergence

[Figure: $p(\mathbf{x})$, $p_{\text{pairwise corr}}(\mathbf{x})$ (only pairwise), $p_{\text{ind}}(\mathbf{x})$ (independent)]

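Higher-Order Correlations

$$p(\mathbf{x}) = \exp\left\{\sum_i \theta_i x_i + \sum_{i<j}\theta_{ij}\,x_i x_j + \sum_{i<j<k}\theta_{ijk}\,x_i x_j x_k + \cdots - \psi\right\}, \qquad \mathbf{x} = (x_1, x_2, \ldots, x_n)$$

$\eta_i = E[x_i]$, $\eta_{ij} = E[x_i x_j]$, ...

$\theta = (\theta_i, \theta_{ij}, \theta_{ijk}, \ldots)$

$\eta = (\eta_i, \eta_{ij}, \eta_{ijk}, \ldots)$

[Figure: submanifolds $M_0$, $M_1$]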
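For a small number of neurons both coordinate systems can be computed directly from the joint probability table. The sketch below assumes a strictly positive distribution and obtains each $\theta$ by inclusion-exclusion over $\log p$, evaluated at the patterns where only a subset of neurons fires; the function names are hypothetical.

```python
import math
from itertools import combinations, product

def subsets(s):
    """All subsets of s (as tuples), including the empty one."""
    s = tuple(s)
    for k in range(len(s) + 1):
        yield from combinations(s, k)

def theta_coords(p, n):
    """theta_A for every non-empty index set A, from a positive table p."""
    def logp(B):
        x = tuple(1 if i in B else 0 for i in range(n))
        return math.log(p[x])
    return {A: sum((-1) ** (len(A) - len(B)) * logp(B) for B in subsets(A))
            for A in subsets(range(n)) if A}

def eta_coords(p, n):
    """eta_A = E[product of x_i over i in A] for every non-empty A."""
    return {A: sum(px for x, px in p.items() if all(x[i] for i in A))
            for A in subsets(range(n)) if A}

# Three independent neurons with rates 0.2, 0.5, 0.7.
rates = (0.2, 0.5, 0.7)
p = {x: math.prod(r if xi else 1 - r for xi, r in zip(x, rates))
     for x in product((0, 1), repeat=3)}

theta = theta_coords(p, 3)
eta = eta_coords(p, 3)
print(theta[(0, 1)], theta[(0, 1, 2)], eta[(0, 1)])
```

For independent neurons all interaction terms $\theta_{ij}$ and $\theta_{ijk}$ vanish, while $\theta_i$ reduces to the log odds of the firing rate.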

Synfiring and Higher-Order Correlations

Amari, Nakahara, Wu, Sakai

Neurons $x_1, x_2, \ldots, x_n$

$x_i = 1(u_i)$ (1 if $u_i > 0$, else 0)

$u_i$: Gaussian, $E[u_i u_j] = \alpha$

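Population and Synfire

$u_i = \sum_j w_{ij}\,s_j - h$, $x_i = 1(u_i)$

$u_i = \sqrt{1-\alpha}\;\varepsilon_i + \sqrt{\alpha}\;\varepsilon - h$

$\varepsilon_i, \varepsilon \sim N(0, 1)$

[Figure: common input $s$ to neurons $x_1, \ldots, x_n$]

$E[u_i u_j] = \alpha$, $E[u_i^2] = 1$ (for $h = 0$)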
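The effect of the common input on population activity can be seen in a short simulation. This sketch assumes the form $u_i = \sqrt{1-\alpha}\,\varepsilon_i + \sqrt{\alpha}\,\varepsilon - h$ with standard normal $\varepsilon_i$ and $\varepsilon$; the function `simulate_r` and its parameter values are choices made for the example.

```python
import math
import random

def simulate_r(n, alpha, h, trials, seed=0):
    """Fraction r of the n neurons that fire, for a number of independent trials."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        common = rng.gauss(0, 1)          # shared input, same for all neurons
        fired = 0
        for _ in range(n):
            u = (math.sqrt(1 - alpha) * rng.gauss(0, 1)
                 + math.sqrt(alpha) * common - h)
            fired += u > 0                # x_i = 1 if u_i > 0
        rs.append(fired / n)
    return rs

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

r_ind = simulate_r(n=200, alpha=0.0, h=0.0, trials=200)
r_cor = simulate_r(n=200, alpha=0.5, h=0.0, trials=200)
print(variance(r_ind), variance(r_cor))
```

Without the common input the firing fraction concentrates near a single value; with $\alpha > 0$ it spreads widely across trials, which is the population-level signature of synchronous firing.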

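$p_i = \mathrm{Prob}\{i \text{ neurons fire at the same time}\}$

$$p_i = \binom{n}{i}\,E_\varepsilon\left[F^i\,(1-F)^{n-i}\right]$$

$$F = \Pr\{x_i = 1 \mid \varepsilon\} = \Pr\{u_i > 0 \mid \varepsilon\} = \Pr\left\{\varepsilon_i > \frac{h - \sqrt{\alpha}\,\varepsilon}{\sqrt{1-\alpha}}\right\}$$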

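$r = i/n$, $P_r = \Pr\{nr \text{ neurons fire}\}$

$$P_r \approx E_\varepsilon\left[e^{nH(r)}\,e^{nz}\right], \qquad z = r\log F + (1-r)\log(1-F)$$

[Equation: $F$ written as a Gaussian error integral of the form $\frac{1}{\sqrt{2\pi}}\int e^{-t^2/2}\,dt$; limits not recoverable]

[Equation: closed form of $q(r)$, an exponential of a quadratic in $F^{-1}(r)$ and $h$ with coefficients depending on $\alpha$; exact expression not recoverable]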

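$$p(\mathbf{x}, \theta) = \exp\left\{\sum \theta_i x_i + \sum \theta_{ij}\,x_i x_j + \sum \theta_{ijk}\,x_i x_j x_k + \cdots - \psi\right\}$$

$$\theta_{i_1 \cdots i_k} = O(1/n^{k-1})$$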

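Synfiring

$p(\mathbf{x}) = p(x_1, \ldots, x_n)$

$r = \dfrac{1}{n}\sum_i x_i$, with distribution $q(r)$

[Figure: $q(r)$ plotted against $r$]

Bifurcation

[Figure: $P_r$ plotted against $r$]

$x_i$ independent: single delta peak

pairwise correlated

higher-order correlation!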

Shun-ichi Amari
RIKEN Brain Science Institute

amari@brain.riken.go.jp

Collaborators: Si Wu, Hiro Nakahara

Field Theory of Population Coding

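$x^*$, $r(z) \mid x^*$

$r(z) = f(z - x^*) + \varepsilon(z)$

$f(z) = \exp\left\{-\dfrac{z^2}{2a^2}\right\}$

Population Coding and Neural Field

Population Encoding: $r(z) = f(z - x) + \varepsilon(z)$

decoding: $r(z) \to \hat{x}$

[Figure: tuning function $f(z - x)$ and noisy response $r(z)$ over the field coordinate $z$, with stimulus $x$]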

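Noise

$\langle \varepsilon(z) \rangle = 0$

$\langle \varepsilon(z)\,\varepsilon(z') \rangle = \sigma^2\,h(z, z')$

$h(z, z') = (1-\beta)\,\delta(z - z') + \beta\exp\left\{-\dfrac{(z - z')^2}{2b^2}\right\}$

[The extracted text also shows a factor $n$ in these formulas; its position is not recoverable.]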

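Probability Model

$$Q(r(z) \mid x) = c\exp\left\{-\frac{1}{2\sigma^2}\,\big(r(z) - f(z-x)\big)\,h^{-1}\,\big(r(z) - f(z-x)\big)\right\}$$

$$r(z)\,h^{-1}\,r(z) = \iint r(z)\,h^{-1}(z, z')\,r(z')\,dz\,dz'$$

$$\int h^{-1}(z, z')\,h(z', z'')\,dz' = \delta(z - z'')$$

$r(z) = f(z - x) + \varepsilon(z)$

[The extracted text also shows a factor $n$ in the exponent; its position is not recoverable.]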

Fisher information

$$I(x^*) = E\left[\left(\frac{d}{dx^*}\log Q(r \mid x^*)\right)^2\right]$$

Cramér-Rao

$$E\left[(\hat{x} - x^*)^2\right] \ge \frac{1}{I(x^*)}$$
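To make the Cramér-Rao bound concrete, the sketch below evaluates the Fisher information for a simplified setting that is an assumption of this example, not the field model of the slides: a finite set of neurons at positions `centers`, Gaussian tuning of width `a`, and independent noise of variance `sigma**2` (no correlation term). In that case $I(x) = \sum_i f'(z_i - x)^2 / \sigma^2$.

```python
import math

def tuning(z, a):
    """Gaussian tuning function f(z) = exp(-z^2 / (2 a^2))."""
    return math.exp(-z * z / (2 * a * a))

def fisher_information(x, centers, a, sigma):
    """I(x) for r_i = f(z_i - x) + independent N(0, sigma^2) noise."""
    total = 0.0
    for z in centers:
        d = z - x
        dfdx = (d / (a * a)) * tuning(d, a)   # derivative of f(z - x) w.r.t. x
        total += dfdx ** 2
    return total / sigma ** 2

def grid(n, lo=-10.0, hi=10.0):
    """n equally spaced neuron positions on [lo, hi]."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

I_coarse = fisher_information(0.0, grid(101), a=1.0, sigma=0.5)
I_fine = fisher_information(0.0, grid(201), a=1.0, sigma=0.5)

# Cramer-Rao: no unbiased decoder has mean squared error below 1 / I.
print(1 / I_coarse, 1 / I_fine)
```

Doubling the density of neurons doubles the Fisher information in this uncorrelated case, so the bound on the decoding error halves.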

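Fourier Analysis

$$F(\omega) = \frac{1}{\sqrt{2\pi}}\int f(z)\,e^{-i\omega z}\,dz$$

$$H(\omega) = \frac{1}{\sqrt{2\pi}}\int h(z)\,e^{-i\omega z}\,dz, \qquad h(z - z') = h(z, z')$$

$$I \propto \int \frac{\omega^2\,|F(\omega)|^2}{H(\omega)}\,d\omega$$

[Prefactor involving $n$ and $\sigma^2$ not recoverable.]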

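Fisher Information

[Equation: $I$ for the Gaussian tuning function of width $a$ and the noise correlation of range $b$, an integral over $\omega$ with $e^{-a^2\omega^2}$ in the numerator and a denominator depending on $n$, $b$ and $e^{-b^2\omega^2/2}$; exact expression not recoverable]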

1) No correlation: $\beta = 0$

2) Uniform correlations

3) Limited range correlations

4) Wide range correlations

5) Special case

[The Fisher information $I$ for each case is given in terms of $n$, $a$, $b$ and $c$; the expressions are not recoverable.]

Dynamics of Neural Fields

$$\frac{\partial u(z, t)}{\partial t} = -u(z, t) + \int w(z, z')\,\varphi[u(z', t)]\,dz' + c\,r(z)$$

Shaping, Detecting, Decoding

How the Brain Solves Singularity in Population Coding

S. Amari and H. Nakahara

RIKEN Brain Science Institute

[Figure: two stimuli $x_1$, $x_2$ on the field $Z$]

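Neural Activity

$$r(z) = (1-v)\,\varphi(z - x_1) + v\,\varphi(z - x_2) + \varepsilon(z)$$

$$Q(r(z); v, x_1, x_2) = c\exp\left\{-\frac{1}{2\sigma^2}\,(r - f)\,h^{-1}\,(r - f)\right\}$$

$$I_{ij} = E\left[\partial_i \log Q\;\partial_j \log Q\right], \qquad I = (I_{ij}): \text{Fisher information matrix}$$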

Parameter Space

[Figure: parameter space with axes $v$, $x_1$, $x_2$]

$u = x_2 - x_1$: difference

$w = (1-v)\,x_1 + v\,x_2$: center of gravity

$\xi = (w, u, v)$

Fisher information $I$ degenerates as $u \to 0$

Cramér-Rao paradigm: error
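The degeneration of the Fisher information can be observed numerically. The sketch below makes several simplifying assumptions that are not in the slides: the field is sampled on a grid `zs`, the noise is independent with unit variance, $\varphi(y) = e^{-y^2/2}$, and the mean response is $(1-v)\varphi(z - x_1) + v\varphi(z - x_2)$. It prints the determinant of the $3 \times 3$ Fisher matrix in $(x_1, x_2, v)$ as the two stimuli approach each other.

```python
import math

def phi(y):
    return math.exp(-0.5 * y * y)

def fisher_matrix(x1, x2, v, zs, sigma=1.0):
    """Fisher matrix w.r.t. (x1, x2, v) under independent N(0, sigma^2) noise."""
    I = [[0.0] * 3 for _ in range(3)]
    for z in zs:
        grad = [
            (1 - v) * (z - x1) * phi(z - x1),   # d f / d x1
            v * (z - x2) * phi(z - x2),         # d f / d x2
            phi(z - x2) - phi(z - x1),          # d f / d v
        ]
        for i in range(3):
            for j in range(3):
                I[i][j] += grad[i] * grad[j] / sigma ** 2
    return I

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

zs = [-10 + 0.1 * k for k in range(201)]
for u in (1.0, 0.5, 0.1):
    # Stimuli placed symmetrically, separated by u.
    print(u, det3(fisher_matrix(-u / 2, u / 2, 0.3, zs)))
```

The determinant falls steeply as $u$ shrinks, so the inverse Fisher matrix, and with it the Cramér-Rao bound, blows up near $u = 0$.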

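$$f(z; \zeta) \approx \left\{1 + \zeta_2\,H_2(z - \zeta_1) + \zeta_3\,H_3(z - \zeta_1)\right\}\varphi(z - \zeta_1)$$

$$\zeta_1 = w, \qquad \zeta_2 = \frac{1}{2}\,v(1-v)\,u^2, \qquad \zeta_3 = \frac{1}{6}\,v(1-v)(1-2v)\,u^3$$

$$I_\zeta = \begin{pmatrix} g_1 & & \\ & g_2 & \\ & & g_3 \end{pmatrix}$$

$J = \partial\zeta / \partial\xi$: Jacobian, singular

$$I_\xi = J^T I_\zeta\,J$$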

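$\Delta w \sim O(1)$

$\Delta u \sim O\!\left(\dfrac{1}{u^2}\right)$

$\Delta v \sim O\!\left(\dfrac{1}{u^3}\right)$

$\Delta x_i \sim O\!\left(\dfrac{1}{u^2}\right)$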

synfiring resolves singularity

phase 1: $f_1(z)$, phase 2: $f_2(z)$, two mixtures of $\varphi(z - x_1)$ and $\varphi(z - x_2)$ with different weights [weights not recoverable]

$I$: regular as $u \to 0$

[Figure: synfiring mechanism, with stimuli $x_1$, $x_2$ on the field $Z$ and responses $z_1$, $z_2$]

common multiplicative noise

S. Amari and H. Nagaoka, Methods of Information Geometry, AMS & Oxford Univ. Press, 2000

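Mathematical Neurons

$$y = \varphi\left(\sum_i w_i x_i - h\right) = \varphi(\mathbf{w}\cdot\mathbf{x})$$

[Figure: input $\mathbf{x}$, output $y$, and the activation function $\varphi(u)$ plotted against $u$]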

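Multilayer Perceptrons

$$y = \sum_i v_i\,\varphi(\mathbf{w}_i\cdot\mathbf{x}) + n$$

$$p(y \mid \mathbf{x}; \theta) = c\exp\left\{-\frac{1}{2}\big(y - f(\mathbf{x}, \theta)\big)^2\right\}$$

$$f(\mathbf{x}, \theta) = \sum_i v_i\,\varphi(\mathbf{w}_i\cdot\mathbf{x})$$

$\mathbf{x} = (x_1, x_2, \ldots, x_n)$

$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m;\ v_1, \ldots, v_m)$

[Figure: network mapping $\mathbf{x}$ to $y$]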

Multilayer Perceptron

$$y = f(\mathbf{x}, \theta) = \sum_i v_i\,\varphi(\mathbf{w}_i\cdot\mathbf{x})$$

$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m;\ v_1, \ldots, v_m)$

neuromanifold

space of functions

Neuromanifold

• Metrical structure

• Topological structure

Riemannian manifold

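$$ds^2 = |d\theta|^2 = \sum_{i,j} g_{ij}\,d\theta_i\,d\theta_j = d\theta^T G\,d\theta$$

[Figure: displacement $d\theta$ in coordinates $\theta_i$, $\theta_j$]

$$g_{ij}(\theta) = E\left[\frac{\partial \log p(y \mid \mathbf{x}; \theta)}{\partial\theta_i}\,\frac{\partial \log p(y \mid \mathbf{x}; \theta)}{\partial\theta_j}\right]$$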

Geometry of singular model

$y = v\,\varphi(\mathbf{w}\cdot\mathbf{x}) + n$

singular: $v\,|\mathbf{w}| = 0$

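Gaussian mixture

$$p(x; v, w_1, w_2) = v\,\psi(x - w_1) + (1-v)\,\psi(x - w_2)$$

$$\psi(x) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{1}{2}x^2\right\}$$

singular: $w_1 = w_2$, $v(1-v) = 0$

[Figure: parameter space with axes $w_1$, $w_2$, $v$]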
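The singular set of the Gaussian mixture is where distinct parameter values give the same distribution. A minimal check, with `psi` and `mixture` as illustrative names:

```python
import math

def psi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def mixture(x, v, w1, w2):
    """p(x; v, w1, w2) = v psi(x - w1) + (1 - v) psi(x - w2)."""
    return v * psi(x - w1) + (1 - v) * psi(x - w2)

xs = [-2.0, -0.5, 0.3, 1.7]

# On w1 = w2 the mixing weight v has no effect on the density.
same_center = [(mixture(x, 0.2, 1.0, 1.0), mixture(x, 0.9, 1.0, 1.0)) for x in xs]

# On v = 0 the first center w1 has no effect on the density.
zero_weight = [(mixture(x, 0.0, 5.0, 1.0), mixture(x, 0.0, -2.0, 1.0)) for x in xs]

print(same_center)
print(zero_weight)
```

Because a whole line of parameter values maps to one distribution, the Fisher information matrix loses rank there, which is why these points are called singular.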

Topological Singularities

[Figure: manifolds $S$ and $M$ with singularities marked]

Singularity of MLP: example

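Backpropagation: gradient learning

examples: $(y_1, \mathbf{x}_1), \ldots, (y_t, \mathbf{x}_t)$: training set

$$E(y, \mathbf{x}; \theta) = \frac{1}{2}\,\big|y - f(\mathbf{x}, \theta)\big|^2 = -\log p(y \mid \mathbf{x}; \theta) + \text{const}$$

$$\Delta\theta_t = -\eta_t\,\frac{\partial E}{\partial\theta}$$

$$f(\mathbf{x}, \theta) = \sum_i v_i\,\varphi(\mathbf{w}_i\cdot\mathbf{x})$$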
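A single step of this gradient rule can be written out for a small network. The sketch assumes $\varphi = \tanh$ and uses hypothetical names (`mlp`, `sgd_step`) and arbitrary example values; it is meant only to show the form of the update for $\theta = (\mathbf{w}_i, v_i)$.

```python
import math

def mlp(x, W, v):
    """f(x, theta) = sum_i v_i * tanh(w_i . x); tanh is an assumed choice of phi."""
    return sum(vi * math.tanh(sum(wij * xj for wij, xj in zip(wi, x)))
               for wi, vi in zip(W, v))

def sgd_step(x, y, W, v, eta):
    """One update theta <- theta - eta * dE/dtheta, with E = (y - f)^2 / 2."""
    acts = [math.tanh(sum(wij * xj for wij, xj in zip(wi, x))) for wi in W]
    err = y - sum(vi * a for vi, a in zip(v, acts))
    # dE/dv_i = -err * phi(w_i . x)
    new_v = [vi + eta * err * a for vi, a in zip(v, acts)]
    # dE/dw_ij = -err * v_i * phi'(w_i . x) * x_j, with tanh' = 1 - tanh^2
    new_W = [[wij + eta * err * vi * (1 - a * a) * xj for wij, xj in zip(wi, x)]
             for wi, vi, a in zip(W, v, acts)]
    return new_W, new_v

W = [[0.5, -0.3], [0.2, 0.4]]   # hidden-unit weight vectors w_1, w_2
v = [1.0, -0.5]                 # output weights v_1, v_2
x, y = [1.0, 2.0], 1.0          # one training example

error_before = 0.5 * (y - mlp(x, W, v)) ** 2
W2, v2 = sgd_step(x, y, W, v, eta=0.01)
error_after = 0.5 * (y - mlp(x, W2, v2)) ** 2
print(error_before, error_after)
```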

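Information Geometry of MLP

Natural Gradient Learning: S. Amari; H.Y. Park

$$\Delta\theta_t = -\eta_t\,G^{-1}\,\frac{\partial E}{\partial\theta}$$

$$G_{t+1}^{-1} = (1 + \varepsilon_t)\,G_t^{-1} - \varepsilon_t\,G_t^{-1}\,\nabla f\,\nabla f^T\,G_t^{-1}$$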
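The two formulas can be sketched in plain Python for a small parameter vector. The code assumes $G^{-1}$ is kept symmetric, so that $G^{-1}\nabla f\,\nabla f^T G^{-1}$ equals the outer product of $G^{-1}\nabla f$ with itself; the function names and example numbers are illustrative.

```python
def mat_vec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def update_inverse_metric(G_inv, grad_f, eps):
    """G_inv_next = (1 + eps) G_inv - eps (G_inv grad_f)(G_inv grad_f)^T.

    Assumes G_inv is symmetric.
    """
    g = mat_vec(G_inv, grad_f)
    n = len(g)
    return [[(1 + eps) * G_inv[i][j] - eps * g[i] * g[j] for j in range(n)]
            for i in range(n)]

def natural_gradient_step(theta, grad_E, G_inv, eta):
    """theta_next = theta - eta * G_inv * grad_E."""
    direction = mat_vec(G_inv, grad_E)
    return [t - eta * d for t, d in zip(theta, direction)]

identity = [[1.0, 0.0], [0.0, 1.0]]
G_inv_next = update_inverse_metric(identity, grad_f=[1.0, 0.0], eps=0.1)

theta_next = natural_gradient_step(theta=[0.0, 0.0], grad_E=[2.0, 4.0],
                                   G_inv=[[0.5, 0.0], [0.0, 0.25]], eta=0.1)
print(G_inv_next, theta_next)
```

The rank-one form of the update avoids inverting $G$ at every step, which is the practical point of tracking $G^{-1}$ directly.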

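2 hidden-units:

$$y = v_1\,\varphi(\mathbf{w}_1\cdot\mathbf{x}) + v_2\,\varphi(\mathbf{w}_2\cdot\mathbf{x}) + n$$

New coordinates $(\mathbf{w}, v, \mathbf{u}, z)$:

$\mathbf{w}$, $v$: combinations of $(\mathbf{w}_1, \mathbf{w}_2)$ and $(v_1, v_2)$ [exact expressions not recoverable]

$\mathbf{u} = \mathbf{w}_2 - \mathbf{w}_1$

$z$: difference of $v_2$ and $v_1$ [normalization not recoverable]

[Figure: network mapping $\mathbf{x}$ to $y$ through two hidden units with weights $\mathbf{w}_1$, $\mathbf{w}_2$ and output weights $v_1$, $v_2$]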

Dynamics of Learning

$$\frac{d\theta}{dt} = -\eta\,\nabla l, \qquad \frac{d\theta}{dt} = -\eta\,G^{-1}\,\nabla l$$

$$\frac{du}{dt} = f(u, z), \qquad \frac{dz}{dt} = k(u, z), \qquad \frac{du}{dz} = \frac{f(u, z)}{k(u, z)}$$

The teacher is on singularity

[Equations: averaged learning equations for $du/dt$ and $dz/dt$ with a common factor $\frac{1}{4}A$, and the resulting trajectories in the $(u, z)$ plane, which relate $u^2$ to $z^2$ and $\log z$ up to a constant $c$; exact expressions not recoverable]
