llu s a. belanche [email protected]/docencia/apren/2009-10/teoria/tema9.pdf · ejemplo del...

30
Hopfield networks Llu´ ıs A. Belanche [email protected] Soft Computing Research Group Dept. de Llenguatges i Sistemes Inform` atics (Software department) Universitat Polit` ecnica de Catalunya 2010-2011

Upload: hanguyet

Post on 21-Sep-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Lluıs A. [email protected]

Soft Computing Research Group

Dept. de Llenguatges i Sistemes Informatics (Software department)

Universitat Politecnica de Catalunya

2010-2011

Page 2: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Introduction

Content-addressable autoassociative memory

Deals with incomplete/erroneous keys: “Julia xxxxazar: Rxxxela”H

ENERGYFUNCTION

Basins of attraction

Attractors (memories)

A mechanical system tends to minimum-energy states (e.g. a pole)

Symmetric Hopfield networks have an analogous behaviour:

low-energy states ⇔ attractors ⇔ memories

Page 3: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Introduction

A Hopfield network is a single-

layer recurrent neural network,

where the only layer has n neurons

(as many as the input dimension)

Page 4: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Introduction

The data sample is a list of keys (aka patterns) L = {ξi | 1 ≤ i ≤ l}, with

ξi ∈ {0,1}n that we want to associate with themselves.

What is the meaning of autoassociativity? when a new pattern Ψ is shown to

the network, the output is the key in L closest (in Hamming distance) to Ψ.

Evidently, this can be achieved by direct programming; yet in hardware

it is quite complex (and the Hopfield net does it quite distinctly)

We therefore set up two goals:

1. Content-addressable autoassociative memory (not easy)

2. Tolerance to small errors, that will be corrected (rather hard)

Page 5: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Intuitive analysis

The state of the network is the vector of activations e of the n

neurons. The state space is {0,1}n:

ξ1

ξ2

ξ3

.

.

.

001

000010

100

111

011

101

110

When the network evolves, it “traverses” the hypercube

Page 6: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (I)

For convenience, we change from {0,1} to {−1,+1}, by s = 2e − 1.

Dynamics of a neuron:

si := sgn

n∑

j=1

wijsj − θi

, with sgn(z) =

{

+1 : z ≥ 0−1 : z < 0

Let us simplify the analysis by using θi = 0, and uncorrelated keys ξi

(from a uniform distribution).

Page 7: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (II)

How are neurons chosen to update their activations?

1. Synchronous: all at once

2. Asynchronous: there is a sequential order or a fair probability dis-tribution

The memories do not change; the end attractor (the recovered me-mory) does

We will assume asynchronous updating

When does the process stop?

−→ When there are no more changes in the network state (we are ina stable state)

Page 8: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (III)

PROCEDURE:

1. Formula for setting the weights wij to store the keys ξi ∈ L

2. Sanity check that the keys ξi ∈ L are stable (they are autoassociated

with themselves)

3. Check that small perturbations of the keys ξi ∈ L are corrected (they

are associated with the closest pattern in L)

Page 9: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (IV)

Case l = 1 (only one pattern ξ):

Present ξ to the net: si := ξi, 1 ≤ i ≤ n

Stability:

si := sgn

n∑

j=1

wijsj

= sgn

n∑

j=1

wijξj

= ... should be ... = ξi

One way to get this is: wij = αξiξj, α > 0 ∈ R (called Hebbian learning)

sgn

n∑

j=1

wijξj

= sgn

n∑

j=1

αξiξjξj

= sgn

n∑

j=1

αξi

= sgn (αnξi) = sgn(ξi) = ξi

Page 10: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (V)

Take α = 1/n. This has a normalizing effect: ‖wi‖2 = 1/√

n

Therefore Wn×n = (wij) = 1nξ × ξ, i.e. wij = 1

nξiξj

If the perturbation consists in b flipped bits, where b ≤ (n − 1) ÷ 2,

then the sign of∑n

j=1 wijsj does not change!

=⇒ if the number of correct bits exceeds that of incorrect bits, the

network is able to correct the error bits

Page 11: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (VI)

Case l ≥ 1:

Superposition: wij = 1n

l∑

k=1ξkiξkj and Wn×n = 1

n

l∑

k=1ξk × ξk

Defining En×l = [ξ1, . . . , ξl], we get Wn×n = 1nEEt

Note Wn×n is a symmetric matrix

Page 12: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (VII)

Stability of a pattern ξv ∈ L:

Initialize the net to si := ξvi, 1 ≤ i ≤ n

Stability: si := sgn

(n∑

j=1

wijξvj

)

= sgn(hvi)

hvi =n∑

j=1

wijξvj = 1n

n∑

j=1

l∑

k=1

ξkiξkjξvj = 1n

n∑

j=1

(l∑

k=1,k 6=v

ξkiξkjξvj

)

+ ξvi ξvjξvj︸ ︷︷ ︸

+1

= ξvi +1n

n∑

j=1

l∑

k=1,k 6=v

ξkiξkjξvj = ξvi + crosstalk(v, i)

If |crosstalk(v, i)| < 1 then sgn(hvi) = sgn(ξvi + crosstalk(v, i)) = sgn(ξvi) = ξvi

Page 13: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Formal analysis (and VIII)

Analogously to the case l = 1, if the number of correct bits of a

pattern ξv ∈ L exceeds that of incorrect bits, the network is able to

correct the bits in error

(the pattern ξv ∈ L is indeed a memory)

When is |crosstalk(v, i)| < 1? If l << n nothing is wrong, but as we

have l closer to n, the crosstalk term is quite strong compared to ξv:

there is no stability

The million euro question: given a network of n neurons, what is the

maximum value for l? What is the capacity of the network?

Page 14: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Capacity analysis

For random uncorrelated patterns (for convenience):

lmax =

{ n2ln(n)+lnln(n)

: perfect recall

≈ 0,138n : small errors (≈ 1,6%)

Realistic patterns will not be random, though some coding can make

them more so

An alternative setting is to have negatively correlated patterns:

ξv · ξu ≤ 0, ∀v 6= u

This condition is equivalent to stating that the number of different bits is at least

the number of equal bits

Page 15: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Spurious states

When some pattern ξ is stored, the pattern −ξ is also stored. In

consequence, all initial configurations of ξ where the majority of bits

are incorrect will end up in −ξ!

−→ Hardly surprising: from the point of view of −ξ, there are more correct bits than

incorrect bits

The network can be made more stable (less spurious states) if we set

wii = 0 for all i

Wn×n = 1n

l∑

k=1ξk × ξk − l

nIn×n = 1n(EEt − lIn×n)

Page 16: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Example

ξ1 = (+1 − 1 + 1), ξ2 = (−1 + 1 − 1), E3×2 =

+1 −1−1 +1+1 −1

+1−1+1

(+1 − 1 + 1) =

+1 −1 +1−1 +1 −1+1 −1 +1

−1+1−1

(−1 + 1 − 1) =

+1 −1 +1−1 +1 −1+1 −1 +1

−→ 13

0 −2 +2−2 0 −2+2 −2 0

(+1 + 1 + 1)(−1 − 1 + 1)(+1 − 1 − 1)

−→ (+1 − 1 + 1),

(+1 + 1 − 1)(−1 − 1 − 1)(−1 + 1 + 1)

−→ (−1 + 1 − 1)

The network is able to correct 1 erroneous bit

Page 17: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Practical applications (I)4. Otros paradigmas Conexionistas 2. Memorias asociativas: 20

Convergencia y capacidad de la red de Hopfield

ENERGIA DE LA RED O FUNCION DE LYAPUNOV: para

PROPIEDAD 1:

PROPIEDAD 2: ¡acotada!

PROPIEDAD 3:

CONVERGENCIA: La red de Hopfield converge en un numero finito de iteraciones.

CAPACIDAD:

Redes Neuronales Artificiales . 2001-2002 Facultat d’Informatica – UPV: 18 de diciembre de 2001

4. Otros paradigmas Conexionistas 2. Memorias asociativas: 21

Ejemplo del comportamiento de una red de Hopfield

Una red de 120 nodos (12x10)

CONJUNTO DE OCHO PATRONES

SECUENCIA DE SALIDA PARA LA ENTRADA "3" DISTORSIONADA

0 1 2 3 4 5 6 7

Redes Neuronales Artificiales . 2001-2002 Facultat d’Informatica – UPV: 18 de diciembre de 2001

4. Otros paradigmas Conexionistas 2. Memorias asociativas: 22

Memorias heteroasociativas: memoria bidireccional asociativa

Capa Y

Capa Xw ij

1 2 m

1 2 n

APRENDIZAJE DE LOS PESOS:Dado con ,

para y

Redes Neuronales Artificiales . 2001-2002 Facultat d’Informatica – UPV: 18 de diciembre de 2001

4. Otros paradigmas Conexionistas 2. Memorias asociativas: 23

Funcionamiento sıncrono de la BAM

Entrada:

Inicializacion: , y ,

Iteracion: Propagacion de a

Propagacion de a

Convergencia: No se produzcan cambios en ni en :

Salida:

Redes Neuronales Artificiales . 2001-2002 Facultat d’Informatica – UPV: 18 de diciembre de 2001

A Hopfield network that stores and recovers several digits

Page 18: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Practical applications (II)

Ex: pattern-recognition– Handwritting recognition

– Reconeixement d’objectius militars

Una xarxa neu-ronal que haaprès a memo-ritzar formes otrossos de for-mes dels avionsserbis, desco-breix un aviómilitar amagatsota un avió civil.

A Hopfield network that stores fighter aircraft shapes is able to reconstruct them. The

network operates on the video images supplied by a camera on board a USAF bomber

Page 19: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

The energy function (I)

If the weights are symmetric, there exists a so-called energy function:

H(s) = −n∑

i=1

n∑

j>i

wijsisj

which is a function of the state of the system.

The central property of H is that it always decreases (or remains

constant) as the system evolves

Therefore the attractors (the stored memories) have to be the local

minima of the energy function

Page 20: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

The energy function (II)

Proposition 1:

A change of state in one neuron i yields a change ∆H < 0

Let hi =n∑

j=1wijsj. If neuron i changes is because si 6= sgn(hi)

Therefore ∆H = −(−si)hi − (−sihi) = 2sihi < 0

Proposition 2:

H(s) ≥ −n∑

i=1

n∑

j=1|wij| (the energy function is lower-bounded)

Page 21: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

The energy function (III)

The energy function H is also useful to derive the proposed weights

Consider the case of only one pattern ξ: we want H to be minimum when the correlationbetween the current state s and ξ is maximum and equal to 〈ξ, s〉2 = 〈ξ, ξ〉2 = ‖ξ‖2 = n

Therefore we choose H(s) = −K 1n〈ξ, s〉2, where K > 0 is a suitable constant

For the many-pattern case, we can try to make each of the ξi into local minima of H:

H(s) = −K1

n

n∑

v=1

〈ξv, s〉2 = −K1

n

n∑

v=1

(n∑

i=1

ξvisi

)2

= −K1

n

n∑

v=1

(n∑

i=1

ξvisi

)

n∑

j=1

ξvjsj

= −K

n∑

i=1

n∑

j=1

(

1

n

n∑

v=1

ξviξvj

)

sisj = −K

n∑

i=1

n∑

j=1

wijsisj = −1

2

n∑

i=1

n∑

j>i

wijsisj

Page 22: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

The energy function (and IV)

Exercises:

1. In the previous example,

a) Check that H(s) = 23(s1s2 − s2s3 + s1s3)

b) Show that H decreases as the network evolves towards the stored

patterns

2. Prove Proposition 2

Page 23: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Spurious states (continued)

We have not shown that the explicitly stored memories are the only

attractors of the system

We have seen that the reversed patterns −ξ are also stored (they have

the same energy than ξ)

There are stable mixture states, linear combinations of an odd num-

ber of patterns, e.g.:

ξ(mix)i = sgn

α∑

k=1

±ξvk,i

, 1 ≤ i ≤ n, α odd

Page 24: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Dynamics (continued)

Let us change the dynamics of a neuron by using an activation with

memory:

si(t + 1) =

+1 if hi(t + 1) > 0hi(t) if hi(t + 1) = 0−1 if hi(t + 1) < 0

where hi(t + 1) =n∑

j=1wijsj(t)

Page 25: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Application to optimization problems

Problem: find the optimum (or close to) of a scalar function subject

to a set of restrictions

e.g.: The traveling-salesman problem (TSP) or graph bipartitioning (GBP); oftenwe deal with NP-complete problems.

TSP(n) = tour n cities with the minimum total distance; TSP(60) ≈ 6,93 · 1079

possible valid tours. In general, TSP(n) = n!2n

.

GBP(n) = given a graph with an even number n of nodes, partition it in two

subgraphs of n/2 nodes each such that the number of crossing edges is minimum

Idea: setup the problem as a function to be minimized, setting the

weights so that the obtained function is the energy function of a

Hopfield network

Page 26: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Application to graph bipartitioning

Representation: n graph nodes = n neurons, with

si =

{

+1 : node i in left part−1 : node i in right part

Connectivity matrix: cij = 1 indicates a connection between nodes i and j.

Page 27: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Application to graph bipartitioning

H(S) = −n∑

i=1

n∑

j>i

cijSiSj

︸ ︷︷ ︸

connections between parts

+ α

n∑

i=1

Si

2

︸ ︷︷ ︸

bipartition

=⇒ H(S) = −n∑

i=1

n∑

j>i

(cij − α)SiSj ≡ −n∑

i=1

n∑

j>i

wijSiSj +n∑

i=1

θiSi

=⇒ wij = cij − α, θi = 0

(general idea: wij is the coefficient of SiSj in the energy function)

Page 28: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Application to the traveling-salesman problem

Representation: 1 city −→ n neurons, coding for position;

hence n cities = n2 neurons.

Sn×n = (Sij) (network states); Dn×n (distances between cities, given)

with Sij = +1 if the city i is visited in position j and −1 otherwise.

Restrictions on S:

1. A city cannot be visited more than oncen∑

i=1

n∑

j=1

n∑

k>j

SijSik = 0

−→ every row i must have only one +1: R1(S)

2. A city can only be visited in sequence (no ubiquity gift

−→ every column j must have only one +1: R1(St)

3. Every city must be visited

(n∑

i=1

n∑

j=1

Sij − n

)2

= 0

−→ there must be n +1s overall: R2(S)

Page 29: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Application to the traveling-salesman problem

4. The tour has to be the shortest possiblen∑

k=1

n∑

i=1

n∑

j>i

DijSik(Sj,k−1 + Sj,k+1)

Total sum of distances in the tour: R3(S, D)

H(S) = −n∑

i=1

n∑

j=1

n∑

k>i

n∑

l>j

wij,klSijSkl+

n∑

i=1

n∑

j>i

θijSij ≡ αR1(S)+βR1(St)+γR2(S)+εR3(S, D)

=⇒ wij,kl = −αδik(1 − δjl) − βδjl(1 − δik) − γ − εDik(δj,l+1 + δj,l−1) and θij = −γn

This is a continuous Hopfield network: Sij ∈ (−1,1) since Sij := tanhχ(hij − θij)

Problem: set up good values for α, β, γ, ε > 0, χ > 0

Page 30: Llu s A. Belanche belanche@lsi.upcbelanche/Docencia/apren/2009-10/Teoria/tema9.pdf · Ejemplo del comportamiento de una red de Hopfield Una red de 120 nodos (12x10) CONJUNTO DE OCHO

Hopfield networks

Application to the traveling-salesman problem

Example for n = 5: 120 valid tours (12 different), 25 neurons

225 potential attractors; actual num-ber of them: 1,740 (among them, thevalid 120). ⇒ 1,740 - 120 = 1,620spurious attractors (invalid tours)

The valid tours correspond to thestates of lower energy (correct de-sign)

But ... we have a network with 30times more invalid tours than validones: a random initialization will endup in an invalid tour with probability0.966

Moreover, we aim for the best tour!A possible workaround is to use stochastic unitswith simulated annealing