
R. Banerjee & B.K. Chakrabarti (Eds.)

Progress in Brain Research, Vol. 168

ISSN 0079-6123

Copyright © 2008 Elsevier B.V. All rights reserved

CHAPTER 13

Neural network modeling

Bikas K. Chakrabarti1,2,* and Abhik Basu2

1Centre for Applied Mathematics and Computational Science, Saha Institute of Nuclear Physics, Calcutta 700064, India

2Theoretical Condensed Matter Physics Division, Saha Institute of Nuclear Physics, Calcutta 700064, India

Abstract: Some of the (comparatively older) numerical results on neural network models obtained by our group are reviewed. These models incorporate synaptic connections constructed by using Hebb's rule. The dynamics is determined by the internal field, which has a weighted contribution from the time-delayed signals. Studies on relaxation and the growth of correlations in the Hopfield model are discussed here. The memory capacity of such networks has also been investigated for some asymmetric synaptic interactions. In some cases both the asynchronous (or Glauber; Hopfield) and synchronous (Little) dynamics are used. At the end, in the appendix, we discuss the effects of asymmetric interactions on the statistical properties in a related model of spin glass (new results).

Keywords: neural network models; spin glasses; associative memory; Hopfield model; Little model; retrieval dynamics; relaxation

Introduction

The human brain is formed out of an interconnected network of roughly $10^{10}$–$10^{12}$ relatively simple computing elements called neurons. Each neuron, although a complex electro-chemical device, performs simple computational tasks of comparing and summing incoming electrical signals from other neurons through the synaptic connections. Yet the emerging features of this interconnected network are surprising, and understanding the cognition and computing ability of the human brain is certainly an intriguing problem (see Amit, 1989; Hertz et al., 1991). Although the real details of the working of a neuron can be quite complicated, for a physics model, following McCulloch and Pitts (see Amit, 1989), neurons can be taken as two-state devices (with firing and nonfiring states). Such two-state neurons are interconnected via synaptic junctions where, as pointed out first by Hebb (see Amit, 1989), learning takes place as the pulse travels through the synaptic junctions (its phase and/or strength may get changed).

*Corresponding author. Tel.: +91 33 2321 4869; Fax: +91 33 2337 4637; E-mail: [email protected]

DOI: 10.1016/S0079-6123(07)68013-3

Present day super-computers employ about $10^8$ transistors, each connected to about 2–4 other transistors. The human brain, as mentioned before, is made up of $\sim 10^{10}$ neurons and each neuron is connected to $\sim 10^4$ other neurons. The brain is thus a much more densely interconnected network, and although it uses much slower operating elements (the typical time scale is of the order of milliseconds for neurons) compared to silicon devices (nanosecond order), many of the computations performed by the brain, such as in perceiving an object, are remarkably faster and suggest an altogether different architecture and approach.


Two very important features of neural computations by the brain, which largely outperform present day computers (though the latter perform arithmetic and logical operations with ever increasing speeds), are:

• Associative memory, or retrieval from partial information.

• Pattern or rhythm recognition and inductive inference capability.

By associative memory, we mean that the brain can recall a pattern or sequence from partial or incomplete information (a partially erased pattern). In the language of the dynamics of any network, it means that by learning various patterns the neural network forms corresponding (distinct) attractors in the configuration space, and if the partial information is within the basin of attraction of the pattern, then the dynamics of the network takes it to the (local) fixed point or attractor corresponding to that pattern. The network is then said to have recognized the pattern. A look at neural networks in this way suggests that the learning of a large (macroscopic) number of patterns in such a network means the creation of a large number of attractors or fixed points of the dynamics corresponding to the various patterns (uncorrelated by any symmetry operation) to be learned. The pattern or rhythm recognition capability of the brain is responsible for inductive knowledge. By looking at a part of a sequence, say a periodic function in time, the brain can extrapolate or predict (by adaptively changing the synaptic connections) the next few steps or the rest of the pattern. By improved training, expert systems can comprehend more complicated (quasi-periodic or quasi-chaotic) patterns or time sequences (e.g., in medical diagnosis). In the following sections we give a brief (biased) review of our studies on associative memory, time series prediction and optimization problems. In this manuscript we discuss asynchronous (Hopfield) and, in some cases, synchronous (Little) dynamics.

There have been a fair number of studies indicating that the recall performance of a neural network model increases if the synaptic strength has asymmetry in the inter-neuron connectivity. In view of this, we briefly discuss an associated spin glass model with asymmetric interactions in the appendix. We show that such a model lacks the Fluctuation–Dissipation Theorem (FDT) and consequently it is very difficult to analyze the properties of the model.

Hopfield model of associative memory

In the Hopfield model, a neuron $i$ is represented by a two-state Ising spin at that site ($i$). The synaptic connections are represented by spin–spin interactions and they are taken to be symmetric. This symmetry of synaptic connections allows one to define an energy function. Synaptic connections are constructed following Hebb's rule of learning, which says that for $p$ patterns the synaptic strength for the pair ($i$, $j$) is

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^{\mu} \xi_j^{\mu} \qquad (1)$$

where $\xi_i^{\mu}$, $i = 1, 2, \ldots, N$, denotes the $\mu$-th pattern learned ($\mu = 1, 2, \ldots, p$). Each $\xi_i^{\mu}$ can take the values $\pm 1$. The parameter $N$ is the total number of neurons, each connected in a one-to-all manner, and $p$ is the number of patterns to be learned. For a system of $N$ neurons, each with two states, the energy function is

$$H = -\sum_{i>j}^{N} J_{ij} s_i s_j \qquad (2)$$

The expectation is that, with the $J_{ij}$'s constructed as in (1), the Hamiltonian or energy function (2) will ensure that any arbitrary pattern will have higher energy than those for the patterns to be learned; the learned patterns will correspond to the (local) minima in the (free) energy landscape. Any pattern then evolves following the dynamics

$$s_i(t+1) = \mathrm{sgn}(h_i(t)) \qquad (3)$$

where $h_i(t)$ is the internal field on the neuron $i$, given by

$$h_i(t) = \sum_j J_{ij} s_j(t) \qquad (4)$$


Here, a fixed point of the dynamics, or attractor, is guaranteed; i.e., after a certain (finite) number of iterations $t^*$, the network stabilizes and $s_i(t^*+1) = s_i(t^*)$. Detailed analytical as well as numerical studies (see Amit, 1989; Hertz et al., 1991) show that the local minima of $H$ in (2) indeed correspond to the patterns fed to be learned in the limit when the memory loading factor $\alpha$ ($= p/N$) tends to zero; and they deviate from the patterns fed to be learned by less than $\sim 3\%$ when $\alpha < \alpha_c \simeq 0.142$. Above this loading, the network goes to a confused state where the local minima in the energy landscape do not have any significant overlap with the patterns fed, to be learned by the network.
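For concreteness, the learning and retrieval rules (1)–(4) can be put into a few lines of code. The following is a minimal sketch of ours (not the code used in the studies reviewed here), assuming NumPy; the system size, loading and random seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N, p = 500, 25                         # alpha = p/N = 0.05, well below alpha_c
xi = rng.choice([-1, 1], size=(p, N))  # p random patterns, xi[mu, i]

# Hebb's rule, Eq. (1): J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, no self-coupling
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)

def recall(s, J, max_sweeps=100):
    """Zero-temperature asynchronous dynamics, Eqs. (3)-(4):
    s_i <- sgn(h_i), h_i = sum_j J_ij s_j, spins updated one at a time."""
    for sweep in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            h = J[i] @ s                   # internal field, Eq. (4)
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                    # fixed point: s(t*+1) = s(t*)
            return s, sweep
    return s, max_sweeps

# retrieve pattern 0 from a copy corrupted at 10% of the sites
s = xi[0].copy()
s[rng.choice(N, N // 10, replace=False)] *= -1
s_star, t_star = recall(s, J)
print("final overlap:", (s_star @ xi[0]) / N, "; sweeps:", t_star)
```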

Relaxation studies in the Hopfield model

As already mentioned, after learning (i.e., after the construction of the $J_{ij}$'s as per Hebb's rule in (1) is over), any arbitrary (corrupted or distorted) pattern fed to the network evolves according to

the dynamics given in (3). The average of the iteration numbers required to reach the fixed points may be defined as the relaxation time $\tau$ (i.e., averaged over all the patterns) for the "distorted" patterns considered. If one starts with a pattern obtained by distortion of any of the learned patterns by a fixed amount of corruption and compares their relaxation times (to reach the corresponding fixed point), one expects $\tau$ to increase as $\alpha$ increases (from zero) to $\alpha = \alpha_c$ (for fixed $N$); the so-called critical slowing down. In a recent study (Ghosh et al., 1990; see also Chakrabarti and Dasgupta, 1992), systems ranging in size from $N = 1000$ to $N = 16{,}000$ neurons were investigated numerically, in the absence of noise (zero temperature). For a fixed but small initial corruption, the variation of the average convergence or relaxation time ($\tau$) with storage capacity was found as $\tau \sim \exp[-A(\alpha - \alpha_c)]$, where $A$ is a constant dependent on the system size $N$ (Ghosh et al., 1990). This is illustrated in Fig. 1.

Fig. 1. Plot of average convergence time $\tau$ as a function of the initial loading factor $\alpha$ for initial overlap $m(0) = 0.80$ (circles) and $m(0) = 0.95$ (squares) at $N = 16{,}000$, and $m(0) = 0.95$ (diamonds) at $N = 1000$. The inset shows how the $\tau$ variations fit the form $\tau \sim \exp[-A(\alpha - \alpha_c)]$ (adapted from Ghosh et al., 1990). (See Plate 13.1 in color plate section.)

It may be noted that this critical slowing down of the recall process near $\alpha_c$ is somewhat unusual compared to such phenomena near the phase transition points. Below $\alpha_c$, the local minima or the metastable states act as the overlap or memory states (not those actual minima, which are spin-glass states) and these metastable states (with overlap) disappear at $\alpha > \alpha_c$. This transition at $\alpha_c$ is therefore not comparable to a phase transition, where the states corresponding to the (free) energy minima change. Here, the minimum energy states remain as the spin-glass states for any $\alpha > 0.05$ (see Hertz et al., 1991); only the overlap metastable states disappear at $\alpha \gtrsim 0.14$.
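The convergence-time measurement itself is easy to sketch. The following reconstruction of the protocol (ours, reusing `recall` and `rng` from the sketch above; the sizes and trial counts are illustrative and far smaller than the $N$ up to 16,000 used in the actual study) averages the sweeps-to-fixed-point over distorted learned patterns at fixed initial overlap:

```python
def mean_relaxation_time(N, alpha, m0=0.9, trials=20):
    """Average sweeps-to-fixed-point at loading alpha = p/N, starting from
    learned patterns distorted to initial overlap m0 (flip (1-m0)/2 of spins)."""
    p = max(1, int(alpha * N))
    xi = rng.choice([-1, 1], size=(p, N))
    J = (xi.T @ xi) / N
    np.fill_diagonal(J, 0.0)
    times = []
    for _ in range(trials):
        mu = rng.integers(p)
        s = xi[mu].copy()
        n_flip = int(round((1.0 - m0) / 2.0 * N))
        s[rng.choice(N, n_flip, replace=False)] *= -1
        times.append(recall(s, J)[1])
    return np.mean(times)

for alpha in (0.02, 0.06, 0.10, 0.13):   # tau should rise on approaching alpha_c
    print(alpha, mean_relaxation_time(200, alpha))
```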

A logical extension is to study the variation of $\tau$ with synaptic noise. After the learning stage, the patterns were again distorted (by changing the spin states at a fixed small fraction of the sites or neurons) and retrieval is governed by the Glauber-type dynamics:

$$s_i(t+1) = \begin{cases} \mathrm{sgn}(h_i) & \text{with probability } \left[1 + \exp(-2h_i/T)\right]^{-1} \\ -\mathrm{sgn}(h_i) & \text{otherwise} \end{cases} \qquad (5)$$

Here $T$ is the temperature parameterizing the synaptic noise and $h_i$ is given by (4). Here, of course, instead of looking for the attractors (which exist only for $T = 0$), one measures the average overlap (with the learned patterns) that develops in time and looks for its relaxation to some equilibrium value. Again, the relaxation time increases abruptly as the metastable overlap states disappear across the phase boundary in the $\alpha$–$T$ plane. Indeed, the (dynamical) phase boundary obtained here (Sen and Chakrabarti, 1989, 1992), studying the divergence of the relaxation time, compares favorably with that obtained from the static (free energy) study. It was found that the average relaxation time $\tau$ grows approximately as $\exp[(T - T_m)^{-2}]$ near the phase boundary at $T_m(\alpha)$, when $\alpha < \alpha_c$.

Growth of correlations in the Hopfield model

The Hopfield model is solvable in the thermodynamic limit. In the exact solution of the model (see Hertz et al., 1991), it is necessary to assume that the learned patterns are completely random and uncorrelated. Models with correlated patterns have been investigated, where the average correlation between all the patterns has been kept finite. A different problem, in which only two of the patterns are correlated (i.e., have finite overlap), so that the average correlation is still irrelevant, has recently been studied (Chakrabarti and Dasgupta, 1992). The aim was to investigate the ability (or lack of it) of the Hopfield model to store the correlation between two (arbitrary) patterns. The restriction of the correlation to two patterns only ensures that the critical memory loading capacity remains unaffected at $\sim 14\%$; i.e., $\alpha_c$ nearly 0.14. The variation of the final correlation between learned patterns with the initial correlation between a randomly chosen pair of patterns (denoted by 1 and 2) to be learned was measured. The initial correlations between the patterns are given by $r_{\mu\nu} = \sum_i \xi_i^{\mu} \xi_i^{\nu}/N$, where $r_{\mu\nu} = r_0$ for $\mu = 1$ and $\nu = 2$, and $r_{\mu\nu} = 0$ for other values of $\mu$ and $\nu$. The system is then allowed to evolve from the initial states $s_i^{(1)}(0) = \xi_i^1$ and $s_i^{(2)}(0) = \xi_i^2$ following the Hopfield dynamics (3) and (4). The correlation between the states $s^{(1)}$ and $s^{(2)}$ is $r_f = \sum_i s_i^{(1)}(t^*)\, s_i^{(2)}(t^*)/N$, where $s_i^{(1)}(t^*)$ and $s_i^{(2)}(t^*)$ denote the two states after reaching their respective fixed points. In the simulation, the system size is taken as $N = 500$; the patterns between which the correlation is introduced are selected randomly and the averaging is done over 25 configurations for each value of $r_0$ and $\alpha$. When the ratio $r_f/r_0$ is plotted against $r_0$ for different loading $\alpha$, it is found that $r_f/r_0 \geq 1$; i.e., the Hopfield model retains the correlation of a particular pair of patterns; either it keeps the same correlation or enhances it (see Figs. 3 and 4 of Chakrabarti and Dasgupta, 1992). For studying the $r_0 \to 0$ limit more accurately, a larger system of $N = 1000$ was taken. The critical value $r_c$ of the average $r_0$ up to which the ratio remains unity (and above which it starts deviating towards higher values) was measured against different loading capacities $\alpha$. As $\alpha$ increases, $r_c$ decreases continually and, as $\alpha$ exceeds a value of the order of 0.05, $r_c$ reaches zero. Hence the Hopfield model generates some correlations, however small, for completely uncorrelated patterns, above a certain value of $\alpha$ (= 0.05).
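A minimal sketch of this protocol (ours, not the original code; it reuses `recall` and `rng` from above). A pair with prescribed expected overlap $r_0$ is obtained by copying one random pattern and flipping each spin independently with probability $(1 - r_0)/2$:

```python
def correlated_pair(N, r0):
    """Two +/-1 patterns with expected mutual overlap r0 (= r_12 of the text)."""
    xi1 = rng.choice([-1, 1], size=N)
    xi2 = np.where(rng.random(N) < (1.0 - r0) / 2.0, -xi1, xi1)
    return xi1, xi2

N, p, r0 = 500, 25, 0.2
xi = rng.choice([-1, 1], size=(p, N))
xi[0], xi[1] = correlated_pair(N, r0)    # only patterns 1 and 2 are correlated
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)
s1, _ = recall(xi[0].copy(), J)
s2, _ = recall(xi[1].copy(), J)
print("r_f / r_0 =", (s1 @ s2) / N / r0)
```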


Extended memory and faster retrieval with delay dynamics

In the literature, there have been a number of indications that dynamically defined networks, with asymmetric synaptic connections, may have better recall performance because of the suppression of spin-glass-like states. See the appendix for a discussion of a spin glass model with asymmetric interactions. The exactly solvable models (see Crisanti and Sompolinsky, 1987) indeed use the same Hebb's rule for the connection strength but with extreme dilution (inducing asymmetry). This gives better recall properties. There were also indications that the addition of some local memory to each neuron, in the sense that the internal field in (3) is determined by the cumulative effect of all the previous states of the neighboring neurons (as is indeed biologically plausible), gives better recall dynamics in some analog network models. In all these cases, no effective energy function can be defined (because of the asymmetry or of the local memory effect) and the use of the statistical physics of spin-glass-like systems is not possible. All these networks are thus defined dynamically. In a recent simulation study it has been shown that a dynamics with a single time step delay, with some tunable weight $\lambda$ (Sen and Chakrabarti, 1992),

$$s_i(t+1) = \mathrm{sgn}\left[\sum_j J_{ij}\left(s_j(t) + \lambda\, s_j(t-1)\right)\right] \qquad (6)$$

instead of (3), along with (2), improves the performance of the Hopfield-like model (with sequential updating) enormously. Here the same synaptic connections $J_{ij}$ in Eq. (1), obtained by using Hebb's rule, are employed. In particular, for a network with $N = 500$, the simulations indicated that the discontinuous transition of the overlap function at $\alpha_c \simeq 0.14$ disappears and the overlap $m_f^{\mu} = \frac{1}{N}\sum_i s_i(t^*)\,\xi_i^{\mu}$ for the $\mu$-th pattern becomes continuous. Accepting recall with final overlap $m \simeq 0.9$, the effective threshold loading capacity $\alpha_c$ increases from 0.14 for $\lambda = 0$ to $\sim 0.25$ for $\lambda \simeq 1$. It was also found that the average recall time or relaxation time $\tau$ (defined before) becomes minimum around $\lambda \simeq 1$ at any particular loading capacity $\alpha$.

The above interesting and somewhat surprising results for this single-step time-delayed dynamics are from simulations of a rather small size ($N = 500$) of the network. Here we present the results of a detailed numerical study of this dynamics, both with asynchronous (sequential as well as random; Hopfield-like) and synchronous (Little-like) updating. We check our numerical results for a bigger network (for $250 \leq N \leq 4000$) and study if there is any significant finite size effect. We find that the fixed point memory capacity improves considerably with $\lambda$, both with Hopfield and Little dynamics. With 90% final overlap ($m_f \geq 0.90$) with the learned patterns, the loading capacity increases to almost 0.25 for $\lambda \sim 1$. The relaxation or average recall time $\tau$, for any fixed loading capacity $\alpha$ ($\leq 0.25$), is also minimum at $\lambda \sim 1$. We could not detect any significant finite size effect (for $250 \leq N \leq 4000$) and we believe that this result is true in the thermodynamic limit. We also observe that for $\lambda = 1$, both the Little dynamics and the Hopfield dynamics give identical results (all the limit cycles of the Little model disappear to give fixed points). The performance of the network for memory storage in limit cycles for negative $\lambda$ has also been studied for Hopfield and Little dynamics. Here, the negative $\lambda$ term provides a damping; limit cycles become frequent, but the overlap properties deteriorate significantly. We have also studied the performance of such dynamics for randomly diluted networks. In the next section we discuss the significance of these results.

Simulations and numerical results: Hopfield and Little models

We consider a network of $N$ neurons (spins). The synaptic interactions between neurons $i$ and $j$ are constructed using Hebb's rule (1) for a set of $p$ random (Monte Carlo generated) patterns. After this `learning' stage, each pattern is corrupted randomly by a fixed amount (typically 10% of the neuron states are changed randomly, giving initial overlap $m_i^{\mu} = 0.9$ for any pattern $\mu$) and the dynamics (updating process) following (6) starts with these corrupted patterns. Both sequential (Hopfield) and parallel (Little) updating are employed. Since our updating process requires two initial patterns (at $(t-1)$ and $(t-2)$), we start with two randomly distorted patterns with the same value of distortion (for each $\mu$). We study the overlap of the network state with the pattern $\mu$ during the updating process, and the final overlap $m_f^{\mu}$ of the fixed points (checked for two successive time steps) or limit cycle patterns is noted. The averages over all $p$ patterns (and over two to five sets of random number seeds for the entire run) are taken. The number of iterations ($t^*$) required to reach the fixed points (or limit cycles) is also noted and its average (over the patterns) gives the average recall or relaxation time ($\tau$). Our typical results for the Hopfield model (dynamics) are for $N = 1000$ and those for the Little model are for $N = 500$. We consider mostly positive $\lambda$ values. Negative $\lambda$ values do not improve the performance and have been studied only for some typical investigations.

Fig. 2. The variation of the average final overlap $m_f$ (for the fixed point obtained) with $\lambda_R$ ($= (1-\lambda)(1+\lambda)$), positive $\lambda$, at a fixed loading capacity $\alpha = 0.25$, with sequential dynamics for networks with $N = 250$ (solid line), 500 (dashed line) and 1000 (dash-dot line). The inset shows the variation of the average recall or relaxation time $\tau$ against $\lambda_R$ for different $N$ (adapted from Maiti et al., 1995).

In Fig. 2, we give the variation of the final average overlap $m_f$ (of any pattern) for the fixed point (checked over two successive time steps) with the redefined $\lambda$ [$\lambda_R = (1-\lambda)(1+\lambda)$] at fixed loading capacity $\alpha \equiv p/N = 0.25$ for sequential as well as random (Glauber) updating. It may be noted that for the Hopfield dynamics ($\lambda = 0$) the final overlap $m_f < 0.30$ for $\alpha = 0.25$. Strictly speaking, this value should be zero in the thermodynamic limit. The results shown in Fig. 2 are for $N = 250$, 500 and 1000, and the nonvanishing value of $m_f$ for $\lambda = 0$ ($\lambda_R = 1$) at $\alpha = 0.25$ is precisely due to this finite size effect, as can be easily checked (it decreases considerably as $N$ becomes large). It is seen that the final overlap gradually improves with increasing $\lambda$ and it saturates beyond $\lambda = 1$ ($\lambda_R = 0$) to $m_f \simeq 0.90$ (for $\alpha = 0.25$). The inset figure shows that the average recall time


(or relaxation time) $\tau$ and the fraction of limit cycles also decrease as $\lambda$ increases from $\lambda = 0$, finally attaining an optimal value at $\lambda = 1$. The recall time $\tau$ again shows a minimum at $\lambda = 1$, as in the Glauber (Hopfield) case. An interesting observation is that at $\lambda = 1$ the limit cycles of the Little model (parallel dynamics) disappear and all runs reach fixed points with overlap properties identical to those of the Hopfield model.

For sequential updating, the variation of the average final overlap $m_f$ with the loading capacity $\alpha$, for some typical positive values of $\lambda$, is shown in Fig. 3. The results are for $N = 500$. The inset here shows the variation of the average recall time ($\tau$) with $\alpha$ for different $\lambda$. For negative values of $\lambda$, even for sequential updating, one gets limit cycles (e.g., the fraction of limit cycles is $\sim 0.48$ for $\lambda = -0.8$ at $\alpha = 0.25$) and the fixed point fraction decreases further for parallel (Little) dynamics. However, the overlap with the learned patterns decreases considerably and we do not see any significant memory capability (for storage in limit cycles) for negative values of $\lambda$ (typically $m_f \simeq 0.42$ for $\lambda = -0.8$ at $\alpha = 0.25$, using sequential updating).

Fig. 3. Average final overlap $m_f$ (for the fixed point obtained) at a fixed loading capacity $\alpha = 0.25$, for a network with $N = 500$, against $\lambda_R$ ($= (1-\lambda)(1+\lambda)$; positive $\lambda$). The results are for parallel updating (Little model), shown by squares (filled or unfilled), and for the Glauber dynamics (Hopfield model), shown by circles (filled or unfilled). The inset shows the variation of the average recall time $\tau$ and the fraction of limit cycles against $\lambda_R$ (adapted from Maiti et al., 1995).

In order to investigate the finite size effect in the improved performance of the network for positive $\lambda$ values (for sequential dynamics), we studied the variation of the average final overlap with $1/N$ at fixed loading capacity $\alpha$ (= 0.20 and 0.25) and fixed $\lambda$ (= 1). These results, for $250 \leq N \leq 4000$, are shown in Fig. 4. The inset there shows the same variation of $m_f$ with $1/N$ at fixed $\alpha$ (= 0.25) for $\lambda = 0$. One can clearly compare the finite size effects observed: for $\lambda = 1$, no significant variation of $m_f$ with $1/N$ is observed for either $\alpha = 0.20$ or 0.25 ($m_f \simeq 0.4$ for $\alpha = 0.20$, and $m_f \simeq 0.8$ for $\alpha = 0.25$, at $\lambda = 1$ as $N \to \infty$). For $\lambda = 0$, however, there is a significant finite size effect. From the extrapolation we find the extrapolated result for $m_f$ (for large $N$) to be around 0.29 here at this $\alpha$ (= 0.25); see the inset. See Fig. 5 for the variations of $m_f$ with the loading $\alpha$ for various values of $\lambda$.

Fig. 4. Variation of the average final overlap $m_f$ against $1/N$ for fixed loading capacity $\alpha = 0.2$ and $\alpha = 0.25$ at $\lambda = 1$. The inset shows the same for $\alpha = 0.25$ at $\lambda = 0$. The results are for $250 \leq N \leq 4000$ (adapted from Maiti et al., 1995).

Fig. 5. Variation of the average final overlap $m_f$ against loading capacity $\alpha$ for some typical values of $\lambda$ ($\lambda = 0$ for the bottom curve and $\lambda = 0.2$, 0.5, 1.0, 1.5, 2 successively upwards), with sequential dynamics for $N = 1000$. The inset shows the variation of the recall time $\tau$ against $\alpha$ for the same values of $\lambda$.


Fig. 6. The variation of the configurational energy $E(t)$ with time (updating iteration) $t$, for sequential updating with $\lambda = 4$, for two typical cases (arbitrary distorted patterns), before and after reaching the fixed points. The inset shows the same for a distorted pattern with $\lambda = 0$ dynamics (adapted from Maiti et al., 1995).


It is to be noted that for positive $\lambda$, the overlap and other properties saturate beyond $\lambda = 1$ ($\lambda_R = 0$) for any $\alpha$. This is true even for very large values of $\lambda$ in the updating (6). This is somewhat tricky in the sense that apparently one time step is almost skipped in every updating: practically $\{S_i(t-2)\}$ (together with only a small fraction of $\{S_i(t-1)\}$) determines $\{S_i(t)\}$ through (6) and is kept in a buffer for later use; similarly $\{S_i(t-1)\}$ from the buffer is used to determine $\{S_i(t+1)\}$. We have checked independently the performance of such an updating ($S_i(t+1) = \mathrm{sgn}[\sum_j J_{ij}(\lambda' S_j(t) + \lambda S_j(t-1))]$, instead of (6), with $\lambda' \to 0$). We find that for $\lambda' = 0$ the Hopfield model result is recovered. In fact, this can be seen from Fig. 3, where the overlap $m_f$ decreases rapidly for very large values of $\lambda$ in (6) ($\lambda \gtrsim 10$). It may be noted, however, that the effective size of the domain of attraction of the fixed points depends very much on the ratio $\lambda/\lambda'$. This can be seen easily in the limit $\lambda \to 0$ but $\lambda'$ finite (or $\lambda' \to 0$ but $\lambda$ finite), where the change in energy of the network per updating step depends

on the value of $\lambda'$ (or $\lambda$), which determines the effective shape of the energy landscape seen by the dynamics.

As can be easily seen, for nonzero $\lambda$ the updating dynamics in (6) does not minimize the `energy' function $E(t)$, defined as $E(t) = -\sum_{i>j} J_{ij} S_i(t) S_j(t)$, as the delay term contributes to the internal field. Some typical variations of $E(t)$ with the iterations (updating) are shown in Fig. 6 for $\alpha = 0.5$ and $\lambda = 4$. (The large values of $\alpha$ and $\lambda$ are chosen here to ensure a larger $\tau$, so that the effect can be displayed over a longer time range.) The results clearly show the contribution of the $\lambda$ term in escaping over the spurious valleys in the "energy" landscape. The inset in Fig. 6 shows the same variation of $E(t)$ in the $\lambda = 0$ (Hopfield) case. Here, of course, it clearly decreases monotonically with time.
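For reference, a one-line sketch of the monitored quantity (ours; we use the conventional $-\frac{1}{2}\,s\!\cdot\!J\!\cdot\!s$ form for the restricted double sum):

```python
def config_energy(s, J):
    """E(t) = -sum_{i>j} J_ij S_i S_j = -(1/2) s.J.s (J has zero diagonal).
    Under lam = 0 sequential updating E(t) never increases; with the delay
    term of Eq. (6) it can rise transiently, which is how the dynamics climbs
    out of the spurious valleys seen in Fig. 6."""
    return -0.5 * (s @ J @ s)
```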

We have also studied the performance of this dynamics for a diluted network (with sequential dynamics; $N = 500$), where a random fraction $c$ of the synaptic connections $J_{ij}$ is removed. We show in Fig. 7 some results for the overlap $m_f$ against $\lambda$ at a fixed loading capacity $\alpha = 0.20$ for some typical values of the dilution concentration $c$. The inset shows the variation of $m_f$ against $\alpha$ at a fixed $\lambda$ (= 1) for some typical values of $c$.

Fig. 7. The variation of the average final overlap $m_f$ with positive $\lambda$ at a fixed capacity $\alpha = 0.20$, with sequential dynamics for $N = 500$, at some typical values of $c$ (= 0.10, 0.20 and 0.25; the $c = 0.1$ data being at the top). The inset shows the variation of $m_f$ with $\alpha$ at $\lambda = 1$, for the same values of $c$ (adapted from Maiti et al., 1995).
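Dilution is a one-step modification of the coupling matrix. A sketch (ours; we assume symmetric removal, i.e., $J_{ij}$ and $J_{ji}$ cut together, a detail the text does not specify):

```python
def dilute(J, c):
    """Randomly remove a fraction c of the synaptic connections
    (assumed here: J_ij and J_ji are removed together)."""
    N = len(J)
    cut = np.triu(rng.random((N, N)) < c, 1)
    Jd = J.copy()
    Jd[cut | cut.T] = 0.0
    return Jd

# e.g., 20% dilution at lam = 1, as in the runs behind Fig. 7
s_star, _ = delayed_recall(s_a.copy(), s_b.copy(), dilute(J, 0.20), lam=1.0)
```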

Summary

In this paper, we have reviewed several existing numerical results on neural network models obtained by our group. The models used here have synaptic connections constructed by using Hebb's rule. The dynamics is governed by the internal field, as in (3) and (6). Some (new) analytic results for asymmetric interactions in a related spin glass model are given in the appendix.

Acknowledgment

We are grateful to C. Dasgupta, P. Sengupta, M. Ghosh, G. A. Kohring, P. Maiti, A. K. Sen and P. Sen for collaborations at various stages of the above studies, and to C. Dasgupta for a critical reading of the manuscript.

Appendix: A spin glass model with asymmetric interactions

A. The dynamical model

We discuss an asymmetric spin-glass model in the mean-field limit. We follow the notation of Crisanti and Sompolinsky (1987). The model consists of $N$ fully connected spins interacting via random asymmetric interactions. The details and more complete studies of the same will be discussed elsewhere (Basu, 2007). We choose the interaction matrix between spins $i$ and $j$ to have the form

$$J_{ij} = J^s_{ij} + k\, J^a_{ij}, \qquad k \geq 0 \qquad (7)$$


where $J^s_{ij}$ and $J^a_{ij}$ are symmetric and antisymmetric couplings respectively, such that

$$J^s_{ij} = J^s_{ji}, \qquad J^a_{ij} = -J^a_{ji} \qquad (8)$$

Each of $J^s_{ij}$ and $J^a_{ij}$ is a zero-mean Gaussian distributed random variable with the variance

$$\left[(J^s_{ij})^2\right] = \frac{J^2}{N}\,\frac{1}{1+k^2} \qquad (9)$$

As in Crisanti and Sompolinsky (1987), square brackets denote the quenched average with respect to the distribution. The parameter $k$ measures the degree of asymmetry in the interactions. Similar to Crisanti and Sompolinsky (1987), Eq. (9) implies

$$N\left[J_{ij}^2\right] = J^2, \qquad N\left[J_{ij} J_{ji}\right] = J^2\,\frac{1-k^2}{1+k^2} \qquad (10)$$

Therefore, from the above it is clear that for $k = 0$ the model reduces to the ordinary symmetric spin glass model with infinite range interactions (Sompolinsky and Zippelius, 1982), whereas $k = 1$ corresponds to a fully asymmetric model (Crisanti and Sompolinsky, 1987).
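A sketch of how such a coupling matrix can be generated and Eq. (10) verified numerically (ours, assuming NumPy; the symmetrization trick and the system size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling_matrix(N, J, k):
    """Draw J_ij = J^s_ij + k J^a_ij (Eq. 7), both parts Gaussian with the
    variance of Eq. (9), so that the identities of Eq. (10) hold."""
    sigma = J / np.sqrt(N * (1.0 + k * k))
    A = rng.normal(0.0, sigma, (N, N))
    B = rng.normal(0.0, sigma, (N, N))
    Js = (A + A.T) / np.sqrt(2.0)   # symmetric part, off-diagonal variance sigma^2
    Ja = (B - B.T) / np.sqrt(2.0)   # antisymmetric part, variance sigma^2
    return Js + k * Ja

# numerical check of Eq. (10) for k = 0.5: expect 1.0 and (1-k^2)/(1+k^2) = 0.6
Jm = coupling_matrix(2000, 1.0, 0.5)
off = ~np.eye(2000, dtype=bool)
print(2000 * (Jm[off] ** 2).mean(), 2000 * (Jm * Jm.T)[off].mean())
```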

The dynamics for this model is defined by an appropriate Langevin equation for the $i$-th spin:

$$\Gamma_0^{-1}\,\frac{\partial}{\partial t}\, s_i(t) = -r_0\, s_i(t) - \frac{\delta V(s_i)}{\delta s_i(t)} + \sum_j J_{ij}\, s_j(t) + h_i(t) + \zeta_i(t) \qquad (11)$$

In our model we consider a soft spin which varies continuously from $-\infty$ to $\infty$. The local potential $V(s_i)$ is chosen as in Crisanti and Sompolinsky (1987) and is an even function of $s_i$. The function $h_i(t)$ is a local external magnetic field. Further, the stochastic function $\zeta_i(t)$ is a zero-mean Gaussian distributed variable with a variance

$$\langle \zeta_i(t)\, \zeta_j(t') \rangle = \frac{2T}{\Gamma_0}\,\delta(t-t')\,\delta_{ij} \qquad (12)$$

Note that the above choice of noise correlations would have validated the FDT relating the appropriate correlation functions and the corresponding response functions for the fully symmetric ($k = 0$) problem. Furthermore, the static properties of the fully symmetric version of the model can be derived from an appropriate Boltzmann distribution without any reference to the underlying dynamics. However, for the present case with a finite $k$ there is no FDT and, equivalently, no Boltzmann distribution exists for the problem. As discussed in Crisanti and Sompolinsky (1987), and as we show below, this makes the finite-$k$ problem far more complicated as compared to its fully symmetric version.
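For a numerical feel of the model, Eq. (11) with $V(s) = u\, s^4$ (the choice implicit in $L_0$ below) can be integrated by a simple Euler–Maruyama step. This is a sketch of ours, continuing with the NumPy setup of the previous snippet; the discretization is an assumption, not part of the original analysis:

```python
def langevin_step(s, Jm, h, dt, Gamma0, r0, u, T):
    """One Euler-Maruyama step for Eq. (11) with V(s) = u s^4 (dV/ds = 4 u s^3)
    and noise variance 2T/Gamma0, Eq. (12). Multiplying (11) through by Gamma0
    gives ds = Gamma0 * F dt + sqrt(2 T Gamma0 dt) * N(0, 1)."""
    drift = Gamma0 * (-r0 * s - 4.0 * u * s ** 3 + Jm @ s + h)
    return s + drift * dt + np.sqrt(2.0 * T * Gamma0 * dt) * rng.normal(size=len(s))
```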

B. Dynamic generating functional

The generating functional for the correlation and the response functions for the model Eq. (11) is

$$Z[\phi, \hat{\phi}] = \int \mathcal{D}s\, \mathcal{D}\hat{s}\, \exp\Bigg[\sum_i L_0(s_i, \hat{s}_i) + \frac{1}{2}\sum_{ij} J^s_{ij} \int dt\, \left\{ i\hat{s}_i(t)\, s_j(t) + i\hat{s}_j(t)\, s_i(t) \right\} + \frac{k}{2}\sum_{ij} J^a_{ij} \int dt\, \left\{ i\hat{s}_i(t)\, s_j(t) - i\hat{s}_j(t)\, s_i(t) \right\}\Bigg] \qquad (13)$$

where $L_0$ is the local part of the action:

$$L_0 = \int dt\, i\hat{s}_i(t)\left[-\Gamma_0^{-1}\,\partial_t s_i(t) - r_0\, s_i(t) - 4u\, s_i^3 + h_i(t) + T\,\Gamma_0^{-1}\, i\hat{s}_i(t)\right]$$

We now average over the distribution of $J_{ij}$ to obtain

$$\langle Z \rangle = \int \mathcal{D}s\, \mathcal{D}\hat{s}\, \exp\Bigg[ L_0(s_i, \hat{s}_i) + \frac{J^2}{2N}\sum_{ij}\int dt\, dt' \left\{ i\hat{s}_i(t)\, i\hat{s}_i(t')\, s_j(t)\, s_j(t') + \frac{1-k^2}{1+k^2}\, i\hat{s}_i(t)\, s_i(t')\, i\hat{s}_j(t')\, s_j(t) \right\} \Bigg] \qquad (14)$$

Here, the field $\hat{s}$ is the MSR conjugate variable (Martin et al., 1973; Sompolinsky and Zippelius, 1982).

We now take the mean-field limit, which, in the present context, is equivalent to assuming infinite range interactions (i.e., all spins interact with each other). In that limit, fluctuations, which are $O(1/N)$, are negligible (Sompolinsky and Zippelius, 1982). We then linearize the remaining quartic terms by a Hubbard–Stratonovich transformation (Sompolinsky and Zippelius, 1982). To simplify further, in the limit $N \to \infty$, we substitute the stationary point values. Furthermore, we set $\langle \hat{s}_i(t)\, \hat{s}_i(t') \rangle = 0$ due to causality. We thus obtain

$$\langle Z \rangle = \int \mathcal{D}s\, \mathcal{D}\hat{s}\, \exp\Bigg[ L_0 + \frac{\beta^2 J^2}{2} \int dt\, dt' \left\{ C(t-t')\, i\hat{s}_i(t)\, i\hat{s}_i(t') + 2\,\frac{1-k^2}{1+k^2}\, G(t-t')\, i\hat{s}_i(t)\, s_i(t') \right\} \Bigg] \qquad (15)$$

The effective equation of motion, corresponding to the generating functional (15) above, is

$$\Gamma_0^{-1}\,\frac{\partial}{\partial t}\, s_i(t) = -r_0\, s_i(t) - 4u\, s_i^3 + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\, G(t-t')\, s_i(t') + \phi_i(t) \qquad (16)$$

where the effective noise $\phi_i$ is zero-mean, Gaussian distributed with a variance

$$\langle \phi_i(t)\, \phi_j(t') \rangle = \frac{2T}{\Gamma_0}\,\delta(t-t')\,\delta_{ij} + \beta^2 J^2\, C(t-t')\,\delta_{ij} \qquad (17)$$

or, in Fourier space,

$$s_i(\omega) = G_0(\omega)\left[\phi_i(\omega) + h_i(\omega)\right] - 4u\, G_0(\omega) \int d\Omega_1\, d\Omega_2\, s_i(\Omega_1)\, s_i(\Omega_2)\, s_i(\omega - \Omega_1 - \Omega_2) \qquad (18)$$

Further,

$$G_0^{-1}(\omega) = r_0 - i\omega\,\Gamma_0^{-1} - \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\, G(\omega) \qquad (19)$$

and the effective noise correlation is given by

$$\langle \phi_i(\omega)\, \phi_j(-\omega) \rangle = \delta_{ij}\left[\frac{2T}{\Gamma_0} + \frac{J^2}{T^2}\, C(\omega)\right] \qquad (20)$$

Note that, unlike the case of symmetric couplings, the effective noise no longer has any relation with the effective propagator (19). Therefore, the correlation function $C(t)$ and the propagator $G(t)$ are independent of each other, in contrast to the case with symmetric couplings, where the two are related by the FDT (Sompolinsky and Zippelius, 1982; Crisanti and Sompolinsky, 1987). Further, for the convenience of our subsequent analyses, we write the effective noise $\phi_i(t)$ as the sum of three parts:

$$\phi_i(t) = \eta_i(t) + \xi_i(t) + z_i \qquad (21)$$

Here

$$\langle \eta_i(\omega)\, \eta_j(-\omega) \rangle = \delta_{ij}\left[\frac{2T}{\Gamma_0} + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\,\tilde{C}(\omega)\right] \qquad (22)$$

such that $\tilde{C}(\omega) = (2T/\omega)\,\mathrm{Im}\, G(\omega)$, where $\mathrm{Im}$ refers to the imaginary part. Thus, $\eta_i$ is the part of the total effective noise which respects the FDT, and $\tilde{C}(t)$ is the part of the correlator $C(t)$ which is related to the propagator $G(t)$ through the FDT. We further define $\bar{C}(t) \equiv C(t) - q(t)$ such that $q(t) = [\langle s_i(t')\rangle_{\eta}\, \langle s_i(t+t')\rangle_{\eta}]_{\xi, z}$. Here, $z_i$ is the time-persistent part of the Gaussian noise, with a variance

$$\langle z_i(\omega)\, z_j(-\omega) \rangle = 2\pi\,\delta(\omega)\,\beta^2 J^2\, q\,\delta_{ij} \qquad (23)$$

This then yields

$$\langle \xi_i(\omega)\, \xi_j(-\omega) \rangle = \delta_{ij}\left[\frac{2k^2}{1+k^2}\,\beta^2 J^2\, \tilde{C}(\omega) + \beta^2 J^2 \left\{q(\omega) - 2\pi\, q\,\delta(\omega)\right\}\right] \qquad (24)$$

With these effective noise correlators and propagator, we now proceed to consider the dynamics at high temperature.

C. Dynamics at high temperature

In the high temperature phase there is no time-persistent correlation. We begin by defining a damping function

$$\Gamma^{-1}(\omega) = i\,\frac{\partial G^{-1}}{\partial \omega} \qquad (25)$$


Eqs. (19) and (25), together with the Dyson equation, yield

$$\Gamma^{-1}(\omega) = \frac{\Gamma_0^{-1} + i\,\partial\Sigma/\partial\omega}{1 - \dfrac{J^2}{T^2}\, G^2(\omega)\,\dfrac{1-k^2}{1+k^2}} \qquad (26)$$

Equation (26), therefore, suggests that the effective relaxation time $\Gamma^{-1}(\omega = 0)$ has a divergence at a critical temperature given by

$$T_c = J\, G(0)\, \sqrt{\frac{1-k^2}{1+k^2}} \qquad (27)$$

This, however, holds provided $\partial\Sigma/\partial\omega$ has a finite limit for $\omega \to 0$. We now argue in favour of such a result. We begin by noting that for $k = 0$, i.e., for the symmetric coupling case, $\Gamma(\omega) \sim \sqrt{\omega}$ in the small-$\omega$ limit, which in turn implies $G(\omega) \sim \sqrt{\omega}$ (Sompolinsky and Zippelius, 1982). We then note that $\tilde{C}(\omega)$, a part of the total correlation $C(\omega)$, is related to the propagator $G(\omega)$ and hence has a small-$\omega$ dependence of the form $1/\sqrt{\omega}$. The total correlator $C(\omega)$ is given by

$$C(\omega) = G(\omega)\, G^*(\omega)\left[\frac{2T}{\Gamma_0} + \beta^2 J^2\, C(\omega)\right] \qquad (28)$$

indicating that $C(\omega)$ is no more singular than $1/\sqrt{\omega}$. We are then led to the result that $\partial\Sigma/\partial\omega|_{\omega=0}$ is finite. We, therefore, conclude that for $T > T_c$ spin fluctuations decay with a relaxation time which grows as $|1 - T/T_c|^{-1}$, where $T_c$ is given by (27).
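Numerically, the reduction factor in Eq. (27) is easily tabulated (a trivial check of ours):

```python
import numpy as np

# Eq. (27): relative to the symmetric case, T_c is reduced by the factor
# sqrt((1 - k^2)/(1 + k^2)); it vanishes for the fully asymmetric model k = 1.
for k in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"k = {k:4.2f}  T_c/T_c(k=0) = {np.sqrt((1 - k**2) / (1 + k**2)):.3f}")
```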

D. Statics below $T_c$

Having argued in favour of the possibility of a critical temperature $T_c$, we now consider the static properties below $T_c$. We begin by setting up the Fokker–Planck equation (Chaikin and Lubensky, 2000) which governs the time evolution of the probability distribution of the configurations of $s_i$. In particular, we consider the probability distribution $P_1(s, t\,|\,s_0, t_0) \equiv \langle \delta(s_i - s_i(t)) \rangle_{s_0, t_0}$, which is the probability of the configuration $\{s\}$ at time $t$. This implies

$$P_1(s, t + \Delta t\,|\,s_0, t_0) = \int \mathcal{D}s'\, P_1(s, t + \Delta t\,|\,s', t)\, P_1(s', t\,|\,s_0, t_0) \qquad (29)$$

We now calculate the conditional probability $P_1(s, t + \Delta t\,|\,s', t)$ from the equation of motion; a Taylor series expansion yields

$$s_i(t + \Delta t) = s_i(t) + \left[-r_0\, s_i(t) - \frac{\delta V}{\delta s_i} + J^2\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\, G(t-t')\, s_i(t')\right]\Gamma_0\,\Delta t + \int_{t}^{t+\Delta t} \phi_i(t')\, dt' + \frac{1}{2}\int_{t}^{t+\Delta t}\!\!\int_{t}^{t+\Delta t} \phi(t_1)\,\phi(t_2)\, dt_1\, dt_2 \qquad (30)$$

Here, the noise $\phi_i = \eta_i + \xi_i + z_i$. We now perform averages over $\eta_i$ and $\xi_i$, which are zero-mean and Gaussian distributed with the variances discussed above. After simplifications we finally find

$$\langle \delta[s_i - s_i(t+\Delta t)] \rangle_{s_0, t_0} = \left[1 + \Delta t \left(-r_0\, s_i - \frac{\delta V}{\delta s_i} + J^2\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\, G(t-t')\, s_i(t') - z_i\right)\frac{\partial}{\partial s_i} + \Delta t\, J^2 \int_{-\Delta t}^{\Delta t} dt' \left\{\frac{2k^2}{1+k^2}\,\tilde{C}(t') + q(t') - q\right\}\frac{\partial^2}{\partial s_i^2}\right] P_1 \qquad (31)$$

yielding

$$\frac{\partial P_1}{\partial t} = T\,\frac{\partial}{\partial s_i}\left[\frac{1}{T}\left(r_0\, s_i + \frac{\delta V}{\delta s_i} - z_i - J^2\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\, G(t-t')\, s_i(t')\right) + \left\{1 + \frac{J^2}{T}\int_{-\Delta t}^{\Delta t} \tilde{C}(t')\, dt' + \frac{J^2}{T}\,\frac{2k^2}{1+k^2}\left[q(t') - q\right]\right\}\frac{\partial}{\partial s_i}\right] P_1 \qquad (32)$$


The steady-state solution of Eq. (32), $P_1(\mathrm{steady})$, can be obtained by setting $\partial P_1/\partial t$ to zero. From $P_1(\mathrm{steady})$ one would be able to calculate $q \equiv [\langle s_i \rangle^2_{\eta, \xi}]_z$. One notes that $q$ itself appears in Eq. (32); therefore, one would need an equation for the distribution $P_2 \equiv [\langle s_i \rangle_{\eta}](\xi, z)$. The equation is

$$\frac{\partial P_2}{\partial t} = \frac{\partial}{\partial s_i}\left[\left(r_0\, s_i + \frac{\delta V}{\delta s_i} - z_i - \xi_i(t) - J^2\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\, G(t-t')\, s_i(t')\right) + J^2\,\frac{1-k^2}{1+k^2}\int_{-\Delta t}^{\Delta t} \tilde{C}(t')\, dt'\,\frac{\partial}{\partial s_i}\right] P_2 \qquad (33)$$

Clearly there is no steady-state solution, since $\xi_i(t)$ has an explicit time dependence. We formally write the solution of Eq. (33) as $P_2(t) = \exp\left[\int_0^t L(t')\, dt'\right] P_2(t=0)$, where $L$ is the operator

$$L = \frac{\partial}{\partial s_i}\left[\left(r_0\, s_i + \frac{\delta V}{\delta s_i} - z_i - \xi_i(t) - J^2\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\, G(t-t')\, s_i(t')\right) + J^2\,\frac{1-k^2}{1+k^2}\int_{-\Delta t}^{\Delta t} \tilde{C}(t')\, dt'\,\frac{\partial}{\partial s_i}\right] \qquad (34)$$

and $P_2(t=0)$ is the initial condition. The function $q(t)$ is then given by

$$q(t) = \int \mathcal{D}s\, \mathcal{D}\xi\, \mathcal{D}z\, P_2(t=0)\, \exp\left[\int_{0}^{t'} L(t'')\, dt''\right] s_i(t')\, s_i(t' + t)\, P[\xi]\, P[z] \qquad (35)$$

Note that we have performed the averaging over initial conditions also. The functions $P[z]$ and $P[\xi]$ are the distributions of $z$ and $\xi$, respectively. Assuming that the initial distribution is normalised, we have

$$q(t) = \int \mathcal{D}s\, \mathcal{D}\xi\, \mathcal{D}z\, \exp\left[\int_{0}^{t'} L(t'')\, dt''\right] s_i(t')\, s_i(t' + t)\, P[\xi]\, P[z] \qquad (36)$$

Further, $m(z) \equiv \langle s \rangle_{\eta, \xi} = \int \mathcal{D}s\, P_1[z]\, s$. These formally complete the discussions on the static properties below $T_c$. In summary, we have investigated a spin glass model with asymmetric spin–spin interactions in the mean-field limit. Due to the asymmetry there is no FDT in the model and consequently the analysis of the model becomes far more complicated. Our result [Eq. (27)] suggests that the spin glass transition temperature is reduced in the presence of the asymmetry and, therefore, indicates the possibility of a higher memory capacity for neurons with such asymmetric synaptic interactions. Thus asymmetric interactions tend to suppress the spin-glass phase. In this context we refer to some related studies: Crisanti and Sompolinsky (1987) studied a spherical model and the Ising version of the problem, respectively, and found no spin-glass phase at any finite temperature for any strength of the asymmetry. Further extensive studies on this problem are required for a satisfactory resolution of the issues raised here.

References

Amit, D.J. (1989) Modelling Brain Function. Cambridge University Press, Cambridge.

Basu, A. (2007) Unpublished.

Chaikin, P.M. and Lubensky, T.C. (2000) Principles of Condensed Matter Physics. Cambridge University Press, Cambridge.

Chakrabarti, B.K. and Dasgupta, P.K. (1992) Physica A, 186: 33–48.

Crisanti, A. and Sompolinsky, H. (1987) Phys. Rev. A, 36: 4922–4939.

Ghosh, M., Sen, A.K., Chakrabarti, B.K. and Kohring, G.A. (1990) J. Stat. Phys., 61: 501.

Hertz, J., Krogh, A. and Palmer, R.G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA.

Maiti, P., Dasgupta, P. and Chakrabarti, B.K. (1995) Int. J. Mod. Phys. B, 9: 3025–3037.

Martin, P.C., Siggia, E.D. and Rose, H.A. (1973) Phys. Rev. A, 8: 423–437.

Sen, P. and Chakrabarti, B.K. (1989) Phys. Rev. A, 40: 4700–4703.

Sen, P. and Chakrabarti, B.K. (1992) Phys. Lett. A, 162: 327–330.

Sompolinsky, H. and Zippelius, A. (1982) Phys. Rev. B, 25: 6860–6875.
