Progress in Brain Research: Models of Brain and Mind (Physical, Computational and Psychological)
R. Banerjee & B.K. Chakrabarti (Eds.)
Progress in Brain Research, Vol. 168
ISSN 0079-6123
Copyright © 2008 Elsevier B.V. All rights reserved
CHAPTER 13
Neural network modeling
Bikas K. Chakrabarti¹,²,* and Abhik Basu²
¹Centre for Applied Mathematics and Computational Science, Saha Institute of Nuclear Physics, Calcutta 700064, India
²Theoretical Condensed Matter Physics Division, Saha Institute of Nuclear Physics, Calcutta 700064, India
Abstract: Some of the (comparatively older) numerical results on neural network models obtained by our group are reviewed. These models incorporate synaptic connections constructed by using Hebb's rule. The dynamics is determined by the internal field, which has a weighted contribution from the time-delayed signals. Studies on relaxation and the growth of correlations in the Hopfield model are discussed here. The memory capacity of such networks has also been investigated for some asymmetric synaptic interactions. In some cases both the asynchronous (or Glauber; Hopfield) and synchronous (Little) dynamics are used. At the end, in the appendix, we discuss the effects of asymmetric interactions on the statistical properties in a related model of spin glass (new results).
Keywords: neural network models; spin glasses; associative memory; Hopfield model; Little model; retrieval dynamics; relaxation
Introduction
The human brain is formed out of an interconnected network of roughly 10^10–10^12 relatively simple computing elements called neurons. Each neuron, although a complex electro-chemical device, performs simple computational tasks of comparing and summing incoming electrical signals from other neurons through the synaptic connections. Yet the emerging features of this interconnected network are surprising, and understanding the cognition and computing ability of the human brain is certainly an intriguing problem (see Amit, 1989; Hertz et al., 1991). Although the real details of the working of a neuron can be quite complicated, for a physics model, following McCulloch and Pitts (see Amit, 1989), neurons can be taken as two-state devices (with firing and nonfiring states). Such two-state neurons are interconnected via synaptic junctions where, as pointed out first by Hebb (see Amit, 1989), learning takes place as the pulse travels through the synaptic junctions (its phase and/or strength may get changed).

*Corresponding author. Tel.: +91 33 2321 4869; Fax: +91 33 2337 4637; E-mail: [email protected]
DOI: 10.1016/S0079-6123(07)68013-3
Present-day supercomputers employ about 10^8 transistors, each connected to about 2–4 other transistors. The human brain, as mentioned before, is made up of ~10^10 neurons, and each neuron is connected to ~10^4 other neurons. The brain is thus a much more densely interconnected network, and although it uses much slower operating elements (the typical time scale is of the order of milliseconds for neurons) compared to silicon devices (of nanosecond order), many of the computations performed by the brain, such as in perceiving an object, are remarkably faster and suggest an altogether different architecture and approach.
Two very important features of neural computation by the brain, which largely outperforms present-day computers (though they perform arithmetic and logical operations with ever increasing speeds), are:
• Associative memory or retrieval from partial information.
• Pattern or rhythm recognition and inductive inference capability.
By associative memory, we mean that the brain can recall a pattern or sequence from partial or incomplete information (a partially erased pattern). In the language of the dynamics of any network, it means that by learning various patterns the neural network forms corresponding (distinct) attractors in the configuration space, and if the partial information is within the basin of attraction of the pattern, then the dynamics of the network takes it to the (local) fixed point or attractor corresponding to that pattern. The network is then said to have recognized the pattern. A look at neural networks in this way suggests that the learning of a large (macroscopic) number of patterns in such a network means the creation of a large number of attractors or fixed points of the dynamics corresponding to the various patterns (uncorrelated by any symmetry operation) to be learned. The pattern or rhythm recognition capability of the brain is responsible for inductive knowledge. By looking at a part of a sequence, say a periodic function in time, the brain can extrapolate or predict (by adaptively changing the synaptic connections) the next few steps or the rest of the pattern. By improved training, expert systems can comprehend more complicated (quasi-periodic or quasi-chaotic) patterns or time sequences (e.g., in medical diagnosis). In the following sections we give a brief (biased) review of our studies on associative memory, time series prediction and optimization problems. In this manuscript we discuss asynchronous (Hopfield) and, in some cases, synchronous (Little) dynamics also.
There have been a fair number of studies indicating that the recall performance of a neural network model increases if the synaptic strength has asymmetry in inter-neuron connectivity. In view of this, we briefly discuss an associated spin glass model with asymmetric interactions in the appendix. We show that such a model lacks the Fluctuation–Dissipation Theorem (FDT) and consequently it is very difficult to analyze the properties of the model.
Hopfield model of associative memory
In the Hopfield model, a neuron i is represented by a two-state Ising spin at that site (i). The synaptic connections are represented by spin–spin interactions and they are taken to be symmetric. This symmetry of the synaptic connections allows one to define an energy function. Synaptic connections are constructed following Hebb's rule of learning, which says that for p patterns the synaptic strength for the pair (i, j) is
J_ij = (1/N) Σ_{μ=1}^{p} ξ_i^μ ξ_j^μ   (1)

where ξ_i^μ, i = 1, 2, ..., N, denotes the μ-th pattern learned (μ = 1, 2, ..., p). Each ξ_i^μ can take the values ±1. The parameter N is the total number of neurons, each connected in a one-to-all manner, and p is the number of patterns to be learned. For a system of N neurons, each with two states, the energy function is
H = −Σ_{i>j} J_ij s_i s_j   (2)
The expectation is that, with the J_ij's constructed as in (1), the Hamiltonian or energy function (2) will ensure that any arbitrary pattern will have higher energy than the patterns to be learned; the learned patterns will correspond to the (local) minima in the (free) energy landscape. Any pattern then evolves following the dynamics
s_i(t+1) = sgn(h_i(t))   (3)

where h_i(t) is the internal field on neuron i, given by

h_i(t) = Σ_j J_ij s_j(t)   (4)
Here, a fixed point of the dynamics, or attractor, is guaranteed; i.e., after a certain (finite) number of iterations t*, the network stabilizes and s_i(t*+1) = s_i(t*). Detailed analytical as well as numerical studies (see Amit, 1989; Hertz et al., 1991) show that the local minima of H in (2) indeed correspond to the patterns fed to be learned in the limit when the memory loading factor α (= p/N) tends to zero; and they differ by less than ~3% from the patterns fed to be learned when α < α_c ≈ 0.142. Above this loading, the network goes to a confused state where the local minima in the energy landscape do not have any significant overlap with the patterns fed to be learned by the network.
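The construction (1)–(4) can be put together in a few lines of code. The following is a minimal NumPy sketch (ours, not from the chapter; the system size, loading and random seed are illustrative choices): it builds J by Hebb's rule, runs the zero-temperature asynchronous dynamics, and retrieves a stored pattern from a 10% corrupted copy.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 10                           # neurons, patterns: alpha = p/N = 0.05 << alpha_c
xi = rng.choice([-1, 1], size=(p, N))    # random +/-1 patterns

# Hebb's rule, Eq. (1): J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, no self-coupling
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)

def recall(s, J, max_iter=100):
    """Zero-temperature asynchronous dynamics, Eqs. (3)-(4)."""
    n = len(s)
    for t in range(max_iter):
        changed = False
        for i in rng.permutation(n):     # random sequential updating
            h = J[i] @ s                 # internal field, Eq. (4)
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                  # fixed point reached
            return s, t
    return s, max_iter

# corrupt 10% of the sites of pattern 0 and retrieve it
s = xi[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
s[flip] *= -1
s_final, t_star = recall(s, J)
overlap = (s_final @ xi[0]) / N          # close to 1 well below alpha_c
```

At such a low loading the corrupted pattern lies well inside the basin of attraction, so the final overlap is essentially unity after a few sweeps.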
Relaxation studies in the Hopfield model
As already mentioned, after learning (i.e., after the construction of the J_ij's as per Hebb's rule in (1) is over), any arbitrary (corrupted or distorted) pattern fed to the network evolves according to the dynamics given in (3). The average number of iterations required to reach the fixed points may be defined as the relaxation time τ (averaged over all the "distorted" patterns considered). If one starts with a pattern obtained by distorting any of the learned patterns by a fixed amount of corruption and compares the relaxation times (to reach the corresponding fixed point), one expects τ to increase as α increases (from zero) to α = α_c (for fixed N); the so-called critical slowing down. In a recent study (Ghosh et al., 1990; see also Chakrabarti and Dasgupta, 1992), systems ranging in size from N=1000 to N=16,000 neurons were investigated numerically, in the absence of noise (zero temperature). For a fixed but small initial corruption, the variation of the average convergence or relaxation time τ with storage capacity was found to be τ ~ exp[−A(α − α_c)], where A is a constant dependent on the system size N (Ghosh et al., 1990). This is illustrated in Fig. 1.

Fig. 1. Plot of the average convergence time τ as a function of the initial loading factor α, for initial overlap m(0)=0.80 (circles) and m(0)=0.95 (squares) at N=16,000, and m(0)=0.95 (diamonds) at N=1000. The inset shows how the τ variations fit the form τ ~ exp[−A(α − α_c)] (adapted from Ghosh et al., 1990).
compared to such phenomena near phase transition points. Below α_c, the local minima or metastable states act as the overlap or memory states (not the actual minima, which are spin-glass states), and these metastable states (with overlap) disappear for α > α_c. This transition at α_c is therefore not comparable to a phase transition, where the states corresponding to the (free) energy minima change. Here, the minimum energy states remain the spin-glass states for any α > 0.05 (see Hertz et al., 1991); only the overlap metastable states disappear at α ≥ 0.14.
A logical extension is to study the variation of τ with synaptic noise. After the learning stage, the patterns were again distorted (by changing the spin states at a fixed small fraction of the sites or neurons) and retrieval is governed by Glauber-type dynamics:

s_i(t+1) = sgn(h_i)  with probability 1/[1 + exp(−2h_i/T)]
         = −sgn(h_i) otherwise   (5)
Here T is the temperature parameterizing the synaptic noise and h_i is given by (4). Here, of course, instead of looking for the attractors (which exist only for T=0), one measures the average overlap (with the learned patterns) that develops in time and looks for its relaxation to some equilibrium value. Again, the relaxation time increases abruptly as the metastable overlap states disappear across the phase boundary in the α–T plane. Indeed, the (dynamical) phase boundary obtained here (Sen and Chakrabarti, 1989, 1992), by studying the divergence of the relaxation time, compares favorably with that obtained from the static (free energy) study. It was found that the average relaxation time τ grows approximately as exp[(T − T_m)^(−2)] near the phase boundary at T_m(α), when α < α_c.
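The stochastic rule (5) is straightforward to simulate. Below is a minimal sketch (ours, not from the chapter; sizes, temperature and seed are illustrative). Note that the aligned-with-field probability in (5) is equivalent to setting s_i = +1 with probability 1/[1 + exp(−2h_i/T)], which is the form used in the code.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, T = 200, 5, 0.3
xi = rng.choice([-1, 1], size=(p, N))
J = (xi.T @ xi) / N                      # Hebb's rule, Eq. (1)
np.fill_diagonal(J, 0.0)

def glauber_sweep(s, J, T):
    """One sweep of the stochastic dynamics of Eq. (5)."""
    for i in rng.permutation(len(s)):
        h = J[i] @ s
        # equivalent to Eq. (5): s_i -> +1 with probability 1/[1 + exp(-2 h_i / T)]
        s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * h / T)) else -1
    return s

# average overlap with pattern 0 after a few sweeps at low synaptic noise
s = xi[0].copy()
for _ in range(20):
    glauber_sweep(s, J, T)
m = (s @ xi[0]) / N
```

At this low loading and temperature the overlap relaxes to a value close to 1; raising T (or α) toward the phase boundary makes the relaxation abruptly slower, as described in the text.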
Growth of correlations in the Hopfield model
The Hopfield model is solvable in the thermodynamic limit. In the exact solution of the model (see Hertz et al., 1991), it is necessary to assume that the learned patterns are completely random and uncorrelated. Models with correlated patterns have been investigated, where the average correlation between all the patterns has been kept finite. A different problem, in which only two of the patterns are correlated (i.e., have finite overlap), so that the average correlation is still irrelevant, has recently been studied (Chakrabarti and Dasgupta, 1992). The aim was to investigate the ability (or lack of it) of the Hopfield model to store the correlation between two (arbitrary) patterns. The restriction of the correlation to two patterns only ensures that the critical memory loading capacity remains unaffected at ~14%; i.e., α_c remains nearly 0.14. The variation of the final correlation between learned patterns with the initial correlation between a randomly chosen pair of patterns (denoted by 1 and 2) to be learned was measured. The initial correlations between the patterns are given by r_{μν} = Σ_i ξ_i^μ ξ_i^ν / N, where r_{μν} = r_0 for μ=1 and ν=2, and r_{μν} = 0 for other values of μ and ν. The system is then allowed to evolve from the initial states s_i^(1)(0) = ξ_i^1 and s_i^(2)(0) = ξ_i^2 following the Hopfield dynamics (3) and (4). The correlation between the states s^(1) and s^(2) is r_f = Σ_i s_i^(1)(t*) s_i^(2)(t*) / N, where s_i^(1)(t*) and s_i^(2)(t*) denote the two states after reaching their respective fixed points. In the simulation, the system size is taken as N=500; the patterns between which the correlation is introduced are selected randomly, and the averaging is done over 25 configurations for each value of r_0 and α. When the ratio r_f/r_0 is plotted against r_0 for different loading α, it is found that r_f/r_0 ≥ 1; i.e., the Hopfield model retains the correlation of a particular pair of patterns; either it keeps the same correlation or enhances it (see Figs. 3 and 4 of Chakrabarti and Dasgupta, 1992). For studying the r_0 → 0 limit more accurately, a larger system of N=1000 was taken. The critical value r_c of the average r_0 up to which the ratio remains unity (and above which it starts deviating towards higher values) was measured for different loading capacities α. As α increases, r_c decreases continually, and as α exceeds a value of the order of 0.05, r_c reaches zero. Hence the Hopfield model generates some correlations, however small, even for completely uncorrelated patterns, above a certain value of α (=0.05).
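The measurement just described can be reproduced in a small sketch (ours, not from the chapter). The way the initial correlation r_0 is imposed, by flipping a fraction (1 − r_0)/2 of the sites of one pattern, and all sizes and seeds, are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, r0 = 500, 10, 0.2

xi = rng.choice([-1, 1], size=(p, N))
# impose correlation r0 between patterns 0 and 1:
# flipping a fraction (1 - r0)/2 of the sites of a copy gives overlap r0
xi[1] = xi[0].copy()
flip = rng.choice(N, size=int(round((1 - r0) / 2 * N)), replace=False)
xi[1, flip] *= -1

J = (xi.T @ xi) / N                      # Hebb's rule, Eq. (1)
np.fill_diagonal(J, 0.0)

def fixed_point(s, J, max_iter=100):
    """Hopfield dynamics (3)-(4) until a fixed point is reached."""
    for _ in range(max_iter):
        changed = False
        for i in rng.permutation(len(s)):
            new = 1 if J[i] @ s >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

s1 = fixed_point(xi[0].copy(), J)        # evolve from pattern 1
s2 = fixed_point(xi[1].copy(), J)        # evolve from pattern 2
rf = (s1 @ s2) / N                       # final correlation r_f
```

At this low loading (α = 0.02) the fixed points essentially coincide with the patterns, so the ratio r_f/r_0 stays close to 1, in line with the retention result quoted above.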
Extended memory and faster retrieval with delay dynamics
In the literature, there have been a number of indications that dynamically defined networks, with asymmetric synaptic connections, may have better recall performance because of the suppression of spin-glass-like states. See the appendix for a discussion of a spin glass model with asymmetric interactions. The exactly solvable models (see Crisanti and Sompolinsky, 1987) indeed use the same Hebb's rule for the connection strength but with extreme dilution (inducing asymmetry). This gives better recall properties. There were also indications that the addition of some local memory to each neuron, in the sense that the internal field in (3) is determined by the cumulative effect of all the previous states of the neighboring neurons (as is indeed biologically plausible), gives better recall dynamics in some analog network models. In all these cases, no effective energy function can be defined (because of the asymmetry or of the local memory effect) and the use of the statistical physics of spin-glass-like systems is not possible. All these networks are thus defined dynamically. In a recent simulation study it has been shown that a dynamics with a single time step delay with some tunable weight λ (Sen and Chakrabarti, 1992),

s_i(t+1) = sgn[ Σ_j J_ij ( s_j(t) + λ s_j(t−1) ) ]   (6)
instead of (3) along with (2), improves the performance of the Hopfield-like model (with sequential updating) enormously. Here the same synaptic connections J_ij of Eq. (1), obtained by using Hebb's rule, are employed. In particular, for a network with N=500, the simulations indicated that the discontinuous transition of the overlap function at α_c ≈ 0.14 disappears and the overlap m_f^μ = (1/N) Σ_i s_i(t*) ξ_i^μ for the μ-th pattern becomes continuous. Accepting recall with final overlap m ≈ 0.9, the effective threshold loading capacity α_c increases from 0.14 for λ=0 to ~0.25 for λ ≈ 1. It was also found that the average recall time or relaxation time τ (defined before) becomes minimum around λ ≈ 1 at any particular loading capacity α.
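The delayed rule (6) can be sketched as follows (our implementation, not the original code; the sizes, loading and seed are illustrative, and the fixed-point check over two successive time steps follows the prescription in the text).

```python
import numpy as np

rng = np.random.default_rng(3)

def recall_delay(s_prev, s_now, J, lam, max_iter=200):
    """Sequential updating with one-step delay, Eq. (6):
    s_i(t+1) = sgn( sum_j J_ij [ s_j(t) + lam * s_j(t-1) ] )."""
    n = len(s_now)
    for t in range(max_iter):
        s_old = s_now.copy()
        for i in range(n):
            h = J[i] @ (s_now + lam * s_prev)
            s_now[i] = 1 if h >= 0 else -1
        # fixed point: state unchanged over two successive time steps
        if np.array_equal(s_now, s_old) and np.array_equal(s_old, s_prev):
            return s_now, t
        s_prev = s_old
    return s_now, max_iter

N, p = 300, 45                            # alpha = 0.15, just above the classical alpha_c
xi = rng.choice([-1, 1], size=(p, N))
J = (xi.T @ xi) / N                       # Hebb's rule, Eq. (1)
np.fill_diagonal(J, 0.0)

s0 = xi[0].copy()
s0[rng.choice(N, size=N // 10, replace=False)] *= -1   # 10% corruption
s_final, t_star = recall_delay(s0.copy(), s0.copy(), J, lam=1.0)
m_f = (s_final @ xi[0]) / N
```

As in the study quoted above, the dynamics needs two initial states; here both are taken as the same corrupted pattern.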
The above interesting and somewhat surprising results for this single-step time-delayed dynamics are from simulations of a rather small network (N=500). Here we present the results of a detailed numerical study of this dynamics, both with asynchronous (sequential as well as random; Hopfield-like) and synchronous (Little-like) updating. We check our numerical results for bigger networks (250 ≤ N ≤ 4000) and study whether there is any significant finite-size effect. We find that the fixed-point memory capacity improves considerably with λ, with both Hopfield and Little dynamics. With 90% final overlap (m_f ≥ 0.90) with the learned patterns, the loading capacity increases to almost 0.25 for λ ≈ 1. The relaxation or average recall time τ, for any fixed loading capacity α (≤ 0.25), is also minimum at λ ≈ 1. We could not detect any significant finite-size effect (for 250 ≤ N ≤ 4000) and we believe that this result is true in the thermodynamic limit. We also observe that for λ=1, both the Little dynamics and the Hopfield dynamics give identical results (all the limit cycles of the Little model disappear to give fixed points). The performance of the network for memory storage in limit cycles for negative λ has also been studied for Hopfield and Little dynamics. Here, the negative-λ term provides a damping and limit cycles become frequent, but the overlap properties deteriorate significantly. We have also studied the performance of such dynamics for randomly diluted networks. In the next section we discuss the significance of these results.
Simulations and numerical results: Hopfield and Little models
We consider a network of N neurons (spins). The synaptic interactions between neurons i and j are found using Hebb's rule (1) for a set of p random, Monte Carlo generated patterns. After this 'learning' stage, each pattern is corrupted randomly by a fixed amount (typically 10% of the neuron states are changed randomly, giving an initial overlap m_i^μ = 0.9 for any pattern μ) and the dynamics (updating process) following (3) starts with these corrupted patterns. Both sequential (Hopfield) and parallel (Little) updating are employed. Since our updating process requires two initial patterns (at t−1 and t−2), we start with two randomly distorted patterns with the same value of distortion (for each μ). We study the overlap of the network state with the pattern μ during the updating process, and the final overlap m_f^μ at the fixed points (checked over two successive time steps) or the limit-cycle patterns is noted. The averages over all p patterns (and over two to five sets of random number seeds for the entire run) are taken. The number of iterations t* required to reach the fixed points (or limit cycles) is also noted, and its average (over the patterns) gives the average recall or relaxation time τ. Our typical results for the Hopfield model (dynamics) are for N=1000 and those for the Little model are for N=500. We consider mostly positive λ values. Negative λ values do not improve the performance and have been studied only for some typical investigations.

Fig. 2. The variation of the average final overlap m_f (for the fixed point obtained) with λ_R (= (1−λ)(1+λ)), for positive λ, at a fixed loading capacity α=0.25, with sequential dynamics for networks with N=250 (solid line), 500 (dashed line) and 1000 (dash-dot line). The inset shows the variation of the average recall or relaxation time τ against λ_R for different N (adapted from Maiti et al., 1995).
In Fig. 2, we give the variation of the final average overlap m_f (of any pattern) for the fixed point (checked over two successive time steps) with the redefined λ [λ_R = (1−λ)(1+λ)] at fixed loading capacity α ≡ p/N = 0.25, for sequential as well as random (Glauber) updating. It may be noted that for the Hopfield dynamics (λ=0) the final overlap m_f < 0.30 for α=0.25. Strictly speaking, this value should be zero in the thermodynamic limit. The results shown in Fig. 2 are for N=250, 500 and 1000, and the nonvanishing value of m_f for λ=0 (λ_R=1) at α=0.25 is precisely due to this finite-size effect, as can easily be checked (it decreases considerably as N becomes large). It is seen that the final overlap gradually improves with increasing λ and saturates beyond λ=1 (λ_R=0) at m_f ≈ 0.90 (for α=0.25). The inset figure shows that the average recall time
(or relaxation time) τ and the fraction of limit cycles also decrease as λ increases from λ=0, finally attaining an optimal value at λ=1. The recall time τ again shows a minimum at λ=1, as in the Glauber (Hopfield) case. An interesting observation is that at λ=1 the limit cycles of the Little model (parallel dynamics) disappear and all initial states reach fixed points with overlap properties identical to those of the Hopfield model.
For sequential updating, the variation of the average final overlap m_f with the loading capacity α, for some typical positive values of λ, is shown in Fig. 3. The results are for N=500. The inset there shows the variation of the average recall time τ with α for different λ. For negative values of λ, even for sequential updating, one gets limit cycles (e.g., the fraction of limit cycles is ~0.48 for λ=−0.8 at α=0.25) and the fixed-point fraction decreases further for parallel (Little) dynamics. However, the overlap with the learned patterns decreases considerably and we do not see any significant memory capability (for storage in limit cycles) for negative values of λ (typically m_f ≈ 0.42 for λ=−0.8 at α=0.25, using sequential updating) (Fig. 4).

In order to investigate the finite-size effect in the improved performance of the network for positive λ values (for sequential dynamics), we studied the variation of the average final overlap m_f with 1/N at fixed loading capacity α (=0.20 and 0.25) and fixed λ (=1). These results, for 250 ≤ N ≤ 4000, are shown in Fig. 4. The inset there shows the same variation with 1/N at fixed α (=0.25) for λ=0. One can clearly compare the finite-size effects observed: for λ=1, no significant variation of m_f with 1/N is observed for either α=0.20 or 0.25 (m_f ≈ 0.4 for α=0.20, and m_f ≈ 0.8 for α=0.25 at λ=1 as N → ∞). For λ=0, however, there is a significant finite-size effect. From the extrapolation we find the extrapolated result for m_f (for large N) to be around 0.29 at this α (=0.25); see the inset. See Fig. 5 for the variations of m_f with the loading α for various values of λ.

Fig. 3. Average final overlap m_f (for the fixed point obtained) at a fixed loading capacity α=0.25, for a network with N=500, against λ_R (= (1−λ)(1+λ); positive λ). The results for parallel updating (Little model) are shown by squares (filled or unfilled); those for the Glauber dynamics (Hopfield model) by circles (filled or unfilled). The inset shows the variation of the average recall time τ and the fraction of limit cycles against λ_R (adapted from Maiti et al., 1995).

Fig. 4. Variation of the average final overlap m_f against 1/N for fixed loading capacity α=0.2 and α=0.25, at λ=1. The inset shows the same for α=0.25 at λ=0. The results are for 250 ≤ N ≤ 4000 (adapted from Maiti et al., 1995).

Fig. 5. Variation of the average final overlap m_f against loading capacity α for some typical values of λ (λ=0 for the bottom curve and λ=0.2, 0.5, 1.0, 1.5, 2 successively upwards), with sequential dynamics for N=1000. The inset shows the variation of the recall time τ against α for the same values of λ.
Fig. 6. The variation of the configurational energy E(t) with time (updating iteration) t, for sequential updating with λ=4, for two typical cases (arbitrary distorted patterns), before and after reaching the fixed points. The inset shows the same for a distorted pattern with λ=0 dynamics (adapted from Maiti et al., 1995).
It is to be noted that for positive λ, the overlap and other properties saturate beyond λ=1 (λ_R=0) for any α. This is true even for very large values of λ in the updating rule (6). This is somewhat tricky in the sense that apparently one time step is almost skipped in every updating: practically, {s_i(t−2)} (together with only a small fraction of {s_i(t−1)}) determines {s_i(t)} through (6) and is kept in a buffer for later use. Similarly, {s_i(t−1)} from the buffer is used to determine {s_i(t+1)}. We have checked independently the performance of such an updating rule (s_i(t+1) = sgn[ Σ_j J_ij ( λ′ s_j(t) + λ s_j(t−1) ) ], instead of (6), with λ′ → 0). We find that for λ′=0 the Hopfield model result is recovered. In fact, this can be seen from Fig. 3, where the overlap m_f decreases rapidly for very large values of λ in (6) (λ ≥ 10). It may be noted, however, that the effective size of the domain of attraction of the fixed points depends very much on the ratio λ/λ′. This can be seen easily in the limit λ → 0 with λ′ finite (or λ′ → 0 with λ finite), where the change in the energy of the network per updating step depends on the value of λ′ (or λ), which determines the effective shape of the energy landscape seen by the dynamics.
As can easily be seen, for nonzero λ the updating dynamics (6) does not minimize the 'energy' function E(t), defined as E(t) = −Σ_{i>j} J_ij s_i(t) s_j(t), as the delay term contributes to the internal field. Some typical variations of E(t) with the iterations (updating) are shown in Fig. 6 for α=0.5 and λ=4. (The large values of α and λ are chosen here to ensure larger τ, so that the effect can be displayed over a longer time range.) The results clearly show the contribution of the λ term in escaping over the spurious valleys in the "energy" landscape. The inset in Fig. 6 shows the same variation of E(t) in the λ=0 (Hopfield) case. Here, of course, it clearly decreases monotonically with time.
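The contrast between the monotonic λ=0 case and the uphill excursions for λ > 0 is easy to reproduce in a small sketch (ours, not from the chapter; all sizes and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, p = 200, 30
xi = rng.choice([-1, 1], size=(p, N))
J = (xi.T @ xi) / N                       # Hebb's rule, Eq. (1)
np.fill_diagonal(J, 0.0)

def energy(s, J):
    # configurational energy of Eq. (2): E = -(1/2) sum_{i,j} J_ij s_i s_j
    return -0.5 * s @ J @ s

def run(s_prev, s_now, lam, sweeps=30):
    """Sequential delayed dynamics of Eq. (6), recording E(t) after each sweep."""
    energies = []
    for _ in range(sweeps):
        s_old = s_now.copy()
        for i in range(N):
            s_now[i] = 1 if J[i] @ (s_now + lam * s_prev) >= 0 else -1
        s_prev = s_old
        energies.append(energy(s_now, J))
    return energies

s = xi[0].copy()
s[rng.choice(N, size=N // 5, replace=False)] *= -1   # 20% corruption

e_hopfield = run(s.copy(), s.copy(), lam=0.0)   # monotonically non-increasing
e_delay = run(s.copy(), s.copy(), lam=4.0)      # may rise while escaping spurious valleys
```

For λ=0 each single-spin update can only lower (or keep) E, so the recorded sequence is non-increasing; for large λ the delayed field can carry the state over intervening barriers.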
We have also studied the performance of this dynamics for diluted networks (with sequential dynamics; N=500), where a random fraction c of the synaptic connections J_ij is removed. We show in Fig. 7 some results for the overlap m_f against λ at a fixed loading capacity α=0.20, for some typical values of the dilution concentration c. The inset shows the variation of m_f against α at a fixed λ (=1) for some typical values of c.

Fig. 7. The variation of the average final overlap m_f with positive λ at a fixed loading capacity α=0.20, with sequential dynamics for N=500, at some typical values of c (=0.10, 0.20 and 0.25; the c=0.1 data being at the top). The inset shows the variation of m_f with α at λ=1, for the same values of c (adapted from Maiti et al., 1995).
Summary
In this paper, we have reviewed several existing numerical results on neural network models obtained by our group. The models used here have synaptic connections constructed by using Hebb's rule. The dynamics is governed by the internal field, as in (3) and (6). Some (new) analytic results for asymmetric interactions in a related spin glass model are given in the appendix.
Acknowledgment
We are grateful to C. Dasgupta, P. Sengupta, M. Ghosh, G. A. Kohring, P. Maity, A. K. Sen and P. Sen for collaborations at various stages of the above studies, and to C. Dasgupta for a critical reading of the manuscript.
Appendix: A spin glass model with asymmetric interactions
A. The dynamical model
We discuss an asymmetric spin-glass model in the mean-field limit. We follow the notation of Crisanti and Sompolinsky (1987). The model consists of N fully connected spins interacting via random asymmetric interactions. The details and more complete studies of the same will be discussed elsewhere (Basu, 2007). We choose the interaction matrix between spins i and j to have the form

J_ij = J^s_ij + k J^a_ij,   k ≥ 0   (7)
where J^s_ij and J^a_ij are symmetric and antisymmetric couplings respectively, such that

J^s_ij = J^s_ji,   J^a_ij = −J^a_ji   (8)
Each of J^s_ij and J^a_ij is a zero-mean Gaussian-distributed random variable with variance

[(J^s_ij)²] = [(J^a_ij)²] = (J²/N) · 1/(1 + k²)   (9)
As in Crisanti and Sompolinsky (1987), square brackets denote the quenched average with respect to the distribution. The parameter k measures the degree of asymmetry in the interactions. Similarly to Crisanti and Sompolinsky (1987), Eq. (9) implies

N [J_ij²] = J²,   N [J_ij J_ji] = J² (1 − k²)/(1 + k²)   (10)
Therefore, from the above it is clear that fork=0 the model reduces to the ordinary symmetricspin glass model with infinite range interactions(Sompolinsky and Zippelius, 1982), whereas k=1corresponds to a fully asymmetric model (Crisantiand Sompolinsky, 1987).
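The construction (7)–(9) and the consistency check (10) can be sketched numerically (our code, not from the chapter; N and k are illustrative, and symmetrizing/antisymmetrizing independent Gaussian matrices is one standard way to realize the stated variances):

```python
import numpy as np

rng = np.random.default_rng(5)
N, J0, k = 1000, 1.0, 0.5

# each of J^s and J^a has off-diagonal variance J^2 / (N (1 + k^2)), Eq. (9)
sigma = J0 / np.sqrt(N * (1.0 + k**2))
A = rng.normal(0.0, sigma, size=(N, N))
B = rng.normal(0.0, sigma, size=(N, N))
Js = (A + A.T) / np.sqrt(2.0)      # symmetric part, off-diagonal variance sigma^2
Ja = (B - B.T) / np.sqrt(2.0)      # antisymmetric part, off-diagonal variance sigma^2
Jmat = Js + k * Ja                 # Eq. (7)

iu = np.triu_indices(N, k=1)                  # all pairs i < j
var_term = N * np.mean(Jmat[iu] ** 2)         # should approach J0^2
cross = N * np.mean(Jmat[iu] * Jmat.T[iu])    # should approach J0^2 (1-k^2)/(1+k^2)
```

With k=0.5 the cross-correlation N[J_ij J_ji] should come out near 0.6 J0², interpolating between the symmetric (k=0) and fully asymmetric (k=1) limits of Eq. (10).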
The dynamics of this model is defined by an appropriate Langevin equation for the i-th spin:

Γ₀⁻¹ ∂s_i(t)/∂t = −r₀ s_i(t) − dV(s_i)/ds_i + Σ_j J_ij s_j(t) + h_i(t) + ζ_i(t)   (11)
In our model we consider a soft spin which varies continuously from −∞ to ∞. The local potential V(s_i) is chosen as in Crisanti and Sompolinsky (1987) and is an even function of s_i. The function h_i(t) is a local external magnetic field. Further, the stochastic function ζ_i(t) is a zero-mean Gaussian-distributed variable with variance
⟨ζ_i(t) ζ_j(t′)⟩ = (2T/Γ₀) δ(t − t′) δ_ij   (12)
Note that the above choice of noise correlations would have validated the FDT relating the appropriate correlation functions and the corresponding response functions for the fully symmetric (k=0) problem. Furthermore, the static properties of the fully symmetric version of the model can be derived from an appropriate Boltzmann distribution without any reference to the underlying dynamics. However, for the present case with finite k there is no FDT and, equivalently, no Boltzmann distribution exists for the problem. As discussed in Crisanti and Sompolinsky (1987), and as we show below, this makes the finite-k problem far more complicated than its fully symmetric version.
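Although (11) is analyzed field-theoretically below, it can also be integrated directly. A minimal Euler–Maruyama sketch (ours, not from the chapter; V(s) = u s⁴ is one admissible even potential, h_i = 0, and all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N, J0, k = 200, 1.0, 0.5
T, r0, u, gamma0 = 2.0, 1.0, 0.25, 1.0
dt, steps = 0.01, 2000

# asymmetric couplings built as in Eqs. (7)-(9)
sigma = J0 / np.sqrt(N * (1.0 + k**2))
A = rng.normal(0.0, sigma, size=(N, N))
B = rng.normal(0.0, sigma, size=(N, N))
Jmat = (A + A.T) / np.sqrt(2.0) + k * (B - B.T) / np.sqrt(2.0)
np.fill_diagonal(Jmat, 0.0)

# Euler-Maruyama integration of Eq. (11) with V(s) = u s^4, so dV/ds = 4 u s^3;
# the noise increment has variance 2 T Gamma_0 dt, consistent with Eq. (12)
s = rng.normal(0.0, 1.0, size=N)
for _ in range(steps):
    drift = -r0 * s - 4.0 * u * s**3 + Jmat @ s       # h_i = 0
    s += gamma0 * dt * drift + np.sqrt(2.0 * T * gamma0 * dt) * rng.normal(size=N)

q = np.mean(s**2)     # equal-time autocorrelation C(0) in the high-T phase
```

At this high temperature the spins fluctuate around zero with a finite equal-time correlation; it is precisely the absence of an FDT link between such correlations and the response that the remainder of the appendix addresses.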
B. Dynamic generating functional
The generating functional for the correlation and response functions of the model (11) is

Z = ∫ Ds Dŝ exp[ Σ_i L₀(s_i, ŝ_i) + (1/2) Σ_{ij} ∫dt J^s_ij { iŝ_j(t) s_i(t) + iŝ_i(t) s_j(t) } + (1/2) Σ_{ij} ∫dt J^a_ij { iŝ_j(t) s_i(t) − iŝ_i(t) s_j(t) } ]   (13)
where L₀ is the local part of the action:

L₀ = ∫dt iŝ_i(t) [ −Γ₀⁻¹ ∂_t s_i(t) − r₀ s_i(t) − 4u s_i³ + h_i(t) + T Γ₀⁻¹ iŝ_i(t) ]
We now average over the distribution of the J_ij to obtain

⟨Z⟩ = ∫ Ds Dŝ exp[ Σ_i L₀(s_i, ŝ_i) + (J²/2N) Σ_{ij} ∫dt dt′ { iŝ_i(t) iŝ_i(t′) s_j(t) s_j(t′) + ((1 − k²)/(1 + k²)) iŝ_i(t) s_i(t′) iŝ_j(t′) s_j(t) } ]   (14)
Here, the field ŝ is the MSR conjugate variable (Martin et al., 1973; Sompolinsky and Zippelius, 1982).
We now take the mean-field limit, which in the present context is equivalent to assuming infinite-range interactions (i.e., all spins interact with each other). In that limit, fluctuations, which are O(1/N), are negligible (Sompolinsky and Zippelius, 1982). We then linearize the remaining quartic terms by a Hubbard–Stratonovich transformation (Sompolinsky and Zippelius, 1982). To simplify further, in the limit N → ∞ we substitute the stationary-point values. Furthermore, we set the response–response correlator to zero, due to causality. We thus obtain
⟨Z⟩ = ∫ Ds Dŝ exp[ L₀ + (β²J²/2) ∫dt dt′ { C(t − t′) iŝ_i(t) iŝ_i(t′) + 2 ((1 − k²)/(1 + k²)) G(t − t′) iŝ_i(t) s_i(t′) } ]   (15)
The effective equation of motion corresponding to the generating functional (15) above is

Γ₀⁻¹ ∂s_i(t)/∂t = −r₀ s_i(t) − 4u s_i³ + (J²/T²)((1 − k²)/(1 + k²)) ∫_{−∞}^{t} dt′ G(t − t′) s_i(t′) + φ_i(t)   (16)
where the effective noise φ_i is zero-mean and Gaussian-distributed with variance

⟨φ_i(t) φ_j(t′)⟩ = (2T/Γ₀) δ(t − t′) δ_ij + β²J² C(t − t′) δ_ij   (17)
or, in Fourier space,

s_i(ω) = G₀(ω)[φ_i(ω) + h_i(ω)] − 4u G₀(ω) ∫dΩ₁ dΩ₂ s_i(Ω₁) s_i(Ω₂) s_i(ω − Ω₁ − Ω₂)   (18)
Further,

G₀⁻¹(ω) = r₀ − iω Γ₀⁻¹ − ((1 − k²)/(1 + k²)) (J²/T²) G(ω)   (19)
and the effective noise correlation is given by

⟨φ_i(ω) φ_j(−ω)⟩ = δ_ij [ 2T/Γ₀ + (J²/T²) C(ω) ]   (20)
Note that, unlike the case of symmetric couplings, the effective noise no longer has any relation to the effective propagator (19). Therefore, the correlation function C(t) and the propagator G(t) are independent of each other, in contrast to the case of symmetric couplings where the two are related by the FDT (Sompolinsky and Zippelius, 1982; Crisanti and Sompolinsky, 1987). Further, for the convenience of our subsequent analyses, we write the effective noise φ_i(t) as the sum of three parts
φ_i(t) = η_i(t) + ξ_i(t) + z_i   (21)

Here

⟨η_i(ω) η_j(−ω)⟩ = δ_ij [ 2T/Γ₀ + (J²/T²)((1 − k²)/(1 + k²)) C̃(ω) ]   (22)

such that C̃(ω) = (2T/ω) Im G(ω), where Im refers to the imaginary part. Thus, η_i is the part of the total effective noise which respects the FDT, and C̃(t) is the part of the correlator C(t) which is related to the propagator G(t) through the FDT. We further define C̃(t) ≡ C(t) − q(t), such that q(t) = [⟨s_i(t′)⟩_η ⟨s_i(t + t′)⟩_η]_{ξ,z}. Here, z_i is the time-persistent part of the Gaussian noise, with variance

⟨z_i(ω) z_j(−ω)⟩ = 2π δ(ω) β²J² q δ_ij   (23)
This then yields

⟨ξ_i(ω) ξ_j(−ω)⟩ = δ_ij [ (2k²/(1 + k²)) β²J² C̃(ω) + β²J² { q(ω) − 2π q δ(ω) } ]   (24)

With these effective noise correlators and the propagator, we now proceed to consider the dynamics at high temperature.
C. Dynamics at high temperature
In the high-temperature phase there is no time-persistent correlation. We begin by defining a damping function

Γ⁻¹(ω) = i ∂G⁻¹/∂ω   (25)
Eqs. (19) and (25), together with the Dyson equation, yield

\Gamma^{-1}(\omega) = \frac{\Gamma_0^{-1} + i\,\partial\Sigma/\partial\omega}{1 - \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\,G^2(\omega)} \qquad (26)
Equation (26), therefore, suggests that the effective relaxation time \Gamma^{-1}(\omega=0) diverges at a critical temperature given by

T_c = J\,G(0)\,\sqrt{\frac{1-k^2}{1+k^2}} \qquad (27)
This, however, holds provided \partial\Sigma/\partial\omega has a finite limit as \omega \to 0. We now argue in favour of such a result. We begin by noting that for k=0, i.e., for the symmetric coupling case, \Gamma(\omega) \sim \sqrt{\omega} in the small-\omega limit, which in turn implies G(\omega) \sim \sqrt{\omega} (Sompolinsky and Zippelius, 1982). We then note that \tilde C(\omega), the part of the total correlation C(\omega) which is related to the propagator G(\omega) through the FDT, has a small-\omega dependence of the form 1/\sqrt{\omega}. The total correlator C(\omega) is given by

C(\omega) = G(\omega)\,G^{*}(\omega)\left[\frac{2}{\Gamma_0} + \frac{J^2}{T^2}\,C(\omega)\right] \qquad (28)

indicating that C(\omega) is no more singular than 1/\sqrt{\omega}. We are then led to the result that \partial\Sigma/\partial\omega|_{\omega=0} is finite. We therefore conclude that for T > T_c spin fluctuations decay with a relaxation time that diverges as |1 - T/T_c|^{-1}, where T_c is given by Eq. (27).
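The explicit k-dependence of Eq. (27) is easily tabulated. If one treats G(0) as approximately independent of k (an assumption made only for illustration, since G(0) itself can depend on k), the ratio T_c(k)/T_c(0) reduces to the square-root factor:

```python
import math

def tc_ratio(k: float) -> float:
    # Asymmetry factor sqrt((1 - k^2) / (1 + k^2)) from Eq. (27).
    # Reading it as T_c(k)/T_c(0) assumes G(0) does not itself vary
    # with k -- an illustrative assumption only.
    return math.sqrt((1.0 - k**2) / (1.0 + k**2))

for k in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"k = {k:.2f}   T_c(k)/T_c(0) = {tc_ratio(k):.4f}")
```

The factor decreases monotonically from 1 at k=0 to 0 for fully antisymmetric couplings (k=1), which is the sense in which asymmetry suppresses the spin-glass transition.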
D. Statics below T_c
Having argued in favour of the possibility of a critical temperature T_c, we now consider the static properties below T_c. We begin by setting up the Fokker–Planck equation (Chaikin and Lubensky, 2000) which governs the time evolution of the probability distribution of the configurations of s_i. In particular, we consider the probability distribution P_1(s, t\,|\,s_0, t_0) \equiv \langle \delta(s_i - s_i(t))\rangle_{s_0,t_0}, which is the probability of the configuration \{s\} at time t.
This implies

P_1(s, t+\Delta t\,|\,s_0, t_0) = \int Ds'\; P_1(s, t+\Delta t\,|\,s', t)\; P_1(s', t\,|\,s_0, t_0) \qquad (29)
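Equation (29) is the Chapman–Kolmogorov composition rule for the conditional probability. As a self-contained numerical check, one can use the exactly known Gaussian transition density of an Ornstein–Uhlenbeck process, i.e. the linear, memoryless caricature of the dynamics (all parameter values are arbitrary):

```python
import numpy as np

# Exact Gaussian transition density P(s, t | s0) for the
# Ornstein-Uhlenbeck process ds/dt = -r s + noise, with
# <noise(t) noise(t')> = 2 D delta(t - t').  Parameters arbitrary.
r, D = 1.0, 0.5

def P(s, t, s0):
    mean = s0 * np.exp(-r * t)
    var = (D / r) * (1.0 - np.exp(-2.0 * r * t))
    return np.exp(-(s - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Chapman-Kolmogorov, Eq. (29): integrating over the intermediate
# configuration s' must reproduce direct propagation over t1 + t2.
s0, s, t1, t2 = 0.7, -0.3, 0.4, 0.9
sp = np.linspace(-12.0, 12.0, 4001)     # grid for the s' integral
ds = sp[1] - sp[0]
lhs = P(s, t1 + t2, s0)
rhs = np.sum(P(s, t2, sp) * P(sp, t1, s0)) * ds
```

For Gaussian kernels the composition holds exactly, so `lhs` and `rhs` agree to numerical precision.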
We now calculate the conditional probability P_1(s, t+\Delta t\,|\,s', t) from the equation of motion: a Taylor series expansion yields
s_i(t+\Delta t) = s_i(t) + \left[-r_0\,s_i(t) - \frac{dV}{ds_i} + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\,G(t-t')\,s_i(t')\right]\Gamma_0\,\Delta t + \int_{t}^{t+\Delta t} \zeta_i(t')\,dt' + \frac{1}{2}\int_{t}^{t+\Delta t}\!\int_{t}^{t+\Delta t} \zeta(t_1)\,\zeta(t_2)\,dt_1\,dt_2 \qquad (30)
Here the noise \zeta_i = \eta_i + \xi_i + z_i. We now perform averages over \eta_i and \xi_i, which are zero-mean and Gaussian distributed with the variances discussed above. After simplifications we finally find
\langle \delta[s_i - s_i(t+\Delta t)]\rangle_{s_0,t_0} = \Big\{1 - \Delta t\Big[-r_0\,s_i - \frac{dV}{ds_i} + z_i + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\,G(t-t')\,s_i(t')\Big]\frac{\partial}{\partial s_i} + \Delta t\Big[\frac{1}{\Gamma_0} + \beta^2 J^2\int_{-\Delta t}^{\Delta t} dt'\Big(\frac{2k^2}{1+k^2}\,\tilde C(t') + q(t') - q\Big)\Big]\frac{\partial^2}{\partial s_i^2}\Big\}\,P_1 \qquad (31)
yielding

\frac{\partial P_1}{\partial t} = \frac{\partial}{\partial s_i}\Big\{-\Big[-r_0\,s_i - \frac{dV}{ds_i} + z_i + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\,G(t-t')\,s_i(t')\Big]P_1 + \Big[\frac{1}{\Gamma_0} + \beta^2 J^2\int_{-\Delta t}^{\Delta t} dt'\Big(\frac{2k^2}{1+k^2}\,\tilde C(t') + q(t') - q\Big)\Big]\frac{\partial P_1}{\partial s_i}\Big\} \qquad (32)
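The structure of a Fokker–Planck equation of this type can be illustrated in a drastically simplified single-site caricature in which the memory kernel and the time-persistent noise terms are dropped (purely an illustrative assumption): the drift then reduces to -dV/ds with V(s) = (r_0/2)s^2 + u s^4 and constant diffusion T, and setting \partial P_1/\partial t = 0 gives the zero-flux Boltzmann-like steady state P \propto \exp(-V/T). A quick check against direct Langevin sampling:

```python
import numpy as np

rng = np.random.default_rng(1)

# Single-site caricature: drift -dV/ds with V(s) = (r0/2) s^2 + u s^4,
# constant diffusion T; memory and time-persistent terms are dropped
# (an assumption made purely for illustration).  Parameters arbitrary.
r0, u, T, dt = 1.0, 0.25, 1.0, 0.005

def v_prime(s):
    return r0 * s + 4.0 * u * s**3

# Langevin sampling: many independent walkers evolved to stationarity.
s = np.zeros(10000)
for _ in range(4000):
    s += -v_prime(s) * dt + np.sqrt(2.0 * T * dt) * rng.normal(size=s.size)

# Zero-flux steady state of the corresponding Fokker-Planck equation:
# P(s) proportional to exp(-V(s)/T).  Compare second moments.
grid = np.linspace(-5.0, 5.0, 2001)
dg = grid[1] - grid[0]
w = np.exp(-(0.5 * r0 * grid**2 + u * grid**4) / T)
w /= w.sum() * dg
m2_theory = float(np.sum(grid**2 * w) * dg)
m2_sim = float(np.mean(s**2))
```

The agreement of the two second moments is the single-site analogue of extracting q from the steady-state distribution, as done next.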
The steady-state solution of Eq. (32), P_1(\mathrm{steady}), can be obtained by setting \partial P_1/\partial t to zero. From P_1(\mathrm{steady}) one can calculate q \equiv [\langle s\rangle^2_{\eta,\xi}]_z, which itself appears in Eq. (32). One therefore also needs an equation for the distribution P_2 \equiv \langle \delta(s_i - s_i(t))\rangle_{\eta}, in which only the FDT-respecting noise \eta is averaged over at fixed (\xi, z). The equation is
\frac{\partial P_2}{\partial t} = \frac{\partial}{\partial s_i}\Big\{-\Big[-r_0\,s_i - \frac{dV}{ds_i} + z_i + \xi_i(t) + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\,G(t-t')\,s_i(t')\Big]P_2 + \Big[\frac{1}{\Gamma_0} + \beta^2 J^2\,\frac{1-k^2}{1+k^2}\int_{-\Delta t}^{\Delta t}\tilde C(t')\,dt'\Big]\frac{\partial P_2}{\partial s_i}\Big\} \qquad (33)
Clearly there is no steady-state solution, since \xi_i(t) endows the equation with an explicit time dependence. We formally write the solution of Eq. (33) as P_2(t) = \exp[\int_0^t L(t')\,dt']\,P_2(t=0), where L is the operator
L = \frac{\partial}{\partial s_i}\Big\{-\Big[-r_0\,s_i - \frac{dV}{ds_i} + z_i + \xi_i(t) + \frac{J^2}{T^2}\,\frac{1-k^2}{1+k^2}\int_{-\infty}^{t} dt'\,G(t-t')\,s_i(t')\Big] + \Big[\frac{1}{\Gamma_0} + \beta^2 J^2\,\frac{1-k^2}{1+k^2}\int_{-\Delta t}^{\Delta t}\tilde C(t')\,dt'\Big]\frac{\partial}{\partial s_i}\Big\} \qquad (34)
and P_2(t=0) is the initial condition. The function q(t) is then given by
q(t) = \int Ds\,D\xi\,Dz\;P_2(t=0)\,\exp\Big[\int_{0}^{t_0} L(t')\,dt'\Big]\,s_i(t_0)\,s_i(t_0+t)\,P[\xi]\,P[z] \qquad (35)
Note that we have also performed an average over the initial conditions. The functions P[\xi] and P[z] are the distributions of \xi and z, respectively. Assuming that the initial distribution is normalised, we have
q(t) = \int Ds\,D\xi\,Dz\;\exp\Big[\int_{0}^{t_0} L(t')\,dt'\Big]\,s_i(t_0)\,s_i(t_0+t)\,P[\xi]\,P[z] \qquad (36)
Further, the local magnetisation m(z) \equiv \langle s\rangle_{\eta,\xi} = \int Ds\,P_1[z]\,s. This formally completes the discussion of the static properties below T_c. In summary, we have investigated a spin glass model with asymmetric spin–spin interactions in the mean-field limit. Due to the asymmetry there is no FDT in the model, and consequently the analysis becomes considerably more complicated. Our result [Eq. (27)] suggests that the spin glass transition temperature is reduced in the presence of the asymmetry, and therefore indicates the possibility of a higher memory capacity of neural networks with such asymmetric synaptic interactions. Thus asymmetric interactions tend to suppress the spin-glass phase. In this context we refer to some related studies: Crisanti and Sompolinsky (1987) studied the spherical and Ising versions of the problem and found no spin-glass phase at any finite temperature for any strength of the asymmetry. Further extensive studies on this problem are required for a satisfactory resolution of the issues raised here.
References
Amit, D.J. (1989) Modelling Brain Function. Cambridge University Press, Cambridge.
Basu, A. (2007) Unpublished.
Chaikin, P.M. and Lubensky, T.C. (2000) Principles of Condensed Matter Physics. Cambridge University Press, Cambridge.
Chakrabarti, B.K. and Dasgupta, P.K. (1992) Physica A, 186: 33–48.
Crisanti, A. and Sompolinsky, H. (1987) Phys. Rev. A, 36: 4922–4939.
Ghosh, M., Sen, A.K., Chakrabarti, B.K. and Kohring, G.A. (1990) J. Stat. Phys., 61: 501.
Hertz, J., Krogh, A. and Palmer, R.G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley, Reading, MA.
Maiti, P., Dasgupta, P. and Chakrabarti, B.K. (1995) Int. J. Mod. Phys. B, 9: 3025–3037.
Martin, P.C., Siggia, E.D. and Rose, H.A. (1973) Phys. Rev. A, 8: 423–437.
Sen, P. and Chakrabarti, B.K. (1989) Phys. Rev. A, 40: 4700–4703.
Sen, P. and Chakrabarti, B.K. (1992) Phys. Lett. A, 162: 327–330.
Sompolinsky, H. and Zippelius, A. (1982) Phys. Rev. B, 25: 6860–6875.
Plate 13.1. Plot of the average convergence time \tau as a function of the initial loading factor \alpha for initial overlap m(0)=0.80 (circles) and m(0)=0.95 (squares) at N=16,000, and m(0)=0.95 (diamonds) at N=1000. The inset shows how the \tau variations fit the form \tau \sim \exp[-A(\alpha - \alpha_c)] (adapted from Ghosh et al., 1990). (For B/W version, see page 157 in the volume.)