

Implementing plastic weights in neural networks using low precision arithmetic

Christopher Johansson, Anders Lansner

School of Computer Science and Communication, Royal Institute of Technology, Lindstedtsv. 3, 100 44 Stockholm, Sweden

Neurocomputing 72 (2009) 968–972, doi:10.1016/j.neucom.2008.04.007

Article info

Article history: Received 21 July 2007; received in revised form 19 January 2008; accepted 25 April 2008; available online 10 May 2008. Communicated by G. Palm.

Keywords: Neural networks; Plastic weights; Exponentially weighted moving average; Leaky integrator; Fixed-point arithmetic; Low precision variables

Abstract

In this letter, we develop a fixed-point arithmetic, low precision implementation of an exponentially weighted moving average (EWMA) that is used in a neural network with plastic weights. We analyze the proposed design both analytically and experimentally, and we also evaluate its performance in the application of an attractor neural network. The EWMA in the proposed design has a constant relative truncation error, which is important for avoiding round-off errors in applications with slowly decaying processes, e.g. connectionist networks. We conclude that the proposed design offers greatly improved memory and computational efficiency compared to a naïve implementation of the EWMA's difference equation, and that it is well suited for implementation in digital hardware.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

One of the most important advantages of neural networks over more traditional types of algorithms is their ability to learn and adapt to new inputs, and this requires that the networks have plastic weights. A difficulty often overlooked in many hardware implementations of neural networks is how to incorporate learning. A requisite for building efficient digital hardware is that all computations are done using low precision fixed-point arithmetic, and implementing plastic weights under this constraint is non-trivial [1]. In this letter we propose and analyze an implementation of plastic weights using this type of arithmetic.

The weights in this design are computed using an exponentially weighted moving average (EWMA), and the focus of this paper is on how to implement this EWMA efficiently. The EWMA is computed in the logarithmic domain and therefore it has a constant relative truncation error. This implementation can handle small numerical values, which arise in slowly decaying processes, without serious round-off errors. The targeted use of the proposed algorithm is digital hardware implementations of integrate-and-fire type neural networks [14], and in particular networks that have dynamic synapses [8,9,15]. These networks are used to model networks of nerve cells as well as in data analysis and data mining applications [13]. In fact, the EWMA represents a very general type of computation that is present in an even wider range of applications: statistical process control [17], signal processing [12], biophysically detailed models of neurons [5], and reinforcement learning [16], just to mention a few. In these applications EWMAs are referred to as either leaky integrators or first-order low-pass filters. The EWMA can also be interpreted as an instance of the Bayesian m-estimate that is used to estimate parameters from conditional probabilities in a large number of different algorithms, e.g. the naïve Bayesian classifier [11].

In the experimental evaluation of the proposed design of the EWMA we study its properties when used in a BCPNN network [3,6,7,15] that instantiates an attractor memory. A BCPNN network is quite similar to a Hopfield network but employs a Hebbian–Bayesian learning rule. What distinguishes this rule from the standard Hopfield learning rule is that it has more complex dynamics in its synaptic connections. A network with this type of dynamics in the synaptic connections can exhibit a range of interesting behaviors when used as a model for human memory [15] and in sparse associative memories [3,4].

Here, we describe a fixed-point arithmetic implementation of the EWMA that makes hardware as well as software implementations of the mentioned applications more efficient with respect to memory and computation.

2. Exponentially weighted moving average

In this section, we first derive the EWMA and show how it can be formulated as a first-order differential equation. Then, we show how it is computed in the logarithmic domain before we present our fixed-point arithmetic implementation.

2.1. Definition

Eq. (1) describes an EWMA in continuous time:

\tau \frac{dp(t)}{dt} = s(t) - p(t)    (1)

where s ∈ (0, 1) is a time-varying input, p is a continuously updated estimate of s, and τ is a plasticity parameter that controls the speed by which the value of p approaches s. The numerical solution of Eq. (1), using Euler's forward method, is

p_{t+1} = p_t + \frac{s_t - p_t}{\tau}    (2)

In Eq. (2) the EWMA is described as a difference equation. The recursion in Eq. (2) weights older values of the input s with decreasing powers of τ according to

p_t = \frac{(\tau-1)^t}{\tau^t}\, p_0 + \sum_{k=0}^{t-1} \frac{(\tau-1)^{t-k-1}}{\tau^{t-k}}\, s_k    (3)

By substituting λ = 1/τ we can rewrite Eq. (3) as Eq. (4), where older values of the input s are weighted with increasing powers of λ:

p_t = (1-\lambda)^t p_0 + \lambda \sum_{k=0}^{t-1} (1-\lambda)^{t-k-1} s_k    (4)

Eq. (4) clearly shows that an EWMA places more emphasis on the most recent samples, as opposed to a moving average that weights all collected samples equally.
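As a point of reference for the rest of the letter, the difference equation in Eq. (2) can be written directly in floating point. The following Python sketch is illustrative only; the function name ewma_reference and the parameter values are assumptions made for the example.

```python
import numpy as np

def ewma_reference(s, tau, p0=0.0):
    """Floating-point EWMA following Eq. (2): p <- p + (s - p)/tau."""
    p, trace = p0, []
    for st in s:
        p = p + (st - p) / tau
        trace.append(p)
    return np.array(trace)

# A single binary event followed by zeros decays geometrically with
# factor (1 - 1/tau) per step, as implied by Eq. (4).
s = np.zeros(200)
s[0] = 1.0
print(ewma_reference(s, tau=20.0)[:5])
```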

2.2. Conversion to the logarithmic domain

The EWMA in Eq. (1) can be computed in the logarithmic domain if the state variable p in the linear domain is replaced by a variable w = log_a p, representing the state in the logarithmic domain. We show that Eq. (2) can be written as a difference equation in the logarithmic domain. The starting point for the derivation is Eq. (1), which is converted into Eq. (5) by substituting p with a^w, and dp with a^w log(a) dw:

\tau \frac{dw(t)}{dt} = \frac{1}{\log a}\left(s(t)\, a^{-w(t)} - 1\right)    (5)

Here, a is an arbitrary base. Eq. (5) cannot be effectively solved numerically with Euler's method because it is stiff. One solution is to use a more sophisticated numerical method, but this means a large amount of complicated computations. Instead we analytically solve Eq. (5) for a constant input s, which gives us Eq. (6):

w_t = \log_a\!\left(s - e^{-t/\tau}(s - a^{w_0})\right)    (6)

Now we set the time-step to 1, and again we have a difference equation where s can vary with time:

w_{t+1} = \log_a\!\left(s_t - e^{-1/\tau}(s_t - a^{w_t})\right)    (7)

An interesting property of Eq. (6) is that if s = 0 and a = e (the base of the natural logarithm) the equation reduces to w_t = w_0 - t/τ. This means that w is subject to a linear decay when the input is constantly zero, which requires very few operations. The input s = 0 is frequent in models of spiking neurons and of synaptic traces, and hence our algorithm enables a very efficient implementation of these. Eq. (7) is defined for s ∈ (0, 1) and w ∈ (-∞, 0). But we intend to work with variables in the range (0, M) and therefore we rewrite Eq. (7) as

w_{t+1} = M \log_M\!\left(s_t - e^{-1/\tau}(s_t - M^{w_t/M})\right)    (8)

where s ∈ (0, M), w ∈ (0, M), and the time-step is set to 1. Similarly, we can write the linear decay, when s = 0, on a general form:

w_{t+1} = w_t + M \log_M e^{-1/\tau}    (9)

This makes it possible to implement these computations with integer valued look-up tables.

The logarithmic values, w, of the EWMA can be converted back into the linear domain by

p = M^{w/M - 1}    (10)
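In floating point, the log-domain update of Eqs. (8)–(10) can be sketched as follows; the function names and the clamp at zero are illustrative assumptions rather than part of the design.

```python
import numpy as np

def ewma_log_update(w, s, tau, M):
    """One step of the log-domain EWMA, Eqs. (8)-(9), in floating point.
    Both w and s live in (0, M); s = 0 triggers the linear decay of Eq. (9)."""
    decay = np.exp(-1.0 / tau)
    if s == 0:
        # Eq. (9): pure decay is linear in the logarithmic domain.
        # The clamp keeps w in the assumed range (0, M).
        return max(0.0, w + M * np.log(decay) / np.log(M))
    # Eq. (8): general update for a non-zero input.
    return M * np.log(s - decay * (s - M ** (w / M))) / np.log(M)

def to_linear(w, M):
    """Eq. (10): convert the log-domain state back to the linear domain."""
    return M ** (w / M - 1)

# Demo: with the scaling above, the update matches the exact linear-domain
# recursion p <- s/M - exp(-1/tau) * (s/M - p), cf. Eq. (7).
M, tau, s = 16.0, 10.0, 8.0
w, p = 4.0, to_linear(4.0, 16.0)
for _ in range(5):
    w = ewma_log_update(w, s, tau, M)
    p = s / M - np.exp(-1.0 / tau) * (s / M - p)
    print(to_linear(w, M), p)  # the two traces agree
```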

2.3. Implementation

In this section, we discuss how the EWMA transformed into the logarithmic domain is implemented with discrete valued variables. By using probabilistic fractional bits (PFB) [2,10], repeated truncation errors are cancelled out by the use of probabilistic calculations. This means that PFB allows us to represent fractional values over extended periods of time, without bias, using low precision variables.

In the following we will assume that a discrete implementation is made with variables that have a precision of log2(M+1) bits; s ∈ (0, M), w ∈ (0, M). Random numbers are generated with a precision of log2(K+1) bits in the range (0, K−1) by the function R(). A maximum value of K−1 guarantees a nonzero probability that a decay takes place. The function floor() returns the integer part of a number and ceil() returns floor(x)+1 if x ≥ 0 and floor(x)−1 if x < 0. The value of the plasticity parameter, τ, is fixed, but it is easy to extend the algorithm with multiple values of τ, which of course increases the sizes of the look-up tables. To implement Eq. (8) we use six look-up tables, three for the integer values

T_1[x] = \mathrm{floor}(M^{x/M}), \quad 0 \le x \le M
T_2[x] = \mathrm{floor}(x\, e^{-1/\tau}), \quad 0 \le x \le M
T_3[0] = 0; \quad T_3[x] = \mathrm{floor}(M \log_M x), \quad 1 \le x \le M    (11)

and three for the fractional bits

Tf_1[x] = \mathrm{floor}((K+1)(M^{x/M} - T_1[x])), \quad 0 \le x \le M
Tf_2[x] = \mathrm{floor}((K+1)(x\, e^{-1/\tau} - T_2[x])), \quad 0 \le x \le M
Tf_3[0] = 0; \quad Tf_3[x] = \mathrm{floor}((K+1)(M \log_M x - T_3[x])), \quad 1 \le x \le M    (12)

Similarly, we use two single-valued look-up tables to implement the linear decay in Eq. (9), one for the integer values, T_4 = floor(−M log_M e^{−1/τ}), and one for the fractional bits, Tf_4 = floor((K+1)(−M log_M e^{−1/τ} − T_4)).

Using these eight look-up tables we can now implement the computation in both Eqs. (8) and (9) as

w_{t+1} = \begin{cases}
w_t - Y & \text{if } s_t = 0 \wedge w_t \ge Y \\
\mathrm{PFB}(s_t + \mathrm{PFB}(-X, 2), 3) & \text{else if } X < 0 \\
\mathrm{PFB}(s_t - \mathrm{PFB}(X, 2), 3) & \text{otherwise}
\end{cases}    (13)

Here, we have defined X ≡ s_t − PFB(w_t, 1) and Y ≡ PFB(0, 4), and PFB(x, y) is the function that implements the probabilistic computation:

\mathrm{PFB}(x, y) = \begin{cases}
T_y[x] + 1 & \text{if } (R() < Tf_y[x]) \wedge (T_y[x] < M) \\
T_y[x] & \text{otherwise}
\end{cases}    (14)

We conclude this section by a summary of the implementation:

(i) All eight tables are set up using particular values of the variables τ, K, and M. The w-variable is initialized to an arbitrary value in the range (0, M), e.g. 0.

(ii) The w-variable is updated with an input, s, using Eq. (13). In the case s > 0 we start by computing X using tables T_1 and Tf_1, and then proceed by using tables T_2, Tf_2, T_3, and Tf_3. If s = 0, we compute the decrement by first evaluating Y using tables T_4 and Tf_4.

(iii) We return to step (ii) with a new input, s.
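Steps (i)–(iii) can be collected into a compact Python model of Eqs. (11)–(14). This is an illustrative sketch: the function names (build_tables, pfb, ewma_step), the clamping of table indices, and the treatment of the rare case w < Y when s = 0 are assumptions made for the example, not specified by the equations above.

```python
import math
import random

def build_tables(M, K, tau):
    """Set up the eight look-up tables of Eqs. (11)-(12) plus T4/Tf4 (step (i))."""
    decay = math.exp(-1.0 / tau)
    T = {y: [0] * (M + 1) for y in (1, 2, 3)}
    Tf = {y: [0] * (M + 1) for y in (1, 2, 3)}
    for x in range(M + 1):
        v1 = M ** (x / M)                          # M^(x/M)
        v2 = x * decay                             # x * e^(-1/tau)
        T[1][x], Tf[1][x] = int(v1), int((K + 1) * (v1 - int(v1)))
        T[2][x], Tf[2][x] = int(v2), int((K + 1) * (v2 - int(v2)))
        if x >= 1:
            v3 = M * math.log(x, M)                # M * log_M(x)
            T[3][x], Tf[3][x] = int(v3), int((K + 1) * (v3 - int(v3)))
    d = -M * math.log(decay, M)                    # per-step decay, Eq. (9)
    T[4], Tf[4] = [int(d)], [int((K + 1) * (d - int(d)))]
    return T, Tf

def pfb(x, y, T, Tf, M, K):
    """Eq. (14): probabilistic fractional bits, using table pair (T_y, Tf_y)."""
    if random.randrange(K) < Tf[y][x] and T[y][x] < M:
        return T[y][x] + 1
    return T[y][x]

def ewma_step(w, s, T, Tf, M, K):
    """Eq. (13): one update of the log-domain state w with input s (step (ii))."""
    if s == 0:
        Y = pfb(0, 4, T, Tf, M, K)
        # If w < Y we clamp at zero; this is a simplifying assumption,
        # Eq. (13)'s general branch also covers that rare case.
        return w - Y if w >= Y else 0
    X = s - pfb(w, 1, T, Tf, M, K)
    if X < 0:
        return pfb(min(M, s + pfb(-X, 2, T, Tf, M, K)), 3, T, Tf, M, K)
    return pfb(max(0, s - pfb(X, 2, T, Tf, M, K)), 3, T, Tf, M, K)

# Example: 4-bit state (M = 15), 2-bit random numbers (K = 3), tau = 8.
T, Tf = build_tables(M=15, K=3, tau=8.0)
w = 0
for s in [15] + [0] * 30:          # one event followed by silence
    w = ewma_step(w, s, T, Tf, M=15, K=3)
```

With M = 15 and K = 3 the state variables occupy 4 bits and the random numbers 2 bits, which matches the minimum precision reported in Section 3 for a well-functioning attractor network.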

3. Analysis

The computational requirement of the proposed implementation is 2 additions and 3 table lookups for an increment (s > 0) and 1 addition and 1 table lookup for a decrement (s = 0). In the case of the targeted use of this algorithm in implementations of spiking neurons, the decrements usually outnumber the increments by a factor of a hundred or more. This means that the computational requirements are close to those of the decrement. In order to transform the w-values back to p-values in the linear domain, an additional table based on Eq. (10) is needed. But in our intended application for this implementation, the logarithmic w-values are directly used to compute the weights in a neural network [15], thus avoiding the extra table for conversion back to the linear domain. These figures, 1 addition and 1 table lookup, should be compared with the naïve implementation, Eq. (2), which requires 2 additions and 1 multiplication.

How many bits are needed for the variables and the random numbers? In the case of s > 0 there is no easy way to calculate the resulting variance because three non-linear tables are used in the computation. Experimentally, we have seen that the average relative error is low and without bias, given that the random numbers are generated with at least 3–4 bits. But for the linear decay, s = 0, it is possible to compute the mean and variance, which is somewhat similar to calculating the time it takes for a random walk process to reach a certain state. This is done in the next section.

When using the EWMA to compute the weights in an attractor neural network, we have found that at least 4 bits are needed in the T-tables (and also for the w-variables) and 2 bits in the Tf-tables (and also for the random numbers) in order to get a well-functioning network.

3.1. Analytical treatment of the zero input case

Here we study the decay process (s = 0) of the proposed EWMA implementation. We want to calculate the mean and variance of the time it takes for w, initialized with w = M, to reach zero. To this end we compute the probabilities P_w(k) of w becoming zero after k decrements. We consider two cases: in the first case the integer part of the per-step decay is zero (T_4 = 0), and in the second case it is one (T_4 = 1).

For the first case, we let the probability of a decrement of w be P_dec, and we have chosen the values of K and P_dec such that Tf_4 = floor(K P_dec) = 1. We also have that T_4 = 0. The probabilities P_w(k) can now be computed as

P_w(k) = \binom{k-1}{M-1} P_{dec}^{M} (1 - P_{dec})^{k-M}, \quad \forall k \ge M    (15)

The probability that w = 0 after T ≥ M decrements is

\Pr(w = 0) = \sum_{k=M}^{T} P_w(k)    (16)

After a finite number of decrements T there is a probability, Pr(w > 0) > 0, that w has not reached zero:

\Pr(w > 0) = \sum_{i=0}^{M-1} \binom{T}{i} P_{dec}^{i} (1 - P_{dec})^{T-i}    (17)

Here, Pr(w = 0) + Pr(w > 0) = 1.

Eq. (15) describes a negative binomial distribution that has the mean M/P_dec and variance M(1 − P_dec)/P_dec^2. Now, P_dec = 1/K, which allows us to rewrite the mean and variance as MK and MK(K − 1). The mean number of decrements, which occur with a probability P_dec = 1/K, until w = 0 is reached from w = M is thus k_mean = MK. This implies that by increasing either of the two constants M and K, the time k_mean during which w > 0 is increased. Given a constant k_mean, we can increase M and decrease K. More memory is used but the variance is decreased. If we instead increase K and reduce M, we can still hold k_mean constant. The memory usage then goes down but the variance is increased. Hence there is a tradeoff between low variance and memory usage.
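The closed-form mean and variance can be checked with a short Monte Carlo sketch of the first case (T_4 = 0, Tf_4 = 1); the values of M, K, and the sample size are illustrative assumptions.

```python
import random

# In the first case every step decrements w by one with probability
# P_dec = 1/K. The negative binomial prediction from Eq. (15) is
# mean = M*K and variance = M*K*(K - 1).
def decay_time(M, K):
    w, steps = M, 0
    while w > 0:
        steps += 1
        if random.randrange(K) < 1:   # decrement fires with probability 1/K
            w -= 1
    return steps

M, K, n = 16, 4, 20000
samples = [decay_time(M, K) for _ in range(n)]
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(mean, M * K)             # both close to 64
print(var, M * K * (K - 1))    # both close to 192
```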

We now turn to the second case, where T_4 = 1 and Tf_4 = 1, which is different from the previous case because w = 0 will always occur after a finite number of decrements (time). We have that w = 0 can occur no earlier than after k_min = ceil(M/(T_4 + 1)) decrements and at the latest after k_max = M decrements. We examine two special cases where M = k_min(T_4 + 1) and M ≠ k_min(T_4 + 1), respectively. For the former case we can write the distribution function P_w(·) as

P_w(k) = \begin{cases}
P_{dec}^{k_{max}-k} & \text{if } k = k_{min} \\
\binom{k-1}{2(k-k_{min})-1} P_{dec}^{k_{max}-k} (1-P_{dec})^{2(k-k_{min})-1} + \binom{k-1}{2(k-k_{min})} P_{dec}^{k_{max}-k} (1-P_{dec})^{2(k-k_{min})} & \text{else if } k_{min} < k < k_{max} \\
(1-P_{dec})^{k-1} & \text{else if } k = k_{max} \\
0 & \text{otherwise}
\end{cases}    (18)

and in the latter case as

P_w(k) = \begin{cases}
(1-P_{dec})^{k-1} & \text{if } k = k_{max} \\
\binom{k-1}{2(k-k_{min})} P_{dec}^{k_{max}-k} (1-P_{dec})^{2(k-k_{min})} + \binom{k-1}{2(k-k_{min})+1} P_{dec}^{k_{max}-k} (1-P_{dec})^{2(k-k_{min})+1} & \text{else if } k_{min} \le k < k_{max} \\
0 & \text{otherwise}
\end{cases}    (19)

In a similar manner, the distribution function P_w(·) can be computed for different values of T_4, but the expressions are even more involved. Using either of the distributions in Eqs. (18) and (19) it is possible to numerically calculate both the mean number of decrements required to reach w = 0 and the variance.
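As an alternative to evaluating Eqs. (18)–(19) in closed form, the mean and variance for the second case can be estimated by simulating the decay directly; the clamp at zero and the parameter values below are illustrative assumptions.

```python
import random

# Second case (T4 = 1, Tf4 = 1): each step removes T4, plus one more with
# probability P_dec = Tf4/K, until w reaches zero.
def decay_time_t4(M, K, T4=1, Tf4=1):
    w, steps = M, 0
    while w > 0:
        steps += 1
        dec = T4 + (1 if random.randrange(K) < Tf4 else 0)
        w = max(0, w - dec)
    return steps

M, K, n = 16, 4, 20000
samples = [decay_time_t4(M, K) for _ in range(n)]
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
print(mean, var)   # samples lie between k_min = ceil(M/2) = 8 and k_max = M = 16
```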

3.2. Experimental analysis

The accuracy of the proposed implementation was experimentally evaluated and compared to the naïve implementation, Eq. (2), by feeding both with a single binary event. Both implementations use probabilistic fractional bits (PFBs). The root mean square error was measured as the trace declined to zero. The results shown in Table 1 are averages over many trials. As the precision increased, τ was increased as well, as τ = 2^{M/2}round(M/3). Scaling τ and the number of bits in this way is typical when one implements long-term memories of increasing size. In Table 1 the root mean square error is given for the proposed (rms_log) and the naïve (rms_naïve) implementations. Table 1 shows that as the precision of the variables and τ are increased, the advantage of using the proposed implementation in terms of relative rms-error increases. Hence, when the size of a network increases, and the number of bits and τ are increased, it becomes more advantageous to implement the network with the proposed design of the EWMAs.


Table 1. The root mean square error of the proposed (rms_log) and the naïve (rms_naïve) implementations. For comparison, the quotient of the two is also given.

Bits                  6        8        10       12       14       16       18       20
rms_log               8x10^-3  2x10^-3  4x10^-4  1x10^-4  2x10^-5  5x10^-6  1x10^-6  3x10^-7
rms_naïve             1x10^-2  4x10^-3  1x10^-3  5x10^-4  1x10^-4  5x10^-5  2x10^-5  6x10^-6
rms_naïve / rms_log   2        3        3        5        6        12       16       25


3.3. Numerical error

Computing the discrete valued EWMA in the logarithmic domain means that it has a constant relative truncation error in the linear domain. In the logarithmic domain the absolute truncation error is constant, which can easily be realized by considering three consecutive discrete values, w_1 < w_2 < w_3, where w_2 − w_1 = w_3 − w_2. This relation can be rewritten as 2w_2 = w_1 + w_3. Now, considering the relative error of w_1, w_2, and w_3 converted to the linear domain by Eq. (10), we can show that the EWMA, computed in the logarithmic domain, has a constant relative error in the linear domain:

\frac{M^{w_2/M-1}}{M^{w_1/M-1}} = \frac{M^{w_3/M-1}}{M^{w_2/M-1}} \;\Rightarrow\; M^{2w_2} = M^{w_1+w_3} \;\Rightarrow\; M^{2w_2} = M^{2w_2}    (20)

By implementing Eq. (10) as a look-up table, we can compute the EWMA in the logarithmic domain and then convert it, as needed, to the linear domain. In this way we get an EWMA with a constant relative truncation error.
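A quick numerical illustration of Eq. (20): consecutive representable values p(w) = M^{w/M−1} differ by the constant factor M^{1/M}, so the relative quantization step in the linear domain is the same everywhere. The value of M below is an illustrative choice.

```python
M = 15
p = [M ** (w / M - 1) for w in range(M + 1)]
ratios = [p[w + 1] / p[w] for w in range(M)]
print(ratios)          # every ratio equals M**(1/M)
print(M ** (1 / M))
```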

Fig. 1. The storage capacity plotted as a function of the precision (bits) of the connections' state variable, shown for three different fixed-point arithmetic implementations of a BCPNN: one using our proposed design of the EWMA, one using the naïve design with PFBs, and one using the naïve design without PFBs. These three networks operate in the palimpsest mode and their performance is compared to a floating-point arithmetic implementation running in the palimpsest mode and one running in the non-palimpsest mode.

4. Implementation of an attractor network

In this section, we present results on implementing BCPNN attractor networks [3,6,7,15] using the proposed EWMA design. This type of attractor network can operate in two modes: a non-palimpsest mode, in which the network can be subjected to catastrophic forgetting, and a palimpsest mode, in which the network forgets old patterns in favor of new ones [15]. In its palimpsest mode the network has a lower storage capacity than when it is run in its non-palimpsest mode (Fig. 1, dotted and dashed lines).

The BCPNNs were implemented following a previously presented design [3], with the only minor difference that the activities of the units in the networks were asynchronously and instantaneously updated in a deterministic instead of a stochastic manner. The storage capacity was measured by presenting the networks with retrieval cues that resembled the stored patterns to 90%, and a pattern was classified as stored if the retrieved pattern had a 100% overlap with the stored pattern. The networks run in the palimpsest mode (all but one) were trained with 300 patterns, and the network run in the non-palimpsest mode was trained with 80 patterns. If the latter network was trained with 300 patterns, none of these were stored properly because catastrophic forgetting then occurred. The networks in Fig. 1 had 100 units partitioned into 10 winner-take-all modules with 10 units in each. The palimpsest networks had τ = 40 in their EWMAs.

Fig. 1 shows the storage capacity of three networks implemented with fixed-point arithmetic: one that had our proposed implementation of the EWMA with PFBs, one that used the naïve implementation with PFBs, and one based on the naïve implementation but without PFBs. All of these networks were run in the palimpsest mode. Fig. 1 shows that networks run with the naïve implementation require more bits to represent the EWMA's state variable to give the same level of performance as our implementation, and this also means that more memory is needed to run these networks. The peak in storage capacity seen around 8 and 12 bits precision for the networks using the proposed and the naïve implementation without PFBs, respectively, is due to a match between the network's intrinsic storage capacity and the length of the history of previous inputs stored in the EWMAs. If the EWMAs only store information about the last 80 inputs to a palimpsest network, this network can in principle achieve the higher storage capacity of a non-palimpsest network. The naïve implementation with PFBs does not have a peak in its storage capacity. This is because, unlike the two other implementations, the EWMAs of the naïve implementation with PFBs are not always updated following the presentation of a new pattern. This means that there is a large variance in how long a trace of an input is retained, and hence there is no match between the intrinsic storage capacity and the length of the history of previous inputs. Thus, we do not see a peak in the storage capacity for the naïve implementation with PFBs.

In the case of large BCPNN networks run on cluster computers, implementations using the proposed fixed-point arithmetic design of the EWMA have been found to run slightly faster than networks implemented with floating-point arithmetic [3]. Furthermore, a fixed-point arithmetic implementation of BCPNN using the proposed design is much faster than one implemented with the naïve design of the EWMAs: a network with 4900 units partitioned into 70 modules with 70 units in each runs 32 times faster during training and 1.3 times faster during retrieval of patterns using the proposed instead of the naïve design. In both cases the gains in performance are due to two factors. The first is that fewer operations on average are needed for updating a connection, and the second is that less memory is used for a connection's state variable, which means that the rate by which state variables can be fetched from memory is increased. Much of the 32-fold speed-up during training in the second example comes from the fact that, on average, fewer operations are needed per connection update. The small speed-up during retrieval is mainly due to the increased rate by which the state variables can be fetched from memory.

5. Conclusions

In this letter we have derived a fixed-point arithmetic implementation of the EWMA that is efficient with respect to memory and computation, and we have demonstrated how it can be used to implement plastic weights in a neural network. The EWMA is computed in the logarithmic domain and provides a constant relative error in the linear domain. This makes the implementation well suited for representing processes that have a large dynamic range, where the difference between the smallest and largest values is an order of magnitude or more, such as decay processes, e.g. fading traces of stored memories in palimpsest attractor networks. It can be implemented with low precision variables and computations, which makes it well adapted for compact implementations in digital hardware. Finally, because the memory required to store the state variable of the EWMA is reduced, the proposed implementation can also enhance large-scale implementations of neural networks on general-purpose processors.

Acknowledgements

This work was partly supported by grants from the Swedish Science Council (Vetenskapsrådet, VR-621-2004-3807), the European Union (FACETS project, FP6-2004-IST-FETPI-015879), and the Swedish Foundation for Strategic Research (SSF, via the Stockholm Brain Institute).

References

[1] S. Draghici, On the capabilities of neural networks using limited precision weights, Neural Networks 15 (3) (2002) 395–414.
[2] M. Hoehfeld, S.E. Fahlman, Learning with limited numerical precision using the cascade-correlation algorithm, IEEE Trans. Neural Networks 3 (4) (1992) 602–611.
[3] C. Johansson, A. Lansner, Towards cortex sized artificial neural systems, Neural Networks 20 (1) (2006) 48–61.
[4] C. Johansson, A. Lansner, Imposing biological constraints onto an abstract neocortical attractor network model, Neural Comput. 19 (7) (2007) 1871–1896.
[5] C. Koch, I. Segev, Compartmental models of complex neurons (Chapter 3), in: T.J. Sejnowski, T.A. Poggio (Eds.), Methods in Neural Modeling: From Ions to Networks, second ed., A Bradford Book, London, England, 1998.
[6] A. Lansner, O. Ekeberg, A one-layer feedback artificial neural network with a Bayesian learning rule, Int. J. Neural Syst. 1 (1) (1989) 77–87.
[7] A. Lansner, A. Holst, A higher order Bayesian neural network with spiking units, Int. J. Neural Syst. 7 (2) (1996) 115–128.
[8] M. Mattia, P.D. Giudice, Efficient event-driven simulation of large networks of spiking neurons and dynamical synapses, Neural Comput. 12 (2000) 2305–2329.
[9] N. Mehrtash, et al., Synaptic plasticity in spiking neural networks (SP2INN): a system approach, IEEE Trans. Neural Networks 14 (5) (2003) 980–992.
[10] M. Melton, et al., The TInMANN VLSI chip, IEEE Trans. Neural Networks 3 (3) (1992) 375–384.
[11] T.M. Mitchell, Machine Learning, McGraw-Hill Inc., New York, 1997.
[12] A.V. Oppenheim, A.S. Willsky, with H. Nawab, Signals & Systems, second ed., Prentice-Hall International, London, 1997.
[13] R. Orre, et al., Bayesian neural networks with confidence estimations applied to data mining, Comput. Stat. Data Anal. 34 (8) (2000) 473–493.
[14] E.T. Rolls, A. Treves, Neural Networks and Brain Function, Oxford University Press, New York, 1998.
[15] A. Sandberg, et al., A Bayesian attractor network with incremental learning, Network: Comput. Neural Syst. 13 (2) (2002) 179–194.
[16] R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[17] S.-T. Tseng, et al., A study of variable EWMA controller, IEEE Trans. Semiconductor Manuf. 16 (4) (2003) 633–643.

Christopher Johansson received his Ph.D. degree in Computer Science in 2007 and his M.Sc. in Electrical Engineering in 2001, both from the Royal Institute of Technology, Stockholm, Sweden. His areas of research cover parallel implementation of neural networks, modelling of cortical circuits, reinforcement learning, and brain inspired computing.

Anders Lansner received his M.Sc. in Engineering Chemistry and Biochemistry in 1974 and his Ph.D. in Computer Science in 1986, both from the Royal Institute of Technology (KTH), Stockholm. He was appointed associate professor and professor in Computer Science in 1992 and 1999, respectively. He is the project manager for the CBN group (Computational Biology and Neurocomputing) at the School of Computer Science and Communication at KTH. His research interests range from neural computation and control to biophysically detailed and connectionist network models of specific biological systems and functions, e.g. cortical associative memory.