
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2020

Attractor Neural Network modelling of the Lifespan Retrieval Curve

PATRÍCIA PEREIRA

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


KTH Royal Institute of Technology

School of Electrical Engineering and Computer Science

Master Programme in Systems, Control and Robotics

June 2020

Author: Patrícia Pereira, [email protected]

Supervisors: Pawel Herman, [email protected]

Anders Lansner, [email protected]

Examiner: Erik Fransén, [email protected]



Abstract

Human capability to recall episodic memories depends on how much time has passed

since the memory was encoded. This dependency is described by a memory retrieval

curve that reflects an interesting phenomenon referred to as a reminiscence bump - a

tendency for older people to recall more memories formed during their young

adulthood than in other periods of life. This phenomenon can be modelled with an

attractor neural network, for example, the firing-rate Bayesian Confidence Propagation

Neural Network (BCPNN) with incremental learning.

In this work, the mechanisms underlying the reminiscence bump in the neural network model are systematically studied. In particular, the effects of synaptic plasticity, network architecture and other relevant parameters on the characteristics of the reminiscence bump are investigated.

The most influential factors turn out to be the magnitude of dopamine-linked plasticity at birth and the time constant of the exponential decay of plasticity with age, which together set the position of the bump. The other parameters mainly influence the overall amplitude of the lifespan retrieval curve.

Furthermore, the recency phenomenon, i.e. the tendency to remember the most recent

memories, can also be parameterized by adding a constant to the exponentially

decaying plasticity function representing the decrease in the level of dopamine

neurotransmitters.

Keywords: reminiscence bump, attractor neural network, Bayesian Confidence

Propagation Neural Network (BCPNN), recency, synaptic plasticity, episodic memory


Summary

The human ability to recall episodic memories depends on how much time has passed since the memories were encoded. This dependence is described by a so-called forgetting curve, which exhibits an interesting phenomenon called the “reminiscence bump”. This is a tendency among older people to recall more memories from adolescence and early adulthood than from other periods of life. This phenomenon can be modelled with a neural network, a so-called attractor network, e.g. a non-spiking Bayesian Confidence Propagation Neural Network (BCPNN) with incremental learning. In this work, the mechanisms behind the reminiscence bump are studied systematically with the help of this neural network model. For example, the importance of synaptic plasticity, network architecture and other relevant parameters for the emergence and character of this phenomenon is examined.

The most influential factors for the position of the bump were found to be the initial dopamine-dependent plasticity at birth and the time constant of the decrease of plasticity with age. The other parameters mainly affected the general amplitude of the lifespan retrieval curve. In addition, the so-called recency effect, i.e. the tendency to best remember things that happened recently, can also be parameterized by a constant added to the otherwise exponentially decaying plasticity, which can represent the density of dopamine receptors.

Keywords: reminiscence bump, attractor neural network, Bayesian Confidence Propagation Neural Network (BCPNN), recency effect, synaptic plasticity, episodic memory


Acknowledgements

I would like to thank Professors Pawel Herman and Anders Lansner for their

enthusiastic supervision throughout the project.

A warm thanks to my friends and colleagues with whom I shared my academic

journey.

My most grateful thanks to my parents for their love, care and support.


To my dear parents.


“Thus, our knowledge of the world, including ourselves, is incomplete as to space and

indefinite as to time. This ignorance, implicit in all our brains, is the counterpart of the

abstraction which renders our knowledge useful” - McCulloch and Pitts


Contents

1. Introduction
1.1 Research question
1.2 Aim and scope
1.3 Thesis Outline
2. Background
2.1. Reminiscence bump
2.1.1. Psychological and biological hypotheses
2.1.2. Different ways of cuing lead to different bumps
2.2. Neuronal computational models
2.2.1. Abstract Models
2.2.2. Detailed Models
2.2.3. Attractor neural network memory modelling
2.2.4. Other models
3. Methods
3.1. Attractor Memory Network Model
3.1.1. Modularity
3.1.2. BCPNN learning and network dynamics
3.1.3. Meaning of model parameters
3.2. Simulation protocol
3.3. Analysis and evaluation
4. Results
4.1. Reminiscence bump
4.2. Recency
5. Discussion
5.1. Summary of findings
5.2. Interpretation of the results and their impact
5.3. Limitations
5.4. Social, ethical and sustainability aspects
6. Conclusion and Future Work
Bibliography


Chapter 1

Introduction

The advancement of neuroscience benefits humankind in many ways. Two main directions, however, have been tangibly capitalized on in recent times. The first is that an improved understanding of neurological and psychological mechanisms enables the development of better medical treatments and therapies for neurological illnesses. It can also empower society, as illustrated by headphones that maximize motor learning by applying a small electric current to the area of the brain that controls movement¹. The second is that a deeper understanding of the brain is a source of inspiration for algorithms with useful applications, such as the deep learning algorithms used in computer vision and speech processing. In this direction, it contributes to the development of more “human-like” and powerful artificial agents.

Within neuroscience, memory plays a key role. Studying memory is important because of the increasing interest in tackling brain diseases in which memory deficits are common symptoms, such as Alzheimer’s disease and other types of dementia [1]. Memory is also a key aspect of cognition that is fundamental for intelligent behavior, namely in learning and decision processes [2]. From another perspective, life can be considered a sum of memories that are important for maintaining our identity and mental health.

The focus of this project is on long-term memory, specifically episodic memory. Episodic memory is a category of long-term memory that concerns events that occurred throughout one’s life. Important personal experiences belong to this category [3]. Long-term memory concerns information stored in the brain over a long period of time. It is established through long-term potentiation and depression, by which synapses in circuits of neurons are strengthened or weakened, helping shape memory-specific ensembles [4]. Long-term memory is important because it underlies the ability to learn new information and to recall that information later in time.

One well-documented phenomenon of long-term memory recall is the reminiscence bump: the tendency for people past middle age to recall more memories from when they were 10 to 30 years old than from other periods of their life. It has consistently been observed in autobiographical memory research [5].

This phenomenon was observed in 68 experiments between 1988 and 2017 [5].

Although the precise years in which the bump occurs vary according to the experiment,

there is strong empirical evidence that the maximal proportion of memories comes

from adolescence and young adulthood [5].

¹ www.haloneuro.com


Due to the importance of this phenomenon, computational models have been built to study its potential underlying mechanisms. The advantage of a computational model is that it allows us to predict how each parameter corresponding to a biological mechanism affects the reminiscence bump, something that would otherwise be infeasible to test in human experiments.

Computational attempts at modelling the phenomenon in question include the Memory Chain Model [6] and the AM-ART model [7]. However, the focus in this work is on the Bayesian confidence propagation neural network (BCPNN), developed in various forms at KTH Royal Institute of Technology. It is one of the most successful approaches to memory modelling to date because of its direct correspondence to neuronal circuits and its interpretation in terms of biological mechanisms [8]. BCPNN employs a Hebbian learning rule [9] derived from Bayes' rule, which allows this recurrent model to function as an associative attractor memory network [8]. BCPNN has also been used in a feed-forward architecture for classification [10] and data mining [11]. Although BCPNN learning is available in a more biologically plausible spiking neural network implementation [10], this thesis uses a more abstract rate-based implementation because of the long-term nature of the memory phenomenon under study.

1.1 Research question

We hypothesize that plasticity parameters play a fundamental role in modulating the

aforementioned reminiscence bump. In addition, we test the capability of the model to

account for the improved recall of the most recently encoded memories, known as the

recency effect [12].

1.2 Aim and scope

The primary objective of this project is to study how the synaptic plasticity parameters governing the incremental BCPNN learning process of a rate-based modular attractor memory network model affect the characteristics of episodic memory recall. A further aim is to investigate how network size affects the storage of long-term memories over the modelled lifetime.

It is expected that the network model’s mechanisms can be interpreted in the context of neurobiological effects. Therefore, this study has the potential to seed new neurobiological hypotheses that help in understanding long-term effects of episodic memory recall.


In addition, the study examines other phenomena in memory recall over the lifetime, e.g. the recency effect.

The scope of the project is limited in three ways. First, the model only considers already formed long-term memory. It therefore represents neither the transfer of memories from short-term into long-term memory, as for example the Memory Chain Model [6] does, nor the interaction between different areas of the brain, as in the Tracelink model [13]. Second, a firing-rate based network is used instead of its more realistic spiking counterpart. Finally, the focus is on the reminiscence bump and recency phenomena, so no other episodic memory recall effects are considered in the simulations.

1.3 Thesis Outline

Chapter 2 introduces the reminiscence bump, the psychological and biological hypotheses for this phenomenon and how different ways of cuing lead to different bumps, and provides an overview of neuronal computational models. Chapter 3 describes the model, explains the meaning of the model parameters and describes the simulation protocol and the analysis and evaluation. Chapter 4 presents the results. Chapter 5 discusses the results and confronts them with the hypotheses in Chapter 2. Chapter 6 presents the conclusion and describes future work.


Chapter 2

Background

2.1. Reminiscence bump

The reminiscence bump is the tendency for people past middle age to recall more memories from when they were 10 to 30 years old than from other periods of their life. It has consistently been observed in autobiographical memory research [5]. Figure 1 displays the reminiscence bump from four different experimental studies.

Figure 1: Distribution of autobiographical memories from older adults

as a function of reported age at time of event [14]


2.1.1. Psychological and biological hypotheses

There are several psychological and biological hypotheses for the origins of this

phenomenon that are compatible and even support or complement each other.

A cognitive account [14] states that in childhood one is confronted with many novel

events but due to their rapid change they are less useful in later situations. Additionally,

since novel events are more distinct and require more effort to process, memory

organization changes constantly. On the other hand, in periods of stability starting from

young adulthood, events are less novel, so there is less encoding effort and

increased proactive interference², resulting in poorer recall. Therefore, events from

early adulthood corresponding to the transition period of memory organization from

rapid change to stability are the ones that are more likely to be recalled since they

have strong encoding in a stable memory organization with little proactive interference.

Besides the reminiscence bump, the lifespan retrieval curve exhibits other phenomena

such as childhood amnesia, the inability of adults to recall episodic memories from

their early childhood, and recency, the tendency to retrieve recent memories. There

exists work that attempts to remove recency from the experimental results, focusing

on the reminiscence bump [15].

In a more concrete case of voluntary migration, the emigration period can be

considered as a period of novelty in which the migrant is confronted with new realities

and adaptation challenges. It is then followed by a period of stability, in which the

immigrant settled down. In the light of the cognitive account, this would affect the

reminiscence bump period of individuals that experienced migration and the peak

would be expected to correspond to the migration or adaptation period. This is exactly

what is reported in experimental studies [16]. Seniors that experienced migration were

divided into different groups, according to their migration age. As expected, seniors that

migrated during the usual bump period showed a bump in the usual bump period,

which was also the migration period. More notably, seniors that

migrated after the bump period showed a bump corresponding to the migration period

instead of the usual bump period, thus reflecting the influence of periods of novelty

and adaptation and subsequent stability on the reminiscence bump. There were no

significant feature differences in their memories, nor were they more emotional than

memories from other periods, indicating a purely cognitive adaptation phenomenon.

In the basic-systems model of episodic memory [17], it is claimed that episodic

memory is formed through the interaction of other supportive cognitive systems such

as diverse sensory and action systems, memory systems, and other types of systems

that result in multiple abilities such as search and retrieval, linguistic, emotional and

² Proactive interference is the interference effect of previously encoded memories on the encoding and retrieval of new memories. An example of proactive interference is the difficulty in remembering someone’s new phone number after having previously learned the old one.


narrative capabilities. In this context, episodic memory is a subtype of autobiographical

memory that concerns the salient experiences that occurred throughout one’s life.

This model can be used to support a cognitive abilities account [18] of the reminiscence bump, which is based on the assumption that the rise to and decline from the reminiscence bump coincide with the evolution of other cognitive abilities. In accordance with the basic-systems model, the bump would be a result of the level of functioning of the other abilities.

To test this theory, experiments assessed verbal and visuospatial memory together with autobiographical memory retention. They confirmed the hypothesized link between the former abilities and the latter [18], indicating that several cognitive abilities might have a direct influence on the reminiscence bump. However, tests addressing processing speed, memory and intelligence showed a much more rapid increase and a much slower decrease in ability, which could not alone explain the evolution of the bump [14]. Nevertheless, this theory is still noteworthy and is featured throughout the literature on autobiographical lifespan retrieval.

Another account, based on genetic fitness [14] and in line with Darwin’s theory of evolution, states that since early adulthood is the period of reproduction, an enhanced memory would serve the purpose of boosting cognitive abilities for selecting the best mate. Thus, a stable memory ability throughout the whole lifespan would be traded for an enhanced memory to support cognitive abilities during the reproduction period.

This explanation provides no direct mechanism to be tested but can rather be viewed

as an underlying explanation for the abovementioned accounts.

There are also hypotheses based on identity formation, which predict slightly different distributions of autobiographical memory and its content across cultures, but with small differences in the reminiscence bump [19]. According to this account, late adolescence and early adulthood is the time when a person develops their ideals and vocations and defines themselves socially. Thus, events from this period have a great impact and are integrated into one’s view of oneself and one’s life story, and so have higher importance in memory organization [14]. This account could also motivate the cognitive and cognitive abilities accounts and benefit from them. There is a bump related to social identity, one’s association with cultural and social groups, from the teens to the twenties, and a bump related to personal identity, one’s formation of life objectives and significant relationships, from the twenties to the thirties [20].

The last of the psychological hypotheses about the origin of the reminiscence bump is based on the life script, i.e. key events that one expects to experience throughout one’s life at specific ages [21], such as completing school, getting a job, marrying or having a child. These expectations are hypothesized to guide how one constructs and recalls one’s life story. In contrast to the previous accounts, they are highly influenced by one’s culture rather than being focused on the individual [21]. Notably, the recall of negative or traumatic events does not lead to a bump. This may be because such events are not expected: they are not present in the life script, which is composed of positive memories [22].


Consistent with these accounts of stronger encoding during the bump period are the experimental data, in which the bump is connected to the age at the time of encoding rather than the age at the time of retrieval [23].

Possible neurobiological hypotheses about the mechanisms underlying the reminiscence bump are the decrease of brain plasticity with aging due to a decrease of dopamine receptors [24] and the pruning of synapses with aging [25]. Such mechanisms are the focus of this modelling work, which is an extension of previous work with a similar aim [8].
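The age-related decrease of plasticity can be made concrete with a small sketch. It assumes the functional form studied in this thesis, an exponential decay with age plus an optional constant. The names `kappa_birth`, `tau_years` and `baseline` and all numerical values are hypothetical and chosen only for illustration; they are not the fitted parameters of the model.

```python
import math

def plasticity(age_years, kappa_birth=1.0, tau_years=15.0, baseline=0.0):
    """Plasticity at a given age: exponential decay plus a constant.

    All default values are illustrative assumptions, not fitted parameters.
    """
    return kappa_birth * math.exp(-age_years / tau_years) + baseline

ages = [0, 10, 20, 40, 80]
# Pure exponential decay: plasticity keeps shrinking towards zero.
decaying = [plasticity(a) for a in ages]
# Decay plus a constant: plasticity levels off at the baseline value.
with_floor = [plasticity(a, baseline=0.1) for a in ages]

for a, k0, k1 in zip(ages, decaying, with_floor):
    print(f"age {a:2d}: decay only {k0:.3f}, decay + constant {k1:.3f}")
```

With the added constant, plasticity never falls below the baseline, so memories encoded late in life are still written with non-negligible strength.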

2.1.2. Different ways of cuing lead to different bumps

There are several cuing methods for lifespan retrieval experiments with humans, such as olfactory cues and different types of word cues. The Galton cue-word technique developed from Galton's recall of memories using objects from the environment into his creation of lists of cue words to trigger the recall of memories, timing the recall and noting the distribution of the memories across the lifespan [26]. The Crovitz and Schiffman technique attempts to improve on the Galton technique by reducing its bias and applying it to several participants [27].

Galton states in his work “Psychometric Experiments” [26] that ideas emerge by

association with an object perceived by the senses. He used to let the mind come up

with ideas that emerged from a certain object, being careful to avoid coming up with

ideas emerging from previous ideas. Afterwards, he would collect his thoughts and

draw conclusions about them. Galton usually walked around 400 meters in Pall Mall³

and scrutinized every object he saw until one or two thoughts arose. He then took note

of them and proceeded onto the next object, never allowing the mind to ramble. He

noticed the great variety of ideas that could emerge by repeating the same walk

several times but also how repetitive they could be. He then created his method which

consisted in writing lists of words on different small pieces of paper and placing them

under a book so that after some days he could read one word without knowing what

the other words were. Then he used a chronograph to count the time from reading a

word to the emergence of an idea. He would come up with one to four ideas per word

cue. This whole process required a very calm and neutral mindset. He would also note

which period of his life his ideas came from and concluded that half of them were

from the period after leaving college.

In fact, both location and size of the bump can vary according to the cueing method

that is used, with several accounts for this phenomenon. From experiments with

different cueing methods such as asking for important memories versus providing a

word cue, it is hypothesized that cueing could be more relevant than encoding in the

bump size and location [28,29]. Word cuing permits any association between the cue and the memory, resulting in an earlier and smaller bump peak. In contrast, important-memory cuing produces a narrative-based search connected to a person’s life story, which yields a higher and later peak and a second peak in older years [29].

³ Street in London

Olfactory cues result in a bump in the first decade of life [28,29,30]. This can be

modelled in encoding by an accelerated decrease of plasticity of the olfactory cued

memory system, although other accounts for this earlier bump exist.

2.2. Neuronal computational models

In this project, the mechanisms of a neuronal computational model underlying

reminiscence bump parametrization are studied. Thus, in this section an overview of

neuronal computational models is provided.

2.2.1. Abstract Models

The McCulloch and Pitts model from 1943 resulted in the first brain-inspired network,

whose units correspond to basic brain cells [31] (Figure 2).

Figure 2: A McCulloch and Pitts unit

A unit receives several binary inputs 𝐼𝑘, representing synaptic inputs on dendrites, that

are summed in the soma. The neuron fires if this summed input exceeds a threshold,

resulting in the binary output 𝑦. If one of the inputs is inhibitory, with value 0, the neuron

does not fire. Such units can be used to implement Boolean functions such as OR,

AND and NOT and several units in a network can implement more complex functions

such as division by two. However, the model has limitations, namely that the functions to be

implemented need to be hard-coded and that a single unit cannot implement

functions that are not linearly separable, such as the XOR function.
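The threshold unit described above can be sketched in a few lines. This is a minimal illustration, assuming the common textbook convention in which any active inhibitory input vetoes firing; the function names and the threshold values are choices made here, not taken from [31].

```python
from itertools import product

def mcp_unit(excitatory, threshold, inhibitory=()):
    """Binary threshold unit: fires (1) if the summed input exceeds the threshold.

    Assumed convention: any active inhibitory input prevents firing.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) > threshold else 0

def AND(a, b):
    return mcp_unit([a, b], threshold=1)   # needs both inputs: 2 > 1

def OR(a, b):
    return mcp_unit([a, b], threshold=0)   # one input suffices: 1 > 0

def NOT(a):
    # No excitatory input; fires by default unless the inhibitory input is on.
    return mcp_unit([], threshold=-1, inhibitory=[a])

# Brute-force check: no threshold makes a single unit with two
# excitatory inputs reproduce the XOR truth table.
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor_possible = any(
    all(mcp_unit([a, b], threshold=t) == y for (a, b), y in xor_table.items())
    for t in range(-1, 3)
)

for a, b in product((0, 1), repeat=2):
    print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("single-unit XOR possible:", xor_possible)
```

The XOR search fails because the unit's output can only switch from 0 to 1 as the summed input grows, whereas XOR requires firing for a sum of one but not for a sum of two.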


The Rosenblatt Perceptron proposed in 1957 [32] overcomes some drawbacks of the

McCulloch and Pitts model (Figure 3).

Figure 3: A Rosenblatt perceptron

The $x_i$ are the inputs and the $w_i$ are the weights. The bias, $\theta$, is the negative of the activation threshold in the McCulloch and Pitts model.

The weights are not binary, which allows flexibility in each weight's influence on the output, and they can also be negative. Moreover, a zero-valued input does not lead to complete inhibition, and the perceptron can be trained with supervised learning to perform binary classification.
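Supervised training of such a unit can be illustrated with the classical perceptron learning rule. The sketch below is one standard formulation under stated assumptions: the learning rate, the zero initialization, the epoch limit and the AND training set are illustrative choices and are not taken from [32].

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights and bias by the signed error on each mistake."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
                mistakes += 1
        if mistakes == 0:   # every sample classified correctly
            break
    return weights, bias

# Linearly separable toy problem: the AND function.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_data)
predictions = [predict(w, b, x) for x, _ in and_data]
print("weights:", w, "bias:", b, "predictions:", predictions)
```

Unlike the McCulloch and Pitts unit, the decision boundary here is learned from examples instead of being hard-coded, although it remains limited to linearly separable problems.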

2.2.2. Detailed Models

There exist more detailed models such as the Integrate and Fire Model [33], proposed by Louis Lapicque in 1907, which forms a basis for spiking neural networks (Figure 4). By stimulating the nerve fibers that excite a frog’s leg muscle with an electrical pulse, Lapicque concluded that the nerve membrane is polarizable and can be compared to an RC circuit, i.e. a resistor in parallel with a capacitor.


Figure 4: An Integrate and Fire model unit described as an electric circuit

A presynaptic spike $\delta(t - t_j^{(f)})$ is low-pass filtered at the synapse and generates an input current pulse $\alpha(t - t_j^{(f)})$. A current $I(t)$ charges the circuit with resistance $R$ and capacitance $C$. The voltage $u(t)$ across the capacitance is compared to a threshold $\vartheta$, and the neuron fires if the threshold is exceeded, generating an output pulse $\delta(t - t_i^{(f)})$.

The RC circuit works as follows:

$$I(t) = \frac{u(t)}{R} + C \frac{du(t)}{dt}$$

Multiplying by $R$ and rearranging yields the membrane time constant $RC$:

$$RC \frac{du(t)}{dt} = -u(t) + R I(t)$$
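The membrane equation can be integrated numerically to see when the unit fires. The following is a minimal forward-Euler sketch in arbitrary units. The reset of the voltage after a spike, the parameter values and the function name `simulate_lif` are assumptions made for illustration and are not part of the description above.

```python
def simulate_lif(current, R=10.0, C=1.0, threshold=1.0, u_reset=0.0,
                 dt=0.1, steps=1000):
    """Forward-Euler integration of RC du/dt = -u + R*I with threshold and reset."""
    tau = R * C                      # membrane time constant
    u = u_reset
    trace, spike_times = [], []
    for k in range(steps):
        u += dt / tau * (-u + R * current)
        if u >= threshold:           # threshold crossed: emit a spike
            spike_times.append((k + 1) * dt)
            u = u_reset              # assumed reset after firing
        trace.append(u)
    return trace, spike_times

# Constant input below threshold: the voltage saturates at R*I = 0.5.
trace_sub, spikes_sub = simulate_lif(current=0.05)
# Constant input above threshold: R*I = 2.0, so the unit fires repeatedly.
trace_supra, spikes_supra = simulate_lif(current=0.2)

print("subthreshold spikes:", len(spikes_sub))
print("suprathreshold spikes:", len(spikes_supra))
```

For a constant current, the voltage relaxes towards $R I$ with time constant $RC$, so the unit only fires when $R I$ lies above the threshold.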

The Hodgkin-Huxley model, proposed in 1952, describes the changes of the membrane potential in the squid giant axon with four ordinary differential equations that include the ionic mechanisms of sodium and potassium [34]. This work was awarded the Nobel Prize in 1963. Several improvements have been developed for this model, for instance by introducing more ionic mechanisms discovered from experimental data. Simplified versions of the model have also been proposed, such as the FitzHugh-Nagumo model from 1961, which has only two equations [35].


2.2.3. Attractor neural network memory modelling

Concerning memory modelling, Hopfield Networks [36] (Figure 5) have been suggested as models of biological memory, although they are not used in many applications today since more powerful types of networks exist, e.g. for classification. However, this type of recurrent network model has been used to model cortical associative memory [37].

After being trained with a set of patterns, a Hopfield network can retrieve one of these patterns after being fed a distorted version of it. Its nodes can take, for example, values of 0 or 1, and the nodes are linked by symmetric connections. The connection weights result from a Hebbian learning rule⁴, i.e. the weights between neurons that are active at the same time are strengthened during training [9].
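Pattern storage and retrieval in such a network can be sketched as follows. For simplicity, this illustration uses bipolar states of -1 and +1 instead of 0 and 1, a Hebbian outer-product rule and asynchronous updates. The two stored patterns and the function names are hypothetical examples.

```python
def train_hopfield(patterns):
    """Hebbian outer-product rule with symmetric weights and no self-connections."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def recall(W, state, sweeps=5):
    """Asynchronous updates until no unit changes (or the sweep limit is hit)."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            h = sum(W[i][j] * state[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

# Two orthogonal example patterns over eight units.
patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
W = train_hopfield(patterns)

# Distort the first pattern by flipping one unit, then let the network settle.
cue = list(patterns[0])
cue[0] = -cue[0]
restored = recall(W, cue)
print("cue:     ", cue)
print("restored:", restored)
```

The distorted cue lies in the basin of attraction of the first stored pattern, so the update dynamics restore the flipped unit.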

Figure 5: A Hopfield Network

A computational study using a variant of Hopfield’s network investigated how attractor neural network models can qualitatively account for basic features of memory degradation in diffuse cerebral atrophy⁵ and be used to predict manifestations of Alzheimer’s disease based on neurological conditions [38].

Purely correlation-based learning rules, such as the one in the Hopfield network, lead to catastrophic forgetting, i.e. the loss of all memories when their number exceeds the network capacity. To overcome this, the learning rule must exhibit palimpsest properties, i.e. a gradual forgetting of older memories when learning new ones. Hopfield suggested “learning within bounds” [36], in which connection weights are bounded. This comes with a decrease of the network capacity from 0.137N to 0.05N, where N is the number of units, i.e. the palimpsest property is traded against long-term capacity.
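The idea of bounded weights can be sketched by clipping each Hebbian increment to a fixed interval. This is only an illustrative reading of “learning within bounds”: the step size, the bound, the network size and the use of random bipolar patterns are assumptions made here and are not taken from [36].

```python
import random

def bounded_hebbian_update(W, pattern, step=1.0, bound=3.0):
    """Add one Hebbian increment per connection, clipped to [-bound, bound]."""
    n = len(pattern)
    for i in range(n):
        for j in range(n):
            if i != j:
                updated = W[i][j] + step * pattern[i] * pattern[j]
                W[i][j] = max(-bound, min(bound, updated))

rng = random.Random(0)          # fixed seed so the run is repeatable
n_units = 16
W = [[0.0] * n_units for _ in range(n_units)]

# Store many more random patterns than the network can hold.
for _ in range(50):
    pattern = [rng.choice((-1, 1)) for _ in range(n_units)]
    bounded_hebbian_update(W, pattern)

largest = max(abs(w) for row in W for w in row)
print("largest weight magnitude:", largest)
```

Because a saturated weight cannot accumulate further, recent patterns can overwrite the contribution of older ones instead of all patterns adding up without limit. As a rough sense of scale, the capacities quoted above correspond to about 137 and 50 patterns for a network of N = 1000 units.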

⁴ Hebb’s postulate is often summarized as “neurons that fire together, wire together”.

⁵ Loss of neurons and connections between them.


Another way of avoiding catastrophic forgetting is presented in a firing-rate Bayesian

confidence propagation neural network (BCPNN) attractor neural network with

incremental learning developed by the Lansner group at KTH [8][39][40], which is used

in this project and is explained in detail in the next chapter.

2.2.4. Other models

A computational theory of hippocampal function [41] makes use of a connectionist⁶

model that depicts a stimulus representation over many elements, as in Figure 6:

Figure 6: A generic connectionist network for associative learning [41]

The input layer is activated by the stimulus inputs. A function of the weighted sums of the input activations activates the internal layer, which forms a new representation of the input. The activation of each output node is likewise a function of the weighted sum of the internal layer activations, and the output layer activations are interpreted as the network’s response. In this model, learning about stimuli means associating their representations with appropriate outputs.
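The flow of activation through such a network can be sketched as a forward pass. The logistic activation function, the layer sizes and all weight values below are assumptions for illustration; the description above only specifies that each layer is a function of weighted sums of the previous layer.

```python
import math

def sigmoid(z):
    """Logistic function, used here as an assumed activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: each node applies the activation to its weighted input sum."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(stimulus, W_hidden, b_hidden, W_out, b_out):
    """Stimulus -> internal representation -> network response."""
    hidden = layer(stimulus, W_hidden, b_hidden)
    output = layer(hidden, W_out, b_out)
    return hidden, output

# Hypothetical network: 3 input nodes, 2 internal nodes, 1 output node.
W_hidden = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]]
b_hidden = [0.0, 0.0]
W_out = [[1.2, -0.7]]
b_out = [0.0]

hidden, output = forward([1.0, 0.0, 1.0], W_hidden, b_hidden, W_out, b_out)
print("internal representation:", hidden)
print("response:", output)
```

Learning in this setting would adjust `W_hidden` and `W_out` so that each stimulus produces its appropriate response; the sketch covers only the forward computation.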

Figure 7 shows the complete cortico-hippocampal connectionist model:

⁶ Connectionism explains mental phenomena using artificial neural networks, in which mental phenomena are described by interconnected networks of simple units and learning corresponds to modifying connection strengths based on experience.


Figure 7: Cortico-hippocampal model [41]

The hippocampal network on the right is a predictive autoencoder that learns to recode

stimulus information in the internal layer. The network on the left represents learning

in the cerebral cortex and long-term memory storage. A more complete version of the

model can have several such cortical networks modulated by a hippocampal network

(or networks).

This theory makes predictions regarding the effects of hippocampal lesions.

The Tracelink model [13] is a connectionist model composed of three systems: a trace

system, a link system and a modulatory system as depicted in Figure 8:

Figure 8: The Tracelink Model [5]

The trace system is represented by the circles and connections on the plane and represents the neocortical basis for memories. The link system is represented by the

six circles in the rectangle and connections from these circles to the circles on the plane and includes the hippocampus and certain other structures. Finally, the modulatory system, ∆W, includes certain basal forebrain nuclei and several areas that have a more controlling function. An encoding of a memory is depicted in Figure 9:

Figure 9: Encoding of a new memory [13]

In the first stage (A), trace elements are activated by a new memory. In the second stage (B), link elements are activated and relevant trace-link connections are enhanced. The modulatory system is activated. In the third stage (C), weak trace-trace connections are forming and the modulatory system is weakly activated. In the fourth stage (D), strong trace-trace connections have been formed and trace-link connections have faded away. The modulatory system is deactivated.

This model can account for many characteristics of amnesia by deactivating the link system during learning, and it can produce normal forgetting curves. It also provides an explanation for the advantages of learning under arousal for long-term recall.

The Memory Chain Model [6] is composed of a cascade of memory stores as depicted in Figure 10:

Figure 10: A – Memory systems at different time scales, B - Schematic of the Memory Chain Model [6]

When a new memory is encoded, a certain number of representations are formed in the first memory store. With time, this number of representations declines and some are transferred to a subsequent store. Each store has its own decline rate and the stores are organized in order of decreasing decline rate, representing the consolidation of short-term memories into long-term. The strength of a memory is proportional to the number of representations it has. This model can account for a range of amnesia data namely temporal gradients7 in several animals and also datasets from human patients with several neurological diseases. There was an attempt to replicate the reminiscence bump using this model considering the differentiation of memory distribution into two separate functions, a decline function and an encoding-sampling function [42].
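The cascade of stores described above can be illustrated with a minimal two-store sketch. It is not the formulation of [6]: the linear rate equations, the function name `memory_chain` and all parameter values (`a1`, `a2`, `mu2`) are assumptions chosen only to show a fast-declining first store inducing representations in a slower-declining second store.

```python
import numpy as np

def memory_chain(a1=0.5, a2=0.05, mu2=0.2, r1_0=1.0, dt=0.01, duration=20.0):
    """Euler integration of a hypothetical two-store chain.

    r1: representations in the first store, declining at rate a1.
    r2: representations in the second store, induced from r1 at rate mu2
        and declining at the slower rate a2 (a1 > a2).
    """
    steps = round(duration / dt)
    r1 = np.empty(steps + 1)
    r2 = np.empty(steps + 1)
    r1[0], r2[0] = r1_0, 0.0
    for k in range(steps):
        r1[k + 1] = r1[k] + dt * (-a1 * r1[k])
        r2[k + 1] = r2[k] + dt * (mu2 * r1[k] - a2 * r2[k])
    return r1, r2

r1, r2 = memory_chain()
strength = r1 + r2  # memory strength taken as the total number of representations
```

In this sketch the first store dominates shortly after encoding, while at long delays almost all remaining strength sits in the second store.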

7 Phenomenon characteristic of retrograde amnesia which consists in greater loss of memory for occurrences from the recent past than for occurrences from long ago

https://psychologydictionary.org/retrograde-amnesia/

A more recent work is the Autobiographical Memory-Adaptive Resonance Theory (AM-ART) [7], depicted in figure 11:

Figure 11: AM-ART model [7]

AM-ART is a three-layer neural network. The event-specific knowledge is presented to the bottom layer F1 to encode life events in the middle layer F2 and a sequence of related events in F2 are encoded into an episode in layer F3.

Input channel $F_1^{1-2}$ receives inputs of time and location from the entorhinal cortex, $F_1^{3-4}$ receives inputs of people and activity from the fusiform gyrus and $F_1^{5-6}$ receives inputs of emotion and imagery from the amygdala, as depicted by the arrows at the bottom. These inputs constitute the basis of events, as represented by the connections from the bottom to the middle layer. The episodic pattern 𝑡𝑠 is formed in the events layer and is connected to the episodes layer in the hippocampus. There is a flow of memory search and readout throughout all layers, as depicted by the grey arrows. This model was successfully used to model the lifespan retrieval curve, producing a curve with the reminiscence bump as well as childhood amnesia and recency.

Chapter 3

Methods

3.1. Attractor Memory Network Model

In this subsection the model used in this project is presented and explained in more

detail.

3.1.1. Modularity

The model has a specific modularity. In this network, a unit 𝜋𝑖𝑖′ (Eq. 2) corresponds to

activity in a minicolumn, which is a local group of neurons that can be considered an

elementary unit of the cortex. Minicolumns are then organized in groups, the

hypercolumns. While a hypercolumn represents a feature of a memory, its

corresponding minicolumns represent the values that the feature can take, as can be

seen in Figure 12, which provides an example of the encoding of an object represented by

two features:

Figure 12: A modular attractor memory network with BCPNN learning

The hypercolumn on the left (bigger circle) represents orientation of a seen object and

each minicolumn (smaller circles) represents an angle of 0, 30, 60 or 90 degrees. The

hypercolumn on the right represents color and each of its minicolumns represents a

different color. The activity within each hypercolumn is normalized. Minicolumns

belonging to different hypercolumns have connections described by weights, as

depicted in the figure for the connections of the 30- and 90-degree minicolumns.

The network used in this project is an attractor network with BCPNN plasticity [8]. It

has 144 units, having 12 hypercolumns with 12 minicolumns each, to be able to store

the desired number of patterns.

3.1.2. BCPNN learning and network dynamics

The differential equations governing unit behavior in the model are:

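$$\tau_c \frac{dh_{ii'}(t)}{dt} = \beta_{ii'}(t) + \sum_{j}^{N} \log\left(\sum_{j'}^{M_i} w_{ii'jj'}(t)\,\pi_{jj'}(t)\right) - h_{ii'}(t) \tag{1}$$

$$\pi_{ii'}(t) = \frac{e^{h_{ii'}}}{\sum_{j} e^{h_{ij}}} \tag{2}$$

$$\frac{d\Lambda_{ii'}(t)}{dt} = \alpha\left(\left[(1-\lambda_0)\,\hat{\pi}_{ii'}(t) + \lambda_0\right] - \Lambda_{ii'}(t)\right) \tag{3}$$

$$\frac{d\Lambda_{ii'jj'}(t)}{dt} = \alpha\left(\left[(1-\lambda_0^2)\,\hat{\pi}_{ii'}(t)\,\hat{\pi}_{jj'}(t) + \lambda_0^2\right] - \Lambda_{ii'jj'}(t)\right) \tag{4}$$

$$\beta_{ii'}(t) = \log\left(\Lambda_{ii'}(t)\right) \tag{5}$$

$$w_{ii'jj'}(t) = \frac{\Lambda_{ii'jj'}(t)}{\Lambda_{ii'}(t)\,\Lambda_{jj'}(t)} \tag{6}$$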

A set of active units, one per hypercolumn, represents an activated memory. Hypercolumns

are indexed by 𝑖 and the units within a hypercolumn by 𝑖′; the level of activation of a unit is a confidence estimate. The unit

support is ℎ𝑖𝑖′ and evolves according to Eq. 1. A background activity 𝜆0 is introduced

to avoid logarithms of zero in the calculations (Eq. 3 and 4).

The encoding of a memory consists in the modification of the network’s weights, 𝑤,

and biases, 𝛽, (Eq. 5 and 6) so that the configuration of unit activations corresponding

to that memory becomes an attractor state of the network. While connection strengths

between minicolumns belonging to different hypercolumns are represented by

weights, minicolumns belonging to the same hypercolumn are related to each other

via lateral inhibition, implemented with softmax, in each hypercolumn as in Eq. 2.

There is a learning rate parameter in the incremental model, 𝛼 in Eq. 3 and 4, which

can control the strength of encoding of each memory or how much it modifies the

network’s weights and biases.

Before the introduction of this incremental approach, a previous approach would

encode several memories at once [8] by estimating the weights and biases’

probabilities by counting units’ co-activations, which would result in catastrophic

forgetting.

The incremental approach differs from the counter approach in that it

estimates the weights and biases (Eq. 6 and 5, respectively) with exponential moving

averages Λ𝑖𝑖′ and Λ𝑖𝑖′𝑗𝑗′ of activity and co-activity of the estimated unit activations 𝜋̂𝑖𝑖′

(Eq. 3 and 4). The advantage is that the learning rule can be applied online and the

network exhibits palimpsest properties. It is therefore possible to mimic learning and

gradual forgetting throughout time, which suits the objective of this project.

Plasticity is governed by the whole set of differential equations. According to the first

differential equation, the support of a unit ℎ𝑖𝑖′ is affected by the weighted contributions

of presynaptic units, $\sum_{j}^{N} \log\left(\sum_{j'}^{M_i} w_{ii'jj'}(t)\,\pi_{jj'}(t)\right)$, added to the unit bias 𝛽𝑖𝑖′. This

value is then affected by the normalization in Eq. 2 resulting in the value for the unit

activation 𝜋𝑖𝑖′ , which is used in the calculations of the exponential moving averages

in Eq. 3 and 4 that are then used to update the biases and weights. These are then

reused once again in the first differential equation. This cycle is continuously repeating

during learning. During recall the same happens but with the learning rate 𝛼 set to

zero, preventing the weights and biases from changing their values.
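The learning and recall cycle described above can be sketched as a single Euler step of Eq. 1 to 6. This is only an illustrative sketch under stated assumptions, not the implementation used in this project: the function names are hypothetical, the estimated activation 𝜋̂ is taken to be the clamped pattern during training and the softmax output otherwise, the sum over 𝑗 runs over all hypercolumns exactly as written in Eq. 1, and the traces are assumed to start at uniform values.

```python
import numpy as np

def softmax_per_hypercolumn(h, gamma=1.0):
    """Eq. 2: normalize activity within each hypercolumn (one row per hypercolumn)."""
    z = gamma * h
    z = z - z.max(axis=1, keepdims=True)  # numerical stability, result unchanged
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def bcpnn_step(h, lam_i, lam_ij, alpha, dt=0.01, tau_c=1.0, lam0=0.01, clamp=None):
    """One Euler step. Shapes: h, lam_i are (H, M); lam_ij is (H, M, H, M)."""
    beta = np.log(lam_i)                                              # Eq. 5
    w = lam_ij / (lam_i[:, :, None, None] * lam_i[None, None, :, :])  # Eq. 6
    # Assumption: while a pattern is clamped, it replaces the softmax output.
    pi = softmax_per_hypercolumn(h) if clamp is None else clamp
    inner = np.einsum('abcd,cd->abc', w, pi)      # sum over j' within hypercolumn j
    support = beta + np.log(inner).sum(axis=2)    # sum over hypercolumns j
    h = h + dt / tau_c * (support - h)            # Eq. 1
    co = pi[:, :, None, None] * pi[None, None, :, :]
    lam_i = lam_i + dt * alpha * ((1 - lam0) * pi + lam0 - lam_i)           # Eq. 3
    lam_ij = lam_ij + dt * alpha * ((1 - lam0**2) * co + lam0**2 - lam_ij)  # Eq. 4
    return h, lam_i, lam_ij

# Usage: clamp one pattern for 100 steps in a small 3-by-4 network.
H, M = 3, 4
h = np.zeros((H, M))
lam_i = np.full((H, M), 1.0 / M)             # assumed uniform initial traces
lam_ij = np.full((H, M, H, M), 1.0 / M**2)
pattern = np.zeros((H, M))
pattern[np.arange(H), [0, 1, 2]] = 1.0
for _ in range(100):
    h, lam_i, lam_ij = bcpnn_step(h, lam_i, lam_ij, alpha=0.3, clamp=pattern)
```

Calling `bcpnn_step` with `alpha=0.0` and `clamp=None` corresponds to recall: the support and activations evolve while the traces, and hence the weights and biases, stay fixed.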

3.1.3. Meaning of model parameters

Plasticity Parameters

To model the degree of synaptic plasticity, there is a learning rate parameter in the

incremental model, 𝛼 in Eq. 3 and 4 [8], which can control the strength of encoding of

each memory or how much it modifies the network’s weights and biases. By

decreasing it over time during learning it can be used to represent decrease of

dopamine receptors combined with other aging phenomena, allowing the modelling of

a reminiscence bump [8].

This way, $\alpha = \alpha_0 e^{-t/\tau_s} + \alpha_{cst}$, with 𝜏𝑠 being the time constant of the age-dependent

plasticity decay, which mediates this decay of dopamine receptors. Both 𝛼0 and 𝜏𝑠 are

parameters that can be varied to investigate their effect on the reminiscence bump.

While performing experiments with this model, I observed that if the learning rate

decay stopped at a certain age it would be possible to model recency, the tendency to

retrieve recent memories. This can be achieved by decomposing the evolution of the

learning rate into a constant term 𝛼𝑐𝑠𝑡 added to the decreasing exponential function; this

constant is a parameter that is also important to investigate.
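The age-dependent learning rate can be written out as a short sketch. The function name `plasticity` is hypothetical, and the age is assumed to be measured in simulated years (one pattern per year); the default values are the initial configuration used later in this chapter, and the second call uses the values reported for the recency experiment.

```python
import numpy as np

def plasticity(age, alpha0=0.3, tau_s=10.0, alpha_cst=0.0):
    """Learning rate alpha = alpha0 * exp(-age / tau_s) + alpha_cst."""
    return alpha0 * np.exp(-np.asarray(age, dtype=float) / tau_s) + alpha_cst

ages = np.arange(70)                      # one pattern per simulated year
alpha_bump = plasticity(ages)             # purely decaying plasticity
alpha_recency = plasticity(ages, alpha0=0.25, tau_s=8.0, alpha_cst=0.015)
```

With a non-zero `alpha_cst` the learning rate never drops below that constant, so late patterns are still encoded with a fixed minimum strength.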

Neural parameters

The membrane time constant, 𝜏𝑐, can represent the RC time constant as in the Integrate and Fire Model [20]. Taking the level of activation as the voltage of a capacitor, it represents the time for the activation value to reach about 63% of its target value (the charging of the capacitor through the resistance), or to decay to about 37% of its value in the absence of activity.
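The 63% and 37% figures follow from a first-order relaxation over one time constant, which can be checked numerically. The sketch below assumes a simple leaky integrator with a constant target and uses a hypothetical helper `relax`; it is not part of the model code.

```python
def relax(h0, target, tau_c=1.0, dt=0.01, duration=1.0):
    """Euler integration of tau_c * dh/dt = target - h for the given duration."""
    h = h0
    for _ in range(round(duration / dt)):
        h += dt / tau_c * (target - h)
    return h

charged = relax(0.0, 1.0)     # rises towards the target over one time constant
discharged = relax(1.0, 0.0)  # decays towards zero over one time constant
```

After a duration equal to `tau_c`, `charged` is close to 1 - 1/e and `discharged` is close to 1/e.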

Other model parameters

A background noise activity 𝜆0 ≪ 1 was introduced to avoid logarithms of zero in the

calculations resulting in all minicolumns having a minimal activity.

Regarding lateral inhibition, the softmax factor 𝛾, which can be present in Eq. 2

multiplying ℎ in the numerator and denominator, represents how concentrated the

estimate is around the larger input values.
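The effect of the softmax gain can be seen in a small sketch for a single hypercolumn. The function name `softmax_gain` and the support values are hypothetical and serve only to compare a low and a high gain.

```python
import numpy as np

def softmax_gain(h, gamma=1.0):
    """Softmax over one hypercolumn with gain gamma multiplying the support h."""
    z = gamma * np.asarray(h, dtype=float)
    z -= z.max()          # numerical stability, does not change the result
    e = np.exp(z)
    return e / e.sum()

h = np.array([1.0, 0.5, 0.0, -0.5])   # example supports of four minicolumns
soft = softmax_gain(h, gamma=1.0)     # activity spread over several minicolumns
sharp = softmax_gain(h, gamma=10.0)   # activity concentrated on the largest support
```

A higher gain moves the hypercolumn towards winner-take-all behavior, while a lower gain yields a flatter distribution of activity.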

Other model parameters that can be varied are the degree of memory cue perturbation

or the number of hypercolumn swaps in the perturbed pattern and the successful

recognition overlap threshold. The number of hypercolumn swaps would represent the

similarity of the cue to the target memory and the successful recognition overlap

threshold would represent the required vividness of the recalled memory.

Network parameters

The network size, more concretely the number 𝐻 in a network with 𝑁=𝐻 × 𝑀 units

organized in 𝐻 hypercolumns with 𝑀=𝐻 minicolumns each, represents the number of

minicolumns available to store each memory. Based on experience, it is often good to

have 𝑀=𝐻.

3.2. Simulation protocol

Training consists in sequentially clamping the activation of each pattern for a certain

duration, while letting the networks’ weights and biases evolve. During recall the

network state evolves after being presented with a perturbed version of a pattern, while

keeping the networks’ weights and biases fixed. Recall overlap is the overlap between

the actual pattern and the final state reached.

In cued recall, the perturbed version of the pattern consists in having a fixed number

of randomly chosen hypercolumns with their activated minicolumn randomly swapped.

For each pattern, the network is presented several times with a different perturbed

version of that pattern and the overlaps are calculated. The measure of interest is the

ratio of retrieval (Eq. 8).

To investigate the effect of and sensitivity to a certain parameter on the reminiscence

bump characteristics, a greedy approach is followed. All parameters are kept constant

in the simulation except the one that is being investigated.

The values of the parameter that is being subject to examination are chosen as follows:

If it is a parameter whose variation results in a horizontal translation of the bump (such as the initial plasticity and the time constant of the age-dependent plasticity decay), its value is varied from values that result in a bump significantly shifted to the left to values that result in a bump significantly shifted to the right. The effect of these parameters is systematically measured by fitting a function in which the middle age of the bump depends on the parameter value. If it is a parameter whose variation results in a general increase of the ratio of retrieval (all the other parameters), its value is varied from values that result in very low ratios of retrieval to values that result in very high ones. The effect of these parameters is systematically measured by fitting a function in which the total ratio of retrieval over the lifespan, i.e. the area under the ratio of retrieval curve, depends on the parameter value.

The following values for the initial combination of parameters have been chosen since

they yield a realistic configuration of the bump, a graph similar to the experimental

lifespan retrieval curves from studies with humans:

Parameter                                                            Value
Network size                                                         144 units in a network arranged in a 12-by-12 grid
Number of presented patterns                                         70
Membrane time constant, 𝜏𝑐                                           1
Softmax gain, 𝛾                                                      1
Euler step, 𝑑𝑡                                                       0.01
Initial plasticity, 𝛼0                                               0.3
Time constant of plasticity decay, 𝜏𝑠                                10
Background activity level, 𝜆0                                        0.01
Learning time                                                        1
Clamping recall time                                                 0.1
Recall time                                                          2
Number of swaps, 𝑠                                                   6
Number of generated networks                                         100
Number of perturbed patterns presented per age in each network       100
Overlap threshold, 𝑜𝑡                                                11/12
Constant plasticity, 𝛼c                                              -

Table 2: Parameters for the initial configuration of simulations

Each recall attempt is a Bernoulli trial that takes the value 1 if the threshold for successful recall overlap is exceeded and 0 otherwise, so the variance of a single trial is the ratio of retrieval multiplied by one minus the ratio of retrieval. The standard error of the mean is the square root of this variance divided by the square root of the sample size.
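The standard error computation can be sketched as follows. The function name `retrieval_sem` and the example counts are hypothetical; the sample size is whatever number of recall attempts enters the ratio for a given age.

```python
import math

def retrieval_sem(successes, attempts):
    """Ratio of retrieval and its standard error for Bernoulli recall trials."""
    r = successes / attempts
    variance = r * (1.0 - r)              # variance of a single Bernoulli trial
    return r, math.sqrt(variance / attempts)

r, sem = retrieval_sem(successes=80, attempts=100)
```

The standard error is largest for a ratio of 0.5 and vanishes when the ratio is exactly 0 or 1.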

3.3. Analysis and evaluation

The network is presented with 70 patterns, where each pattern represents the salient

episodic memories of one year of life. The overlap 𝑜 between two patterns, 𝑝1 and 𝑝2,

is defined as

$$o = \frac{p_1 \cdot p_2}{\lVert p_1 \rVert \, \lVert p_2 \rVert} \tag{7}$$

The overlaps between pairs of patterns are mostly between 0 and 0.25. Sometimes

there is a higher overlap, but it does not constitute a problem given that the usual

overlap threshold defined for a successful recall is 0.916 (11/12) and the overlap

between a pattern and its perturbed version, the same pattern but with a certain

number of randomly chosen hypercolumns with their activated minicolumn randomly

swapped, is also high enough. These overlap values result from the total number of

patterns presented and the network size. The ratio of retrieval 𝑟 of a pattern is defined

as

$$r = \frac{\text{number of successful recalls}}{\text{number of recall attempts}} \tag{8}$$

With this network size, almost all patterns in the incremental network with a low

constant learning rate of 0.01 have a ratio of retrieval of 1 and never below 0.99, which

means that the network’s capacity is not exceeded, so there is no incorrect storage of

patterns which would affect the validity of the experiments. This ratio of retrieval

results from presenting the network with a perturbed pattern with 3 swaps. This means

that the ratio of retrieval can only increase when presenting the network with a

perturbed pattern yields an overlap increase from 0.75, at presentation time, to higher

than 0.916 after relaxation, signaling convergence towards the original pattern. This

choice of overlap values is based on the experiments of Sandberg et al. (2002), Figure

4 [39], in which the ratio of retrieval increases if the overlaps increase from 0.8 (2

random hypercolumn swaps in a 100-unit network arranged in 10 hypercolumns with

10 minicolumns each) to 0.85.
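The overlap (Eq. 7), the perturbed cue and the ratio of retrieval (Eq. 8) can be sketched in Python as follows. This is a minimal illustration rather than the code used in the project: the helper names are hypothetical, a swap is assumed to always move the active minicolumn to a different one, and a small tolerance is added to the threshold comparison to guard against floating-point rounding. With 12 hypercolumns and 3 swaps, the cue overlaps with its target by 9/12 = 0.75, as stated above.

```python
import numpy as np

def random_pattern(H, M, rng):
    """One active minicolumn per hypercolumn, stored as an (H, M) one-hot array."""
    p = np.zeros((H, M))
    p[np.arange(H), rng.integers(0, M, size=H)] = 1.0
    return p

def perturb(pattern, swaps, rng):
    """Move the active minicolumn in `swaps` randomly chosen hypercolumns."""
    H, M = pattern.shape
    cue = pattern.copy()
    for i in rng.choice(H, size=swaps, replace=False):
        old = int(np.argmax(pattern[i]))
        new = rng.choice([m for m in range(M) if m != old])
        cue[i] = 0.0
        cue[i, new] = 1.0
    return cue

def overlap(p1, p2):
    """Eq. 7: normalized dot product of two patterns."""
    return float(np.sum(p1 * p2) / (np.linalg.norm(p1) * np.linalg.norm(p2)))

def ratio_of_retrieval(final_overlaps, threshold=11 / 12, tol=1e-9):
    """Eq. 8: fraction of attempts whose final overlap reaches the threshold."""
    final_overlaps = np.asarray(final_overlaps, dtype=float)
    return float(np.mean(final_overlaps >= threshold - tol))

rng = np.random.default_rng(0)
target = random_pattern(12, 12, rng)
cue = perturb(target, swaps=3, rng=rng)
cue_overlap = overlap(target, cue)
```

In a full simulation, `final_overlaps` would hold the overlaps between the target and the network state reached after relaxation from each perturbed cue.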

Chapter 4

Results

The primary aim of this work was to study the mechanisms underlying the modulation

of reminiscence bump obtained in long-term simulations of episodic memory retrieval

using a modular attractor network with incremental BCPNN learning [8]. To study the

effect of each parameter on the reminiscence bump, the parameter is varied in a greedy

way and a curve is fitted relating the parameter variation and a quantitative measure

of its effect on the reminiscence bump.

All the parameters explained in section 2.4 are analyzed in this way. All simulations were performed under the same conditions as those used in the example shown in Figure 11 and have variances of similar magnitude. The experimental paradigm, described in more detail in section 3.2, consisted in training the network with 70 patterns, one per simulated year within the network’s lifetime, and then performing recall. In training, memories are sequentially presented to the network for 100 time steps (Euler steps) each. Recall consisted in clamping the network with a perturbed pattern for 10 time steps and then letting the network state evolve for 200 time steps. Recall was performed sequentially for all the patterns after training. For each pattern, 100 recall attempts were made (trials with randomly perturbed cues) and the ratio of retrieval (Eq. 8) is shown in the graphs in this section. To evaluate the effect of a parameter on the reminiscence bump, all parameters are kept constant except for the one that is being investigated.
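The step counts quoted above follow from the durations and the Euler step listed in Table 2. A short sketch of the conversion, with hypothetical variable names:

```python
dt = 0.01  # Euler step
durations = {"learning": 1.0, "clamping": 0.1, "recall": 2.0}

# round() guards against floating-point results such as 10.000000000000002
steps = {name: round(t / dt) for name, t in durations.items()}

n_patterns = 70
total_training_steps = n_patterns * steps["learning"]
```

With one pattern per simulated year, a learning time of 1 therefore corresponds to 100 Euler steps per year of the network's lifetime.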

4.1. Reminiscence bump

In the first set of simulations we examined the individual effects of selected parameters

on the characteristics of the reminiscence bump.

In Figure 13 an example of cued recall given the parameters in Table 2 is shown.

Figure 13: Recall with initial configuration of parameters and standard error of the

mean

4.1.1. Network size

The network size, more concretely the number of hypercolumns, 𝐻, and minicolumns

per hypercolumn, 𝑀, determines the memory capacity. In this work we decided to

maintain the following relationship: 𝐻=𝑀. The larger the network, the lower the

crosstalk between the stored memory patterns and the higher the ratio of retrieval is

(Figure 14).

Figure 14: The memory retrieval performance over the network’s lifetime depending

on the network size (𝐻=𝑀={9, 10, 11, 13, 15, 17, 20})

In order to systematically analyze the effects of the network size in the lifespan retrieval

curve, a more detailed analysis of the relation between network size and the area

under the ratio of retrieval curve was made, which demonstrated the sigmoidal

relationship between the aforementioned entities (Figure 15).

Figure 15: Sigmoidal fit made to the simulation data explaining the dependence of

the area under the ratio of retrieval curve on the network size

The area under the ratio of retrieval curve is the sum of the ratio of retrieval for all

ages. Therefore, it should be interpreted as the total recall during an individual’s

lifetime. It is used to account for the complete recall capability of the individual. The

type of equation to fit, sigmoidal, is chosen as the one that visually best represents

the data.

4.1.2. Initial plasticity level

The initial plasticity level, 𝛼0, represents the initial level of dopamine in the brain. If it

is low, the plasticity is very low at the older age of the network, resulting in recalling

only memories from early years. If it is too high only memories from the recent years

are recalled since there is a high level of plasticity even after its decay, which makes

the network adapt to the latest memories that “overwrite” the early memories. This

results in a shift of the bump mediated by the initial plasticity level (Figure 16).

Figure 16: Variation of the memory retrieval performance over the network’s lifetime depending

on the initial plasticity level (𝛼0={0.01, 0.1, 0.3, 1, 2, 10})

In order to systematically analyze the effects of the initial plasticity level in the lifespan

retrieval curve, a more detailed analysis of the relation between initial plasticity level

and the middle age of the bump was made, which demonstrated the exponential

relationship between the abovementioned entities (Figure 17).

Figure 17: Exponential fit made to the simulation data explaining the dependence of

the middle age of the bump on the initial plasticity level

4.1.3. Time constant of the age-dependent plasticity decay

The time constant of the age-dependent plasticity decay, 𝜏𝑠, mediates the time it takes

for the initial plasticity level to decrease. The higher it gets, the more the bump shifts

to the right. If it is too high, plasticity is high throughout lifespan and consequently the

more recent memories are retrieved, as it can be seen in Figure 18. It has the same

effect as varying the initial plasticity parameter (compare with Figure 16).

Interestingly, olfactory cues result in a bump in the first decade of life [30,31,32]. This

can be modelled in encoding by an accelerated decrease of plasticity of the olfactory

cued memory system by using a smaller value for 𝜏𝑠.


Figure 18: The memory retrieval performance over the network’s lifetime depending

on the time constant of the age-dependent plasticity decay (𝜏𝑠={2, 8, 10, 15, 20, 50})

In order to systematically analyze the effects of the time constant of the age-dependent

plasticity decay in the lifespan retrieval curve, a more detailed analysis of the relation

between time constant of the age-dependent plasticity decay and the middle age of

the bump was made, which demonstrated the sigmoidal relationship between the

abovementioned entities (Figure 19).


Figure 19: Sigmoidal fit made to the simulation data explaining the dependence of

the middle age of the bump on the time constant of the age-dependent plasticity

decay

4.1.4. Background activity level

The background noise activity 𝜆0 ≪ 1 was introduced to avoid logarithms of zero in the calculations, resulting in all the minicolumns having a minimal activity. If there were no background activity, the weights and biases would be symmetric and memories would not be forgotten, since weights grow exponentially (Eq. 6), biases decrease exponentially (Eq. 5) and their overall sum would stay the same. Hence, the lower this activity, the higher the recall. For very low values of this activity, however, the recall decreases as the background activity decreases, perhaps because the background activity is no longer sufficient and perturbs the calculations (Figure 20).


Figure 20: The memory retrieval performance over the network’s lifetime depending

on the background activity level (𝜆0={0.15, 0.12, 0.11, 0.1, 0.01, 0.001, 1e-5, 1e-10,

1e-25})

In order to systematically analyze the effects of the background activity level in the

lifespan retrieval curve, a more detailed analysis of the relation between background

activity level and the area under the ratio of retrieval curve was made, which

demonstrated the linear relationship between the abovementioned entities (Figure 21).

Figure 21: Linear fit made to the simulation data explaining the dependence of the

area under the ratio of retrieval curve on the background activity level

4.1.5. Degree of memory cue perturbation

The degree of memory cue perturbation (noisy pattern) or number of binary swaps, 𝑠,

represents the similarity between the cue and the target memory, as explained in the

subsection Meaning of Model Parameters in Methods. Thus, the higher the number of

swaps, the lower the ratio of retrieval. This is because the perturbed pattern leads the

network state to another attractor that does not correspond to the target one (Figure

22).


Figure 22: The memory retrieval performance over the network’s lifetime depending

on the degree of memory cue perturbation (𝑠={1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12})

In order to systematically analyze the effects of degree of memory cue perturbation in

the lifespan retrieval curve, a more detailed analysis of the relation between degree of

memory cue perturbation and the area under the ratio of retrieval curve was made,

which demonstrated the sigmoidal relationship between the abovementioned entities

(Figure 23).


Figure 23: Sigmoidal fit made to the simulation data explaining the dependence of

the area under the ratio of retrieval curve on the degree of memory cue perturbation

4.1.6. Overlap threshold

The overlap threshold determines the overlap needed for successful recall, representing the required vividness of the recalled memory. It is noticeable that for a threshold of 5/12 or larger, the ratio of retrieval is the same throughout lifespan (Figure 24). Since the overlap between the 70 original patterns in a 12-by-12 network is usually between 0 and 3/12, the high ratio of retrieval for thresholds in this range is due to these overlap values. Each value selected for the threshold adds 1/12 to the previous value because the network has 12 hypercolumns with 12 minicolumns each.


Figure 24: The memory retrieval performance over the network’s lifetime depending

on the overlap threshold (𝑜𝑡={1/12, 2/12, 3/12, 4/12, 5/12, 6/12, 7/12, 8/12, 9/12,

10/12, 11/12, 12/12})

In order to systematically analyze the effects of overlap threshold in the lifespan

retrieval curve, a more detailed analysis of the relation between overlap threshold and

the area under the ratio of retrieval curve was made, which demonstrated the

sigmoidal relationship between the aforementioned entities (Figure 25).


Figure 25: Sigmoidal fit made to the simulation data explaining the dependence of

the area under the ratio of retrieval curve on the overlap threshold

4.2. Recency

In the second set of experiments we focused on another aspect of the memory retrieval

curve over the model’s lifetime, which reflects the capability to recall recently

memorized patterns, referred to as the recency phenomenon depicted in the rise in

recall in the years above 50 in Figure 1.

I observed that if the learning rate decay stopped at a certain age it would be possible

to model recency. It turns out that by decomposing the evolution of the learning rate

as a constant plus a decreasing function, recency is achieved (Figure 26). Childhood

amnesia, the inability of adults to recall episodic memories from their early childhood,

is also observed.

The modelling of recency is the secondary contribution of this work.

The values of the parameters resulting in the recency graph (Figure 26) are the values

of Table 2 with the following changes:

• initial plasticity, 𝛼0 = 0.25

• constant plasticity, 𝛼𝑐𝑠𝑡 = 0.015

• time constant of plasticity decay, 𝜏𝑠 = 8

• number of swaps, 𝑠 = 8

These changes were made to better fit the graphs from experimental studies with

humans.

Figure 26: Recency with the standard error of the mean

The same experiment without the constant plasticity parameter was performed to allow

the comparison of the two graphs (Figure 27). It can be observed that the introduction

of the constant plasticity leads to a decrease in the recall of early memories, to the

desired recency effect and to a shift of the bump to the right.

Figure 27: Experiment from figure 26 without the constant plasticity

4.2.1. Constant Plasticity

The constant plasticity is a fixed component of plasticity level that sets the lower limit

for the decay of plasticity throughout lifetime. Thus, increasing constant plasticity

results in a bump more shifted to the right and a higher recency effect (Figure 28). If

the fixed plasticity is too high a bump forms in older ages due to the high plasticity at

that age that prevails over the decreasing plasticity function.


Figure 28: The memory retrieval performance over the network’s lifetime depending

on the constant plasticity (𝛼c={0, 0.01, 0.015, 0.02, 0.025, 0.1})

4.2.2. Time constant of the age-dependent plasticity decay

The time constant of the age-dependent plasticity decay, 𝜏𝑠, mediates the time it takes

for the initial plasticity level to decrease. Thus, increasing the time constant of the age-

dependent plasticity decay results in a bump more shifted to the right and a higher

recency effect. If the time constant of plasticity decay is too high a bump forms in older

ages due to the high plasticity at that age that prevails over the fact that plasticity is

decreasing throughout time (Figure 29).


Figure 29: The memory retrieval performance over the network’s lifetime depending

on the time constant of the age-dependent plasticity decay (𝜏𝑠={4, 8, 10, 15, 20, 50})

Chapter 5

Discussion

5.1. Summary of findings

The parameters that showed the most substantial effect in the bump were the initial

plasticity and time constant of the age-dependent plasticity decay because these are

the ones that set the position of the bump, namely the age at which the retrieval curve

has higher magnitude and its peak. The constant component of the plasticity value

throughout lifetime makes it possible to model recency and, when added, also has a substantial effect on the

position of the bump and shape of the lifespan retrieval curve. By tuning

this parameter a curve with a recency tail is achieved and a bump in later years can

also be achieved. The other parameters have a lower relevance since they mainly only

influence the magnitude of the retrieval curve.

5.2. Interpretation of the results and their impact

All the psychological hypotheses presented in the beginning of this work are

compatible with the neurobiological hypothesis that the mechanisms underlying the

reminiscence bump are

• the decrease of brain plasticity with aging due to dropping levels of dopamine

receptors and

• the pruning of synapses with aging

These are represented in this model by the decaying plasticity throughout time.

Dopamine D1 activation influences synaptic plasticity [43]. It can provoke neuronal

excitation or inhibition, resulting in synaptic potentiation or depression, an increase or

decrease in the efficacy of the synapses, or “connections” between neurons. It is

known that D1 decreases with aging [44].

By tuning the most important parameters, initial plasticity level, time constant of

plasticity decay, constant plasticity and some other parameters, a curve with recency

and childhood amnesia is produced. There was no need for a cascade of systems or

different encoding and forgetting functions such as in the attempt to replicate the

reminiscence bump with the Memory Chain Model [6]. The curve is similar to the curve

generated by the AM-ART model [7].

The parametrization of the curve with recency is still compatible with the

neurobiological hypothesis of decreasing dopamine receptors and pruning of

synapses with aging. The parametrization using the constant plasticity parameter

suggests a biologically motivated assumption that the dopamine decay throughout

lifetime has a lower limit.

5.3. Limitations

Although all results seem to be realistic using this approach that considers only long-

term memory, the interaction between different brain areas was not represented as in

the Memory Chain Model [6] and Tracelink model [13]. This could have been

done by connecting several networks representing the different areas that deal

with memory and making use of synaptic adaptation, which would make the model more

realistic. However, since only long-term memory is considered, this approach does not

seem necessary.

Furthermore, if we had a measure of how strongly encoded a pattern is, such as a

sensitivity index [45], we could replicate experimental forgetting curves, i.e. how

strongly a pattern is encoded along time for each age. We could tune the model to

have the same forgetting rates as the experimental data and thus be more realistic.

By doing this, the values obtained would have a more relevant meaning and the

analysis could be quantitatively more realistic.

It should be made clear that the values used in the parametrization are less relevant.

The qualitative relations are expected however to be interpretable. Thus, this approach

allowed us to understand the origin of translation as well as decrease and increase of

the bump amplitude, but the precise values play little role.

A lesson learned is that since there are different models and ways in which a

phenomenon can be parametrized one has to choose the model based on the

assumptions that one is willing to make and level of realism one wants to achieve.

5.4. Social, ethical and sustainability aspects

5.4.1. Ethical aspects

This project uses a computational approach to study the brain, which helps avoid
excessive experiments on humans and animals. There is no risk of complications or
brain damage resulting from chemical or invasive techniques, e.g. manipulating
dopamine levels in the brain or stimulating neurons. The approach is safe, yet it
allowed us to formulate assumptions about the lower limit for the decay of dopamine
throughout the lifetime.

This work predicts the effects that changing the level of dopamine has on the
reminiscence bump and may therefore suggest a pharmacological intervention to
produce these effects in real life. This raises the ethical question of whether
pharmacological control over cognitive skills is a better option than natural training
methods such as mental exercises and hard work.

Another ethical question is why we should manipulate the natural evolution of memory
skills, since their decline is a natural part of aging. Boundaries should therefore be
drawn to determine when the pharmacological approach should be applied, and to
whom, even if the results seem promising.

5.4.2. Social aspects

The study of the brain has a direct impact on the neurology community, because the
more is known about how the brain functions, the better the diagnosis and treatment
of neurological diseases become. This improvement increases the well-being of
patients and helps them maintain their interpersonal relationships. It constitutes a
positive social impact: it prevents the isolation and marginalization of older people
and other patients with neurological and psychiatric disorders, and slows down the
degeneration caused by these diseases.

5.4.3. Sustainability aspects

Promoting healthy lives and well-being at all ages is a sustainable development goal
that should be taken into account in a more humane society. As mentioned throughout
this section, studying memory can reduce the impairments caused by diseases that
affect brain function. These diseases are more prevalent among the elderly, which
makes this increasingly important as life expectancy continues to rise.

From technological and economic perspectives, the algorithmic implementation of
brain function contributes to the development of better AI technologies that take
advantage of how the brain works to improve their performance. This can increase
productivity in several applications, benefiting the economy and society.

Chapter 6

Conclusion and Future Work

In this project, the human lifespan retrieval curve was modelled with an incremental
attractor network model, and the effects of several model parameters were analyzed
systematically. The objective was to study the mechanisms that modulate the bump
characteristics, i.e. position and magnitude, in this firing-rate attractor neural
network model with BCPNN plasticity [8].

The parameters with the most significant effect on the bump characteristics were the
initial plasticity and the time constant of the age-dependent plasticity decay, which
set the position of the bump. When added, the constant component of plasticity also
had a significant impact on the position of the bump and on the shape of the lifespan
retrieval curve. The network has to be large enough to store all the patterns, and the
magnitude of the retrieval curve increases with network size. The other parameters
mainly influence the magnitude of the retrieval curve.

Despite its simplicity and high level of abstraction, the model has demonstrated
considerable potential to simulate the human lifespan retrieval curve. This firing-rate
attractor neural network with BCPNN plasticity [8] provides insights into several
mechanisms underlying the reminiscence bump characteristics, and even recency and
childhood amnesia, so more complex or spiking models do not seem necessary to
replicate these phenomena. However, such models could be considered in order to
add more biological realism.

As for future work, it would be interesting to study free recall in this model to find
out whether the effects of varying the parameters are preserved, and which parameter
values would provide a good starting point for varying them. Some preliminary results
indicate that a softmax factor of 2 would be better than the value of 1 used here,
since it significantly increases the ratio of retrieval.
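
As an illustration of what a softmax factor does, the sketch below applies a softmax with a gain factor to a vector of unit support values, assuming the factor multiplies the support before normalization. How the factor enters the model used in this work is an assumption of the sketch, and the function name, variable names and input values are hypothetical.

```python
import math


def softmax(support, gain=1.0):
    """Softmax over support values, scaled by a gain factor (assumed form)."""
    m = max(support)  # subtract the maximum for numerical stability
    exps = [math.exp(gain * (s - m)) for s in support]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical support values for three competing units.
support = [1.0, 0.5, 0.2]
p_gain1 = softmax(support, gain=1.0)
p_gain2 = softmax(support, gain=2.0)

print("gain 1:", [round(p, 3) for p in p_gain1])
print("gain 2:", [round(p, 3) for p in p_gain2])
```

A larger gain concentrates the normalized activity on the unit with the highest support, so the competition between units becomes sharper.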

Moreover, the effect of all the parameters could be tested in a model with recency,
and the model could be used to reproduce the modality-dependent bump position.

Furthermore, parameters could be tuned to have even more realistic values. If we had
a measure of how strongly a pattern is encoded, such as a sensitivity index [45], we
could replicate experimental forgetting curves, i.e. how strongly a pattern is encoded
over time for each age, so that the model has the same forgetting rates as the
experimental data.

Finally, a more advanced parameter sensitivity analysis could be performed.

Bibliography

[1] Bäckman, L. and Bèackman, L., “Memory functioning in dementia”, Advances in psychology,

1992

[2] Jaušovec, N. and Jaušovec, K., “Working memory training: Improving intelligence – Changing

brain activity.” Brain and Cognition, vol. 79, no.2, pp. 96-106, 2012

[3] Wheeler M.E., Ploran E.J., “Episodic Memory” Encyclopedia of Neuroscience, pp 1167-1172,

2019

[4] Abraham W. C., Jones O. D. and Glanzman D. L., “Is plasticity of synapses the mechanism of

long-term memory storage?” Science of Learning, vol. 4, no 1., pp. 1-10, 2019.

[5] Munawar K., Kuhn S. K. and Haque S., “Understanding the reminiscence bump: A systematic

review.” Plos One, vol. 13, no. 12, 2018.

[6] Murre J.M.J., Chessa A. G and Meeter M., “A mathematical model of forgetting and amnesia.”

Frontiers in psychology, vol.4, pp.76, 2013

[7] Wang D., Tan A., Miao C. and Moustafa A., “Modelling Autobiographical Memory Loss across
Life Span”, 2019

[8] Sandberg, A., “Bayesian attractor neural network models of memory.” Dissertation,

Stockholm University, 2003.

[9] Hebb, D.O. “The organization of behavior.” New York: Wiley, 1949

[10] Lansner, A. and Holst, A., “A Higher Order Bayesian Neural Network with Spiking Units.”

International Journal of Neural Systems, pp. 115-28, 1996.

[11] Orre, R., Lansner, A., Bate, A. and Lindquist, M., “Bayesian neural networks with confidence

estimations applied to data mining.” Computational Statistics & Data Analysis vol. 34 pp. 473-

493, 2000.

[12] Janssen, S. M. J., Rubin, D. C. and Jacques P. L., “The temporal distribution of autobiographical
memory: changes in reliving and vividness over the life span do not explain the reminiscence
bump”, Memory and Cognition, vol. 39, pp. 1-11, 2011

[13] Meeter, M. and Murre J. M. J., “Tracelink: A model of amnesia and consolidation” Cognitive

Neuropsychology, vol. 22, no. 5, pp. 559-587, 2005

[14] Rubin D., Rahhal, T. and Poon, L., “Things learned in early adulthood are remembered best”

Memory & Cognition, vol. 26, no. 1, pp. 3-19, 1998.

[15] Janssen, S., Gralak, A., Murre, J., “A model for removing the increased recall of recent events
from the temporal distribution of autobiographical memory.” Behavior Research Methods,
vol. 43, no. 4, pp. 916-930, 2011

[16] Schrauf R. W. and Rubin D. C. “Effects of voluntary immigration on the distribution of
autobiographical memory over the lifespan.” Applied Cognitive Psychology, vol. 15, no. 7,
pp. S75-S88, 2001.

[17] Rubin D. C., “The Basic-Systems Model of Episodic Memory.” Perspectives on Psychological
Science, vol. 1, no. 4, pp. 277-311, 2006

[18] Janssen S.M.J., Kristo, G., Rouw R. and Murre J.M.J., “The relation between verbal and
visuospatial memory and autobiographical memory.” Consciousness and Cognition, vol. 31,
pp. 12-23, 2015

[19] Conway, M. A., Wang, Q., Hanyu, K. and Haque, S., “A Cross-Cultural Investigation of
Autobiographical Memory: On the Universality and Cultural Variation of the Reminiscence Bump.”
Journal of Cross-Cultural Psychology, vol. 36, no. 6, pp. 739-749, 2005.

[20] Holmes A. and Conway M.A., “Generation identity and the reminiscence bump: Memory for
public and private events.” Journal of Adult Development, vol. 6, no. 1, pp. 21-34, 1999

[21] Berntsen D. and Rubin D.C., “Cultural life scripts structure recall from autobiographical

memory.” Memory & Cognition, vol. 32, no. 3, pp. 427-442, 2004

[22] Berntsen, D. and Rubin, D.C., "Emotionally charged autobiographical memories across the life

span: The recall of happy, sad, traumatic, and involuntary memories". Psychology and Aging,

vol. 17, no. 4, pp. 636–652, 2002

[23] Janssen, S., Chessa, A. and Murre, J., “The reminiscence bump in autobiographical memory:

Effects of age, gender, education, and culture.” Memory, vol. 13, no. 6, pp. 658-668, 2005.

[24] Karrer, T. M., Josef, A. K., Mata, R., Morris, E. D. and Samanez-Larkin, G. R., “Reduced

dopamine receptors and transporters but not synthesis capacity in normal aging adults: a

meta-analysis.” Neurobiology of Aging, vol. 57, pp. 36-46, 2017.

[25] Peters, A., Sethares, C. and Luebke, J. I., “Synapses are lost during aging in the primate
prefrontal cortex.” Neuroscience, vol. 152, no. 4, pp. 970-981, 2008.

[26] Galton, F. “Psychometric experiments.” Brain, vol. 2, pp. 149-162, 1879

[27] Crovitz, H. F., and Schiffman, H. “Frequency of episodic memories as a function of their age.”

Bulletin of the Psychonomic Society, vol. 4, 1974

[28] Rubin, D. C., “One bump, two bumps, three bumps, four? Using retrieval cues to divide one

autobiographical memory reminiscence bump into many.”

Journal of Applied Research in Memory and Cognition, vol. 4, no. 1, pp. 87-89, 2015.

[29] Koppel, J. and Rubin, D.C., “Recent Advances in Understanding the Reminiscence Bump: The

Importance of Cues in Guiding Recall From Autobiographical Memory.” Current Directions in

Psychological Science, vol. 25, no. 2, pp. 135-140, 2016

[30] Larsson, M. and Willander, J. “Autobiographical odor memory.” Annals of the New York

Academy of Sciences, vol. 1170, pp. 318-323, 2009.

[31] McCulloch, W. S., and Pitts, W., “A Logical Calculus of the Ideas Immanent in Nervous Activity.”

Bulletin of Mathematical Biophysics, vol 5, pp. 115-133, 1943

[32] Rosenblatt, F., “The Perceptron - a perceiving and recognizing automaton.” Report 85 – 460 -

1, Cornell Aeronautical Laboratory, 1957

[33] Brunel, N. and van Rossum M.C.W., “Lapicque’s 1907 paper: from frogs to integrate-and-fire”

Biol Cybern, vol. 97, pp. 337-339, 2007

[34] Hodgkin, A. L. and Huxley, A. F., “A quantitative description of membrane current and its
application to conduction and excitation in nerve.” Journal of Physiology, vol. 117, no. 4, pp.
500-544, 1952

[35] FitzHugh R., “Impulses and Physiological States in Theoretical Models of Nerve Membrane."

Biophysical Journal, vol. 1, no. 6, pp 445-466, 1961

[36] Hopfield, J. J., “Neural Networks and Physical Systems with Emergent Collective

Computational Abilities.” Proceedings of the National Academy of Sciences of the United

States of America, vol. 79, no. 8, pp. 2554-2558, 1982

[37] Lansner, A., “Associative memory models: from the cell-assembly theory to biophysically
detailed cortex simulations.” Trends in Neurosciences, vol. 32, no. 3, pp. 178-186, 2009

[38] Ruppin E. and Reggia J., “A Neural Model of Memory Impairment in Diffuse cerebral Atrophy”,

British Journal of Psychiatry, vol. 166, pp. 19-28, 1995

[39] Sandberg, A., Lansner, A., Petersson K.M. and Ekeberg, O., “A Bayesian attractor network with

incremental learning.” Network: Computation in Neural Systems, vol. 13, no. 2, pp. 179-194,

2002

[40] Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. “On forgetful attractor network

memories.” Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB-

1 Conference (eds. Malmgren, H., Borga, M. & Niklasson, L.) 54–62 Springer, 2000.

[41] Gluck M. A, and Myers C. E., “Hippocampal mediation of stimulus representation: A

computational theory”, Hippocampus, vol. 3, no.4, pp. 491-516, 1993

[42] Janssen S. M. J., Chessa A. G. and Murre J. M. J., “Modelling the reminiscence bump in

autobiographical memory with the Memory Chain Model”, Constructive Memory, NBU Series

in Cognitive Science, pp. 138-147, 2003

[43] Hagena, H., Manahan-Vaughan, D., “Dopamine D1/D5, But not D2/D3, Receptor Dependency

of Synaptic Plasticity at Hippocampal Mossy Fiber Synapses that Is Enabled by Patterned

Afferent Stimulation, or Spatial Learning. ” Frontiers in synaptic neuroscience, vol.8, pp.31,

2016.

[44] Abdulrahman, H., Fletcher, P. C., Bullmore, E., Morcom, A. M., “Dopamine and memory

dedifferentiation in aging.” Neuroimage, vol. 153, pp. 211-220, 2017.

[45] Iatropoulos, G., “Modeling the Development of Synaptic Memory: Implications for

Reminiscence Bumps and Forget Rates.” Ongoing.

www.kth.se

TRITA -EECS-EX-2020:445
