DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020
Attractor Neural Network modelling of the Lifespan Retrieval Curve
PATRÍCIA PEREIRA
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science
Master Programme in Systems, Control and Robotics
June 2020
Author: Patrícia Pereira
Supervisors: Pawel Herman, Anders Lansner
Examiner: Erik Fransén
Abstract
Human capability to recall episodic memories depends on how much time has passed
since the memory was encoded. This dependency is described by a memory retrieval
curve that reflects an interesting phenomenon referred to as a reminiscence bump - a
tendency for older people to recall more memories formed during their young
adulthood than in other periods of life. This phenomenon can be modelled with an
attractor neural network, for example, the firing-rate Bayesian Confidence Propagation
Neural Network (BCPNN) with incremental learning.
In this work, the mechanisms underlying the reminiscence bump in the neural network
model are systematically studied, in particular the effects of synaptic plasticity,
network architecture and other relevant parameters on the characteristics of the
bump.
The most influential factors turn out to be the magnitude of dopamine-linked plasticity
at birth and the time constant of the exponential decay of plasticity with age, which
together set the position of the bump. The other parameters mainly influence the
general amplitude of the lifespan retrieval curve.
Furthermore, the recency phenomenon, i.e. the tendency to remember the most recent
memories, can also be parameterized by adding a constant to the exponentially
decaying plasticity function representing the decrease in the level of dopamine
neurotransmitters.
Keywords: reminiscence bump, attractor neural network, Bayesian Confidence
Propagation Neural Network (BCPNN), recency, synaptic plasticity, episodic memory
Sammanfattning (Summary)
The human ability to recall episodic memories depends on how much time has passed
since the memories were encoded. This dependence is described by a so-called
forgetting curve, which exhibits an interesting phenomenon called the “reminiscence
bump”. This is a tendency for older people to recall more memories from adolescence
and early adulthood than from other periods of life. This phenomenon can be modelled
with a neural network known as an attractor network, for example a non-spiking
Bayesian Confidence Propagation Neural Network (BCPNN) with incremental learning. In
this work, the mechanisms behind the reminiscence bump are studied systematically
using this neural network model. In particular, the importance of synaptic
plasticity, network architecture and other relevant parameters for the emergence and
character of this phenomenon is examined.
The most influential factors for the position of the bump were found to be the
initial dopamine-dependent plasticity at birth and the time constant of the decline
of plasticity with age. The other parameters mainly affected the general amplitude of
the lifespan retrieval curve. In addition, the recency effect, i.e. the tendency to
best remember things that happened recently, can also be parameterized by a constant
added to the otherwise exponentially decaying plasticity, which may represent the
density of dopamine receptors.
Keywords: reminiscence bump, attractor neural network, Bayesian Confidence
Propagation Neural Network (BCPNN), recency effect, synaptic plasticity, episodic
memory
Acknowledgements
I would like to thank Professors Pawel Herman and Anders Lansner for their
enthusiastic supervision throughout the project.
A warm thanks to my friends and colleagues with whom I shared my academic
journey.
My most grateful thanks to my parents for their love, care and support.
To my dear parents.
“Thus, our knowledge of the world, including ourselves, is incomplete as to space and
indefinite as to time. This ignorance, implicit in all our brains, is the counterpart of the
abstraction which renders our knowledge useful” - McCulloch and Pitts
Contents
1. Introduction
1.1 Research question
1.2 Aim and scope
1.3 Thesis Outline
2. Background
2.1. Reminiscence bump
2.1.1. Psychological and biological hypotheses
2.1.2. Different ways of cuing lead to different bumps
2.2. Neuronal computational models
2.2.1. Abstract Models
2.2.2. Detailed Models
2.2.3. Attractor neural network memory modelling
2.2.4. Other models
3. Methods
3.1. Attractor Memory Network Model
3.1.1. Modularity
3.1.2. BCPNN learning and network dynamics
3.1.3. Meaning of model parameters
3.2. Simulation protocol
3.3. Analysis and evaluation
4. Results
4.1. Reminiscence bump
4.2. Recency
5. Discussion
5.1. Summary of findings
5.2. Interpretation of the results and their impact
5.3. Limitations
5.4. Social, ethical and sustainability aspects
6. Conclusion and Future Work
Bibliography
Chapter 1
Introduction
The advancement of neuroscience benefits humankind in many ways, two of which have
been tangibly capitalized on in recent times. First, an improved understanding of
neurological and psychological mechanisms enables the development of better medical
treatments and therapies for neurological illnesses. It can also empower society, as
illustrated by headphones designed to maximize motor learning by applying a small
electric current to the area of the brain that controls movement¹. Second, a deepened
understanding of the brain is a source of inspiration for algorithms with useful
applications, such as the deep learning algorithms used in computer vision and speech
processing. In this direction, it contributes to the development of more
“human-like” and powerful artificial agents.
Within neuroscience, memory plays a key role. Studying memory is important because
of the increasing interest in tackling brain diseases in which memory deficits are
common symptoms, such as Alzheimer’s disease and other types of dementia [1]. Memory
is also a key aspect of cognition and is fundamental for intelligent behavior, namely
in learning and decision processes [2]. From another perspective, life can be
considered a sum of memories that are important for maintaining our identity and
mental health.
The focus of this project is on long-term memory, specifically episodic memory.
Episodic memory is a category of long-term memory which concerns events that occurred
throughout one’s life. Important personal experiences belong to this category [3].
Long-term memory concerns information stored in the brain over a long period of time.
It is established through long-term potentiation and depression, by which synapses
between neurons are strengthened or weakened, helping to shape memory-specific
ensembles [4]. Long-term memory is important because it underlies the ability to
learn new information and to recall that information later in time.
One prominent long-term memory phenomenon is the reminiscence bump: the tendency for
people past middle age to recall more memories from between the ages of 10 and 30
than from other periods of their life. It has consistently been observed in
autobiographical memory research [5].
This phenomenon was observed in 68 experiments between 1988 and 2017 [5].
Although the precise years in which the bump occurs vary between experiments,
there is strong empirical evidence that the largest proportion of memories comes
from adolescence and young adulthood [5].
1 www.haloneuro.com
Due to the importance of this phenomenon, computational models have been built to
study its potential underlying mechanisms. The advantage of a computational model is
that it allows us to predict how each parameter corresponding to a biological
mechanism affects the reminiscence bump, which would otherwise be infeasible to test
in human experiments.
Computational attempts at modelling the phenomenon include the Memory Chain Model [6]
and the AM-ART model [7]. However, the focus in this work is on the Bayesian
confidence propagation neural network (BCPNN), developed in various forms at KTH
Royal Institute of Technology. It is one of the most successful approaches to date in
memory modelling, owing to its direct correspondence to neuronal circuits and its
interpretation in terms of biological mechanisms [8]. BCPNN employs a Hebbian
learning rule [9] derived from Bayes’ rule, which allows this recurrent model to
function as an associative attractor memory network [8]. BCPNN has also been used in
a feed-forward architecture for classification [10] and data mining [11]. Although
BCPNN learning is available in a more biologically plausible spiking neural network
implementation [10], this thesis uses a more abstract rate-based implementation
because of the long-term nature of the memory phenomenon under study.
1.1 Research question
We hypothesize that plasticity parameters play a fundamental role in modulating the
aforementioned reminiscence bump. In addition, we test the capability of the model to
account for the improved recall of the most recently encoded memories, known as the
recency effect [12].
1.2 Aim and scope
The primary objective of this project is to study how the synaptic plasticity
parameters governing the incremental BCPNN learning process of a rate-based
modular attractor memory network model affect the characteristics of episodic memory
recall. A further aim is to investigate how network size affects the storage of
long-term memories over the modelled lifetime.
It is expected that the network model’s mechanisms can be interpreted in the context
of neurobiological effects. This study therefore has the potential to seed new
neurobiological hypotheses that help in understanding long-term effects of episodic
memory recall.
In addition, it is intended to examine other phenomena in memory recall over the
lifetime, e.g. the recency effect.
The scope of the project is limited in three ways. First, the model only considers
already formed long-term memory and therefore does not represent the transfer of
memories from short-term into long-term memory, as for example the Memory Chain Model
[6] does, or the interaction between different areas of the brain, as in the
Tracelink model [13]. Second, a firing-rate based network is used instead of its more
realistic spiking counterpart. Finally, the focus is on the reminiscence bump and
recency phenomena, so no other episodic memory recall effects are considered in
simulations.
1.3 Thesis Outline
Chapter 2 introduces the reminiscence bump, the psychological and biological
hypotheses for this phenomenon and how different ways of cuing lead to different
bumps, and provides an overview of neuronal computational models. Chapter 3 describes
the model, explains the meaning of the model parameters, and describes the simulation
protocol and the analysis and evaluation. Chapter 4 presents the results. Chapter 5
discusses the results and confronts them with the hypotheses in Chapter 2. Chapter 6
contains the conclusion and a description of future work.
Chapter 2
Background
2.1. Reminiscence bump
The reminiscence bump is the tendency for people past middle age to recall more
memories from between the ages of 10 and 30 than from other periods of their life. It
has consistently been observed in autobiographical memory research [5]. The
reminiscence bump from four different experimental studies is displayed in Figure 1.
Figure 1: Distribution of autobiographical memories from older adults
as a function of reported age at time of event [14]
2.1.1. Psychological and biological hypotheses
There are several psychological and biological hypotheses for the origins of this
phenomenon, which are compatible and even support or complement each other.
A cognitive account [14] states that in childhood one is confronted with many novel
events but, because they change rapidly, they are less useful in later situations.
Additionally, since novel events are more distinct and require more effort to
process, memory organization changes constantly. On the other hand, in periods of
stability starting from young adulthood, events are less novel, so there is less
encoding effort and increased proactive interference², resulting in poorer recall.
Therefore, events from early adulthood, which corresponds to the transition of memory
organization from rapid change to stability, are the ones most likely to be recalled,
since they are strongly encoded in a stable memory organization with little proactive
interference.
Besides the reminiscence bump, the lifespan retrieval curve exhibits other phenomena
such as childhood amnesia, the inability of adults to recall episodic memories from
their early childhood and recency, the tendency to retrieve recent memories. There
exists work that attempts to remove recency from the experimental results, focusing
on the reminiscence bump [15].
In the more concrete case of voluntary migration, the emigration period can be
considered a period of novelty in which the migrant is confronted with new realities
and adaptation challenges. It is followed by a period of stability, in which the
immigrant has settled down. In the light of the cognitive account, this would affect
the reminiscence bump of individuals who experienced migration, and the peak
would be expected to correspond to the migration or adaptation period. This is exactly
what is reported in experimental studies [16]. Seniors who experienced migration were
divided into groups according to their age at migration. As expected, seniors who
migrated during the usual bump period showed a bump in that period, which was also
their migration period. More notably, seniors who migrated after the usual bump
period showed a bump corresponding to the migration period instead, thus reflecting
the influence of periods of novelty and adaptation followed by stability on the
reminiscence bump. Their memories from this period did not differ significantly in
their features, nor were they more emotional than memories from other periods,
indicating a purely cognitive adaptation phenomenon.
In the basic-systems model of episodic memory [17], it is claimed that episodic
memory is formed through the interaction of other supportive cognitive systems, such
as diverse sensory and action systems, memory systems, and other types of systems
that result in multiple abilities such as search and retrieval, linguistic, emotional
and narrative capabilities. In this context, episodic memory is a subtype of
autobiographical memory that concerns the salient experiences that occurred
throughout one’s life.
2 Proactive interference is the interference effect of previously encoded memories on
the encoding and retrieval of new memories. An example of proactive interference is
the difficulty in remembering someone’s new phone number after having previously
learned the old one.
This model can be used to support a cognitive abilities account [18] of the
reminiscence bump, which is based on the assumption that the rise to and decline from
the reminiscence bump coincide with the development of other cognitive abilities. In
accordance with the basic-systems model, the bump would be a result of the level of
functioning of the other abilities.
To test this theory, experiments assessed verbal and visuospatial memory together
with autobiographical memory retention. They confirmed the hypothesized link between
the former and the latter [18], indicating that several cognitive abilities might
have a direct influence on the reminiscence bump. However, tests addressing
processing speed, memory and intelligence showed a much more rapid increase and a
much slower decrease in ability, which alone could not explain the shape of the bump
[14]. Nevertheless, this theory is still noteworthy and is featured throughout the
literature on autobiographical lifespan retrieval.
Another account, based on genetic fitness [14] and in line with Darwin’s theory of
evolution, states that since early adulthood is the period of reproduction, an
enhanced memory would serve the purpose of boosting cognitive abilities for selecting
the best mate. Thus, a stable memory ability throughout the lifespan would be traded
for an enhanced memory to support cognitive abilities during the reproduction period.
This explanation provides no direct mechanism to be tested but can rather be viewed
as an underlying explanation for the abovementioned accounts.
There are also hypotheses based on identity formation, which predict slightly
different distributions of autobiographical memory and its content across cultures
but only small differences in the reminiscence bump [19]. According to this account,
late adolescence and early adulthood is the time when a person develops their ideals
and vocations and defines themselves socially. Thus, events from this period have
a great impact and are integrated into one’s view of oneself and one’s life story,
giving them higher importance in memory organization [14]. This account could also
motivate the cognitive and cognitive abilities accounts and benefit from them. There
is a bump related to social identity, one’s association with cultural and social
groups, from the tens to the twenties, and a bump related to personal identity, one’s
formation of life objectives and significant relationships, from the twenties to the
thirties [20].
The last of the psychological hypotheses about the origin of the reminiscence bump
is based on the life script, i.e. key events that one expects to experience
throughout one’s life at specific ages [21], such as completing school, getting a job,
marrying or having a child. These expectations are hypothesized to guide how one
constructs and recalls one’s life story. In contrast to the previous accounts, they
are highly influenced by one’s culture rather than being focused on the individual
[21].
Notably, the recall of negative or traumatic events does not lead to a bump. This may
be because such events are not expected: they are not present in the life
script, which is composed of positive memories [22].
Consistent with these accounts of stronger encoding during the bump period are
experimental data in which the bump is connected to the age at the time of encoding
rather than the age at the time of retrieval [23].
Possible neurobiological hypotheses about the mechanisms underlying the reminiscence
bump are a decrease of brain plasticity with aging due to a decrease of dopamine
receptors [24] and the pruning of synapses with aging [25]. Such mechanisms are the
focus of this modelling work, which is an extension of previous work with a similar
aim [8].
2.1.2. Different ways of cuing lead to different bumps
There are several cuing methods for lifespan retrieval experiments with humans, such
as olfactory cues and different types of word cues. The Galton cue-word technique
arose from his recall of memories prompted by objects in the environment and
developed into his creation of lists of cue words to trigger the recall of memories,
timing each recall and noting the distribution of the memories over the lifespan
[26]. The Crovitz and Schiffman technique attempts to improve on the Galton technique
by reducing its bias and applying it to several participants [27].
Galton states in his work “Psychometric Experiments” [26] that ideas emerge by
association with an object perceived by the senses. He used to let the mind come up
with ideas that emerged from a certain object, being careful to avoid coming up with
ideas emerging from previous ideas. Afterwards, he would collect his thoughts and
draw conclusions about them. Galton usually walked around 400 meters in Pall Mall³
and scrutinized every object he saw until one or two thoughts arose. He then took note
of them and proceeded onto the next object, never allowing the mind to ramble. He
noticed the great variety of ideas that could emerge by repeating the same walk
several times but also how repetitive they could be. He then created his method which
consisted in writing lists of words on different small pieces of paper and placing them
under a book so that after some days he could read one word without knowing what
the other words were. Then he used a chronograph to count the time from reading a
word to the emergence of an idea. He would come up with one to four ideas per word
cue. This whole process required a very calm and neutral mindset. He would also note
which period of his life his ideas came from, and concluded that half of them were
from the period after leaving college.
In fact, both the location and the size of the bump can vary according to the cuing
method that is used, and there are several accounts of this phenomenon. From
experiments with different cuing methods, such as asking for important memories
versus providing a word cue, it is hypothesized that cuing could be more relevant
than encoding for the bump size and location [28,29]. Word cuing permits any
association between the cue and the memory, resulting in an early and smaller bump
peak. In contrast, important-memory cuing produces a narrative-based search connected
to a person’s life story and produces a higher and later peak and a second peak in
older years [29].
3 Street in London
Olfactory cues result in a bump in the first decade of life [28,29,30]. This can be
modelled in encoding by an accelerated decrease of plasticity of the olfactory cued
memory system, although other accounts for this earlier bump exist.
2.2. Neuronal computational models
In this project, the mechanisms of a neuronal computational model underlying
reminiscence bump parametrization are studied. Thus, in this section an overview of
neuronal computational models is provided.
2.2.1. Abstract Models
The McCulloch and Pitts model from 1943 was the first brain-inspired network, whose
units correspond to basic brain cells [31] (Figure 2).
Figure 2: A McCulloch and Pitts unit
A unit receives several binary inputs 𝐼𝑘, representing synaptic inputs on dendrites, that
are summed in the soma. The neuron fires if this summed input exceeds a threshold,
resulting in the binary output 𝑦. If one of the inputs is inhibitory, with value 0, the neuron
does not fire. Such units can be used to implement Boolean functions such as OR,
AND and NOT, and several units in a network can implement more complex functions
such as division by two. However, the model has limitations: the functions to be
implemented need to be hard-coded, and a single unit cannot implement
functions that are not linearly separable, such as the XOR function.
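The threshold logic described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original 1943 formulation: the function name `mp_unit`, the greater-or-equal comparison against an integer threshold, and the omission of inhibitory inputs are all assumptions made for brevity. It suggests that AND and OR differ only in their hard-coded threshold, and that no threshold lets a single unit with two excitatory inputs compute XOR.

```python
from itertools import product

def mp_unit(inputs, threshold):
    """Hypothetical McCulloch-Pitts style unit: fire (1) if enough binary inputs are active."""
    return 1 if sum(inputs) >= threshold else 0

def truth_table(threshold):
    """Outputs of a two-input unit for inputs (0,0), (0,1), (1,0), (1,1)."""
    return [mp_unit(pair, threshold) for pair in product((0, 1), repeat=2)]

AND_TABLE = truth_table(threshold=2)   # fires only when both inputs are active
OR_TABLE = truth_table(threshold=1)    # fires when at least one input is active

# XOR would need the unit to fire for one active input but not for two,
# which no single threshold on the summed input can achieve.
XOR_TARGET = [0, 1, 1, 0]
xor_possible = any(truth_table(t) == XOR_TARGET for t in range(0, 4))
```

The function to be computed is fixed entirely by the chosen threshold, which is the sense in which such units are hard-coded rather than learned.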
The Rosenblatt Perceptron proposed in 1957 [32] overcomes some drawbacks of the
McCulloch and Pitts model (Figure 3).
Figure 3: A Rosenblatt perceptron
The 𝑥𝑠 are the inputs and the 𝑤𝑠 are the weights. The bias, Ɵ, is the negative of the
activation threshold in the McCulloch and Pitts model.
The weights are not binary, which gives flexibility in how strongly each input
influences the output, and they can also be negative. Moreover, an input of zero does
not completely inhibit the unit, and the perceptron can be trained with supervised
learning to perform binary classification.
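As a rough illustration of supervised training for binary classification, the sketch below applies the classic perceptron learning rule to a toy problem. The function names, the learning rate, the zero initialization and the choice of the AND function as training data are assumptions for illustration and are not taken from [32]. The bias plays the role described above: the unit fires when the weighted sum plus the bias is positive.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, learning_rate=1.0, max_epochs=100):
    """Perceptron learning rule: nudge weights by the prediction error times the input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:   # every sample classified correctly
            break
    return weights, bias

# Toy data (an assumption for illustration): the linearly separable AND function.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_bias = train_perceptron(AND_SAMPLES)
predictions = [predict(and_weights, and_bias, x) for x, _ in AND_SAMPLES]
```

Because the data are linearly separable, the rule stops making mistakes after a handful of passes; on XOR-like data the same loop would simply run until `max_epochs`.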
2.2.2. Detailed Models
There exist more detailed models, such as the Integrate and Fire model [33] proposed
by Louis Lapicque in 1907, which underlies spiking neural networks (Figure 4). By
stimulating the nerve fibers that excite a frog’s leg muscle with an electrical
pulse, Lapicque concluded that the nerve membrane is polarizable and can be
compared to an RC circuit, a resistor in parallel with a capacitor.
Figure 4: An Integrate and Fire model unit described as an electric circuit
A presynaptic spike $\delta(t - t_j^{(f)})$ is low-pass filtered at the synapse and generates an
input current pulse $\alpha(t - t_j^{(f)})$. A current $I(t)$ charges the circuit with resistance $R$ and
capacitance $C$. The voltage $u(t)$ across the capacitance is compared to a threshold $\vartheta$
and the neuron fires if the threshold is exceeded, generating an output pulse $\delta(t - t_i^{(f)})$.
The RC circuit works as follows:
$$I(t) = \frac{u(t)}{R} + C \frac{du(t)}{dt}$$
And by rewriting it, the membrane time constant $RC$ is yielded:
$$RC \frac{du(t)}{dt} = -u(t) + R I(t)$$
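To make the membrane equation concrete, the following sketch integrates it with a forward-Euler step for a constant input current and adds the threshold comparison described above. The parameter values, their units, the function name `simulate_lif` and in particular the reset of the voltage to zero after a spike are illustrative assumptions; the text itself does not specify a reset rule.

```python
def simulate_lif(current, resistance=10.0, capacitance=1.0,
                 threshold=15.0, dt=0.1, duration=100.0):
    """Forward-Euler integration of RC du/dt = -u + R I with threshold and reset.

    Units are illustrative: MOhm, nF (so RC is in ms), nA, mV.
    Returns the list of spike times in ms.
    """
    tau = resistance * capacitance          # membrane time constant RC
    u = 0.0                                 # membrane voltage, starts at rest
    spike_times = []
    for step in range(1, int(round(duration / dt)) + 1):
        du = (-u + resistance * current) / tau
        u += dt * du
        if u >= threshold:                  # threshold reached: emit a spike
            spike_times.append(step * dt)
            u = 0.0                         # reset (an assumption, not from the text)
    return spike_times

strong = simulate_lif(current=2.0)   # R*I = 20 mV, above threshold
weak = simulate_lif(current=1.0)     # R*I = 10 mV, never reaches threshold
```

With these assumed values the continuous equation reaches the threshold after RC·ln(RI/(RI − ϑ)) ≈ 13.9 ms, so the first entry of `strong` should land close to that, while `weak` stays empty because the voltage saturates at RI below the threshold.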
The Hodgkin-Huxley model, proposed in 1952, describes the change of the membrane
potential in the squid giant axon with four ordinary differential equations that
include the ionic mechanisms of sodium and potassium [34]. This work was awarded the
Nobel prize in 1963. Several improvements have been developed for this model,
for instance the introduction of further ionic mechanisms discovered from
experimental data. Simplified versions of this model have also been proposed, such as
the FitzHugh-Nagumo model from 1961, with only two equations [35].
2.2.3. Attractor neural network memory modelling
Concerning memory modelling, Hopfield networks [36] (Figure 5) have been
suggested as models of biological memory, although they are not used in many
applications today since more powerful types of networks, e.g. for classification,
exist. However, this type of recurrent network model has been used to model cortical
associative memory [37].
After being trained with a set of patterns, a Hopfield network can retrieve one of these
patterns after being fed a distorted version of it. Its nodes can take, for example,
values of 0 or 1, and the nodes are linked by symmetric connections. The connection
weights result from a Hebbian learning rule⁴, i.e. the weights between neurons that
are active at the same time are strengthened during training [9].
Figure 5: A Hopfield Network
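Pattern retrieval from a distorted cue can be illustrated with a very small network. The sketch below is an assumption-laden toy, not the model used in this thesis: it uses ±1 units instead of 0/1 values, a plain sum-of-outer-products Hebbian rule without self-connections, asynchronous updates in a fixed order, and two hand-picked orthogonal patterns. All names are hypothetical.

```python
def train_hebbian(patterns):
    """Hebbian weights: sum of outer products of the patterns, no self-connections."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, max_sweeps=10):
    """Asynchronous updates in a fixed order until no unit changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

PATTERN_A = [1, 1, 1, 1, -1, -1, -1, -1]
PATTERN_B = [1, -1, 1, -1, 1, -1, 1, -1]
hopfield_weights = train_hebbian([PATTERN_A, PATTERN_B])

distorted = [-1] + PATTERN_A[1:]        # PATTERN_A with its first unit flipped
retrieved = recall(hopfield_weights, distorted)
```

Here the stored pattern acts as an attractor: the flipped unit is pulled back by the summed input from the other units, and the remaining units are already consistent with the stored pattern.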
A computational study using a variant of Hopfield’s network investigated how attractor
neural network models can qualitatively account for basic features of memory
degradation in diffuse cerebral atrophy⁵ and be used to predict manifestations of
Alzheimer’s disease based on neurological conditions [38].
Purely correlation-based learning rules, such as the one in the Hopfield network, lead to
catastrophic forgetting, implying the loss of all memories when their number exceeds
the network capacity. To overcome that, the learning rule must exhibit palimpsest
properties, i.e. a gradual forgetting of older memories when learning new ones. Hopfield
suggested “learning within bounds” [36], in which connection weights are bounded.
This comes with a decrease of the network capacity from 0.137N to 0.05N patterns,
where N is the number of units, i.e. the palimpsest property is traded against
long-term capacity.
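The idea of bounding the weights can be sketched as follows. This is only a toy reading of “learning within bounds”: the unit increment, the bound of 1, the ±1 units, the random patterns and all function names are assumptions, and the convention that a zero input field leaves a unit unchanged is chosen for simplicity. With the bound equal to the increment, each weight after an update either agrees in sign with the latest pattern or is zero, so the most recently stored pattern remains a fixed point even when far more patterns have been presented than the unbounded rule could hold.

```python
import random

def learn_within_bounds(weights, pattern, bound=1):
    """Add the Hebbian increment for one pattern, then clip each weight to [-bound, bound]."""
    n = len(pattern)
    for i in range(n):
        for j in range(n):
            if i != j:
                updated = weights[i][j] + pattern[i] * pattern[j]
                weights[i][j] = max(-bound, min(bound, updated))

def is_fixed_point(weights, pattern):
    """True if no unit's input field opposes its value in the pattern."""
    for i, s_i in enumerate(pattern):
        field = sum(w * s for w, s in zip(weights[i], pattern))
        if field * s_i < 0:
            return False
    return True

rng = random.Random(0)          # fixed seed so the sketch is reproducible
N_UNITS, N_PATTERNS = 32, 50    # far more patterns than 0.137 * N_UNITS
bounded = [[0] * N_UNITS for _ in range(N_UNITS)]
patterns = [[rng.choice((-1, 1)) for _ in range(N_UNITS)]
            for _ in range(N_PATTERNS)]
for p in patterns:
    learn_within_bounds(bounded, p)

latest_is_stable = is_fixed_point(bounded, patterns[-1])
max_abs_weight = max(abs(w) for row in bounded for w in row)
```

Older patterns are progressively overwritten by newer ones in this scheme, which is the gradual forgetting that the palimpsest property refers to.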
4 Hebb’s rule is often summarized as “neurons that fire together, wire together”
5 Loss of neurons and connections between them
Another way of avoiding catastrophic forgetting is presented in a firing-rate Bayesian
confidence propagation neural network (BCPNN) attractor neural network with
incremental learning developed by the Lansner group at KTH [8][39][40], which is used
in this project and is explained in detail in the next chapter.
2.2.4. Other models
A computational theory of hippocampal function [41] makes use of a connectionist⁶
model that represents a stimulus as a pattern distributed over many elements, as in
Figure 6:
Figure 6: A generic connectionist network for associative learning [41]
The input layer is activated by the stimulus inputs. A function of the weighted sums of
input activations activates the internal layer, which forms a new representation of
the input. The activation of the output node is likewise a function of the weighted
sums of internal layer node activations, and output layer activations are interpreted
as the network’s response. In this model, learning about stimuli means associating
their representations with appropriate outputs.
Figure 7 shows the complete cortico-hippocampal connectionist model:
6 Connectionism explains mental phenomena using artificial neural networks: mental phenomena are described by interconnected networks of simple units, and learning corresponds to modifying connection strengths based on experience.
Figure 7: Cortico-hippocampal model [41]
The hippocampal network on the right is a predictive autoencoder that learns to recode
stimulus information in the internal layer. The network on the left represents learning
in the cerebral cortex and long-term memory storage. A more complete version of the
model can have several such cortical networks modulated by a hippocampal network
(or networks).
This theory makes predictions regarding the effects of hippocampal lesions.
The Tracelink model [13] is a connectionist model composed of three systems: a trace
system, a link system and a modulatory system as depicted in Figure 8:
Figure 8: The Tracelink Model [5]
The trace system is represented by the circles and connections on the plane and represents the neocortical basis for memories. The link system is represented by the
six circles in the rectangle and connections from these circles to the circles on the plane and includes the hippocampus and certain other structures. Finally, the modulatory system, ∆W, includes certain basal forebrain nuclei and several areas that have a more controlling function. An encoding of a memory is depicted in Figure 9:
Figure 9: Encoding of a new memory [13]
In the first stage (A), trace elements are activated by a new memory. In the second stage (B), link elements are activated, relevant trace-link connections are enhanced and the modulatory system is activated. In the third stage (C), weak trace-trace connections are forming and the modulatory system is weakly activated. In the fourth stage (D), strong trace-trace connections have been formed, trace-link connections have faded away and the modulatory system is deactivated.
This model can account for many characteristics of amnesia by deactivating the link system during learning, and it can produce normal forgetting curves. It also provides an explanation for the advantages of learning under arousal for long-term recall.
The Memory Chain Model [6] is composed of a cascade of memory stores as depicted in Figure 10:
Figure 10: A – Memory systems at different time scales, B - Schematic of the Memory Chain Model [6]
When a new memory is encoded, a certain number of representations are formed in the first memory store. With time, this number of representations declines and some are transferred to a subsequent store. Each store has its own decline rate and the stores are organized in order of decreasing decline rate, representing the consolidation of short-term memories into long-term. The strength of a memory is proportional to the number of representations it has. This model can account for a range of amnesia data namely temporal gradients7 in several animals and also datasets from human patients with several neurological diseases. There was an attempt to replicate the reminiscence bump using this model considering the differentiation of memory distribution into two separate functions, a decline function and an encoding-sampling function [42].
7 Phenomenon characteristic of retrograde amnesia which consists in greater loss of memory for occurrences from the recent past than for occurrences from long ago
A more recent work is the Autobiographical Memory-Adaptive Resonance Theory (AM-ART) [7], depicted in figure 11:
Figure 11: AM-ART model [7]
AM-ART is a three-layer neural network. The event-specific knowledge is presented to the bottom layer F1 to encode life events in the middle layer F2, and a sequence of related events in F2 is encoded into an episode in layer F3.
Input channel $F_1^{1-2}$ receives inputs of time and location from the entorhinal cortex, $F_1^{3-4}$ receives inputs of people and activity from the fusiform gyrus and $F_1^{5-6}$ receives inputs of emotion and imagery from the amygdala, as depicted by the arrows at the bottom. These inputs constitute the basis of events, as represented by the connections from the bottom to the middle layer. The episodic pattern 𝑡𝑠 is formed in the events layer and is connected to the episodes layer in the hippocampus. There is a flow of memory search and readout throughout all layers, as depicted by the grey arrows. This model was successfully used to model the lifespan retrieval curve, producing a curve with the reminiscence bump as well as childhood amnesia and recency.
Chapter 3
Methods
3.1. Attractor Memory Network Model
In this subsection the model used in this project is presented and explained in more
detail.
3.1.1. Modularity
The model has a specific modularity. In this network, a unit 𝜋𝑖𝑖′ (Eq. 2) corresponds to
activity in a minicolumn, which is a local group of neurons that can be considered an
elementary unit of the cortex. Minicolumns are in turn organized in groups, the
hypercolumns. While a hypercolumn represents a feature of a memory, its
minicolumns represent the values that the feature can take, as can be
seen in Figure 12, which provides an example of the encoding of an object represented by
two features:
Figure 12: A modular attractor memory network with BCPNN learning
The hypercolumn on the left (bigger circle) represents orientation of a seen object and
each minicolumn (smaller circles) represents an angle of 0, 30, 60 or 90 degrees. The
hypercolumn on the right represents color and each of its minicolumns represents a
different color. The activity within each hypercolumn is normalized. Minicolumns
belonging to different hypercolumns have connections described by weights, as
depicted in the figure for the connections of the 30- and 90-degree minicolumns.
The network used in this project is an attractor network with BCPNN plasticity [8]. It
has 144 units, having 12 hypercolumns with 12 minicolumns each, to be able to store
the desired number of patterns.
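The modular layout described above can be illustrated with a minimal sketch. It assumes that a pattern is stored as one active minicolumn index per hypercolumn and expanded into a flat one-hot vector; the helper names `random_pattern` and `to_units` are hypothetical and are not taken from the project code.

```python
import random

H = 12  # number of hypercolumns (features)
M = 12  # minicolumns per hypercolumn (values a feature can take)

def random_pattern(rng, h=H, m=M):
    """Pick one active minicolumn index for each hypercolumn."""
    return [rng.randrange(m) for _ in range(h)]

def to_units(pattern, m=M):
    """Expand a pattern into a flat vector of H * M unit activations."""
    units = []
    for active in pattern:
        units.extend(1.0 if k == active else 0.0 for k in range(m))
    return units

rng = random.Random(0)
pattern = random_pattern(rng)
units = to_units(pattern)
print(len(units), sum(units))  # 144 units, 12 of them active
```

With 12 hypercolumns of 12 minicolumns each, the vector has 144 entries and exactly one active unit per hypercolumn, so the activity within each hypercolumn is already normalized.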
3.1.2. BCPNN learning and network dynamics
The differential equations governing unit behavior in the model are:
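$$\tau_c \frac{dh_{ii'}(t)}{dt} = \beta_{ii'}(t) + \sum_{j}^{N} \log\left(\sum_{j'}^{M_i} w_{ii'jj'}(t)\,\pi_{jj'}(t)\right) - h_{ii'}(t) \qquad (1)$$

$$\pi_{ii'}(t) = \frac{e^{h_{ii'}}}{\sum_{j} e^{h_{ij}}} \qquad (2)$$

$$\frac{d\Lambda_{ii'}(t)}{dt} = \alpha\left(\left[(1 - \lambda_0)\,\hat{\pi}_{ii'}(t) + \lambda_0\right] - \Lambda_{ii'}(t)\right) \qquad (3)$$

$$\frac{d\Lambda_{ii'jj'}(t)}{dt} = \alpha\left(\left[(1 - \lambda_0^2)\,\hat{\pi}_{ii'}(t)\,\hat{\pi}_{jj'}(t) + \lambda_0^2\right] - \Lambda_{ii'jj'}(t)\right) \qquad (4)$$

$$\beta_{ii'}(t) = \log\left(\Lambda_{ii'}(t)\right) \qquad (5)$$

$$w_{ii'jj'}(t) = \frac{\Lambda_{ii'jj'}(t)}{\Lambda_{ii'}(t)\,\Lambda_{jj'}(t)} \qquad (6)$$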
A set of active units, one per hypercolumn, indexed by 𝑖, represents an activated
memory and its level of activation, indexed by 𝑖′ is a confidence estimate. The unit
support is ℎ𝑖𝑖′ and evolves according to Eq. 1. A background activity 𝜆0 is introduced
to avoid logarithms of zero in the calculations (Eq. 3 and 4).
The encoding of a memory consists in the modification of the network’s weights, 𝑤,
and biases, 𝛽, (Eq. 5 and 6) so that the configuration of unit activations corresponding
to that memory becomes an attractor state of the network. While connection strengths
between minicolumns belonging to different hypercolumns are represented by
weights, minicolumns belonging to the same hypercolumn are related to each other
via lateral inhibition, implemented with softmax, in each hypercolumn as in Eq. 2.
There is a learning rate parameter in the incremental model, 𝛼 in Eq. 3 and 4, which
can control the strength of encoding of each memory or how much it modifies the
network’s weights and biases.
Before the introduction of this incremental approach, a previous approach would
encode several memories at once [8] by estimating the weights and biases’
probabilities by counting units’ co-activations, which would result in catastrophic
forgetting.
The incremental approach differs from the counter approach in that it
estimates the weights and biases (Eq. 6 and 5, respectively) with exponential moving
averages Λ𝑖𝑖′ and Λ𝑖𝑖′𝑗𝑗′ of the activity and co-activity of the estimated unit activations $\hat{\pi}_{ii'}$
(Eq. 3 and 4). The advantage is that the learning rule can be applied online and the
network exhibits palimpsest properties. It is therefore possible to mimic learning and
gradual forgetting throughout time, which suits the objective of this project.
Plasticity is governed by the whole set of differential equations. According to the first
differential equation, the support of a unit, ℎ𝑖𝑖′, is driven by the weighted contributions
of presynaptic units, $\sum_{j}^{N} \log\left(\sum_{j'}^{M_i} w_{ii'jj'}(t)\,\pi_{jj'}(t)\right)$, added to the unit bias 𝛽𝑖𝑖′. This
value is then normalized as in Eq. 2, resulting in the value of the unit
activation 𝜋𝑖𝑖′, which is used to calculate the exponential moving averages
in Eq. 3 and 4, which in turn are used to update the biases and weights. These are then
used once again in the first differential equation. This cycle repeats continuously
during learning. During recall the same happens but with the learning rate 𝛼 set to
zero, preventing the weights and biases from changing their values.
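The cycle described above can be sketched as a single explicit Euler step over Eq. 1 to 6. This is a minimal illustration under stated assumptions rather than the project implementation: traces start uniform, the sum in Eq. 1 skips the unit's own hypercolumn, and during training the clamped pattern is used directly as the activation. The function names (`init_state`, `weight`, `euler_step`) and the small 3-by-3 network are hypothetical choices made for readability.

```python
import math

def softmax(h):
    """Eq. 2: normalise the supports within one hypercolumn."""
    top = max(h)
    e = [math.exp(x - top) for x in h]
    z = sum(e)
    return [x / z for x in e]

def init_state(H, M):
    """Uniform traces (assumption): all weights start at 1, biases at log(1/M)."""
    h = [[0.0] * M for _ in range(H)]
    lam_i = [[1.0 / M] * M for _ in range(H)]
    lam_ij = [[[[1.0 / M**2] * M for _ in range(H)] for _ in range(M)]
              for _ in range(H)]
    return h, lam_i, lam_ij

def weight(lam_i, lam_ij, i, a, j, b):
    """Eq. 6: weight between unit a of hypercolumn i and unit b of hypercolumn j."""
    return lam_ij[i][a][j][b] / (lam_i[i][a] * lam_i[j][b])

def euler_step(h, lam_i, lam_ij, alpha, clamp=None, lam0=0.01, tau_c=1.0, dt=0.01):
    """Advance supports and traces by one Euler step; returns the activations used."""
    H, M = len(h), len(h[0])
    pi = clamp if clamp is not None else [softmax(row) for row in h]  # Eq. 2
    new_h = [row[:] for row in h]
    for i in range(H):
        for a in range(M):
            support = math.log(lam_i[i][a])                           # Eq. 5 (bias)
            for j in range(H):
                if j == i:
                    continue  # assumption: no weights inside a hypercolumn
                support += math.log(sum(weight(lam_i, lam_ij, i, a, j, b) * pi[j][b]
                                        for b in range(M)))
            new_h[i][a] += dt * (support - h[i][a]) / tau_c           # Eq. 1
    for i in range(H):
        for a in range(M):
            for j in range(H):
                for b in range(M):
                    target = (1 - lam0**2) * pi[i][a] * pi[j][b] + lam0**2
                    lam_ij[i][a][j][b] += dt * alpha * (target - lam_ij[i][a][j][b])  # Eq. 4
    for i in range(H):
        for a in range(M):
            target = (1 - lam0) * pi[i][a] + lam0
            lam_i[i][a] += dt * alpha * (target - lam_i[i][a])        # Eq. 3
    for i in range(H):
        h[i][:] = new_h[i]
    return pi

H, M = 3, 3
h, lam_i, lam_ij = init_state(H, M)
pattern = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for _ in range(100):   # learning: pattern clamped, alpha > 0
    euler_step(h, lam_i, lam_ij, alpha=0.3, clamp=pattern)
w_coactive = weight(lam_i, lam_ij, 0, 0, 1, 1)
for _ in range(200):   # recall: free dynamics, alpha = 0 freezes weights and biases
    euler_step(h, lam_i, lam_ij, alpha=0.0)
```

After learning, the weight between two units that were active together exceeds 1, while the weight between an active and an inactive unit falls below 1, which is what makes the clamped configuration an attractor.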
3.1.3. Meaning of model parameters
Plasticity Parameters
To model the degree of synaptic plasticity, there is a learning rate parameter in the
incremental model, 𝛼 in Eq. 3 and 4 [8], which can control the strength of encoding of
each memory or how much it modifies the network’s weights and biases. By
decreasing it over time during learning it can be used to represent decrease of
dopamine receptors combined with other aging phenomena, allowing the modelling of
a reminiscence bump [8].
This way, $\alpha = \alpha_0 e^{-t/\tau_s} + \alpha_{cst}$, with 𝜏𝑠 being the time constant of the age-dependent
plasticity decay, which mediates this decay of dopamine receptors. Both 𝛼0 and 𝜏𝑠 are
parameters that can be varied to investigate their effect on the reminiscence bump.
While performing experiments with this model, I observed that if the learning rate
decay stopped at a certain age it would be possible to model recency, the tendency to
retrieve recent memories. This can be achieved by decomposing the evolution of the
learning rate into a constant term 𝛼𝑐𝑠𝑡 added to the decreasing function above. This
constant is also an important parameter to investigate.
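The plasticity schedule above can be written as a small function. The sketch below assumes that age and 𝜏𝑠 are expressed in the same time unit (one unit per presented pattern); the function name `plasticity` and its default values, taken from the initial configuration and the recency experiment, are illustrative.

```python
import math

def plasticity(age, alpha0=0.3, tau_s=10.0, alpha_cst=0.0):
    """Learning rate at a given age: exponential decay plus a constant floor."""
    return alpha0 * math.exp(-age / tau_s) + alpha_cst

# Pure exponential decay (no recency term)
decay_only = [plasticity(age) for age in range(70)]

# Decay with a constant floor, which keeps late-life plasticity from vanishing
with_floor = [plasticity(age, alpha0=0.25, tau_s=8.0, alpha_cst=0.015)
              for age in range(70)]
```

Without the constant term the learning rate tends to zero at old age, whereas with it the learning rate levels off at 𝛼𝑐𝑠𝑡.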
Neural parameters
The membrane time constant, 𝜏𝑐, can represent the RC time constant as in the
Integrate and Fire Model [20]. It thus represents the time it takes for the activation value to
reach about 63% of its target value (the charging of the capacitor through the
resistance) or to fall to about 37% of its value in the absence of activity, where the level
of activation corresponds to the voltage of the capacitor.
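The 63% and 37% figures follow from first-order relaxation, which can be checked numerically. The sketch below integrates a leaky unit with a constant target using the Euler step from the initial configuration; the function name `relax` is hypothetical and the constant target is a simplifying assumption.

```python
def relax(target, tau_c=1.0, dt=0.01, steps=100, h0=0.0):
    """Euler integration of tau_c * dh/dt = target - h over steps * dt time units."""
    h = h0
    for _ in range(steps):
        h += dt * (target - h) / tau_c
    return h

charged = relax(1.0)           # rise from 0 towards 1 during one time constant
decayed = relax(0.0, h0=1.0)   # decay from 1 towards 0 during one time constant
print(round(charged, 2), round(decayed, 2))
```

After one time constant the unit has covered roughly 63% of the distance to its target, and a unit left without input keeps roughly 37% of its initial value.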
Other model parameters
A background noise activity 𝜆0 ≪ 1 was introduced to avoid logarithms of zero in the
calculations resulting in all minicolumns having a minimal activity.
Regarding lateral inhibition, the softmax factor 𝛾, which can be included in Eq. 2
multiplying ℎ in the numerator and denominator, represents how concentrated the
estimate is around the larger input values.
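The role of the softmax gain can be seen in a short sketch, assuming 𝛾 multiplies every support inside the exponentials of Eq. 2. The support values below are arbitrary illustrative numbers.

```python
import math

def softmax(h, gamma=1.0):
    """Softmax over one hypercolumn with gain gamma applied to the supports."""
    top = max(h)  # subtracting the maximum avoids overflow and leaves the result unchanged
    e = [math.exp(gamma * (x - top)) for x in h]
    z = sum(e)
    return [x / z for x in e]

supports = [0.2, 0.1, -0.3]
soft = softmax(supports, gamma=1.0)    # activity spread over the minicolumns
sharp = softmax(supports, gamma=10.0)  # activity concentrated on the largest support
```

A larger gain pushes the distribution towards a winner-take-all outcome within the hypercolumn.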
Other model parameters that can be varied are the degree of memory cue perturbation
or the number of hypercolumn swaps in the perturbed pattern and the successful
recognition overlap threshold. The number of hypercolumn swaps would represent the
similarity of the cue to the target memory and the successful recognition overlap
threshold would represent the required vividness of the recalled memory.
Network parameters
The network size, more concretely the number 𝐻 in a network with 𝑁=𝐻 × 𝑀 units
organized in 𝐻 hypercolumns with 𝑀=𝐻 minicolumns each, represents the number of
minicolumns available to store each memory. Based on experience, it is often good to
have 𝑀=𝐻.
3.2. Simulation protocol
Training consists in sequentially clamping the activation of each pattern for a certain
duration, while letting the network’s weights and biases evolve. During recall the
network state evolves after being presented with a perturbed version of a pattern, while
the network’s weights and biases are kept fixed. Recall overlap is the overlap between
the actual pattern and the final state reached.
In cued recall, the perturbed version of the pattern consists in having a fixed number
of randomly chosen hypercolumns with their activated minicolumn randomly swapped.
For each pattern, the network is presented several times with a different perturbed
version of that pattern and the overlaps are calculated. The measure of interest is the
ratio of retrieval (Eq. 8).
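The cue perturbation can be sketched as follows, assuming that a swap always moves the active minicolumn of a hypercolumn to a different minicolumn and that patterns are stored as one active index per hypercolumn. The helper names `perturb` and `overlap` are hypothetical.

```python
import random

def perturb(pattern, swaps, m, rng):
    """Copy of pattern with `swaps` randomly chosen hypercolumns set to another minicolumn."""
    cue = list(pattern)
    for i in rng.sample(range(len(pattern)), swaps):
        cue[i] = rng.choice([k for k in range(m) if k != pattern[i]])
    return cue

def overlap(p, q):
    """Fraction of hypercolumns with the same active minicolumn.

    For one-hot patterns this equals the normalised dot product of Eq. 7.
    """
    return sum(a == b for a, b in zip(p, q)) / len(p)

rng = random.Random(1)
target = [rng.randrange(12) for _ in range(12)]
cue = perturb(target, swaps=3, m=12, rng=rng)
print(overlap(target, cue))  # 9 of 12 hypercolumns still match
```

Repeating this for many cues of the same pattern and counting how often the final overlap exceeds the threshold gives the ratio of retrieval.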
To investigate the effect of a certain parameter on the reminiscence bump
characteristics, and the sensitivity to it, a greedy approach is followed: all parameters
are kept constant in the simulation except the one that is being investigated.
The values of the parameter under examination are chosen as follows.
If it is a parameter whose variation results in a horizontal translation of the bump (such as the initial
plasticity and the time constant of the age-dependent plasticity decay), its value is varied
from values that result in a bump significantly shifted to the left to values that result in a bump
significantly shifted to the right. The effect of these parameters is systematically
measured by fitting a function in which the middle age of the bump depends on
the parameter value. If it is a parameter whose variation results in a general
increase of the ratio of retrieval (all the other parameters), its value is varied from values
that result in very low ratios of retrieval to values that result in very high ones. The effect of these parameters
is systematically measured by fitting a function in which the total ratio of retrieval over
the lifespan, i.e. the area under the ratio of retrieval curve, depends on the parameter
value.
The following values for the initial combination of parameters have been chosen since
they yield a realistic configuration of the bump, a graph similar to the experimental
lifespan retrieval curves from studies with humans:
Parameter | Value
Network size | 144 units in a network arranged in a 12-by-12 grid
Number of presented patterns | 70
Membrane time constant, 𝜏𝑐 | 1
Softmax gain, 𝛾 | 1
Euler step, 𝑑𝑡 | 0.01
Initial plasticity, 𝛼0 | 0.3
Time constant of plasticity decay, 𝜏𝑠 | 10
Background activity level, 𝜆0 | 0.01
Learning time | 1
Clamping recall time | 0.1
Recall time | 2
Number of swaps, 𝑠 | 6
Number of generated networks | 100
Number of perturbed patterns presented per age in each network | 100
Overlap threshold, 𝑜𝑡 | 11/12
Constant plasticity, 𝛼𝑐𝑠𝑡 | -
Table 2: Parameters for the initial configuration of simulations
Since each recall attempt is a Bernoulli trial whose random variable
takes the value 1 if the threshold for successful recall overlap is exceeded and 0
otherwise, the variance of a single attempt is the ratio of retrieval multiplied by one minus the
ratio of retrieval. The standard error of the mean is the square root of this variance divided by the
square root of the sample size.
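The error estimate described above amounts to a few lines of arithmetic. The sketch below assumes the recall outcomes are available as a list of 0 and 1 values; the function name `ratio_and_sem` and the example counts are illustrative.

```python
import math

def ratio_and_sem(outcomes):
    """Ratio of retrieval and its standard error for a list of 0/1 recall outcomes."""
    n = len(outcomes)
    r = sum(outcomes) / n
    variance = r * (1.0 - r)          # variance of a Bernoulli variable
    return r, math.sqrt(variance) / math.sqrt(n)

# Hypothetical example: 80 successful recalls out of 100 attempts
r, sem = ratio_and_sem([1] * 80 + [0] * 20)
```

The standard error is largest for a ratio of 0.5 and vanishes when every attempt succeeds or every attempt fails.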
3.3. Analysis and evaluation
The network is presented with 70 patterns, where each pattern represents the salient
episodic memories of one year of life. The overlap 𝑜 between two patterns, 𝑝1 and 𝑝2,
is defined as
$$o = \frac{p_1 \cdot p_2}{\lVert p_1 \rVert \, \lVert p_2 \rVert} \qquad (7)$$
The overlaps between pairs of patterns are mostly between 0 and 0.25. Sometimes
there is a higher overlap, but it does not constitute a problem given that the usual
overlap threshold defined for a successful recall is 0.916 (11/12) and the overlap
between a pattern and its perturbed version (the same pattern but with a certain
number of randomly chosen hypercolumns with their activated minicolumn randomly
swapped) is also high enough. These overlap values result from the total number of
patterns presented and the network size. The ratio of retrieval 𝑟 of a pattern is defined
as
$$r = \frac{\text{number of successful recalls}}{\text{number of recall attempts}} \qquad (8)$$
With this network size, almost all patterns in the incremental network with a low
constant learning rate of 0.01 have a ratio of retrieval of 1, and none falls below 0.99, which
means that the network’s capacity is not exceeded, so there is no incorrect storage of
patterns that would affect the validity of the experiments. This ratio of retrieval
results from presenting the network with a perturbed pattern with 3 swaps. This means
that the ratio of retrieval can only increase when presenting the network with a
perturbed pattern yields an overlap increase from 0.75, at presentation time, to higher
than 0.916 after relaxation, signaling convergence towards the original pattern. This
choice of overlap values is based on the experiments of Sandberg et al. (2002), Figure
4 [39], in which the ratio of retrieval increases if the overlaps increase from 0.8 (2
random hypercolumn swaps in a 100-unit network arranged in 10 hypercolumns with
10 minicolumns each) to 0.85.
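The overlap values quoted above can be checked with a short worked calculation. It assumes that each swap moves the active minicolumn of one hypercolumn to a different minicolumn, so that a cue with $s$ swaps in a network of $H$ hypercolumns matches the target in $H - s$ hypercolumns (Eq. 7 applied to one-hot patterns):

$$o = \frac{H - s}{H}, \qquad \frac{12 - 3}{12} = 0.75, \qquad \frac{10 - 2}{10} = 0.8, \qquad \frac{11}{12} \approx 0.917$$

Under this assumption, exceeding the threshold of 11/12 in a network with 12 hypercolumns requires all 12 hypercolumns of the final state to match the original pattern.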
Chapter 4
Results
The primary aim of this work was to study the mechanisms underlying the modulation
of reminiscence bump obtained in long-term simulations of episodic memory retrieval
using a modular attractor network with incremental BCPNN learning [8]. To study the
effect of each parameter in the reminiscence bump the parameter is varied in a greedy
way and a curve is fitted relating the parameter variation and a quantitative measure
of its effect on the reminiscence bump.
All the parameters explained in section 2.4 are analyzed in this way. All simulations were performed under the same conditions as those used in the example shown in Figure 13 and have variances of similar magnitude. The experimental paradigm, described in more detail in section 3.2, consisted in training the network with 70 patterns, one per simulated year within the network’s lifetime, and then performing recall. In training, memories are sequentially presented to the network for 100 time steps (Euler steps) each. Recall consisted in clamping the network with a perturbed pattern for 10 time steps and letting the network state evolve for 200 time steps. Recall was performed sequentially for all the patterns after training. For each pattern, 100 recall attempts were made (trials with randomly perturbed cues) and the ratio of retrieval (Eq. 8) is shown in the graphs in this section. To evaluate the effect of a parameter on the reminiscence bump, all parameters are kept constant except for the one that is being investigated.
4.1. Reminiscence bump
In the first set of simulations we examined the individual effects of selected parameters
on the characteristics of the reminiscence bump.
In Figure 13, an example of cued recall given the parameters in Table 2 is shown.
Figure 13: Recall with initial configuration of parameters and standard error of the
mean
4.1.1. Network size
The network size, more concretely the number of hypercolumns, 𝐻, and minicolumns
per hypercolumn, 𝑀, determines the memory capacity. In this work we decided to
maintain the following relationship: 𝐻=𝑀. The larger the network, the lower the
crosstalk between the stored memory patterns and the higher the ratio of retrieval
(Figure 14).
Figure 14: The memory retrieval performance over the network’s lifetime depending
on the network size (𝐻=𝑀={9, 10, 11, 13, 15, 17, 20})
In order to systematically analyze the effects of the network size in the lifespan retrieval
curve, a more detailed analysis of the relation between network size and the area
under the ratio of retrieval curve was made, which demonstrated the sigmoidal
relationship between the aforementioned entities (Figure 15).
Figure 15: Sigmoidal fit made to the simulation data explaining the dependence of
the area under the ratio of retrieval curve on the network size
The area under the ratio of retrieval curve is the sum of the ratio of retrieval for all
ages. Therefore, it should be interpreted as the total recall during an individual’s
lifetime. It is used to account for the complete recall capability of the individual. The
type of equation to fit, sigmoidal, is chosen as the one that visually fits the data
best.
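The summary measure can be sketched as follows. The area is the plain sum described above; the logistic function is only one possible sigmoidal form, since the exact function that was fitted is not stated, and its parameter names (`top`, `slope`, `midpoint`) are hypothetical.

```python
import math

def area_under_curve(ratios):
    """Total recall over the lifespan: sum of the ratio of retrieval over all ages."""
    return sum(ratios)

def logistic(x, top, slope, midpoint):
    """A generic sigmoid (assumed form) rising from 0 to `top` around `midpoint`."""
    return top / (1.0 + math.exp(-slope * (x - midpoint)))

# With 70 presented patterns the area is bounded by 70 (every age always recalled)
perfect = area_under_curve([1.0] * 70)
halfway = logistic(12, top=70.0, slope=1.0, midpoint=12)
```

Bounding the sigmoid by the number of presented patterns reflects the fact that the area cannot exceed one successful recall ratio per age.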
4.1.2. Initial plasticity level
The initial plasticity level, 𝛼0, represents the initial level of dopamine in the brain. If it
is low, the plasticity is very low at the older age of the network, resulting in recalling
only memories from early years. If it is too high only memories from the recent years
are recalled since there is a high level of plasticity even after its decay, which makes
the network adapt to the latest memories that “overwrite” the early memories. This
results in a shift of the bump mediated by the initial plasticity level (Figure 16).
Figure 16: The memory retrieval performance over the network’s lifetime depending
on the initial plasticity level (𝛼0={0.01, 0.1, 0.3, 1, 2, 10})
In order to systematically analyze the effects of the initial plasticity level in the lifespan
retrieval curve, a more detailed analysis of the relation between initial plasticity level
and the middle age of the bump was made, which demonstrated the exponential
relationship between the abovementioned entities (Figure 17).
Figure 17: Exponential fit made to the simulation data explaining the dependence of
the middle age of the bump on the initial plasticity level
4.1.3. Time constant of the age-dependent plasticity decay
The time constant of the age-dependent plasticity decay, 𝜏𝑠, mediates the time it takes
for the initial plasticity level to decrease. The higher it gets, the more the bump shifts
to the right. If it is too high, plasticity is high throughout the lifespan and consequently
mainly the more recent memories are retrieved, as can be seen in Figure 18. It has the same
effect as varying the initial plasticity parameter (compare with Figure 16).
Interestingly, olfactory cues result in a bump in the first decade of life [30,31,32]. This
can be modelled in encoding by an accelerated decrease of plasticity of the olfactory
cued memory system by using a smaller value for 𝜏𝑠.
Figure 18: The memory retrieval performance over the network’s lifetime depending
on the time constant of the age-dependent plasticity decay (𝜏𝑠={2, 8, 10, 15, 20, 50})
In order to systematically analyze the effects of the time constant of the age-dependent
plasticity decay in the lifespan retrieval curve, a more detailed analysis of the relation
between time constant of the age-dependent plasticity decay and the middle age of
the bump was made, which demonstrated the sigmoidal relationship between the
abovementioned entities (Figure 19).
Figure 19: Sigmoidal fit made to the simulation data explaining the dependence of
the middle age of the bump on the time constant of the age-dependent plasticity
decay
4.1.4. Background activity level
The background noise activity 𝜆0 ≪ 1 was introduced to avoid logarithms of zero in the
calculations, resulting in all the minicolumns having a minimal activity. If there were no
background activity, the weights and biases would be symmetric and memories would
not be forgotten, since weights grow exponentially (Eq. 6), biases decrease
exponentially (Eq. 5) and their overall sum would stay the same. Hence, the lower this activity,
the higher the recall. For very low values of this activity, however, the recall decreases as
the background activity decreases, perhaps because the background activity is then not
sufficient and perturbs the calculations (Figure 20).
Figure 20: The memory retrieval performance over the network’s lifetime depending
on the background activity level ( 𝜆0={0.15, 0.12, 0.11, 0.1, 0.01, 0.001, 1e-5, 1e-10,
1e-25})
In order to systematically analyze the effects of the background activity level in the
lifespan retrieval curve, a more detailed analysis of the relation between background
activity level and the area under the ratio of retrieval curve was made, which
demonstrated the linear relationship between the abovementioned entities (Figure 21).
Figure 21: Linear fit made to the simulation data explaining the dependence of the
area under the ratio of retrieval curve on the background activity level
4.1.5. Degree of memory cue perturbation
The degree of memory cue perturbation (noisy pattern) or number of binary swaps, 𝑠,
represents the similarity between the cue and the target memory, as explained in the
subsection Meaning of Model Parameters in Methods. Thus, the higher the number of
swaps, the lower the ratio of retrieval. This is because the perturbed pattern leads the
network state to another attractor that does not correspond to the target one (Figure
22).
Figure 22: The memory retrieval performance over the network’s lifetime depending
on the degree of memory cue perturbation (𝑠={1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12})
In order to systematically analyze the effects of degree of memory cue perturbation in
the lifespan retrieval curve, a more detailed analysis of the relation between degree of
memory cue perturbation and the area under the ratio of retrieval curve was made,
which demonstrated the sigmoidal relationship between the abovementioned entities
(Figure 23).
Figure 23: Sigmoidal fit made to the simulation data explaining the dependence of
the area under the ratio of retrieval curve on the degree of memory cue perturbation
4.1.6. Overlap threshold
The overlap threshold determines the overlap needed for successful recall,
representing the required vividness of the recalled memory. It is noticeable that for a
threshold of 5/12 or larger, the ratio of retrieval is the same throughout lifespan (Figure
24). Since the overlap between the 70 original patterns in a 12-by-12 network
is usually between 0 and 3/12, the high ratio of retrieval for thresholds between these
values is due to these overlap values. Each value selected for the threshold adds
1/12 to the previous value because the network has 12 hypercolumns with
12 minicolumns each.
Figure 24: The memory retrieval performance over the network’s lifetime depending
on the overlap threshold (𝑜𝑡={1/12, 2/12, 3/12, 4/12, 5/12, 6/12, 7/12, 8/12, 9/12,
10/12, 11/12, 12/12})
In order to systematically analyze the effects of overlap threshold in the lifespan
retrieval curve, a more detailed analysis of the relation between overlap threshold and
the area under the ratio of retrieval curve was made, which demonstrated the
sigmoidal relationship between the aforementioned entities (Figure 25).
Figure 25: Sigmoidal fit made to the simulation data explaining the dependence of
the area under the ratio of retrieval curve on the overlap threshold
4.2. Recency
In the second set of experiments we focused on another aspect of the memory retrieval
curve over the model’s lifetime, which reflects the capability to recall recently
memorized patterns. This is referred to as the recency phenomenon and appears as the rise in
recall in the years above 50 in Figure 1.
I observed that if the learning rate decay stopped at a certain age it would be possible
to model recency. It turns out that by decomposing the evolution of the learning rate
as a constant plus a decreasing function, recency is achieved (Figure 26). Childhood
amnesia, the inability of adults to recall episodic memories from their early childhood,
is also observed.
The modelling of recency is the secondary contribution of this work.
The values of the parameters resulting in the recency graph (Figure 26) are the values
of Table 2 with the following changes:
• initial plasticity, 𝛼0 = 0.25
• constant plasticity, 𝛼𝑐𝑠𝑡 = 0.015
• time constant of plasticity decay, 𝜏𝑠 = 8
• number of swaps, 𝑠 = 8
These changes were made to better fit the graphs from experimental studies with
humans.
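As a worked example of the effect of the constant term, the plasticity function $\alpha(t) = \alpha_0 e^{-t/\tau_s} + \alpha_{cst}$ can be evaluated at the two ends of the lifespan with the values listed above. The calculation assumes that $t$ is measured in simulated years, the same unit as $\tau_s$:

$$\alpha(0) = 0.25 + 0.015 = 0.265, \qquad \alpha(70) = 0.25\,e^{-70/8} + 0.015 \approx 0.00004 + 0.015 \approx 0.0150$$

Under this assumption the exponential term alone would leave a plasticity of roughly $4 \times 10^{-5}$ at age 70, so late-life plasticity is almost entirely set by the constant term.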
Figure 26: Recency with the standard error of the mean
The same experiment without the constant plasticity parameter was performed to allow
the comparison of the two graphs (Figure 27). It can be observed that the introduction
of the constant plasticity leads to a decrease in the recall of early memories, to the
desired recency effect and to a shift of the bump to the right.
Figure 27: Experiment from figure 26 without the constant plasticity
4.2.1. Constant Plasticity
The constant plasticity is a fixed component of plasticity level that sets the lower limit
for the decay of plasticity throughout lifetime. Thus, increasing constant plasticity
results in a bump more shifted to the right and a higher recency effect (Figure 28). If
the fixed plasticity is too high a bump forms in older ages due to the high plasticity at
that age that prevails over the decreasing plasticity function.
Figure 28: The memory retrieval performance over the network’s lifetime depending
on the constant plasticity (𝛼c={0, 0.01, 0.015, 0.02, 0.025, 0.1})
4.2.2. Time constant of the age-dependent plasticity decay
The time constant of the age-dependent plasticity decay, 𝜏𝑠, mediates the time it takes
for the initial plasticity level to decrease. Thus, increasing the time constant of the age-
dependent plasticity decay results in a bump more shifted to the right and a higher
recency effect. If the time constant of plasticity decay is too high a bump forms in older
ages due to the high plasticity at that age that prevails over the fact that plasticity is
decreasing throughout time (Figure 29).
Figure 29: The memory retrieval performance over the network’s lifetime depending
on the time constant of the age-dependent plasticity decay (𝜏𝑠={4, 8, 10, 15, 20, 50})
Chapter 5
Discussion
5.1. Summary of findings
The parameters that showed the most substantial effect on the bump were the initial
plasticity and the time constant of the age-dependent plasticity decay, because these
set the position of the bump, namely the ages at which the retrieval curve
has a higher magnitude and the age of its peak. The constant component of the plasticity value
throughout lifetime makes it possible to model recency and, when added, also has a substantial effect on the
position of the bump and the shape of the lifespan retrieval curve. By tuning
this parameter, a curve with a recency tail is achieved, and a bump in later years can
also be achieved. The other parameters have a lower relevance since they mainly
influence the magnitude of the retrieval curve.
5.2. Interpretation of the results and their impact
All the psychological hypotheses presented in the beginning of this work are
compatible with the neurobiological hypothesis that the mechanisms underlying the
reminiscence bump are
• the decrease of brain plasticity with aging due to dropping levels of dopamine
receptors and
• the pruning of synapses with aging
These are represented in this model by the decaying plasticity throughout time.
Dopamine D1 activation influences synaptic plasticity [43]. It can provoke neuronal
excitation or inhibition, resulting in synaptic potentiation or depression, an increase or
decrease in the efficacy of the synapses, or “connections” between neurons. It is
known that D1 decreases with aging [44].
By tuning the most important parameters, initial plasticity level, time constant of
plasticity decay, constant plasticity and some other parameters, a curve with recency
and childhood amnesia is produced. There was no need for a cascade of systems or
different encoding and forgetting functions such as in the attempt to replicate the
reminiscence bump with the Memory Chain Model [6]. The curve is similar to the curve
generated by the AM-ART model [7].
The parametrization of the curve with recency is still compatible with the
neurobiological hypothesis of decreasing dopamine receptors and pruning of
synapses with aging. The parametrization using the constant plasticity parameter
suggests a biologically motivated assumption that the dopamine decay throughout
lifetime has a lower limit.
5.3. Limitations
Although all results seem to be realistic using this approach, which considers only
long-term memory, the interaction between different brain areas was not represented as in
the Memory Chain Model [6] and the Tracelink model [13]. This could have been
done by connecting several networks representing the different areas that deal
with memory and making use of synaptic adaptation, which would make the
model more realistic. However, since only long-term memory is considered, this approach does not
seem necessary.
Furthermore, if we had a measure of how strongly encoded a pattern is, such as a
sensitivity index [45], we could replicate experimental forgetting curves, i.e. how
strongly a pattern is encoded along time for each age. We could tune the model to
have the same forgetting rates as the experimental data and thus be more realistic.
By doing this, the values obtained would have a more relevant meaning and the
analysis could be quantitatively more realistic.
It should be made clear that the values used in the parametrization are less relevant.
The qualitative relations are expected however to be interpretable. Thus, this approach
allowed us to understand the origin of translation as well as decrease and increase of
the bump amplitude, but the precise values play little role.
A lesson learned is that since there are different models and ways in which a
phenomenon can be parametrized one has to choose the model based on the
assumptions that one is willing to make and level of realism one wants to achieve.
5.4. Social, ethical and sustainability aspects
5.4.1. Ethical aspects
This project makes use of a computational approach to study the brain. It is an
approach that helps avoid excessive experiments with humans and animals. There
is no risk of complications or damage to the brain resulting from chemical or invasive
techniques, e.g. the manipulation of dopamine levels in the brain or the stimulation of
neurons. This is a safe approach and yet it allowed us to formulate assumptions about
the lower limit for the decay of dopamine throughout lifetime.
This work predicts the effects that changing the level of dopamine has on the
reminiscence bump and may therefore suggest a pharmacological intervention to produce
these effects in real life. Here, we can raise the ethical question of whether
pharmacological control over cognitive skills is a better option than natural training
methods such as mental exercises and hard work.
Another ethical question is why we should manipulate the natural evolution of memory
skills, since their decline is a natural part of aging. Boundaries should therefore be
drawn to determine when the pharmacological approach should be applied, and to whom,
even if the results seem promising.
5.4.2. Social aspects
The study of the brain has a direct impact on the neurology community because the more
is known about its functioning, the better the diagnosis and treatment of neurological
diseases. This improvement increases the well-being of patients and helps them
maintain their interpersonal relationships. This constitutes a positive social impact,
preventing the isolation and marginalization of elderly people and other patients with
neurological and psychiatric disorders and slowing down the degeneration process
caused by these diseases.
5.4.3. Sustainability aspects
Promoting healthy lives and well-being at all ages is a sustainable development goal
that should be taken into account in a more humane society. As mentioned
throughout this section, studying memory can reduce the impairments caused by
diseases that affect brain function, which are more prevalent among the elderly. This
is important because life expectancy is continuously increasing.
From technological and economic perspectives, the algorithmic implementation of
brain function contributes to the development of better AI technologies that take
advantage of how the brain works to improve their performance. This can increase
productivity in several applications, resulting in benefits for the economy and society.
Chapter 6
Conclusion and Future Work
In this project, the human lifespan retrieval curve was modelled with an incremental
attractor network model and the effects of several parameters of the model were
analyzed in a systematic way. The objective was to study the mechanisms that
modulate bump characteristics, i.e. position and magnitude, in this firing-rate attractor
neural network model with BCPNN plasticity [8].
The parameters that showed the most significant effect on the bump characteristics
were the initial plasticity and the time constant of the age-dependent plasticity decay,
which set the position of the bump. The constant component of plasticity, when added,
also had a significant impact on the position of the bump and the shape of the
lifespan retrieval curve. The network has to be large enough to store all the
patterns, and the magnitude of the retrieval curve increases with the
network size. The other parameters mainly influence the magnitude of the retrieval
curve.
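The three plasticity parameters named above can be sketched as a single age-dependent function: an exponential decay from an initial value, plus an optional constant component acting as a lower limit. The snippet is a minimal illustration only; the symbol names (alpha0, tau, floor) and the default values are assumptions and do not reproduce the values used in the simulations.

```python
import math


def plasticity(age, alpha0=1.0, tau=10.0, floor=0.0):
    """Age-dependent plasticity: exponential decay plus a constant component.

    alpha0 : plasticity at birth (hypothetical value)
    tau    : time constant of the decay, in the same unit as age (hypothetical)
    floor  : constant component; 0 gives the purely exponential case
    """
    return alpha0 * math.exp(-age / tau) + floor


ages = list(range(0, 81, 20))
pure_decay = [plasticity(a) for a in ages]
with_floor = [plasticity(a, floor=0.05) for a in ages]
```

With floor set to 0 the plasticity tends to zero at old age, whereas a positive floor keeps late patterns being learned at a non-vanishing rate, which is how the constant component can parameterize recency.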
Despite the model’s simplicity and high level of abstraction, it has demonstrated
considerable potential to simulate the phenomena of the human lifespan retrieval
curve. This firing-rate attractor neural network with BCPNN plasticity [8] provides
insights into several mechanisms underlying the reminiscence bump characteristics, and
even into recency and childhood amnesia, so there does not seem to be a strong need for
more complex or spiking models to replicate these phenomena. However, these could
be considered in order to add more biological realism to the modelling.
As for future work, it would be interesting to study free recall in this model to find
out whether the effects of varying the parameters are preserved and which parameter
values would provide a good starting point for varying them. Some
preliminary results indicate that a softmax factor of 2 would be better than the value
of 1 used here, since it significantly increases the ratio of retrieval.
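To make the role of the softmax factor concrete, the sketch below applies a softmax with a multiplicative gain to a small vector of unit supports. It assumes that the softmax factor acts as a gain on the support values before normalization; the function name and the example values are hypothetical and not taken from the simulations.

```python
import math


def softmax(support, gain=1.0):
    """Softmax over unit supports, sharpened by a multiplicative gain."""
    m = max(support)  # subtract the maximum for numerical stability
    exps = [math.exp(gain * (s - m)) for s in support]
    total = sum(exps)
    return [e / total for e in exps]


support = [1.0, 0.5, 0.0]          # hypothetical support values
act_gain1 = softmax(support, 1.0)  # factor used in this work
act_gain2 = softmax(support, 2.0)  # factor suggested by preliminary results
```

Under this assumption, a larger factor concentrates more of the activity on the unit with the highest support, which makes the competition between units sharper.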
Moreover, the effect of all the parameters could be tested in a model with recency,
and the model could be used to reproduce the modality-dependent bump position.
Furthermore, the parameters could be tuned to have even more realistic values. If we had
a measure of how strongly a pattern is encoded, such as a sensitivity index [45], we
could replicate experimental forgetting curves, i.e. how the strength of encoding of a
pattern changes over time for each age, so that the model has the same forgetting rates
as the experimental data.
Finally, a more advanced parameter sensitivity analysis could be performed.
Bibliography
[1] Bäckman, L., “Memory functioning in dementia”, Advances in psychology,
1992
[2] Jaušovec, N. and Jaušovec, K., “Working memory training: Improving intelligence – Changing
brain activity.” Brain and Cognition, vol. 79, no.2, pp. 96-106, 2012
[3] Wheeler M.E., Ploran E.J., “Episodic Memory” Encyclopedia of Neuroscience, pp 1167-1172,
2019
[4] Abraham W. C., Jones O. D. and Glanzman D. L., “Is plasticity of synapses the mechanism of
long-term memory storage?” Science of Learning, vol. 4, no 1., pp. 1-10, 2019.
[5] Munawar K., Kuhn S. K. and Haque S., “Understanding the reminiscence bump: A systematic
review.” Plos One, vol. 13, no. 12, 2018.
[6] Murre J.M.J., Chessa A. G and Meeter M., “A mathematical model of forgetting and amnesia.”
Frontiers in psychology, vol.4, pp.76, 2013
[7] Wang D., Tan A., Miao C. and Moustafa A., “Modelling Autobiographical Memory Loss across
Life Span”, 2019
[8] Sandberg, A., “Bayesian attractor neural network models of memory.” Dissertation,
Stockholm University, 2003.
[9] Hebb, D.O. “The organization of behavior.” New York: Wiley, 1949
[10] Lansner, A. and Holst, A., “A Higher Order Bayesian Neural Network with Spiking Units.”
International Journal of Neural Systems, pp. 115-28, 1996.
[11] Orre, R., Lansner, A., Bate, A. and Lindquist, M., “Bayesian neural networks with confidence
estimations applied to data mining.” Computational Statistics & Data Analysis vol. 34 pp. 473-
493, 2000.
[12] Janssen, S. M. J., Rubin, D. C. and St. Jacques P. L., “The temporal distribution of autobiographical
memory: changes in reliving and vividness over the life span do not explain the reminiscence
bump”, Memory and Cognition, vol. 39, pp. 1-11, 2011
[13] Meeter, M. and Murre J. M. J., “Tracelink: A model of amnesia and consolidation” Cognitive
Neuropsychology, vol. 22, no. 5, pp. 559-587, 2005
[14] Rubin D., Rahhal, T. and Poon, L., “Things learned in early adulthood are remembered best”
Memory & Cognition, vol. 26, no. 1, pp. 3-19, 1998.
[15] Janssen, S., Gralak, A., Murre, J., ”A model for removing the increased recall of recent events
from the temporal distribution of autobiographical memory.” Behavior Research Methods,
vol. 43, no. 4, pp. 916-930
[16] Schrauf R. W. and Rubin D. C. “Effects of voluntary immigration on the distribution of
autobiographical memory over the lifespan.” Applied Cognitive Psychology, vol. 15, no. 7,
pp. S75-S88, 2001.
[17] Rubin D. C., “The Basic-Systems Model of Episodic Memory.” Perspectives on Psychological
Science, vol. 1, no. 4, pp. 277-311, 2006
[18] Janssen S.M.J., Kristo, G., Rouw R. and Murre J.M.J.,”The relation between verbal and
visuospatial memory and autobiographical memory.” Consciousness and Cognition, vol. 31,
pp. 12-23, 2015
[19] Conway, M. A., Wang, Q., Hanyu, K. and Haque, S., “A Cross-Cultural Investigation of Autobiographical
Memory: On the Universality and Cultural Variation of the Reminiscence Bump.” Journal of
Cross-Cultural Psychology, vol. 36, no. 6, pp. 739-749, 2005.
[20] Holmes A. and Conway M.A., “Generation identity and the reminiscence bump: Memory for
public and private events.” Journal of Adult Development, vol 6, no. 1, pp. 21-34 1999
[21] Berntsen D. and Rubin D.C., “Cultural life scripts structure recall from autobiographical
memory.” Memory & Cognition, vol. 32, no. 3, pp. 427-442, 2004
[22] Berntsen, D. and Rubin, D.C., "Emotionally charged autobiographical memories across the life
span: The recall of happy, sad, traumatic, and involuntary memories". Psychology and Aging,
vol. 17, no. 4, pp. 636–652, 2002
[23] Janssen, S., Chessa, A. and Murre, J., “The reminiscence bump in autobiographical memory:
Effects of age, gender, education, and culture.” Memory, vol. 13, no. 6, pp. 658-668, 2005.
[24] Karrer, T. M., Josef, A. K., Mata, R., Morris, E. D. and Samanez-Larkin, G. R., “Reduced
dopamine receptors and transporters but not synthesis capacity in normal aging adults: a
meta-analysis.” Neurobiology of Aging, vol. 57, pp. 36-46, 2017.
[25] Peters, A., Sethares, C. and Luebke, J. I., “Synapses are lost during aging in the primate
prefrontal cortex.” Neuroscience, vol. 152, no. 4, pp. 970-981, 2008.
[26] Galton, F. “Psychometric experiments.” Brain, vol. 2, pp. 149-162, 1879
[27] Crovitz, H. F., and Schiffman, H. “Frequency of episodic memories as a function of their age.”
Bulletin of the Psychonomic Society, vol. 4, 1974
[28] Rubin, D. C., “One bump, two bumps, three bumps, four? Using retrieval cues to divide one
autobiographical memory reminiscence bump into many.”
Journal of Applied Research in Memory and Cognition, vol. 4, no. 1, pp. 87-89, 2015.
[29] Koppel, J. and Rubin, D.C., “Recent Advances in Understanding the Reminiscence Bump: The
Importance of Cues in Guiding Recall From Autobiographical Memory.” Current Directions in
Psychological Science, vol. 25, no. 2, pp. 135-140, 2016
[30] Larsson, M. and Willander, J. “Autobiographical odor memory.” Annals of the New York
Academy of Sciences, vol. 1170, pp. 318-323, 2009.
[31] McCulloch, W. S., and Pitts, W., “A Logical Calculus of the Ideas Immanent in Nervous Activity.”
Bulletin of Mathematical Biophysics, vol 5, pp. 115-133, 1943
[32] Rosenblatt, F., “The Perceptron - a perceiving and recognizing automaton.” Report 85 – 460 -
1, Cornell Aeronautical Laboratory, 1957
[33] Brunel, N. and van Rossum M.C.W., “Lapicque’s 1907 paper: from frogs to integrate-and-fire”
Biol Cybern, vol. 97, pp. 337-339, 2007
[34] Hodgkin, A. L. and Huxley, A. F., “A quantitative description of membrane current and its
application to conduction and excitation in nerve.” Journal of Physiology vol. 117, no. 4, pp.
500-544, 1952
[35] FitzHugh R., “Impulses and Physiological States in Theoretical Models of Nerve Membrane."
Biophysical Journal, vol. 1, no. 6, pp 445-466, 1961
[36] Hopfield, J. J., “Neural Networks and Physical Systems with Emergent Collective
Computational Abilities.” Proceedings of the National Academy of Sciences of the United
States of America, vol. 79, no. 8, pp. 2554-2558, 1982
[37] Lansner, A., “Associative memory models: from the cell-assembly theory to biophysically
detailed cortex simulations.” Trends in Neurosciences, vol. 32 no. 3, pp.178-186
[38] Ruppin E. and Reggia J., “A Neural Model of Memory Impairment in Diffuse cerebral Atrophy”,
British Journal of Psychiatry, vol. 166, pp. 19-28, 1995
[39] Sandberg, A., Lansner, A., Petersson K.M. and Ekeberg, O., “A Bayesian attractor network with
incremental learning.” Network: Computation in Neural Systems, vol. 13, no. 2, pp. 179-194,
2002
[40] Lansner, A., Sandberg, A., Petersson, K. M., & Ingvar, M. “On forgetful attractor network
memories.” Artificial neural networks in medicine and biology: Proceedings of the ANNIMAB-
1 Conference (eds. Malmgren, H., Borga, M. & Niklasson, L.) 54–62 Springer, 2000.
[41] Gluck M. A, and Myers C. E., “Hippocampal mediation of stimulus representation: A
computational theory”, Hippocampus, vol. 3, no.4, pp. 491-516, 1993
[42] Janssen S. M. J., Chessa A. G. and Murre J. M. J., “Modelling the reminiscence bump in
autobiographical memory with the Memory Chain Model”, Constructive Memory, NBU Series
in Cognitive Science, pp. 138-147, 2003
[43] Hagena, H., Manahan-Vaughan, D., “Dopamine D1/D5, But not D2/D3, Receptor Dependency
of Synaptic Plasticity at Hippocampal Mossy Fiber Synapses that Is Enabled by Patterned
Afferent Stimulation, or Spatial Learning. ” Frontiers in synaptic neuroscience, vol.8, pp.31,
2016.
[44] Abdulrahman, H., Fletcher, P. C., Bullmore, E., Morcom, A. M., “Dopamine and memory
dedifferentiation in aging.” Neuroimage, vol. 153, pp. 211-220, 2017.
[45] Iatropoulos, G., “Modeling the Development of Synaptic Memory: Implications for
Reminiscence Bumps and Forget Rates.” Ongoing.
www.kth.se
TRITA -EECS-EX-2020:445