Information Sciences 159 (2004) 141–154
www.elsevier.com/locate/ins
Neural networks handling sequential patterns
Taiga Yamasaki *, Yoshinori Kataoka 1, Katsuro Kameyama 2, Kaoru Nakano 3
Division of Biophysical Engineering, Department of Systems and Human Science, Graduate School
of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan
Received 24 December 2000
Abstract
In order to model the thinking process in the human brain, it is necessary to
construct neural network models handling time-varying inputs. Such networks are
required to be able to retain information about their past behaviors. This
motivates us to introduce the concept "stimulus-accumulation-effect." In our
models, each artificial neuron accumulates past stimulus effects until it is
excited by the influence of the current input as well as the accumulation. This
effect makes it possible for the neural networks to scan (recall) all embedded
memories sequentially, and to associate temporal sequences (such as melodies)
with corresponding static patterns (their images and names).
© 2003 Elsevier Inc. All rights reserved.
Keywords: Neural network; Sequential pattern; Stimulus-accumulation-effect
* Corresponding author. Address: Department of Communication Engineering, Faculty of
Computer Science and System Engineering, Okayama Prefectural University, 111 Kuboki, Soja,
Okayama 719-1197, Japan. Tel.: +81-866-94-2094; fax: +81-866-94-2199.
E-mail address: [email protected] (T. Yamasaki).
1 Current address: Sony Digital Network Applications Inc., 6-7-35 Kitashinagawa,
Shinagawa-ku, Tokyo 141-0001, Japan.
2 Current address: Division of Neurophysiology, Osaka University Graduate School of
Medicine, Suita 565-0871, Japan.
3 Current address: Department of Mechatronics, Tokyo University of Technology, 1404-1
Katakura-machi, Hachioji-shi, Tokyo 192-0982, Japan; Research Organization for Information
Science & Technology, 2-2-54 Nakameguro, Meguro-ku, Tokyo 153-0061, Japan.
0020-0255/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2003.02.001
1. Introduction
The brain interacts with the outer world through effectors (hands, legs, etc.)
and receptors (eyes, ears, etc.) and handles time-varying information. It has
been considered that the brain utilizes this information to construct a copy of
the outer world, a "world image," in itself [4]. To elucidate higher level brain
functions such as motor control, phonic recognition and consciousness as well
as world image, it is important to study neural network models which can
handle sequential information (patterns).
Most research in the field of artificial neural networks has dealt with
static inputs, whereas the neural receptors receive a multitude of inputs which
vary dynamically with time. In such studies, the McCulloch-Pitts model neuron
and the Hebbian rule, with their extensions, have been a conceptual basis. The
Hebbian rule has some capability of temporal information processing, since the
connections of a network change due to the network's activity. However, the
connections are usually supposed to be constant for some period, as often seen
in the recalling process of associative memories [10,11]. Outputs of the
McCulloch-Pitts model neuron are determined only by current inputs, irrespective
of the input history. Hence, a network consisting of McCulloch-Pitts model
neurons and the Hebbian rule has difficulty in handling temporally varying
information.
In order to construct neural network models handling temporal sequences of
patterns, it is, we think, necessary to retain past information in some form in
the network. There have actually been many works [3,5–8,12,13] treating
temporal sequences. In these works, feedback connections in networks, units
holding past input with decay, or special delay elements are used. In most of
these models, the function is restricted to memorizing or recognizing temporal
sequences. But the brain can make a whole image of a sequential input, such as
an image of a musical composition, and also associate that image with another
type of memory, such as a scene or a place where the music was played. Thus, we
consider that in the brain a temporal sequence is memorized both in its
original form and in its associated form. We wish to construct neural networks
realizing such brain-like memories. This motivates the introduction of the
concept "stimulus-accumulation-effect." Based on this idea, we propose an
associative memory network [10,11] with units holding past input information.
This model can memorize sequential patterns (namely, dynamic patterns), static
patterns and their associations.
2. Stimulus-accumulation-effect
In general, the model neurons which compose a network are excited by many
coincident inputs. In this process, an output (an excitation of a model
neuron) does not always reflect all the information that the inputs carry.
Although this has a noise-canceling effect, the system can handle only
coincident inputs. How can the brain handle both coincidental and temporally
changing signals?
"Stimulus-accumulation-effect" is Nakano's idea: when the value of the total
stimuli to a neuron exceeds its threshold, the neuron is excited and loses its
energy of stimuli. But when it is not excited, the input (stimulus) energy is
retained. This idea allows both variations: that the energy decays with time,
or that not all the energy is lost when the neuron is excited. (A variation of
this idea is [9].) In other words, the state of each neural unit at a given
instant depends on past inputs as well as its current inputs.
3. The model
We assume that the properties of the stimulus-accumulation unit (the
artificial model neuron) are as follows (see Fig. 1):
1. When the unit is not excited by some inputs, some amount of the total
stimuli is accumulated.
2. The accumulated stimuli, as well as the current input stimuli, affect the
excitation of the unit.
3. This accumulation is cleared (discarded) when the unit gets excited.
Here we discuss a simple model, a synchronous and discrete-time associative
memory [10,11] with the stimulus-accumulation-effect. Each unit outputs 0 or 1
according to the equation (cf. [1,2])
x_i(t+1) = f( Σ_{r=0}^{t} Σ_{j=1}^{n} g(r) w_ij x_j(t−r) − θ_i(t) ),   (1)
Fig. 1. Stimulus-accumulation-effect.
f(y) = { 1  (y > 0),
         0  (y ≤ 0),   (2)

g(r) = { 1  (r = 0),
         α  (1 ≤ r ≤ τ_i),
         0  (otherwise),   (3)
where x_i(t) (i = 1, ..., n) is the output of the ith unit, w_ij is the
connection weight from the jth to the ith unit, θ_i(t) is the threshold of the
ith unit, τ_i is the time elapsed from the previous excitation of the ith unit
to the present time, and α (0 ≤ α < 1) is the accumulation ratio. Usually
x_i(t) is set to be 0 for t < 0.
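The update rule of Eqs. (1)–(3) can be sketched in code. A minimal sketch, assuming one records each unit's accumulated drive since its last excitation instead of replaying the whole history (the two are equivalent because g(r) = α uniformly for 1 ≤ r ≤ τ_i); the function and variable names are ours:

```python
import numpy as np

def step(W, x, acc, theta, alpha):
    """One synchronous update of stimulus-accumulation units, per Eqs. (1)-(3).

    acc[i] holds the sum of unit i's raw input drives since its last
    excitation; weighting it by alpha reproduces the g(r) = alpha terms
    for 1 <= r <= tau_i, while the current drive enters with g(0) = 1.
    """
    drive = W @ x                            # current input, sum_j w_ij x_j(t)
    total = drive + alpha * acc              # plus alpha-weighted accumulation
    x_new = (total - theta > 0).astype(int)  # f(y): 1 if y > 0, else 0
    # excited units clear their accumulation; unexcited units keep adding
    acc_new = np.where(x_new == 1, 0.0, acc + drive)
    return x_new, acc_new
```

A unit driven repeatedly below threshold accumulates stimuli until the combined drive finally excites it, after which the accumulation is discarded, as in property 3 above.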
The network is divided into three modules, each of which we call a 'layer.'
Using 0 < l < m ≤ n, the units in each layer are indexed as

X_1(t) = (x_1(t), ..., x_l(t)),
X_2(t) = (x_{l+1}(t), ..., x_m(t)),
X_3(t) = (x_{m+1}(t), ..., x_n(t)),
X(t) = (X_1(t), X_2(t), X_3(t))^T.   (4)
We call the first layer 'Layer T,' the second layer 'Layer S' and the third layer 'Layer N.'
We assume that the "total activity" (i.e. the number of excited units in
each layer) is regulated to a certain value, say K_a for Layer a, by changing
the threshold θ_i within the range bounded by θ_i ≥ θ_min (θ_min: a positive
constant). The threshold θ_i is taken to be identical within each layer, and
its lower limit θ_min is taken to be identical in every layer. The regulation
of the total activity means that the units that accepted stronger stimuli in
each layer tend to be excited. If several units receive input stimuli of the
same strength, the units indexed by smaller integers (i) are chosen.
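The total-activity control described above amounts to a K-winners selection with a smaller-index tie-break. A minimal sketch (the function name is ours), in which the shared threshold is effectively raised until only K units fire, but never below θ_min:

```python
import numpy as np

def regulate_activity(total_input, K, theta_min):
    """Select the K most strongly driven units in a layer.

    Ties are broken in favor of smaller indices, as in the text;
    units whose drive does not exceed theta_min never fire.
    """
    n = len(total_input)
    # sorting by (-input, index) implements the smaller-index tie-break
    order = sorted(range(n), key=lambda i: (-total_input[i], i))
    x = np.zeros(n, dtype=int)
    for i in order[:K]:
        if total_input[i] > theta_min:
            x[i] = 1
    return x
```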
The weight matrix W = (w_ij) is also divided into submatrices as

W = [ W_TT  W_TS  W_TN
      W_ST  W_SS  W_SN
      W_NT  W_NS  W_NN ],   (5)
where W_ab represents the weight matrix from Layer b to Layer a; for example,

W_ST = [ w_{l+1,1}  ...  w_{l+1,l}
         ...        ...  ...
         w_{m,1}    ...  w_{m,l} ].   (6)
In this paper, we suppose that the weight matrix is composed of random
connections and correlative connections made by the modified Hebbian rule
(see Fig. 2).
Fig. 2. The network (Layer T, Layer S and Layer N).
The connections between Layer T and Layer S are generated randomly. The
matrix W_ST = (w_ij) is composed of fixed (not plastic) random connections made
from uniformly distributed random variables v_ij ∈ [0, 1] (j = 1, ..., l;
i = l+1, ..., m) as

w_ij = c v_ij / Σ_{k=1}^{l} v_ik,   (7)

where c is a positive constant, so that the sum of the strengths of all
connections converging on any unit in Layer S from the units in Layer T equals
the constant c, i.e.,

∀ i ∈ {l+1, l+2, ..., m}:  Σ_{j=1}^{l} w_ij = c.   (8)
The matrix W_TS is supposed to be the transpose of W_ST.
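The normalized random connections of Eqs. (7)–(8) can be generated directly. A short sketch (the function name and the seeded generator are ours):

```python
import numpy as np

def random_connections(l, m, c, seed=0):
    """Build W_ST per Eqs. (7)-(8): uniform random weights, with each row
    (the connections converging on one Layer-S unit) normalized to sum c."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.0, 1.0, size=(m - l, l))   # v_ij in [0, 1]
    W_ST = c * v / v.sum(axis=1, keepdims=True)  # each row sums to c
    return W_ST
```

W_TS is then obtained as `W_ST.T`, matching the transposition stated above.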
The connections other than W_ST and W_TS are organized through the learning
process, in which the learning patterns are presented to the network in turn,
for example, 'X_1(t) = ξ^1 → X_1(t+1) = ξ^2 → ...,' where the learning pattern
vectors denoted by ξ, η and ζ are composed of 0s and 1s and have the same
sizes as X_1, X_2 and X_3, respectively. The connections are reinforced
according to the Hebbian rule as

w_ij(t+1) = w_ij(t) + (λ_ab / N) x_i(t) x_j(t+1)
            + ((1 − λ_ab) / N) x_i(t+1) x_j(t+1),   (9)

where λ_ab (0 ≤ λ_ab ≤ 1) determines the connection property from Layer b to
Layer a. The organized connections are auto-correlative when λ = 0 and
cross-correlative when λ = 1. w_ij(0) is set to be 0.
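One presentation step of the rule in Eq. (9) can be sketched as follows (the function name is ours; the index convention follows Eq. (9) literally, so w_ij gains (λ/N)·x_i(t)·x_j(t+1) plus ((1−λ)/N)·x_i(t+1)·x_j(t+1)):

```python
import numpy as np

def hebbian_update(W, x_t, x_t1, lam, N):
    """Apply Eq. (9) once for one pattern transition x(t) -> x(t+1).

    lam = 1 gives purely cross-correlative connections (used for sequence
    transitions); lam = 0 gives purely auto-correlative ones (used for
    static patterns).
    """
    return (W
            + (lam / N) * np.outer(x_t, x_t1)            # cross term
            + ((1.0 - lam) / N) * np.outer(x_t1, x_t1))  # auto term
```

Presenting a whole sequence 'ξ^1 → ξ^2 → ...' then amounts to calling this once per consecutive pair.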
4. Characteristics
Due to the formulation above, when several sequential patterns sharing a
common pattern are memorized through the learning process, one excitation
pattern can be associated with several patterns (unless λ = 0). Such an
excitation pattern acts as a bifurcation point, where several retrieval routes
belonging to different sequential patterns compete with each other for the
succeeding excitation in the recalling process. In the simplest associative
memory, with α = 0 in Eq. (3), however, the excitation pattern following a
given pattern is determined definitely, i.e., the transition from the
bifurcation point always proceeds to the most strongly connected route. Hence,
the simplest model cannot handle bifurcating sequences. In the
stimulus-accumulation network (α ≠ 0), even if a certain route belonging to a
sequential pattern was not chosen at the previous recollection, the probability
of recalling that sequence at the coming recollection is higher than at the
previous trial, because of the stimulus accumulation in the unexcited units.
Thus, it is expected that the stimulus-accumulation network can handle
bifurcating sequences and, moreover, can scan through the entire memorized
structure.
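The competition at a bifurcation point can be illustrated with a deliberately reduced toy (entirely ours, not the full network): two routes receive fixed drives, only the larger total fires at each step, and firing clears that route's accumulation.

```python
import numpy as np

def compete(drives, alpha, steps):
    """Toy sketch of route competition under stimulus accumulation.

    Each step, the route with the larger total (drive plus alpha times
    accumulated unspent drive) fires and spends its energy. With alpha = 0
    the stronger route always wins; with alpha > 0 the weaker route's
    stimuli pile up until it takes over, so recall alternates and the
    whole structure gets scanned.
    """
    acc = np.zeros_like(drives)
    winners = []
    for _ in range(steps):
        total = drives + alpha * acc
        w = int(np.argmax(total))   # the more strongly driven route fires
        acc = acc + drives          # every route accumulates this step's drive
        acc[w] = 0.0                # the winner discards its accumulation
        winners.append(w)
    return winners

print(compete(np.array([1.0, 0.9]), 0.0, 6))  # stronger route wins every time
print(compete(np.array([1.0, 0.9]), 0.5, 6))  # weaker route breaks through
```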
4.1. Symbolizing sequences
The brain memorizes two types of objects. One is what transits temporally,
such as music or a thinking process. The other is static, such as the names of
objects. In the brain, these two memory types link to each other to generate
various functions. Linking many kinds of memories is necessary in order to
realize higher brain functions. In this section, we introduce a conceptual
network in which temporal sequences are used as input patterns and a static
pattern symbolizing an input sequence is produced inside the network. Here we
ignore the third layer mentioned above and discuss a network of two layers:
Layer T and Layer S. The structure of this network is shown in Fig. 3.
Let us exemplify the typical process of producing a pattern, referred to as the
"symbol pattern," on Layer S from the input of a temporal sequence, say
Fig. 3. Symbolizing sequences. Network structure and symbolizing process are illustrated.
'ξ^1 → ξ^2 → ξ^3 → ξ^4,' on Layer T. Layer T receives the input patterns from
ξ^1 to ξ^4 in this order. For the input ξ^1, the pattern ξ^1 is excited in
Layer T and transmitted to Layer S through the random connections. We assume
spatially homogeneous inhibitory stimuli all over Layer S (or a small
connection strength between the two layers) so that no units can be excited in
Layer S. Hence, the transmitted stimuli are accumulated in Layer S. The same
process repeats for each input pattern from ξ^2 to ξ^4. The accumulations in
Layer S are superimposed and reflect all the input patterns. Then, we input
spatially homogeneous excitatory stimuli to all the units in Layer S. These
trigger a certain excitation pattern on Layer S with the help of the
competitive effect of the total activity control. This pattern provides the
symbol pattern 'η^1,' which can be called a symbol of the whole sequence (see
Fig. 3).
Here, we provided the trigger stimuli artificially on Layer S. But a similar
effect could naturally be realized by, for example, the periodic activity of
another network, or the occurrence of an event important to the creature
inducing excitations in Layer S.
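The accumulate-then-trigger process just described can be sketched end to end. A minimal sketch under the stated assumptions (Layer S fully inhibited while the sequence plays, then a homogeneous trigger releasing the K_S most accumulated units); the function name is ours:

```python
import numpy as np

def symbolize(W_ST, sequence, K_S, alpha, theta_min):
    """Produce a symbol pattern on Layer S from a sequence on Layer T.

    While the sequence plays, transmitted stimuli only accumulate in
    Layer S; at the trigger, the K_S most strongly accumulated units fire
    (smaller-index tie-break), giving the symbol of the whole sequence.
    """
    acc = np.zeros(W_ST.shape[0])
    for xi in sequence:              # input patterns xi^1, xi^2, ... on Layer T
        acc += W_ST @ xi             # stimuli accumulate, no excitation yet
    drive = alpha * acc              # accumulated stimuli felt at trigger time
    order = sorted(range(len(drive)), key=lambda i: (-drive[i], i))
    symbol = np.zeros(len(drive), dtype=int)
    for i in order[:K_S]:
        if drive[i] > theta_min:     # competitive total-activity control
            symbol[i] = 1
    return symbol
```

The resulting pattern depends on the whole superimposed accumulation, so it reflects every pattern of the sequence, not only the last one.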
4.2. Associating sequence with symbol
In the previous section, we discussed the network that produces a static
pattern symbolizing a temporal sequence. Next, we suppose that the symbolizing
process is also the learning process, in which sequential patterns and symbol
patterns are memorized in each layer according to Eq. (9). We set the
parameter λ_ab as

[ λ_TT  λ_TS  λ_TN     [ 0.9  RD   –
  λ_ST  λ_SS  λ_SN  =    RD   0.1  –
  λ_NT  λ_NS  λ_NN ]     –    –    – ],   (10)
where 'RD' means random connections. Due to this setting, the transitions of
the sequential patterns are memorized in Layer T and the symbol patterns are
memorized as static patterns in Layer S, through the symbolizing process.
The network was composed of 400 units, divided into Layer S (200 units) and
Layer T (200 units). The patterns memorized in Layers T and S were three
cyclic sequences (e.g. one sequence is ξ^A_1 → ξ^A_2 → ξ^A_3 → ξ^A_4 →
ξ^A_5 → ξ^A_1 → ...) and three symbols (e.g. one symbol is η^A), respectively
(see Fig. 4). Each pattern composing a sequence was a randomly generated
pattern in which 10 units are excited. The total activity in each layer (i.e.,
K_T and K_S) was set to 10. The lower limit of the threshold, θ_min, was set
to 0.01. The value of α was set to 0.01 in every unit.
The recalling dynamics of this network are shown in Figs. 5 and 6. Several
values of the normalization constant for the random connections were used
(c = 0.2 in Fig. 6(a); c = 0.4 in Fig. 5; c = 0.6 in Fig. 6(b)). When the
normalization constant was large, the association dynamics were unstable
(Fig. 6(b)). When the constant was small, a symbol pattern not associated with
the sequence was recalled (Fig. 6(a)). The dynamics when the constant was
appropriate (c = 0.4) are shown in Fig. 5. The sequential pattern
(ξ^A_1 → ξ^A_2 → ...) was recalled from the input of the corresponding symbol
pattern
Fig. 4. Memorized patterns. Three sequences and their symbols were memorized.
(η^A) in Fig. 5(a). The symbol pattern (η^A) was recalled from the input of
the first pattern (ξ^A_1) of the corresponding sequence in Fig. 5(b). That is,
in both Figs. 5(a) and (b), when the patterns belonging to a cyclic sequence
were recalled in Layer T, the symbol pattern corresponding to the sequence was
concurrently recalled in Layer S. Moreover, the state changed among the A-set
('ξ^A_1 → ξ^A_2 → ...' and 'η^A'), the B-set ('ξ^B_1 → ξ^B_2 → ...' and
'η^B') and the C-set ('ξ^C_1 → ξ^C_2 → ...' and 'η^C') over time. These
dynamics, in which the state wanders among the 'sets' of memories, were due to
the accumulation of stimuli in the units.
In this network, the temporal sequence in Layer T recalls the corresponding
symbol in Layer S. Conversely, the symbol pattern recalls the corresponding
sequence. This means that the temporal sequence and the corresponding static
pattern are associated with each other. Moreover, such a state is not static,
but wanders among the memorized patterns over time.
4.3. Memorizing melodies
Considering the memorization of music, we realize that we can remember the
actual sounds of a piece of music, as well as a whole image of the music, its
name, the composer and the situation in which we listened to it. Let us
simplify this phenomenon and construct a neural network that can memorize
sequential patterns such as 'ξ^A_1 → ξ^A_2 → ξ^A_3 → ξ^A_4' (so to speak, a
simplified representation of the melody), symbol patterns such as 'η^A' (so to
speak, the image of the melody), and other static patterns such as 'ζ^A' (so
to speak, the name of the melody). To do this, we suppose three layers in this
section. Layer T and Layer S are the same as in the previous sections. Layer N
is added to the network to memorize the name of the melody. The structure of
the network is shown schematically in Fig. 7.
The number of units was set to 144 (96 units for Layer T; 24 for Layer S;
24 for Layer N). Fig. 8(a) shows the memorized patterns, in which filled
circles and open circles represent excited units and unexcited units,
respectively.
Fig. 5. Recalling dynamics (1): (a) dynamics from the initial pattern ξ^A_1;
(b) dynamics from the initial pattern η^A. c = 0.4 in (a) and (b).
Fig. 6. Recalling dynamics (2): dynamics from the initial pattern ξ^A_1, with
c = 0.2 in (a) and c = 0.6 in (b).
Fig. 7. Memorizing melodies.
Fig. 8. Simulation results of memorizing melodies. Filled and open circles represent excited and
unexcited units, respectively. (a) shows memorized patterns, where three sets of associations
(among melody, image and name) are memorized. (b), (c) and (d) show recalled patterns. The
melody and the image were recalled from the name in (b). The image and the name were recalled
from the first note in (c). The name and the melody were recalled from the image in (d).
The total activities of the layers were regulated to K_T = 8, K_S = 6 and
K_N = 8, respectively. The lower limit of the threshold, θ_min, was set to
0.05. The accumulation ratio α was set to 0.1 in every unit. The value of
λ_ab was set as

[ λ_TT  λ_TS  λ_TN     [ 1.0  RD   0.4
  λ_ST  λ_SS  λ_SN  =    RD   0.0  0.4
  λ_NT  λ_NS  λ_NN ]     0.4  0.4  0.0 ].   (11)
In the learning process, the melody was first presented sequentially in
Layer T. After that, the triggering stimuli, the first note of the melody and
the name of the melody were input to Layer S, Layer T and Layer N,
respectively. Layer S was inhibited during this process except for the
triggering phase. Through this process, the connections shown in Fig. 7 were
organized according to Eq. (9).
The recalling dynamics were as shown in Fig. 8(b)–(d). When we input the
name of the melody into the network, after several time steps the sequence of
the melody and the image (symbol) of the melody were recalled (Fig. 8(b)).
When we input the first note of the melody, the image was recalled, and after
some time the name was recalled (Fig. 8(c)). When we input the image, after
some time the name and the melody were recalled (Fig. 8(d)). Thus, in the
constructed neural network, sequential patterns and static patterns are
memorized and associated with one another.
5. Conclusion
The constructed neural networks could produce a symbol pattern from a
time-varying input of a sequential pattern. They could memorize such a
sequential pattern and a symbol pattern, with the patterns mutually
associated. They had the property of recalling almost all the memories
throughout the memorized structure, even when the memorized sequences had
bifurcation points. Moreover, they showed dynamics wandering, over time,
among several memories that were not explicitly associated with one another.
The purpose of our research is to construct complicated memories like the
brain's. We introduced the stimulus-accumulation-effect and its applications,
and indicated that the constructed networks have several brain-like
properties. Such a model is only a metaphor for some aspects of the brain;
however, we think that the neuronal modulation of excitability by some
neurotransmitters may act as the stimulus-accumulation-effect.
Although the constructed networks seem at a glance to be complicated, with
specific structures, the whole network is an associative memory consisting of
identical units. The whole network consists of several layers, and each layer
is also an associative memory. The differences between layers are whether
their connections are symmetric or asymmetric. The connections between the
layers make a "coupled associative memory." So, our networks are not specific,
and we think our trial leads, in some sense, to an understanding of the brain.
Acknowledgements
We express our hearty thanks to Dr. Taishin Nomura (Associate Professor of
Osaka University) for helpful advice. This work was partially supported by a
grant from JSPS for young researchers, no. 121862.
References
[1] K. Aihara, T. Takabe, M. Toyoda, Chaotic neural networks, Phys. Lett. A 144 (1990) 333–
340.
[2] E.R. Caianiello, Outline of a theory of thought process and thinking machines, J. Theor. Biol.
1 (1961) 204–235.
[3] J.L. Elman, Finding structure in time, Cognitive Sci. 14 (1990) 179–211.
[4] S. Ikeda, K. Nakano, Y. Sakaguchi, A robot organizing purposive behavior by itself, in:
Proceedings of International Joint Conference on Neural Networks (IJCNN’92), vol. I,
Baltimore, 1992, pp. 570–575.
[5] M. Jordan, Serial order: a parallel distributed processing approach, Technical Report ICS
Report 8604, Institute for Cognitive Science, University of California, 1986.
[6] S.C. Kak, Self-indexing of neural memories, Phys. Lett. A 143 (1990) 293–296.
[7] D. Kleinfeld, Sequential state generation by model neural networks, Proc. Natl. Acad. Sci. 83
(1986) 9469–9473.
[8] D. Kleinfeld, H. Sompolinsky, Associative neural network model for the generation of
temporal patterns, Biophys. J. 54 (1988) 1039–1051.
[9] M. Morita, O. Nagai, I. Nakaguchi, T. Omori, K. Nakano, A neural network model for
recognition and memory of temporal sequence, in: Proceedings of 26th SICE Annual
Conference, 1987, pp. 15–17 (in Japanese).
[10] K. Nakano, Association and its application-study on associative memory, in: Proceedings of
the Conference on Information Theory Institute of Electronics and Communication Engineers
of Japan, IT 69-27, 1969 (in Japanese).
[11] K. Nakano, Associatron: a model of associative memory, IEEE Trans. Syst. Man Cybern. SMC-2 (3) (1972)
331–338.
[12] H. Sompolinsky, I. Kanter, Temporal association in asymmetric neural networks, Phys. Rev.
Lett. 57 (1986) 2861–2864.
[13] A. Waibel, Modular construction of time-delay networks for speech recognition, Neural
Comput. 1 (1989) 328–339.