Information Sciences 159 (2004) 141–154
www.elsevier.com/locate/ins
Neural networks handling sequential patterns
Taiga Yamasaki *, Yoshinori Kataoka 1, Katsuro Kameyama 2, Kaoru Nakano 3
Division of Biophysical Engineering, Department of Systems and Human Science, Graduate School
of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan
Received 24 December 2000
Abstract
In order to model the thinking process in the human brain, it is necessary to
construct neural network models handling time-varying inputs. Such networks are
required to be able to retain information about their past behaviors. This
motivates us to introduce the concept "stimulus-accumulation-effect." In our
models, each artificial neuron accumulates past stimulus effects until it is
excited by the influence of the current input as well as the accumulation. This
effect makes it possible for the neural networks to scan (recall) all embedded
memories sequentially, and to associate temporal sequences (such as melodies)
with corresponding static patterns (their images and names).
© 2003 Elsevier Inc. All rights reserved.
Keywords: Neural network; Sequential pattern; Stimulus-accumulation-effect
* Corresponding author. Address: Department of Communication Engineering, Faculty of
Computer Science and System Engineering, Okayama Prefectural University, 111 Kuboki, Soja,
Okayama 719-1197, Japan. Tel.: +81-866-94-2094; fax: +81-866-94-2199.
E-mail address: [email protected] (T. Yamasaki).
1 Current address: Sony Digital Network Applications Inc., 6-7-35 Kitashinagawa,
Shinagawa-ku, Tokyo 141-0001, Japan.
2 Current address: Division of Neurophysiology, Osaka University Graduate School of
Medicine, Suita 565-0871, Japan.
3 Current address: Department of Mechatronics, Tokyo University of Technology, 1404-1
Katakura-machi, Hachioji-shi, Tokyo 192-0982, Japan; Research Organization for Information
Science & Technology, 2-2-54 Nakameguro, Meguro-ku, Tokyo 153-0061, Japan.
0020-0255/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2003.02.001
1. Introduction
The brain interacts with the outer world through effectors (hands, legs, etc.)
and receptors (eyes, ears, etc.) and handles time-varying information. It has
been considered that the brain utilizes this information to construct a copy of
the outer world, a "world image," in itself [4]. To elucidate higher level brain
functions such as motor control, phonic recognition and consciousness as well
as world image, it is important to study neural network models which can
handle sequential information (patterns).
Most research in the field of artificial neural networks has dealt with
static inputs, whereas the neural receptors receive a multitude of inputs which
vary dynamically with time. In such studies, the McCulloch-Pitts model neuron
and the Hebbian rule, with their extensions, have been a conceptual basis. The
Hebbian rule has some capability of temporal information processing, since the
connections of a network change due to the network's activity. However, the
connections are usually supposed to be constant for some period, as often seen
in the recalling process of associative memories [10,11]. Outputs of the
McCulloch-Pitts model neuron are determined only by current inputs, irrespective
of the input history. Hence, a network consisting of McCulloch-Pitts model
neurons and the Hebbian rule has difficulty in handling temporally varying
information.
In order to construct neural network models handling temporal sequences of
patterns, it is, we think, necessary to retain past information in some form in
the network. There have actually been many works [3,5–8,12,13] treating
temporal sequences. In these works, feedback connections in networks, units
holding past input with decay, or special delay elements are used. In most of
these models, the function is restricted to memorizing or recognizing temporal
sequences. But the brain can make a whole image of a sequential input, such as
an image of a musical composition, and also associate that image with another
type of memory, such as a scene or a place where the music was played. Thus, we
consider that in the brain a temporal sequence is memorized both in its
original form and in its associated form. We wish to construct neural networks
realizing such brain-like memories. This motivates the introduction of the
concept "stimulus-accumulation-effect." Based on this idea, we propose an
associative memory network [10,11] with units holding past input information.
This model can memorize sequential patterns (namely, dynamic patterns), static
patterns and their associations.
2. Stimulus-accumulation-effect
In general, the model neurons which compose a network are excited by many
coincident inputs. In this process, an output (an excitation of a model
neuron) does not always reflect all the information that the inputs carry.
Although this has a noise-canceling effect, the system can handle only
coincident inputs. How can the brain handle both coincidental and temporally
changing signals?
"Stimulus-accumulation-effect" is Nakano's idea: when the value of the total
stimuli to a neuron exceeds its threshold, the neuron is excited and loses its
energy of stimuli. But when it is not excited, the input (stimulus) energy is
retained. This idea allows both variations: that the energy decays with time,
or that not all the energy is lost when the neuron is excited. (A variation of
this idea is [9].) In other words, the state of each neural unit at a given
instant depends on past inputs as well as its current inputs.
3. The model
We assume that the properties of the stimulus-accumulation unit (the
artificial model neuron) are as follows (see Fig. 1):
1. When the unit is not excited by some inputs, some amount of the total
stimuli is accumulated.
2. The accumulated stimuli, as well as the current input stimuli, affect the
excitation of the unit.
3. This accumulation is cleared (discarded) when the unit gets excited.
Here we discuss a simple model, a synchronous and discrete-time associative
memory [10,11] with the stimulus-accumulation-effect. Each unit outputs 0 or 1
according to the equation (cf. [1,2])
x_i(t+1) = f( Σ_{r=0}^{t} Σ_{j=1}^{n} g(r) w_ij x_j(t−r) − θ_i(t) ),   (1)
Fig. 1. Stimulus-accumulation-effect.
f(y) = { 1  (y > 0),
         0  (y ≤ 0),   (2)

g(r) = { 1  (r = 0),
         α  (1 ≤ r ≤ τ_i),
         0  (otherwise),   (3)
where x_i(t) (i = 1, ..., n) is the output of the ith unit, w_ij is the
connection weight from the jth to the ith unit, θ_i(t) is the threshold of the
ith unit, τ_i is the time elapsed from the previous excitation of the ith unit
to the present time, and α (0 ≤ α < 1) is the accumulation ratio. Usually
x_i(t) is set to be 0 for t < 0.
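The update rule of Eqs. (1)–(3) can be sketched in code. A minimal sketch, assuming one records each unit's accumulated drive since its last excitation instead of replaying the whole history (the two are equivalent because g(r) = α uniformly for 1 ≤ r ≤ τ_i); the function and variable names are ours:

```python
import numpy as np

def step(W, x, acc, theta, alpha):
    """One synchronous update of stimulus-accumulation units, per Eqs. (1)-(3).

    acc[i] holds the sum of unit i's raw input drives since its last
    excitation; weighting it by alpha reproduces the g(r) = alpha terms
    for 1 <= r <= tau_i, while the current drive enters with g(0) = 1.
    """
    drive = W @ x                            # current input, sum_j w_ij x_j(t)
    total = drive + alpha * acc              # plus alpha-weighted accumulation
    x_new = (total - theta > 0).astype(int)  # f(y): 1 if y > 0, else 0
    # excited units clear their accumulation; unexcited units keep adding
    acc_new = np.where(x_new == 1, 0.0, acc + drive)
    return x_new, acc_new
```

A unit driven repeatedly below threshold accumulates stimuli until the combined drive finally excites it, after which the accumulation is discarded, as in property 3 above.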
The network is divided into three modules, each of which we call a 'layer.'
Using 0 < l < m ≤ n, the units in each layer are indexed as

X_1(t) = (x_1(t), ..., x_l(t)),
X_2(t) = (x_{l+1}(t), ..., x_m(t)),
X_3(t) = (x_{m+1}(t), ..., x_n(t)),
X(t) = (X_1(t), X_2(t), X_3(t))^T.   (4)
We call the first layer 'Layer T,' the second layer 'Layer S' and the third layer 'Layer N.'
We assume that the "total activity" (i.e. the number of excited units in
each layer) is regulated to a certain value, say K_a for Layer a, by changing
the threshold θ_i within the range bounded by θ_i ≥ θ_min (θ_min: a positive
constant). The threshold θ_i is taken to be identical within each layer, and
its lower limit θ_min is taken to be identical in every layer. The regulation
of the total activity means that the units that accepted stronger stimuli in
each layer tend to be excited. If several units receive input stimuli of the
same strength, the units indexed by smaller integers (i) are chosen.
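The total-activity control described above amounts to a K-winners selection with a smaller-index tie-break. A minimal sketch (the function name is ours), in which the shared threshold is effectively raised until only K units fire, but never below θ_min:

```python
import numpy as np

def regulate_activity(total_input, K, theta_min):
    """Select the K most strongly driven units in a layer.

    Ties are broken in favor of smaller indices, as in the text;
    units whose drive does not exceed theta_min never fire.
    """
    n = len(total_input)
    # sorting by (-input, index) implements the smaller-index tie-break
    order = sorted(range(n), key=lambda i: (-total_input[i], i))
    x = np.zeros(n, dtype=int)
    for i in order[:K]:
        if total_input[i] > theta_min:
            x[i] = 1
    return x
```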
The weight matrix W = (w_ij) is also divided into submatrices as

W = [ W_TT  W_TS  W_TN
      W_ST  W_SS  W_SN
      W_NT  W_NS  W_NN ],   (5)
where W_ab represents the weight matrix from Layer b to Layer a; for example,

W_ST = [ w_{l+1,1}  ...  w_{l+1,l}
         ...        ...  ...
         w_{m,1}    ...  w_{m,l} ].   (6)
In this paper, we suppose that the weight matrix is composed of random
connections and correlative connections made by the modified Hebbian rule
(see Fig. 2).
Fig. 2. The network (Layer T, Layer S and Layer N).
The connections between Layer T and Layer S are generated randomly. The
matrix W_ST = (w_ij) is composed of fixed (not plastic) random connections made
from uniformly distributed random variables v_ij ∈ [0, 1] (j = 1, ..., l;
i = l+1, ..., m) as

w_ij = c v_ij / Σ_{k=1}^{l} v_ik,   (7)

where c is a positive constant, so that the sum of the strengths of all
connections converging on any unit in Layer S from the units in Layer T equals
the constant c, i.e.,

∀ i ∈ {l+1, l+2, ..., m}:  Σ_{j=1}^{l} w_ij = c.   (8)
The matrix W_TS is supposed to be the transpose of W_ST.
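The normalized random connections of Eqs. (7)–(8) can be generated directly. A short sketch (the function name and the seeded generator are ours):

```python
import numpy as np

def random_connections(l, m, c, seed=0):
    """Build W_ST per Eqs. (7)-(8): uniform random weights, with each row
    (the connections converging on one Layer-S unit) normalized to sum c."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.0, 1.0, size=(m - l, l))   # v_ij in [0, 1]
    W_ST = c * v / v.sum(axis=1, keepdims=True)  # each row sums to c
    return W_ST
```

W_TS is then obtained as `W_ST.T`, matching the transposition stated above.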
The connections other than W_ST and W_TS are organized through the learning
process, in which the learning patterns are presented to the network in turn,
for example, 'X_1(t) = ξ^1 → X_1(t+1) = ξ^2 → ...,' where the learning pattern
vectors denoted by ξ, η and ζ are composed of 0s and 1s and have the same
sizes as X_1, X_2 and X_3, respectively. The connections are reinforced
according to the Hebbian rule as

w_ij(t+1) = w_ij(t) + (λ_ab / N) x_i(t) x_j(t+1)
            + ((1 − λ_ab) / N) x_i(t+1) x_j(t+1),   (9)

where λ_ab (0 ≤ λ_ab ≤ 1) determines the connection property from Layer b to
Layer a. The organized connections are auto-correlative when λ = 0 and
cross-correlative when λ = 1. w_ij(0) is set to be 0.
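One presentation step of the rule in Eq. (9) can be sketched as follows (the function name is ours; the index convention follows Eq. (9) literally, so w_ij gains (λ/N)·x_i(t)·x_j(t+1) plus ((1−λ)/N)·x_i(t+1)·x_j(t+1)):

```python
import numpy as np

def hebbian_update(W, x_t, x_t1, lam, N):
    """Apply Eq. (9) once for one pattern transition x(t) -> x(t+1).

    lam = 1 gives purely cross-correlative connections (used for sequence
    transitions); lam = 0 gives purely auto-correlative ones (used for
    static patterns).
    """
    return (W
            + (lam / N) * np.outer(x_t, x_t1)            # cross term
            + ((1.0 - lam) / N) * np.outer(x_t1, x_t1))  # auto term
```

Presenting a whole sequence 'ξ^1 → ξ^2 → ...' then amounts to calling this once per consecutive pair.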
4. Characteristics
Due to the formulation above, when several sequential patterns sharing a
common pattern are memorized through the learning process, one excitation
pattern can be associated with several patterns (unless λ = 0). Such an
excitation pattern acts as a bifurcation point, where several retrieval routes
belonging to different sequential patterns compete with each other for the
succeeding excitation in the recalling process. In the simplest associative
memory, with α = 0 in Eq. (3), however, the excitation pattern following a
given pattern is determined definitely, i.e., the transition from the
bifurcation point always proceeds to the most strongly connected route. Hence,
the simplest model cannot handle bifurcating sequences. In the
stimulus-accumulation network (α ≠ 0), even if a certain route belonging to a
sequential pattern was not chosen at the previous recollection, the probability
of recalling that sequence at the coming recollection is higher than at the
previous trial, because of the stimulus accumulation in the unexcited units.
Thus, it is expected that the stimulus-accumulation network can handle
bifurcating sequences and, moreover, can scan through the entire memorized
structure.
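The competition at a bifurcation point can be illustrated with a deliberately reduced toy (entirely ours, not the full network): two routes receive fixed drives, only the larger total fires at each step, and firing clears that route's accumulation.

```python
import numpy as np

def compete(drives, alpha, steps):
    """Toy sketch of route competition under stimulus accumulation.

    Each step, the route with the larger total (drive plus alpha times
    accumulated unspent drive) fires and spends its energy. With alpha = 0
    the stronger route always wins; with alpha > 0 the weaker route's
    stimuli pile up until it takes over, so recall alternates and the
    whole structure gets scanned.
    """
    acc = np.zeros_like(drives)
    winners = []
    for _ in range(steps):
        total = drives + alpha * acc
        w = int(np.argmax(total))   # the more strongly driven route fires
        acc = acc + drives          # every route accumulates this step's drive
        acc[w] = 0.0                # the winner discards its accumulation
        winners.append(w)
    return winners

print(compete(np.array([1.0, 0.9]), 0.0, 6))  # stronger route wins every time
print(compete(np.array([1.0, 0.9]), 0.5, 6))  # weaker route breaks through
```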
4.1. Symbolizing sequences
The brain memorizes two types of objects. One is what transits temporally,
such as music or a thinking process. The other is static, such as the names of
objects. In the brain, these two memory types link to each other to generate
various functions. Linking many kinds of memories is necessary in order to
realize higher brain functions. In this section, we introduce a conceptual
network in which temporal sequences are used as input patterns and a static
pattern symbolizing an input sequence is produced inside the network. Here we
ignore the third layer mentioned above and discuss a network of two layers:
Layer T and Layer S. The structure of this network is shown in Fig. 3.
Let us exemplify the typical process of producing a pattern, referred to as the
"symbol pattern," on Layer S from the input of a temporal sequence, say
Fig. 3. Symbolizing sequences. Network structure and symbolizing process are illustrated.
'ξ^1 → ξ^2 → ξ^3 → ξ^4,' on Layer T. Layer T receives the input patterns from
ξ^1 to ξ^4 in this order. For the input ξ^1, the pattern ξ^1 is excited in
Layer T and transmitted to Layer S through the random connections. We assume
spatially homogeneous inhibitory stimuli all over Layer S (or a small
connection strength between the two layers) so that no units can be excited in
Layer S. Hence, the transmitted stimuli are accumulated in Layer S. The same
process repeats for each input pattern from ξ^2 to ξ^4. The accumulations in
Layer S are superimposed and reflect all the input patterns. Then, we input
spatially homogeneous excitatory stimuli to all the units in Layer S. These
trigger a certain excitation pattern on Layer S with the help of the
competitive effect of the total activity control. This pattern provides the
symbol pattern 'η^1,' which can be called a symbol of the whole sequence (see
Fig. 3).
Here, we provided the trigger stimuli artificially on Layer S. But a similar
effect could naturally be realized by, for example, the periodic activity of
another network, or the occurrence of an event important to the creature
inducing excitations in Layer S.
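The accumulate-then-trigger process just described can be sketched end to end. A minimal sketch under the stated assumptions (Layer S fully inhibited while the sequence plays, then a homogeneous trigger releasing the K_S most accumulated units); the function name is ours:

```python
import numpy as np

def symbolize(W_ST, sequence, K_S, alpha, theta_min):
    """Produce a symbol pattern on Layer S from a sequence on Layer T.

    While the sequence plays, transmitted stimuli only accumulate in
    Layer S; at the trigger, the K_S most strongly accumulated units fire
    (smaller-index tie-break), giving the symbol of the whole sequence.
    """
    acc = np.zeros(W_ST.shape[0])
    for xi in sequence:              # input patterns xi^1, xi^2, ... on Layer T
        acc += W_ST @ xi             # stimuli accumulate, no excitation yet
    drive = alpha * acc              # accumulated stimuli felt at trigger time
    order = sorted(range(len(drive)), key=lambda i: (-drive[i], i))
    symbol = np.zeros(len(drive), dtype=int)
    for i in order[:K_S]:
        if drive[i] > theta_min:     # competitive total-activity control
            symbol[i] = 1
    return symbol
```

The resulting pattern depends on the whole superimposed accumulation, so it reflects every pattern of the sequence, not only the last one.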
4.2. Associating sequence with symbol
In the previous section, we discussed the network that produces a static
pattern symbolizing a temporal sequence. Next, we suppose that the symbolizing
process is also the learning process, in which sequential patterns and symbol
patterns are memorized in each layer according to Eq. (9). We set the
parameter λ_ab as

[ λ_TT  λ_TS  λ_TN     [ 0.9  RD   –
  λ_ST  λ_SS  λ_SN  =    RD   0.1  –
  λ_NT  λ_NS  λ_NN ]     –    –    – ],   (10)
where 'RD' means random connections. Due to this setting, the transitions of
the sequential patterns are memorized in Layer T and the symbol patterns are
memorized as static patterns in Layer S, through the symbolizing process.
The network was composed of 400 units, divided into Layer S (200 units) and
Layer T (200 units). The patterns memorized in Layers T and S were three
cyclic sequences (e.g. one sequence is ξ^A_1 → ξ^A_2 → ξ^A_3 → ξ^A_4 →
ξ^A_5 → ξ^A_1 → ...) and three symbols (e.g. one symbol is η^A), respectively
(see Fig. 4). Each pattern composing a sequence was a randomly generated
pattern in which 10 units are excited. The total activity in each layer (i.e.,
K_T and K_S) was set to 10. The lower limit of the threshold, θ_min, was set
to 0.01. The value of α was set to 0.01 in every unit.
The recalling dynamics of this network are shown in Figs. 5 and 6. Several
values of the normalization constant for the random connections were used
(c = 0.2 in Fig. 6(a); c = 0.4 in Fig. 5; c = 0.6 in Fig. 6(b)). When the
normalization constant was large, the association dynamics were unstable
(Fig. 6(b)). When the constant was small, a symbol pattern not associated with
the sequence was recalled (Fig. 6(a)). The dynamics when the constant was
appropriate (c = 0.4) are shown in Fig. 5. The sequential pattern
(ξ^A_1 → ξ^A_2 → ...) was recalled from the input of the corresponding symbol
pattern
Fig. 4. Memorized patterns. Three sequences and their symbols were memorized.
(η^A) in Fig. 5(a). The symbol pattern (η^A) was recalled from the input of
the first pattern (ξ^A_1) of the corresponding sequence in Fig. 5(b). That is,
in both Figs. 5(a) and (b), when the patterns belonging to a cyclic sequence
were recalled in Layer T, the symbol pattern corresponding to the sequence was
concurrently recalled in Layer S. Moreover, the state changed among the A-set
('ξ^A_1 → ξ^A_2 → ...' and 'η^A'), the B-set ('ξ^B_1 → ξ^B_2 → ...' and
'η^B') and the C-set ('ξ^C_1 → ξ^C_2 → ...' and 'η^C') over time. These
dynamics, in which the state wanders among the 'sets' of memories, were due to
the accumulation of stimuli in the units.
In this network, the temporal sequence in Layer T recalls the corresponding
symbol in Layer S. Conversely, the symbol pattern recalls the corresponding
sequence. This means that the temporal sequence and the corresponding static
pattern are associated with each other. Moreover, such a state is not static,
but wanders among the memorized patterns over time.
4.3. Memorizing melodies
Considering the memorization of music, we realize that we can remember the
actual sounds of a piece of music, as well as a whole image of the music, its
name, the composer and the situation in which we listened to it. Let us
simplify this phenomenon and construct a neural network that can memorize
sequential patterns such as 'ξ^A_1 → ξ^A_2 → ξ^A_3 → ξ^A_4' (so to speak, a
simplified representation of the melody), symbol patterns such as 'η^A' (so to
speak, the image of the melody), and other static patterns such as 'ζ^A' (so
to speak, the name of the melody). To do this, we suppose three layers in this
section. Layer T and Layer S are the same as in the previous sections. Layer N
is added to the network to memorize the name of the melody. The structure of
the network is shown schematically in Fig. 7.
The number of units was set to 144 (96 units for Layer T; 24 for Layer S;
24 for Layer N). Fig. 8(a) shows the memorized patterns, in which filled
circles and open circles represent excited units and unexcited units,
respectively.
Fig. 5. Recalling dynamics (1): (a) dynamics from the initial pattern ξ^A_1;
(b) dynamics from the initial pattern η^A. c = 0.4 in (a) and (b).
Fig. 6. Recalling dynamics (2): dynamics from the initial pattern ξ^A_1, with
c = 0.2 in (a) and c = 0.6 in (b).
Fig. 7. Memorizing melodies.
Fig. 8. Simulation results of memorizing melodies. Filled and open circles represent excited and
unexcited units, respectively. (a) shows memorized patterns, where three sets of associations
(among melody, image and name) are memorized. (b), (c) and (d) show recalled patterns. The
melody and the image were recalled from the name in (b). The image and the name were recalled
from the first note in (c). The name and the melody were recalled from the image in (d).
The total activities of the layers were regulated to K_T = 8, K_S = 6 and
K_N = 8, respectively. The lower limit of the threshold, θ_min, was set to
0.05. The accumulation ratio α was set to 0.1 in every unit. The value of
λ_ab was set as

[ λ_TT  λ_TS  λ_TN     [ 1.0  RD   0.4
  λ_ST  λ_SS  λ_SN  =    RD   0.0  0.4
  λ_NT  λ_NS  λ_NN ]     0.4  0.4  0.0 ].   (11)
In the learning process, the melody was first presented sequentially in
Layer T. After that, the triggering stimuli, the first note of the melody and
the name of the melody were input to Layer S, Layer T and Layer N,
respectively. Layer S was inhibited during this process except for the
triggering phase. Through this process, the connections shown in Fig. 7 were
organized according to Eq. (9).
The recalling dynamics were as shown in Fig. 8(b)–(d). When we input the
name of the melody into the network, after several time steps the sequence of
the melody and the image (symbol) of the melody were recalled (Fig. 8(b)).
When we input the first note of the melody, the image was recalled, and after
some time the name was recalled (Fig. 8(c)). When we input the image, after
some time the name and the melody were recalled (Fig. 8(d)). Thus, in the
constructed neural network, sequential patterns and static patterns are
memorized and associated with one another.
5. Conclusion
The constructed neural networks could produce a symbol pattern from a
time-varying input of a sequential pattern. They could memorize such a
sequential pattern and a symbol pattern, with the patterns mutually
associated. They had the property of recalling almost all the memories
throughout the memorized structure, even when the memorized sequences had
bifurcation points. Moreover, they showed dynamics wandering, over time,
among several memories that were not explicitly associated with one another.
The purpose of our research is to construct complicated memories like the
brain's. We introduced the stimulus-accumulation-effect and its applications,
and indicated that the constructed networks have several brain-like
properties. Such a model is only a metaphor for some aspects of the brain;
however, we think that the neuronal modulation of excitability by some
neurotransmitters may act as the stimulus-accumulation-effect.
Although the constructed networks seem at a glance to be complicated, with
specific structures, the whole network is an associative memory consisting of
identical units. The whole network consists of several layers, and each layer
is also an associative memory. The differences between layers are whether
their connections are symmetric or asymmetric. The connections between the
layers make a "coupled associative memory." So, our networks are not specific,
and we think our trial leads, in some sense, to an understanding of the brain.
Acknowledgements
We express our hearty thanks to Dr. Taishin Nomura (Associate Professor of
Osaka University) for helpful advice. This work was partially supported by a
grant from JSPS for young researchers, no. 121862.
References
[1] K. Aihara, T. Takabe, M. Toyoda, Chaotic neural networks, Phys. Lett. A 144 (1990) 333–
340.
[2] E.R. Caianiello, Outline of a theory of thought process and thinking machines, J. Theor. Biol.
1 (1961) 204–235.
[3] J.L. Elman, Finding structure in time, Cognitive Sci. 14 (1990) 179–211.
[4] S. Ikeda, K. Nakano, Y. Sakaguchi, A robot organizing purposive behavior by itself, in:
Proceedings of International Joint Conference on Neural Networks (IJCNN’92), vol. I,
Baltimore, 1992, pp. 570–575.
[5] M. Jordan, Serial order: a parallel distributed processing approach, Technical Report ICS
Report 8604, Institute for Cognitive Science, University of California, 1986.
[6] S.C. Kak, Self-indexing of neural memories, Phys. Lett. A 143 (1990) 293–296.
[7] D. Kleinfeld, Sequential state generation by model neural networks, Proc. Natl. Acad. Sci. 83
(1986) 9469–9473.
[8] D. Kleinfeld, H. Sompolinsky, Associative neural network model for the generation of
temporal patterns, Biophys. J. 54 (1988) 1039–1051.
[9] M. Morita, O. Nagai, I. Nakaguchi, T. Omori, K. Nakano, A neural network model for
recognition and memory of temporal sequence, in: Proceedings of 26th SICE Annual
Conference, 1987, pp. 15–17 (in Japanese).
[10] K. Nakano, Association and its application-study on associative memory, in: Proceedings of
the Conference on Information Theory Institute of Electronics and Communication Engineers
of Japan, IT 69-27, 1969 (in Japanese).
[11] K. Nakano, Associatron: a model of associative memory, IEEE Trans. Syst. Man Cybern. SMC-2 (3) (1972)
331–338.
[12] H. Sompolinsky, I. Kanter, Temporal association in asymmetric neural networks, Phys. Rev.
Lett. 57 (1986) 2861–2864.
[13] A. Waibel, Modular construction of time-delay networks for speech recognition, Neural
Comput. 1 (1989) 328–339.