
A/Prof. Andrew P. Paplinski, School of Information Technology, [email protected], bldg. 63, G13

Computational Neuroscience in action

Contents

1 Introductory Concepts
1.1 Computational Neuroscience
1.2 Modelling objectives
1.3 Integration of letters and phonemes in the human brain — fMRI results

2 Bottom-up: from an artificial neuron to Self-Organizing Maps/Modules (SOMs)
2.1 A Concept Neuron — A simplistic model of a biological neuron
2.2 Forming simple neural networks — Lateral inhibition
2.3 Representation of percepts/stimuli as multidimensional points/vectors

3 Structure of Self-Organizing Maps/Modules
3.1 Similarity layer
3.2 General structure of a Self-Organizing Map/Module
3.3 Example of a Self-Organizing Map
3.4 Learning Algorithm for Self-Organizing Maps
3.5 Example of a Self-Organizing Feature Map formation
3.6 The Multimodal Self-Organizing Networks
3.7 A Simple Feedforward Multimodal Self-Organizing Network
3.8 Surfaces of post-synaptic activities
3.9 The unimodal visual map for letters
3.10 The auditory map for phonemes
3.11 The bimodal map integrating phonemes and letters
3.12 Robustness of the bimodal percepts against unimodal disturbances
3.13 Feedback and its significance for auditory perception
3.14 Re-coded Phoneme Map
3.15 Robustness of the re-coded phoneme map


1 Introductory Concepts

1.1 Computational Neuroscience

• Computational Neuroscience is a name given to the effort of modelling the brain using formal mathematical and computational tools

• We can formulate models on different levels that can describe

– structures and/or

– functions

of neuronal cells, their components and/or aggregates, e.g. a synapse, a dendrite, a neuron, columns of neurons, cortical areas, and so on.

• The model can use differential equations, like the Hodgkin-Huxley model, or describe interconnections of abstracted neurons (artificial neural networks or connectionist models).

• We can describe

– low-level functions, like the generation of action potentials, or

– high-level perceptual or cognitive functions, like the integration of multi-modal stimuli, or attention in autism.

• Such models, if good, are then incorporated into other areas of neuroscience or psychology.


1.2 Modelling objectives

We present a model of the integration of auditory and visual information as it occurs in the human cortex.

• In essence, such an integration will be described as a mapping of multi-dimensional sensory stimuli onto two-dimensional cortical areas.

• The model is based on Multimodal Self-Organizing Networks (MuSONs), also known as Artificial Cortical Networks (ACNs).

• Such networks are formed from interconnected Self-Organizing Modules/Maps (SOMs) that, in turn, are composed of artificial neurons.

• We consider

– the structure, and

– the learning procedure

of Self-Organizing Networks.

• We concentrate on a network that forms a bi-modal representation of phonemes and letters.


1.3 Integration of letters and phonemes in the human brain — fMRI results

• Bimodal, and more generally multimodal, integration of sensory information is important for perception, since many phenomena have qualities in more than one modality.

• Multimodal integration has been studied extensively; for a comprehensive review, see:

(Eds:) Gemma Calvert, Charles Spence, and Barry E. Stein, The Handbook of Multisensory Processes, MIT Press, Cambridge, MA, 1st ed., 2004.

• Recently, functional magnetic resonance imaging (fMRI) has made non-invasive studies of multimodal identification and recognition by human subjects increasingly tractable and important, see:

A. Amedi, K. von Kriegstein, N. M. van Atteveldt, M. S. Beauchamp, and M. J. Naumer, “Functional imaging of human crossmodal identification and object recognition,” Exp. Brain Res., vol. 166, pp. 559–571, 2005.

• After: N. van Atteveldt, E. Formisano, R. Goebel, and L. Blomert, “Integration of letters and speech sounds in the human brain,” Neuron, vol. 43, pp. 271–282, July 2004:

Nowadays most written languages are speech-based alphabetic scripts, in which speech sound units (phonemes) are represented by visual symbols (letters, or graphemes). Learning the correspondences between letters and speech sounds of a language is therefore a crucial step in reading acquisition, . . .


From: N. van Atteveldt, E. Formisano, R. Goebel, and L. Blomert, “Integration of letters and speech sounds in the human brain,” Neuron, vol. 43, pp. 271–282, July 2004.

Figure 5. Schematic Summary of Our Findings on the Integration of Letters and Speech Sounds. Multisubject statistical maps are superimposed on the folded cortical sheet of the individual brain used as target brain for the cortex-based intersubject alignment: unimodal visual > unimodal auditory (red), unimodal auditory > unimodal visual (light blue), congruent > incongruent (dark blue), and the conjunction of bimodal > unimodal and unimodal > baseline (C4, violet). A schematic description of our findings and interpretation based on the functional neuroanatomical data is shown in the lower panel. Dashed lines indicate that aSTS/PP may either receive input directly from the modality-specific regions or from the heteromodal STS/STG. (a)STS, (anterior) superior temporal sulcus; STG, superior temporal gyrus; PP, planum polare; /a/, speech sound a; “a,” letter a.

[Reproduced page from the Neuron article (p. 278): the discussion of superadditive and subadditive BOLD responses, the interaction analyses in PT/HS and STS/STG, and the conclusion that letters and speech sounds are integrated in heteromodal STS/STG, with feedback modulating the modality-specific auditory regions; see the original paper for the full text.]

• The model presented below is based on the following papers:

– L. Gustafsson and A. P. Paplinski, “Bimodal integration of phonemes and letters: an application of multimodal self-organizing networks,” in Proc. Int. Joint Conf. Neural Networks, Vancouver, July 2006, pp. 201–208.

– A. P. Paplinski and L. Gustafsson, “Feedback in multimodal self-organizing networks enhances perception of corrupted stimuli,” in Lect. Notes in Artif. Intell., vol. 4304. Springer, 2006, pp. 19–28.


2 Bottom-up: from an artificial neuron to Self-Organizing Maps/Modules (SOMs)

2.1 A Concept Neuron — A simplistic model of a biological neuron

Conceptual abstraction of biological neurons:

[Figure: a biological neuron, showing afferent signals xi arriving at excitatory and inhibitory synapses on a dendrite, the cell body forming the post-synaptic activity, and the axon carrying the efferent signal yj as a train of action potentials.]

Basic characteristics of a biological neuron for modelling:

• Information is coded in the form of the instantaneous frequency of (action potential) pulses

• Synapses are either excitatory or inhibitory

• Signals are aggregated (“summed”) as they travel along dendritic trees, forming the post-synaptic activity

• The cell body generates at the axon an output pulse train whose average frequency is proportional to the total (aggregated) post-synaptic activity.


The conceptual neuron above is still too complicated for most models and is further simplified to

• a single dendrite structure

• operating with abstract values rather than with pulses of varying frequency:

[Figure: dendritic and block-diagram representations of an artificial neuron, with afferent signals x = [x1 x2 . . . xn]^T, synaptic weights (strengths) w = [w1 w2 . . . wn], post-synaptic activity d, activation function σ, and output y.]

• A single artificial neuron consists of n synapses arranged along a linear dendrite.

• The afferent signals xi arrive at the synapses and are aggregated by the summing dendrite to form the post-synaptic activity d.

• Each synapse is characterised by one parameter: the weight, or synaptic strength, wi.

• The aggregation of signals is expressed as a linear combination, or inner product, of afferent signals and synaptic weights: d = w1 · x1 + · · · + wn · xn = w · x

• The post-synaptic activity, d, is passed through a limiting activation function, σ(·), which generates the axonal efferent signal y:

y = σ(d)
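A minimal sketch of this simplified neuron in Python/NumPy; the tanh activation is just one possible choice for the limiting function σ, not one prescribed by the model:

import numpy as np

def neuron_output(x, w, sigma=np.tanh):
    """Single artificial neuron: post-synaptic activity d = w . x, output y = sigma(d)."""
    d = np.dot(w, x)        # aggregation along the dendrite (inner product)
    return sigma(d)         # limiting activation function

# example: three afferent signals and three synaptic weights
x = np.array([0.2, -0.5, 1.0])
w = np.array([0.8,  0.1, 0.4])
print(neuron_output(x, w))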


2.2 Forming simple neural networks — Lateral inhibition

1-D lateral inhibition (limulus vision model):

[Figure: a chain of neurons receiving signals u_{k-2}, . . . , u_{k+2} from receptors R; each neuron combines a weight w0 with inhibitory weights w1 from its neighbours and produces the outputs y_{k-2}, . . . , y_{k+2} through the activation function σ.]

In the simple neural network that models lateral inhibition as in limulus vision (see http://www.csse.monash.edu.au/courseware/cse2330/2006/Lnts/L05D.pdf for details):

• Light intensity signals arrive from a number of visual receptors, marked R, at the synapses of the neurons,

• Neuronal axons form local excitatory feedback connections and inhibit the neighbouring neurons (lateral inhibition),

• Such connections form a spatial high-pass filter enhancing the edges of the image received from the receptors (see the sketch below).
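A minimal sketch of such a lateral-inhibition filter in Python/NumPy. The weights (w0 = 0.2 self-excitation, w1 = 0.3 inhibition of the two neighbours), the rectifying activation and the circular boundary are illustrative assumptions, not the values of the limulus model:

import numpy as np

def lateral_inhibition(u, w0=0.2, w1=0.3, steps=50):
    """Recurrent 1-D lateral inhibition: each neuron excites itself (w0) and
    inhibits its two neighbours (w1); iterated towards a steady state."""
    y = np.array(u, dtype=float)
    for _ in range(steps):
        neighbours = np.roll(y, 1) + np.roll(y, -1)          # circular boundary
        y = np.maximum(0.0, u + w0 * y - w1 * neighbours)    # rectifying activation
    return y

# a step edge in light intensity is enhanced at the boundary (high-pass behaviour)
u = np.array([1, 1, 1, 1, 5, 5, 5, 5], dtype=float)
print(lateral_inhibition(u))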


2.3 Representation of percepts/stimuli as multidimensional points/vectors

• Perceptual objects can be represented as points/vectors in multidimensional space.

• Consider an example of categorization of animals.

• The following 18 features are chosen to characterize each animal:

x1: log(weight)
x2 ∈ {1, 2, 3}: food (herbivores, omnivores, carnivores)
x3, x4, x5, x6 ∈ {0, 1}: locomotion (fins, wings, two legs, four legs)
x7 ∈ {0, 1}: equipped with hooves (perissodactyls) or cloven hooves (artiodactyls)
x8 ∈ {0, 1}: equipped with claws
x9 ∈ {0, 1}: equipped with other feet
x10 ∈ {1, 2, 3}: cover (fur, feathers, scales)
x11 ∈ {0, 1}: colour black
x12 ∈ {0, 1}: colour white
x13 ∈ {0, 1}: even coloured
x14 ∈ {0, 1}: spotted
x15 ∈ {0, 1/4}: striped
x16 ∈ {2, 4}: facial feature (short faced, long faced)
x17 ∈ {1, 2, 3}: aquatic
x18 ∈ {1, 2, 3}: social behaviour (single living, pair living, group living)

• Therefore, each animal can be perceived as a point in 18-dimensional space:

x = [x1, x2, . . . , x18]

• The animal kingdom will be represented by a number of points in such a space

• The 18 signals/numbers form the feature vector that is applied as afferent signals to our neural networks (a small encoding sketch follows below).
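A hypothetical encoding of one animal using the 18 features listed above, as a sketch in Python/NumPy; the values are illustrative and not those used in the cited paper:

import numpy as np

zebra = np.array([
    np.log(300.0),   # x1  log(weight)
    1,               # x2  food: herbivore
    0, 0, 0, 1,      # x3-x6  locomotion: four legs
    1,               # x7  hooves
    0,               # x8  claws
    0,               # x9  other feet
    1,               # x10 cover: fur
    1, 1, 0, 0,      # x11-x14 colour: black and white, not even coloured, not spotted
    0.25,            # x15 striped
    4,               # x16 facial feature: long faced
    1,               # x17 aquatic: land-living
    3,               # x18 social behaviour: group living
])
print(zebra.shape)   # (18,)  one point in 18-dimensional feature space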

Details in: A. P. Paplinski and L. Gustafsson, “Multimodal Feedforward Self-Organizing Networks,” in Lect. Notes in Comp. Sci., vol. 3801. Springer, 2005, pp. 81–88.


3 Structure of Self-Organizing Maps/Modules

Self-Organizing Maps/Modules (SOMs), also known as Kohonen maps or topographic maps, were first introduced by von der Malsburg (1973) and, in their present form, by Kohonen (1982).

3.1 Similarity layer

[Figure: dendritic-graph and block-diagram representations of the similarity layer. The m × n weight matrix W has rows w1:, w2:, . . . , wm:, one weight vector per neuron; the stimulus x = [x1 x2 . . . xn] produces the post-synaptic activities d1, . . . , dm and the outputs y1, . . . , ym through the activation σ.]

• In self-organizing maps/modules, the synaptic weight vector wi: associated with each neuron approximates some aspects of the feature vectors/stimuli x applied to its synapses

• It can be shown that the post-synaptic activity di is a measure of similarity between the synaptic weight vector wi: and the applied stimulus x

• the more similar the vectors, the larger the post-synaptic activity di (see the sketch below).
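A minimal sketch of the similarity layer in Python/NumPy; normalising the weight vectors and the stimulus, so that the inner product behaves as a cosine similarity, is an assumption added here for illustration:

import numpy as np

def similarity_layer(W, x):
    """Post-synaptic activities d_i = w_i . x for every neuron (rows of W)."""
    return W @ x

# example: m = 4 neurons with n = 3 synapses each
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-length weight vectors
x = np.array([0.6, 0.8, 0.0])                   # unit-length stimulus
print(similarity_layer(W, x))                   # larger value = more similar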


3.2 General structure of a Self-Organizing Map/Module

A Self-Organizing Map/Module (SOM) consists of three main parts/layers:

[Figure: block diagram of a SOM. The stimulus x(k) drives the similarity layer W, the resulting activities d(k) drive the competition (WTA) layer, and the position layer V returns the grid position v(k) of the winning neuron.]

n — dimensionality of the stimuli
l = 2 — dimensionality of the neuronal grid
m — the total number of neurons
W — m × n weight matrix
V — m × l position matrix

• A similarity layer that measures the similarity of the current stimulus to each synaptic weight vector.

• A competition layer that identifies the winning neuron: the one that has the synaptic weight vector most similar to the current stimulus.

• Positional information. Each neuron is located relative to the others on a two-dimensional grid known as the feature space. Alternatively, each neuron can store its position relative to the other neurons on the grid.

• Mathematically, we say that for a given matrix of neuronal weights, W, the network maps the kth stimulus x(k) onto the position in the neuronal grid of the winning neuron, v(k), that is, the neuron with the weight vector most similar to the stimulus:

v(k) = g(x(k); W) ;  v ∈ R^l      (1)

In other words, for each stimulus, one neuron, the winner of the competition, is most activated.
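A minimal sketch of the mapping (1) in Python/NumPy. Here the winner is found by the smallest Euclidean distance between the stimulus and the weight vectors; an inner-product similarity, as in the previous section, could be used instead:

import numpy as np

def som_map(x, W, V):
    """Map the stimulus x onto the grid position v of the winning neuron (v = g(x; W))."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))   # competition (WTA)
    return V[winner]                                     # position on the neuronal grid

# example: m = 6 neurons on a 2 x 3 grid, n = 4 dimensional stimuli
rng = np.random.default_rng(1)
W = rng.random((6, 4))                                       # m x n weight matrix
V = np.array([[r, c] for r in range(2) for c in range(3)])   # m x l position matrix
print(som_map(rng.random(4), W, V))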


3.3 Example of a Self-Organizing Map

Consider a Self-Organizing Map/Module consisting of m = 12 neurons located on a 2-D 3 × 4 neuronal grid:

[Figure: a SOM with 12 neurons on the 3 × 4 grid. The 3-dimensional stimulus (x1, x2, x3) drives the similarity-measure layer producing the activities d1, . . . , d12, the Winner-Takes-All layer selects the winner, and the outputs y1,1, . . . , y3,4 are indexed by position on the 2-D lattice of neurons.]

• (n = 3)-dimensional stimuli

• 3-D synaptic weight vector for each neuron

• Positions of each neuron in the grid can be specified by a 2-D position vector

• The post-synaptic activity in the similarity layer indicates the distance of the current stimulus from the weight vector

• The competition layer is based on lateral inhibition and local self-excitatory connections, and enhances the response of the winning neuron while suppressing the activities of the other neurons.


3.4 Learning Algorithm for Self-Organizing Maps

• The objective of the learning algorithm for a SOM is to capture in the synaptic weights the essential characteristics of the stimuli

• The learning algorithm consists of two basic steps, namely, competition and cooperation between the neurons of the output grid.

Competition: the current stimulus x(k) is compared with each weight vector, and the winning neuron j(k) and its position on the neuronal grid are established.

Cooperation: the winning neuron and its neighbours i have their weight vectors modified, pulling them in the direction of the current stimulus:

∆wi(k) = η(k) · Λ(j, i) · (xT(k) − wi(k))

with the strength Λ(j, i) related to their distance ρ(j, i) from the winning neuron.

• The neighbourhood function, Λ(j, i), is usually a Gaussian function:

• The neighbourhood function should initially include all neurons and shrink as the learning progresses, so that finally it includes only the winning neuron.

• The learning gain η is gradually reduced to stabilize the algorithm.

[Plot: a 2-D Gaussian neighbourhood function over a 10 × 10 neuronal grid.]
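A minimal sketch of this learning algorithm in Python/NumPy. The Euclidean-distance competition, the linearly decaying gain and the linearly shrinking Gaussian neighbourhood width are illustrative choices, not the exact schedules used in the lectures:

import numpy as np

def train_som(X, grid=(3, 3), epochs=100, eta0=0.5, sigma0=1.5, seed=0):
    """Kohonen learning: competition (closest weight vector wins) and cooperation
    (the winner and its grid neighbours are pulled towards the stimulus)."""
    rng = np.random.default_rng(seed)
    V = np.array([[r, c] for r in range(grid[0]) for c in range(grid[1])], float)  # positions
    W = rng.random((len(V), X.shape[1]))                                           # weights
    for epoch in range(epochs):
        eta = eta0 * (1 - epoch / epochs)               # learning gain gradually reduced
        sigma = sigma0 * (1 - epoch / epochs) + 0.2     # neighbourhood shrinks over time
        for x in rng.permutation(X):
            j = np.argmin(np.linalg.norm(W - x, axis=1))   # competition: winning neuron j
            rho2 = np.sum((V - V[j]) ** 2, axis=1)         # squared grid distances to winner
            Lam = np.exp(-rho2 / (2 * sigma ** 2))         # Gaussian neighbourhood function
            W += eta * Lam[:, None] * (x - W)              # cooperation: pull towards x
    return W, V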


3.5 Example of a Self-Organizing Feature Map formation

• In this example we consider two-dimensional stimuli generated by two sources, marked by o and +, respectively.

• We use three groups of data, each containing 10 stimuli (i.e. points in a two-dimensional space), in each source; sixty points altogether.

• The sources can be thought of as producing, e.g., two dialects of a very limited protolanguage, each with three protophonemes.

Training stimuli:

[Plot: the sixty training stimuli (2 sources, 3 classes) in the (x1, x2) plane.]

• We can imagine that a source is a representation of a parent of a child pronouncing three phonemes in ten slightly different ways.

• The parallel is far from perfect but might be helpful for a conceptual understanding of the map formation and interpretation.

• Real sensory stimuli, like the phonemes of speech, are of course larger in number and dimension.


• Let our Self-Organizing Map/Module have 9 neurons organized on a 3× 3 grid

• The 3× 3 neuronal grid is to approximate 60 stimuli organised in three pairs of clusters.

• Initially, the 2-D weight vectors, marked with ‘*’ in the plot, are assigned random values.

• During each learning step, the weight vector closest to the stimulus is pulled towards the stimulus together with its close neighbours.

• The resulting feature map assigns the nine neurons to the three pairs of stimuli clusters (60 stimuli),

• Note that neurons are assigned either to data clusters or to groups of clusters, so as to best approximate the data distribution.

• Three neurons in the centre of the grid seem not to be assigned to any group.

[Plot: the trained 3 × 3 map after 100 epochs; the nine weight vectors, overlaid on the sixty stimuli in the (x1, x2) plane.]

• Note that the synaptic weight vectors form an elastic net that is shaped to approximate the stimuli distribution.

• Finally, observe that if the number of neurons is significantly higher than the number of stimuli, then each stimulus will be assigned to a patch of neurons (a usage sketch of this example follows).
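A usage sketch of the train_som function above on stand-in data. The exact stimuli of this example are not reproduced here; two sets of three small random clusters merely play the role of the two sources:

import numpy as np

rng = np.random.default_rng(2)
centres = rng.uniform(0.7, 2.3, size=(2, 3, 2))   # 2 sources x 3 classes, 2-D cluster centres
X = (centres[:, :, None, :] + 0.05 * rng.normal(size=(2, 3, 10, 2))).reshape(-1, 2)  # 60 stimuli

W, V = train_som(X, grid=(3, 3), epochs=100)      # nine neurons on a 3 x 3 grid
print(np.round(W, 2))                             # trained 2-D weight vectors (the elastic net)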


3.6 The Multimodal Self-Organizing Networks

• A Multimodal Self-Organizing Network (MuSON) is built from several Kohonen Self-Organizing Modules (SOMs).

• For the purpose of modelling the integration of phonemes and letters as in the cortex, we consider a two-level feedforward structure and a three-level feedback network.

[Figure: two MuSON architectures. In the feedforward network the unimodal maps SOMlt and SOMph receive the stimuli xlt and xph and send their outputs ylt and yph to the bimodal map SOMbm. The feedback network adds a re-coded phoneme map SOMrph, which receives yph together with the top-down feedback ybm from the bimodal map.]

• Bimodal integration takes place in the SOMbm module.

• The re-coded phoneme module SOMrph provides a better interpretation of noisy stimuli thanks to the inclusion of the visual information.

• The sensory level consists of a visual module processing letters, SOMlt, and an auditory module processing phonemes, SOMph.


3.7 A Simple Feedforward Multimodal Self-Organizing Network

[Figure: the feedforward MuSON. The sensory maps SOMlt and SOMph receive xlt and xph and produce the 3-D outputs ylt and yph, which are combined as the input to the bimodal map SOMbm producing ybm.]

• Unisensory maps, SOMlt and SOMph, receive the sensory stimuli xlt and xph.

• The bimodal map, SOMbm, receives the combined pair of three-dimensional outputs from the sensory maps, ylt and yph.

• The learning process can take place concurrently for each SOM, according to the Kohonen learning law.

• After self-organization, each SOM performs a mapping of the form:

y(k) = g(x(k); W, V )

where x(k) represents the kth stimulus for a given map, W is the weight matrix, and V describes the structure of the neuronal grid.

• The 3-D output signal y(k) = [d(k) v(k)] combines the 2-D position of the winner, v(k), with its 1-D post-synaptic activity, d(k) (a sketch of forming these signals follows).
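A sketch, in Python/NumPy, of forming the 3-D output y = [d v] and the combined input to the bimodal map. The random weight matrices, the stimulus dimensions (22 for letters, 36 for phonemes, taken from the following sections) and the absence of any normalisation are simplifying assumptions:

import numpy as np

def som_output(x, W, V):
    """3-D SOM output [d, v]: the winner's post-synaptic activity and its grid position."""
    d = W @ x                                  # similarity of x to every weight vector
    j = np.argmax(d)                           # winner-takes-all
    return np.concatenate(([d[j]], V[j]))      # [d(k), v1(k), v2(k)]

rng = np.random.default_rng(3)
V = np.argwhere(np.ones((36, 36)))             # positions on a 36 x 36 grid
W_lt = rng.random((36 * 36, 22))               # letter map weights (22-D letter stimuli)
W_ph = rng.random((36 * 36, 36))               # phoneme map weights (36-D phoneme stimuli)

y_lt = som_output(rng.random(22), W_lt, V)
y_ph = som_output(rng.random(36), W_ph, V)
x_bm = np.concatenate([y_lt, y_ph])            # 6-dimensional input to SOMbm
print(x_bm.shape)                              # (6,)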


3.8 Surfaces of post-synaptic activities

The post-synaptic activity of the trained map is: d(k) = W · x(k)

[Plot: the surface of post-synaptic activities over the 36 × 36 neuronal grid of the bimodal map for one letter/phoneme combination.]

• Example of the post-synaptic activity in the trained bimodal map consisting of 36 × 36 neurons when the stimuli representing the letter/phoneme o are presented to:

– the visual letter map, and
– the auditory phoneme map.

• The activity in the map for one phoneme/letter combination shows one winning patch, with activity descending away from this patch.

• The patch of neurons representing o is clearly distinguished from other patches.

• Collections of such patches form the final maps.


3.9 The unimodal visual map for letters

• The stimulus xlt to the visual letter map is a 22-element vector obtained by a principal component analysis of scanned (21 × 25)-element binary character images.

• After self-organization, the visual letter map SOMlt is represented by patches of highest post-synaptic activities of a neuronal population consisting of 36 × 36 neurons

[Figure: the trained letter map; a 36 × 36 grid with labelled patches for the letters a, ä, e, i, y, å, u, o, ö, s, S, f, b, d, g, k, l, m, n, p, r, t, v.]

• The map shows the expected similarity properties: symbols that look alike are placed close to each other in the map.

• For example, the cluster of characters f, t, r, i, l, k, b and p is based predominantly on a vertical stroke.

• The neuronal populations within the patches of the letter map constitute the detectors of the respective letters.


3.10 The auditory map for phonemes

• The auditory material consists of twenty-three phonemes as spoken by ten native Swedish speakers.

• Each phoneme spoken by each speaker is represented by thirty-six mel-cepstral coefficients.


• These feature vectors were averaged over the speakers, yielding one thirty-six-element feature vector for each phoneme.

• This averaged set of vectors constitutes the inputs xph to the auditory map SOMph.

[Figure: the trained phoneme map, with labelled patches for the twenty-three phonemes a, ä, e, i, y, å, u, o, ö, s, S, f, b, d, g, k, l, m, n, p, r, t, v.]

• After the learning process we obtain a phoneme map.

• As for the letter map, the patches of neuronal populations constitute the detectors of the respective phonemes.

• Note that the plosives g, k, t and p, the fricatives s, S (the sh-sound) and f, and the nasal consonants m and n form three close groups on the map.

• Also, vowels with similar spectral properties are placed close to each other.

• The exact placing of the groups varies from one self-organization to another, but the existence of these groups is certain.


3.11 The bimodal map integrating phonemes and letters

• The outputs from the auditory phoneme map and the visual letter map are combined as 6-dimensional inputs [ylt yph] to the bimodal map SOMbm.

[Figure: the trained bimodal map, with labelled patches combining the letter and phoneme representations of the twenty-three symbols.]

• The patches of highest post-synaptic activity combine characteristics of letters and phonemes.

• Note, for example, the groups of the fricative consonants s, S and f, and the nasal consonants m and n.

• Most, but not all, vowels form a group, and those that are isolated have obviously been placed under the influence of the visual letter map.


3.12 Robustness of the bimodal percepts against unimodal disturbances

• An important advantage of the integration of stimuli from sensory-specific cortices into multimodal percepts in multimodal association cortices is that even large disturbances in the stimuli may be eliminated in the multimodal percepts:

[Figure: the feedforward network with the resulting activity on the letter map for the uncorrupted letters å, i and m, and on the phoneme map for the corrupted phonemes 0.5 ’å’ + 0.5 ’o’, 0.5 ’i’ + 0.5 ’y’ and 0.43 ’m’ + 0.57 ’n’.]

• We excite the network with the three letters å, i and m, which are all uncorrupted.

• The corresponding phonemes å, i and m are heavily corrupted, as shown.

• These corrupted phonemes cause the activity on the phoneme map to move significantly from the original positions.


[Figure: the same stimuli as above, together with the resulting activity on the bimodal map for å, i and m.]

• In the bimodal map the activities have moved very little.

• The recognition of these bimodal percepts is much less influenced by the auditory corruptions than the recognition in the phoneme map.

• This holds for all other letter/phoneme combinations as well.

• The above testing was done for the feedforward network.


3.13 Feedback and its significance for auditory perception

The robustness of the bimodal percepts demonstrated previously can be exploited, through feedback, to enhance auditory perception.

[Figure: the feedback MuSON; the re-coded phoneme map SOMrph receives the output yph of the phoneme map together with the top-down feedback ybm from the bimodal map SOMbm.]

• We introduce feedback in our MuSON through the re-coded phoneme map SOMrph.

• The 6-dimensional input stimulus [yph ybm] to the re-coded phoneme map is formed from:

– the feedforward connection from the sensory phoneme map SOMph, and

– the top-down feedback connection from the bimodal map SOMbm (a sketch of forming this input follows).
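A sketch of one pass through this loop, reusing the hypothetical som_output function from the sketch in Section 3.7; W_ph and W_bm denote the phoneme and bimodal weight matrices, V the grid positions, and the relaxation (iteration) of the feedback loop used by the real model is omitted:

import numpy as np

def recoded_phoneme_input(x_ph, y_lt, W_ph, W_bm, V):
    """Form the 6-D input [yph ybm] to SOMrph from the feedforward phoneme output
    and the top-down feedback of the bimodal map (single pass, no relaxation)."""
    y_ph = som_output(x_ph, W_ph, V)                              # feedforward from SOMph
    y_bm = som_output(np.concatenate([y_lt, y_ph]), W_bm, V)      # top-down feedback from SOMbm
    return np.concatenate([y_ph, y_bm])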



• The first phase of self-organization is identical to that for the feedforward network.

• In the second phase of self-organization, the re-coded phoneme map SOMrph is initialized to have the 23 winners in the same positions as in the phoneme map SOMph.

• The weight vectors are then trained by the Kohonen rule.

• Relaxation is included between the two maps in the feedback loop.


3.14 Re-coded Phoneme Map

• After self-organization, the re-coded phoneme map SOMrph is seen to be similar to the phoneme map SOMph.


[Figure: the trained phoneme map and the re-coded phoneme map side by side; both show patches for the same twenty-three phonemes in a similar arrangement.]

• The bimodal map is similar to that in the feedforward case.


3.15 Robustness of the re-coded phoneme map

• The re-coded phoneme map, through its feedback connections, behaves dramatically differently when phonemes are corrupted.


[Figure: activities on the phoneme map for the corrupted phonemes 0.5 ’å’ + 0.5 ’o’, 0.5 ’i’ + 0.5 ’y’ and 0.43 ’m’ + 0.57 ’n’, and the corresponding activities on the re-coded phoneme map and the bimodal map for å, i and m.]

• The maps show post-synaptic activities for the three letters and phonemes å, i and m.

• Solid lines show the results when the auditory and visual inputs to the sensory-specific maps are perfect.



• Dotted lines show the results when the auditory inputs to the unisensory phoneme map are heavily corrupted.

• Note that when the phonemes are corrupted, the activities in the re-coded phoneme map change insignificantly compared to the activities in the phoneme map.

• This holds for all other letter/phoneme combinations as well.


Conclusion

• With a Multimodal Self-Organizing Network we have simulated the bimodal integration of audiovisual speech elements: phonemes and letters.

• We have demonstrated that the bimodal percepts are robust against corrupted phonemes, and that

• when these robust bimodal percepts are fed back to the auditory stream, the auditory perception is greatly enhanced.

• These results agree with known results from psychology and neuroscience.
