
*Corresponding author. Fax: 34-58-243230; e-mail: aprieto@ugr.es.

Signal Processing 64 (1998) 315–331

A neural learning algorithm for blind separation of sources based on geometric properties

Alberto Prieto*, Carlos G. Puntonet, Beatriz Prieto

Departamento de Electrónica y Tecnología de Computadores, Universidad de Granada, 18071 Granada, Spain

Received 17 February 1997

Abstract

This paper presents a new approach to recover original signals ('sources') from their linear mixtures, observed by the same number of sensors. The algorithms proposed assume that the input distributions are bounded and that the sources generate certain combinations of extreme values ('critical vectors'). The idea is very simple and is based on geometric algebra properties. We present a neural network approach to show that with two networks, one for the separation of sources and one for weight learning, running in parallel, it is possible to efficiently recover the original signals. The learning rule is unsupervised and each computational element uses only local information. Preliminary results obtained from experiments with synthetic and real signals are included to show the potential and limitations of the procedure. © 1998 Elsevier Science B.V. All rights reserved.


0165-1684/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII: S0165-1684(97)00198-9


Keywords: Signal processing; Artificial neural networks; Competitive learning; Blind separation of sources

1. Introduction

The problem of blind separation of sources involves obtaining the signals generated by $p$ sources, $s_j$, $j = 1,\dots,p$, from the mixtures detected by $p$ sensors, $e_i$, $i = 1,\dots,p$. The mixture of the signals may take place in the medium in which they are propagated. One of the situations of greatest research interest is that of linear mixtures. In this case, the observed signals correspond to the original sources modified by a linear transformation, $A = (a_{ij})$. The crux of the problem is to obtain the coefficients of the linear application, $a_{ij}$, and from these to calculate the inverse of the transformation; this enables the reconstruction of the original signals when only the observations are known.

The literature displays a diversity of approaches, most of which use some kind of statistical analysis, cost function, the entropy concept, etc., and some of these algorithms are implemented by artificial neural networks [1–3,5,8]. Reviews of the problem of separation of sources can be found in [4,7]. The usual main assumption of the problem is the statistical independence of the sources [6].

We have developed various algorithms to separate linear mixtures, all of which are based on the geometric and algebraic properties of the transformation taking place between the sources and the mixtures. Characteristics of these algorithms are their simplicity and the fact that they enable the separation of statistically dependent signals (see Example 2, Section 8). Indeed, the main hypothesis of our procedure relies on the boundedness of the possible values of the different sources. This supposition is a reasonable one, as any signal produced by a real source and transmitted through a physical medium must have finite energy. If the sources produce, at least once, combinations of their extreme values, our procedure leads to their demixing even when they are statistically dependent. The hypothesis of bounded sources implies that the possible source vectors, $s$, occupy a geometrically bounded zone, the source space $S$, within the space $(s_1,\dots,s_p)$. If the sources were statistically independent, the source space would be, for $p = 2$, a rectangle or, in general for any dimension, a rectangular hyperparallelepiped. The linear transformation $A$ maps the vectors of the source space onto vectors $(e_1,\dots,e_p)$, with the set of obtainable vectors $e$ comprising a mixing space, $E$. It may easily be shown that if the source space is a hyperparallelepiped then the mixing space is one too, though it is not necessarily rectangular.

In [11] we proposed a method to separate binary and multivalued signals. In the present case, if we have $p$ signals, each of which is $m$-valued, the source space is composed of $m^p$ points regularly distributed on the surface and within the rectangular hyperparallelepiped that comprises the source space. Denoting the possible values of each source as $0, 1, \dots, m-1$, the edges, $s_{gi}$, of the hyperparallelepiped source space are defined by the following combinations of sources (source vectors):

$$\begin{aligned}
s_{g1} &= (0,0,\dots,0,0),\ (0,0,\dots,0,1),\ (0,0,\dots,0,2),\ \dots,\ (0,0,\dots,0,m-1),\\
s_{g2} &= (0,0,\dots,0,0),\ (0,0,\dots,1,0),\ (0,0,\dots,2,0),\ \dots,\ (0,0,\dots,m-1,0),\\
&\;\;\vdots\\
s_{gp} &= (0,0,\dots,0,0),\ (1,0,\dots,0,0),\ (2,0,\dots,0,0),\ \dots,\ (m-1,0,\dots,0,0).
\end{aligned}\tag{1}$$

Also in [11], we showed that the images of these vectors under the transformation $A$ are found at the edges of the hyperparallelepiped cone that contains the mixing space. Furthermore, the coordinates of the images of the base vectors of space $E$ correspond exactly to the coefficients of matrix $A$. The procedure proposed in [11] obtains the vector images of the base, i.e. the mixing-space edge vectors, by detecting the $p$ mixing vectors that fulfil the condition that the vector with the greatest norm within space $E$, i.e. $A(m-1,\dots,m-1)^{\mathrm T}$, is a multiple of the sum of the images of the base vectors within that space:

$$A(m-1,\dots,m-1)^{\mathrm T} = (m-1)\sum_{i=1}^{p} a_i = (m-1)\,A(1,\dots,1)^{\mathrm T},\tag{2}$$

where $a_i$ denotes the $i$th column of $A$.

Subsequently, [9] extended the procedure to continuous sources, showing that the slopes of the edges of the hyperparallelepiped cone that contains the mixing space provide a matrix that is valid for the recovery of the original sources. In this study we propose different algorithms to obtain such slopes. One of these algorithms projects the edges onto the planes $(i,j)$ $(i, j \in \{1,\dots,p\})$ and obtains two coefficients for each plane $(i,j)$ by just detecting those points $(e_i, e_j)$ where $e_i/e_j$ presents a minimum.

Recently, [10,12] developed a method that consists of detecting one of the vertices of the hyperparallelepiped and obtaining any set of $p$ edge vectors incident on that vertex, termed critical vectors. The coordinates of these vectors, referred to the vertex, comprise a demixing matrix. To obtain such a set, the following cost function is used:

$$\sum_{i=1}^{p-1}\sum_{j=i+1}^{p}\cos(v_i \angle v_j) \;>\; \sum_{i=1}^{p-1}\sum_{j=i+1}^{p}\cos(w_i \angle w_j) \qquad \forall\, v_k \neq w_k,\; k \in \{1,\dots,p\},\tag{3}$$

in which the $w$ are the vectors at the edges and the $v$ are $p$ generic vectors within the hyperparallelepiped.

The present paper is intended to (1) describe in detail the neural implementation suggested in [10,12], together with the necessary preprocessing, (2) test its behaviour with regard to the noise introduced into the mixed signals, and (3) show the possibilities and drawbacks of the procedure using simulation results.

Throughout this paper, the following hypotheses are assumed:
1. The signals $s_j(t)$ $(j = 1,\dots,p)$ and the linear transformation are unknown.
2. The sources are bounded. This restriction is plausible because, in practice, physical signals (speech, radar, sonar, biomedical signals, etc.) are limited in amplitude.
3. The number of sensors is equal to the number of sources, $p$.
4. The observed (sensed or mixed) signals, $e = (e_i(t))^{\mathrm T}$, are obtained from the sources $s = (s_i(t))^{\mathrm T}$ by the linear transformation $A = (a_{ij})$, so that

$$e(t) = A\,s(t), \qquad e, s \in \mathbb{R}^p,\; A \in \mathbb{R}^{p\times p},\tag{4}$$

where $e(t)$ and $s(t)$ denote vectors of components $e_i(t)$ and $s_i(t)$, respectively, which enter Eq. (4) as column matrices; the linear transformation $A$ is an unknown $p \times p$ matrix (mixing matrix), in which it is assumed that all the elements of the main diagonal are non-zero:

$$a_{ii} \neq 0, \qquad \forall\, i \in \{1,\dots,p\}.\tag{5}$$

This condition is a reasonable one, as it indicates that sensor $i$, associated with source $i$, detects the signal from this source.

Eq. (4) models an instantaneous linear mixture. The problem of separation consists of retrieving the unknown sources $s(t)$ from the observations $e(t)$ alone. If, by some procedure, $A$ could be obtained, the original signals could be recovered by just computing the following expression:

$$s(t) = A^{-1} e(t).\tag{6}$$

It is impossible to determine $A$ exactly when only the mixed signals are known. However, the recovery of the signals in the form $c_i s_i$, $i \in \{1,\dots,p\}$, is also considered a valid solution to the problem; this recovery consists of multiplying the original signals by a constant $c_i$ representing an undefined, non-zero scale factor. Physically, this factor corresponds to an amplification or an attenuation. A solution is also considered valid if the indices are permuted with respect to the original sources. The recovered sources, identifiable up to the indicated indeterminations, are termed $y = (c_i s_i)^{\mathrm T}$ (transformed sources). Analytically, these indeterminations can be represented as follows:

$$y(t) = D P\, s(t),\tag{7}$$

where $P$ is a permutation matrix and $D$ is a diagonal matrix. Mathematically, the $y$ signals can be obtained from the mixed signals by means of a linear transformation $W$:

$$y(t) = W^{-1} e(t),\tag{8}$$

thus satisfying

$$W^{-1} A = D P \equiv Z.\tag{9}$$

Any matrix $W$ related to $A$ as in Eq. (9) is said to be similar to $A$, and is considered valid for obtaining the transformed sources, $y(t)$. If the original sources were obtained without a scale factor or index permutation, the matrix $Z$ defined in Eq. (9) would be the identity matrix, i.e. $Z = I$, where $I = (\delta_{ij})$ and $\delta_{ij}$ is the Kronecker delta:

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \neq j.\end{cases}\tag{10}$$
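To make these indeterminations concrete, the following NumPy sketch (the matrix values, permutation and scale factors are illustrative choices of ours, not taken from the paper) simulates the mixing model of Eq. (4) and verifies that a matrix built from permuted, rescaled columns of $A$ satisfies Eq. (9) and recovers the sources up to $DP$:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 3
A = np.array([[1.0, 0.4, 0.3],
              [0.4, 0.9, 0.2],
              [0.3, 0.2, 0.8]])            # mixing matrix, a_ii != 0 (Eq. (5))
s = rng.uniform(0.0, 1.0, size=(p, 1000))  # bounded sources
e = A @ s                                   # observations, Eq. (4)

# Any W whose columns are permuted, rescaled columns of A satisfies
# W^{-1} A = D P (Eq. (9)) and is therefore a valid demixing matrix.
P = np.eye(p)[:, [1, 2, 0]]                 # permutation matrix
D = np.diag([2.0, -0.5, 1.5])               # non-zero scale factors
W = A @ P @ D

y = np.linalg.solve(W, e)                   # y = W^{-1} e, Eq. (8)

# y recovers the sources up to scale and permutation: y = (P D)^{-1} s.
assert np.allclose(y, np.linalg.inv(P @ D) @ s)
```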

2. Foundations of the procedure

The algorithm proposed in this paper acts in real time, with time discretized as $t = 0T, T, 2T, \dots, nT$. The neural network input captures successive samples of observation vectors: $e(0), e(1), \dots, e(n)$. The set of all these possible vectors forms the observation space, or $E$ space. The vectors of the observation space are images of a source space, or $S$ space, which includes the region where source vectors can be generated.

Consider a set of orthogonal vectors of the source space, with the following coordinates:

$$u_j = (c_j \delta_{ij})^{\mathrm T}, \quad u_j \in \mathbb{R}^p, \quad u_j = (u_{ij})^{\mathrm T}, \quad i, j \in \{1,\dots,p\},\tag{11}$$

where $\delta_{ij}$ is defined in Eq. (10). The images of these vectors in the observation space are

$$w_j = (A u_j)^{\mathrm T}, \quad w_j \in \mathbb{R}^p, \quad w_j = (w_{ij})^{\mathrm T}, \quad i, j \in \{1,\dots,p\}.\tag{12}$$

Bearing in mind Eq. (4), the components of the vector images $w_j = (w_{ij})$ are

$$w_{ij} = \sum_{k=1}^{p} a_{ik} u_{kj} = a_{ij} c_j.\tag{13}$$

The $w_j$ image vectors of the orthogonal vectors of the space $S$ may be considered columns of a matrix $W \in \mathbb{R}^{p\times p}$, with components $w_{ij}$:

$$W = (w_{ij}) = A D_c, \qquad \text{where } D_c = (c_j \delta_{ij}).\tag{14}$$

That is, from Eq. (9), the vector images of the base of the space $S$ form a matrix $W$, similar to $A$, with $P = I$ and $D = D_c^{-1}$. The problem, therefore, is limited to that of locating these images.
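As a small numeric check of Eqs. (11)–(14) (with illustrative values of our own), the images $w_j = A u_j$ are exactly the columns of $A$, each scaled by the constant $c_j$:

```python
import numpy as np

p = 3
A = np.array([[1.0, 0.4, 0.3],
              [0.4, 0.9, 0.2],
              [0.3, 0.2, 0.8]])
c = np.array([2.0, 0.5, 1.5])     # arbitrary non-zero constants c_j

U = np.diag(c)                    # columns are the vectors u_j of Eq. (11)
W = A @ U                         # columns are the images w_j = A u_j, Eq. (12)

# Eq. (13): w_ij = a_ij * c_j, i.e. W = A D_c (Eq. (14));
# broadcasting scales column j of A by c_j.
assert np.allclose(W, A * c)
```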

3. Geometric interpretation

Let us suppose that the sources satisfy $s_j(t) \in \mathbb{R}^+$ $(j = 1,\dots,p)$ and, as they are bounded,

$$0 \le s_j \le S^M_j.\tag{15}$$

As a consequence of this hypothesis, the source space takes the form of a $p$-dimensional hyperparallelepiped (Fig. 1(a)). As the image vectors are obtained by a linear transformation (4), it may be deduced that the observation space, $E$, also forms a $p$-dimensional hyperparallelepiped. Fig. 1 shows the case in which $p = 3$. At each instant of time, $t$, an observation vector, $e(t)$, corresponds to the image obtained by the linear transformation, $A$, of a source vector $s(t)$. The exact form of the hyperparallelepiped comprising the observation space depends on the values of the $a_{ij}$ coefficients.

As proved in Section 2, due to the linearity of the transformation, the images of the base of space $S$, $\{u_1,\dots,u_j,\dots,u_p\}$, are vectors, $\{w_1,\dots,w_j,\dots,w_p\}$, located at the edges of the hyperparallelepiped cone containing space $E$ (see Fig. 1).

Fig. 1. (a) Source space (space $S$); (b) observation space (space $E$).

Therefore, the problem of source separation, under hypothesis (15), is limited to that of obtaining these image vectors. Each image vector of an orthogonal vector has $p$ components, which comprise the elements of a column of $W$. If, on one edge, another vector $w^*_j$ is obtained which is different from the image $w_j$, it is verified that

$$w^*_j = c\, w_j \;\Rightarrow\; w^*_{ij} = c\, w_{ij}, \qquad c \in \mathbb{R},\tag{16}$$

and so the effect of obtaining any vector corresponding to edge $j$, rather than the image of $u_j$, is that column $j$ of matrix $W$ is multiplied by a constant $c$. In other words, the obtained matrix $W$ is still similar to $A$, as Eq. (9) is still satisfied. The goal is then limited to the search for a set of $p$ vectors, each located on one of the edges of the hyperparallelepiped cone containing the observation space.

4. Procedure

According to the conclusion of Section 3, the problem of separation of sources, with hypothesis (15) verified, is reduced to one of obtaining vectors at the edges of the hyperparallelepiped cone that contains the observation space. These vectors are termed, for brevity, edge-vectors. Consequently, the goal of the learning algorithm is to identify edge-vectors among all the observed vectors.

The method proposed here is based on the fact, proved in [11], that $p$ observations located at the $p$ edges constitute a set of $p$ vectors with maximum angular separation within the observation space (Fig. 1(b)).

As the components of two vectors $e(u) = (e_i(u))^{\mathrm T}$ and $e(v) = (e_i(v))^{\mathrm T}$, obtained at instants $u$ and $v$, respectively, are known, it is possible to compute the cosine of the angle they form, or their angular proximity, $c_{uv}$, from the scalar product. That is, the following expression can be evaluated:

$$c_{uv} = \cos[e(u) \angle e(v)] = \frac{\sum_{i=1}^{p} e_i(u)\, e_i(v)}{|e(u)|\,|e(v)|}.\tag{17}$$

The angular proximity of two vectors is maximum, $+1$, when the two vectors coincide ($0°$ angle) and minimum, $-1$, when they are furthest apart ($180°$ angle).

Consider a set, $C$, of $p$ vectors. An angular proximity matrix, $C_C$, may be defined as follows:

$$C_C = (c_{uv}), \quad u, v \in C, \quad C_C \in \mathbb{R}^{p\times p},\tag{18}$$

in which the following properties are satisfied:

$$c_{uu} = 1, \quad c_{uv} = c_{vu}, \quad u, v \in \{1,\dots,p\}.\tag{19}$$

The sum of the proximities of a vector $e_k$ of set $C$ with the remaining vectors of the set can be obtained from the following expression:

$$C_C(k) = \sum_{u=1,\, u \neq k}^{p} c_{ku}.\tag{20}$$

As the $p$ vectors $W = \{w_1,\dots,w_p\}$ at the edges of the hyperparallelepiped are the $p$ vectors of the observation space with the greatest possible separation between them (Fig. 1(b)), they can be located as the set whose proximity $c_C$ takes the minimum value; that is,

$$c_W \le c_C.\tag{21}$$
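The following batch sketch evaluates the proximities of Eqs. (17)–(20) on toy data and selects the $p$ most separated observations per Eq. (21); the paper's network performs the equivalent search adaptively, sample by sample, so the exhaustive pair search below is for illustration only:

```python
import numpy as np
from itertools import combinations

def proximity(u, v):
    """Angular proximity c_uv of Eq. (17): cosine of the angle between u and v."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def total_proximity(vectors):
    """Sum of the pairwise proximities of a set of vectors (cf. Eqs. (18)-(20))."""
    return sum(proximity(u, v) for u, v in combinations(vectors, 2))

# Toy data (p = 2), already translated so that the vertex sits at the origin.
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.4],
              [0.3, 0.9]])
E = A @ rng.uniform(0.0, 1.0, size=(2, 200))          # observation space
E = E[:, np.linalg.norm(E, axis=0) > 1e-6]            # drop near-zero vectors

# Eq. (21): the edge-vectors are the p observations with the smallest total
# proximity, i.e. the greatest angular separation.
best = min(combinations(range(E.shape[1]), 2),
           key=lambda idx: total_proximity([E[:, i] for i in idx]))
W_est = E[:, list(best)]
print(W_est / np.linalg.norm(W_est, axis=0))          # compare with A, column-normalized
```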

This section has so far considered the hypothesis that the lower bound of the sources is 0, that is, $s_j(t) \in \mathbb{R}^+$ $(j = 1,\dots,p)$. It is clear, from a geometric point of view and bearing in mind the linearity of $A$, that the more general case of signals with non-zero lower bounds, i.e. $s_i(t) \in [S^m_i, S^M_i]$ with $S^m_i, S^M_i \in \mathbb{R}$, $S^m_i < S^M_i$, is equivalent to the previous situation after a coordinate change with a vector $v$, where $v$ represents the coordinates of one of the vertices of the hyperparallelepiped. This coordinate change is equivalent to translating the hyperparallelepiped so that vertex $v$ is located at the origin. Thus the problem is transformed into the $s_i \in \mathbb{R}^+$ case. A vertex can be obtained, for example, by detecting a vector such that the sum of its components is minimum; i.e. $v_{e0} = (e_1,\dots,e_p)$ is a vertex if $e_1 + \dots + e_p$ is a minimum.


5. Critical vectors

From the above section it may be concluded that, in order to identify the medium and perform source separation, it is necessary to determine:
1. a vertex, $v_{ek}$, of the hyperparallelepiped, so that the change of coordinates can be carried out, and
2. $p$ vectors, one on each of the $p$ edges converging at the translation vertex.

These conditions imply the generation of certain source combinations, without which the proposed procedure would be unable to identify the medium. As the vertices of the observation space correspond to vertices of the source space, to obtain a vertex of the hyperparallelepiped of observations it is necessary for the sources to generate a vector such that all its components are extremes of their respective bounds. For example, for $p = 3$, one of the following combinations would have to exist, corresponding to the coordinates of the eight vertices:

$$\begin{aligned}
v_{S0} &= (S^m_1, S^m_2, S^m_3), & v_{S4} &= (S^M_1, S^m_2, S^m_3),\\
v_{S1} &= (S^m_1, S^m_2, S^M_3), & v_{S5} &= (S^M_1, S^m_2, S^M_3),\\
v_{S2} &= (S^m_1, S^M_2, S^m_3), & v_{S6} &= (S^M_1, S^M_2, S^m_3),\\
v_{S3} &= (S^m_1, S^M_2, S^M_3), & v_{S7} &= (S^M_1, S^M_2, S^M_3).
\end{aligned}$$

The conditions for obtaining edge vectors are less restrictive, as any one of the vectors of each of the edges is sufficient. Thus, if the translation is performed with $v_{S4}$, it is only necessary to generate three vectors of the form $(s_1, S^m_2, S^m_3)$, $(S^M_1, s_2, S^m_3)$ and $(S^M_1, S^m_2, s_3)$, where $s_i$ can take any value, $s_i \in [S^m_i, S^M_i]$. The set formed by these three vectors and the vertex is termed the critical vectors. For $p$ sources there will be $p+1$ critical vectors which must be generated by the sources. Note that there may be a statistical dependence between the sources preventing the generation of the critical vectors, in which case the proposed procedure could be invalidated. Nevertheless, there are a great many situations, even involving statistical dependencies, some of which are illustrated by the examples in Section 8, where all the required assumptions are satisfied and the procedure is thus valid.

6. Neural network architectures

The procedure consists of two algorithms which may be mapped onto two neural networks acting simultaneously. One of these recursively demixes the signals to obtain the transformed signals $y(t)$; the other network adaptively obtains, by unsupervised learning, the matrix $W$, and so identifies the linear medium. Throughout the process, $W$ is adapted over time, $W = W(t)$, gradually approaching a matrix similar to $A$. The elements of matrix $W$ represent the weights of the two networks.

The learning network, as conceived, obtains the normalized weight vectors, $w_j = (w_{ij})$. This does not present any problem, as it means that the elements of each column of matrix $W$ are divided by a constant value $|w_j|$; as the relation between $A$ and $W$ described by expression (9) is maintained, $W$ is still equivalent to $A$.

6.1. Demixing network

Expression (4) may be rewritten as follows:

$$e_i(t) = \sum_{j=1}^{p} w_{ij}(t)\, y_j(t) = w_{ii}(t)\, y_i(t) + \sum_{j=1,\, j \neq i}^{p} w_{ij}(t)\, y_j(t), \qquad \forall\, i \in \{1,\dots,p\},\tag{22}$$

from which it is possible to recursively obtain one of the sources, $y_i(t)$, when the others, $y_j(t)$, are known:

$$y_i(t) = \frac{e_i(t)}{w_{ii}(t)} - \sum_{j=1,\, j \neq i}^{p} \frac{w_{ij}(t)}{w_{ii}(t)}\, y_j(t), \qquad \forall\, i \in \{1,\dots,p\},\; j \neq i.\tag{23}$$

Expression (23) can be mapped onto a recursive neural network, as shown in Fig. 2, with $p$ separation neurons ($NS_1,\dots,NS_p$). The inputs, $e_i(t)$, are the sensed signals, and the outputs, $y_i(t)$, the reconstructed signals. Note that, with this recursive network, it is not necessary to explicitly calculate $W^{-1}$ to obtain the sources, and that it is necessary to verify $w_{ii} \neq 0$, $i = 1,\dots,p$, as indicated in Hypothesis 4 in Section 1. In Section 9.2 we present some results showing how the network converges.
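A minimal sketch of the recursion of Eq. (23) for a single sample (the function name and stopping logic are ours); the dominant diagonal imposed on $A$ in Section 7 favours the convergence of this Gauss–Seidel-style iteration:

```python
import numpy as np

def demix_sample(e, W, eps=1e-4, max_iter=100):
    """Recover y from e = W y by the fixed-point iteration of Eq. (23)."""
    p = len(e)
    y = e.astype(float).copy()        # initial value y^0(t) = e(t) (Section 7)
    for _ in range(max_iter):
        y_prev = y.copy()
        for i in range(p):
            others = sum(W[i, j] * y[j] for j in range(p) if j != i)
            y[i] = (e[i] - others) / W[i, i]          # Eq. (23); requires w_ii != 0
        if np.max(np.abs(y - y_prev)) < eps:          # convergence parameter eps
            break
    return y
```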


Fig. 2. Recursive network for source recovery.

Fig. 3. Network for weight learning.

6.2. Learning network

In order to simplify the notation, in this section the dependence on time is not explicitly included in the variables. To facilitate the obtaining of the weights, the algorithm first normalizes the input vectors so that $\sum_{i=1}^{p} e^{*2}_i = 1$. To achieve this, the following transformation is carried out at the input:

$$e^*_i \leftarrow \frac{e_i}{|e|}, \qquad \forall\, i \in \{1,\dots,p\}.\tag{24}$$

When the vectors have been translated and the normalization has been carried out, the adaptive and unsupervised obtaining of weights may be represented by a network as shown in Fig. 3. The outputs, $c_{01},\dots,c_{0k},\dots,c_{0p}$, of the learning neurons ($NL_1,\dots,NL_p$) of the second layer are the weighted sums of the inputs, i.e.,

$$c_{01} = \sum_{i=1}^{p} w_{i1} e^*_i, \quad \dots, \quad c_{0k} = \sum_{i=1}^{p} w_{ik} e^*_i, \quad \dots, \quad c_{0p} = \sum_{i=1}^{p} w_{ip} e^*_i.\tag{25}$$

Thus each output, $c_{0k}$, represents the scalar product of the normalized input vector, $e^*$, with the vector $w_k$. In other words, as in Eq. (17), $c_{0k}$ is the angular proximity of the new observation vector to edge $k$ of the hyperparallelepiped cone.

The computing element of the output ($NL_0$) generates a signal, $M$, to modify the weights, and is characterized by a threshold matrix, $C$. Its elements are the proximities between the edge-vectors (18), i.e.,

$$C_W = (c_{jk}), \qquad j, k \in \{1,\dots,p\},\tag{26}$$

where $c_{jk}$ indicates the proximity between the vectors $w_j$ and $w_k$.

Based on the above matrix, a threshold associated with each vector $w_k$ is defined, representing its proximity to the other vectors $w_j$, as in Eq. (20):

$$C_W(k) = \sum_{i=1,\, i \neq k}^{p} c_{ki}.\tag{27}$$

The output element, $NL_0$, performs three functions, which are usual in various neural network models:
1. It detects the winning neuron, $NL_k$, of the second layer; that is, the neuron $k$ with the greatest output, $c_{0k}$.
2. It sums its inputs, i.e. the outputs of the neurons of the second layer, except that of the winner ($NL_k$):

$$C_{e0k} = c_{01} + \dots + c_{0,k-1} + c_{0,k+1} + \dots + c_{0p}.\tag{28}$$

3. If this sum falls below the threshold $C_W(k)$, it updates the threshold matrix and generates an output value $M = 1$, as in this situation the weights must be modified:

$$\text{iff } C_{e0k} < C_W(k): \quad c_{ki} \leftarrow c_{0i}, \quad c_{ik} \leftarrow c_{ki}, \quad M \leftarrow 1.\tag{29}$$

Weight modification is carried out according to the following criterion:

$$\text{iff } M = 1: \quad w_{1k} \leftarrow e^*_1, \quad \dots, \quad w_{jk} \leftarrow e^*_j, \quad \dots, \quad w_{pk} \leftarrow e^*_p.\tag{30}$$

As the vector $e^*$ is normalized, the $w_k$ vectors are normalized too. Summarizing, the network replaces the edge-vector closest to the input vector with that input vector, if doing so results in a lower proximity value for the new set of edge-vectors.
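One learning step of Eqs. (25)–(30) may be sketched as follows (the names and data layout are ours; the paper realizes this step with the two-layer network of Fig. 3 rather than with explicit matrix code):

```python
import numpy as np

def learning_step(e_star, W, C):
    """One step of the learning network: e_star is the translated, normalized
    observation; W holds the current edge-vector estimates as columns; C is
    the threshold matrix of pairwise proximities (Eq. (26))."""
    c0 = W.T @ e_star                     # outputs c_0k = <w_k, e*>, Eq. (25)
    k = int(np.argmax(c0))                # winning neuron: closest edge-vector
    c_sum = c0.sum() - c0[k]              # sum excluding the winner, Eq. (28)
    threshold = C[k].sum() - C[k, k]      # C_W(k), Eq. (27)
    if c_sum < threshold:                 # Eq. (29): total proximity decreases
        C[k, :] = c0                      # update row/column k of the thresholds
        C[:, k] = c0
        C[k, k] = 1.0                     # self-proximity of the new w_k
        W[:, k] = e_star                  # Eq. (30): substitute w_k by e*
        return W, C, 1                    # M = 1: weights modified
    return W, C, 0                        # M = 0: no change
```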

Note that, according to the topology illustrated in Fig. 3, each neuron performs only local operations. The second-layer neurons ($NL_k$) just carry out the weighted summing of their inputs, while the output-layer neuron determines the maximum value of its inputs, obtains their sum and compares it with the threshold values. These values may then be modified, taking into account only the input data of this neuron. The output signal of the network ($M \in \{1,0\}$) acts on all the connections between layers 1 and 2, indicating whether or not the weights corresponding to the winner should be changed for the input signals that locally cross these connections.

Fig. 4. A geometric interpretation of weight learning (the dotted zone corresponds to the current observation space).

Fig. 4 illustrates a geometric interpretation of the procedure. The extremes of the weight and input vectors, being normalized, are distributed over the surface of a $p$-dimensional hypersphere of unit radius. Fig. 4 assumes $p = 3$, so there are three weight vectors, $w_1$, $w_2$ and $w_3$; $C_1$ represents the proximity of $w_1$ to $w_2$ and $w_3$, $C_2$ represents the proximity of $w_2$ to $w_1$ and $w_3$, while $C_3$ represents the proximity of $w_3$ to $w_1$ and $w_2$. When a new input stimulus, $e$, is analysed, the network determines the closest weight vector ($w_3$ in Fig. 4(a), and $w_2$ in Fig. 4(b)) and replaces that weight vector with $e$, if by this procedure a lower proximity value is obtained. In Fig. 4(a), a lower proximity value is derived with $e$ ($c_{01} + c_{02} < C_3$), and so $w_3$ is substituted by $e$. However, this situation does not arise in Fig. 4(b), as substituting $e$ for $w_2$ does not lead to a lower proximity value ($c_{01} + c_{03} > C_2$), and so the $w_2$ vector is not updated.


Fig. 5. (a) When a vertex is incorrect, a greater error is produced by vectors with a smaller norm; (b) if the noise is bounded, it will translate both the vertex and the points of the edges, and so the influence of the error is compensated.

The geometric interpretation described bears some resemblance to the way in which competitive learning is interpreted geometrically [13]. In the present case, as in competitive learning, each weight vector represents a cluster and, whenever a stimulus vector, $e$, is presented, the unit whose weight vector is closest to that stimulus vector on the hypersphere wins the competition. In the rule of competitive learning, whenever a unit wins the competition, its weight vector is moved toward the current stimulus, while in the procedure proposed here the weight vector is substituted by the stimulus vector if this increases the distance between the weights.

7. Implementation

We have developed a simulation environment to test the new procedure, using the Matlab package. This simulator functions in three phases:
1. Mixing. It captures or generates the sources and performs the mixes.
2. Separation. The neural network captures the mixing vectors one by one and, between two successive operations, adapts the network weights (if necessary) and generates the reconstructed sources ($y$). Three stages can be identified:

2.1. Preprocessing. Preprocessing, in addition to adaptively obtaining the vertex $v_{e0}$ and normalizing the input vectors to the learning network, reduces the effects of the inexact location of the vertex and of the noise in the mixed signals (a sketch of the resulting candidate test is given after this list). As shown in Fig. 5, which depicts two consecutive edges $(i, j)$ in the plane containing them, the detection of a vertex distinct from the theoretical one means that the edge coordinates (and thus the elements of the corresponding column of matrix $W$) are erroneous. It can easily be shown that the greater the modulus of the vector detected as weight vector, the smaller this error. The same is true for the noise in the mixed signals. Assuming that the noise is additive and bounded, a noise-corrupted edge vector $e^{(i)(c)}$ is displaced within an interval, as shown in Fig. 5(b). Obviously, the noise will also affect vertex $v_0$ in the figure, displacing it from its original position to that represented by $v_0^{(c)}$. As the vectors considered by the networks are referred to the vertex, the model may be described by the following equation:

$$w^{(i)(c)} = e^{(i)(c)} - v_0^{(c)} = (e^{(i)} + n_i) - (v_0 + n_0) = w^{(i)} + (n_i - n_0),\tag{31}$$

where $n_i$ and $n_0$ represent the noise at points $e^{(i)}$ and $v_0$, respectively.

If the noise is bounded and uniform (i.e. with equal effect on all the points in the medium) then, as extreme values of both the vertex and the edge vector are detected, the noise tends to be compensated, $n_i \approx n_0$. From Eq. (31), in this case, the weight obtained would be correct, $w^{(i)(c)} \approx w^{(i)}$. Obviously, however, more time is required to locate extreme values in the presence of noise. From Eq. (31), furthermore, the effect of noise diminishes as the norm of $w^{(i)}$ increases with respect to $n_i - n_0$. In our implementation, in addition to the coordinate-change vertex, $v_0$, the opposite vertex, $v_1$, is also obtained. Indeed, in order to consider an observation as a candidate edge vector, the following value is used:

$$\bar{n}_r = |v_1 - v_0|/2.\tag{32}$$

As the vertices are adaptively obtained, they are close together at the start of the process. To compensate for this effect, only those observations, $e(t)$, whose norms are greater than the following value are considered candidate edge vectors:

$$\bar{n}_{test} = (\bar{n}_0 - \bar{n}_r)\, e^{-t/\tau} + \bar{n}_r,\tag{33}$$

where $\bar{n}_0$ and $\tau$ are two parameters representing the norm of the initial test and a time constant. Section 9.3 presents the results of demixing simulations with different levels of noise.

2.2. Learning. The network described in Section 6.2 is implemented.

2.3. Demixing. The equations represented by Eq. (23) are applied recursively for each value of $t$. The indetermination produced by index permutation (Section 1) prevents the correct application of expression (23), as in every iteration each $e_i$ must be associated with the corresponding (non-permuted) coefficients $w_{ii}$ and $w_{ij}$ ($j = 1,\dots,p$). To overcome this problem, $|a_{ii}| > |a_{ij}|$ for all $i, j \in \{1,\dots,p\}$ is always imposed on the mixing matrices $A$. Thus the simulator, at the instant when it obtains the signals $y(t)$, reorders the matrix $W(t)$ so that the element with the highest absolute value in each column lies on the principal diagonal. Note that this restriction is applied not in order to obtain the demixing matrix, $W$, but to recover the sources and to be able to eliminate one of the indeterminations typically encountered when resolving the problem of source separation (Section 1). Furthermore, the imposed condition physically reflects the fact that each sensor $i$ is nearer or more sensitive to its corresponding source $i$ (coefficient $a_{ii}$) than to the others ($j \neq i$).

In the iterative application of expression (23), $y^0(t) = e(t)$ is taken as the initial value for each value of $t$. The iteration ends when the difference between the values of $y$ in two successive iterations is less than the convergence parameter $\varepsilon$. Obviously, the smaller the value of the convergence parameter, the more iterations must be performed to accept the value obtained. Section 9.2 presents results of network convergence for different values of $\varepsilon$.

3. Graphic representations. This phase has a menu from which it is possible to display the different signals acting in the process, $s(t)$, $e(t)$ and $y(t)$, as functions of time or as vectors of the $S$, $E$ and $Y$ spaces, respectively. There are also options to visualize the variation over time of the performance index, mean error, quadratic error, and crosstalk.
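The candidate test of stage 2.1 (Eqs. (32) and (33)) may be sketched as follows; the function names are ours, and the default values $\bar{n}_0 = 5$ and $\tau = 10\,000$ are those reported in Table 2:

```python
import numpy as np

def norm_test(t, v0, v1, n_0=5.0, tau=10_000.0):
    """Decaying norm threshold of Eq. (33), with n_r = |v1 - v0|/2 (Eq. (32))."""
    n_r = np.linalg.norm(v1 - v0) / 2.0
    return (n_0 - n_r) * np.exp(-t / tau) + n_r

def is_candidate(e, t, v0, v1):
    """Accept e(t) as a candidate edge-vector only while its (translated)
    norm exceeds the current threshold."""
    return np.linalg.norm(e - v0) > norm_test(t, v0, v1)
```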

8. Some examples of simulation

This section presents results obtained from experiments to show the potential and limitations of the proposed procedure.

The first example (Example 1, Fig. 6) shows the results of the separation of five synthetic signals: a sine wave, a triangular wave, and the following three synthetic signals suggested by Amari et al. [1]:

$$\begin{aligned}
s_3 &= n(t),\\
s_4 &= 5 \sin\!\left(\frac{2\pi}{100}t\right)\cos\!\left(\frac{2\pi}{800}t\right),\\
s_5 &= 10\,\mathrm{sign}\!\left[\sin\!\left(\frac{2\pi}{40}t + 9\cos\frac{2\pi}{400}t\right)\right],
\end{aligned}\tag{34}$$

where $n(t)$ is a source of random noise uniformly distributed in the range $[0,10]$.
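A sketch generating the sources of Eq. (34); the amplitudes and periods of the sine and triangular sources $s_1$ and $s_2$ are not specified in the paper, so the values used below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(20_000, dtype=float)

s1 = np.sin(2 * np.pi * t / 200)                 # sine wave (assumed period)
s2 = 2 * np.abs((t / 250) % 2 - 1) - 1           # triangular wave (assumed period)
s3 = rng.uniform(0.0, 10.0, size=t.size)         # n(t), uniform on [0, 10]
s4 = 5 * np.sin(2 * np.pi * t / 100) * np.cos(2 * np.pi * t / 800)
s5 = 10 * np.sign(np.sin(2 * np.pi * t / 40 + 9 * np.cos(2 * np.pi * t / 400)))

S = np.vstack([s1, s2, s3, s4, s5])              # 5 x T matrix of sources
```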


Fig. 6. Example 1. (a) Original signals; (b) mixed signals; (c) separated signals during the learning process.

The original matrix $A$ is the following:

$$A = \begin{pmatrix}
1.00 & 0.48 & 0.52 & 0.38 & 0.42\\
0.47 & 0.90 & 0.53 & 0.45 & 0.39\\
0.46 & 0.51 & 0.85 & 0.39 & 0.43\\
0.47 & 0.52 & 0.47 & 0.95 & 0.38\\
0.38 & 0.46 & 0.54 & 0.50 & 0.93
\end{pmatrix}.\tag{35}$$

This matrix, when normalized by columns, is

$$A_N = \begin{pmatrix}
0.75 & 0.36 & 0.39 & 0.30 & 0.34\\
0.35 & 0.67 & 0.40 & 0.35 & 0.32\\
0.34 & 0.38 & 0.64 & 0.30 & 0.35\\
0.35 & 0.39 & 0.35 & 0.74 & 0.31\\
0.28 & 0.34 & 0.40 & 0.39 & 0.75
\end{pmatrix}.\tag{36}$$

After reordering, the following weight matrix is obtained:

$$W = \begin{pmatrix}
0.72 & 0.37 & 0.39 & 0.31 & 0.36\\
0.36 & 0.66 & 0.40 & 0.36 & 0.32\\
0.36 & 0.38 & 0.63 & 0.32 & 0.35\\
0.35 & 0.39 & 0.35 & 0.71 & 0.31\\
0.31 & 0.36 & 0.42 & 0.41 & 0.74
\end{pmatrix}.\tag{37}$$

Fig. 6 includes representations of the original sources (Fig. 6(a)), mixed sources (Fig. 6(b)) and separated sources (Fig. 6(c)). Fig. 7 shows the variation over time of a performance index (PI-1), discussed in Section 9.1. This index measures the quality of the separation, which rises as the index value approaches zero. In the present case, the performance index falls until, at $t = 19\,000$, it reaches the value PI-1$(19\,000) = 0.5521$.

Fig. 7. Example 1. Evolution of the performance index during the learning process.

Example 2 (Fig. 8) shows the results of the separation of two statistically dependent signals, whose source vectors lie on a circumference:

$$s_1 = 5\cos(2\pi n(t)), \qquad s_2 = 5\sin(2\pi n(t)),\tag{38}$$

where $n(t)$ is a source of random numbers uniformly distributed in the range $[0,1]$. The mixing matrix, $A$, when normalized by columns, is as follows:

$$A_N = \begin{pmatrix} -0.8077 & 0.3304\\ -0.5896 & 0.9439 \end{pmatrix}.\tag{39}$$

The performance index at $t = 499$ gives the value PI-1$(499) = 2.291 \times 10^{-4}$, producing the following weight matrix:

$$W = \begin{pmatrix} 0.3303 & -0.8077\\ 0.9439 & -0.5896 \end{pmatrix}.\tag{40}$$

Note that the columns of $W$ obtained by the network are in a different order from those of the original mixing matrix, $A$.

Fig. 8 includes representations of the original sources (Fig. 8(a)), a two-dimensional representation of the vectors of the source space (Fig. 8(b)) and of the mixed space (Fig. 8(c)), together with the vectors of the recovered sources (Fig. 8(d)).

The third example (Example 3) corresponds to three real signals: the Spanish words 'cuerpo' ('body') ($s_1$), 'mano' ('hand') ($s_2$) and 'muñeca' ('doll') ($s_3$). These sources were captured by a 14-bit A/D converter at a sampling frequency of $f_s = 8$ kHz, with a signal-to-noise ratio of 24 dB. Now the mixing matrix, $A$, when normalized by columns, is

$$A_N = \begin{pmatrix}
0.8355 & 0.3932 & 0.4175\\
0.3927 & 0.8191 & 0.4255\\
0.3843 & 0.4177 & 0.8029
\end{pmatrix}.\tag{41}$$

The performance index at $t = 11\,000$, equivalent to a time of 1.375 s, gives the value PI-1$(11\,000) = 0.6901$, and the resulting weight matrix is as follows:

$$W = \begin{pmatrix}
0.4021 & 0.4743 & 0.8287\\
0.7865 & 0.4466 & 0.3952\\
0.4687 & 0.7587 & 0.3964
\end{pmatrix}.\tag{42}$$


Fig. 8. Example 2. (a) Original signals; (b) source space; (c) observation space; and (d) separated-signal space.

Fig. 9 includes representations of the original sources (Fig. 9(a)) and separated sources (Fig. 9(b)). Fig. 10 shows the variation over time of the performance index (PI-1), the quadratic error and the crosstalk.

9. Discussion of the results obtained and the features of the procedure

9.1. Performance indices

Amari et al. [1] have proposed a performance index (PI-1) which makes it possible to assess the quality of source separation with a single value, taking into account the indeterminations stated in Section 1. This index is obtained from the following expression:

$$\text{PI-1} = \sum_{i=1}^{p}\left(\sum_{j=1}^{p} \frac{|z_{ij}|}{\max_k |z_{ik}|} - 1\right) + \sum_{j=1}^{p}\left(\sum_{i=1}^{p} \frac{|z_{ij}|}{\max_k |z_{kj}|} - 1\right),\tag{43}$$

where $Z = (z_{ij})$ is defined in Eq. (9). The lower the value of the index, the better the performance; separation is correct when PI-1 $= 0$ ($W$ similar to $A$).
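Eq. (43) translates directly into code; the following sketch computes PI-1 from the estimated $W$ and the true $A$:

```python
import numpy as np

def performance_index(W, A):
    """PI-1 of Eq. (43): 0 when W is similar to A (perfect separation)."""
    Z = np.abs(np.linalg.solve(W, A))                 # |z_ij|, Z = W^{-1} A
    rows = (Z / Z.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (Z / Z.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return float(rows.sum() + cols.sum())
```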


Fig. 9. Example 3. (a) Original signals; (b) separated signals during the early learning process.

Fig. 10. Example 3. Evolution, during the learning process, of (a) the performance index; (b) the quadratic error; and (c) the crosstalk.

One advantage of this index is that it does not require matrix $W$ to be reordered or normalized.

Another performance index used in the present study (PI-2) represents the mean quadratic error of the elements of matrix $W$, reordered, with respect to the elements of $A$.

9.2. Convergence

To study how the convergence of the iterative signal-reconstruction process depends on the convergence parameter $\varepsilon$, Example 3 (real signals) was run with different values of this parameter. The results are summarized in Table 1. Fig. 11 shows a histogram of the number of iterations obtained for the different time intervals in the case $\varepsilon = 0.0001$. Similar representations were obtained for all the examples. The only convergence problems arose in a few cases during the initial time periods ($t < 1000$), when the network was still far from having obtained the approximate weights (PI-1 $> 4$).


Fig. 11. Example 3. Histogram of the number of iterations for the reconstruction of signals.

Table 1
Number of iterations

$\varepsilon$   Mean value
0.1        6.35
0.01       9.07
0.001      11.36
0.0001     13.76
0.00001    20.79

9.3. Noise

In order to test the behaviour of the network with respect to noise, together with the concepts introduced in Section 7, various demixing simulations were performed using the three signals described in expression (34). Figs. 12(a) and 12(b) represent the variation of the performance indices during the learning process, without noise ($N = 0$) and with the addition of different levels of noise to the mixed signals, producing corrupted mixtures with signal-to-noise ratios of 20, 14, 10 and 8 dB. In every case, the noise was modelled as a random signal uniformly distributed in the interval $[-N_0, N_0]$, where $N_0$ is determined by the required SNR.

The results obtained confirm that the adaptive translation of the vertex compensates for the added noise, while the tendency of the curves in Fig. 12 shows that, given more time, performance indices similar to those obtained with noise-free signals can be achieved. Table 2 shows the translations automatically obtained for $t > 1000$ in the above-mentioned cases.
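The noise model may be sketched as follows; the paper does not state how $N_0$ is calibrated from the required SNR, so the power-based choice below is one plausible assumption:

```python
import numpy as np

def add_uniform_noise(E, snr_db, rng=None):
    """Add noise uniform on [-N_0, N_0] chosen so that the mixture reaches
    the required SNR (10*log10(P_signal/P_noise) = snr_db)."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(E ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    n0 = np.sqrt(3 * p_noise)          # a U(-N_0, N_0) variable has variance N_0^2/3
    return E + rng.uniform(-n0, n0, size=E.shape)
```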

Fig. 12. Evolution of the performance indices during the learning process with different noise levels in the mixed signals: (a) performance index PI-1; and (b) performance index PI-2.

Table 2
Translations in the mixed signals (Section 9.3): $t$ values at which translations occurred (in all cases $\bar{n}_0 = 5$, $\tau = 10\,000$)

$N = 0$    SNR = 20 dB    SNR = 14 dB    SNR = 10 dB    SNR = 8 dB
1009       1009           1012           1007           19318
6610       4210           3227           1011           7009
7010       11810          4967           14611          6610

10. Conclusions

This paper presents a neural network approach for the blind separation of linear mixtures of sources. Two networks are used, one for weight learning (identified with the coefficients of the demixing matrix) and the other to recover the original

signals. The learning network acts in an unsupervised way, and recovery is performed recursively. Separation and weight learning are performed simultaneously, i.e. there are no differentiated phases of learning and demixing.

The most important limitations of the procedure lie in the fact that it may only be applied to linear media; furthermore, for good source separation it is necessary to obtain a vertex and $p$ vectors that map onto the $p$ edges of the observation space. This latter limitation has two consequences: convergence depends on the probability of there existing points close to the hyperparallelepiped edges, and the algorithm is sensitive to noise, which produces a degree of error in the estimation of the mixing matrix. The former consequence is not a significant one, as in practice, with real signals such as voice, biomedical, sonar or image signals, combinations that produce the required vector mixes can be obtained within acceptable times. Thus, Section 8 (Example 3) shows that, for voice signals, an adequate demixing matrix is obtained in less than 2 s. As concerns the noise effect, we have shown that if it is bounded and uniform throughout the medium, the only consequence is the greater time taken to obtain the desired performance, as the coefficients are derived from differences in the observations, which over time compensate for the effect. Thus Section 9.3 shows that, for synthetic test signals, adding a bounded white noise to the mixed signals to obtain an SNR of 8 dB, the mean quadratic error of the coefficients of the demixing matrix with respect to the mixing matrix is less than 0.015 after 17 000 samples have been processed (see Fig. 12(b)).


Another characteristic of the procedure is its applicability to statistically dependent signals, provided such dependence does not prevent the generation of signal vectors that map onto the $p$ hyperparallelepiped edges converging at the translation vertex (see Section 8, Example 2). The paper also presents experimental results with voice signals that show the acceptable convergence of the recursive neural network used to reconstruct the original sources. We consider that the approach proposed here presents a new perspective on solving the problem of the separation of sources.

Acknowledgements

We are grateful for the comments and suggestions of Manuel Rodríguez-Álvarez, Pedro Martín Smith and Julio Ortega, and also to the referees for their constructive remarks.

References

[1] S. Amari, A. Cichocki, H.H. Yang, A new learning algorithm for blind signal separation, in: D.S. Touretzky, M.C. Mozer, M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems, Vol. 8, MIT Press, Cambridge, MA, 1996, pp. 757–763.

[2] A.J. Bell, T.J. Sejnowski, An information-maximisation approach to blind separation and blind deconvolution, Neural Comput. 7 (1995) 1129–1159.

[3] A. Cichocki, R. Unbehauen, Robust neural networks with on-line learning for blind identification and blind separation of sources, IEEE Trans. Circuits Systems I 43 (November 1996) 894–906.

[4] P. Comon, Independent component analysis, a new concept?, Signal Processing 36 (1994) 287–314.

[5] C. Jutten, J. Herault, P. Comon, E. Sorouchiary, Blind separation of sources, Parts I, II and III, Signal Processing 24 (July 1991) 1–29.

[6] C. Jutten, From source separation to independent component analysis, in: Proc. European Symp. on Artificial Neural Networks (ESANN'97), Bruges, 16–18 April 1997, D facto, Brussels, 1997, pp. 243–248.

[7] J. Karhunen, Neural approaches to independent component analysis and source separation, in: Proc. European Symp. on Artificial Neural Networks (ESANN'96), Bruges, April 1996, D facto, Brussels, 1996, pp. 249–266.

[8] K. Matsuoka, M. Ohya, M. Kawamoto, A neural net for blind separation of nonstationary signals, Neural Networks 8 (3) (1995) 411–419.

[9] C. Puntonet, A. Prieto, Geometric approach for blind separation of signals, Electron. Lett. 33 (10) (1997) 835–836.

[10] C. Puntonet, A. Prieto, Neural net approach for blind separation of sources based on geometric properties, Neurocomputing 18 (1997), to appear.

[11] C.G. Puntonet, A. Prieto, C. Jutten, M. Rodríguez-Álvarez, J. Ortega, Separation of sources: A geometry-based procedure for reconstruction of n-valued signals, Signal Processing 46 (3) (1995) 267–284.

[12] A. Prieto, C.G. Puntonet, B. Prieto, M. Rodríguez-Álvarez, A competitive neural network for blind separation of sources based on geometric properties, in: Internat. Work-Conf. on Artificial and Natural Neural Networks (IWANN'97), Lanzarote, Spain, 4–7 June 1997, Lecture Notes in Computer Science, Vol. 1240, Springer, Berlin, pp. 1095–1106.

[13] D.E. Rumelhart, D. Zipser, Feature discovery by competitive learning, in: D.E. Rumelhart, J.L. McClelland (Eds.), Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, MA, 1986, pp. 151–193.
