
A neural network approach to source localization

Ben Zion Steinberg, Mark J. Beran,a) Steven H. Chin, and James H. Howard, Jr.
The Catholic University of America, Washington, DC 20064

a) Permanent address: Tel Aviv University, Israel.

(Received 9 August 1990; revised 25 April 1991; accepted 3 May 1991)

The use of neural network techniques to localize an acoustic point source in a homogeneous medium is demonstrated. The input data are the cosines of the phase difference measurements at an array with N detectors. Only the most fundamental types of neural network systems will be considered. Use will be made of linear and sigmoid-type neurons in a single-layer network. The performance of the single-layer network is very satisfactory for a wide range of configuration parameters if the resolution and sampling conditions are satisfied. Once the parameters of the neural network are determined, the computational effort to determine a new source location is minimal. However, when a source/detector configuration is considered that does not satisfy the resolution and sampling conditions, the single-layer network will not consistently perform well.

PACS numbers: 43.30.Wi, 43.60.Lq

INTRODUCTION

Neural network (NN) systems constitute a new, rapidly growing area of research. A specific NN technique will be applied to solve an acoustic field-inversion problem. The determination of the depth coordinate of a point source in an infinite homogeneous medium will be considered.

The concepts of adaptive filtering and learning machines constitute the basis of our approach. Fundamental realizations of these concepts were proposed and developed as early as the beginning of the 1960s. Classical examples are: (a) the "adaptive linear element," also called adaline, and (b) the perceptron with a unit-step nonlinearity.1,2 In neural network studies, the neuron is used as a basic unit characterized by a smoothly varying monotonic nonlinear transfer function.3,4

The development and investigation of a variety of NN schemes, ranging from a single neuron to heavily interconnected configurations of neurons, has been reported.5,6 The purpose of this article is not to cover all the vast possibilities that these new techniques offer to the wave propagation community nor to perform a detailed comparison with other techniques, but rather to examine a specific type of NN scheme and to demonstrate its potential utility. Moreover, only the most basic realization of the NN concept, namely, the neuron in a single-layer system, will be considered. This will be reviewed in Sec. I. A general overview of the NN approach and its various realizations has been discussed in the literature.3,7

Neural networks have been previously applied in array signal processing. In particular, a Hopfield neural network architecture, used as a classifier, has been applied to the estimation of the amplitude, phase, and frequency of an incoming signal.8 The network consisted of 100 neurons, which were used to classify sinusoids in the range from 0 to 0.99 Hz, spaced 0.1 Hz apart. Simulation results were provided which demonstrated that frequency classification was possible for a specific set of signal-to-noise ratios and frequency resolution spacings. The technique was refined to include gain annealing and iterated descent, which provides superior convergence results.9

Several difficulties exist when the Hopfield network is applied to the source localization problem. The Hopfield network is most appropriate when exact binary representations are possible.7 Source localization should not be limited by this constraint. In addition, it has been shown that the necessary number of input nodes and connection weights increases dramatically as the number of classes increases. This limits the number of possible classes to a relatively small quantity. Last, the Hopfield network requires that probabilistic methods (i.e., annealing) be used at successive iterations to achieve convergence to a global minimum. These considerations add to the implementation complexity.

A NN approach will be developed in this article that differs from previous studies in several significant respects. A single-layer, feed-forward network employing the back-propagation learning algorithm, which allows for the continuous estimation of the relevant parameter (i.e., depth), will be used. This approach allows a NN architecture which is mathematically tractable. Therefore, a complete analysis of the basic conditions required for proper localization, based on the system topology, can be performed.

The NN systems that shall be used have two modes: training and usage. In the first stage, the system is fed with input training data and its actual output is compared with a "correct" (or "desired") previously known result that corresponds to the input. An adaptive feedback algorithm is then used to modify the system parameters in order to give the desired output. The adaptation process is repeated iteratively during the training phase, and the convergence of the actual output to the desired one terminates the training process. In the second stage, after training, the system can reproduce the input-output relations for which it has been trained. It can also yield correct outputs for inputs that it has never previously encountered. The training stage of the neural network may require extensive computations. However, once the system has been properly trained, the computational effort to obtain new results in the second stage is negligible. This property is of fundamental importance since the training can often be done in a laboratory prior to the network's use in a real-time application.

The training data can be generated either by a direct solution of the corresponding wave equation, or by experi- mental means. In this article, direct calculations will be used. It is anticipated in future studies, however, that experimen- tal data will be used to train the network. The NN system may be trained to yield the correct inverse solution for a variety of wave-medium interactions.

In Sec. I, we review and discuss the neuron and single-layer NN systems and present a "numerical" interpretation of the neuron dynamics that clarifies some of its basic properties. Performance results based on numerical simulation are provided in Sec. II for the single-layer structure performing source localization. Finally, conclusions are given in Sec. III.

I. NEURAL NETWORK ARCHITECTURE: NEURON AND SINGLE-LAYER SYSTEMS

The concepts used in the NN-based localization algorithm will be discussed, and the basic governing equations will be derived. Much of the material to be presented may be found in the NN literature.3 Whenever possible, we have attempted to interpret the NN methods in terms of more familiar concepts that are not confined to the "learning machine" approach.

A. The neuron and single-layer systems

The neuron, as depicted in Fig. 1, is the simplest realization of a learning machine. Its input and actual output are an N-component vector V = (v_1, v_2, ..., v_N) and a scalar x, respectively. We define the net input to the system, I_N, as the inner product of V and a vector of coefficients W = (w_1, w_2, ..., w_N),

I_N = W · V.  (1)

The coefficients w_i are the weights. The actual output of the neuron, x, is related to the input vector via

x = f(I_N),  (2)

where f(·) is the activation function of the neuron.

FIG. 1. The neuron.

Despite its simplicity, the neuron is capable of performing relatively complicated tasks. The key here is the training procedure, in which we "teach" the network to yield previously known "correct" outputs T_i (teachers) for a set of corresponding (input) training vectors V_i = (v_i1, v_i2, ..., v_iN). A set of M pairs will be used for the training. The set of pairs (T_i, V_i), i = 1, ..., M can be, for example, the depths of a point source and the corresponding fields or phase differences measured by an array of N detectors. Although the depth of the source is assumed to be a continuous variable, the network is only trained for, at most, M discrete depths.

Training is performed by the following steps: (1) set the weights w_i, i = 1, 2, ..., N, at random values; (2) randomly choose a teacher and its corresponding input vector, say (T_k, V_k); (3) calculate the actual output of the neuron x_k = f(V_k · W) [see Eqs. (1) and (2)]; (4) adapt the coefficients w_i according to the difference T_k − x_k using some correction rule; and (5) repeat steps 2-4 until the actual output x_k converges to the desired output (teacher) T_k, for every pair (T_k, V_k).

Any pair may be used many times in the training process. However, as we have observed in our numerical simulations, unless the pairs are used in a random order, convergence for all pairs may not occur. Convergence is defined to have occurred if the standard deviation between the correct outputs and the predicted outputs is small according to some criterion determined by the physics of the problem. The criterion used in this article is that the standard deviation is less than the separation between adjacent locations.

The system will be trained to solve the inverse problem T(V_i), using data obtained by solving the forward problem V(T_i) analytically, numerically, or experimentally. After convergence, the system can reproduce the "trained" relations T_i = T(V_i), and also generalize, to some extent, the input-output relations T = T(V). That is, the system will yield "correct" outputs for inputs that had not been included in the training data.

A gradient descent correction for the coefficients (decreasing the error |T_k − x_k|) is given by the delta rule,3

ΔW^j = η V_k (T_k − x_k) f′(V_k · W^j),  (3a)

W^{j+1} = W^j + ΔW^j,  (3b)

where f′(·) denotes the derivative with respect to the argument, j is the index of the training iteration, and k is the index of the (randomly chosen) training pair. Note that generally j ≠ k; rather, k is a random function of the training iteration index j. The scalar η is termed the learning rate. A discussion of the choice of η will be provided later.

Various types of activation functions have been pro- posed in the literature. The most commonly used are the linear and the sigmoid-type activations, respectively, given as

f(y) = y, (4a)

f(y) = σ(y) = 1/(1 + e^{−y}).  (4b)

The linear activation yields the Adaline,1 whereas the unit-step perceptron2 is obtained by setting f(y) = σ(ay) with a → ∞. In the unit-step perceptron, the delta rule does not apply, due to the irregularity at the origin. However, we shall not deal with this situation here. Our main concern will be the application of the linear and sigmoid activations only and a comparison of their performances.
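To make the training procedure concrete, the following is a minimal NumPy sketch of the delta rule of Eqs. (3a) and (3b) with the activations of Eqs. (4a) and (4b). It is our illustration, not the authors' code; the initialization range and function names are our own choices.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def train_neuron(V, T, activation="sigmoid", eta=0.35, iterations=30_000, rng=None):
    # V: (M, N) matrix whose rows are the training vectors V_i;
    # T: (M,) teachers. Initialization range is our assumption.
    rng = np.random.default_rng() if rng is None else rng
    M, N = V.shape
    W = rng.uniform(-0.1, 0.1, N)              # step (1): random initial weights
    for _ in range(iterations):
        k = rng.integers(M)                    # step (2): random training pair
        I_N = V[k] @ W                         # Eq. (1): net input
        if activation == "linear":
            x, fprime = I_N, 1.0               # Eq. (4a): linear activation
        else:
            x = sigmoid(I_N)                   # Eq. (4b): sigmoid activation
            fprime = x * (1.0 - x)             # sigma'(y) = sigma(y)[1 - sigma(y)]
        W += eta * (T[k] - x) * fprime * V[k]  # Eqs. (3a), (3b): delta rule
    return W
```

Note that for the sigmoid the derivative needed by Eq. (3a) can be computed from the output itself, since σ′(y) = σ(y)[1 − σ(y)].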

The literature cites many other nonlinear functions that have been used in place of the sigmoid function in Eq. (4b). To avoid trapping the system in local minima during training, one generally requires f(y) to be a nondecreasing function of y. There is no theory of which we are aware that may be used to determine which nonlinear activation should be used in a particular application. (See, however, two recent articles by H. White.10,11) It will be assumed that the results obtained using Eq. (4b) are typical of those that we would obtain with other activations of the same type. As will be shown, the nonlinear activation sometimes gives a network with performance superior to the linear network when convergence of the linear network is marginal. To assess the effectiveness of nonlinear systems, numerical simulations will be performed.

B. Single-layer configuration

The single-layer configuration is depicted in Fig. 2. The input vector V is fed to a series of neurons, each having its own weight vector W, teacher, and activation. There are no interconnections between the neurons, and each unit can be considered as operating separately. Accordingly, the numerical interpretation of the single neuron, as presented in Sec. I C, also can be applied here.

C. Numerical interpretation of the neuron

In Sec. I A, the neuron was viewed as a learning machine. In this section, another interpretation of the neuron is provided that might be helpful in understanding its basic convergence properties. First, as assumed in most practical situations, a finite amount of independent training data exists: (T_i, V_i), i = 1, 2, ..., M. That is, M training pairs of teachers and input vectors are available. If each training pair is considered only once, training the system until convergence is achieved can be interpreted as seeking the "correct" vector of coefficients W^c defined as the solution of the following set of simultaneous equations:

V_i · W^c = u_i,  i = 1, 2, ..., M,  (5)

where the right-hand elements are given by the inverse of the activation function, evaluated at the teacher. For linear activation,

u_i = T_i.  (5a)

For the sigmoid activation,

u_i = ln[T_i/(1 − T_i)].  (5b)

In matrix notation,

V̂ W^c = u,  (5c)

where the rows of the training matrix V̂ are the training vectors V_i. The training phase may now be interpreted as an iterative numerical solution of the set in Eq. (5) where each training pair is used many times. A single training iteration is manifested by applying the delta rule [Eqs. (3a) and (3b)] to a randomly chosen row in Eq. (5).
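As an illustration of this matrix view, a short sketch (our notation, not the authors' code) assembles the right-hand side u of Eq. (5) for either activation; in Case 1 below (M = N), the "correct" weights could then be obtained directly as W_c = np.linalg.solve(V_hat, u).

```python
import numpy as np

# Right-hand side of Eq. (5): u_i is the inverse activation evaluated
# at the teacher T_i.
def right_hand_side(T, activation="sigmoid"):
    T = np.asarray(T, dtype=float)
    if activation == "linear":
        return T.copy()               # Eq. (5a): u_i = T_i
    # Eq. (5b): u_i = ln[T_i / (1 - T_i)]; requires 0 < T_i < 1, so for
    # the normalized depths of Eq. (16) the end-point teachers would
    # have to be kept slightly away from 0 and 1.
    return np.log(T / (1.0 - T))
```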

Special care must be taken in using the term "solution" for the weights W^c. The training matrix has M rows and N columns, corresponding to the number of independent training pairs and the number of elements in each training vector, respectively. Three distinct cases for the values of M and N will be examined.

FIG. 2. The single-layer configuration.

1. Case 1: M = N

In this case, V̂ is a square, nonsingular matrix. The exact solution exists and the neuron seeks it iteratively. Since the entire process can now be simply viewed as solving a matrix equation, one need not use a neural network to perform the inversion. However, the case M = N sheds light on the case M ≠ N, whose results are of most interest. We have found from experience that, whenever the matrix is efficiently inverted by iteration, the case M ≠ N (with similar parameters) also proceeds efficiently. It is further noted that the neuron is capable of solving Eq. (5) with ill-conditioned training matrices. A brief discussion of the convergence properties of the neuron is now provided.

Let us define the difference between the coefficients at iteration j and the correct coefficients as

D^j = W^c − W^j.  (6)

By substituting Eq. (6) into Eq. (3), and making a first-order approximation of the activation function around the correct coefficients, a recursive set of equations for the difference D^j can be found where

D^j = [∏_{l=1}^{j} M̂^l] D^0,  (7)

with

M̂^l = Î − η_l V̂_{k(l)} [V̂_{k(l)}]^T,  (7a)

η_l = η [f′(V_{k(l)} · W^c)]²,  (7b)

where Î is the unity matrix. Here, V̂_{k(l)} represents an N × 1 matrix whose single column is the randomly chosen training vector at the training iteration l. (That is, k is a random function of the iteration index l.) We note that, for the linear case, the first-order approximation is exact.

Equation (7) describes the evolution of D^j in terms of a series product of the elementary matrices M̂^l. It has been shown that a sufficient condition for the convergence of the product ∏_{l=1}^{j} M̂^l to the null matrix, as j → ∞, is12

0 < η_l < 2/‖V_{k(l)}‖².  (8)

Therefore, if the training vectors and the activation function are bounded, one may set a value for the learning rate η that will guarantee the convergence of the product in Eq. (7) to the null matrix and, accordingly, the convergence of W^j to W^c.
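Assuming the bound of Eq. (8) as reconstructed above, the largest safe learning rate can be estimated directly from the training matrix; the following lines are our sketch.

```python
import numpy as np

# Largest learning rate satisfying Eq. (8) for every training vector,
# under our reading of the condition as eta < 2 / ||V_k||^2.
def max_stable_rate(V_hat):
    return 2.0 / np.max(np.sum(V_hat**2, axis=1))

# With inputs bounded by unity (|v_ij| <= 1) and N = 51 detectors, the
# worst case is 2/51 ~ 0.039, consistent with the divergence of the
# linear neuron observed in Sec. II for eta_L near 0.04.
```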

2. Case 2: M > N

Since the training vectors are assumed to be independent, an exact solution may not exist. The neuron then converges to a vector W* for which

V̂ W* − u = ε ≠ 0.  (9)

It is well known that the optimal approximate solution, in terms of the minimization of the norm of ε, is given by

W_opt = V̂^+ u,  (10)

where V̂^+ is known as the pseudoinverse of V̂. Various algorithms were developed for the calculation of pseudoinverses, but the quantity of most interest is V̂^+ u. The linearized delta rule can be identified as an iterative method for finding V̂^+ u.4
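In practice the Case 2 optimum of Eq. (10) can be computed directly and used as a reference for the converged delta-rule weights; a one-line sketch (our code):

```python
import numpy as np

# Optimal least-squares weights of Eq. (10): W_opt = pinv(V_hat) @ u.
# np.linalg.pinv evaluates the pseudoinverse via the SVD.
def optimal_weights(V_hat, u):
    return np.linalg.pinv(V_hat) @ u
```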

3. Case 3: M < N

In this case, the problem is underdetermined from the point of view of matrix inversion. If, however, the system is noiseless, any M components of the input vector may be used and the matrix inversion proceeds exactly as in the case M = N. For the neural-network iteration procedure, use is made of all N components of the input. When noise is present, this added information is expected to give a more robust estimate of the true parameter value. However, as will be seen in Sec. II, the extra information does not necessarily give a better prediction for the source locations for which the network has not been trained.

FIG. 3. Configuration of the one-dimensional localization system.

Finally, we note that the algorithm presented in this section is useful in the presence of noise or indeterminacy in the location of the training points. It is desirable to train the network by having it experience the full range of noise or location variations. The indeterminacy can be simulated by letting the training source coordinates vary randomly, in accordance with a uniform distribution, in small neighborhoods around the source points during the training phase.13 The noise can be simulated by adding independent Gaussian-distributed variates to the training vectors during the training phase.7
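One way to realize these two perturbations in a training loop is sketched below; `forward` stands for whatever routine produces the input vector for a given source depth, and all names and parameter values are hypothetical.

```python
import numpy as np

# Perturbed training pair: uniform jitter of the source coordinate
# (location indeterminacy) plus independent Gaussian variates on the
# inputs (measurement noise). Names and parameters are ours.
def perturbed_pair(s_i, forward, half_width, noise_std, rng):
    s = s_i + rng.uniform(-half_width, half_width)
    v = forward(s)
    v = v + rng.normal(0.0, noise_std, size=v.shape)
    return s, v
```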

II. SIMULATIONS

A. Single-unit system

A single neuron was trained to evaluate the depth coordinate of a point source. The configuration is shown in Fig. 3. A one-dimensional array of N equally spaced detectors is connected to the neuron, each detector via a corresponding weight. The training data are generated by a one-dimensional array of M equally spaced sources located a distance z away. These locations are the training points. The lengths of the source and detector arrays are L_s and L_d, respectively. The acoustic medium is assumed to be infinite and to have a constant sound speed.

The complex acoustic field at the detector array is given by

ψ_ij = C exp[i(kR_ij − ωt)]/R_ij,  (11)

where

R_ij = [z² + (s_i − d_j)²]^{1/2}.  (12)

Here, C is a constant term, k is the wave number, and R_ij is the distance from source location s_i to detector location d_j.

Instead of using the field directly, we use the field ψ_ij only to determine the relative phases. The phase-difference data are closely related to the coherence function:

Γ_i(d_j, d_l, z) = ⟨ψ_ij(t) ψ*_il(t)⟩,  (13)

where the brackets ⟨·⟩ indicate a time average. The function Γ_i(d_j, d_l, z) is the coherence of the fields at detector points "j" and "l" as a result of a source at location "i". If the amplitude is normalized to unity at each detector, the real part of Γ_i(d_j, d_l, z), denoted Γ_ri(d_j, d_l, z), yields

Γ_ri(d_j, d_l, z) = cos(φ_ij).  (14)

We shall use Γ_ri(d_j, d_l, z) for the actual inputs to the neural network. Consistent with the adopted notation,

V_i = [v_i1, v_i2, ..., v_iN],  (15a)

v_ij = cos(φ_ij).  (15b)

For general source location-detector configurations, ambiguities can arise in training the network unless sin(φ_ij) is also used as an input (which is the imaginary part of the normalized coherence function). To avoid the ambiguity, the configuration is chosen so that the uppermost source location and the uppermost detector are at the same depth. For convenience, we set d_1 = s_1 = 0.


A single training iteration consists of choosing a source location (e.g., the ith, with depth s_i) at random. The vector V_i given in Eqs. (15a) and (15b) is used as an input to the system. The actual output of the system is compared with the teacher T_i and the coefficients are modified according to the delta rule. Normalized depth coordinates are used. Accordingly, the teacher T_i is given by

T_i = s_i/L_s.  (16)
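Putting Eqs. (11)-(16) together, the training pairs for the configuration of Fig. 3 can be generated as in the following sketch, which uses the phase differences relative to the detector at d_1 = 0. Parameter defaults follow configuration 1 of Table I; the function name is ours.

```python
import numpy as np

def make_training_data(M=51, N=51, Ls=1500.0, Ld=60.0, z=5400.0, lam=1.0):
    # Defaults follow configuration 1 of Table I; this is our sketch.
    k = 2.0 * np.pi / lam                                # wave number
    s = np.linspace(0.0, Ls, M)                          # source depths,  s_1 = 0
    d = np.linspace(0.0, Ld, N)                          # detector depths, d_1 = 0
    R = np.sqrt(z**2 + (s[:, None] - d[None, :])**2)     # Eq. (12): R_ij
    phi = k * (R - R[:, :1])                             # phase relative to d_1 = 0
    V = np.cos(phi)                                      # Eq. (15b): v_ij = cos(phi_ij)
    T = s / Ls                                           # Eq. (16): teachers
    return V, T
```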

TABLE I. Configurations of the simulation runs.

Experiment   M     N     Violation
1            51    51    none
2            35    51    none
3            25    51    (1)
4            51    35    none
5            51    31    (3) (weak violation)
6            351   35    (2)

B. Resolution and sampling conditions

The detector array shown in Fig. 3 must meet several conditions in order to (1) detect sources and (2) resolve sources from adjacent positions. Intuitively, it is also clear that if the system is expected to properly interpolate (to correctly locate sources other than those of the training set), the distance between the training points must be small. These requirements can be expressed quantitatively by the inequalities

2L_sL_d/(πλz) ≤ (M − 1)/π < 2L_sL_d/{λz[1 + (L_s/z)²]^{1/2}} < N − 1,  (17)

where the first, second, and third inequalities (counted from left to right) represent the interpolation requirement, the resolution power of the detector array, and the Nyquist sampling condition for the field incident upon the detector array, respectively. A detailed derivation and the underlying assumptions are found in the Appendix. Equation (17) is based purely on classical wave propagation and sampling theory considerations. The network that is connected to the detectors may introduce additional considerations, such as the value of the learning rate or the number of hidden units in cases of multilayer networks, as discussed in Sec. I.

To define a source location-detector configuration, three nondimensional parameters are necessary. We have chosen Θ_s = L_s/z, a_1 = L_s/L_d, and a_2 = 2L_sL_d/(λz) as the parameters. When Θ_s ≪ 1 and L_d/z ≪ 1, the lowest-order approximation to φ_ij is given as

φ_ij = −ks_id_j/z + kd_j²/(2z).  (18)

If, in addition, kd_j²/(2z) ≪ 1, then the only nondimensional parameter of importance is a_2. This situation reduces to the problem of determining the direction θ_i = s_i/z of a plane wave impinging on the array. To satisfy the conditions in Eq. (17), only the specification of a_2, Θ_s, M, and N is necessary.
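Under our reconstruction of Eq. (17), the three inequalities can be checked for any configuration with a few lines:

```python
import numpy as np

def check_conditions(M, N, Ls, Ld, z, lam):
    # The three inequalities of Eq. (17), left to right: interpolation,
    # resolution, sampling. Based on our reconstruction of Eq. (17).
    a2 = 2.0 * Ls * Ld / (lam * z)                 # a_2 = 2 L_s L_d / (lambda z)
    middle = a2 / np.sqrt(1.0 + (Ls / z) ** 2)
    interpolation = a2 / np.pi <= (M - 1) / np.pi  # first inequality
    resolution = (M - 1) / np.pi < middle          # second inequality
    sampling = middle < (N - 1)                    # third inequality
    return interpolation, resolution, sampling
```

For configuration 5 of Table I (M = 51, N = 31), for example, this flags only the sampling condition, and only weakly: the middle term is about 32.1 against N − 1 = 30.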

C. Results

Six different configurations were simulated. Three cases satisfy all the inequalities in Eq. (17), and three violate a single inequality. For all cases, a_2 = 33.33, z = 5400 m, λ = 1 m, L_s = 1500 m (Θ_s = 0.28), and L_d = 60 m. The configurations are provided in Table I. The number of the inequality that is being violated, counted from left to right in Eq. (17), is clearly indicated. Thus (1) indicates a violation of the interpolation condition, (2) indicates a violation of the resolution condition, and (3) indicates a violation of the sampling condition.

Each configuration was applied to both linear and sigmoid neurons. After every 1000 training iterations, we checked the ability of each system to predict the normalized depth of a point source located on the training points and on the interpolation points (exactly between the training points). An iteration is defined here to be one presentation of a single training pair. The performance of the linear and sigmoid neurons, measured in terms of the standard deviation between the exact and predicted depths as a function of the training iterations, is shown in Fig. 4(a) for configuration 1. Solid and dashed lines represent the trained and interpolated points, respectively, and curves marked with an asterisk correspond to the sigmoid neuron. In all curves, the standard deviation σ is calculated according to the formula

σ² = (1/M) ∑_{i=1}^{M} (d_{p,i} − T_i)²,  (19)

where the correct normalized depth is given by the teacher T_i [cf. Eq. (16)], and d_{p,i} is the normalized depth as predicted by the system. The normalized separation between adjacent source locations is 1/(M − 1).
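A direct transcription of Eq. (19) (our code; `predict` stands for the trained neuron):

```python
import numpy as np

def sigma(predict, V, T):
    # Eq. (19): rms deviation between predicted and correct normalized
    # depths over the M test points.
    d_p = np.array([predict(v) for v in V])
    return np.sqrt(np.mean((d_p - T) ** 2))
```

Evaluating this over the trained and the interpolation points separately yields the solid and dashed curves of Fig. 4(a); the convergence criterion compares the result with the normalized separation 1/(M − 1).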

The simulation results in Fig. 4(a) show that after 2000 training iterations, both the linear and sigmoid neurons achieve standard deviations on the order of the (normalized) source separation (0.02) and below. Additional iterations generally improve the prediction accuracy, although it is noted that in the linear network, σ has achieved a local minimum. It should be emphasized that this behavior does not cause the results to diverge. We have generally observed that whenever the conditions in Eqs. (8) and (17) are met, the system will not diverge even though a temporary increase in σ may take place.

Different learning rates, η_L = 0.02 and η_s = 0.35, were used for the networks using the linear and sigmoid activation functions, respectively. These learning rates were found to give the best results after 5000 training iterations. It is also observed that η_L satisfies the convergence condition in Eq. (8). [The elements of the training vectors in Eqs. (15a) and (15b) are bounded by unity.] For larger values of η_L (approximately 0.04), the linear neuron diverges. The nonlinear neuron is less sensitive to the choice of its learning rate, η_s. In addition, since the initial weights during the first few training cycles are far from the "correct weights" defined by Eq. (5), the linearized analysis leading to Eq. (8) is not valid at the early stage of the training phase of the sigmoid neuron.

FIG. 4. Simulation results for configuration 1. (a) Standard deviations; (b) prediction of the normalized depth after 30 000 training iterations, for the training and interpolation points; (c) extrapolation test after 30 000 training iterations.

Indeed, best convergence at this stage was achieved for a much higher rate. However, this does not make the result in Eq. (8) useless for the sigmoid neuron, since one may still use it as an ad hoc "sufficient but not necessary" convergence condition. Finally, we suspect that the optimal choice of η_L and η_s may be very case sensitive. Nevertheless, we shall use η_L = 0.02 and η_s = 0.35 for the simulations since satisfactory results were obtained when Eq. (17) was not violated.

Figure 4(b) presents the predicted results of the (normalized) depths of 2M − 1 = 101 source locations distributed evenly over the source array, as obtained by the system after 30 000 training iterations (the solid line and the solid line with an asterisk correspond to the linear and sigmoid activations, respectively). Odd- and even-numbered source locations correspond to trained and interpolated points, respectively.

A perfectly accurate prediction would appear as a straight line ranging from 0 to 1. It is generally observed that the system performs better at the inner region of the prediction zone.

The single-neuron network is not capable of extrapolation, as the simulation results in Fig. 4(c) suggest. The linear and sigmoid NN systems obtained after training with configuration 1 were used to estimate the depths of 50 source location points. The first 25 source locations are in the interpolation zone and the rest are in the extrapolation zone.

The network is trained on a smaller number of source locations in configurations 2 and 3. The interpolation inequality of Eq. (17) is satisfied for configuration 2, but not for configuration 3. Satisfactory results are obtained for configuration 2, as shown in Fig. 5. However, the results for configuration 3 show that the linear network no longer performs satisfactorily at the interpolated points.

FIG. 5. Simulation results for configuration 2: standard deviations.

For the nonlinear network, the interpolated points are marginally satisfactory, as depicted in Fig. 6(b), and significantly worse than those in configuration 2. (Note the difference in the scales of the graphs.) Reducing M still further yields rapidly deteriorating performance results for both the linear and nonlinear networks.

Figures 7 and 8(a) depict the simulation results for configurations 4 and 5, respectively. In configuration 4, the sampling inequality is satisfied, while in configuration 5, it is not. The increase in σ from configuration 4 to 5 is dramatic. Referring to Fig. 8(b), it is observed that the increase in σ occurs for values of s_i that are close to L_s. That is, the radiation from a source location at large angles is not adequately sampled by the detector array.

Figure 9 depicts the simulation results for configuration 6, where the resolution condition is violated. The results are similar to those found in Fig. 7. In this case, however, the nondimensional source location separation distance is 0.003, and not 0.02. The extra training locations did not improve the network performance. The relative error (the ratio between the standard deviation and the training source separation) increases.

Additional simulation results were obtained for configurations with wider angle coverage (L_s = 3000 m, L_d = 30 m). The performance results obtained were essentially the same.

D. Multilayer network

We have only discussed a single-layer neural network thus far, which is consistent with our goal to solve the source localization problem using one of the simplest networks available. However, it is natural to ask if improved results could be obtained using multilayer networks governed by the back-propagation algorithm. A multilayer network is a generalization of the single-layer network and is discussed at length by Rumelhart et al.3

FIG. 6. Simulation results for configuration 3. (a) Standard deviations; (b) prediction of normalized depth.

A series of experiments were run using a two-layer network with 8 nodes in the hidden layer. Using the phase-difference data of Eq. (15b) as input produced similar performance results when comparing the single- and two-layer networks.
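For reference, a minimal sketch of such a two-layer network, with 8 sigmoid hidden units and a linear output trained by back-propagation, is given below; the learning rate, iteration count, and initialization are our assumptions, not values from the text.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def train_two_layer(V, T, hidden=8, eta=0.1, iterations=100_000, rng=None):
    # Two-layer back-propagation sketch: `hidden` sigmoid units (8 in
    # the text), one linear output. Rates and initialization are ours.
    rng = np.random.default_rng() if rng is None else rng
    M, N = V.shape
    W1 = rng.uniform(-0.1, 0.1, (hidden, N))   # input -> hidden weights
    W2 = rng.uniform(-0.1, 0.1, hidden)        # hidden -> output weights
    for _ in range(iterations):
        k = rng.integers(M)
        h = sigmoid(W1 @ V[k])                 # hidden activations
        x = W2 @ h                             # linear output
        err = T[k] - x                         # output error
        delta_h = err * W2 * h * (1.0 - h)     # error back-propagated to hidden layer
        W2 += eta * err * h                    # output-layer delta rule
        W1 += eta * np.outer(delta_h, V[k])    # hidden-layer delta rule
    return W1, W2
```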

Additional experiments were performed using the normalized field quantity

v_ij = cos(kR_ij),  (20)

as the input in place of the phase-difference data. The performance results were often unsatisfactory unless Θ_s ≪ 1. An expansion of R_ij from Eq. (12) yields

kR_ij = kz + ks_i²/(2z) + kd_j²/(2z) − ks_id_j/z.  (21)

FIG. 7. Simulation results for configuration 4: standard deviations.

We found that if kL_s²/2z > 5, the performance of the single-layer network began to degrade significantly. However, use of a two-layer network with 8 nodes in the hidden layer produced satisfactory results for kL_s²/2z as high as 40.

E. The bearing problem

When Θ_s is not small, R_ij may still be expanded in a binomial series provided L_d/z ≪ 1. Thus

kR_ij = kR_i + kd_j²/(2R_i) − k sin(θ_i) d_j,  (22)

where R_i = (z² + s_i²)^{1/2} and sin(θ_i) = s_i/R_i.

To determine the bearing θ_i of a source, the input to the network is usually specified as

v_ij = exp(−ik sin(θ_i) d_j).  (23)

The effect of the term kR_i + kd_j²/(2R_i) is not considered. While in many cases kd_j²/R_i ≪ 1, and thus this term can be neglected, R_i can never be replaced by z when Θ_s is not small. In many analyses, the effect of the kR_i term is omitted by considering phase differences, as we did in our analysis [cf. Eq. (15b)]. Based on our experience, we expect that if the term R_i is retained in the input to the network, it will seriously degrade the network performance.

In our analysis, we retained the spherical nature of the phase front impinging on the detectors. The distance R_ij was not expanded in a binomial series to simplify the analysis. However, we emphasize again that phase differences were used in our analysis with the single-layer network.

III. CONCLUSIONS

A very simple neural network has been successfully used to solve the inverse problem of determining the source location from measurements of the acoustic phase difference, where the medium was assumed infinite with a constant sound speed.

FIG. 8. Simulation results for configuration 5. (a) Standard deviations; (b) prediction of normalized depth.

The network was trained using M distinct source locations and then used to predict the trained and interpolated locations. The network performed very well when the inequalities given in Eq. (17) were satisfied. If nonhomogeneous media are to be treated, these conditions must be modified to account for the curved ray trajectories.

We have previously reported13 a preliminary investigation in which we used a similar method to calculate the intensity distribution of a linear array of incoherent sources from coherence measurements at a number of detectors. The method was successful for both the constant sound-speed case and for the case when the sound-speed profile was quadratic. In the latter case, the network could also predict the profile curvature. Since the ray trajectories for the quadratic profile form simple wavefronts, satisfactory results were easily obtained.

FIG. 9. Simulation results for configuration 6: standard deviations.

At this stage, it is difficult to make general statements about the advantages of the linear versus the sigmoid activation function in a single-layer network. However, it has been observed that the use of the sigmoid activation function allows the neural network to degrade more slowly as the conditions in Eq. (17) are violated.

ACKNOWLEDGMENTS

The authors are grateful to the Associate Editor and the reviewers for their helpful comments. This research has been supported in part by the Office of Naval Research, under Contract N00014-89-J-1666.

APPENDIX: DERIVATION OF THE RESOLUTION CONDITIONS FOR THE SOURCE AND DETECTOR ARRAYS

Any detection system is usually required to meet three resolution conditions:

(1) The distance between the source locations is within the resolving power of the detector array.

(2) The system is able to interpolate and predict correctly the depth of a point source located between the trained points. That is, it is necessary to avoid the ambiguity caused by detector sidelobes.

(3) The field samples created by the detector array are dense enough to correctly measure the highest spatial frequency that may be created by the source array.

We shall assume throughout that the detecting system measures the phase difference

ψ(s,d) = k{[z² + (s − d)²]^{1/2} − (z² + s²)^{1/2}},  0 ≤ s ≤ L_s;  0 ≤ d ≤ L_d,  (A1)

where, for convenience, the source location coordinate s and the detector coordinate d may vary continuously. Note that, with the source location-detector configurations that are considered in this article, the plane-wave approximation is not valid. The spherical shape of the wave must be taken into account.

The conditions are now treated separately.

Condition a: The phase change along the entire detector array due to a small shift Δs in the source location is given by

Δψ_s = [∂ψ(s,L_d)/∂s] Δs = k[sin θ(s,L_d) − sin θ(s,0)] Δs,  (A2)

where θ(s,d) is the inclination angle of the line connecting the source location s and the detector location d,

sin θ(s,d) = (s − d)/[z² + (s − d)²]^{1/2}.  (A3)

The function Δψ_s depends upon the source location s. The minimum value of the magnitude of Δψ_s for L_s > L_d (typical of the configurations we used) occurs when s = L_s. Thus

min|Δψ_s| = kΔs |(L_s − L_d)/[z² + (L_s − L_d)²]^{1/2} − L_s/[z² + L_s²]^{1/2}|.  (A4)

For the source location shift Δs to be resolvable requires that Δψ_s be measurable. In practice, this depends strongly on the noise level and on the sensitivity of the detecting system. As an order-of-magnitude criterion, we require that the phase changes be larger than or equal to one radian, although in numerical simulations the phase change may be much smaller than that value. This point is explored in the numerical simulation.

Since the order-of-magnitude condition is not expected to impose sharp boundaries on the system performance, Eq. (A4) may be simplified. Use is made of the fact that in the configurations explored, z ≫ L_d. When z ≫ L_d, L_d may be set to zero in Eq. (A4). This yields

min|Δψ_s| ≈ kΔsL_d/(z² + L_s²)^{1/2} ≥ 1.  (A5)

Condition b: To avoid ambiguity that might be caused when the system is required to interpolate, one must also require that the maximum value of |Δψ_s| not be larger than π. One may verify that the largest value occurs when s = L_d/2. Therefore, we require

max|Δψ_s| = kΔsL_d/[z² + L_d²/4]^{1/2} ≈ kΔsL_d/z ≤ π,  (A6)

where we have again assumed that L_d ≪ z.

Condition c: The spatial frequency of an incident wave, as measured by the detector array, is given by

f = ∂ψ(s,d)/∂d = −k sin θ(s,d).  (A7)

The corresponding spatial period is 2π/f. This period should be at least twice the distance between the detectors, Δd. This is recognized as the Nyquist sampling condition. Therefore, it is required that

min[λ/|sin θ(s,d)|] ≥ 2Δd.  (A8)


Since the maximum value of sin θ(s,d) occurs when d = 0 and s = L_s, this yields

λ[z² + L_s²]^{1/2}/L_s ≥ 2Δd.  (A9)

Substituting Δs = L_s/(M − 1) into Eqs. (A5) and (A6), and Δd = L_d/(N − 1) into Eq. (A9), we finally obtain

2L_sL_d/(πλz) ≤ (M − 1)/π < 2L_sL_d/{λz[1 + (L_s/z)²]^{1/2}} < N − 1.  (A10)

Equation (A10) is denoted Eq. (17) in the main body of the paper.

1. B. Widrow, "Generalization and information storage in networks of Adaline neurons," in Self-Organizing Systems 1962, edited by M. C. Yovits, G. T. Jacobi, and G. D. Goldstein (Spartan, Washington, DC, 1962).
2. F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (Spartan, Washington, DC, 1961).
3. D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT, Cambridge, 1986), Vol. 1.
4. T. Kohonen, Self-Organization and Associative Memory (Springer-Verlag, New York, 1989), 3rd ed.
5. Proceedings of the IEEE, Special Issue on Neural Networks, I: Theory and Modeling, September 1990.
6. Proceedings of the IEEE, Special Issue on Neural Networks, II: Mathematical Analysis, Related Topics, Implementations, and Applications, October 1990.
7. R. P. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag. (1987).
8. R. Rastogi, P. Gupta, and R. Kumaresan, "Array signal processing with interconnected neuron-like elements," Proc. ICASSP, 2328-2331 (1987).
9. S. Jha, R. Chapman, and T. Currani, "Bearing estimation using neural networks," IEEE (1988).
10. H. White, "Some asymptotic results for learning in single hidden-layer feedforward network models," J. Am. Stat. Assoc. 84, 1003-1013 (1989).
11. H. White, "Neural network learning and statistics," AI Expert, December 1989, 48-52.
12. E. Oja, "On the construction of projectors using products of elementary matrices," IEEE Trans. Comput. C-27, 1 (1979).
13. M. J. Beran and B. Z. Steinberg, "The use of linear and non-linear activation in neural networks developed for field reconstruction," J. Acoust. Soc. Am. Suppl. 1 87, S154 (1990).
