Accepted for publication by IEEE Transactions on Neural Networks.

Approximation Capability to Functions of Several Variables, Nonlinear Functionals and Operators by Radial Basis Function Neural Networks

Tianping Chen^1 and Robert Chen^2

Abstract

The purpose of this paper is to explore the representation capability of radial basis function (RBF) neural networks. The main results are: (1) the necessary and sufficient condition for a function of one variable to be qualified as an activation function in an RBF network is that the function is not an even polynomial; (2) the capability of RBF networks to approximate nonlinear functionals and operators is revealed, using sample data either in the frequency domain or in the time domain, which can be used in system identification by neural networks.

Key words and phrases: Approximation theory, neural networks, system identification, radial basis networks, functionals, operators, wavelets.

^1 The author is with the Department of Mathematics, Fudan University, Shanghai, P.R. China. His work has been supported in part by the NSF of China, Shanghai NSF, the "Climbing" Project in China, Grant NSC 92092, and the Doctoral Program Foundation of the Educational Commission in China.

^2 The author is with VLSI Libraries, Inc., 1836 Cabrillo Ave., Santa Clara, CA 95050, USA.

1 Introduction

There have been several recent works concerning the representation capabilities of multilayer feedforward neural networks. For example, in the past few years, several papers ([1]-[7] and many others) related to this topic have appeared. They all claimed that a three-layered neural network with sigmoidal units in the hidden layer can approximate continuous or other kinds of functions defined on compact sets in $R^n$. In many of those papers, the sigmoidal functions need to be assumed to be continuous or monotone. Recently [9] [10], we pointed out that the boundedness of the sigmoidal function plays an essential role in its being an activation function in the hidden layer, i.e., instead of continuity or monotonicity, the boundedness of sigmoidal functions ensures the network capability.

In addition to sigmoidal functions, many other functions can be used as activation functions of neural networks. For example, in [12], we proved that for a function $g \in C(R^1) \cap S'(R^1)$ to be an activation function in feedforward neural networks, the necessary and sufficient condition is that the function is not a polynomial (see also [5]).

The above papers are all significant advances towards solving the problem of whether a function is qualified as an activation function in neural networks. However, they only dealt with affine-basis-function (ABF) neural networks, also called multilayer perceptrons (MLP).

The goal there is to approximate continuous functions by the family

$$\sum_{i=1}^{N} c_i\, g(y_i \cdot x + \theta_i)$$

where $y_i \in R^n$, $c_i, \theta_i \in R^1$, and $y_i \cdot x$ denotes the inner product of $y_i$ and $x$.

Among the various kinds of promising neural networks currently under active research, there is another type called radial-basis-function (RBF) networks [13] (also called localized receptive field networks [16]), where the activation functions are radially symmetric and produce a localized response to the input stimulus. For a survey, see, for example, [15]. A block diagram of an RBF network is shown in Fig. 1. One of the special basis functions that is commonly used is the Gaussian kernel function. Using the Gaussian basis function, RBF networks are capable of forming an arbitrarily close approximation to any continuous function, as shown in [17] [18] [19].

More generally, the goal here is to approximate functions of a finite number of real variables by

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - x_i\|_{R^n})$$

where $x \in R^n$ and $\|x - x_i\|_{R^n}$ is the distance between $x$ and $x_i$ in $R^n$. Here, the activation function $g$ is not necessarily Gaussian. In this direction, several results concerning RBF neural networks were obtained [13] [14]. In [14], Park and Sandberg proved the following important theorem:

Let $K : R^n \to R$ be a radially symmetric, integrable, bounded function such that $K$ is continuous almost everywhere and $\int_{R^n} K(x)\,dx \neq 0$; then the family

$$\sum_{i=1}^{N} c_i\, g\!\left(\frac{\|x - x_i\|_{R^n}}{\sigma}\right)$$

is dense in $L^p(R^n)$, where $g(\|x\|_{R^n}) = K(x)$.

In [23], Park and Sandberg discussed several related results on $L^1$ and $L^2$ approximation. For example, they proved the following very interesting theorem:

Theorem A. Assume that $K : R^n \to R^1$ is a square-integrable function. Then the family

$$\sum_{i=1}^{N} c_i\, g\!\left(\frac{\|x - x_i\|_{R^n}}{\sigma}\right)$$

is dense in $L^2(R^n)$ if and only if $K$ is pointable.

In [24], we generalized this result and proved

Theorem B. Suppose that $g : R^+ \to R^1$ is such that $g(\|x\|_{R^n}) \in L^2(R^n)$. Then the family of functions

$$\sum_{i=1}^{N} c_i\, g\!\left(\frac{\|x - x_i\|_{R^n}}{\sigma_i}\right)$$

is dense in $L^2(R^n)$.

It is now natural to ask the following questions: (1) What is the necessary and sufficient condition for a function to be qualified as an activation function in RBF neural networks? (2) How can nonlinear functionals be approximated by RBF neural networks? (3) How can nonlinear operators (e.g., the output of a system) be approximated by RBF neural networks, using sample data in the frequency (phase) domain or in the time (state) domain?

The purpose of this paper is to give strong results in answering those questions.
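As a concrete illustration of the family studied here (not part of the paper's development), the following minimal Python sketch evaluates a network of the form $\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{R^n})$ with a Gaussian activation and fits only the outer coefficients $c_i$ by least squares on a toy two-variable target; the target function, the random placement of the centers $y_i$, and the fixed scales $\lambda_i$ are illustrative assumptions.

```python
import numpy as np

def rbf_net(x, centers, scales, coeffs, g=lambda r: np.exp(-r**2)):
    """Evaluate sum_i c_i * g(lambda_i * ||x - y_i||) at the points x.

    x:       (m, n) array of evaluation points in R^n
    centers: (N, n) array of centers y_i
    scales:  (N,)   array of scale factors lambda_i
    coeffs:  (N,)   array of outer coefficients c_i
    """
    # Pairwise distances ||x - y_i||, shape (m, N)
    r = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return g(scales[None, :] * r) @ coeffs

# Toy illustration: approximate f(x1, x2) = sin(x1) * cos(x2) on [0, 1]^2
# with randomly placed Gaussian units, fitting only the c_i by least squares.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 2))        # sample points in the compact set K
f = np.sin(X[:, 0]) * np.cos(X[:, 1])           # target function values

centers = rng.uniform(0.0, 1.0, size=(30, 2))   # y_i
scales = np.full(30, 3.0)                        # lambda_i (fixed here)
Phi = np.exp(-(scales * np.linalg.norm(X[:, None] - centers[None], axis=2))**2)
coeffs, *_ = np.linalg.lstsq(Phi, f, rcond=None)  # c_i

approx = rbf_net(X, centers, scales, coeffs)
print("max error on samples:", np.abs(approx - f).max())
```

With the centers and scales fixed, fitting reduces to a linear least-squares problem in the $c_i$, which is why this simple scheme already gives a small error on the sampled compact set.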

This paper is organized as follows. In Section 2, we list our symbols and notation and review some definitions. In Section 3, we show that the necessary and sufficient condition for a continuous function in $S'(R^1)$ to be qualified as an activation function in RBF networks is that it is not an even polynomial. In Section 4, we show the capability of RBF neural networks to approximate nonlinear functionals and operators on some Banach space as well as on some compact set in $C(K)$, where $K$ is a compact set in any Banach space. Furthermore, we establish the capability of neural networks to approximate nonlinear operators from $C(K_1)$ to $C(K_2)$. Approximations using samples in both the frequency domain and the time domain are discussed. Examples are given, which include the use of wavelet coefficients in the approximation. It is also pointed out that the main results in Section 4 can be used in computing the outputs of nonlinear dynamical systems, thus identifying the system. We conclude this paper with Section 5.

2 Notations and Definitions

We list here the main symbols and notations that will be used throughout this paper.

$X$: some Banach space with norm $\|\cdot\|_X$.

$R^n$: Euclidean space of dimension $n$ with norm $\|\cdot\|_{R^n}$.

$K$: some compact set in a Banach space.

$C(K)$: Banach space of all continuous functions defined on $K$, with norm $\|f\|_{C(K)} = \max_{x \in K} |f(x)|$.

$V$: some compact set in $C(K)$.

$S(R^n)$: all Schwartz functions in distribution theory, i.e., all infinitely differentiable functions which are rapidly decreasing at infinity.

$S'(R^n)$: all distributions defined on $S(R^n)$, i.e., all linear continuous functionals defined on $S(R^n)$.

$C^\infty(R^n)$: all infinitely differentiable functions defined on $R^n$.

$C_C^\infty(R^n)$: all infinitely differentiable functions with compact support in $R^n$.

We review the following definitions.

Definition 1. A function $\sigma : R^1 \to R^1$ is called a (generalized) sigmoidal function if it satisfies

$$\lim_{x \to -\infty} \sigma(x) = 0, \qquad \lim_{x \to +\infty} \sigma(x) = 1. \quad (1)$$

Definition 2. Let $X$ be a Banach space with norm $\|\cdot\|_X$. If there are elements $x_n \in X$, $n = 1, 2, \ldots$, such that for every $x \in X$ there is a unique real number sequence $a_n(x)$ with

$$x = \sum_{n=1}^{\infty} a_n(x)\, x_n,$$

where the series converges in $X$, then $\{x_n\}_{n=1}^{\infty}$ is called a Schauder basis of $X$ and $X$ is called a Banach space with a Schauder basis.

Definition 3. Suppose that $X$ is a Banach space. A set $V \subset X$ is called a compact set in $X$ if for every sequence $\{x_n\}_{n=1}^{\infty}$ with all $x_n \in V$, there is a subsequence $\{x_{n_k}\}$ which converges to some element $x \in V$. It is well known that if $V \subset X$ is a compact set in $X$, then for any $\epsilon > 0$ there is an $\epsilon$-net $N(\epsilon) = \{x_1, \ldots, x_{n(\epsilon)}\}$ with all $x_i \in V$, $i = 1, \ldots, n(\epsilon)$, i.e., for every $x \in V$ there is some $x_i \in N(\epsilon)$ such that $\|x_i - x\|_X < \epsilon$.

3 Characteristics of Continuous Functions as Activation in RBF Networks

In this section, we show that for a continuous function to be qualified as an activation function in RBF networks, the necessary and sufficient condition is that it is not an even polynomial, and we prove two approximation theorems for RBF networks.

More precisely, we prove

Theorem 1. Suppose that $g \in C(R^1) \cap S'(R^1)$, i.e., $g$ is a continuous function such that $\int_{R^1} g(x)\, s(x)\, dx$ makes sense for all $s \in S(R^1)$. Then the family

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{R^n})$$

is dense in $C(K)$ if and only if $g$ is not an even polynomial, where $K$ is a compact set in $R^n$, $y_i \in R^n$, $c_i, \lambda_i \in R^1$, $i = 1, \ldots, N$.

Theorem 2. Suppose that $g \in C(R^1) \cap S'(R^1)$ is not an even polynomial, $K$ is a compact set in $R^n$, and $V$ is a compact set in $C(K)$. Then for any $\epsilon > 0$, there are a positive integer $N$, $\lambda_i \in R^1$, and $y_i \in R^n$, $i = 1, \ldots, N$, which are all independent of $f \in V$, and constants $c_i(f)$ depending on $f$, $i = 1, \ldots, N$, such that

$$\Big| f(x) - \sum_{i=1}^{N} c_i(f)\, g(\lambda_i \|x - y_i\|_{R^n}) \Big| < \epsilon \quad (2)$$

holds for all $x \in K$ and $f \in V$. Moreover, every $c_i(f)$ is a continuous functional defined on $V$.

Remark 1. It is worth noting that the $\lambda_i$ and $y_i$ are all independent of $f$ in $V$ and that the $c_i(f)$ are continuous functionals. This fact will play an important role in the approximation of nonlinear operators by RBF networks.

To prove Theorem 1, we need to prove the following lemma, which is of great significance by itself.

Lemma 1. Suppose that $h(x) \in C(R^n) \cap S'(R^n)$. Then the family

$$\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$$

is dense in $C(K)$ if and only if $h$ is not a polynomial, where the $\rho_i$ are rotations in $R^n$, $y_i \in R^n$, $\lambda_i \in R^1$, $i = 1, \ldots, N$.

Proof. Sufficiency. Suppose that the linear combinations $\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$ are not dense in $C(K)$. Then by the Hahn-Banach extension theorem for linear functionals and the Riesz representation theorem (see [6]), we conclude that there is a bounded signed Borel measure $d\mu$ with $\mathrm{supp}(d\mu) \subset K$ and

$$\int_{K} h(\lambda \rho(x) + y)\, d\mu(x) = 0 \quad (3)$$

for all $\lambda \in R^1$, $y \in R^n$, and all rotations $\rho$.

Pick any $w \in S(R^n)$; then

$$\int_{R^n} w(y)\, dy \int_{R^n} h(\lambda \rho(x) + y)\, d\mu(x) = 0. \quad (4)$$

Changing the order of integration, we have

$$\int_{R^n} h(u)\, du \int_{R^n} w(y)\, d\mu\!\left(\rho^{-1}\!\left(\frac{u - y}{\lambda}\right)\right) = 0, \quad (5)$$

which is equivalent to

$$\langle \hat{h}(t),\; w(t)\, \widehat{d\mu}(\lambda \rho^{-1}(t)) \rangle = 0 \quad (6)$$

for all $\lambda \in R^1$, $\lambda \neq 0$, and all rotations $\rho$, where $\hat{h}$ represents the Fourier transform of $h$ in the sense of distributions.

In order for (6) to make sense, we have to show that

$$w(t)\, \widehat{d\mu}(\lambda \rho^{-1}(t)) \in S(R^n).$$

In fact, $w(t) \in S(R^n)$. Moreover, since $\mathrm{supp}(d\mu) \subset K$, it is easy to verify that $\widehat{d\mu}(t) = \int e^{-i t \cdot x}\, d\mu(x) \in C^\infty(R^n)$ and there are constants $c_k$, $k = 1, 2, \ldots$, such that

$$|\partial^k \widehat{d\mu}(t)| \le c_k. \quad (7)$$

Thus $w(t)\, \widehat{d\mu}(t) \in S(R^n)$.

Since $d\mu \neq 0$ and $\widehat{d\mu}(t) \in C^\infty(R^n)$, there are $t_0 \in R^n$, $t_0 \neq 0$, and a neighborhood $O(t_0, \delta) = \{x : \|x - t_0\|_{R^n} < \delta\}$ such that $|\widehat{d\mu}(t)| > c > 0$ for all $t \in O(t_0, \delta)$.

Pick $t_1 \in R^n$, $t_1 \neq 0$, arbitrarily. Let $t_0 = \lambda \rho(t_1)$, where $\rho$ is a rotation in $R^n$. Then $|\widehat{d\mu}(\lambda \rho(t))| > c$ for all $t \in O(t_1, \delta/\lambda)$.

Let $w \in C_c^\infty(O(t_1, \delta/\lambda))$. Then $w(t)/\widehat{d\mu}(\lambda\rho(t)) \in C_c^\infty(O(t_1, \delta/\lambda))$, because $|\widehat{d\mu}(\lambda\rho(t))| > c$ and $\widehat{d\mu}(\lambda\rho(t)) \in C^\infty(R^n)$. Therefore,

$$\langle \hat{h}(t), w(t) \rangle = \Big\langle \hat{h}(t),\; \frac{w(t)}{\widehat{d\mu}(\lambda\rho(t))}\, \widehat{d\mu}(\lambda\rho(t)) \Big\rangle = 0. \quad (8)$$

The previous argument shows that for any $t^* \in R^n$, $t^* \neq 0$, there is a neighborhood $O(t^*, \delta)$ of $t^*$ such that

$$\langle \hat{h}(t), w(t) \rangle = 0 \quad (9)$$

holds for all $w$ with support $\mathrm{supp}(w) \subset O(t^*, \delta)$, which means that $\mathrm{supp}(\hat{h}) \subset \{0\}$. It is well known that a distribution is the Fourier transform of a polynomial if and only if its support is a subset of $\{0\}$ (see [25, Section 7.16]). Thus $h$ is a polynomial.

Necessity. If $h$ is a polynomial of degree $m$, then all the functions

$$\sum_{i=1}^{N} c_i\, h(\lambda_i \rho_i(x) + y_i)$$

are polynomials in $x_1, \ldots, x_n$ of total degree $m$, which, of course, are not dense in $C(K)$. Lemma 1 is proved.

Proof of Theorem 1. Suppose that $g \in C(R^1) \cap S'(R^1)$. Then $h(x) = g(\|x\|_{R^n}) \in S'(R^n) \cap C(R^n)$ and

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{R^n}) = \sum_{i=1}^{N} c_i\, g(\lambda_i \|\rho(x) - \rho(y_i)\|_{R^n}) = \sum_{i=1}^{N} c_i\, g(\|\lambda_i \rho(x) - z_i\|_{R^n}), \quad (10)$$

where $z_i = \lambda_i \rho(y_i)$.

From Lemma 1, we see that the family $\sum_{i=1}^{N} c_i\, g(\lambda_i \|x - y_i\|_{R^n})$ is dense in $C(K)$ if and only if $g(\|x\|_{R^n})$ is not a polynomial in $R^n$, which is equivalent to $g$ not being an even polynomial. Theorem 1 is proved.

Lemma 2 [20]. Let $V \subset C(K)$. Then $V$ is a compact set in $C(K)$ if and only if (i) $V$ is a closed set in $C(K)$; (ii) there is a constant $M$ such that $\|f\|_{C(K)} \le M$ for all $f \in V$; (iii) for any $\epsilon > 0$, there is $\delta > 0$ such that $|f(x') - f(x'')| < \epsilon$ for all $f \in V$, provided that $x', x'' \in K$ and $\|x' - x''\|_X < \delta$.

We are now ready to prove Theorem 2.

Proof of Theorem 2. Let $h(x) = e^{-\|x\|^2_{R^n}}$, $h_\delta(x) = \delta^{-n} h(x/\delta)$, and

$$(f * h_\delta)(x) = \int_K f(t)\, h_\delta(x - t)\, dt. \quad (11)$$

It is easy to prove that for any $\epsilon > 0$, there is $\delta > 0$ such that

$$|(f * h_\delta)(x) - f(x)| < \epsilon/3 \quad (12)$$

holds for all $x \in K$ and $f \in V$.

Writing $(f * h_\delta)(x)$ approximately as a Riemann sum, we see that there are a positive integer $N$, points $x_j \in R^n$, and constants $c_j^\delta(f)$ depending continuously on $f$, $j = 1, \ldots, N$, such that

$$\Big| \sum_{j=1}^{N} c_j^\delta(f)\, e^{-\|x - x_j\|^2_{R^n}/\delta^2} - (f * h_\delta)(x) \Big| < \epsilon/3 \quad (13)$$

for all $x \in K$ and $f \in V$.

Now, for every $j = 1, \ldots, N$, by Theorem 1 we can find another integer $N_j$ and constants $d_{ij}$, $\lambda_i$, $y_i^{(j)} \in R^n$, $i = 1, \ldots, N_j$, such that

$$\Big| \sum_{i=1}^{N_j} d_{ij}\, g(\lambda_i \|x - y_i^{(j)}\|_{R^n}) - e^{-\|x - x_j\|^2_{R^n}/\delta^2} \Big| < \frac{\epsilon}{3L} \quad (14)$$

for all $x \in K$, where $L = \sup_{f \in V} \sum_{j=1}^{N} |c_j^\delta(f)|$.

Combining inequalities (12), (13) and (14), we conclude that there are an integer $N$, vectors $y_i \in R^n$, $\lambda_i \in R^1$, and functionals $c_i(f)$, $i = 1, \ldots, N$, such that

$$\Big| f(x) - \sum_{i=1}^{N} c_i(f)\, g(\lambda_i \|x - y_i\|_{R^n}) \Big| < \epsilon \quad (15)$$

for all $x \in K$ and $f \in V$. Moreover, for every $i$, $c_i(f)$ is a continuous functional defined on $V$. Theorem 2 is proved.

4 Approximation to Nonlinear Functionals and Operators by RBF Neural Networks

In this section, we show some results concerning the capability of RBF neural networks in approximating nonlinear functionals and operators defined on some Banach space, which can be used to approximate the outputs of dynamical systems using sample data in either the frequency (phase) domain or the time (state) domain.

We first introduce one of our results.

Theorem 3. Suppose that $g \in C(R^1) \cap S'(R^1)$ is not an even polynomial, $X$ is a Banach space with Schauder basis $\{x_n\}_{n=1}^{\infty}$, $K \subset X$ is a compact set in $X$, and $f$ is a continuous functional defined on $K$. Then for any $\epsilon > 0$, there are positive integers $M$ and $N$, vectors $y_i^M \in R^M$, and constants $c_i$, $\lambda_i \neq 0$, $i = 1, \ldots, N$, such that

$$\Big| f(x) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{R^M}) \Big| < \epsilon \quad (16)$$

holds for all $x \in K$, where $x^M = (a_1(x), \ldots, a_M(x)) \in R^M$ and $x = \sum_{n=1}^{\infty} a_n(x)\, x_n$.

To prove Theorem 3, we need the following two lemmas.

Lemma 3 [21]. Suppose that $X$ is a Banach space with Schauder basis $\{x_n\}_{n=1}^{\infty}$. Then $K \subset X$ is a compact set in $X$ if and only if the following two conditions are satisfied simultaneously: (1) $K$ is a closed set in $X$; (2) for any $\epsilon > 0$, there is a positive integer $M$ such that

$$\Big\| \sum_{n=M+1}^{\infty} a_n(x)\, x_n \Big\|_X < \epsilon$$

holds for all $x \in K$.

Lemma 4. Suppose that $K$ is a compact set in a Banach space $X$ with Schauder basis $\{x_n\}_{n=1}^{\infty}$. Define $K_n = \{\sum_{i=1}^{n} a_i(x)\, x_i : x \in K\}$ and $K^* = K \cup \bigcup_{n=1}^{\infty} K_n$. Then $K_n$ is a compact set in $R^n$ and $K^*$ is a compact set in $X$.

Proof. It is easy to verify that $K_n$ is a compact set in $X_n$ (also in $X$), provided that $K$ is a compact set in $X$.

Now, suppose $\{u_n\}_{n=1}^{\infty}$ is a sequence in $K^*$. Then one of the following two cases occurs: (i) there is a subsequence $\{u_{n_k}\}_{k=1}^{\infty}$ of $\{u_n\}_{n=1}^{\infty}$ with all elements in $K$ or in some fixed $K_n$; (ii) there is no such subsequence.

In case (i), it is obvious that there is another subsequence of $\{u_{n_k}\}_{k=1}^{\infty}$ which converges to some element $u$ in $K$ or in $K_n$, because $K$ and the $K_n$ are compact sets in $X$.

In case (ii), there are a sequence $v_n = \sum_{i=1}^{\infty} a_i(v_n)\, x_i$ and integers $M_n$ tending to infinity as $n \to \infty$ such that $u_n = \sum_{i=1}^{M_n} a_i(v_n)\, x_i$. By taking a suitable subsequence, without loss of generality, we can assume that $v_n$ converges to some $v \in K$ as $n \to \infty$. By Lemma 3, $u_n - v_n \to 0$ as $n \to \infty$. Thus $u_n$ converges to $v$.

Combining the two cases, we conclude that $K^*$ is a compact set in $X$. Lemma 4 is proved.

Proof of Theorem 3. By the Tietze extension theorem, we can extend $f$ to a continuous functional $f^*$ defined on $K^*$.

Since $f^*$ is a continuous functional defined on the compact set $K^*$, for any $\epsilon > 0$ there is a $\delta > 0$ such that $|f^*(x') - f^*(x'')| < \epsilon/2$ provided that $x', x'' \in K^*$ and $\|x' - x''\|_X < \delta$.

By Lemma 3, there is an integer $M$ such that

$$\|x - x^M\|_X < \delta$$

for all $x \in K$, where $x^M = \sum_{n=1}^{M} a_n(x)\, x_n$. Therefore

$$|f(x) - f^*(x^M)| < \epsilon/2 \quad (17)$$

for all $x \in K$. $K_M$ is homeomorphic to some compact set $K_R^M$ in $R^M$ by the map

$$\Big\{ \sum_{n=1}^{M} a_n(x)\, x_n : x \in K \Big\} \to \{ x^M = (a_1(x), \ldots, a_M(x)) \in R^M : x \in K \}.$$

Now, $f^*$ is a continuous functional defined on every $K_R^M$. Applying Theorem 1, we can find an integer $N$, vectors $y_i^M \in R^M$, and constants $c_i, \lambda_i \in R^1$, $i = 1, \ldots, N$, such that

$$\Big| f^*(x^M) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{R^M}) \Big| < \epsilon/2 \quad (18)$$

for all $x^M \in K_R^M$.

Combining (17) and (18), we conclude that

$$\Big| f(x) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{R^M}) \Big| < \epsilon \quad (19)$$

for all $x \in K$. Theorem 3 is proved.

To illustrate the applications of Theorem 3, we now give some examples.

Example 1. Let $H$ be a Hilbert space and $K \subset H$ a compact set in $H$. As we showed in [22], $K$ can be considered as a compact set in a Hilbert space $H_1$ with a countable basis $\{x_n\}_{n=1}^{\infty}$. Thus Theorem 3 can be applied, and we conclude that every continuous functional on $K$ can be approximated arbitrarily well by the sum

$$\sum_{i=1}^{N} c_i\, g(\lambda_i \|x^M - y_i^M\|_{R^M}).$$

Example 2. Let $X = L^2[0, 2\pi]$. Then $\{1, \{\cos nx\}_{n=1}^{\infty}, \{\sin nx\}_{n=1}^{\infty}\}$ is a Schauder basis, and the $a_n(x)$ are just the corresponding Fourier coefficients. Thus we can approximate every nonlinear functional defined on some compact set in $L^2[0, 2\pi]$ by RBF neural networks using sample data in the frequency domain.

Example 3. Let $X = L^2(R^1)$ and let $\{\psi_{j,k}\}_{j,k=1}^{\infty}$ be wavelets in $L^2(R^1)$. Then we can approximate continuous functionals defined on a compact set in $L^2(R^1)$ by RBF neural networks using wavelet coefficients as sample data (also in the frequency domain).
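A minimal numerical sketch of Example 2 (an illustration under assumptions, not taken from the paper): a continuous functional on a compact set in $L^2[0, 2\pi]$ is approximated by a Gaussian RBF network whose input is the truncated Fourier coefficient vector $x^M = (a_1(x), \ldots, a_M(x))$. The target functional, the family of trigonometric polynomials standing in for the compact set, and the least-squares fit of the coefficients are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 9                                   # coefficients used: constant, cos x..cos 4x, sin x..sin 4x
t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)

def fourier_features(u_vals):
    """Truncated Fourier coefficients (a_1(u), ..., a_M(u)) of u on [0, 2*pi]."""
    coefs = [np.mean(u_vals)]
    for n in range(1, 5):
        coefs.append(2.0 * np.mean(u_vals * np.cos(n * t)))
        coefs.append(2.0 * np.mean(u_vals * np.sin(n * t)))
    return np.array(coefs)

def target_functional(u_vals):
    """Illustrative continuous functional: f(u) = integral of u(t)^2 over [0, 2*pi]."""
    return 2.0 * np.pi * np.mean(u_vals ** 2)

def random_u():
    """Trigonometric polynomial with bounded, decaying coefficients (stand-in for a compact set)."""
    a = rng.uniform(-1.0, 1.0, size=4) / np.arange(1, 5)
    b = rng.uniform(-1.0, 1.0, size=4) / np.arange(1, 5)
    return sum(a[n] * np.cos((n + 1) * t) + b[n] * np.sin((n + 1) * t) for n in range(4))

U = [random_u() for _ in range(300)]
XM = np.array([fourier_features(u) for u in U])      # x^M in R^M (frequency-domain samples)
F = np.array([target_functional(u) for u in U])      # f(u)

# RBF network on the coefficient vectors: sum_i c_i g(lambda ||x^M - y_i^M||), Gaussian g.
centers = XM[rng.choice(len(XM), size=40, replace=False)]
lam = 2.0
Phi = np.exp(-(lam * np.linalg.norm(XM[:, None] - centers[None], axis=2)) ** 2)
c, *_ = np.linalg.lstsq(Phi, F, rcond=None)

print("max training error:", np.abs(Phi @ c - F).max())
```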

Remark 2. Since in Theorem 3 we only require that $\{x_n\}_{n=1}^{\infty}$ be a Schauder basis (no orthogonality requirement is imposed), we do not require that $\{\psi_{j,k}\}_{j,k=1}^{\infty}$ be orthogonal wavelets. This is a significant advantage, for non-orthogonal wavelets are much easier to construct than orthogonal wavelets.

The following theorem shows the possibility of approximating functionals by RBF neural networks using sample data in the time (or state) domain.

Theorem 4. Suppose that $g \in C(R^1) \cap S'(R^1)$ is not an even polynomial, $X$ is a Banach space, $K \subset X$ is a compact set, $V$ is a compact set in $C(K)$, and $f$ is a continuous functional defined on $V$. Then for any $\epsilon > 0$, there are positive integers $N$ and $M$, points $x_1, \ldots, x_M \in K$, and $\lambda_i, c_i \in R^1$, $\xi_i^M = (\xi_{i1}, \ldots, \xi_{iM}) \in R^M$, $i = 1, \ldots, N$, such that

$$\Big| f(u) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|u^M - \xi_i^M\|_{R^M}) \Big| < \epsilon \quad (20)$$

for all $u \in V$, where $u^M = (u(x_1), \ldots, u(x_M)) \in R^M$.
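Before the proof, a small numerical sketch of the network form in (20) (illustrative only, not part of the paper's argument): a functional on a compact subset of $C[0, 1]$ is approximated from the sample vector $u^M = (u(x_1), \ldots, u(x_M))$ fed into a Gaussian RBF network. The functional, the family $V$, and the fitting procedure are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 20
xs = np.linspace(0.0, 1.0, M)                 # sample points x_1, ..., x_M in K = [0, 1]

def random_u():
    """Draw u from an (illustrative) compact, equicontinuous family V in C[0, 1]."""
    a = rng.uniform(-1.0, 1.0, size=3)
    return lambda x: a[0] + a[1] * np.sin(2 * np.pi * x) + a[2] * x ** 2

def f(u):
    """Illustrative continuous functional: f(u) = integral of u(x)^2 over [0, 1]."""
    grid = np.linspace(0.0, 1.0, 400)
    return np.mean(u(grid) ** 2)

us = [random_u() for _ in range(300)]
UM = np.array([u(xs) for u in us])            # u^M = (u(x_1), ..., u(x_M)) in R^M
F = np.array([f(u) for u in us])

# RBF network acting on the time-domain sample vectors u^M.
centers = UM[rng.choice(len(UM), size=40, replace=False)]   # xi_i^M
lam = 1.0
Phi = np.exp(-(lam * np.linalg.norm(UM[:, None] - centers[None], axis=2)) ** 2)
c, *_ = np.linalg.lstsq(Phi, F, rcond=None)
print("max training error:", np.abs(Phi @ c - F).max())
```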

Proof of Theorem 4. We follow a line similar to that used in [12]. Pick a sequence $\eta_1 > \eta_2 > \cdots \to 0$; then we can find another sequence $\delta_1 > \delta_2 > \cdots \to 0$ such that $|f(u) - f(v)| < \eta_k$ whenever $u, v \in V$ and $\|u - v\|_{C(K)} < \delta_k$. Moreover, we can find $\epsilon_1 > \epsilon_2 > \cdots \to 0$ such that $|u(x') - u(x'')| < \delta_k$ for all $u \in V$ whenever $x', x'' \in K$ and $\|x' - x''\|_X < \epsilon_k$.

By induction and rearrangement, we can find a sequence $\{x_i\}_{i=1}^{\infty}$ with each $x_i \in K$ and a sequence of positive integers $n(\epsilon_1) < n(\epsilon_2) < \cdots < n(\epsilon_k) \to \infty$ such that the first $n(\epsilon_k)$ elements $N(\epsilon_k) = \{x_1, \ldots, x_{n(\epsilon_k)}\}$ form an $\epsilon_k$-net in $K$.

For each $\epsilon_k$-net, define functions

$$T^*_{\epsilon_k,j}(x) = \begin{cases} 1 - \|x - x_j\|_X/\epsilon_k, & \text{if } \|x - x_j\|_X < \epsilon_k, \\ 0, & \text{otherwise,} \end{cases} \quad (21)$$

and

$$T_{\epsilon_k,j}(x) = \frac{T^*_{\epsilon_k,j}(x)}{\sum_{j'=1}^{n(\epsilon_k)} T^*_{\epsilon_k,j'}(x)} \quad (22)$$

for $j = 1, \ldots, n(\epsilon_k)$. It is easy to verify that $\{T_{\epsilon_k,j}(x)\}$ is a partition of unity, i.e.,

$$0 \le T_{\epsilon_k,j}(x) \le 1, \qquad \sum_{j=1}^{n(\epsilon_k)} T_{\epsilon_k,j}(x) = 1 \ \text{for } x \in K, \qquad T_{\epsilon_k,j}(x) = 0 \ \text{if } \|x - x_j\|_X > \epsilon_k. \quad (23)$$

For each $u \in V$, define a function

$$u_{\epsilon_k}(x) = \sum_{j=1}^{n(\epsilon_k)} u(x_j)\, T_{\epsilon_k,j}(x). \quad (24)$$

Moreover, let $V_{\epsilon_k} = \{u_{\epsilon_k} : u \in V\}$ and $V^* = V \cup (\bigcup_{k=1}^{\infty} V_{\epsilon_k})$. We then have the following conclusions:^3

1. For any fixed $k$, $V_{\epsilon_k}$ is a compact set in a space of dimension $n(\epsilon_k)$ in $C(K)$.

2. For every $u \in V$, there holds

$$\|u - u_{\epsilon_k}\|_{C(K)} \le \delta_k. \quad (25)$$

3. $V^*$ is a compact set in $C(K)$.

Now, similar to the proof of Theorem 3, we can extend $f$ to a continuous functional $f^*$ on $V^*$ such that

$$f^*(x) = f(x) \quad \text{if } x \in V. \quad (26)$$

^3 For a proof of these three propositions, see Lemma 7 of [12] or the Appendix of this paper.

Now, for any $\epsilon > 0$, we can find a $\delta > 0$ such that $|f^*(u) - f^*(v)| < \epsilon/2$ provided that $u, v \in V^*$ and $\|u - v\|_{C(K)} < \delta$.

Let $k$ be fixed such that $\delta_k < \delta$. Then by (25), for every $u \in V$,

$$\|u - u_{\epsilon_k}\|_{C(K)} \le \delta_k, \quad (27)$$

which implies

$$|f^*(u) - f^*(u_{\epsilon_k})| < \epsilon/2 \quad (28)$$

for all $u \in V$.

By the argument used in the proof of Theorem 3, letting $M = n(\epsilon_k)$, there are an integer $N$, constants $\lambda_i, c_i \in R^1$, vectors $\xi_i^M = (\xi_{i1}^M, \ldots, \xi_{iM}^M) \in R^M$, $i = 1, \ldots, N$, and $M$ points $x_1, \ldots, x_M \in K$ such that

$$\Big| f^*(u_{\epsilon_k}) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|u^M - \xi_i^M\|_{R^M}) \Big| < \epsilon/2, \quad (29)$$

which, combined with (28), leads to

$$\Big| f(u) - \sum_{i=1}^{N} c_i\, g(\lambda_i \|u^M - \xi_i^M\|_{R^M}) \Big| < \epsilon \quad (30)$$

for all $u \in V$. This completes the proof of Theorem 4.

To conclude this section, we construct an RBF neural network to approximate nonlinear operators. More precisely, we prove

Theorem 5. Suppose that $g \in C(R^1) \cap S'(R^1)$ is not an even polynomial, $X$ is a Banach space, $K_1 \subset X$ and $K_2 \subset R^n$ are compact sets in $X$ and $R^n$ respectively, $V$ is a compact set in $C(K_1)$, and $G$ is a nonlinear continuous operator which maps $V$ into $C(K_2)$. Then for any $\epsilon > 0$, there are positive integers $M, N, m$, constants $c_i^k, \zeta_k, \lambda_i \in R^1$, $k = 1, \ldots, N$, $i = 1, \ldots, M$, $m$ points $x_1, \ldots, x_m \in K_1$, and $\omega_1, \ldots, \omega_N \in R^n$, such that

$$\Big| G(u)(y) - \sum_{i=1}^{M} \sum_{k=1}^{N} c_i^k\, g(\lambda_i \|u^m - \xi_{ik}^m\|_{R^m})\, g(\zeta_k \|y - \omega_k\|_{R^n}) \Big| < \epsilon \quad (31)$$

holds for all $u \in V$ and $y \in K_2$, where $u^m = (u(x_1), \ldots, u(x_m))$ and $\xi_{ik}^m = (\xi_{ik,1}^m, \ldots, \xi_{ik,m}^m) \in R^m$, $i = 1, \ldots, M$, $k = 1, \ldots, N$.

Proof of Theorem 5. Since $G$ is a continuous operator which maps the compact set $V$ in $C(K_1)$ into $C(K_2)$, the range $G(V) = \{G(u) : u \in V\}$ is also a compact set in $C(K_2)$. By Theorem 2, for any $\epsilon > 0$, there are a positive integer $N$, $\omega_k \in R^n$, $\zeta_k \in R^1$, and $c_k(G(u)) \in R$, $k = 1, \ldots, N$, such that

$$\Big| G(u)(y) - \sum_{k=1}^{N} c_k(G(u))\, g(\zeta_k \|y - \omega_k\|_{R^n}) \Big| < \epsilon/2 \quad (32)$$

for all $u \in V$ and $y \in K_2$, and all $c_k(G(u))$, $k = 1, \ldots, N$, are continuous functionals defined on $V$. Carefully examining the proof of Theorem 4, we can find a common $m$, integers $N_k$, constants $c_i^k$, $\lambda_i$, vectors $\xi_{ik}^m \in R^m$, $i = 1, \ldots, N_k$, and $m$ points $x_1, \ldots, x_m \in K_1$, with $u^m = (u(x_1), \ldots, u(x_m)) \in R^m$, such that

$$\Big| c_k(G(u)) - \sum_{i=1}^{N_k} c_i^k\, g(\lambda_i \|u^m - \xi_{ik}^m\|_{R^m}) \Big| < \frac{\epsilon}{2L} \quad (33)$$

holds for all $k = 1, \ldots, N$ and $u \in V$, where

$$L = \sum_{k=1}^{N} \sup_{y \in K_2} |g(\zeta_k \|y - \omega_k\|_{R^n})|. \quad (34)$$

Substituting (33) into (32), we obtain

$$\Big| G(u)(y) - \sum_{k=1}^{N} \sum_{i=1}^{N_k} c_i^k\, g(\lambda_i \|u^m - \xi_{ik}^m\|_{R^m})\, g(\zeta_k \|y - \omega_k\|_{R^n}) \Big| < \epsilon \quad (35)$$

for all $u \in V$ and $y \in K_2$.

Let $M = \max_k \{N_k\}$ and let $c_i^k = 0$ for $N_k < i \le M$. Thus (35) can be rewritten as

$$\Big| G(u)(y) - \sum_{k=1}^{N} \sum_{i=1}^{M} c_i^k\, g(\lambda_i \|u^m - \xi_{ik}^m\|_{R^m})\, g(\zeta_k \|y - \omega_k\|_{R^n}) \Big| < \epsilon$$

for all $u \in V$ and $y \in K_2$. Theorem 5 is proved.

Remark 3. Theorem 5 shows the capability of RBF networks in approximating nonlinear operators using sample data in the time (or state) domain. Likewise, by Theorem 3, we can construct RBF networks using sample data in the frequency (or phase) domain.

Remark 4. We can also construct neural networks in which affine basis functions are mixed with radial basis functions. For example, we can approximate $G(u)(y)$ by

$$\sum_{k=1}^{N} \sum_{i=1}^{M} c_i^k\, g(\lambda_i \|u^m - \xi_{ik}^m\|_{R^m})\, g(\omega_k \cdot y + \zeta_k).$$

The details are omitted here.

Remark 5. In engineering, a system can be viewed as an operator (linear or nonlinear). Theorem 5 shows the capability of RBF neural networks in identifying systems, as compared with Cybenko's theorem, which shows the capability of ABF neural networks in pattern recognition.
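A minimal numerical sketch of the two-factor network in (31) (illustrative, with several simplifying assumptions not made in the paper): the activation is Gaussian, the inner centers are shared across $k$ rather than depending on both $i$ and $k$, the operator is a simple running integral, and the coefficients $c_i^k$ are obtained by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
m, M, N = 12, 25, 15                       # sensor points, inner units, outer units
xs = np.linspace(0.0, 1.0, m)              # x_1, ..., x_m in K_1 = [0, 1]
ys = np.linspace(0.0, 1.0, 40)             # evaluation points y in K_2 = [0, 1]

def random_u():
    a = rng.uniform(-1.0, 1.0, size=3)
    return lambda x: a[0] + a[1] * np.sin(2 * np.pi * x) + a[2] * np.cos(4 * np.pi * x)

def G(u, y):
    """Illustrative operator: G(u)(y) = integral of u from 0 to y."""
    grid = np.linspace(0.0, 1.0, 400)
    vals = np.cumsum(u(grid)) / len(grid)
    return np.interp(y, grid, vals)

us = [random_u() for _ in range(200)]
UM = np.array([u(xs) for u in us])                       # u^m vectors, shape (200, m)
target = np.array([G(u, ys) for u in us])                # shape (200, len(ys))

# Inner RBF features over u^m and outer RBF features over y (Gaussian g).
xi = UM[rng.choice(len(UM), size=M, replace=False)]      # xi_i^m (shared across k in this sketch)
omega = rng.uniform(0.0, 1.0, size=N)                    # omega_k
A = np.exp(-np.linalg.norm(UM[:, None] - xi[None], axis=2) ** 2)       # (200, M)
B = np.exp(-(2.0 * (ys[:, None] - omega[None])) ** 2)                  # (len(ys), N)

# G(u)(y) ~ sum_{i,k} c_{ik} * A[u, i] * B[y, k]; solve for the c_{ik} by least squares.
design = np.einsum('ui,yk->uyik', A, B).reshape(len(us) * len(ys), M * N)
C, *_ = np.linalg.lstsq(design, target.reshape(-1), rcond=None)
pred = design @ C
print("max training error:", np.abs(pred - target.reshape(-1)).max())
```

The factorized form mirrors (31): one set of radial units reads the time-domain samples $u^m$, the other reads the evaluation point $y$, and only the bilinear coefficients are learned.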

5 Conclusion

In this paper, the problem of approximating functions of several variables, functionals, and nonlinear operators by radial basis function neural networks is studied. The necessary and sufficient condition for a continuous function to be qualified as an activation function in RBF networks is given. Results on using RBF neural networks for computing the output of dynamical systems from sample data in the frequency domain or the time domain are also given.

Acknowledgements. The authors wish to thank Prof. R.-W. Liu of the University of Notre Dame and Prof. I.W. Sandberg of the University of Texas at Austin for bringing some of the papers in this area to their attention.

References

[1] A. Wieland and R. Leighten, "Geometric Analysis of Neural Network Capacity," IEEE First ICNN, Vol. 1, pp. 385-392 (1987).

[2] B. Irie and S. Miyake, "Capacity of Three-layered Perceptrons," IEEE ICNN, Vol. 1, pp. 641-648 (1988).

[3] S. M. Carroll and B. W. Dickinson, "Construction of Neural Nets Using Radon Transform," IJCNN Proc. I, pp. 607-611 (1989).

[4] K. Funahashi, "On the Approximate Realization of Continuous Mappings by Neural Networks," Neural Networks, Vol. 2, pp. 183-192 (1989).

[5] H.N. Mhaskar and C.A. Micchelli, "Approximation by Superposition of Sigmoidal and Radial Functions," Advances in Applied Mathematics, Vol. 13, pp. 350-373 (1992).

[6] G. Cybenko, "Approximation by Superpositions of a Sigmoidal Function," Mathematics of Control, Signals and Systems, Vol. 2, No. 4, pp. 303-314 (1989).

[7] Y. Ito, "Representation of Functions by Superpositions of a Step or Sigmoidal Function and Their Applications to Neural Network Theory," Neural Networks, Vol. 4, pp. 385-394 (1991).

[8] K. Hornik, "Approximation Capabilities of Multilayer Feedforward Networks," Neural Networks, Vol. 4, pp. 251-257 (1991).

[9] T. Chen, H. Chen and R.-W. Liu, "Approximation Capability in C(R^n) by Multilayer Feedforward Networks and Related Problems," IEEE Trans. on Neural Networks, accepted.

[10] T. Chen, H. Chen and R.-W. Liu, "A Constructive Proof of Cybenko's Approximation Theorem and Its Extensions," in Computing Science and Statistics (eds. LePage and Page), Proc. of the 22nd Symposium on the Interface (East Lansing, Michigan), pp. 163-168, May 1990.

[11] T. Chen and H. Chen, "Approximation to Continuous Functionals by Neural Networks with Application to Dynamic Systems," IEEE Trans. on Neural Networks (in press).

[12] T. Chen and H. Chen, "Universal Approximation to Nonlinear Operators by Neural Networks with Arbitrary Activation Functions and Its Application to Dynamic Systems," submitted for publication.

[13] R.P. Lippmann, "Pattern Classification Using Neural Networks," IEEE Communications Magazine, Vol. 27, pp. 47-64 (1989).

[14] J. Park and I.W. Sandberg, "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, Vol. 3, pp. 246-257 (1991).

[15] D. Hush and B. Horne, "Progress in Supervised Neural Networks," IEEE Signal Processing Magazine, Jan. 1993.

[16] J.E. Moody and C.J. Darken, "Fast Learning in Networks of Locally-tuned Processing Units," Neural Computation, Vol. 1, pp. 281-293 (1989).

[17] F. Girosi and T. Poggio, "Networks and the Best Approximation Property," Artificial Intelligence Lab Memo 1164, MIT (1989).

[18] E.J. Hartman, J.D. Keeler and J.M. Kowalski, "Layered Neural Networks with Gaussian Hidden Units as Universal Approximators," Neural Computation, Vol. 2(2), pp. 210-215 (1990).

[19] S. Lee and R.M. Kil, "A Gaussian Potential Function Network with Hierarchically Self-organizing Learning," Neural Networks, Vol. 4, pp. 207-224 (1991).

[20] J. Dieudonné, Foundations of Modern Analysis, Academic Press: New York and London (1969), p. 142.

[21] L.A. Liusternik and V.J. Sobolev, Elements of Functional Analysis (3rd ed., translated from the Russian 2nd ed.), Wiley: New York (1974).

[22] T. Chen, "Approximation to Nonlinear Functionals in Hilbert Space by Superposition of Sigmoidal Functions," Kexue Tongbao (1992).

[23] J. Park and I.W. Sandberg, "Approximation and Radial-Basis-Function Networks," Neural Computation, Vol. 5 (1993).

[24] T. Chen and H. Chen, "L^2(R^n) Approximation by RBF Neural Networks," Chinese Annals of Mathematics (in press).

[25] W. Rudin, Functional Analysis, McGraw-Hill: New York (1973).

Appendix

Proof of the Three Propositions in the Proof of Theorem 4.

We will prove the three propositions individually as follows.

1. For a fixed $k$, let $u^{(j)}_{\epsilon_k}$, $j = 1, 2, \ldots$, be a sequence in $V_{\epsilon_k}$ and let $u^{(j)}$ be the corresponding sequence in $V$ such that

$$u^{(j)}_{\epsilon_k}(x) = \sum_{i=1}^{n(\epsilon_k)} u^{(j)}(x_i)\, T_{\epsilon_k,i}(x). \quad (36)$$

Since $V$ is compact, there is a subsequence $u^{(j_l)}(x)$ which converges to some $u \in V$; it then follows that $u^{(j_l)}_{\epsilon_k}(x)$ converges to $u_{\epsilon_k}(x) \in V_{\epsilon_k}$, which means that $V_{\epsilon_k}$ is a compact set in $C(K)$.

2. By the definition and the partition-of-unity property, we have

$$u(x) - u_{\epsilon_k}(x) = \sum_{j=1}^{n(\epsilon_k)} [u(x) - u(x_j)]\, T_{\epsilon_k,j}(x) = \sum_{\|x - x_j\|_X \le \epsilon_k} [u(x) - u(x_j)]\, T_{\epsilon_k,j}(x). \quad (37)$$

Consequently,

$$|u(x) - u_{\epsilon_k}(x)| \le \delta_k \sum_{j=1}^{n(\epsilon_k)} T_{\epsilon_k,j}(x) = \delta_k \quad (38)$$

for all $x \in K$ and all $u \in V$.

3. Suppose $\{u_j\}_{j=1}^{\infty}$ is a sequence in $V^*$. If there is a subsequence $\{u_{j_l}\}_{l=1}^{\infty}$ of $\{u_j\}_{j=1}^{\infty}$ with all $u_{j_l} \in V$, $l = 1, 2, \ldots$, then by the compactness of $V$ there is a further subsequence which converges to some $u \in V$. Otherwise, to each $u_j$ there corresponds a positive integer $k(j)$ such that $u_j \in V_{\epsilon_{k(j)}}$. There are two possibilities: (i) We can find infinitely many $j_l$ such that $u_{j_l} \in V_{\epsilon_{k_0}}$ for some fixed $k_0$. By proposition 1 proved in this Appendix, $V_{\epsilon_{k_0}}$ is a compact set, hence there is a subsequence of $\{u_{j_l}\}$ which converges to some $v \in V_{\epsilon_{k_0}}$, i.e., a subsequence of $\{u_j\}$ converging to $v$. (ii) There are sequences $j_1 < j_2 < \cdots \to \infty$ and $k(j_1) < k(j_2) < \cdots \to \infty$ such that $u_{j_l} \in V_{\epsilon_{k(j_l)}}$. Let $v_{j_l} \in V$ be such that

$$u_{j_l}(x) = \sum_{i=1}^{n(\epsilon_{k(j_l)})} v_{j_l}(x_i)\, T_{\epsilon_{k(j_l)},i}(x). \quad (39)$$

Since $v_{j_l} \in V$ and $V$ is compact, there is a subsequence of $\{v_{j_l}\}_{l=1}^{\infty}$ which converges to some $v \in V$. By proposition 2 proved in this Appendix, the corresponding subsequence of $\{u_{j_l}\}_{l=1}^{\infty}$ also converges to $v$. Thus the compactness of $V^*$ is proved.
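The construction in (21), (22) and (24) is easy to make concrete. The following sketch (an illustration under assumed choices: $K = [0, 1]$, a uniform $\epsilon$-net, and the sup norm on $K$) builds the partition of unity $T_{\epsilon,j}$ and the projection $u_\epsilon(x) = \sum_j u(x_j)\, T_{\epsilon,j}(x)$, whose distance to $u$ is controlled by the modulus of continuity of $u$ at scale $\epsilon$, as in proposition 2 above.

```python
import numpy as np

def partition_of_unity(x, net, eps):
    """T_{eps,j}(x) from (21)-(22): normalized triangular bumps around an eps-net.

    x:   (m,) evaluation points in K (here K = [0, 1])
    net: (n,) points x_1, ..., x_n forming an eps-net of K
    eps: net radius eps_k
    """
    t_star = np.maximum(0.0, 1.0 - np.abs(x[:, None] - net[None, :]) / eps)  # T*_{eps,j}
    return t_star / t_star.sum(axis=1, keepdims=True)                        # T_{eps,j}

# Projection u_eps(x) = sum_j u(x_j) T_{eps,j}(x) from (24).
eps = 0.1
net = np.arange(0.0, 1.0 + 1e-9, eps)          # a uniform eps-net of K = [0, 1]
x = np.linspace(0.0, 1.0, 500)
u = lambda s: np.sin(3 * s) + 0.5 * s ** 2     # an illustrative u in C(K)

T = partition_of_unity(x, net, eps)
u_eps = T @ u(net)
print("sup |u - u_eps| =", np.abs(u(x) - u_eps).max())   # small when eps is small, as in (38)
```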

Figure 1: A radial basis function network. (Block diagram: the network inputs feed a layer of kernel nodes, whose outputs are combined to form the network outputs.)