Neural Networks
Radial Basis Function Networks
Laxmidhar Behera
Department of Electrical Engineering, Indian Institute of Technology, Kanpur
Intelligent Control – p.1/??
Radial Basis Function Networks
A Radial Basis Function Network (RBFN) consists of three layers:
an input layer
a hidden layer
an output layer
The hidden units provide a set of functions that constitute an arbitrary basis for the input patterns.
The hidden units are known as radial centers and are represented by the vectors c1, c2, ..., ch.
The transformation from input space to hidden-unit space is nonlinear, whereas the transformation from hidden-unit space to output space is linear.
The dimension of each center for a p-input network is p × 1.
Network Architecture
[Figure 1: Radial Basis Function Network — inputs x1, x2, ..., xp feed a hidden layer of radial basis functions with centers c1, c2, ..., ch; the activations φ1, φ2, ..., φh are weighted by w1, w2, ..., wh and summed at the output layer to give y.]
Radial Basis functions
The radial basis functions in the hidden layer produce a significant non-zero response only when the input falls within a small localized region of the input space.
Each hidden unit has its own receptive field in input space.
An input vector xi that lies in the receptive field for center cj would activate cj, and with a proper choice of weights the target output is obtained. The output is given as
y = Σ_{j=1}^{h} φj wj,   φj = φ(‖x − cj‖)

wj: weight of the jth center, φ: some radial function.
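As a concrete sketch, the output equation above can be computed in a few lines of NumPy. A Gaussian radial function is assumed, and the centers, weights, and width below are hypothetical values chosen only for illustration:

```python
import numpy as np

def rbfn_forward(x, centers, weights, sigma=1.0):
    """RBFN output y = sum_j phi(||x - c_j||) * w_j with a Gaussian phi."""
    z = np.linalg.norm(centers - x, axis=1)   # distance of x to each center
    phi = np.exp(-z**2 / (2 * sigma**2))      # hidden-layer activations
    return phi @ weights                      # linear output layer

# hypothetical 2-input network with two centers and unit weights
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([1.0, 1.0])
y = rbfn_forward(np.array([0.0, 0.0]), centers, weights)
# input at c1: phi = [1, e^(-1)], so y = 1 + e^(-1)
```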
Contd...
Different radial functions are given as follows.
Gaussian radial function: φ(z) = e^(−z²/2σ²)
Thin plate spline: φ(z) = z² log z
Quadratic: φ(z) = (z² + r²)^(1/2)
Inverse quadratic: φ(z) = 1/(z² + r²)^(1/2)
Here, z = ‖x − cj‖.
The most popular radial function is the Gaussian activation function.
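The four radial functions above can be written directly as code (a sketch; z > 0 is assumed for the thin plate spline, whose logarithm is undefined at z = 0):

```python
import numpy as np

def gaussian(z, sigma=1.0):
    return np.exp(-z**2 / (2 * sigma**2))

def thin_plate_spline(z):
    return z**2 * np.log(z)          # defined for z > 0

def quadratic(z, r=1.0):
    return np.sqrt(z**2 + r**2)

def inverse_quadratic(z, r=1.0):
    return 1.0 / np.sqrt(z**2 + r**2)
```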
RBFN vs. Multilayer Network
RBF NET: It has a single hidden layer.
MULTILAYER NET: It has multiple hidden layers.

RBF NET: The basic neuron model as well as the function of the hidden layer is different from that of the output layer.
MULTILAYER NET: The computational nodes of all the layers are similar.

RBF NET: The hidden layer is nonlinear but the output layer is linear.
MULTILAYER NET: All the layers are nonlinear.
Contd...
RBF NET: The activation function of a hidden unit computes the Euclidean distance between the input vector and the center of that unit.
MULTILAYER NET: The activation function computes the inner product of the input vector and the weight of that unit.

RBF NET: Establishes local mapping, hence capable of fast learning.
MULTILAYER NET: Constructs global approximations to the I/O mapping.

RBF NET: Two-fold learning: both the centers (position and spread) and the weights have to be learned.
MULTILAYER NET: Only the synaptic weights have to be learned.
Learning in RBFN
Training of an RBFN requires optimal selection of the parameter vectors ci and wi, i = 1, ..., h.
Both layers are optimized using different techniques and on different time scales.
The following techniques are used to update the weights and centers of an RBFN:
Pseudo-Inverse Technique (Off line)
Gradient Descent Learning (On line)
Hybrid Learning (On line)
Pseudo-Inverse Technique
This is a least-squares problem. Assume fixed radial basis functions, e.g., Gaussian functions.
The centers are chosen randomly. The function is normalized, i.e., for any x, Σ φi = 1.
The standard deviation (width) of the radial function is determined by an ad hoc choice.
The learning steps are as follows:
1. The width is fixed according to the spread of the centers:

φi = e^(−(h/d²)‖x − ci‖²),  i = 1, 2, ..., h

where h: number of centers, d: maximum distance between the chosen centers. Thus σ = d/√(2h).
Contd...
2. From Figure 1,

Φ = [φ1, φ2, ..., φh]
w = [w1, w2, ..., wh]^T
Φw = yd, where yd is the desired output.

3. The required weight vector is computed as

w = (Φ^T Φ)^(−1) Φ^T yd = Φ′ yd

where Φ′ = (Φ^T Φ)^(−1) Φ^T is the pseudo-inverse of Φ.
This is possible only when Φ^T Φ is non-singular. If it is singular, singular value decomposition is used to solve for w.
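The weight computation in step 3 is a standard least-squares solve. A minimal sketch with NumPy, using a small hypothetical Φ and target vector (np.linalg.pinv is built on the SVD, which also covers the singular case mentioned above):

```python
import numpy as np

# hypothetical design matrix: one row per pattern, one column per hidden unit
Phi = np.array([[1.0, 0.2],
                [0.3, 0.9],
                [0.5, 0.5]])
yd = np.array([1.0, 0.0, 1.0])   # desired outputs

# w = (Phi^T Phi)^(-1) Phi^T yd, computed via the pseudo-inverse
w = np.linalg.pinv(Phi) @ yd

# residual of the least-squares fit
residual = yd - Phi @ w
```

The least-squares solution makes the residual orthogonal to the columns of Φ, which is a quick way to check the result.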
Illustration: EX-NOR problem
The truth table and the RBFN architecture are given below:
x1  x2  yd
0   0   1
0   1   0
1   0   0
1   1   1

[Figure: RBFN for EX-NOR — inputs x1, x2; centers c1, c2 with activations φ1, φ2; weights w1, w2; a bias input of +1 with weight θ; output y.]

The choice of centers is made randomly from the 4 input patterns:
c1 = [0 0]^T and c2 = [1 1]^T
φ1 = φ(‖x − c1‖) = e^(−‖x−c1‖²)
Similarly, φ2 = e^(−‖x−c2‖²), where x = [x1 x2]^T.
Contd...
Output: y = w1 φ1 + w2 φ2 + θ

Applying the 4 training patterns one after another:

w1 + w2 e^(−2) + θ = 1
w1 e^(−1) + w2 e^(−1) + θ = 0
w1 e^(−1) + w2 e^(−1) + θ = 0
w1 e^(−2) + w2 + θ = 1

Φ = [ 1       0.1353  1
      0.3679  0.3679  1
      0.3679  0.3679  1
      0.1353  1       1 ],   w = [w1  w2  θ]^T,   yd = [1  0  0  1]^T

Using w = Φ′ yd, we get w = [2.5031  2.5031  −1.8418]^T.
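The EX-NOR solution can be checked numerically. Recomputing w = Φ′yd with the rounded Φ on this slide gives w1 = w2 ≈ 2.5031 and θ ≈ −1.8418, matching the slide's solution up to rounding:

```python
import numpy as np

# Phi and yd exactly as on the slide (last column is the bias input +1)
Phi = np.array([[1.0,    0.1353, 1.0],
                [0.3679, 0.3679, 1.0],
                [0.3679, 0.3679, 1.0],
                [0.1353, 1.0,    1.0]])
yd = np.array([1.0, 0.0, 0.0, 1.0])

w = np.linalg.pinv(Phi) @ yd     # [w1, w2, theta]
```

Rows 2 and 3 of Φ are identical, so Φ^T Φ is still invertible only because the bias column keeps the rank at 3; the pseudo-inverse handles this uniformly.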
Gradient Descent Learning
One of the most popular approaches to updating c and w is supervised training with an error-correcting term, achieved by a gradient-descent technique. The update rule for center learning is

cij(t + 1) = cij(t) − η1 ∂E/∂cij,  for i = 1 to h, j = 1 to p

The weight update law is

wi(t + 1) = wi(t) − η2 ∂E/∂wi

where the cost function is E = (1/2) Σ (yd − y)².
Contd...
The actual response is

y = Σ_{i=1}^{h} φi wi

The activation function is taken as

φi = e^(−zi²/2σ²)

where zi = ‖x − ci‖ and σ is the width of the center. Differentiating E w.r.t. wi, we get

∂E/∂wi = (∂E/∂y) × (∂y/∂wi) = −(yd − y) φi
Contd...
Differentiating E w.r.t. cij, we get

∂E/∂cij = (∂E/∂y) × (∂y/∂φi) × (∂φi/∂cij) = −(yd − y) × wi × (∂φi/∂zi) × (∂zi/∂cij)

Now,

∂φi/∂zi = −(zi/σ²) φi

and

∂zi/∂cij = ∂/∂cij ( Σ_j (xj − cij)² )^(1/2) = −(xj − cij)/zi
Contd...
After simplification, the update rule for center learning is:

cij(t + 1) = cij(t) + η1 (yd − y) wi (φi/σ²) (xj − cij)

The update rule for the linear weights is:

wi(t + 1) = wi(t) + η2 (yd − y) φi
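The two update rules above can be sketched for a single training pattern as follows. The network sizes, learning rates, and demo data are hypothetical; C holds one center per row:

```python
import numpy as np

def gd_step(x, yd, C, w, sigma, eta1, eta2):
    """One gradient-descent step on centers C (h x p) and weights w (h,)."""
    z = np.linalg.norm(x - C, axis=1)          # z_i = ||x - c_i||
    phi = np.exp(-z**2 / (2 * sigma**2))       # Gaussian activations
    e = yd - phi @ w                           # output error (yd - y)
    # c_ij <- c_ij + eta1 * e * w_i * (phi_i / sigma^2) * (x_j - c_ij)
    C += eta1 * e * (w * phi / sigma**2)[:, None] * (x - C)
    # w_i <- w_i + eta2 * e * phi_i
    w += eta2 * e * phi
    return e

# hypothetical one-pattern demo: the error shrinks over repeated steps
rng = np.random.default_rng(0)
C = rng.random((5, 2))
w = np.zeros(5)
x, yd = np.array([0.5, 0.5]), 1.0
errors = [abs(gd_step(x, yd, C, w, 0.707, 0.3, 0.2)) for _ in range(100)]
```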
Example: System identification
The same surge-tank system has been taken for simulation. The system model is given as

h(t + 1) = h(t) + T ( −√(2g h(t))/√(3h(t) + 1) + u(t)/√(3h(t) + 1) )

t: discrete time step
T: sampling time
u(t): input flow, can be positive or negative
h(t): liquid level of the tank (output)
g: the gravitational acceleration
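This model can be simulated directly to generate training data, as done on the next slide. The input signal, initial level, and g = 9.8 below are assumptions for illustration (the slides do not specify them):

```python
import numpy as np

T, g = 0.01, 9.8   # sampling time (as in the slides) and assumed gravity

def surge_tank(h, u):
    """h(t+1) = h(t) + T*( -sqrt(2*g*h)/sqrt(3h+1) + u/sqrt(3h+1) )."""
    return h + T * (-np.sqrt(2 * g * h) + u) / np.sqrt(3 * h + 1)

# generate 150 I/O samples with a hypothetical input signal
h = 1.0
samples = []
for t in range(150):
    u = 5.0 + 5.0 * np.sin(0.1 * t)        # assumed input flow
    h_next = surge_tank(h, u)
    samples.append((u, h, h_next))         # training pair: (u(t), h(t)) -> h(t+1)
    h = h_next
```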
Data generation
The sampling time is taken as 0.01 sec, and 150 data points have been generated using the system equation. The nature of the input u(t) and output h(t) is shown in the following figure.

[Figure: input u(t) and output h(t) vs. time step (0–150); the I/O data span roughly 0–6.]
Contd...
The system is identified from the input-output data using a radial basis function network.
Network parameters are given below:
Number of inputs: 2 [u(t), h(t)]
Number of outputs: 1 [target: h(t + 1)]
Units in the hidden layer : 30
Number of I/O data : 150
Radial Basis Function : Gaussian
Width of the radial function (σ) : 0.707
Center learning rate (η1) : 0.3
Weight learning rate (η2) : 0.2
Result: Identification
After identification, the root mean square error is found to be < 0.007. Convergence of the mean square error is shown in the following figure.

[Figure: mean square error vs. epochs (0–1000); the error falls from about 0.4 toward zero.]
Model Validation
After identification, the model is validated on a set of 100 input-output data points different from the data set used for training. The result is shown in the following figure, where the left panel shows the desired and network outputs and the right panel shows the corresponding input.

[Figure: left — desired and actual outputs yd, ya vs. time step (range 0–1); right — input u(t) vs. time step (range −4 to 8).]
Hybrid Learning
In hybrid learning, the radial basis functions relocate theircenters in a self-organized manner while the weights areupdated using supervised learning.
When a pattern is presented to RBFN, either a new centeris grown if the pattern is sufficiently novel or the parametersin both layers are updated using gradient descent.
The test of novelty depends on two criteria:
Is the Euclidean distance between the input patternand the nearest center greater than a threshold δ(t)?
Is the mean square error at the output greater than adesired accuracy?
A new center is allocated when both criteria are satisfied.
Contd...
Otherwise, find the center that is closest to x in terms of Euclidean distance. This particular center is updated as follows:
ci(t + 1) = ci(t) + α(x − ci(t))
Thus the center moves closer to x.
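The novelty test and the self-organized center update can be combined in a short sketch. The threshold, error tolerance, rate α, and demo inputs are all hypothetical:

```python
import numpy as np

def hybrid_center_step(x, centers, error, delta=1.0, err_tol=0.1, alpha=0.5):
    """Grow a new center if x is novel (far from every center AND the
    output error is too large); otherwise move the nearest center
    toward x:  c <- c + alpha*(x - c)."""
    d = np.linalg.norm(centers - x, axis=1)
    if d.min() > delta and error > err_tol:
        return np.vstack([centers, x])           # allocate a new center at x
    i = d.argmin()
    centers[i] += alpha * (x - centers[i])       # self-organized update
    return centers

# a far, poorly-fit pattern grows a third center; a near one just shifts c1
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
centers = hybrid_center_step(np.array([5.0, 5.0]), centers, error=1.0)
centers = hybrid_center_step(np.array([0.2, 0.0]), centers, error=0.01)
```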
While centers are updated using unsupervised learning,the weights can be updated using least mean squares(LMS) or recursive least squares (RLS) algorithm. We willpresent the RLS algorithm.
Contd...
The ith output of the RBFN can be written as

x̂i = φ^T θi,  i = 1, ..., n

where φ ∈ R^l is the output vector of the hidden layer and θi ∈ R^l is the connection weight vector from the hidden units to the ith output unit. The weight update law is

θ̂i(t + 1) = θ̂i(t) + P(t + 1) φ(t) [xi(t + 1) − φ^T(t) θ̂i(t)]
P(t + 1) = P(t) − P(t) φ(t) [1 + φ^T(t) P(t) φ(t)]^(−1) φ^T(t) P(t)

where P(t) ∈ R^(l×l). This algorithm is more accurate and converges faster than the LMS algorithm.
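The RLS recursion for a single output unit can be sketched as below. The initialization P(0) = κI with large κ and the demo data are assumptions for illustration:

```python
import numpy as np

def rls_step(theta, P, phi, target):
    """One RLS update, following the slide:
    P(t+1)     = P - P phi [1 + phi^T P phi]^(-1) phi^T P
    theta(t+1) = theta + P(t+1) phi [target - phi^T theta]"""
    Pphi = P @ phi
    P = P - np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)
    theta = theta + P @ phi * (target - phi @ theta)
    return theta, P

# demo: recover a hypothetical weight vector from noiseless data
rng = np.random.default_rng(1)
true_theta = np.array([1.0, 2.0])
theta, P = np.zeros(2), 1e3 * np.eye(2)
for _ in range(50):
    phi = rng.random(2)                     # hidden-layer output (assumed)
    theta, P = rls_step(theta, P, phi, phi @ true_theta)
```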
Example: System Identification
The same surge-tank model is identified with the same input-output data set. The parameters of the RBFN are the same as those used in the gradient-descent technique.
Center training is done using unsupervised K-means clustering with a learning rate of 0.5.
Weight training is done using the gradient-descent method with a learning rate of 0.1.
After identification, the root mean square error is found to be < 0.007.
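The K-means center training mentioned above can be sketched with a simple sequential (online) variant, where the nearest center moves toward each sample by the stated learning rate. The data, cluster count, and epoch count are hypothetical:

```python
import numpy as np

def online_kmeans(data, k, rate=0.5, epochs=5, seed=0):
    """Sequential K-means sketch: for each sample, move the nearest
    center toward it by `rate` (an online variant of K-means)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in data:
            i = np.argmin(np.linalg.norm(centers - x, axis=1))
            centers[i] += rate * (x - centers[i])
    return centers

# two well-separated hypothetical clusters around (0, 0) and (5, 5)
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
centers = online_kmeans(data, k=2)
```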
Contd...
The mapping of the input data and centers is shown in the following figure. The left panel shows how the centers are initialized, and the right panel shows the spreading of the centers towards the nearest inputs after learning.

[Figure: input data and centers before training (left) and after training (right), plotted over the input range −4 to 10.]
Comparison of Results
Schemes RMS error No. of iterations
Back-propagation 0.00822 ∼ 5000
RBFN (Gradient Descent) 0.00825 ∼ 2000
RBFN (Hybrid Learning) 0.00823 ∼ 1000
It is seen from the table that an RBFN learns faster than a simple feedforward network.