RBF Neural Networks
We consider the other major class of neural network model, Radial Basis Function (RBF) neural networks, in which the activation of a hidden unit is determined by the distance between the input vector and a prototype vector. A function is approximated as a linear combination of a set of basis functions.
$$y_k(\mathbf{x}) = \sum_{j=1}^{M} w_{kj}\,\phi_j(\mathbf{x}) + w_{k0} = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

where the bias $w_{k0}$ is absorbed by an extra basis function $\phi_0 \equiv 1$.
[Figure: Architecture of the RBF neural network]
Basis Functions
Basis functions normally take the form $\phi_j(\mathbf{x}) = \phi(\|\mathbf{x} - \boldsymbol{\mu}_j\|)$. The function depends on the distance (usually taken to be Euclidean) between the input vector $\mathbf{x}$ and a vector $\boldsymbol{\mu}_j$.
The most common form of basis function used is the
Gaussian function
$$\phi_j(\mathbf{x}) = \exp\left(-\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right)$$

$\boldsymbol{\mu}_j$ determines the center of basis function $j$; $\sigma_j$ is a width parameter that controls how spread out the curve is.
The Gaussian function is not normalized, since any overall factors can be absorbed into the weights.
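As a concrete illustration of these formulas, here is a minimal NumPy sketch of the forward pass (the function and argument names are illustrative, not from the slides; the bias $w_{k0}$ is kept as a separate vector rather than a constant basis function):

```python
import numpy as np

def rbf_forward(X, centers, widths, W, b):
    """Forward pass of a Gaussian RBF network.

    X:       (n, d) input vectors
    centers: (M, d) basis-function centers mu_j
    widths:  (M,)   width parameters sigma_j
    W:       (c, M) second-layer weights w_kj
    b:       (c,)   biases w_k0
    """
    # squared Euclidean distances ||x - mu_j||^2, shape (n, M)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Gaussian basis activations phi_j(x)
    Z = np.exp(-sq_dist / (2.0 * widths ** 2))
    # y_k(x) = sum_j w_kj phi_j(x) + w_k0
    return Z @ W.T + b
```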
Basis Functions
The Gaussian function is a localized basis function, with the property that $\phi(r) \to 0$ as $r \to \infty$.
Another choice of localized basis function is

$$\phi(r) = \left(r^2 + \sigma^2\right)^{-\alpha}, \quad \alpha > 0, \qquad r = \|\mathbf{x} - \boldsymbol{\mu}_j\|$$
A hidden neuron is more sensitive to data points near its center. This sensitivity may be tuned by adjusting the width. For a given input vector, typically only a few hidden units will have significant activations.
Expressiveness
The hidden layer applies a nonlinear transformation from the input space to the hidden space.

RBF neural networks are capable of universal function approximation, with only mild restrictions on the form of the basis functions.

More general topologies of RBF networks (more than one hidden layer) are not normally considered.
Example: Implementing XOR
$$z_1 = e^{-\|\mathbf{x} - \boldsymbol{\mu}_1\|^2}, \qquad \boldsymbol{\mu}_1 = (0,\,0)^t$$
$$z_2 = e^{-\|\mathbf{x} - \boldsymbol{\mu}_2\|^2}, \qquad \boldsymbol{\mu}_2 = (1,\,1)^t$$
x1   x2   z1     z2
0    0    1      .135
0    1    .37    .37
1    0    .37    .37
1    1    .135   1
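The entries of the table can be reproduced with a few lines of NumPy (a sketch; variable names are illustrative, and the slide rounds 0.368 to .37):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
mu1 = np.array([0.0, 0.0])
mu2 = np.array([1.0, 1.0])

z1 = np.exp(-((X - mu1) ** 2).sum(axis=1))  # exp(-||x - mu_1||^2)
z2 = np.exp(-((X - mu2) ** 2).sum(axis=1))  # exp(-||x - mu_2||^2)

for x, a, b in zip(X, z1, z2):
    print(x, round(a, 3), round(b, 3))
# [0. 0.] 1.0   0.135
# [0. 1.] 0.368 0.368
# [1. 0.] 0.368 0.368
# [1. 1.] 0.135 1.0
```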
Example: Implementing XOR
[Figure: the four XOR points in the (x1, x2) input space and their images in the (z1, z2) feature space]
When mapped into the feature space $(z_1, z_2)$, the two classes become linearly separable.
Generalized Gaussian RBFs
The Gaussian RBFs can be generalized to allow for arbitrary covariance matrices $\Sigma_j$:

$$\phi_j(\mathbf{x}) = \exp\left(-\frac{1}{2}\left(\mathbf{x} - \boldsymbol{\mu}_j\right)^t \Sigma_j^{-1} \left(\mathbf{x} - \boldsymbol{\mu}_j\right)\right)$$
Since the matrices $\Sigma_j$ are symmetric, each basis function has $d(d+3)/2$ independent adjustable parameters, as compared with the $d+1$ independent parameters for the regular Gaussian basis functions.
In practice there is a trade-off to be considered between using a smaller number of basis functions with many adjustable parameters and a larger number of less flexible functions.
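A minimal sketch of the generalized basis function, assuming NumPy (the function name is illustrative):

```python
import numpy as np

def generalized_gaussian(x, mu, Sigma):
    """Evaluate exp(-0.5 (x - mu)^t Sigma^{-1} (x - mu)) for one input x.
    Sigma is a symmetric positive-definite d x d covariance matrix, so mu
    (d values) plus Sigma (d(d+1)/2 free values) gives d(d+3)/2 parameters."""
    diff = x - mu
    # solve Sigma v = diff instead of forming the explicit inverse
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
```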
Training RBF networks
Sum-of-squared-error function:

$$E = \sum_{p=1}^{n} E_p, \qquad E_p = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^p - y_k^p\right)^2$$

where $t_k^p$ is the target value for output unit $k$ when the network is presented with input vector $\mathbf{x}^p$.
$$z_j^p = e^{-\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2 / 2\sigma_j^2}, \qquad y_k^p = \sum_{j=0}^{M} w_{kj}\, z_j^p$$
The network can be trained by gradient descent; the required gradients are computed below.
Training RBF networks
$$\frac{\partial E_p}{\partial w_{kj}} = \frac{\partial E_p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial w_{kj}}, \qquad \frac{\partial E_p}{\partial y_k^p} = -\left(t_k^p - y_k^p\right)$$

Thus

$$\frac{\partial E_p}{\partial w_{kj}} = -\left(t_k^p - y_k^p\right) z_j^p$$
Training RBF networks
$$\frac{\partial E_p}{\partial \mu_{ji}} = \sum_{k}\frac{\partial E_p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial z_j^p}\,\frac{\partial z_j^p}{\partial \mu_{ji}}$$

$$\frac{\partial z_j^p}{\partial \mu_{ji}} = z_j^p\,\frac{\partial}{\partial \mu_{ji}}\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right) = -\frac{z_j^p}{2\sigma_j^2}\,\frac{\partial}{\partial \mu_{ji}}\sum_{i}\left(x_i^p - \mu_{ji}\right)^2 = -\frac{z_j^p}{2\sigma_j^2}\,(-2)\left(x_i^p - \mu_{ji}\right) = \frac{z_j^p}{\sigma_j^2}\left(x_i^p - \mu_{ji}\right)$$

$$\frac{\partial E_p}{\partial \mu_{ji}} = -\frac{z_j^p}{\sigma_j^2}\left(x_i^p - \mu_{ji}\right)\sum_{k}\left(t_k^p - y_k^p\right) w_{kj}$$
Training RBF networks
$$\frac{\partial E_p}{\partial \sigma_j} = \sum_{k}\frac{\partial E_p}{\partial y_k^p}\,\frac{\partial y_k^p}{\partial z_j^p}\,\frac{\partial z_j^p}{\partial \sigma_j}$$

$$\frac{\partial z_j^p}{\partial \sigma_j} = z_j^p\,\frac{\partial}{\partial \sigma_j}\left(-\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2}\right) = z_j^p\,\frac{\|\mathbf{x}^p - \boldsymbol{\mu}_j\|^2}{\sigma_j^3} = -\frac{2 z_j^p}{\sigma_j}\ln z_j^p$$

$$\frac{\partial E_p}{\partial \sigma_j} = \frac{2 z_j^p}{\sigma_j}\,\ln\!\left(z_j^p\right)\sum_{k}\left(t_k^p - y_k^p\right) w_{kj}$$
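Putting the three gradients together, a per-pattern gradient descent step could look like the following NumPy sketch (train_rbf, eta, and the small constant inside the logarithm are illustrative choices, not from the slides; the bias weight $w_{k0}$ is omitted for brevity):

```python
import numpy as np

def train_rbf(X, T, centers, widths, W, eta=0.01, epochs=100):
    """Gradient descent on E = sum_p E_p using the gradients derived above.

    X: (n, d) float inputs; T: (n, c) float targets;
    centers: (M, d); widths: (M,); W: (c, M). Arrays are updated in place.
    """
    for _ in range(epochs):
        for xp, tp in zip(X, T):
            diff = xp - centers                                       # x^p - mu_j, (M, d)
            z = np.exp(-(diff ** 2).sum(axis=1) / (2 * widths ** 2))  # z_j^p
            err = tp - W @ z                                          # t_k^p - y_k^p
            back = err @ W                                            # sum_k (t_k^p - y_k^p) w_kj
            # dEp/dw_kj = -(t_k^p - y_k^p) z_j^p
            W += eta * np.outer(err, z)
            # dEp/dmu_ji = -(z_j^p / sigma_j^2)(x_i^p - mu_ji) * back_j
            centers += eta * ((z / widths ** 2) * back)[:, None] * diff
            # dEp/dsigma_j = (2 z_j^p / sigma_j) ln(z_j^p) * back_j
            widths -= eta * (2 * z / widths) * np.log(z + 1e-12) * back
    return W, centers, widths
```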
Two-stage training of RBF nets
In the first stage, the input data set $\{\mathbf{x}^p\}$ alone is used to determine the parameters of the basis functions (e.g. $\boldsymbol{\mu}_j$ and $\sigma_j$ for the Gaussian basis functions). For example:

- randomly select a set of training data points as the basis function centers
- identify clusters of training data and put a basis function center at each cluster
- choose $\sigma_j$ to be some multiple of the average distance between the basis function centers

A sketch of the first and third heuristics follows below.
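A minimal NumPy sketch, assuming the random-selection heuristic (init_basis and the multiple 0.5 are illustrative choices):

```python
import numpy as np

def init_basis(X, M, multiple=0.5, seed=0):
    """First-stage heuristic: pick M training points at random as centers,
    then set every width to a multiple of the average inter-center distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)]
    # average pairwise Euclidean distance between the chosen centers
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    avg = dists[np.triu_indices(M, k=1)].mean()
    widths = np.full(M, multiple * avg)
    return centers, widths
```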
Two-stage training of RBF nets
In the second stage, the basis functions are kept fixed and the second-layer weights are optimized:

$$y_k(\mathbf{x}) = \sum_{j=0}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

Since the basis functions are considered fixed, the network is equivalent to a single-layer neural network.
Two-stage training of RBF nets
Consider a sum-of-squared-error function

$$E = \frac{1}{2}\sum_{p}\sum_{k}\left(t_k^p - y_k(\mathbf{x}^p)\right)^2$$

Since the error function is a quadratic function of the weights, its minimum can be found in terms of the solution of a set of linear equations.

Setting the gradient of $E$ with respect to $w_{kj}$ to zero gives

$$\Phi^t \Phi\, W^t = \Phi^t T$$

where $(W)_{kj} = w_{kj}$, $(\Phi)_{pj} = \phi_j(\mathbf{x}^p)$, and $(T)_{pk} = t_k^p$.
Two-stage training of RBF nets
Given the pseudoinverse $\Phi^{\dagger} = \left(\Phi^t \Phi\right)^{-1}\Phi^t$, the formal solution for the weights is given by

$$W^t = \Phi^{\dagger} T$$

In practice, the equations are solved using singular value decomposition, to avoid problems due to the possibility of $\Phi^t\Phi$ being singular or nearly singular. The second-layer weights can be found by fast, linear algebra techniques!
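A minimal NumPy sketch of this solution (solve_weights is an illustrative name; np.linalg.lstsq uses an SVD internally, which matches the robustness argument above):

```python
import numpy as np

def solve_weights(Phi, T):
    """Solve the normal equations Phi^t Phi W^t = Phi^t T for the
    second-layer weights W.

    Phi: (n, M) design matrix with Phi[p, j] = phi_j(x^p)
    T:   (n, c) target matrix with T[p, k] = t_k^p
    """
    # lstsq is SVD-based, so it stays stable even when Phi^t Phi is
    # singular or nearly singular
    Wt, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return Wt.T  # (c, M) weight matrix W
```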
Two-stage training of RBF nets
The possibility of choosing suitable parameters for the hidden units without having to perform a full non-linear optimization of the network is one of the major advantages of RBF networks, as compared with multi-layer neural networks.

The use of unsupervised techniques to determine the basis function parameters need not be optimal for the supervised learning problem, since it does not take into account the target labels associated with the data.

Unsupervised techniques can be used to initialize the basis function parameters before running the gradient descent algorithm.