Radial Basis Function Networks
Ravi Kaushik
Project 1
CSC 84010 Neural Networks and Pattern Recognition
History
Radial Basis Function (RBF) networks emerged in the late 1980s as a variant of the artificial neural network.
The activation of the hidden layer depends on the distance between the input vector and a prototype vector.
Related topics include function approximation, regularization, noisy interpolation, density estimation, optimal classification theory, and potential functions.
Motivation
An RBF network can approximate any regular function.
It typically trains faster than a multi-layer perceptron.
It has just two layers of weights, and each layer is determined sequentially.
Each hidden unit implements a radially activated function.
The hidden layer is non-linear; the output layer is linear.
Advantages
An RBFN can be trained faster than a multi-layer perceptron due to its two-stage training procedure.
Two-layer network
Non-linear approximation
Uses both unsupervised and supervised learning
No saturation while generating outputs
Training does not get stuck in local minima
Basis Functions
The RBF network has been shown to be a universal approximator for continuous functions, provided that the number of hidden nodes is sufficiently large.
However, using a direct multiquadric function as the activation function avoids saturation of the node outputs.
Network Topology
Gaussian Activation Function
Output Layer: is a weighted sum of hidden inputs
Output for pattern recognition problems
$\varphi_j(x) = \exp\left[-(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j)\right], \quad j = 1, \dots, L$

$\psi_k(x) = \sum_{j=1}^{L} \lambda_{jk}\, \varphi_j(x)$

$Y_k(x) = \dfrac{1}{1 + \exp(-\psi_k(x))}, \quad k = 1, \dots, M$
RBF NN Mapping
X is a d-dimensional input vector with elements x_i, and μ_j is the vector determining the center of basis function φ_j, with elements μ_ji.
$y_k(x) = \sum_{j=1}^{M} w_{kj}\, \varphi_j(x) + w_{k0}$

$\varphi_j(x) = \exp\left(-\dfrac{\|x - \mu_j\|^2}{2\sigma_j^2}\right)$
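The Gaussian basis functions and linear output layer above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual implementation; all names and shapes are assumptions.

```python
import numpy as np

def rbf_activations(X, centers, sigmas):
    """Gaussian basis activations phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2)).

    X: (n, d) inputs, centers: (M, d), sigmas: (M,). Returns (n, M)."""
    # Squared Euclidean distance from every input to every center
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigmas ** 2))

def rbf_forward(X, centers, sigmas, W, b):
    """Network output y_k(x) = sum_j w_kj phi_j(x) + w_k0 (bias b)."""
    Phi = rbf_activations(X, centers, sigmas)   # (n, M)
    return Phi @ W.T + b                        # (n, K)
```

An input exactly at a center yields an activation of 1; activations decay smoothly with distance, which is the locality property the slides emphasize.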
Network Training
Two stages of training
Stage 1: Unsupervised training. Determine the parameters of the basis functions (μ_j and σ_j) using the dataset x^n.
Network Training
Stage 2: Optimization of the second-layer weights.
$y_k(x) = \sum_{j=0}^{M} w_{kj}\, \varphi_j(x)$, or in matrix form $y(x) = W\varphi$

$E = \tfrac{1}{2} \sum_n \sum_k \left\{ y_k(x^n) - t_k^n \right\}^2$ (sum-of-squares error)

Setting the derivative of E to zero gives the normal equations $\Phi^T \Phi\, W^T = \Phi^T T$, so $W^T = \Phi^{\dagger} T$, where $\Phi^{\dagger} = (\Phi^T \Phi)^{-1} \Phi^T$ is the pseudo-inverse of $\Phi$.
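With the basis activations fixed, stage 2 reduces to a linear least-squares solve. A minimal sketch, assuming the design matrix Phi and target matrix T follow the shapes used in the slides:

```python
import numpy as np

def solve_output_weights(Phi, T):
    """Least-squares solution of Phi @ W.T = T.

    Phi: (n, M) basis activations, T: (n, K) targets. Returns W: (K, M).
    np.linalg.lstsq computes the pseudo-inverse solution numerically,
    which is more stable than forming (Phi^T Phi)^-1 explicitly."""
    Wt, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return Wt.T
```

Because this stage is a single linear solve rather than iterative non-linear optimization, it is the main reason RBF training is fast compared to an MLP.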
Training Algorithms
There are two kinds of training algorithms: supervised and unsupervised.
RBF networks are used mainly in supervised applications, where both the dataset and its outputs are known.
Network parameters are found such that they minimize the cost function:
$\min \sum_{i=1}^{Q} \left( Y_k(X_i) - F_k(X_i) \right)^T \left( Y_k(X_i) - F_k(X_i) \right)$
Training Algorithms
Clustering algorithms (k-means)
The centers of the radial basis functions are initialized randomly. For a given data sample X_i, the algorithm adapts its closest center:

$\|X_i - \hat{\mu}_j\| = \min_{k=1,\dots,L} \|X_i - \hat{\mu}_k\|$
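The closest-center adaptation rule can be sketched as follows; the learning rate eta and the in-place update scheme are illustrative assumptions, not details given in the slides.

```python
import numpy as np

def adapt_centers(X, centers, eta=0.1):
    """One pass of competitive center adaptation.

    For each sample, find the closest center (the winner) and move
    only that center a fraction eta toward the sample."""
    centers = centers.copy()
    for x in X:
        j = np.argmin(((centers - x) ** 2).sum(axis=1))  # index of closest center
        centers[j] += eta * (x - centers[j])             # move winner toward sample
    return centers
```

Repeating such passes over the data drives each center toward the mean of the samples it wins, which is the behavior k-means clustering formalizes.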
Training Algorithms (cont..)
Regularization (Haykin, 1994)
Orthogonal least squares using the Gram-Schmidt algorithm
Expectation-maximization using a gradient descent algorithm (Moody and Darken, 1989) for modeling input-output distributions
Regularization
Determines the weights by matrix computation
$E = \tfrac{1}{2} \sum_n \left\{ y(x^n) - t^n \right\}^2 + \tfrac{\nu}{2} \int \|P y\|^2 \, dx$
E is the total error to be minimized.
P is some differential operator.
ν is called the regularization parameter.
ν controls the relative importance of the regularization term, and hence the degree of smoothness of the function y(x).
Regularization
If the regularization parameter is zero, the weights converge to the pseudo-inverse solution.
If the input dimension and the number of patterns are large, not only is it difficult to implement the regularization, but numerical errors may also occur during the computation.
Gradient Descent Method
The gradient descent method goes through the entire set of training patterns repeatedly.
It tends to settle into a local minimum, and sometimes does not even converge if the patterns of the outputs of the middle layer are not linearly separable.
It is difficult to obtain parameters such as the learning rate.
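The pattern-by-pattern gradient descent described above can be sketched for the output weights alone (the learning rate, epoch count, and zero initialization are illustrative assumptions):

```python
import numpy as np

def train_weights_gd(Phi, T, lr=0.01, epochs=200):
    """Per-pattern gradient descent on the second-layer weights.

    Phi: (n, M) fixed basis activations, T: (n, K) targets.
    Minimizes 0.5 * ||Phi @ W - T||^2 one pattern at a time."""
    n, M = Phi.shape
    K = T.shape[1]
    W = np.zeros((M, K))
    for _ in range(epochs):
        for i in range(n):                      # cycle through all patterns
            err = Phi[i] @ W - T[i]             # (K,) residual for pattern i
            W -= lr * np.outer(Phi[i], err)     # gradient of 0.5 * err^2
    return W
```

Because the error is quadratic in W when the basis functions are fixed, this converges to the same least-squares solution as the matrix method, just more slowly and with a learning rate to tune.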
RBFNN vs. Multi-Layer Perceptron
An RBFNN uses the distance to a prototype vector followed by transformation by a localized function, whereas an MLP depends on weighted linear summations of the inputs, transformed by monotonic activation functions.
In an MLP, for a given input value, many hidden units will typically contribute to the output value; in an RBF network, for a given input vector, only a few hidden units are activated.
RBFNN vs. Multi-Layer Perceptron
An MLP has many layers of weights and a complex pattern of connectivity, so not all possible weights in a given layer are present. An RBF network is simpler, with two layers: the first layer contains the parameters of the basis functions, and the second layer forms linear combinations of the basis-function activations to generate the outputs.
All parameters of an MLP are determined simultaneously using supervised training. An RBFNN uses a two-stage training technique, with the first-layer parameters computed using unsupervised learning and the second-layer weights using fast linear supervised methods.
Programming Paradigm and Languages
Java with Eclipse IDE
Matlab 7.4 Neural Network Toolbox
Java application development
Existing codes online
Object-oriented programming
Debugging is easier in Eclipse IDE
Java documentation is extensive
Applications of RBFNN
Pattern Recognition(Lampariello & Sciandrone)
The problem is formulated in terms of a system of non-linear inequalities and a suitable error function, which depends only on the violated inequalities.
Reason to choose RBFNN over MLP: with a suitable choice of activation function, classification problems will not saturate.
Pattern Recognition (using RBFNN)
Different error functions are used, such as cross entropy and the exponential function.
Pattern Recognition (using RBFNN)
Non-linear inequality
Error function
Four 2D Gaussian clusters grouped into two classes
Modeling a 3D Shape
Algorithms using robust statistics provide better parameter estimation than classical RBF network estimation.
Classification problem applied to Diabetes Mellitus
Two stages of RBF NN
Stage one of training fixes the radial basis centers μ_j using the k-means clustering algorithm.
Stage two of training determines the weights W_ij that approximate the limited sample data X, leading to a linear optimization problem solved using least squares.
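The two training stages can be combined into one end-to-end sketch: a few Lloyd iterations of k-means to fix the centers, then a linear least-squares solve for the weights. The number of centers, iteration count, width sigma, and random seed are all illustrative assumptions, not values from the cited study.

```python
import numpy as np

def kmeans_centers(X, n_centers, n_iter=10, seed=0):
    """Stage 1: fix the basis centers with simple k-means (Lloyd) iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_centers, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center, then recompute means
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(n_centers):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_rbf_classifier(X, T, n_centers=4, sigma=1.0):
    """Two-stage RBF training: k-means centers, then least-squares weights."""
    centers = kmeans_centers(X, n_centers)                     # stage 1
    sq = ((X[:, None] - centers[None]) ** 2).sum(axis=2)
    Phi = np.exp(-sq / (2.0 * sigma ** 2))
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)                # stage 2
    return centers, W
```

For classification, the class of a new sample is taken as the index of the largest output, as in the one-output-per-class coding the slides describe.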
Classification problem applied to Diabetes Mellitus
Results
1200 cases, 600 for training, 300 for validation and 300 for testing.
Conclusion
RBF networks have very good properties, such as:
Localization
Functional approximation
Interpolation
Cluster modeling
Quasi-orthogonality
Application fields include:
Telecommunications
Signal and image processing
Control engineering
Computer vision
References
Broomhead, D. S. and Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.
Moody, J. and Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294.
Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78, 1481-1497.
References
Hwang, Young-Sup and Sung-Yang (1996). An efficient method to construct a radial basis function neural network classifier and its application to unconstrained handwritten digit recognition. 13th Intl. Conference on Pattern Recognition, vol. 4, p. 640.
Venkatesan, P. and Anitha, S. (2006). Application of a radial basis function neural network for diagnosis of diabetes mellitus. Current Science, vol. 91, pp. 1195-1199.