
Radial Basis Function Networks

Ravi Kaushik
Project 1

CSC 84010 Neural Networks and Pattern Recognition

History

Radial Basis Function (RBF) networks emerged in the late 1980s as a variant of the artificial neural network.
The activation of the hidden layer depends on the distance between the input vector and a prototype vector.
Related topics include function approximation, regularization, noisy interpolation, density estimation, optimal classification theory, and potential functions.

Motivation

An RBF network can approximate any regular function.
It trains faster than a multi-layer perceptron.
It has just two layers of weights, and each layer can be determined sequentially.
Each hidden unit implements a radially activated function.
The hidden layer is non-linear; the output layer is linear.

Advantages

An RBFN can be trained faster than a multi-layer perceptron due to its two-stage training procedure.
It is a two-layer network performing non-linear approximation.
It uses both unsupervised and supervised learning.
There is no saturation while generating outputs.
The linear second training stage does not get stuck in local minima.

Network Topology

[Network diagram: input vector x feeding hidden basis functions φj(x), whose activations combine into the outputs ψk(x).]

Basis Functions

The RBF network has been shown to be a universal approximator for continuous functions, provided that the number n_r of hidden nodes is sufficiently large.

Moreover, using the direct multi-quadric function as the activation function avoids saturation of the node outputs.

Network Topology

Gaussian Activation Function

Output layer: a weighted sum of the hidden-unit activations.

Output for pattern recognition problems

$$\phi_j(x) = \exp\!\left[-(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)\right], \qquad j = 1,\dots,L$$

$$\psi_k(x) = \sum_{j=1}^{L} \lambda_{jk}\,\phi_j(x)$$

$$Y_k(x) = \frac{1}{1+\exp(-\psi_k(x))}, \qquad k = 1,\dots,M$$

RBF NN Mapping

x is a d-dimensional input vector with elements x_i, and μ_j is the vector determining the center of basis function φ_j, with elements μ_ji.

$$y_k(x) = \sum_{j=1}^{M} w_{kj}\,\phi_j(x) + w_{k0}$$

$$\phi_j(x) = \exp\!\left(-\frac{\|x-\mu_j\|^2}{2\sigma_j^2}\right)$$
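As a concrete illustration of this mapping, here is a minimal Java sketch of the forward pass, assuming the isotropic Gaussian basis functions above; the class and parameter names are illustrative, not taken from the project code:

```java
// Minimal RBF forward pass: y_k(x) = sum_j w_kj * phi_j(x) + w_k0.
public class RbfForward {

    /** Gaussian basis phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2)). */
    static double basis(double[] x, double[] mu, double sigma) {
        double sq = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu[i];
            sq += d * d;
        }
        return Math.exp(-sq / (2.0 * sigma * sigma));
    }

    /** Outputs for one input; w[k][0] is the bias w_k0, w[k][j+1] multiplies phi_j. */
    static double[] forward(double[] x, double[][] centers, double[] sigmas, double[][] w) {
        int M = centers.length;   // number of hidden units
        int K = w.length;         // number of output units
        double[] y = new double[K];
        for (int k = 0; k < K; k++) {
            double sum = w[k][0]; // bias term
            for (int j = 0; j < M; j++) {
                sum += w[k][j + 1] * basis(x, centers[j], sigmas[j]);
            }
            y[k] = sum;
        }
        return y;
    }
}
```

For the pattern-recognition outputs Y_k shown on the earlier slide, the returned sums would simply be passed through the logistic function 1/(1 + exp(−ψ)).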

Network Training

Two stages of training.
Stage 1: unsupervised training. Determine the parameters of the basis functions (μ_j and σ_j) using the dataset {x_n}.

Network Training

Stage 2: optimization of the second-layer weights.

$$y_k(x) = \sum_{j=0}^{M} w_{kj}\,\phi_j(x), \qquad y(x) = W\phi$$

Sum-of-squares error:

$$E = \frac{1}{2}\sum_n \sum_k \left\{ y_k(x^n) - t_k^n \right\}^2$$

Setting the gradient of E to zero gives the normal equations, solved via the pseudo-inverse $\Phi^\dagger$:

$$\Phi^T \Phi\, W^T = \Phi^T T \;\;\Rightarrow\;\; W^T = \Phi^\dagger T$$
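A minimal sketch of this second stage for one output unit, assuming the basis activations have already been collected into a design matrix (names are illustrative; a production implementation would prefer an SVD over the normal equations for numerical stability):

```java
// Solve Phi^T Phi w = Phi^T t for the second-layer weights of one output unit.
public class SecondLayerTraining {

    /** Solve A x = b in place by Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) pivot = r;
            double[] tmpRow = A[col]; A[col] = A[pivot]; A[pivot] = tmpRow;
            double tmpB = b[col]; b[col] = b[pivot]; b[pivot] = tmpB;
            for (int r = col + 1; r < n; r++) {
                double f = A[r][col] / A[col][col];
                for (int c = col; c < n; c++) A[r][c] -= f * A[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= A[r][c] * x[c];
            x[r] = s / A[r][r];
        }
        return x;
    }

    /**
     * Least-squares weights: phi[n][j] = phi_j(x_n), with phi[n][0] = 1
     * for the bias, and t[n] the target for pattern n.
     */
    static double[] trainOutputWeights(double[][] phi, double[] t) {
        int n = phi.length, m = phi[0].length;
        double[][] gram = new double[m][m]; // Phi^T Phi
        double[] rhs = new double[m];       // Phi^T t
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                rhs[j] += phi[i][j] * t[i];
                for (int k = 0; k < m; k++) gram[j][k] += phi[i][j] * phi[i][k];
            }
        }
        return solve(gram, rhs);
    }
}
```

This is exactly the linear problem that makes the second stage fast: no iteration over local minima, just one matrix solve.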

Training Algorithms

There are two kinds of training algorithms: supervised and unsupervised.
RBF networks are used mainly in supervised applications, where both the dataset and its outputs are known.
The network parameters are found such that they minimize the cost function:

$$\min \sum_{i=1}^{Q} \bigl(Y_k(X_i) - F_k(X_i)\bigr)^T \bigl(Y_k(X_i) - F_k(X_i)\bigr)$$

where Y_k is the network output and F_k the known target output.

Training algorithms

Clustering algorithms (k-means): the centers of the radial basis functions are initialized randomly. For a given data sample X_i, the algorithm adapts the closest center (sketched below):

$$\|X_i - \hat{\mu}_j\| = \min_{k=1,\dots,L}\, \|X_i - \hat{\mu}_k\|$$
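A minimal sketch of this adaptation step, assuming an online update that moves only the winning center toward the sample; the learning rate and the initialization scheme are illustrative assumptions:

```java
import java.util.Random;

// Online k-means for initializing RBF centers: for each sample, find the
// closest center and move it a step of size eta toward the sample.
public class KMeansCenters {

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    /** One pass over the data with learning rate eta. */
    static void adaptCenters(double[][] data, double[][] centers, double eta) {
        for (double[] x : data) {
            int closest = 0;                        // find the winning center
            for (int j = 1; j < centers.length; j++)
                if (sqDist(x, centers[j]) < sqDist(x, centers[closest])) closest = j;
            for (int i = 0; i < x.length; i++)      // move it toward the sample
                centers[closest][i] += eta * (x[i] - centers[closest][i]);
        }
    }

    /** Random initialization: pick L distinct-ish data samples as centers. */
    static double[][] initCenters(double[][] data, int L, Random rng) {
        double[][] centers = new double[L][];
        for (int j = 0; j < L; j++)
            centers[j] = data[rng.nextInt(data.length)].clone();
        return centers;
    }
}
```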

Training Algorithms (cont.)

Regularization (Haykin, 1994).
Orthogonal least squares using the Gram-Schmidt algorithm.
Expectation-maximization using a gradient-descent algorithm (Moody and Darken, 1989) for modeling input-output distributions.

Regularization

Determines the weights by a matrix computation:

$$E = \frac{1}{2}\sum_n \left\{ y(x^n) - t^n \right\}^2 + \frac{\nu}{2}\int \|Py\|^2\, dx$$

E is the total error to be minimized.
P is some differential operator.
ν is called the regularization parameter; it controls the relative importance of the regularization term and hence the degree of smoothness of the function y(x).

Regularization

If the regularization parameter is zero, the weights converge to the pseudo-inverse solution (a derivation sketch follows below).

If the input dimension and the number of patterns are large, not only is it difficult to implement the regularization, but numerical errors may also occur during the computation.
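To see why the weights converge to the pseudo-inverse solution, here is a minimal derivation sketch for the special case of a weight-decay regularizer, i.e. replacing the ∫‖Py‖² dx term with ‖W‖² (an illustrative assumption; the slide's P is a general differential operator):

```latex
% Weight-decay special case of the regularized error:
E = \tfrac{1}{2}\sum_n \bigl\| W\phi(x^n) - t^n \bigr\|^2
    + \tfrac{\nu}{2}\,\lVert W \rVert^2
% Setting \partial E / \partial W = 0 yields the regularized normal equations
(\Phi^T\Phi + \nu I)\, W^T = \Phi^T T ,
% so as \nu \to 0 this reduces to the pseudo-inverse solution
W^T = \Phi^\dagger T .
```

The added νI term also improves the conditioning of the matrix solve, which is why small nonzero ν is often used even when smoothness is not the main goal.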

Gradient Descent Method

The gradient descent method passes through the entire set of training patterns repeatedly.
It tends to settle into a local minimum, and sometimes does not converge at all if the patterns of the middle-layer outputs are not linearly separable.
It is difficult to obtain parameters such as the learning rate (a sketch of one update epoch follows below).
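A minimal sketch of one such epoch for the second-layer weights, assuming the sum-of-squares error from the training slides; the learning rate eta is an illustrative assumption:

```java
// One batch gradient-descent epoch on the output weights.
// phi[n][j] = phi_j(x_n), with phi[n][0] = 1 for the bias; t[n][k] = target.
public class GradientDescentStep {

    static void epoch(double[][] phi, double[][] t, double[][] w, double eta) {
        int N = phi.length, M = phi[0].length, K = w.length;
        double[][] grad = new double[K][M];
        for (int n = 0; n < N; n++) {
            for (int k = 0; k < K; k++) {
                double y = 0.0;                  // y_k(x_n) = sum_j w_kj phi_j(x_n)
                for (int j = 0; j < M; j++) y += w[k][j] * phi[n][j];
                double err = y - t[n][k];        // dE/dy_k for this pattern
                for (int j = 0; j < M; j++) grad[k][j] += err * phi[n][j];
            }
        }
        for (int k = 0; k < K; k++)              // w <- w - eta * dE/dw
            for (int j = 0; j < M; j++) w[k][j] -= eta * grad[k][j];
    }
}
```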

RBFNN vs. Multi-Layer Perceptron

An RBFNN uses the distance to a prototype vector, followed by a transformation by a localized function. An MLP depends on weighted linear summations of the inputs, transformed by monotonic activation functions.
In an MLP, for a given input value, many hidden units typically contribute to the determination of the output value. In an RBF network, for a given input vector, only a few hidden units are activated.

RBFNN vs. Multi-Layer Perceptron

An MLP has many layers of weights and a complex pattern of connectivity, so not all possible weights in a given layer need be present. The RBF network is simpler, with two layers: the first layer contains the parameters of the basis functions, and the second layer forms linear combinations of the basis-function activations to generate the outputs.
All parameters of an MLP are determined simultaneously using supervised training. The RBFNN uses a two-stage training technique, in which the first-layer parameters are computed with unsupervised methods and the second-layer weights with fast, linear, supervised methods.

Programming Paradigm and Languages

Java with Eclipse IDE
Matlab 7.4 Neural Network Toolbox

Java application development:
Existing code available online
Object-oriented programming
Debugging is easier in the Eclipse IDE
Java documentation is extensive

Java Eclipse IDE

Matlab 7.0 Neural Network Toolbox

Applications of RBFNN

Pattern Recognition (Lampariello & Sciandrone)

The problem is formulated in terms of a system of non-linear inequalities, with a suitable error function that depends only on the violated inequalities.

Reason to choose an RBFNN over an MLP: with a suitable choice of activation function, classification problems will not saturate.

Pattern Recognition (using RBFNN)

Different error functions are used, such as:
Cross-entropy (sketched below)
Exponential function
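A minimal sketch of the cross-entropy error for binary targets, assuming sigmoid outputs in (0, 1) as in the earlier Y_k(x) formula; the clamping constant is an illustrative numerical safeguard:

```java
// Cross-entropy error E = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ].
public class CrossEntropyError {

    static double crossEntropy(double[] y, double[] t) {
        double e = 0.0;
        for (int n = 0; n < y.length; n++) {
            // clamp to avoid log(0) when the sigmoid saturates numerically
            double p = Math.min(Math.max(y[n], 1e-12), 1.0 - 1e-12);
            e -= t[n] * Math.log(p) + (1.0 - t[n]) * Math.log(1.0 - p);
        }
        return e;
    }
}
```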

Pattern Recognition (using RBFNN)

[Slide shows the non-linear inequality system and the corresponding error function.]

Four 2D Gaussian Clusters Grouped into Two Classes

Modeling a 3D Shape

Algorithms using robust statistics provide better parameter estimation than classical RBF network estimation.

Classification Problem Applied to Diabetes Mellitus

The RBF NN is trained in the same two stages sketched earlier.

Stage one of training fixes the radial basis centers μ_j using the k-means clustering algorithm.

Stage two of training determines the weights w_ij that approximate the limited sample data X, leading to a linear optimization problem solved using least squares.

Classification Problem Applied to Diabetes Mellitus

Results

1200 cases: 600 for training, 300 for validation, and 300 for testing.


Conclusion

RBF networks have very good properties, such as:

Localization
Functional approximation
Interpolation
Cluster modeling
Quasi-orthogonality

Application fields include:
Telecommunications
Signal and image processing
Control engineering
Computer vision

References

Broomhead, D. S. and Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2, 321-355.
Moody, J. and Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294.
Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78, 1481-1497.

References

Hwang, Young-Sup and Sung-Yang (1996). An efficient method to construct a radial basis function neural network classifier and its application to unconstrained handwritten digit recognition. 13th Intl. Conference on Pattern Recognition, vol. 4, p. 640.
Venkatesan, P. and Anitha, S. (2006). Application of a radial basis function neural network for diagnosis of diabetes mellitus. Current Science, 91, 1195-1199.

References

Bishop, Christopher (1995). Neural Networks for Pattern Recognition. Oxford University Press.