
International Journal of Computer Information Systems, Vol. 2, No. 5, May 2011

Comparing the Performance of Backpropagation Algorithm and Genetic Algorithms in Pattern Recognition Problems

Chukwuchekwa Ulumma Joy
Department of Mathematics
Federal University of Technology, Owerri (NIGERIA)
E-mail: [email protected]

Abstract— Multilayer Perceptrons (MLPs) trained with the Backpropagation (BP) algorithm are known to be very useful in solving a wide variety of real-world problems (such as pattern classification, clustering, function approximation, forecasting, optimization, pattern association and control), but there has been much research into alternative training algorithms to backpropagation, which is based on the gradient descent technique. One approach has been to use optimization algorithms, such as the genetic algorithm, that do not depend on derivatives, and to modify the objective function to penalize unwanted weights in the solution. In this paper, using some pattern recognition problems as illustrations, comparisons are made of the effectiveness and efficiency of both the backpropagation and genetic algorithm training methods on the networks. The backpropagation algorithm is found to outperform the genetic algorithm in this instance.

Keywords: Backpropagation neural networks; Genetic Algorithms; Evolving neural networks

I. INTRODUCTION

Multilayer perceptron (MLP) neural networks trained with the backpropagation (BP) algorithm have proved to be useful in solving a wide variety of real-world problems in various domains. Numerous extensions and modifications have been proposed [1], such as acceleration of the convergence speed (fast backpropagation), special learning rules and data representation schemes (cumulative delta rule, cascade and batch learning), different error functions (e.g. cubic instead of the standard RMS), alternative transfer (activation) functions of the neurons, and weight distribution techniques (e.g. weight pruning), to improve the results or to achieve required properties of the trained networks. Despite these, one key element, the BP itself, still based on the gradient descent algorithm to minimize the network error, has scarcely been changed.

Usually a gradient descent algorithm is used to adjust the neural network weights by comparing the target (desired) and actual network outputs when a set of inputs is presented to the network, but despite its popularity in training MLPs, BP has some drawbacks. It depends on the shape of the error surface, on the values of the randomly initialized weights, and on some other parameters; that is, BP depends very much on good, problem-specific parameter settings [2]. There is also a tendency for the trained neural network to get stuck in local minima.

Numerous attempts have been made to prevent the gradient descent algorithm from becoming trapped in local minima as the training of the network progresses. Genetic algorithms (GAs) usually avoid local minima by searching in several regions simultaneously (working on a population of trial solutions). The only information GAs need is some performance value that determines how good a given set of weights is; they have no need for gradient information. GAs also place no restrictions on the network topology because they do not require backward propagation of an error signal [2].

In this paper, using some pattern recognition problems as illustrations, comparisons are made of the effectiveness and efficiency of both the backpropagation and the genetic algorithm training methods on the neural networks. The backpropagation algorithm is found to outperform the genetic algorithm in this instance.

II. MATERIALS AND METHODS

A. TRAINING MLPS WITH THE BP

There are many kinds of learning rules, but the one used most often is the delta rule, or backpropagation rule. A neural network is trained to map a set of input data by iteratively adjusting the weights. Information from the inputs is fed forward through the network, and the weights between neurons are optimized by backward propagation of the error during the training (learning) phase. The ANN reads the input and output values in the training data set and changes the values of the weighted connections (links) to reduce the difference between the predicted and target values. The prediction error is minimized across many training cycles (epochs) until the network reaches a specified level of accuracy. Stergiou and Siganos [3] give a good explanation of the backpropagation algorithm.
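To make the backward propagation of error concrete, the following is a minimal sketch of the delta rule for a small 2-2-1 MLP learning the XOR operation. It is only an illustration, not the JAVANNS implementation used in this study; the network size, sigmoid activations, per-pattern (online) updates and epoch count are assumptions, while the learning rate of 0.5 matches the experiments described later.

// Minimal backpropagation sketch (assumed setup): a 2-2-1 MLP learns XOR.
#include <cmath>
#include <cstdio>
#include <cstdlib>

static double sigmoid(double u) { return 1.0 / (1.0 + std::exp(-u)); }
static double randw() { return 2.0 * std::rand() / RAND_MAX - 1.0; }   // weight in [-1, 1]

int main() {
    const double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    const double T[4]    = {0, 1, 1, 0};                   // XOR targets
    double wh[2][3], wo[3];                                // hidden and output weights (last entry = bias)
    for (int j = 0; j < 2; ++j) for (int i = 0; i < 3; ++i) wh[j][i] = randw();
    for (int i = 0; i < 3; ++i) wo[i] = randw();
    const double eta = 0.5;                                // learning rate

    // With an unlucky random initialization the network may settle in a local
    // minimum (one of the drawbacks of BP noted above); rerun or add epochs.
    for (int epoch = 0; epoch < 20000; ++epoch) {
        for (int p = 0; p < 4; ++p) {
            // Forward pass.
            double h[2];
            for (int j = 0; j < 2; ++j)
                h[j] = sigmoid(wh[j][0]*X[p][0] + wh[j][1]*X[p][1] + wh[j][2]);
            double y = sigmoid(wo[0]*h[0] + wo[1]*h[1] + wo[2]);

            // Backward pass: delta rule for output and hidden units.
            double dy = (T[p] - y) * y * (1.0 - y);
            double dh[2];
            for (int j = 0; j < 2; ++j)
                dh[j] = dy * wo[j] * h[j] * (1.0 - h[j]);

            // Weight updates proportional to the propagated error.
            for (int j = 0; j < 2; ++j) wo[j] += eta * dy * h[j];
            wo[2] += eta * dy;
            for (int j = 0; j < 2; ++j) {
                wh[j][0] += eta * dh[j] * X[p][0];
                wh[j][1] += eta * dh[j] * X[p][1];
                wh[j][2] += eta * dh[j];
            }
        }
    }
    for (int p = 0; p < 4; ++p) {
        double h0 = sigmoid(wh[0][0]*X[p][0] + wh[0][1]*X[p][1] + wh[0][2]);
        double h1 = sigmoid(wh[1][0]*X[p][0] + wh[1][1]*X[p][1] + wh[1][2]);
        std::printf("%g XOR %g -> %.3f\n", X[p][0], X[p][1],
                    sigmoid(wo[0]*h0 + wo[1]*h1 + wo[2]));
    }
    return 0;
}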

A.1 JAVANNS

JAVANNS is a simulator for ANNs. It enables one to use predefined networks or to create new networks, and to train and analyze them. Details of its usage can be found in JAVANNS-manual-4.html, © 2001 – 2002 Universität Tübingen.

B. TRAINING THE MLPS USING GENETIC ALGORITHMS

Genetic algorithms (GAs) belong to a family of computational models based on evolution. They are a class of population-based random search algorithms, inspired by the principles of natural evolution, known as Evolutionary Algorithms (EAs). A key element of a genetic algorithm is that it maintains a population of candidate solutions that evolves over time. By selecting suitable parameters to control the GA, high efficiency and good performance can be achieved. GAs were developed by John Holland in 1975. A GA creates an initial population of feasible solutions and then recombines them in a way that guides the search towards only the most promising areas of the state space. Each feasible solution is encoded as a chromosome (string), and each chromosome is assigned a measure of fitness via a fitness function. The fitness of a chromosome determines its ability to survive and produce offspring.

B.1 EVOLVING NEURAL NETWORKS

Combining GAs and ANNs produces a special class of ANNs in which evolution by GAs is another form of adaptation in addition to learning [4]. GAs can be used effectively to search globally for an optimal set of connection weights without computing gradient information. GAs have been combined with ANNs at different levels, such as connection weights, network architecture and learning rules. The evolution of architectures enables ANNs to adapt their structures (topologies) to different tasks without human intervention. Evolution of learning rules is a process in which the adaptation of the learning rules is achieved through evolution. Evolution of connection weights introduces an adaptive and global approach to training. Fig. 1 [5] shows a typical evolutionary neural network design.

Fig. 1: A typical evolutionary neural network design.

Applying GAs to weight training in ANNs consists of two major phases: deciding on the representation of the connection weights (i.e., whether in the form of binary strings or not), and the evolutionary process simulated by the GA, in which search operators such as crossover and mutation have to be decided in conjunction with the representation scheme. Different representations and search operators can lead to quite different training performance. The evolution stops when the fitness is greater than a predefined value (i.e., the training error is smaller than a certain value) or the population has converged.

A typical cycle of the evolution of connection weights is as follows [4] (a minimal code sketch of this loop is given after the list):

• Decode each individual (genotype) in the current generation into a set of connection weights and construct a corresponding ANN with the weights.
• Evaluate each ANN by computing its total mean squared error between actual and target outputs. The fitness of an individual is determined by the error: the higher the error, the lower the fitness.
• Select parents for reproduction based on their fitness.
• Apply search operators, such as crossover and/or mutation, to the parents to generate offspring, which form the next generation.
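As an illustration of this cycle, the sketch below evolves the nine weights of a small 2-2-1 tanh network on the XOR problem. It is not the GANN implementation described later; the population size, truncation-style parent selection, blend (intermediate) crossover and Gaussian mutation are assumptions made for the sketch, while the stopping criterion (best MSE below 0.01) follows the experiments reported later.

// Sketch of the weight-evolution cycle: decode, evaluate MSE, select, reproduce.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

static std::mt19937 rng(42);

// Decode a 9-gene chromosome into a 2-2-1 tanh network and return its MSE on XOR.
static double mse(const std::vector<double>& w) {
    const double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    const double T[4]    = {0, 1, 1, 0};
    double err = 0.0;
    for (int p = 0; p < 4; ++p) {
        double h0 = std::tanh(w[0]*X[p][0] + w[1]*X[p][1] + w[2]);
        double h1 = std::tanh(w[3]*X[p][0] + w[4]*X[p][1] + w[5]);
        double y  = std::tanh(w[6]*h0 + w[7]*h1 + w[8]);
        err += (T[p] - y) * (T[p] - y);
    }
    return err / 4.0;
}

int main() {
    const int popSize = 50, genes = 9, generations = 2000;
    std::uniform_real_distribution<double> init(-1.0, 1.0);   // initial weights in [-1, 1]
    std::normal_distribution<double> gauss(0.0, 0.1);
    std::uniform_int_distribution<int> pick(0, popSize / 2 - 1);

    std::vector<std::vector<double>> pop(popSize, std::vector<double>(genes));
    for (auto& ind : pop) for (auto& g : ind) g = init(rng);

    for (int gen = 0; gen < generations; ++gen) {
        // Evaluate: lower error means higher fitness, so sort ascending by MSE.
        std::sort(pop.begin(), pop.end(),
                  [](const std::vector<double>& a, const std::vector<double>& b) {
                      return mse(a) < mse(b);
                  });
        if (mse(pop[0]) < 0.01) break;                         // stopping criterion used in the experiments

        // Reproduce: keep the better half, refill the rest with offspring.
        for (int i = popSize / 2; i < popSize; ++i) {
            const auto& p1 = pop[pick(rng)];
            const auto& p2 = pop[pick(rng)];
            for (int g = 0; g < genes; ++g) {
                pop[i][g] = 0.5 * (p1[g] + p2[g]);             // blend (intermediate) crossover
                pop[i][g] += gauss(rng);                       // Gaussian mutation
            }
        }
    }
    std::printf("best MSE = %f\n", mse(pop[0]));
    return 0;
}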

The ANN weights can be represented in two different ways, binary and real-valued [6]; [7]; [8]. In either case, the chromosome is simply a concatenation of the network's weights into a string [5]. The focus in this paper is the real-valued weight representation, in which each gene in the chromosome is a real number rather than a bit. The weights are read off the network in a fixed order (i.e., from left to right and from top to bottom) and placed in a list, so each chromosome is a vector (list) of weights. The main genetic operators used here are the recombination (crossover) and mutation operators. A mutation operator takes one parent and randomly changes some of the entries in its chromosome to create a child. A crossover operator takes two parents and creates one or two children containing some of the genetic material of each parent. A sketch of such operators on a real-valued weight vector is given below.
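As an illustration of these two operators on a real-valued weight chromosome, the sketch below implements a Gaussian mutation, which perturbs randomly chosen genes of one parent, and a gene-wise intermediate (blend) crossover, which mixes the genes of two parents. The mutation probability, noise width and mixing rule are assumptions for the sketch, not values taken from the paper's GA.

// Illustrative mutation and crossover operators for a chromosome that is a
// vector (list) of real-valued network weights.
#include <cstdio>
#include <random>
#include <vector>

using Chromosome = std::vector<double>;
static std::mt19937 rng(7);

// Mutation: take one parent and randomly change some of its genes.
Chromosome mutate(Chromosome parent, double pm = 0.1) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::normal_distribution<double> noise(0.0, 0.2);
    for (double& gene : parent)
        if (coin(rng) < pm) gene += noise(rng);   // perturb with Gaussian noise
    return parent;                                // the child
}

// Crossover: take two parents and create a child mixing their genetic material.
Chromosome crossover(const Chromosome& a, const Chromosome& b) {
    std::uniform_real_distribution<double> mix(0.0, 1.0);
    Chromosome child(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        double alpha = mix(rng);                  // gene-wise intermediate recombination
        child[i] = alpha * a[i] + (1.0 - alpha) * b[i];
    }
    return child;
}

int main() {
    // Two toy weight vectors standing in for weights read off a network.
    Chromosome p1 = {0.5, -0.3, 0.8, 0.1}, p2 = {-0.2, 0.4, -0.6, 0.9};
    Chromosome c = mutate(crossover(p1, p2));
    for (double w : c) std::printf("%f ", w);
    std::printf("\n");
    return 0;
}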

B.2 A SOFTWARE MODEL FOR THE EVOLUTION OF MULTILAYER PERCEPTRON WEIGHTS

In this section, a software model is constructed for the evolution of MLP network weights using an object-oriented approach. The whole process is carried out using the Unified Modeling Language (UML), which provides a formal framework for the modeling of software systems. The final implementation, called GANN, has been written in the C++ programming language.

A neuron model is the basic information processing unit in ANNs. The perceptron is the characteristic neuron model in the MLP [9]. It computes a net input signal u as a function f of the input signals x and the free parameters, the bias and weights (b, w). The net input signal is then passed through an activation function g to produce an output signal y. Two of the most widely used activation functions are the sigmoid function, g(u) = 1/(1 + exp(-u)), and the hyperbolic tangent function, g(u) = tanh(u).
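A minimal sketch of this neuron model follows: it computes the net input u = b + w·x and applies either the sigmoid or the hyperbolic tangent activation. The example inputs, weights and bias are arbitrary illustrative values, and the code is not the Perceptron class of the GANN implementation.

// A single perceptron neuron model: net input u = b + w.x, output y = g(u).
#include <cmath>
#include <cstdio>
#include <vector>

static double sigmoid(double u) { return 1.0 / (1.0 + std::exp(-u)); }   // g(u) = 1/(1 + exp(-u))

// Compute the net input signal u from inputs x, weights w and bias b.
static double netInput(const std::vector<double>& x,
                       const std::vector<double>& w, double b) {
    double u = b;
    for (std::size_t i = 0; i < x.size(); ++i) u += w[i] * x[i];
    return u;
}

int main() {
    std::vector<double> x = {1.0, 0.0};           // input signals
    std::vector<double> w = {0.7, -0.4};          // weights (free parameters)
    double b = 0.1;                               // bias (free parameter)
    double u = netInput(x, w, b);
    std::printf("u = %.3f  sigmoid: %.3f  tanh: %.3f\n",
                u, sigmoid(u), std::tanh(u));
    return 0;
}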

The power of neural computation comes from connecting many neurons in a network architecture. The architecture of a neural network refers to the number of neurons, their arrangement and their connectivity. The characteristic network architecture in the multilayer perceptron is the so-called feed-forward architecture [10]. A feedforward architecture is usually made up of an input layer of nodes (units), one or more hidden layers of neurons (nodes), and an output layer of nodes. Information normally proceeds layer by layer from the input layer through the hidden layers and then to the output layer. In this way, an MLP becomes a feedforward network architecture of perceptron neuron models [10].
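The layer-by-layer flow of information can be sketched as a loop over layers, each layer holding the weights and biases of its neurons. The layer sizes, weights and the tanh activation below are illustrative assumptions rather than the architecture used in the experiments.

// Sketch of feed-forward propagation through an MLP: the input vector is
// passed layer by layer, each neuron computing an activation of its net input.
#include <cmath>
#include <cstdio>
#include <vector>

// One layer: weights[j] holds the weights of neuron j, biases[j] its bias.
struct Layer {
    std::vector<std::vector<double>> weights;
    std::vector<double> biases;
};

// Propagate an input vector through all layers (hidden and output).
std::vector<double> feedForward(const std::vector<Layer>& net, std::vector<double> x) {
    for (const Layer& layer : net) {
        std::vector<double> y(layer.biases.size());
        for (std::size_t j = 0; j < y.size(); ++j) {
            double u = layer.biases[j];                       // net input of neuron j
            for (std::size_t i = 0; i < x.size(); ++i) u += layer.weights[j][i] * x[i];
            y[j] = std::tanh(u);                              // activation
        }
        x = y;                                                // output of this layer feeds the next
    }
    return x;
}

int main() {
    // A 2-2-1 network with arbitrary example weights.
    std::vector<Layer> net = {
        { {{0.5, -0.4}, {0.3, 0.8}}, {0.1, -0.2} },           // hidden layer: 2 neurons
        { {{1.2, -0.7}},             {0.05} }                 // output layer: 1 neuron
    };
    std::printf("output = %.4f\n", feedForward(net, {1.0, 0.0})[0]);
    return 0;
}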

B.3 THE GANN MODEL

The Unified Modeling Language (UML) is a general-purpose visual modeling language that is used to specify, visualize, construct and document the artifacts of a software system [11]. UML class diagrams usually describe the classes of the system, the way the classes relate to one another, and the attributes and operations (methods) of the classes.

In order to construct the GANN model for the multilayer perceptron, a top-down development is followed [11]. This approach begins at the highest conceptual level and works down to the details. In this way, to create and evolve a conceptual class diagram for the multilayer perceptron, we iteratively model (i) classes, (ii) associations, (iii) derived classes and (iv) attributes and operations. In object-oriented modelling, concepts (objects) are represented by means of classes, so a major task is to identify the main objects of the problem domain. In this work, the multilayer perceptron is characterized by a neuron model, a network architecture, an associated objective functional (mean squared error) and a training algorithm (genetic algorithm). The main classes are as follows:

• Perceptron: the class that represents the concept of the perceptron neuron model.
• MultilayerPerceptron: the class that represents the concept of the MLP network architecture.
• ObjectiveFunctional: the class that represents the concept of the objective functional of the multilayer perceptron.
• TrainingAlgorithm: the class that represents the concept of the training algorithm for a multilayer perceptron.

Once the main concepts (classes) in the model have been identified, it is necessary to identify their interrelationships. For example, the multilayer perceptron is built from a set of neurons (perceptrons); a multilayer perceptron is assigned an objective functional (mean squared error); and an objective functional (mean squared error) is improved by a training algorithm (genetic algorithm). Fig. 2 shows a simplified UML class diagram for the GANN, showing the main classes and their interrelationships.

Fig. 2: A simplified UML class diagram for the GANN.
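A header-style outline of these four classes and their relationships (is built from, is assigned, is improved by) might look as follows in C++. The member names and signatures are assumed for illustration and are not the actual GANN source; only declarations are sketched, with a trivial main wiring the objects together as in the class diagram.

// Skeleton (assumed, not the published GANN code) of the four main classes.
#include <vector>

class Perceptron {                       // neuron model
public:
    double output(const std::vector<double>& inputs) const;
private:
    std::vector<double> weights_;
    double bias_ = 0.0;
};

class MultilayerPerceptron {             // network architecture: built from perceptrons
public:
    std::vector<double> feedForward(const std::vector<double>& inputs) const;
    std::vector<double> getWeights() const;              // read weights off the network
    void setWeights(const std::vector<double>& weights); // write a chromosome back
private:
    std::vector<std::vector<Perceptron>> layers_;
};

class ObjectiveFunctional {              // mean squared error, assigned to a network
public:
    explicit ObjectiveFunctional(MultilayerPerceptron& mlp) : mlp_(mlp) {}
    double evaluate() const;             // MSE over the training set
private:
    MultilayerPerceptron& mlp_;
};

class TrainingAlgorithm {                // genetic algorithm, improves the objective functional
public:
    explicit TrainingAlgorithm(ObjectiveFunctional& objective) : objective_(objective) {}
    void train();                        // run the evolutionary cycle
private:
    ObjectiveFunctional& objective_;
};

int main() {
    // Wire the objects together in the direction shown in Fig. 2.
    MultilayerPerceptron mlp;
    ObjectiveFunctional objective(mlp);
    TrainingAlgorithm ga(objective);
    (void)ga;
    return 0;
}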

C. THE DATASETS

In order to have results comparable to those of BP-trained MLPs, some pattern recognition (classification) problems/datasets have been used to evaluate the performance of the GA-trained neural networks. The datasets used for neural network learning are split into two parts: the part on which training is performed is called the training set, and the part on which the performance of the resulting network is measured (to test the generalization ability of the trained network) is called the test set. The idea is that the performance of a network on the test set estimates its performance in real use [12]. The following briefly describes the datasets used for this study.

C.1 LOGICAL OPERATORS

Learning logical operations is a traditional benchmark application for neural networks [11]. Here a single MLP is used to learn a set of logical operations, namely AND, OR, NAND, NOR, XOR and XNOR. The number of samples in the data set is 4, the number of input variables for each sample is 2, and the number of target variables is 6. Table 1 shows the input-target data set for this problem.

C.2 FISHER'S IRIS DATA

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two. This data set was created from the 'Iris' problem data set in the UCI repository of machine learning databases.

C.3 PIMA INDIAN DIABETES PROBLEM

This data set is used for diagnosing diabetes among Pima Indians. It includes eight inputs and one output. The patterns are split into 576 for training and 192 for testing, totalling 768 patterns. All inputs are continuous, and 65.1% of the patterns are negative for diabetes. This data set was created from the 'Pima Indians diabetes' problem data set in the UCI repository of machine learning databases.

C.4 AIRCRAFT LANDING DATA

The data set provided comes from image analysis of aircraft approaching an aircraft carrier. There are 5 classes with 200 examples in each class. The patterns are split into 750 for training and 250 for testing, totalling 1,000 patterns.

III. EXPERIMENTATIONS

• A series of experiments is discussed to compare the performance of GA-trained neural networks with BP-trained neural networks on some classification problems (datasets).
• In comparing the two algorithms, one iteration of the BP is considered to be equal to one iteration of the GA.
• The ability of the BP and the GA to reduce the objective function (MSE) and the quantity of time (CPU time) used by each algorithm are measured to determine the effectiveness and efficiency of the algorithms.

Table 1. Logical operations input-target data set.

a | b | AND | OR | NAND | NOR | XOR | XNOR
1 | 1 |  1  | 1  |  0   |  0  |  0  |  1
1 | 0 |  0  | 1  |  1   |  0  |  1  |  0
0 | 1 |  0  | 1  |  1   |  0  |  1  |  0
0 | 0 |  0  | 0  |  1   |  1  |  0  |  1


A. COMPONENTS OF THE GA

• The weights of the initial members of the population are chosen randomly with a uniform distribution between -1.0 and 1.0.
• The activation function for both the hidden and output units is the hyperbolic tangent function.
• The crossover probability is set at 0.25 and the mutation rate is set at 0.1.
• The fitness assignment method is linear ranking, the selection method is roulette wheel selection, the recombination (crossover) method is intermediate recombination, and the normal mutation method is used.
• Training with the GA stops when the best evaluation (mean squared error) is 0.01 or the maximum number of generations is reached.

These settings are gathered into a small configuration sketch below.
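Collected into one place, the GA settings could be recorded in a configuration structure such as the following sketch; the type and field names are assumed for illustration, while the numeric values are those listed above (the generation limit reflects the largest epoch count used in the experiments).

// The GA settings listed above, gathered into a configuration record.
// Field and enum names are illustrative; the values come from the experiments.
#include <cstddef>
#include <cstdio>

enum class FitnessAssignment { LinearRanking };
enum class Selection         { RouletteWheel };
enum class Recombination     { Intermediate };
enum class Mutation          { Normal };

struct GAConfig {
    double initialWeightMin     = -1.0;   // initial weights drawn uniformly from [-1.0, 1.0]
    double initialWeightMax     =  1.0;
    double crossoverProbability = 0.25;
    double mutationRate         = 0.1;
    FitnessAssignment fitness   = FitnessAssignment::LinearRanking;
    Selection selection         = Selection::RouletteWheel;
    Recombination recombination = Recombination::Intermediate;
    Mutation mutation           = Mutation::Normal;
    // Hidden and output units use the hyperbolic tangent activation (see list above).
    double targetMSE            = 0.01;    // stop when the best evaluation reaches this...
    std::size_t maxGenerations  = 150000;  // ...or when the generation (epoch) limit is reached
};

int main() {
    GAConfig cfg;
    std::printf("crossover=%.2f mutation=%.2f stop at MSE=%.2f\n",
                cfg.crossoverProbability, cfg.mutationRate, cfg.targetMSE);
    return 0;
}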

B. COMPONENTS OF THE BP

• The BP algorithm provided by JAVANNS, mentioned in Section A.1, is used.
• The BP uses epoch learning, i.e., the weights are updated only once per epoch.
• The mean squared error (MSE) function is used.
• The learning rate is set to 0.5 in all cases.
• The activation function for the input units is linear (identity), while the activation function for both the hidden and output units is the sigmoid function.

• Experiment 1 was designed to compare the performance of the GA and the BP on the logical operations problem.
• Experiment 2 was designed to compare the performance of the GA and the BP on the Fisher's Iris data.
• Experiment 3 was designed to compare the performance of the GA and the BP on the Pima Indian Diabetes data.
• Experiment 4 was designed to compare the performance of the GA and the BP on the Aircraft landing data.

IV. RESULTS AND DISCUSSION

As shown in Table 2, BP took only 500 epochs, and consequently less time, to train the network on the logical operations problem, whereas the GA used 40,000 epochs, and consequently much more time, to reach the same percentage of correct classification. The results also show a very significant change in the performance of the GA-trained network (from 0.0% correct classification at 500 epochs to 100% correct classification at 40,000 epochs), showing that what the GA needs is more iterations (epochs) to reach the desired accuracy.

Table 2. Results of Experiment 1: comparison of results of training the logical operations problem with the BP and GAs.

Training Algorithm (TA) | # of Epochs (EPC) | CPU Time Used (CPU) | % Correct Classification (COC) | Best Evaluation (MSE)
BP |    500 |  1 | 100 | 0.0
BP | 40,000 |  8 | 100 | 0.0
GA |    500 |  2 | 0.0 | 4.4438
GA | 40,000 | 60 | 100 | 0.2

As shown in Table 3, BP took about 100,000 epochs to achieve 100% classification accuracy on the Iris data, whereas the GA would require more than 150,000 epochs to reach the same percentage of correct classification. There is also a significant improvement in the percentage of correct classification of the GA-trained network as the number of iterations increases from 40,000 to 150,000 (from 70.67% to 96%).

As seen in Table 4, BP took about 100,000 epochs to achieve 100% classification accuracy on the Pima Indian Diabetes data, whereas the GA would require more than 150,000 epochs to reach the same percentage of correct classification. There is only a slight improvement in the percentage of correct classification of the GA-trained network as the number of iterations increases from 40,000 to 150,000 (from 74% to 76.04%).

From Table 5, BP took about 40,000 epochs to achieve 99.6% classification accuracy on the Aircraft landing data, whereas the GA would require more than 150,000 epochs to reach the same percentage of correct classification. The results also show that the percentage of correct classification obtained by BP remains the same in all cases, suggesting that about 40,000 iterations are sufficient for this particular experiment. There is also a significant improvement in the percentage of correct classification of the GA-trained network as the number of iterations increases from 40,000 to 150,000 (from 46.3% to 86%).

Table 3. Results of Experiment 2: comparison of results of training the Iris data with the BP and GAs.

Training Algorithm | # of Epochs | CPU Time (Seconds) | Best Evaluation (MSE) | % Correct Classification
BP |  40,000 |  147 | 0.013  | 99.34
BP | 100,000 |  522 | 0.0    | 100
BP | 150,000 |  935 | 0.0    | 100
GA |  40,000 | 2116 | 0.487  | 70.67
GA | 100,000 | 5540 | 0.1109 | 95.33
GA | 150,000 | 8310 | 0.1026 | 96

Table 4. Results of Experiment 3: comparison of results of training the Pima Indian Diabetes data with the BP and GAs.

Training Algorithm | # of Epochs | CPU Time (Seconds) | Best Evaluation (MSE) | % Correct Classification
BP |  40,000 |   985 | 0.021  | 98.9
BP | 100,000 |  3173 | 0.0052 | 100
BP | 150,000 |  4260 | 0.0    | 100
GA |  40,000 | 11320 | 0.602  | 74
GA | 100,000 | 28360 | 0.5670 | 75.91
GA | 150,000 | 42540 | 0.5543 | 76.04

Table 5. Results of Experiment 4: comparison of results of training the Aircraft landing data with the BP and GA.

Training Algorithm | # of Epochs | CPU Time (Seconds) | Best Evaluation (MSE) | % Correct Classification
BP |  40,000 |  2033 | 0.0115 | 99.6
BP | 100,000 |  3756 | 0.004  | 99.6
BP | 150,000 |  6400 | 0.004  | 99.6
GA |  40,000 | 18400 | 1.0668 | 46.3
GA | 100,000 | 46000 | 0.4634 | 81.3
GA | 150,000 | 69000 | 0.3881 | 86

V. CONCLUSION

This paper did not, in essence, try to find a training algorithm to substitute for the BP training algorithm, but rather investigated what happens when BP and a GA are used to train feedforward neural networks on pattern recognition examples. The results showed that BP outperformed the GA in this instance. The results also confirm that MLPs using the BP training algorithm are still considered universal classifiers [13]. They imply that caution should be taken before using other algorithms as substitutes for the BP algorithm, especially in classification problems. The performance of the GA also indicates that GAs can be used as an alternative training algorithm for MLPs in some cases. To make a fuller comparison of the two algorithms, more complex experiments are required to ascertain the performance of both the BP and GA algorithms, especially in applications other than pattern recognition problems.

REFERENCES

[1] S. Udo, "Multiple Layer Perceptron Training Using Genetic Algorithms," European Symposium on Artificial Neural Networks (ESANN'2001), 2001.
[2] J. Branke, "Evolutionary Algorithms for Neural Network Design and Training," Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications, Vaasa, Finland, 1995.
[3] C. Stergiou and D. Siganos, "Neural Networks," 1996. [Internet] Available from: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html [Accessed 20/04/10].
[4] X. Yao, "Evolving Artificial Neural Networks," Proceedings of the IEEE, Vol. 87, No. 9, pp. 1423-1447, 1999.
[5] D. Rinku, "Evolutionary Neural Networks: Design Methodologies," 2003. [Internet] Available from: http://ai-depot.com/articles/evolutionary-neural-networks-design-methodologies/ [Accessed 02/04/10].
[6] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, 1996.
[7] D. J. Montana and L. Davis, "Training Feedforward Neural Networks Using Genetic Algorithms," Proceedings of the International Joint Conference on Artificial Intelligence, pp. 762-767, 1989.
[8] P. Koehn, "Combining Genetic Algorithms and Neural Networks: The Encoding Problem," Master's thesis, University of Tennessee, Knoxville, 1994. Available from: ftp://archive.cis.ohio-state.edu/pub/neuroprose/koehn.encoding.ps.Z [Accessed 20/04/10].
[9] C. M. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.
[10] J. Rumbaugh et al., The Unified Modeling Language Reference Manual. Addison Wesley, 1999.
[11] R. Lopez and E. Oñate, "A Software Model for the Multilayer Perceptron," IADIS International Conference on Applied Computing 2007, pp. 464-468, 2007.
[12] L. Prechelt, "PROBEN1: A Set of Neural Network Benchmark Problems and Benchmarking Rules," Technical Report, Fakultät für Informatik, Universität Karlsruhe, 1994. Doc: pub/papers/techreports/1994/1994-21.ps.Z, data: /pub/neuron/-Proben1.tar.gz from ftp.ira.uka.de.
[13] K. Hornik, M. Stinchcombe and H. White, "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, 2(5): 359-366, 1989.

Joy Ulumma Chukwuchekwa received her M.Sc. degree in Intelligent Computer Systems from the University of Glamorgan, United Kingdom, in 2010. She also holds an M.Sc. degree in Applied Mathematics from the Federal University of Technology, Owerri, Nigeria, where she is at present lecturing. She is a member of the IEEE Computer Society and the Society for Industrial and Applied Mathematics, both based in the USA.