

CHAPTER 6

Evolutionary Algorithms and Neural Networks

R.G.S. ASTHANA Business Standard Ltd., 5 BSZ Marg, New Delhi 110 002, India

1 INTRODUCTION

Discovering relationships between sets of data has always been a very active research problem. Statistical methods were developed as valuable tools for finding relationships among observed facts. These methods, however, rest on assumptions such as that the data are normally distributed, that the equation relating the data is of a specific form (e.g., linear, quadratic, or polynomial), and that the variables are independent; real-life problems seldom meet these criteria. Artificial neural networks (ANNs) are not limited by these assumptions and serve as strong predictive models. An ANN can uncover complex relationships but provides little insight into the underlying mechanisms that describe the relationship. Evolutionary algorithms are not only robust methods of exploring complex solution spaces, but also provide some insight into the mechanisms relating data items.

While neural networks are metaphorically based on learning processes in individual organisms, evolutionary algorithms (EAs) are inspired by evolutionary change in populations of individuals. Relative to neural nets, evolutionary algorithms have only recently gained wide acceptance in academic and industrial circles. Evolutionary algorithms describe computer-based problem-solving techniques that use computational models of some of the known mechanisms of evolution as key elements in their design and implementation. EAs maintain a population of structures that evolve according to rules of selection and through the use of operators referred to as "search" or "genetic" operators, such as recombination and mutation. A measure of fitness in the environment is assigned to each individual in the population. Reproduction focuses attention on high-fitness individuals, thus exploiting the available fitness information. Recombination and mutation perturb those individuals, providing general heuristics for exploration.

The performance of an ANN is critically dependent on factors such as the choice of primitives (neurons), the network architecture, and the learning algorithm used. The learning phase of an ANN involves the search for a suitable setting of weights, which are adjusted within an a priori specified network topology under guidance from training samples in a specified problem environment. Although one may obtain suitable settings for the parameters of the network, the capability of the resulting network to generalize on data not encountered during training, or the cost of the hardware realization and use of the network, may be far from optimal. In addition, the lack of sound design principles in the development of large-scale ANNs presents further difficulty for a variety of real-life problems.

This has led researchers to develop techniques for automating the design of neural architectures to solve specific problems under given design and performance constraints. The combination of ANNs and evolutionary techniques offers an attractive approach to solving complex problems where neither the detailed structure nor the size of the solution is known in advance. The evolutionary algorithm introduces changes in the topology of an ANN one step at a time and measures the performance of the network in terms of reduced errors and improved training times. The changes introduced in the network topology include aspects such as connectivity, number of layers in the network, and so on. Topology changes that result in improved performance are retained and others are discarded. Hence, an ANN can be forced to grow and perform a set of specified tasks better.

ANNs offer an attractive paradigm for the design and analysis of adaptive intelligent systems, and a wide range of applications in artificial intelligence and cognitive modeling [2, 4, 5, 32, 42] has been developed using them. These include 2- and 3-dimensional visual pattern recognition and classification, motion detection, speech recognition, language processing, signal detection, inductive and deductive inference, financial forecasting, adaptive control, planning, and robotics. ANNs also offer the potential for massively parallel computation, robustness in the presence of noise, resilience to the failure of components, amenability to adaptation, and learning by the modification of computational structures. EAs [48], together with ANNs, offer an attractive and relatively efficient, randomized, opportunistic approach to searching for near-optimal solutions in a variety of problem domains.

In this chapter, we give a brief review of the evolutionary algorithm followed by evolutionary design of neural networks. A few applications of EAs and combinations of EAs and neural networks are also described.

2 BRIEF REVIEW OF EVOLUTIONARY ALGORITHMS

The term EA covers topics such as genetic algorithms (GAs), evolutionary programming (EP), evolution strategies (ESs), classifier systems (CSs), and genetic programming (GP). EAs are iterative; each iteration is also referred to as a "generation." The basic EA begins with a population of randomly chosen individuals. In each generation, the individuals "compete" among themselves to solve a given problem. Individuals that perform better are more likely to "survive" to the next generation, and those surviving may be subject to small, random modifications. If the algorithm is properly set up, the quality of the solutions contained in the population improves as the generations progress.

As in the case of neural nets, EAs vary widely in their degree of biological realism, and there is a wide range of implementation details that can have a profound effect on the outcome. While theory to explain the behavior of evolutionary algorithms exists, it is far from complete. EAs, therefore, are loosely modeled on processes that appear to be at work in biological evolution and in the workings of the immune system. Evolution is driven by a natural selection process: each individual competes for resources in a specified environment, and individuals that are better than others are more likely to survive and propagate their genetic material.


The encoding for genetic information (the genome) is done in nature in a way that admits asexual reproduction, resulting in offspring that are genetically identical to the parent (e.g., in bacteria). The sexual reproduction process allows the exchange of pieces of genetic information between chromosomes, producing offspring that contain a combination of information from each parent. This operation is called recombination or crossover because of the way that biologists have observed strands of chromosome crossing over during the exchange.

In the recombination approach, the selection of who gets to mate is largely a function of the fitness of the individual. Luck (or random effect) often plays a major role in the selection. EAs use either a simple function of the fitness measure to select individuals for genetic operations such as crossover/asexual reproduction (a case in which the propagation of genetic material remains unaltered) or a model in which certain randomly selected individuals in a subgroup compete and the fittest is selected. This is called tournament selection.

In a simple EA, genotypes are binary strings of some fixed length (say n) that code for points in an n-dimensional Boolean search space [30]. A population of genotypes can be seen as elements of a high-dimensional search space or as an arrangement of genes represented as strings, where each gene takes on values from a suitably defined domain of values (also known as alleles). A simple crossover operator is defined as follows. Let A = a_1, a_2, ..., a_n and B = b_1, b_2, ..., b_n be genotypes from the population P(t). Select at random a number x from {1, 2, ..., n-1}. Two new genotypes are formed from A and B by exchanging the sets of alleles to the right of position x, yielding A_new = a_1, a_2, ..., a_x, b_{x+1}, ..., b_n and B_new = b_1, b_2, ..., b_x, a_{x+1}, ..., a_n.

Crossover is performed with probability pcross (a common value is 0.7) between two selected individuals, the parents. The parents exchange parts of their genomes (i.e., encodings) to form two new individuals called offspring. In its simplest form, substrings are exchanged after a randomly selected crossover point. The crossover operation moves the evolutionary process toward "promising" regions of the search space.

The crossover operator is able to generate all possible combinations of the genotypes even if the population contains only one copy of a specific allele value. However, as the GA proceeds to generate new genotypes, it is always possible to lose the last copy of an allele value. The second operator, called mutation, prevents the population of genotypes from losing a specific value of an allele. This operator is an "insurance" against the loss of information inside a population of genotypes.

The mutation operator prevents premature convergence to local optima by randomly sampling new points in the search space. It is carried out by flipping bits at random, with some probability pmut.

Nature introduces diversity through mutation. In EAs, it is achieved by randomizing the genes in the population. A pseudo code for EAs is given below:

begin EA
  t := 0                        {initial time at the start of the EA}
  Initialize population P(t)
  Evaluate population P(t)      {compute fitness of all initial individuals in population}
  while not done do
    t := t + 1
    Select P(t) from P(t-1)     {select sub-population for offspring production}
    Crossover P(t)
    Mutate P(t)
    Evaluate P(t)
  end while
end EA
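As an illustration, the following minimal Python sketch (ours, not from the original text) implements this loop for the bit-string setting used in the example of Section 2.1: one-max fitness (the number of ones), fitness-proportionate selection, single-point crossover, and bit-flip mutation. All function names and parameter defaults are illustrative only.

import random

def evolve(genome_len=8, pop_size=4, p_cross=0.7, p_mut=0.01, generations=500):
    # fitness of a genotype = number of ones in the bit string
    fitness = lambda g: sum(g)
    # initialize a random population P(0); pop_size is assumed even
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for t in range(generations):
        # select parents with probability proportional to fitness
        weights = [fitness(g) + 1e-9 for g in pop]
        parents = random.choices(pop, weights=weights, k=pop_size)
        # crossover pairs of parents with probability p_cross
        offspring = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < p_cross:
                x = random.randint(1, genome_len - 1)   # crossover point
                a, b = a[:x] + b[x:], b[:x] + a[x:]
            offspring += [a[:], b[:]]
        # mutate: flip each bit with probability p_mut
        pop = [[1 - bit if random.random() < p_mut else bit for bit in g]
               for g in offspring]
        if max(map(fitness, pop)) == genome_len:        # perfect string found
            return max(pop, key=fitness), t
    return max(pop, key=fitness), generations

print(evolve())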


2.1 Genetic Algorithms (GA)

The most popular evolutionary algorithm is the genetic algorithm of J. Holland [28, 38]. A GA is an iterative procedure whose starting condition is a large set of random strings called genes or genomes. The genes are linear arrays of numbers, each number representing an aspect of the system to which the algorithm is being applied; for example, if we are dealing with neural network topology, then one of the numbers could represent the number of layers in a particular network.

When the algorithm is run, each of the networks represented by the population of genes is tested and graded according to its performance. The genes are then copied with a probability that is larger if their performance was greater. That is, the genes that produce a network with poor performance are less likely to get copied; those with good performance are more likely to get copied. The copying process therefore produces a population that has a large number of better-performing genes.

Having completed the copying process, the genes are then "bred" by crossing over some of the numbers in the arrays at random points. A small random change called a "mutation" is introduced to add some diversity. The whole process is then repeated.

Each genome or gene encodes a possible solution in a given problem space, referred to as the search space. This space comprises all possible solutions to the problem at hand. The symbol alphabet used is often binary, but this has been extended in recent years to include character-based encodings, real-valued encodings, and tree representations.

The following steps are required to define a basic GA:

1. Create a population of random individuals (e.g., bit strings mapping the set of parameter values into {0, 1}) such that each individual represents a possible solution to the problem at hand.

2. Compute each individual's fitness, i.e., its ability to solve a given problem. This involves finding a mapping from bit strings into the reals, the so-called fitness function.

3. Select individual population members to be parents. One of the simplest selection procedures is fitness-proportionate selection, in which individuals are selected with a probability proportional to their relative fitness (see the sketch after this list). This ensures that the expected number of times an individual is chosen is approximately proportional to its relative performance in the population. Thus, high-fitness individuals stand a better chance of "reproducing," while low-fitness ones are more likely to disappear.

4. Produce children by recombining parent material via crossover and mutation, and add them to the population.

5. Evaluate the children's fitness.

6. Repeat steps (3) to (5) until a solution with the required fitness is obtained.
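As an illustration of step 3, the following Python fragment (the names are ours, not from the text) implements fitness-proportionate selection as a "roulette wheel", shown here applied to the four genomes of the worked example later in this section.

import random

def roulette_select(population, fitness, k):
    # each individual is chosen with probability fitness / total fitness
    total = sum(fitness(ind) for ind in population)
    chosen = []
    for _ in range(k):
        r = random.uniform(0, total)
        acc = 0.0
        for ind in population:
            acc += fitness(ind)
            if acc >= r:                 # the wheel stops on this individual
                chosen.append(ind)
                break
        else:
            chosen.append(population[-1])  # guard against rounding at the wheel's edge
    return chosen

# the four genomes of the example below; fitness = number of ones
pop = ["00000110", "11101110", "00100000", "00110100"]
print(roulette_select(pop, lambda s: s.count("1"), k=4))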

Selection alone cannot introduce any new individuals into the population. Genetically inspired operators such as crossover and mutation are used to find new points in the search space. The most important genetic operator is the crossover operator. As in biological systems, the crossover process yields recombination of alleles via exchange of segments between pairs of genotypes. Genetic algorithms are stochastic iterative processes that are not guaranteed to converge. The termination condition may be specified either as some fixed, maximal number of generations, or on reaching a predefined acceptable fitness level.

So-called premature convergence is a problem often observed in small-population GAs. It arises from the fact that at the beginning there may be a few extraordinary genotypes in a population of mediocre others. In the first couple of generations these genotypes take over a huge part of the population before the crossover operator is able to construct a more diverse set of good genotypes.

This leading cause of premature convergence can be avoided by fitness scaling: the few extraordinary genotypes are scaled down, while the lowly members of the population are scaled up.
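The text does not give a specific scaling formula; one common linear scheme (an assumption on our part, not from the chapter) chooses f' = a*f + b so that the average fitness is preserved while the best individual receives c_mult times the average:

def linear_scale(fitnesses, c_mult=2.0):
    # f' = a*f + b with the average preserved and the maximum scaled to c_mult * average
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    if f_max == f_avg:                      # flat population: nothing to scale
        return list(fitnesses)
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [max(0.0, a * f + b) for f in fitnesses]

# one extraordinary genotype (fitness 9) among mediocre others:
print(linear_scale([1.0, 9.0, 1.0, 1.0]))   # -> [2.0, 6.0, 2.0, 2.0]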

The genetic algorithm is widely used in practical contexts [17, 18] such as financial forecasting and management science.

EXAMPLE [31, 50] A simple example is given. The parameters are:

• Population: 4 individuals (a typical number is 50 to 1000)
• Representation: binary-encoded strings (genomes) of length 8
• Fitness value: equals the number of ones in the bit string

with pcross = 0.7 and pmut = 0.001.

The initial (randomly generated) population might look like this:

Label    Genome      Fitness
A        00000110    2
B        11101110    6
C        00100000    1
D        00110100    3

• Selection: Choose four individuals, i.e., two sets of parents {B, D} and {B, C}, with probabilities proportional to their fitness values. Selection is probabilistic, so in this instance the weakest individual, A, was not selected.

• Crossover is effected between the selected parents with some probability, pcross, resulting in two offspring. If crossover is not done, then the offspring are exact copies of each parent. Let us assume that crossover takes place between parents B and D at the (randomly chosen) first bit position, forming offspring E = 10110100 and F = 01101110. No crossover is effected between parents B and C, forming offspring that are exact copies of B and C.

• Mutation: Each offspring is subjected to mutation with probability pmut per bit. Offspring E is mutated at the sixth bit position to form E' = 10110000, the copy of B is mutated at the first bit position to form B' = 01101110, and offspring F and C are not mutated at all.

• Next-generation population: The above operators of selection, crossover, and mutation create the next generation. The result is:

Label    Genome      Fitness
E'       10110000    3
F        01101110    5
C        00100000    1
B'       01101110    5

Although the best individual, with fitness 6, has been lost from the new population, the average fitness has increased from 3 to 3.5. Iterating this procedure, the genetic algorithm eventually finds a perfect string, that is, one with the maximal fitness value of 8.


2.2 Single Gene Methods [41]

A single linear array is used in this method. As in the case of a GA, each number in the array represents a property of the network. For example, an ANN might be described by an array of four numbers, a b c d, where a represents the number of layers in the network, b the number of neurons, c the connectivity, and so on.

Mutation is introduced by picking one of the numbers at random. Two new values (at random) are taken, one being slightly above the original number and the other one slightly below, and two new genes are produced. These new genes are tested against the original gene and the best combination of numbers is kept. This process is repeated and so the numbers in the gene eventually converge to an optimum solution [5, 6].
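A minimal Python sketch of one such iteration is given below, under the assumption that evaluate returns a figure of merit to be maximized (e.g., negative training time); the function names and step size are illustrative only.

import random

def single_gene_step(gene, evaluate, delta=0.1):
    # pick one of the numbers in the array at random
    i = random.randrange(len(gene))
    step = 1.0 + random.uniform(0.0, delta)
    up, down = gene[:], gene[:]
    up[i] = gene[i] * step        # a value slightly above the original
    down[i] = gene[i] / step      # a value slightly below the original
    # test the two new genes against the original and keep the best
    return max((gene, up, down), key=evaluate)

# toy usage: the optimum gene is [4, 2] for this made-up score function
score = lambda g: -abs(g[0] - 4) - abs(g[1] - 2)
g = [1.0, 1.0]
for _ in range(300):
    g = single_gene_step(g, score)
print([round(x, 2) for x in g])   # converges near [4, 2]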

In biological terms, this procedure might be thought of as a mutation producing a whole new species of animal, whereas a GA is a population of animals producing the fittest, provided the range of the GA is not too large. The single gene and the GA approaches can be combined in the same evolutionary algorithm in such a way that the single gene produces gross changes and the GA fine-tunes the system.

2.2.1 Improving the Single Gene Approach. It was found that the algorithm can converge faster if the changes that produce the greatest improvement in training times are prioritized. For example, if changing the number of neurons in a given layer produces the largest change in training times, it is preferable to make that change first; attempts to change the bias of the network should be made later. Applying such strategies reduces search times for an optimal solution by one-third.

2.3 Evolutionary Programming (EP) [25, 26]

This is a stochastic optimization strategy similar to the genetic algorithm, but it places emphasis on the behavioral linkage between parents and their offspring, rather than seeking to emulate specific genetic operators as observed in nature. In this respect evolutionary programming is similar to evolution strategies. Like both ESs and GAs, EP is a useful method of optimization when other techniques such as gradient descent or direct analytical discovery are not possible. Combinatorial and real-valued function optimization problems in which the optimization surface or fitness landscape is "rugged," possessing many locally optimal solutions, are well suited for evolutionary programming.

For EP, a fitness criterion can be characterized in terms of the problem variables over which an optimum solution is sought. For example, to find the shortest path in a Traveling Salesman problem, each solution would be a path. The length of the path can be expressed as a number, which in turn serves as the fitness criterion for the solution. The fitness landscape for this problem could thus be characterized as a hypersurface proportional to the path lengths in a space of possible paths. The goal would be to find the globally shortest path in that space or, more practically, to find very short tours very quickly. The basic EP method involves three steps (repeated until a threshold for iteration is exceeded or an adequate solution is obtained):

1. Choose an initial "population" of trial solutions at random. The number of solutions in a population is highly relevant to the speed of optimization, but no definite answers are available as to how many solutions are appropriate (other than > 1) and how many solutions are wasteful.

2. Each solution is replicated into a new population. Each of these offspring solutions is mutated.


3. Each offspring solution is assessed by computing its fitness. Typically, a stochastic tournament is held to determine the N solutions to be retained for the population, although this is occasionally performed deterministically. The population size need not be held constant, and more than one offspring can be generated from each parent.

Genetic algorithms and evolutionary programming are compared in Table 1 based on representation, extent of use of genetic operators, and optimization strategy.

Evolution strategies and evolutionary programming are compared in Table 2 based on selection and recombination strategies.

Table 1 Differences between GA and EP

Item                    Genetic algorithm (GA)                             Evolutionary programming (EP)
Representation          No constraint (see note 1)                         Follows from the problem (see note 2)
Genetic operators       No constraint in using crossover                   Typically does not use any crossover
Mutation                Simple mutation used                               Ranges from minor to extreme (see note 3)
Optimization strategy   Gradient descent or direct analytical discovery    Used where GA fails

1. The typical GA approach involves encoding the problem solutions as a string of representative tokens, the GENOME.
2. A neural network can be represented in the same manner as it is implemented, because the mutation operation does not demand a linear encoding. In this case, for a fixed topology, real-valued weights could be coded directly, and mutation operates by perturbing a weight vector with a zero-mean multivariate Gaussian perturbation. For variable topologies, the architecture is also perturbed, often using Poisson-distributed additions and deletions.
3. The mutation operation changes aspects of the solution according to a statistical distribution, introducing minor variations in the behavior of the offspring. Further, the severity of mutations is often reduced as the global optimum is approached. A "meta-evolutionary" technique is commonly used in which the variance of the mutation distribution is itself subject to mutation by a fixed-variance mutation operator and evolves along with the solution.

Table 2 Main Differences between ES and EP

Item            Evolutionary programming (EP)                                        Evolution strategy (ES)
Selection       Uses stochastic selection via a tournament (see note 1)              Uses deterministic selection (see note 2)
Recombination   Not used, as recombination does not occur between species (note 3)   Used in many forms (see note 4)

1. Each trial solution in the population faces competition from a preselected number of opponents and receives a "win" if it is at least as good as its opponent in each encounter. Selection then eliminates those solutions with the fewest wins.
2. The worst solutions are purged from the population based directly on their function evaluation.
3. EP is an abstraction of evolution at the level of reproductive populations, also referred to as species.
4. ES is an abstraction of evolution at the level of individual behavior. When self-adaptive information is incorporated, it is purely genetic information (as opposed to phenotypic).

A pseudo-code for EP is given below:

begin EP
  t := 0                        {start with an initial time}
  Initialize population P(t)    {initialize a usually random population of individuals}
  Evaluate population P(t)      {compute fitness of all initial individuals in population}
  while not done do             {test for termination criterion: time, fitness, etc.}
    t := t + 1
    Select P(t) from P(t-1)     {stochastically select the survivors from actual fitness}
    Mutate P(t)
    Evaluate P(t)               {evaluate its new fitness}
  end while
end EP
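Tying this pseudocode to the Traveling Salesman example above, the following Python sketch (our illustration, not from the text) mutates each parent tour by reversing a random segment and retains survivors through a stochastic tournament; the population size, number of generations, and number of opponents are arbitrary assumptions.

import math, random

def tour_length(tour, cities):
    # total length of the closed path through all cities
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def ep_tsp(cities, pop_size=20, generations=300, opponents=5):
    n = len(cities)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # each parent is replicated and the copy mutated (segment reversal)
        offspring = []
        for tour in pop:
            i, j = sorted(random.sample(range(n), 2))
            offspring.append(tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:])
        # stochastic tournament: count wins against randomly drawn opponents
        union = pop + offspring
        lengths = [tour_length(s, cities) for s in union]
        wins = [sum(lengths[k] <= lengths[random.randrange(len(union))]
                    for _ in range(opponents)) for k in range(len(union))]
        ranked = sorted(range(len(union)), key=lambda k: -wins[k])
        pop = [union[k] for k in ranked[:pop_size]]
    return min(pop, key=lambda s: tour_length(s, cities))

cities = [(random.random(), random.random()) for _ in range(15)]
best = ep_tsp(cities)
print(round(tour_length(best, cities), 3))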

2.4 Evolution Strategy (ES) [45]

Until recently, ESs were used only by civil engineers to solve technical optimization problems as an alternative to standard solutions. Usually no closed-form analytical objective function is available for technical optimization problems and, hence, no applicable optimization method exists but the engineer's intuition.

In a two-membered or (1 + 1) ES, one parent generates one offspring per generation by applying normally distributed mutations (that is, smaller steps are more likely than big ones) until a child performs better than its ancestor and takes its place. Because of this simple structure, theoretical results for step-size control and convergence velocity can be derived: the ratio between successful mutations and all mutations should come to 1/5, the so-called 1/5 success rule. This first algorithm, using mutation only, was then enhanced to an (m + 1) strategy, which incorporated recombination because several (i.e., m) parents are available. The mutation scheme and the exogenous step-size control were taken over unchanged from the (1 + 1) strategy.
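A minimal Python sketch of a (1 + 1) ES with the 1/5 success rule is given below, minimizing a function f. The adaptation constants (1.22 and 0.82) and the adaptation window are typical textbook values used here as an assumption, not values from this chapter.

import random

def one_plus_one_es(f, x, sigma=1.0, iters=2000, window=50):
    # minimize f starting from x, using Gaussian mutations with step size sigma
    successes = 0
    for t in range(1, iters + 1):
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        if f(child) < f(x):          # successful mutation: child replaces its ancestor
            x = child
            successes += 1
        if t % window == 0:          # 1/5 success rule, applied every 'window' steps
            sigma *= 1.22 if successes / window > 0.2 else 0.82
            successes = 0
    return x

sphere = lambda v: sum(xi * xi for xi in v)      # a standard test function
print([round(xi, 4) for xi in one_plus_one_es(sphere, [5.0, -3.0])])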

Schwefel [57] later generalized these strategies to the multimembered ES, now denoted (m + l) and (m, l), where l offspring are generated from m parents in each generation. These imitate the following basic principles of organic evolution: a population, leading to the possibility of recombination with random mating, mutation, and selection. The two variants are termed the Plus strategy and the Comma strategy, respectively:

• In the Plus case, the parental generation is taken into account during selection.
• In the Comma case, only the offspring undergo selection, as the parents die off.

Self-adaptation within ESs depends on the following agents:

• Randomness: Mutation cannot be modeled as a purely random process; that would mean that a child is completely independent of its parents.

• Population size: The population has to be sufficiently large. Not only the current best but also a set of good individuals should be permitted to reproduce.

• Cooperation: In order to exploit the effects of a population (m > 1), the individuals should recombine their knowledge with that of others (cooperate), because one cannot expect knowledge to accumulate only in the best individuals.

• Deterioration: In order to allow better internal models (step sizes) to evolve, and to ensure progress in the future, one should accept deterioration from one generation to the next. A limited life span in nature is not a sign of failure but an important means of preventing a species from "freezing" genetically.


The main features of the ES are:

• It is adaptable to nearly all sorts of optimization problems, because it needs very little information about the problem, in particular no derivatives of the objective function.

• It is capable of solving high-dimensional, multimodal, nonlinear problems subject to linear and/or nonlinear constraints.

• The objective function can be the result of a simulation; it does not have to be given in closed form. This also holds for the constraints, which may represent the outcome of, e.g., a finite-element method.

• It can be adapted to vector optimization problems [45], and it can also serve as a heuristic for NP-complete combinatorial problems like the Traveling Salesman problem, or for problems with a noisy or changing response surface.

2.5 Classifier Systems

Cognitive models [10, 30] were initially referred to as "classifier systems" (CSs), and sometimes as CFSs. A reinforcement component was added to the overall design of a CFS that emphasized its ability to learn. Thus, the name became "learning classifier systems" (LCSs). LCSs are also called "evolutionary reinforcement learning" (ERL).

A classifier system is a set of decision rules, each endowed with a weight that determines the probability with which a rule applicable to the current environmental situation is activated. The characteristic learning rule uses a credit assignment process by which the weights of the rules are updated as a function of the payoff they yield. The first learning task in a CS is to modify the strength of the classifiers as the CS accumulates experience; this credit apportionment is tackled with a bucket brigade algorithm. The second learning task is to generate plausible new classifiers (sets of rules). This task is carried out by a GA: using classifiers as genotypes and their respective strengths as the fitness parameter, it generates new classifiers to be tested under the bucket brigade algorithm.

Much work is being done on intelligence through dynamic modeling of simple creatures (animat = animal + robot) that satisfy their needs in semirealistic simulated environments, thus requiring adaptiveness, generalization, and learning. One part of the research concerns the evolution of wired-in adaptive behavior in populations of animats in response to physical characteristics of the environment (e.g., food availability). The other part concerns the learning of adaptive behaviors by animats during their lifetime so as to maximize success at obtaining food expeditiously and avoiding dangers. Work is also done at the borderline between evolution and learning; this is often regarded as "artificial life" rather than evolutionary computation.

A CS obtains its information from the outside world by means of its detectors and has a number of effectors that are used to select system actions. The outside world is represented as an environmental object to be modified by the application. The animat studied extensively by Wilson [65] is a good example with which to describe a CS. An animat lives in a digital microcosm and can be incarnated into the real world in the form of an autonomous robot vehicle with sensors/detectors (camera eyes, whiskers, etc.) and effectors (wheels, robot arms, etc.). An LCS algorithm acts as the "brain" of this vehicle, connecting the hardware parts with the software learning component.

The animat world is an artificially created digital world; for example, in Booker's Gofer system, a 2-dimensional grid contains "food" and "poison", and the Gofer itself walks across this grid and tries to learn to distinguish between these two items and survive well fed. Imagine a very simple animat: a simplified model of a frog, called "Kermit", living in little ponds (the environment) and having eyes (i.e., sensorial input detectors) and hands and legs (i.e., environment-manipulating effectors); in short, a spicy-fly-detecting-and-eating device.

A single "if-then" rule can be referred to as a "classifier" and encoded as a binary string. Similarly, a set of classifiers (rules) is called a "classifier population", and a GA is used to generate new classifiers from the current population. A CFS starts from scratch, that is, without any knowledge, using a randomly generated classifier population, and we let the system learn its program by induction [40]. This reduces the input stream to recurrent input patterns that must be repeated over and over again to enable the animat to classify its current situation/context and react to events appropriately.

Kermit lives in its digital microwilderness, where it moves randomly. Whenever a small flying object appears that has no stripes, Kermit should eat it, because it is very likely a spicy fly or other flying insect. If the insect has stripes (possibly a wasp, hornet, or bee), then Kermit should leave it alone. If Kermit encounters a large, looming object, it immediately uses its effectors to jump away as far as possible. Such behavior patterns within a specified virtual environment, in this case the "pond," are referred to as a "frame" in AI. This knowledge can be represented as a set of "if <condition> then <action>" rules, as given in the second column of Table 3.

The set of rules for this virtual world is encoded for use within a CS. A possible encoding of the above rule set, in CFS-C (Riolo) terminology, is given in the third column of Table 3. In CFS-C terminology, rules are encoded as follows:

• The NAND (AND followed by NOT) operation is denoted by the "~" prefix character (see rule #5).

• Bits 1-4, i.e., the pattern "0000", denote small, and the pattern "00" in bits 5 and 6 denotes flying. The last two bits, 7 and 8, encode the direction of the approaching object, where "00" means left, "01" means right, and so on.

• In rule #4 a "don't care" symbol "#" is used, which matches both "1" and "0". In other words, the position of the large, looming object is completely arbitrary: a simple fact that can save Kermit's life.
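Matching a classifier condition against a detector message then reduces to a character-by-character comparison in which "#" matches anything; a small illustrative Python fragment (the function name is ours):

def matches(condition, message):
    # "#" is the don't-care symbol: it matches both "0" and "1"
    return len(condition) == len(message) and all(
        c in ('#', m) for c, m in zip(condition, message))

# rule #4 of Table 3: IF large (1111), looming (01), any direction (##)
print(matches("111101##", "11110100"))   # large, looming, to the left  -> True
print(matches("111101##", "11110110"))   # large, looming, centered     -> True
print(matches("111101##", "00000000"))   # small, flying, to the left   -> False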

The following steps are required to convert the previous nonlearning CFS into a learning CS, or LCS [40] (the two steps are given after Tables 3 and 4):

Table 3 Knowledge Representation of Classifier System (knowledge as "if <condition> then <action>" rules, with the CFS-C (Riolo) encoding)

Rule #   if <condition> then <action>                                IF encoding     THEN encoding
1        IF small, flying object to the left THEN send @             0000, 00 00     00 00
2        IF small, flying object to the right THEN send %            0000, 00 01     00 01
3        IF small, flying object centered THEN send $                0000, 00 10     00 10
4        IF large, looming object THEN send !                        1111, 01 ##     11 11
5        IF not large, looming object THEN send *                    ~1111, 01 ##    10 00
6        IF * and @ THEN move head 15 degrees left                   1000, 00 00     01 00
7        IF * and % THEN move head 15 degrees right                  1000, 00 01     01 01
8        IF * and $ THEN move in direction head pointing             1000, 00 10     01 10
9        IF ! THEN move rapidly away from direction head pointing    1111, ## ##     01 11


Table 4 Nonlearning and Learning CFS

Both algorithms start by initializing the time counter, an initially empty message list ML, and a randomly created population of classifiers P, and then cycle until a termination criterion (time, fitness, etc.) is met, increasing the time counter in each cycle. In every cycle the detectors check whether input messages are present, and the message list ML is compared to the classifiers, saving the matches. The learning CFS additionally lets the highest-bidding matching classifiers win the "race" and post their messages, taxes the bidding classifiers (reducing their strength), checks the new message list for output messages via the effectors, receives payoff from the environment (reinforcement), distributes the payoff/credit to the classifiers, and eventually (depending on t) applies a GA to the classifier population. Finally, new messages are processed through the output interface and the cycle repeats.

Nonlearning CFS:

begin CFS
  t := 0
  initMessageList ML(t)
  initClassifierPopulation P(t)
  while not done do
    t := t + 1
    ML := readDetectors(t)
    ML' := matchClassifiers(ML, P(t))
    ML := sendEffectors(ML'(t))
  end while
end CFS

Learning CFS:

begin LCS
  t := 0
  initMessageList ML(t)
  initClassifierPopulation P(t)
  while not done do
    t := t + 1
    ML := readDetectors(t)
    ML' := matchClassifiers(ML, P(t))
    ML' := selectMatchingClassifiers(ML', P(t))
    ML' := taxClassifiers(ML', P(t))
    ML := sendEffectors(ML'(t))
    C := receivePayoff(t)
    P' := distributeCredit(C, P(t))
    if criterion then P := generateNewRules(P'(t)) else P := P'
  end while
end LCS

Step 1: The major cycle has to be changed such that the activation of each classifier depends on some additional parameter that can be modified as a result of experience, i.e. reinforcement from the environment; and/or

Step 2: Change the contents of the classifier list, i.e., generate new classifiers/rules by removing, adding, or combining condition/action-parts of existing classifiers.

Table 4 gives pseudocode for both the nonlearning and learning CFSs.

2.6 Genetic Programming (GP)

Genetic programming is an automatic programming technique for evolving computer programs that solve (or approximately solve) problems. Starting with thousands of randomly created computer programs, a population of programs is progressively evolved over many generations using, for example, the Darwinian principle of survival of the fittest.

Linear GAs (in which the structure of an individual is a flat bit string) are adept at developing rule-based systems, but they cannot develop equations; GP can generate symbolic expressions and perform symbolic regression, to the limited extent of modifying the structure of the expression but not its contents. In GP [43, 44, 64], the genome can be represented by a LISP expression. Thus, the evolution is of computer programs, rather than of bit strings as in the case of the usual genetic algorithm. Therefore, the objects that constitute the population are not fixed-length character strings that encode possible solutions to the problem but are programs that, when executed, become the candidate solutions to the problem. In GP these programs are called parse trees rather than lines of code. Thus, for example, the simple program "a + b * c" would be represented as the parse tree:

        +
       / \
      a   *
         / \
        b   c

or as suitable data structures linked together to achieve this effect. GP methods can be implemented using either a LISP or a non-LISP programming environment.

The programs in the population are composed of elements from the function set and the terminal set, which are typically fixed sets of symbols selected to be appropriate to the solution of problems in the domain of interest.

In GP, the crossover operation is implemented by taking randomly selected subtrees in the individuals (selected according to fitness) and exchanging them. GP does not usually employ mutation as a genetic operator.
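The following Python sketch (illustrative only; all names are ours) represents parse trees as nested tuples and exchanges randomly selected subtrees of two parents. For simplicity, subtrees are chosen uniformly here rather than from fitness-selected individuals.

import random

# a parse tree as nested tuples: ('+', 'a', ('*', 'b', 'c')) represents a + b * c

def all_paths(tree, path=()):
    # yield the index path of every subtree in the parse tree
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, sub):
    # return a copy of tree with the subtree at 'path' replaced by 'sub'
    if not path:
        return sub
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], sub),) + tree[i + 1:]

def crossover(t1, t2):
    # GP crossover: exchange randomly selected subtrees of the two parents
    p1 = random.choice(list(all_paths(t1)))
    p2 = random.choice(list(all_paths(t2)))
    return replace(t1, p1, get(t2, p2)), replace(t2, p2, get(t1, p1))

a = ('+', 'a', ('*', 'b', 'c'))
b = ('-', ('*', 'x', 'y'), 'z')
print(crossover(a, b))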

3 EVOLUTIONARY DESIGN OF NEURAL ARCHITECTURES

The key processes in this approach to the design of neural architectures [2, 3, 7-9, 11, 14, 29, 33, 39, 55, 62, 63] are depicted in Figure 1.

In a simple genetic algorithm, genotypes are binary strings of some fixed length (say n) that code for points in an n-dimensional Boolean search space. Each genotype may encode typically one, but possibly a set of, phenotypes or candidate solutions in the domain of interest, for example, a class of neural architectures. The encodings employ genes that take on numeric values or complex symbol structures. The next step is to develop decoding processes to map these structures into phenotypes (in this case ANNs), which may also be equipped with learning algorithms that train them using environmental stimuli. For example, the weights of the network may also be derived by the encoding/decoding mechanism. The evaluation of a phenotype determines the fitness of the corresponding genotype.

FIGURE 1 Evolutionary design of neural architectures: an evolutionary component (selection, mutation, and recombination, producing offspring genotypes) is coupled through a decoding step to a learning component (ANN training and fitness evaluation of the trained ANN).


The EA operates on the population of such genotypes, and selects genotypes that code for high-fitness phenotypes and reproduces them. Genetic operators, such as mutation, crossover, and inversion are used to introduce variety into the population. Thus, operating over several generations, a population gradually evolves toward genotypes that correspond to high-fitness phenotypes.

3.1 Genotype Representation

The evolutionary ANN design approach requires a scheme to represent or encode a variety of neural architectures as genotypes. This salient aspect determines the classes of neural architectures that can evolve and also constrains the choice and speed of the decoding process. For example, if it is necessary to discover neural networks with recurrent structures, the encoding scheme must be sufficiently flexible to describe recurrent neural architectures, and the decoding mechanism must turn such an encoding into an appropriate recurrent network (phenotype).

The genotype representations may be broadly classified into two categories based on the decoding effort that is needed.

• Direct encoding: The transformation of the genotype into a phenotype is trivial. An example of such an encoding is a connection matrix (see Figure 2) that precisely and directly specifies the architecture of the corresponding neural network.

• Indirect encoding: The complexity of the procedure required to decode and construct a phenotype varies from problem to problem. An example of such an encoding is one that uses rewrite rules to specify a set of construction rules that are recursively applied to yield the phenotype. Indirect encodings can be grouped into two broad categories:

(1) Grammatical encodings, e.g., cellular grammars, graph grammars, genetic/biological grammars, geometric grammars, etc.

(2) Others, e.g., the genetic programming paradigm (LISP programs), development models, etc.

Other encoding schemes could be deterministic versus stochastic, hierarchical versus nonhierarchical, modular versus nonmodular, and so on.

3.2 Network Topology or Structure of the Phenotypes

Network topology, or the structure of the phenotype in an evolutionary design system, plays a crucial role in the success of a neural architecture in solving any problem. In the elementary (single layer) perceptron there are no hidden neurons; therefore, it cannot classify input patterns that are not linearly separable. In theory, for an m-class classification problem in which the union of the m distinct classes forms the entire input space, we need a total of m outputs to represent all of the classification decisions. A purely feed-forward neural network cannot discover or respond to temporal dependencies in its environment. A recurrent network is needed for this task. Neural network topologies may be classified into two broad types:

Feed-forward networks: those without feedback loops, with all connections pointing in one direction; and

Recurrent networks [6]: those with feedback connections or loops.


Each of the two basic types of topologies may be further classified as networks that are multilayered, strictly layered, randomly connected, locally connected, sparsely connected, fully connected, regular, irregular, modular, hierarchical, and so on.

The choice of the target class of network structures dictates the choice of the genetic representation. This is very much analogous to the choice of the knowledge representation scheme for problem solving using state-space search in artificial intelligence. Thus, the search efficiency can be enhanced by restricting the search space based on a-priori knowledge about the properties of the solution set. Structures are selected based on cost and/or performance considerations with respect to a particular technology. For example, in a VLSI design, preference would be given to locally connected, fault-tolerant, highly modular neural network structures over globally connected nonmodular ones [42].

3.3 Variables of Evolution

Neural architectures are typically specified in terms of the topology (or connectivity pattern), the functions computed by the neurons (e.g., threshold, sigmoid), the connection weights, and/or the learning algorithm used [32]. A more complete description of a neural architecture requires the specification of coordination and control structures and learning structures [42, 46, 52]. Virtually any subset of these variables is a candidate to be operated on by evolutionary processes. For example, a system A might evolve the network connectivity as well as the weights (while keeping everything else constant), whereas a system B might evolve the connectivity alone, relying on a perhaps more efficient local search for the weights within each network. The time/performance trade-offs of the two systems on a given problem will be different, making the choice of variables subjected to evolution an extremely critical factor.

In addition to the network connectivity and the weights, one might evolve a learning algorithm, control or regulatory functions, the functions computed by various neurons, distribution of different types of neurons, relative densities of connections, parameters (and/or processes) governing the decoding of a genotype into a phenotype, and so on.

3.3.1 Back-propagation or Backprop. "Backprop" is short for "back-propagation of error". Strictly speaking, back-propagation refers to the method for computing the error gradient for a feedforward network, a straightforward but elegant application of the chain rule of elementary calculus. By extension, backprop refers to a training method that uses back- propagation to compute the gradient. By further extension, a backprop network is a feedforward network trained by back-propagation.

One of the most widely used supervised training methods for neural nets is the generalized delta rule, the training algorithm that was popularized by Rumelhart et al. [54]. The generalized delta rule (including momentum) is called the "heavy ball method" in the numerical analysis literature. Standard backprop can be used for incremental training (in which the weights are updated after processing each case), but it does not converge to a stationary point on the error surface. To obtain convergence, the learning rate must be slowly reduced. This methodology is called "stochastic approximation."

Another supervised error-correcting learning algorithm realizes a gradient descent in the error, i.e., the difference between the actual output of the system and a target output. The simple delta rule, or the perceptron convergence procedure, is used in networks with only one modifiable connection layer. It can be proven to find a solution for all input-output mappings that are realizable in this simpler architecture. The error surface in such networks has only one minimum, and the system moves on this error surface toward this minimum.


This is not true for back-propagation in a network with more than one modifiable connection layer. In a network with a hidden layer or layers of nodes (i.e., nodes that are neither in the input nor in the output layer), the error surface has, in addition to the global minimum, local minima, and the system can get stuck in such local error minima. In the simpler case (networks with only one modifiable connection layer) the system settles at the single minimum.

The connectivity structure is feedforward between the input and the output layers, i.e., there are connections from the input layer nodes to the hidden layer nodes and from the hidden layer nodes to the output layer nodes. There are no backward connections, e.g., from the hidden layer nodes to the input layer nodes, and there is also no lateral connectivity within the layers. In addition, connectivity between the layers is total in the sense that each input layer node is connected to each hidden layer node and each hidden layer node is connected to each output layer node. Before learning, the weights of these connections are set to small random values. Back-propagation learning proceeds in the following way:

1. An input pattern is chosen from a set of input patterns. This input pattern determines the activations of the input nodes.

2. Setting the activations of the input layer nodes is followed by the forward-propagation phase: the activation values of first the hidden units and then the output units are computed, using a sigmoid activation function.

3. The computed output activations are then compared with the target outputs, and the resulting error is propagated backward through the network to adjust the weights; a sketch of both phases follows.
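The following NumPy sketch is our illustration (the 3-5-1 architecture, learning rate, and training pattern are arbitrary assumptions); it shows the forward phase of step 2 and the backward phase of step 3 for a single training case.

import numpy as np

rng = np.random.default_rng(0)

# one hidden layer, fully connected, small random initial weights (as above)
W1 = rng.normal(0, 0.1, (5, 3))     # input (3 nodes) -> hidden (5 nodes)
W2 = rng.normal(0, 0.1, (1, 5))     # hidden (5 nodes) -> output (1 node)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_step(x, target, lr=0.5):
    # forward-propagate activations, then push the output error backward
    global W1, W2
    h = sigmoid(W1 @ x)                      # hidden activations
    y = sigmoid(W2 @ h)                      # output activations
    d_out = (y - target) * y * (1 - y)       # output delta (sigmoid derivative)
    d_hid = (W2.T @ d_out) * h * (1 - h)     # hidden delta via the chain rule
    W2 -= lr * np.outer(d_out, h)            # gradient-descent weight updates
    W1 -= lr * np.outer(d_hid, x)
    return float(0.5 * np.sum((y - target) ** 2))

# toy usage: learn to output 1 for this single input pattern
x, t = np.array([1.0, 0.0, 1.0]), np.array([1.0])
for _ in range(500):
    err = train_step(x, t)
print(round(err, 5))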

For batch processing, there is no reason to suffer through the slow convergence and the tedious tuning of learning rates and momenta of standard backprop. Much of the ANN research literature is devoted to attempts to speed up backprop; however, conventional methods for nonlinear optimization are usually faster and more reliable than any of the "props." It is also observed that the topology of the network plays an important role in the choice of the learning algorithm as well as in the convergence rate.

3.4 Application Domain: An Example [49]

The architecture of a network of n units is represented by a connectivity constraint matrix, C, of dimension n x (n + 1), with the first n columns specifying the constraints on the connections between the n units, and the final column denoting the constraint on the threshold bias of each unit. Each entry C_ij indicates the nature of the constraint on the connection from unit j to unit i, or the constraint on the threshold bias of unit i if j = n + 1. The value of C_ij can be either "0", denoting the absence, or "1", denoting the presence, of a trainable connection between the corresponding units. The genotype is constructed by concatenating the rows of the matrix to yield a bit string of length n x (n + 1), as illustrated in Figure 2. The genetic algorithm maintains a population of such bit strings, each coding for a network architecture. A fitness-proportionate selection scheme chooses parents for reproduction; crossover swaps rows between parents, while mutation randomly flips bits in the connectivity constraint matrix with some low, predetermined probability.

The architecture encoded by a genotype is created by providing connections corresponding only to matrix entries of 1. All of the feedback connections are ignored even though they may be specified in the genotype; the system thus evolves pure feedforward networks. The connections in the network are then set to small random values and trained using the back-propagation learning rule for a fixed number of iterations. At the end of the learning phase, the total sum squared error E of the network is used as the fitness measure: a higher fitness is assigned to the corresponding genotype for low values of E, as low error corresponds to better learning of the task.


FIGURE 2 Evolutionary design of ANN: a 5-unit example. The connectivity constraint matrix (rows: to neurons 1-5; columns: from neurons 1-5 plus a bias column) has rows 000000, 000000, 110001, 110001, and 001101. Concatenating the rows yields the bit-string genotype 000000 000000 110001 110001 001101, which decodes to the corresponding ANN architecture.
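A minimal sketch of the decoding step for this direct encoding is given below, using the genotype of Figure 2. It assumes the units are ordered so that ignoring feedback means zeroing entries with j >= i; this ordering is our assumption, as the text does not spell it out.

import numpy as np

def decode(genotype, n):
    # decode a bit string of length n*(n+1) into the constraint matrix C,
    # where C[i, j] = 1 means a trainable connection from unit j to unit i
    # and column n holds the threshold-bias bits
    C = np.array([int(b) for b in genotype]).reshape(n, n + 1)
    mask = np.tril(np.ones((n, n), dtype=int), k=-1)   # keep only j < i (feedforward)
    C[:, :n] *= mask
    return C

# the genotype shown in Figure 2 (5 units, 5 * (5 + 1) = 30 bits)
g = "000000" "000000" "110001" "110001" "001101"
print(decode(g, 5))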

This system uses a constraint matrix to specify the network connectivity, with the genotype a simple concatenation of the rows of this matrix. It is thus a direct encoding method, and decoding the genotype to construct the phenotype is, in this case, a trivial process. Further, feedback connections are ignored by the decoding process; hence, only networks with feedforward topology are evolved by this system.

In addition, the only variable subjected to evolution is the connectivity (or the topology) of the network. The parameters (connection weights) are determined separately by the back-propagation procedure and are not subject to evolution.

The system was tested on three toy problems (XOR, the four-quadrant problem, and the pattern-copying problem), which form its application domain.

3.5 Network Optimization Using Single Gene Technique

It was found that although a pseudo-random search through the gene was effective, a more structured approach of grading the strategies from the most effective to the least effective and applying them in that order produced better results. The effectiveness was estimated by testing how long the network took to train to a preset error target on a character recognition problem.

A survey of the work published in the past on the subject of network topology shows that there are several strategies for growing networks [47]. A fully connected two-layer standard network is used as a reference and growth strategies are described in Table 5.

The following strategies, which result in very simple schemes, were selected from those listed in Table 5:


Table 5 Growth Strategies for ANNs

Growth strategy                            Description (Figure)
Reference standard network                 Two-layer fully connected (Figure 3)
Change the number of neurons               Neurons in a layer are added or removed without changing connectivity in that layer (Figure 4)
Change the connections (network pruning)   The number of connections is increased or decreased (Figure 5)
Asymmetry                                  The number of connections in the network may be reduced or increased (Figure 6)
Sideways connections                       In synchronous networks, sideways connections can be introduced between neurons in the same layer (Figure 7)
Skipping layers                            Rather than connecting to the layer directly below, a connection may skip a layer; a three-layer network is therefore necessary (Figure 8)
Feedback                                   Feedback may be added to the network, and it may also skip layers (Figure 9)
Bias units                                 Bias may be added to the network (Figure 10)
Sideways growth                            A network may add to the neurons in a particular layer by growing sideways (Figure 11)

FIGURES 3-11 Network topologies (input and output layers shown) illustrating the growth strategies of Table 5; Figure 10 includes the added bias unit.


• Changing the number of neurons
• Changing the connectivity
• Adding bias units
• Sideways growth

EXAMPLE: CHARACTER RECOGNITION

• Objective: To identify the 26 letters of the alphabet.
• General: The network was tested on a character recognition problem using a 5 x 7 pixel grid (matrix). The output layer had a fixed number of 26 neurons, each representing one letter of the alphabet. The network was set up using the matrix-handling commands in the MATLAB mathematics software package and trained using the back-propagation algorithm.
• Figure of merit: The number of cycles taken for the network to train.
• Measure of performance: The number of cycles the back-propagation algorithm takes to train the network to a sum squared error of 0.1 on the character recognition problem.
• Observation: As the network gets larger, performance improves until the network starts to overfit. The network therefore has an optimum size for the problem (in this case, 16 neurons), which can be found by growing the network using evolutionary methods. This method thus represents a powerful means of optimization; any further alterations to the network beyond this point will result in a degradation of the training times.
• Experiments showed that the general approach worked on a variety of different neural network problems, although the order in which the strategies were applied changed in the different applications. Because the pseudo-random nature of weight initialization in back-propagation means that consecutive runs may take slightly different training times to reach a given error goal, the same problem was applied four times to the network and the results averaged. Table 6 and Figure 12 show the growth strategies adopted and the result of running the algorithm on a typical character recognition network.
• Results: The evolutionary algorithm changes the topology of the neural network one step at a time and measures the performance of the network in terms of reduced error and improved training times. Aspects of the network topology that may be changed include the connectivity, the number of layers in the network, the addition of bias units, and sideways growth (found to be the least effective). Alterations to the topology that result in better performance are kept; the others are discarded. Hence, a network placed in an artificial "environment" can be forced to grow and perform its task better.
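The growth procedure just described amounts to a greedy, one-step-at-a-time search over topologies. The following minimal Python sketch illustrates it; the toy train_to_error stand-in (tuned so that a 16-neuron hidden layer is optimal, echoing the result above) and the add_neuron operator are illustrative assumptions, not the authors' code.

import random

# Toy stand-in for back-propagation training: returns a pseudo cycle
# count that is smallest near a 16-neuron hidden layer, mimicking the
# optimum reported above. Purely illustrative.
def train_to_error(topology, target_error=0.1):
    return 100 + (topology["hidden"] - 16) ** 2 + random.random()

def average_train_cycles(topology, runs=4):
    # Average several runs to smooth the pseudo-random weight initialization.
    return sum(train_to_error(topology) for _ in range(runs)) / runs

def add_neuron(topology):
    # One growth step: add a neuron to the hidden layer.
    return {**topology, "hidden": topology["hidden"] + 1}

def grow_network(topology, strategies):
    best = average_train_cycles(topology)
    for strategy in strategies:              # graded most to least effective
        while True:
            candidate = strategy(topology)   # alter the topology one step
            cycles = average_train_cycles(candidate)
            if cycles < best:                # keep alterations that help
                topology, best = candidate, cycles
            else:                            # discard; move to next strategy
                break
    return topology

print(grow_network({"hidden": 4}, [add_neuron]))   # grows toward 16 neurons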

Table 6 Growth Strategy and Performance (the number of cycles back-propagation takes to train the network at each step is plotted in Figure 12)

• Increase in the number of neurons in the top layer of the two-layer network: steps 1 to 2.
• Increase in connectivity: steps 2 to 3 and 3 to 4.
• Addition of bias: steps 5 to 6 and 6 to 7 (marginal reduction).


FIGURE 12 Number of cycles back-propagation takes to train network.

4 APPLICATIONS OF EVOLUTIONARY ALGORITHMS (EA)

In principle, EAs can compute any computable function, that is, do everything a normal digital computer can do. Special-purpose algorithms that have a certain amount of problem-domain knowledge hard-coded into them, however, can outperform EAs. EAs are useful tools when there is no other known problem-solving strategy and the problem domain is NP-complete. The following sections describe a few EA applications.

4.1 Biocomputing

Biocomputing, or bioinformatics, is the field of biology dedicated to the automatic analysis of experimental data. Several approaches to specific biocomputing problems have been described that involve the use of GA, GP and simulated annealing. There are three main domains in which GAs have been applied to bioinformatics, viz., protein folding, RNA folding, and sequence alignment. General information about software and databases can be found on the server of the European Bioinformatics Institute: http://www.ebi.ac.uk/ebi_home.html.

4.2 Evolvable Hardware (EHW or "E-HARD") [35-37, 56, 58, 60, 61]

This field is developing through innovations from the evolutionary computation domain as well as from the hardware field. Recently, the term "evolware" has been used by some researchers to describe such evolving ware, with current implementations centering on hardware while raising the possibility of other forms in the future, such as bioware. A more accurate title might have been "biologically inspired electronics" (bionics).

The concept of the FPGA (field-programmable gate array) provides flexibility to the extent that the functioning of the circuit is determined not in the factory but in the field, by the user, with a "configuration bit string", a piece of software that instructs the hardware how to configure itself or wire itself up. FPGAs based on static RAM can be rewritten an unlimited number of times. One approach to designing E-hard would be to treat the configuration bit string as a genetic algorithm chromosome and then to find the pattern of bits that generates the "best" circuit in terms of


some functional criterion. The best pattern is downloaded into an FPGA. In this approach most of the evolution occurs outside the actual piece of hardware.

Alternatively, each chromosome could reconfigure the circuit directly: for a GA with a population of P chromosomes and G generations, the reconfigurable hardware would be rewritten a total of P x G times, so that the evolution would occur principally inside the circuit. This inside-outside distinction [20] can be characterized as follows:

Intrinsic E-hard: the circuit is reconfigured for each chromosome in each generation.
Extrinsic E-hard: the evolution is simulated in software, and only the elite chromosome (i.e., the configuring bit string) is written, so the circuit is configured only once.

The extrinsic process is slower than the intrinsic approach. A hardware description language in the form of trees, which can be evolved GP-style, has also been used [30].
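As a hedged illustration of the extrinsic route, the following sketch runs a plain generational GA over configuration bit strings entirely in software and keeps only the elite string for downloading; the bit-string length, population settings, and the fitness stand-in for circuit simulation are assumptions made for illustration, not a description of any particular E-hard system.

import random

BITS = 64            # length of the configuration bit string (illustrative)
POP, GENS = 30, 50   # population size P and number of generations G

def fitness(bits):
    # Stand-in for simulating the configured circuit against some
    # functional criterion; here, matching an arbitrary target pattern.
    target = [i % 2 for i in range(BITS)]
    return sum(b == t for b, t in zip(bits, target))

def mutate(bits, rate=1.0 / BITS):
    return [b ^ (random.random() < rate) for b in bits]

def crossover(a, b):
    cut = random.randrange(1, BITS)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]                     # truncation selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(POP - len(parents))]

elite = max(pop, key=fitness)   # extrinsic: only this string is downloaded
print(fitness(elite))

In the intrinsic variant, fitness would instead be measured by writing each of the P x G strings into the device and testing the real circuit.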

Electronics is one of the world's largest industries, alongside automobiles, aviation, housing, and oil. With Moore's law doubling circuit speeds and densities every year or so, electronic circuits are becoming so complex that designing them in the traditional "blueprinting" manner is reaching the breaking point. By using an "evolutionary engineering" approach, it is possible to evolve electronic circuits with a complexity beyond human design ability. In the words of de Garis [20]:

eventually E-hard will grow into a trillion dollar industry. There is a limit to what is designable. Moore's law will continue right down to one bit per atom, single-electron transistors, and the like. It will be totally impractical to design a trillion-component electronic device. It will have to self-assemble in an embryological way, but the complexity will be so great that predicting the outcome will be impossible. Hence, the only way left is to test the result. Since the traditional link between structure and function (the basis of traditional electronic design) will be destroyed, the only way to make improvements will be by trial and error, i.e., by random mutation, building/executing the new circuit, and then testing it. If it performs well, it survives. Nature has used this evolutionary engineering technique for billions of years and it produced us.

4.3 Game Playing [26]

A simple game has just two moves, X and Y. The players receive a reward, analogous to Darwinian fitness, depending on which combination of moves occurs and which move they adopted. In more complex models, there may be several players and several moves.

The players iterate such a game a number of times, and then move on to a new partner. At the end of all such moves, the players will have a cumulative payoff, their fitness. This fitness can then be used as a means of conducting something akin to roulette-wheel selection to generate a new population.
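A minimal roulette-wheel selection sketch over cumulative payoffs follows; the player labels and payoff values are illustrative assumptions.

import random

def roulette_select(population, fitnesses):
    # Spin a wheel whose slot sizes are proportional to fitness.
    total = sum(fitnesses)
    spin = random.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if spin <= running:
            return individual
    return population[-1]

players = ["always-X", "always-Y", "tit-for-tat"]
payoffs = [12.0, 5.0, 20.0]      # cumulative payoffs after all pairings
new_population = [roulette_select(players, payoffs) for _ in range(3)]
print(new_population)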

The real key in using a GA is to develop an encoding of a player's strategy that is amenable to crossover and to mutation. Suppose that at each iteration a player adopts X with some probability. A player can thus be represented as a real number, or as a bit string by interpreting the decimal value of the bit string as the inverse of the probability. If a player adopts move X with probability P(X), then the probability of move Y is 1 - P(X).

An alternative characterization is to model the players as finite state machines, or finite automata (FA). These can be thought of as simple flow charts governing the behavior in the "next" play of the game depending upon previous plays. For example:


10 Play X
20 If opponent plays X go to 10
30 Play Y
40 If opponent plays X go to 10 else go to 30

represents a strategy that begins by playing X and thereafter does whatever its opponent did last, known as "tit-for-tat" (TFT). Such machines can readily be encoded as bit strings. Consider the encoding "101001" for TFT. The first three bits, "101", are state 0. The first bit, "1", is interpreted as "play X". The second bit, "0", is interpreted as "if the opponent plays X, go to state 0". The third bit, "1", is interpreted as "if the opponent plays Y, go to state 1". State 1 ("001") has a similar interpretation. Crossing over such bit strings always yields valid strategies.
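A minimal decoder for this two-state encoding is sketched below, assuming each state occupies three bits in the order (move, next state if the opponent plays X, next state if the opponent plays Y); under that reading, "101001" behaves as tit-for-tat.

def play(genome, opponent_moves):
    # genome: 6-bit string; bits 0-2 are state 0, bits 3-5 are state 1.
    states = [genome[i:i + 3] for i in (0, 3)]
    state, moves = 0, []
    for opp in opponent_moves:
        move, on_x, on_y = states[state]
        moves.append("X" if move == "1" else "Y")
        state = int(on_x) if opp == "X" else int(on_y)
    return moves

print(play("101001", ["X", "Y", "Y", "X"]))   # ['X', 'X', 'Y', 'Y']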

EA-based commercial games, viz., Galapagos from Anark Software and Creatures, are entering the market (for more on artificial life, see http://alife.santafe.edu/alife/www/).

4.4 Job Shop Scheduling Problem (JSSP) [19, 51]

The job shop scheduling problem is a very difficult NP-complete problem, conventionally attacked with branch-and-bound search techniques. GAs are used to encode schedules, and research workers have used a variety of encoding schemes: (i) a genome directly encodes the operation completion times instead of using a binary representation [66], and (ii) genomes represent implicit instructions [21] for building a schedule.

The open shop scheduling problem (OSSP) is similar to the JSSP. Fang et al. [21] reported reliable achievement (through use of a GA) of results within less than 0.23% of optimal on moderately large OSSPs (so far, up to 20 x 20), including an improvement on the previously best known solution for a benchmark 10 x 10 OSSP. A GA has also been used successfully to solve the flow shop sequencing problem [53], a simple variant of the JSSP.
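As a hedged illustration of the implicit style of encoding (the problem data and the greedy builder below are assumptions for illustration, not the encoding of [21] or [66]), the genome is a sequence of job identifiers that a schedule builder decodes into start times:

# The genome lists job ids, one occurrence per operation; a greedy
# builder gives each operation the earliest start compatible with its
# machine and with its predecessor in the same job.
jobs = {0: [(0, 3), (1, 2)],     # job -> list of (machine, duration)
        1: [(1, 2), (0, 4)]}

def build_schedule(genome):
    machine_free = {}                  # machine -> time it becomes free
    job_free = {}                      # job -> time its last operation ended
    next_op = {j: 0 for j in jobs}
    makespan = 0
    for job in genome:
        machine, duration = jobs[job][next_op[job]]
        start = max(machine_free.get(machine, 0), job_free.get(job, 0))
        machine_free[machine] = job_free[job] = start + duration
        next_op[job] += 1
        makespan = max(makespan, start + duration)
    return makespan                    # fitness: minimize the makespan

print(build_schedule([0, 1, 1, 0]))    # 7 for this toy instance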

In contrast to job shop scheduling, some maintenance scheduling problems consider which activities to schedule within a planned maintenance period, rather than seeking to minimize the total time taken by the activities. The constraints on which parts may be taken out of service for maintenance at particular times may be very complex, particularly as they will, in general, interact. Expert systems have been widely used to solve such problems.

4.5 Artificial Painter (AP)

The AP is a joint Danish-Italian research project between the Institute of Psychology, National Research Council, Rome, Italy, and the Department of Computer Science, Aarhus University, Denmark. The AP software applies a GA to an ANN and evolves pictures to be used in artistic design. The evolution of pictures is driven by the user's aesthetic evaluation of a number of pictures shown on the screen.

In this environment (a rectangular grid), a number of landmarks are placed, and the network senses the angle and distance of each landmark. The output activity of each single output unit of the ANN is recorded by placing the network in each cell of the environment. Output activity can change with the angle and distance measures, which act as input stimuli. To compose the final image, each level of output activity is mapped to a different color and shown in a computerized picture in which each pixel corresponds to a cell of the environment.

In AP, the genotype of each individual (ANN) is represented by a bit string codifying the landmark coordinates, the neural network weights, the output activation function type (sum, logistic, exponential, trigonometric, etc.), and the color mapping (the palette of colors used to map each level of the output to a specific color). Different landmark coordinates give different sensory input, while connection weights provide different activation flows in the ANN.


AP maintains a population of individuals. The genotype of each individual is set randomly. Each individual builds a picture, which is shown to the user; according to the genotype, different images appear. The user selects the most appealing picture. A new generation is formed by cloning the selected picture a fixed number of times (i.e., each selected picture has a fixed number of children/copies). Each clone is mutated in some randomly chosen parts of the genotype. Mutation alters parameters such as landmark coordinates, connection weights, output activation function type, and color mapping. The selection and cloning process continues until an acceptable picture is obtained.
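The selection-and-cloning loop can be sketched as follows; the genome length, number of clones, mutation rate, and the random stand-in for the user's interactive pick are all illustrative assumptions.

import random

GENOME_BITS, CHILDREN = 128, 9    # assumed genome length and clones per pick

def random_genotype():
    # Bit string encoding landmark coordinates, ANN weights,
    # activation-function type, and color mapping.
    return [random.randint(0, 1) for _ in range(GENOME_BITS)]

def mutate(genotype, rate=0.02):
    # Flip randomly chosen parts of the genotype.
    return [b ^ (random.random() < rate) for b in genotype]

population = [random_genotype() for _ in range(CHILDREN)]
for generation in range(5):                 # until an acceptable picture
    chosen = random.choice(population)      # stand-in for the user's pick
    population = [mutate(chosen) for _ in range(CHILDREN)]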

4.6 Management Sciences

Application of EAs in management science and closely related fields, such as organizational ecology, is a domain that has been covered by some EA researchers, with a considerable bias toward scheduling problems.

4.7 Nonlinear Filtering

GAs have already been successfully applied to a large class of nonlinear filtering problems found in radar/sonar/GPS signal processing. The new results also point out some natural connections between genetic-type algorithms, information theory, nonlinear filtering theory, and interacting and branching particle systems.

4.8 Timetabling [1, 15, 16]

The first application of GAs to timetabling was to schedule teachers in an Italian high school. At the Department of Artificial Intelligence, University of Edinburgh, UK, timetabling of the M.Sc. examinations is now done using a GA [16]. In the examination timetabling case, the fitness function for a genome representing a timetable computes degrees of punishment for clashes and for violating various preferences, such as instances of students having to take consecutive examinations, students having three or more examinations in one day, heavily subscribed examinations occurring late in the timetable (which makes marking harder), the overall length of the timetable, and so on. The power of the GA approach is the ease with which it can handle arbitrary kinds of constraints and objectives. All such things can be handled as weighted components of the fitness function, making it easy to adapt the GA to the particular requirements of a very wide range of possible overall objectives.
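A hedged sketch of such a weighted-penalty fitness function follows; the penalty terms, weights, and three-slots-per-day convention are illustrative assumptions.

WEIGHTS = {"clash": 100.0, "consecutive": 10.0,
           "three_in_a_day": 25.0, "length": 1.0}

def penalty(timetable, students):
    # timetable: exam -> slot index; students: student -> list of exams.
    p = 0.0
    for exams in students.values():
        slots = sorted(timetable[e] for e in exams)
        for a, b in zip(slots, slots[1:]):
            if a == b:
                p += WEIGHTS["clash"]          # two exams at once
            elif b - a == 1:
                p += WEIGHTS["consecutive"]    # back-to-back exams
        for day in {s // 3 for s in slots}:    # assume 3 slots per day
            if sum(s // 3 == day for s in slots) >= 3:
                p += WEIGHTS["three_in_a_day"]
    p += WEIGHTS["length"] * max(timetable.values())
    return p                                   # the GA minimizes this

timetable = {"math": 0, "physics": 1, "art": 5}
students = {"s1": ["math", "physics"], "s2": ["math", "art"]}
print(penalty(timetable, students))            # 15.0 here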

4.9 Human Motion Modelling [12]

Human motion models are important in virtual reality applications. One approach is to use conceptual-level learning that imitates the way humans learn, i.e., through macro-instructions. The human motion modeling system receives input either as motion or as words from a training person, and this information is transformed and stored as fuzzy-based knowledge. A tennis motion system was used in the experiments: a camera setup detected the human motion by means of indicators placed on the body of a person playing. Several different tennis strokes, such as forehand, backhand, and smash, were stored in the system. Strokes from other tennis players were then input to test the recognition robustness. On average, experiments showed correct recognition 84.2% of the time.


4.10 Paper Currency and Document Recognition [13, 27]

A GA is adapted to find effective masks for paper currency recognition by masking characteristic parts of an image; a perceptron neural network does the recognition itself. The GA is used to find the optimal positions of the masks, both to shorten training time and to improve generalization. Recently, this problem has been dealt with by Glory Ltd., which has developed a neuro-board jointly with the University of Tokushima, Japan. They claim that the use of the GA results in a machine ten times faster than conventional recognition machines.

4.11 Railway Bridge Fault Detection

Objectives:

• To develop methods for the inspection, evaluation, and fault detection of railway bridges.
• To instrument and remotely monitor the bridges.

Method: The passing of a train over the bridge is assumed to constitute a test. The measurements are used to evaluate the state of the bridge and to determine some key damage indices. Currently, the genetic algorithm is being used to determine the parameters of a model of the bridge by minimizing the differences between the model predictions and the measurements.
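The calibration step can be sketched as fitting model parameters by penalizing the mismatch between predictions and measurements; the quadratic bridge model and its parameters below are illustrative assumptions, not the project's actual model.

def bridge_model(params, load):
    # Toy response model: deflection as a function of train load.
    stiffness, damping = params
    return load / stiffness + damping * load ** 2

# Synthetic "measurements" taken as the bridge's true response.
measurements = [(load, bridge_model((2.0, 0.05), load)) for load in (1, 2, 3)]

def fitness(params):
    # Negated squared error: the GA maximizes this, i.e., it minimizes
    # the difference between model predictions and measurements.
    return -sum((bridge_model(params, load) - measured) ** 2
                for load, measured in measurements)

print(fitness((2.0, 0.05)), fitness((1.0, 0.10)))   # zero error vs. worse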

5 FURTHER WORK

It has been observed that when a network becomes too large it starts to overfit and performance drops. Therefore, a different approach is required to tackle problems such as complex image recognition. Preliminary results show that one can use a hierarchical network consisting of several small networks each dedicated to one small task, and each feeding into another network in the layer above that collates the results. Research into the nervous systems of animals indicates that the biological neural structure is organized in this way, with functional areas of the brain having small areas of responsibility. The application of the evolutionary approach to the growth of hierarchical networks is an area with research potential.

Current ANNs use the perceptron as their neuron model. Although this shows good results under some circumstances, its functionality is strictly limited, and it is a poor model of a biological neuron, which operates by sending a train of pulses whose frequency is proportional to its stimulation. Moreover, neurons in the nervous system have widely differing modes of operation and are therefore difficult to simulate. Evolutionary neural networks provide two methods, using (i) a neuron that is capable of producing any mathematical function of its input vector at its output, and (ii) a genetic algorithm to evolve the functionality of the cell itself.
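A minimal sketch of such a neuron follows, assuming a small function set in the spirit of the activation types mentioned for the Artificial Painter; the gene layout is an illustrative assumption.

import math

# Gene 0 selects the output function; the remaining genes are weights.
FUNCTIONS = {0: sum,                                           # plain sum
             1: lambda xs: 1.0 / (1.0 + math.exp(-sum(xs))),   # logistic
             2: lambda xs: math.exp(min(sum(xs), 50)),         # exponential
             3: lambda xs: math.sin(sum(xs))}                  # trigonometric

def neuron_output(genome, inputs):
    func_gene, weights = genome[0], genome[1:]
    weighted = [w * x for w, x in zip(weights, inputs)]
    return FUNCTIONS[func_gene % len(FUNCTIONS)](weighted)

genome = [3, 0.5, -1.2, 0.8]                 # a GA would evolve this vector
print(neuron_output(genome, [1.0, 0.2, 0.4]))

A GA can then mutate both the weights and the function gene, evolving the functionality of the cell itself.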

To use evolutionary networks in practical applications, we need also to consider the wiring of the system. Only part of the biological neural network is used for processing. The majority is actually the "wiring" or control system that interfaces the neural network to the outside world. For the artificial neural network to become useful, we must also consider how to interface the network to a similar control system.


6 SUMMARY

Fuzzy logic, neural networks, genetic algorithms, and evolutionary strategies have been treated in parallel for a long time without recognition of the common aspects. Recently, hybrid systems have been developed to unify the advantages of these different strategies. Accepting the advantages of hybrid approaches, nowadays these systems are treated as integral parts of computational intelligence.

The process of evolution and natural selection has developed the biological brain from a simple bundle of cells into the nervous system of the higher species. The development of artificial evolutionary techniques has paved the way for the application of similar methods to ANNs, with excellent results in some specific application areas, but they have not lived up to the initial promise of a realistic simulation of the biological brain.

In this chapter, the current status of evolutionary approaches to the design of neural architectures has been presented. There is a need, however, to develop sound scientific principles to guide the development and application of evolutionary techniques in the design of artificial neural architectures given a variety of design and performance constraints for specific classes of problems.

REFERENCES

[1] Abramson and Abela, A parallel genetic algorithm for solving the school timetabling problem, Technical Report, Division of I.T., C.S.I.R.O., c/o Dept. of Communication and Electronic Engineering, Royal Melbourne Institute of Technology, PO Box 2476V, Melbourne 3001, Australia, April 1991.

[2] Advances in Fuzzy Logic, Neural Networks, and Genetic Algorithms, IEEE/Nagoya-University World Wisepersons Workshop, Nagoya, Japan, August 9-10, 1994.

[3] E. Alba, J.F. Aldana, and J.M. Troya, Genetic algorithms as heuristics for optimizing ANN design, in R.F. Albrecht, C.R. Reeves, and N.C. Steele (eds.), Proc. Int. Conf. Artificial Neural Nets and Genetic Algorithms, pp. 683-690, Springer-Verlag, Berlin, 1993.

[4] Rudolf F. Albrecht, Colin R. Reeves, and Nigel C. Steele (eds.), Proc. Int. Conf. Artificial Neural Networks and Genetic Algorithms, Springer-Verlag, Berlin, 1993.

[5] J.A. Anderson and E. Rosenfeld, Neurocomputing, Vols. 1 and 2, MIT Press, Cambridge, MA, 1991.

[6] Peter J. Angeline, G.M. Saunders, and J.B. Pollack, An evolutionary algorithm that constructs recurrent neural networks, IEEE Trans. Neural Networks, Vol. 5, pp. 54-64, 1994.

[7] F.Z. Brill, D.E. Brown, and W.N. Martin, Fast genetic selection of features for neural network classifiers, IEEE Trans. Neural Networks, Vol. 3, No. 2, pp. 324-328, Mar. 1992.

[8] Aviv Bergman, Variation and selection: An evolutionary model of learning in neural networks, Neural Networks, Vol. 1, No. 1, p. 75, 1988.

[9] Pierre Bessiere, Genetic algorithms applied to formal neural networks, in P. Bourgine (ed.), ECAL91, 1st European Conference on Artificial Life, Paris, MIT Press, Cambridge, MA, 1993.

[10] L.B. Booker, D.E. Goldberg, and J.H. Holland, Classifier systems and genetic algorithms, Artificial Intelligence, Vol. 40, Nos. 1-3, pp. 235-282, Sept. 1989.

[11] Stefan Bornholdt and Dirk Graudenz, General asymmetric neural networks and structure design by genetic algorithms, Neural Networks, Vol. 5, No. 2, pp. 327-334, 1992.

[12] Rodney A. Brooks, A robot that walks: Emergent behavior from a carefully evolved network, Neural Computation, Vol. 1, No. 2, pp. 253-262, 1989.

[13] Marcelo H. Carugo, Optimization of parameters of a neural network, applied to document recognition, using genetic algorithms, Nat. Lab. Technical Note No. 049/91, N.V. Philips, Eindhoven, The Netherlands, 1991.

[14] Thomas P. Caudell and Charles P. Dolan, Parametric connectivity: Training of constrained networks using genetic algorithms, in Proc. Third Int. Conf. Genetic Algorithms, 1989, pp. 370-374.

[15] A. Colorni, M. Dorigo, and V. Maniezzo, Genetic algorithms and highly constrained problems: The time-table case, in Proc. First Int. Workshop Parallel Problem Solving from Nature, Dortmund, Germany (Lecture Notes in Computer Science No. 496), pp. 55-59, Springer-Verlag, Berlin, 1990.


[16] D. Corne, H.-L. Fang, and C. Mellish, Solving the modular exam scheduling problem with genetic algorithms, in Proc. 6th Int. Conf. Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 1993.

[17] Lawrence Davis, Genetic Algorithms and Simulated Annealing, Morgan Kaufmann, Los Altos, CA, 1989.

[18] Lawrence Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.

[19] L. Davis, Job-shop scheduling with genetic algorithms, in Proc. Int. Conf. Genetic Algorithms, 1985, pp. 136-140.

[20] H. de Garis, Evolvable hardware: Genetic programming of a Darwin machine, in Artificial Neural Nets and Genetic Algorithms, Proc. Int. Conf., Innsbruck, Austria, Springer-Verlag, Berlin, 1993, pp. 441-449.

[21] H.-L. Fang, P. Ross, and D. Corne, A promising genetic algorithm approach to job-shop scheduling, rescheduling and open-shop scheduling problems, in Proc. Int. Conf. Genetic Algorithms, 1993, pp. 375-382.

[22] L. Fogel, A. Owens, and M. Walsh, Artificial Intelligence through Simulated Evolution, Wiley, New York, 1966.

[23] D.B. Fogel, Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, Piscataway, NJ, 1995.

[24] D.B. Fogel, On the philosophical foundations of evolutionary algorithms and genetic algorithms, in D.B. Fogel and W. Atmar (eds.), Proc. Second Annual Conf. Evolutionary Programming, La Jolla, CA, 1993, pp. 23-29.

[25] L.J. Fogel, Evolutionary programming in perspective: The top-down view, in J.M. Zurada, R.J. Marks, and C.J. Robinson (eds.), Computational Intelligence: Imitating Life, pp. 135-146, IEEE Press, Piscataway, NJ, 1994.

[26] D.B. Fogel, Evolutionary programming: An introduction and some current directions, Statistics and Computing, Vol. 4, pp. 113-129, 1994.

[27] A. Frosini, M. Gori, and P. Priami, A neural network-based model for paper currency recognition and verification, IEEE Trans. Neural Networks, Vol. 7, No. 6, pp. 1482-1490, Nov. 1996.

[28] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.

[29] S. Grossberg (ed.), Neural Networks and Natural Intelligence, MIT Press, Cambridge, MA, 1988.

[30] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.

[31] D.E. Goldberg, Genetic Algorithms, Addison-Wesley, Reading, MA, 1989.

[32] S. Haykin, Neural Networks, Macmillan, New York, 1994.

[33] J. Hertz, A. Krogh, and R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.

[34] H. Hemmi, J. Mizoguchi, and K. Shimohara, Development and evolution of hardware behaviors, in Proc. Artificial Life IV, MIT Press, Cambridge, MA, 1994.

[35] T. Higuchi, T. Niwa, T. Tanaka, H. Iba, H. de Garis, and T. Furuya, Evolving hardware with genetic learning: A first step towards building a Darwin machine, in Proc. 2nd Int. Conf. Simulation of Adaptive Behavior, MIT Press, Cambridge, MA, 1993.

[36] T. Higuchi, H. Iba, and B. Manderick, Evolvable hardware, in H. Kitano (ed.), Massively Parallel Artificial Intelligence, MIT Press, Cambridge, MA, 1994.

[37] T. Higuchi et al. (eds.), Proc. First Int. Conf. Evolvable Systems: From Biology to Hardware (ICES96) (Lecture Notes in Computer Science), Springer-Verlag, Berlin, 1997.

[38] J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.

[39] J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Boston, MA, 1992.

[40] J.H. Holland, Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems, in R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Vol. II, pp. 593-623, Morgan Kaufmann, Los Altos, CA, 1986.

[41] O. Holland and M. Snaith, The blind neural network maker: Can we use constrained embryologies to design animat nervous systems?, in Proc. ICANN91, Espoo, Finland, 1991, pp. 1261-1264.

[42] V. Honavar and L. Uhr, Symbolic artificial intelligence, artificial neural networks, and beyond, in S. Goonatilake and S. Khebbal (eds.), Hybrid Intelligent Systems, Wiley, London, 1995.


[43] John R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, 1992.

[44] John R. Koza, Genetic Programming II: Automatic Discovery of Reusable Subprograms, MIT Press, Cambridge, MA, 1994.

[45] F. Kursawe, Evolution strategies for vector optimization, pp. 187-193, National Chiao Tung University, Taipei, 1992.

[46] P. Langley, Elements of Machine Learning, Morgan Kaufmann, Palo Alto, CA, 1995.

[47] P.H.W. Leong and M.A. Jabri, Connection topologies for digital neural networks, in ACNN-91, Sydney, Australia, 1991, pp. 34-37.

[48] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, New York, 1992.

[49] G. Miller et al., Designing neural networks using genetic algorithms, in Proc. 3rd Int. Conf. Genetic Algorithms, 1989, pp. 379-384.

[50] M. Mitchell, An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA, 1996.

[51] R. Nakano, Conventional genetic algorithms for job-shop problems, in Proc. 4th Int. Conf. Genetic Algorithms, 1991, pp. 474-479.

[52] N. Nilsson, Learning Machines, McGraw-Hill, New York, 1965.

[53] C.R. Reeves, A Genetic Algorithm for Flowshop Sequencing, Coventry Polytechnic Working Paper, Coventry, UK, 1993.

[54] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learning internal representations by error propagation, in D.E. Rumelhart and J.L. McClelland (eds.), Parallel Distributed Processing, Vol. 1: Foundations, pp. 318-362, MIT Press, Cambridge, MA, 1986.

[55] M.A. Rudnick, Bibliography: The Intersection of Genetic Search and Artificial Neural Networks, Technical Report CSE-90-001, Department of Computer Science and Engineering, Oregon Graduate Institute, 1990.

[56] E. Sanchez and M. Tomassini (eds.), Towards Evolvable Hardware (Lecture Notes in Computer Science No. 1062), Springer-Verlag, Berlin, 1996.

[57] H.-P. Schwefel, Collective phenomena in evolutionary systems, in 31st Annual Meeting Int. Soc. General Systems Research, Budapest, 1987, pp. 1025-1033.

[58] M. Sipper et al., A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems, IEEE Trans. Evolutionary Computation, Vol. 1, No. 1, pp. 83-97, 1997.

[59] Stefan Bornholdt and Dirk Graudenz, General asymmetric neural networks and structure design by genetic algorithms, Neural Networks, Vol. 5, No. 2, pp. 327-334, 1992. (DESY 91-046, Deutsches Elektronen-Synchrotron, Hamburg, Germany, May 1991.)

[60] S. Nolfi, O. Miglino, and D. Parisi, Phenotypic plasticity in evolving neural networks, in P. Gaussier and J.-D. Nicoud (eds.), Proc. From Perception to Action Conference, IEEE Computer Society Press, pp. 146-157, 1994.

[61] A. Thompson, Evolving electronic robot controllers that exploit hardware resources, in Proc. 3rd European Conf. Artificial Life, Springer-Verlag, Berlin, 1995.

[62] Jari Vaario and Setsuo Ohsuga, Adaptive neural architectures through growth control, in Intelligent Engineering Systems through Artificial Neural Networks, Conference Proceedings, pp. 11-16, 1991.

[63] Mark Watson, Common Lisp Modules: Artificial Intelligence in the Era of Neural Networks and Chaos Theory, Springer-Verlag, Berlin, 1991.

[64] Walter A. Tackett, Genetic programming for feature discovery and image discrimination, in Stephanie Forrest (ed.), Proc. Fifth Int. Conf. Genetic Algorithms, Morgan Kaufmann, Palo Alto, CA, 1993.

[65] S.W. Wilson and D.E. Goldberg, A critical review of classifier systems, in Proc. 3rd Int. Conf. Genetic Algorithms, 1989, pp. 244-255.

[66] T. Yamada and R. Nakano, A genetic algorithm applicable to large-scale job-shop problems, in R. Männer and B. Manderick (eds.), Parallel Problem Solving from Nature 2 (PPSN 2), pp. 281-290, North-Holland, Amsterdam, 1992.