

Genetic Programming and Evolvable Machines, 4, 67–93, 2003. © 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Crossover in Grammatical Evolution

MICHAEL O’NEILL [email protected]
Department of Computer Science & Information Systems, University of Limerick, Ireland

CONOR RYAN [email protected]
Department of Computer Science & Information Systems, University of Limerick, Ireland

MAARTEN KEIJZER [email protected]
Free University, Amsterdam

MIKE CATTOLICO [email protected]
Tiger Mountain Scientific, Inc.

Received April 8, 2002; Revised October 31, 2002

Abstract. We present an investigation into crossover in Grammatical Evolution that begins by examining a biologically-inspired homologous crossover operator that is compared to standard one- and two-point operators. Results demonstrate that this homologous operator is no better than the simpler one-point operator traditionally adopted.

An analysis of the effectiveness of one-point crossover is then conducted by determining the effects of this operator, adopting a headless chicken-type crossover that swaps randomly generated fragments in place of the evolved strings. Experiments show the headless chicken operator to be detrimental to performance.

Finally, the mechanism of crossover in GE is analysed and termed ripple crossover, due to its defining characteristics. An experiment is described where ripple crossover is applied to tree-based genetic programming, and the results show that ripple crossover is more effective in exploring the search space of possible programs than sub-tree crossover, as measured by the rate of premature convergence during the run. Ripple crossover produces populations whose fitness increases gradually over time, slower than, but to an eventually higher level than, that of sub-tree crossover.

Keywords: grammatical evolution, genetic programming, ripple crossover, homologous crossover, headless chicken crossover, sub-tree crossover

1. Introduction

While crossover is generally accepted as an explorative operator in string-based GAs [4], the benefit or otherwise of employing crossover in tree-based Genetic Programming is often disputed. Work such as [2] went as far as to dismiss GP as a biological search method due to its use of trees, while [1] presented results which suggested that crossover in GP can provide little benefit over randomly generating sub-trees in certain cases.

Grammatical Evolution (GE) [13, 15, 19] utilises linear genomes and, as with GP systems, has come under fire for its seemingly destructive crossover operator, a simple one-point crossover inspired by GAs. In this paper we address crossover in GE, seeking answers to the question of how destructive our one-point crossover operator is, and to establish if the system could benefit from a biologically inspired crossover. Some earlier work in this area has been reported in [14] and [16].

By default, GE employs a standard GA variable-length, one-point crossover operator as follows: (i) two crossover points are selected at random, one on each individual; (ii) the segments on the right-hand side of each individual are then swapped.

Other researchers have proposed a number of novel crossover operators [3, 8, 9].
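As a concrete illustration, the default operator can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; genomes are modelled as plain lists of codons.

```python
import random

def one_point_crossover(parent1, parent2, rng=random):
    """Variable-length one-point crossover: a crossover point is chosen
    independently on each parent and the right-hand segments are swapped."""
    pt1 = rng.randint(0, len(parent1))  # (i) one point per individual
    pt2 = rng.randint(0, len(parent2))
    # (ii) swap the right-hand segments; offspring lengths may differ
    # from both parents, since the two points need not coincide.
    child1 = parent1[:pt1] + parent2[pt2:]
    child2 = parent2[:pt2] + parent1[pt1:]
    return child1, child2
```

Because the two points are chosen independently, the total amount of genetic material is conserved across the pair of offspring, but individual genome lengths drift; this is what makes the operator variable-length.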

Langdon and Francone et al. derived different homologous crossover operators, the former on tree-based GP and the latter on linear structures [3, 8]. These homologous crossover operators draw inspiration from the process of recombination that occurs during meiosis (i.e., the production of sex cells) in biological organisms [10]. The principle being exploited is the fact that in nature the entities swapping genetic material only swap fragments that belong to the same position and are of similar size. This, it has been proposed, results in more productive crossover events for GP. Indeed, results from both Langdon and Francone et al. provide evidence in support of this claim.

The homologous crossover operator applied to linear genomes in [3] is called Sticky Crossover, and operates by swapping instructions at the same locus, but makes no attempt to swap functionally equivalent code segments. In the tree-based homologous crossover [9] the crossover point on the first parent is selected as normal in GP. The crossover point on the second parent is determined by taking into account the size of the sub-tree created from the first crossover point, and selecting the sub-tree in the second parent that is closest to it. Closeness is measured by looking at the tree shapes, and at the distance between the two crossover points and the root nodes of the individuals.

A consequence of these conservative crossover operators is a reduction in the bloat phenomenon, which is due, at least in part, to the fact that these new operators are less destructive. Bloat is a phenomenon whereby the sizes of individuals in a GP population increase dramatically over the duration of a run, largely due to redundant code. It has been suggested that destructive crossover events could be responsible for bloat, with redundant code arising as a mechanism to prevent destructive crossover events from harming functionality, by acting as buffering regions in which crossover can occur [12]. The production of increasingly longer genomes then becomes unnecessary with the adoption of a homologous crossover.

The paper is structured as follows. A novel biologically-inspired, homologous crossover operator for GE is presented in Section 2. Following on from the disappointing results for the homologous operator experiments, we conduct an analysis of the effectiveness of the simple one-point crossover in Section 3, by switching the operator off, and by using a headless chicken crossover. This set of experiments is designed to test the hypothesis that the one-point operator is acting in a productive fashion with the exchange of useful building blocks. Finally, the mechanism of crossover in GE is investigated in Section 4. This part of the investigation provides insights into the type of fragments exchanged and the manner in which these fragments are exchanged; consequently, the one-point operator is dubbed ripple crossover due to its effects on the derivation trees. Investigations are then conducted with the application of ripple crossover to a tree-based GP system, where it is compared to a standard sub-tree crossover operator.

2. Homologous crossover

This section proposes a new form of operator inspired by molecular biology and the novel homologous crossover operators designed for GP. We compare the standard GE one-point crossover to two different versions of this homologous crossover, as well as two alternative forms of a two-point crossover operator.

The standard GE homologous crossover, an illustration of which can be seen in Figure 1, proceeds as follows:

1. During the mapping process a history of the rules selected in the grammar is stored for each individual.
2. The histories of the two individuals to crossover are aligned.
3. Each history is read sequentially from the left while the rules selected are identical for both individuals; this region of similarity is noted.
4. The first two crossover points are selected to be at the boundary of the region of similarity; these points are the same on both individuals.
5. The two second crossover points are then selected randomly from the regions of dissimilarity.
6. A two-point crossover is then performed.
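In outline, steps 1–6 might look as follows in Python. This is a sketch under assumptions: codons are lists of integers, and each individual's mapping history is available as a list of rule choices aligned with its codons (the representation here is hypothetical).

```python
import random

def homologous_crossover(p1, hist1, p2, hist2, rng=random):
    """Two-point crossover whose first point is fixed at the boundary of
    the region of similarity between the two mapping histories."""
    # Steps 2-3: read the aligned histories from the left while the rule
    # choices agree; the common prefix is the region of similarity.
    limit = min(len(hist1), len(hist2), len(p1), len(p2))
    sim = 0
    while sim < limit and hist1[sim] == hist2[sim]:
        sim += 1
    # Step 4: the first crossover point sits at the boundary of
    # similarity, at the same locus on both parents.
    # Step 5: the second points are drawn at random from the regions of
    # dissimilarity (for the "same size" variant, use end2 = end1).
    end1 = rng.randint(sim, len(p1))
    end2 = rng.randint(sim, len(p2))
    # Step 6: a standard two-point exchange of the selected segments.
    c1 = p1[:sim] + p2[sim:end2] + p1[end1:]
    c2 = p2[:sim] + p1[sim:end1] + p2[end2:]
    return c1, c2
```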

The reasoning behind this operator is the facilitation of the recombination of blocks that are in context with respect to the current state of the mapping process. In this case these blocks are of differing lengths.

The second form of the homologous crossover operator differs only in that it swaps blocks of identical size. In step 5, the two second crossover points are at the same locus on each individual.

The two-point crossover operators employed are the standard GA two-point operator (fragments of unequal size), and one in which the sizes of the fragments being swapped are the same.

2.1. Experimental approach

For each type of crossover, 100 runs were carried out on the Santa Fe trail and symbolic regression problems. Performance of each operator was measured in terms of each of the following:

1. Cumulative frequency of success
2. Average size of fragments being swapped at each generation
3. Ratio of the average fragment size being swapped to the average genome length
4. Ratio of crossover events resulting in successful propagation of the individual to the next generation to the total number of crossover events


Figure 1. Depicted is the homologous crossover of GE. (i) Shows two parents represented as their codon integer values on top, and the corresponding rules selected during the mapping process below each integer value. (ii) The rule strings (mapping histories) are aligned, and the region of similarity noted (underlined). The first crossover points are selected at this boundary. (iii) The second crossover points are then selected after the boundary of similarity for each individual.

Measures (2) and (3) have been used previously in [18] to measure the amount of genetic material exchanged during crossover. Measure (4) is used to determine the productiveness of the crossover operator by looking at the number of individuals that are propagated to the next generation after having undergone crossover. Tableaus describing the parameters and terminals are given in Tables 1 and 2. The grammars used for each problem are given below.

Symbolic regression

<expr> ::= <expr> <op> <expr> | ( <expr> <op> <expr> ) | <pre-op> ( <expr> ) | <var>
<op> ::= + | - | / | *
<pre-op> ::= Sin | Cos | Exp | Log
<var> ::= X | 1.0


Table 1. Symbolic regression tableau

Objective: Find a function of one independent variable and one dependent variable, in symbolic form, that fits a given sample of 20 (xi, yi) data points, where the target function is the quartic polynomial X^4 + X^3 + X^2 + X

Terminal Operands: X (the independent variable), 1.0

Terminal Operators: The binary operators +, *, /, and -; the unary operators Sin, Cos, Exp and Log

Fitness cases: A sample of 20 data points in the interval [-1, +1], i.e., {-1, -.9, -.8, -.76, -.72, -.68, -.64, -.4, -.2, 0, .2, .4, .63, .72, .81, .90, .93, .96, .99, 1}

Raw Fitness: The sum, taken over the 20 fitness cases, of the error

Standardised Fitness: Same as raw fitness

Wrapper: Standard productions to generate C functions

Parameters: Population Size = 500, Generations = 20, Prob. Mutation = 0.01, Prob. Crossover = 0.9, Prob. Duplication = 0.01, Steady State

Santa Fe Trail

<code> ::= <line> | <code><line>
<line> ::= <if-statement> | <op>
<if-statement> ::= if(food_ahead()) {<line>} else {<line>}
<op> ::= left(); | right(); | move();

Table 2. Tableau for the Santa Fe Trail

Objective: Find a computer program to control an artificial ant so that it can find all 89 pieces of food located on the Santa Fe Trail

Terminal Operators: left(), right(), move(), food_ahead()

Terminal Operands: None

Fitness cases: One fitness case

Raw Fitness: Number of pieces of food found before the ant times out with 615 operations

Standardised Fitness: Total number of pieces of food less the raw fitness

Wrapper: None

Parameters: Population Size = 500, Generations = 20, Prob. Mutation = 0.01, Prob. Crossover = 0.9, Prob. Duplication = 0.01, Steady State


2.2. Results

As can be seen in Figure 2, the cumulative frequencies of success clearly show that standard one- and two-point crossover are superior to the other operators on both problem domains. We will now describe the results for each operator under the other measures described in the previous section.

Figure 3 shows the average fragment size being swapped at each generation with homologous crossover. Data is presented for 20 separate runs and plotted in this manner because we are interested in general trends over the set of runs as opposed to the precise details for individual runs in each of these graphs. Overall the fragment size increases as each generation passes, although the lengths of the chromosomes also increase. As such, it is difficult to see what is happening to the fragment size. A more useful measure, the ratio of average fragment size to the average chromosome length, can be seen in Figure 4. Similar graphs for same size homologous, two-point, same size two-point, and one-point can be seen in Figures 16, 17, 18, and 19 respectively, in Appendix A.

Figure 5 shows the ratio of individuals undergoing crossover that have been successfully propagated to the next generation to the total number of crossover events that have occurred over the 20 runs. The results of this measure for all other operators can be seen in Figures 20, 21, 22, and 23 in Appendix B. For both homologous operators there is no obvious trend to the data; indeed, the transmission of individuals to the next generation as a result of homologous crossover would appear to be erratic. Both of the two-point operators and the one-point crossover each have clearer trends. The two-point operator does appear to be less successful on this measure. On the Santa Fe trail, propagation is more erratic than on the symbolic regression problem. We propose this is a result of the dependency on the use of the wrapping operator, which would have the effect of making crossover more disruptive [11, 13].

Figure 2. Comparison of the cumulative frequencies of success for each crossover operator on the Santa Fe ant trail problem (left), and on the symbolic regression problem (right).

Figure 3. Average fragment size being swapped each generation for homologous crossover, shown for each of the 20 runs.

2.3. Discussion

Examining Figures 6 and 7, we can see the average over 20 runs of the ratio of individuals undergoing crossover that are propagated to the next generation to the total number of crossover events for each generation, and the ratio of the average crossover fragment size to the average chromosome length, respectively. In terms of the ratio of individuals being propagated to the next generation having undergone crossover, we can see that for one-point crossover in the case of the symbolic regression problem the rate of transfer remains relatively constant throughout the run, around the value 0.35. A similar trend can be seen for one-point crossover in the case of the Santa Fe trail problem, although there is a slight deterioration as runs progress. Looking at the ratio of individuals transferred to the total number of crossover events for individual runs (Figures 5, 20, 21, 22 and 23), we can see in all cases that in the first few generations this value is extremely high. These results for homologous crossover are not as consistent as those for one-point, and, directly compared, appear erratic. They show that the effort required to carry out our version of homologous crossover, and the occasional peak it achieves in terms of individual transfer to each generation, are outweighed by the consistent results produced by the much simpler one-point operator.

In general, though, we can see the utility of one-point crossover by virtue of the fact that it produces individuals that are capable of being propagated to successive generations, given the steady state replacement strategy.

Looking at the ratio of the average crossover fragment size to the average chromosome length (Figure 7), in the case of one-point crossover we get a relatively smooth line around the 0.5 mark for both problems. Similar trends are observed for both types of two-point crossover; however, these are localised to lower values. It can also be seen that the homologous operators are more erratic, although on symbolic regression they exhibit a consistently higher performance than all other operators. The experimental evidence shows us that crossover results in individuals that are being propagated to the next generation, and that the size ratio of these fragments to the actual genome length is consistent throughout the runs, with a value close to 50%. This is in contrast to the results obtained in [18], which showed a drop-off for all crossover operators except, naturally, the uniform operator. Their results showed how the operators examined start off as global search operators but change rapidly to local search operators. In this case the one-point operator consistently exchanges a large proportion of an individual's chromosome over the course of an entire run, thus acting in a more global fashion.

In light of the data obtained, namely the transfer of individuals having undergone crossover to the next generation, it is reasonable to suspect that within GE individuals there exist useful building blocks that are being recombined to produce better performing individuals. With respect to the homologous operator described in [8] for tree-based GP, a lot of effort is required to carry out this operation, whereas in GE, we get a simple, efficient crossover operator with less effort, that exchanges half the material on average.

Figure 4. For each of the 20 runs, the ratio of the average fragment size being swapped to the average chromosome length at each generation for homologous crossover is shown; notice the different minimum values.

Figure 5. Ratio of the number of individuals undergoing homologous crossover that have been propagated to the next generation to the total number of crossover events occurring in that generation, over the 20 runs.

3. Headless chicken crossover

We now continue the analysis of crossover in GE by conducting a set of experiments with the objective of testing the hypothesis that crossover is exchanging useful blocks, as suggested by the results found in the previous section. To this end we use two main strategies: firstly, turning off crossover, and secondly, exchanging random blocks in a headless chicken-type crossover [1]. If we observe a decrease in performance in the absence of crossover, this would suggest that this operator is providing some useful search; and if the exchange of randomly generated blocks produces an inferior performance to our standard operator, this would provide evidence to support the claim that useful building blocks are being exchanged.

Two forms of headless chicken crossover were described for trees, with strong headless chicken crossover (SHCC) being the most similar to the one adopted here. SHCC operated by creating a random tree for each of the parents selected for crossover, followed by standard sub-tree crossover in GP, with the modified parent tree (as opposed to the modified randomly generated tree) being returned. The other form of headless chicken crossover, weak headless chicken crossover (WHCC), returned randomly either the modified parent or the modified random tree.

Figure 6. Ratio of the number of individuals undergoing crossover that have been propagated to the next generation to the total number of crossover events occurring in that generation, averaged over 20 runs.

3.1. Experimental approach

The headless chicken operator we adopt selects the fragments to crossover, and replaces them with randomly generated bit strings of the same lengths.
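A sketch of this operator, illustrative only; here genomes are modelled as lists of integer codons in an assumed range rather than raw bit strings:

```python
import random

def headless_chicken_crossover(parent1, parent2, max_codon=255, rng=random):
    """Select crossover fragments as one-point crossover would, but
    replace the incoming fragments with randomly generated material of
    the same lengths, so no genetic material is actually exchanged."""
    pt1 = rng.randint(0, len(parent1))
    pt2 = rng.randint(0, len(parent2))
    # Random material stands in for the fragment the mate would donate.
    rand1 = [rng.randint(0, max_codon) for _ in parent2[pt2:]]
    rand2 = [rng.randint(0, max_codon) for _ in parent1[pt1:]]
    return parent1[:pt1] + rand1, parent2[:pt2] + rand2
```

If the standard operator outperforms this one, the difference can be attributed to the exchanged material itself rather than to the structural disruption that crossover causes.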


Figure 7. Ratio of the average fragment size being swapped to the average chromosome length at each generation, averaged over 20 runs.

For each experiment, 50 runs were carried out on the Santa Fe ant trail and the symbolic regression problem. Performance was ascertained by the cumulative frequency of success.

3.2. Results

Results for the experiments can be seen in Figure 8. These graphs clearly demonstrate the damaging effects of the headless chicken crossover, and of the case when crossover is switched off. On the symbolic regression problem GE fails to find solutions in both of these cases, while on the Santa Fe ant trail problem the system's success rate falls off dramatically.

These results clearly demonstrate the power of GE's one-point crossover as an operator that successfully exploits an exchange of useful blocks on the problems examined. It also demonstrates that the one-point crossover operator is essential to the effective operation of the system.

Experiments were also conducted where the mutation operator was turned off in the cases of both the one-point and the homologous crossover operators. The results, see Figure 8, demonstrate that, when mutation is turned off in the presence of one-point crossover, there is a decrease in performance when compared to the case where mutation is present. These observations apply to both problem domains examined and, as such, suggest that mutation plays a beneficial role alongside the successful one-point crossover operator.

Figure 8. A comparison of GE's performance on the Santa Fe ant trail can be seen on the left. The graph clearly demonstrates the damaging effects of the headless chicken crossover, and of the case when crossover is switched off. A comparison of GE's performance on the symbolic regression problem can be seen on the right. When the headless chicken crossover is used the system fails to find solutions, as is also the case when crossover is switched off.

4. Ripple crossover

The question arises then as to why GE's one-point crossover operator is so productive. If we look at the effect the operator has on a parse tree representation of the programs undergoing crossover, we begin to see more clearly the mechanism of this operator and its search properties.

When mapping a string to an individual, GE always works with the leftmost non-terminal. Thus, if one were to look at the individual's corresponding parse tree, one would see that the tree is constructed in a pre-order fashion. Furthermore, if the individual is over-specified, that is, has codons left over, they form a tail, which is effectively a stack of codons, as illustrated in Figure 9.

If, during a crossover event, one tried to map the first half of the remaining strings, the result, not surprisingly, would usually be an incomplete tree. However, the tree would not be incomplete in the same manner as one taken from the middle of a GP crossover event.
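The mapping just described can be sketched as follows, using the example grammar and codon string of Figure 9. This is a minimal illustrative mapper (`ge_map` is a hypothetical helper, and wrapping is omitted):

```python
# Grammar from the Figure 9 example:
#   <E> ::= (+ <E> <E>) | (- <E> <E>) | (* <E> <E>) | (% <E> <E>) | X | Y
RULES = [["(+", "E", "E", ")"], ["(-", "E", "E", ")"],
         ["(*", "E", "E", ")"], ["(%", "E", "E", ")"], ["X"], ["Y"]]

def ge_map(codons):
    """Expand the leftmost non-terminal at each step (building the parse
    tree in pre-order), choosing productions with the modulo rule.
    Returns the phenotype plus the unused tail of codons."""
    expr, used = ["E"], 0
    while "E" in expr and used < len(codons):
        i = expr.index("E")                              # leftmost non-terminal
        expr[i:i + 1] = RULES[codons[used] % len(RULES)] # modulo rule
        used += 1
    return " ".join(expr), codons[used:]

phenotype, tail = ge_map([8, 6, 4, 5, 9, 4, 5, 2, 0, 5, 2, 2])
print(phenotype)  # (* (+ X Y ) (% X Y ) )
print(tail)       # [2, 0, 5, 2, 2] -- the over-specified codons, the tail
```

Note how the over-specified codons are simply left on the end as a stack-like tail, exactly the material that one-point crossover exchanges.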


<E> ::= (+ <E> <E>) | (- <E> <E>) | (* <E> <E>) | (% <E> <E>) | X | Y

Figure 9. The ripple effect of one-point crossover illustrated using an example GE individual represented as a string of codon integer values (b) and its equivalent derivation (c) and parse trees (d). The codon integer values in (b) represent the rule number to be selected from the grammar outlined in (a) and shown above, with the part shaded gray corresponding to the values used to produce the trees in (c) and (d); the remaining integers are an intron. Figure 10 shows the resulting spine with ripple sites and tails.

The pre-order nature of the mapping is such that the result is similar to that of Figures 9 and 10. That is, the tree is left with a spine and several ripple sites from which one or more sub-trees, dubbed ripple trees, are removed. This crossover behaviour, which is an inherent property of GE, was first described in [5], where it was termed ripple crossover.

Each of the ripple trees is effectively dismantled and returned to the stack of codons in the individual's tail. Crossover then involves individuals swapping tails so that, when evaluating the offspring, the ripple sites on the spine will be filled using codons from the other parent.

There is no guarantee that the tail from the other parent will be of the same length, or even that it is used in a similar place on the other spine. This means that a codon that represented which choice to make could suddenly be expected to make a choice from a completely different non-terminal, possibly with a different number of choices. Fortunately, GE evaluates codons in context; that is, the exact meaning of a codon is determined by those codons that immediately precede it. Thus, we can say that GE codons have intrinsic polymorphism, as they can be used in any part of the grammar; furthermore, if the meaning of one codon changes, the change cascades, or "ripples", through all the rest of the codons. This means that a group of codons that coded a particular sub-tree on one spine can code an entirely different sub-tree when employed by another spine, or can go back to what it meant in its original context. The power of intrinsic polymorphism can even reach between the ripple trees, in that if one no longer needs all its codons, they are passed to the next ripple tree and, conversely, if it requires more codons, it can obtain them from its neighbouring ripple tree.

The remainder of the paper will focus on ripple crossover, its interpretation and utility in a tree-based system, and its comparison to a standard sub-tree crossover. In order to apply ripple crossover to a tree-based system we must specify a grammar for parse trees.

Figure 10. Illustrated are the spine and the resulting ripple sites (a) and tails (b)(c) produced as a consequence of the one-point crossover in Figure 9.
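Intrinsic polymorphism is easy to demonstrate with a small grammar in which the non-terminals have different numbers of productions. The grammar and codon values below are illustrative only, not taken from the experiments:

```python
# Toy grammar with two non-terminals of different rule counts.
GRAMMAR = {"E": [["E", "O", "E"], ["x"], ["y"]],   # 3 choices
           "O": [["+"], ["-"], ["*"], ["/"]]}      # 4 choices

def map_genome(codons, start="E"):
    """Minimal GE-style mapper: expand the leftmost non-terminal,
    choosing its production with the modulo rule."""
    expr, used = [start], 0
    while used < len(codons):
        nts = [i for i, s in enumerate(expr) if s in GRAMMAR]
        if not nts:
            break  # fully mapped; remaining codons are unused
        rules = GRAMMAR[expr[nts[0]]]
        expr[nts[0]:nts[0] + 1] = rules[codons[used] % len(rules)]
        used += 1
    return " ".join(expr)

tail = [3, 1, 2, 1]
# Alone, the leading codon 3 meets <E> (3 rules): 3 % 3 = 0, an expansion.
print(map_genome(tail))            # x * x
# After a different spine, the same codon 3 meets <O> (4 rules):
# 3 % 4 = 3, the "/" operator; the change ripples through what follows.
print(map_genome([0, 1] + tail))   # x / x
```

The same leading codon, 3, selects a structural expansion when it meets `E` (three choices) but the division operator when it meets `O` (four choices), and every codon after it is reinterpreted accordingly; this cascade is the "ripple".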

4.1. Closed vs. context free grammars

The term Closed Grammar is used here to denote the type of grammar normally employed by GP practitioners. Although many GP users may be surprised to have it claimed that they have actually been using grammars, rather than simple sets of functions and terminals, this is indeed the case. A function and terminal set implicitly describes a grammar; one indicates the arity of the functions and, as every function can take every terminal as well as the output of every function, it is simply a matter of adhering to the arity demands of every function to produce legal programs.

The arity of the functions could easily be described by a CFG. Consider the GP function and terminal set

F = {+, *, -, %}
T = {x, y}


where each of the four functions has an arity of two. This can equivalently be expressed in the Context Free Grammar (CFG):

<E> ::= x | y | (+ <E> <E>) | (* <E> <E>) | (- <E> <E>) | (% <E> <E>)

where <E> denotes the start symbol.

This kind of CFG differs from the standard type only in that there is a single non-terminal node. The use of a single non-terminal node implies that the grammar satisfies the closure property desirable for GP's crossover operator. Notice that the use of more than one non-terminal does not preclude a grammar from being closed. In this case, the question of closure can only be resolved by determining if the grammar can be rewritten as an equivalent grammar with a single non-terminal.

If one were to construct a derivation tree for any expression made up from this set, then clearly, any sub-tree from this grammar can replace a non-terminal, regardless of its position in the derivation tree. Standard, untyped, GP exploits this fact, although it uses parse trees rather than derivation trees, an entirely reasonable approach given that there is but a single non-terminal available to the grammar.

Figure 11 shows a parse tree constructed from the above function and terminal set, together with the equivalent derivation tree constructed from the context free grammar also given above. Because this context free grammar uses a prefix notation, the terminals of the derivation tree in Figure 11 form a prefix representation of the parse tree. The prefix ordering can be used as a memory-efficient implementation of a parse tree [6].

The connection between a prefix encoding of a parse tree and a string of rule choices in a context free grammar such as maintained by GE is now obvious. In the prefix encoding, every element is a reference to either a function or terminal. In GE, as this is a closed grammar, every element in the string denotes a choice in the set of rules that are associated with the same symbol. Thus, the prefix string:

* + x y % x y

has a one-to-one correspondence with the string of choices:

3 2 0 1 5 0 1


Figure 11. A parse tree and its derivation tree. Note the numbering of legal crossover points.
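To make the correspondence concrete, the expansion of a choice string into its prefix string can be sketched as follows (a hypothetical decoder, not the authors' implementation; rule indices follow the order in which the productions appear in the grammar above):

```python
# Rule choices for the closed grammar above, indexed 0..5:
#   0: x   1: y   2: (+ E E)   3: (* E E)   4: (- E E)   5: (% E E)
RULES = ["x", "y", "+", "*", "-", "%"]
ARITY = [0, 0, 2, 2, 2, 2]

def decode(choices):
    """Expand the start symbol depth-first, consuming one choice per <E>."""
    it = iter(choices)
    def expand():
        c = next(it)
        out = [RULES[c]]
        for _ in range(ARITY[c]):   # each binary function demands two more <E>s
            out += expand()
        return out
    return expand()

print(decode([3, 2, 0, 1, 5, 0, 1]))  # → ['*', '+', 'x', 'y', '%', 'x', 'y']
```

Each choice is consumed in pre-order, which is exactly why the flat string of choices and the prefix form of the parse tree are in one-to-one correspondence.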


in the context of the grammar above. However, GE does not maintain a string of choices, but a string of integers, typically bounded above by a number much larger than the maximum number of rules. The decoding from an integer to a choice is usually carried out using the modulo rule. Because of this redundant encoding there is a one-to-many mapping from a prefix encoding to the integer encoding used by GE.

If one were to introduce more non-terminals into the grammar, sub-tree crossover would have to be constrained to ensure that the result of crossover will be a legal derivation tree. This can be done by employing the type information present in the derivation tree. However, when the number of types in the grammar grows, it can be expected that there will be a limited number of instances of each type in a tree. As sub-tree crossover is usually constrained to swap the same types, it may very well prevent the efficient exploration of the space of possible trees. In GE, this is not an issue, as an integer is decoded into a rule at runtime, i.e., decoded in the context of the symbol that is derived at the particular point in the derivation. This property of GE, to change its form in the context of a different symbol, we refer to as intrinsic polymorphism. Figure 12 gives an example of polymorphism in a context free grammar.
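Both the modulo rule and intrinsic polymorphism can be illustrated with a small sketch (hypothetical code, using the three-symbol grammar of Figure 12; this is not the authors' implementation):

```python
GRAMMAR = {
    "<Var>": [["x"], ["y"]],
    "<Opr>": [["+"], ["*"], ["-"], ["%"]],
    "<Exp>": [["<Var>"], ["<Exp>", "<Opr>", "<Exp>"]],
}

def derive(symbol, codons):
    """Depth-first GE derivation; consumes one codon per choice point.
    Running out of codons raises IndexError (in GE such an individual
    would be assigned the worst fitness)."""
    codons = list(codons)
    out = []
    def expand(sym):
        if sym not in GRAMMAR:                  # terminal symbol
            out.append(sym)
            return
        rules = GRAMMAR[sym]
        choice = codons.pop(0) % len(rules)     # the modulo rule
        for s in rules[choice]:
            expand(s)
    expand(symbol)
    return " ".join(out)

# The same codon string decodes differently depending on the symbol
# it is grafted onto:
print(derive("<Var>", [1, 0, 0, 2, 0, 1]))  # y
print(derive("<Opr>", [1, 0, 0, 2, 0, 1]))  # *
print(derive("<Exp>", [1, 0, 0, 2, 0, 1]))  # x - y
```

Because of the modulo rule the encoding is also redundant: for `<Opr>` the codon 5 selects the same production as the codon 1, since 5 % 4 == 1.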

4.2. Experimental approach

A number of experiments are conducted in order to compare the performance of ripple crossover and traditional sub-tree crossover with two different representation schemes, i.e., grammars. These experiments show that when using standard GP function and terminal sets, and the closure property they enjoy, ripple crossover appears to be less likely to get trapped in a local optimum than sub-tree crossover.

It is argued that the property of ripple crossover of transmitting, on average, half of the genetic material of each parent is the main cause of this. While sub-tree crossover exchanges less and less genetic material as the run progresses, ripple crossover is equally recombinative regardless of the size of the individuals involved.

Experiments were performed on two common benchmark problems: the simple symbolic regression problem and the Santa Fe trail problem.
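In codon terms, ripple crossover is simply one-point crossover on variable-length integer strings. A minimal sketch (hypothetical code, assuming a cut point is chosen independently in each parent):

```python
import random

def ripple_crossover(mum, dad, rng=random):
    """One-point crossover on codon strings. Because every codon after the
    cut is re-interpreted in the context of the other parent's spine, the
    swapped tail 'ripples' through the remainder of the derivation."""
    i = rng.randrange(1, len(mum))   # cut point in the first parent
    j = rng.randrange(1, len(dad))   # independent cut point in the second
    return mum[:i] + dad[j:], dad[:j] + mum[i:]

mum, dad = [9, 8, 7, 6, 5], [1, 2, 3, 4]
child1, child2 = ripple_crossover(mum, dad)
# every codon of both parents survives, redistributed over the children
assert sorted(child1 + child2) == sorted(mum + dad)
```

Note that the operator itself never looks at the grammar; all of its tree-level effects come from the re-interpretation of the swapped codons during decoding.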

<Var> ::= x | y
<Opr> ::= + | * | - | %
<Exp> ::= <Var> | <Exp> <Opr> <Exp>

Figure 12. Intrinsic polymorphism: the same string of numbers (here 1 0 0 2 0 1) can decode to different choices, depending on the symbol that they are being grafted onto.


To isolate the effects of crossover within these experiments, all experiments are performed using only crossover, and employ the same initialization procedure. All results have been obtained on 100 independent runs.

The runs using sub-tree crossover were performed using the derivation tree and associated type information; ripple crossover was performed using a codon-based implementation.

The initialization procedure consists of a random walk through the grammar, i.e., making random choices at each choice point. Individuals are initialized by extracting either the constructed derivation tree (for sub-tree crossover) or the sequence of choices (for ripple crossover). This is analogous to GP and GE, respectively. Because such a random walk has a strong tendency to produce short individuals multiple times, a simple occurrence check is implemented that re-creates an individual when it is already present in the population.

The sub-tree crossover used in the experiments was implemented in its purest form: no bias was set to select terminals less frequently than non-terminals [7], nor, in the case of crossover on context free grammars, were a priori probabilities specified to select certain symbols more often than others [20].

The ripple crossover used was also simple. If, during the decoding process, the generative string runs out of genetic material, the individual is killed (i.e., gets the worst fitness). No attempt was made to initialize the tail of the individual, and no wrapping was used.

For the symbolic regression problem, two grammars are used: the closed

<E> ::= x | (+ <E> <E>) | (* <E> <E>) | (- <E> <E>) | (/ <E> <E>)

And the context free grammar:

<Exp> ::= <Var> | <Exp> <Op> <Exp>
<Var> ::= x
<Op> ::= + | * | - | /

Note that the division operator is not protected: division by zero results in a runtime error and the individual will get the worst fitness available.1 Further details are provided in Table 3 and Table 4.

The Santa Fe trail problem used the following closed grammar:

<E> ::= move() | left() | right() | iffoodahead(<E>, <E>) | prog2(<E>, <E>)

And the context free grammar:

<Code> ::= <Line> | prog2(<Line>, <Code>)
<Line> ::= <Condition> | <Action>


Table 3. Symbolic regression tableau

Objective              See Table 1
Terminal Operands      X (the independent variable)
Terminal Operators     The binary operators +, *, /, and -
Fitness cases          20 equally spaced data points in the interval [-1, +1]
Raw Fitness            Root Mean Squared Error
Standardised Fitness   Same as raw fitness
Parameters             Population Size = 500, Generations = 50,
                       Prob. Mutation = 0.0, Prob. Crossover = 1.0,
                       Prob. Duplication = 0.0, Steady State

<Action> ::= move() | right() | left()
<Condition> ::= iffoodahead(<Code>, <Code>)

where the function prog2 executes the commands in sequence, and iffoodahead checks whether there is food in front of the artificial ant and executes either the first or the second argument depending on the result. The move function moves the ant forward, and left and right rotate the ant 90 degrees in the specified direction.
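The semantics of these primitives can be sketched with a toy ant on a small toroidal grid (illustrative code only; the grid size, coordinates, food layout, and the explicit `ant` parameter are invented for the example and are not the Santa Fe trail setup):

```python
class Ant:
    """Toy ant on a toroidal grid, for illustrating the primitive semantics."""
    DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # four 90-degree headings

    def __init__(self, food, size=32):
        self.x = self.y = self.d = 0
        self.food, self.size, self.eaten = set(food), size, 0

    def _ahead(self):
        dx, dy = Ant.DIRS[self.d]
        return ((self.x + dx) % self.size, (self.y + dy) % self.size)

    def food_ahead(self):
        return self._ahead() in self.food

    def move(self):
        self.x, self.y = self._ahead()
        if (self.x, self.y) in self.food:       # eat any food on the new square
            self.food.discard((self.x, self.y))
            self.eaten += 1

    def left(self):
        self.d = (self.d - 1) % 4

    def right(self):
        self.d = (self.d + 1) % 4


def prog2(a, b):                 # execute two sub-programs in sequence
    a()
    b()

def iffoodahead(ant, then, otherwise):
    (then if ant.food_ahead() else otherwise)()


ant = Ant(food={(0, 1), (0, 2)})
iffoodahead(ant, ant.move, ant.left)   # food ahead, so the ant moves and eats
prog2(ant.move, ant.left)              # eats the second piece, then turns
assert ant.eaten == 2
```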

4.3. Results

Figure 13 shows the success rates for the four different configurations on the symbolic regression problem. Although the setups employing ripple crossover both obtain a 100% success rate, it cannot be concluded that for this problem ripple crossover performs significantly better than sub-tree crossover on the closed grammar. These three do, however, perform significantly better than sub-tree crossover on the context free grammar.2 Failure rates of the ripple crossover on the closed grammar were on average 12% at the end of the run (against 0% on the context free grammar). This did not seem to impede the performance.

Table 4. Tableau for the Santa Fe Trail

Objective              See Table 2
Terminal Operators     left(), right(), move(), iffoodahead()
Terminal Operands      None
Fitness cases          One fitness case
Raw Fitness            Number of pieces of food found in 600 time steps
Standardised Fitness   Total number of pieces of food less the raw fitness
Wrapper                None
Parameters             Population Size = 500, Generations = 50,
                       Prob. Mutation = 0.0, Prob. Crossover = 1.0,
                       Prob. Duplication = 0.0, Steady State



Figure 13. Success rates on the symbolic regression problem, averaged over 100 runs.

For the top three contenders the symbolic regression problem is easy to solve, so the question remains why sub-tree crossover on the context free grammar performs so poorly. It may be explained by the fact that the Var type in this grammar makes up a large part of any tree. Unlike with the closed grammar, sub-tree crossover on the context free grammar is constrained to swap like with like, thus always swapping a variable with a variable. However, the fact that there is only one variable in the problem definition results in a large number of crossovers producing identical trees. Although this can be circumvented by avoiding crossing over on the Var type, it does beg the question of how much the user of such a system must know about the intricate relationship between the grammar, the derivation trees and the genetic operators in order to set up the system to get good results reliably.

For the Santa Fe trail problem, the success rates are depicted in Figure 14. Here the runs on the context free grammar perform significantly better than their counterparts on the closed grammar. The success rates of ripple crossover and sub-tree crossover on the context free grammar are, however, not significantly different. Similarly to before, the failure rate was on average 12% on the closed grammar and 0% on the context free grammar.

It is important to note that while sub-tree crossover seems to converge before generation 20, the ripple crossover runs keep on improving.

To investigate whether ripple crossover does indeed help the search to continue to improve, a new set of 100 runs was executed, this time for 200 generations. Figure 15 clearly depicts the capability of ripple crossover to keep on improving over time. An extended run up to 500 generations (not depicted here) showed that ripple



Figure 14. Success rates on the Santa Fe trail problem, averaged over 100 runs.


Figure 15. Success rates on the Santa Fe trail problem, averaged over 100 runs, each running for 200 generations.


crossover approaches a success rate of 70%, which is almost twice the success rate achieved by sub-tree crossover.

4.4. Discussion

The results bring up some important issues. While it is difficult to identify clearly which crossover method is best, each appears to have its own particular strength. As suggested by an initially steep curve, sub-tree crossover is particularly adept at obtaining solutions very early in a run. However, in all experiments, performance soon plateaus, with only the occasional increase in performance. This finding is in keeping with [17], in which it was suggested that GP performs a global search early on in a run, before gradually changing to a more local search as the run progresses and the population becomes characterised by large and often bloated individuals of similar if not identical fitness.

Ripple crossover, on the other hand, performs a more global search throughout a run, and is far less likely to become trapped at a local optimum. Indeed, for the symbolic regression problem, it never got trapped, while in the case of the Santa Fe trail experiments, fitness kept improving. This is because, regardless of how large individuals get, on average half the genetic material is exchanged during each crossover. It is this disruptive behaviour of the crossover operator that drives the population on to continually higher areas in the fitness landscape, but, ironically, it is also the cause of the relatively slow performance at the start of a run.

This suggests that the use of ripple crossover will permit longer runs, with less chance of premature convergence, due to the property of exchanging on average 50% of an individual. Such a property could be extremely valuable when one tackles more difficult problems that require more time to produce an optimal solution.

5. Conclusions

This paper began with an investigation into a new homologous crossover operator designed for GE, only to discover that it was no better than the standard one-point crossover originally adopted.

Further analysis, using a headless chicken-type operator, and by running the system with crossover switched off, revealed the power this one-point operator brought to GE.

An investigation into the mechanism of crossover in GE was conducted, and standard GE crossover was found to have a ripple effect when analysed in terms of derivation trees: instead of the traditional single sub-tree type of crossover normally associated with Genetic Programming type systems, a number of sub-trees are removed and replaced with genetic material from the tail of the corresponding parent.

In the problems examined here, it is clear that populations which employ ripple crossover exhibit a slower rate of increase in fitness at the start of a run relative


to sub-tree crossover. However, progress keeps improving steadily for more generations, and hence ends up at a higher level of fitness. This was taken to be an indication that the global nature of ripple crossover makes it less susceptible to getting trapped in a local optimum.

Clearly, in accordance with the NFL theorem [21], no sweeping generalisations can be made, but it does appear that on these types of problems ripple crossover has some benefits. Moreover, the value of linear chromosomes in general, and the GE system in particular, is clear from the results. Ripple crossover occurs effectively for free in a linear system, because of the pre-order nature of tree construction, and results in, on average, 50% of the material being exchanged during a crossover event. Furthermore, the phenomenon of intrinsic polymorphism demonstrates the utility of context-sensitive genes (groups of codons), that is, genes that can change their behaviour depending on the manner in which they are used. Rather elegantly, although the genes are polymorphic, they will always return to their initial state if used in the same manner again.

Appendix

A. Ratio of average fragment size being swapped to chromosome length

Figure 16. Ratio of the average fragment size being swapped to the average chromosome length at each generation for same size homologous crossover.


Figure 17. Ratio of the average fragment size being swapped to the average chromosome length at each generation for two-point crossover.

Figure 18. Ratio of the average fragment size being swapped to the average chromosome length at each generation for same size two-point crossover.


Figure 19. Ratio of the average fragment size being swapped to the average chromosome length at each generation for one-point crossover.

B. Ratio of individuals propagated to next generation after undergoing crossover to the total number of crossover events

Figure 20. Ratio of the number of individuals undergoing same size homologous crossover propagated to the next generation to the total number of crossover events occurring in that generation.


Figure 21. Ratio of the number of individuals undergoing two-point crossover propagated to the next generation to the total number of crossover events occurring in that generation.

Figure 22. Ratio of the number of individuals undergoing same size two-point crossover that have been propagated to the next generation to the total number of crossover events occurring in that generation.


Figure 23. Ratio of the number of individuals undergoing one-point crossover that have been propagated to the next generation to the total number of crossover events occurring in that generation.

Acknowledgments

We would like to thank Bill Langdon and Wolfgang Banzhaf for providing invaluable suggestions on earlier versions of this work, and Pat Cattolico for her helpful comments.

Notes

1. Strictly speaking the closure property is violated by not protecting the division operator, but on the other hand, in realistic applications, default return values in the case of an arithmetic error are usually less desirable than the occasional faulty individual.

2. Both the t-test and the re-sampling test indicated that the difference was highly significant (probability of a type 1 error was 0%).

References

1. P. J. Angeline, “Subtree crossover: Building block engine or macromutation?” in Genetic Programming 1997: Proceedings of the Second Annual Conference, J. R. Koza, K. Deb, M. Dorigo, D. B. Fogel, M. Garzon, H. Iba, and R. L. Riolo (eds.), Morgan Kaufmann: San Francisco, CA, 1997, pp. 9–17.

2. R. Collins, Studies in Artificial Life, Ph.D. thesis, University of California, Los Angeles, 1992.


3. F. D. Francone, M. Conrads, W. Banzhaf, and P. Nordin, “Homologous crossover in genetic programming,” in Proceedings of the Genetic and Evolutionary Computation Conference, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith (eds.), Morgan Kaufmann: San Francisco, CA, 1999, vol. 2, pp. 1021–1026.

4. D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison Wesley: Reading, MA, 1989.

5. M. Keijzer, C. Ryan, M. O’Neill, M. Cattolico, and V. Babovic, “Ripple crossover in genetic programming,” in Proceedings of EuroGP 2001, 2001.

6. M. J. Keith and M. C. Martin, “Genetic programming in C++: Implementation issues,” in Advances in Genetic Programming, K. E. Kinnear, Jr. (ed.), MIT Press: Cambridge, MA, 1994, chap. 13, pp. 285–310.

7. J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press: Cambridge, MA, 1992.

8. W. B. Langdon, “Size fair and homologous tree genetic programming crossovers,” in Proceedings of the Genetic and Evolutionary Computation Conference, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith (eds.), Orlando, Florida, USA, 13–17 July 1999, Morgan Kaufmann: San Francisco, CA, 1999, vol. 2, pp. 1092–1097.

9. W. B. Langdon, “Size fair and homologous tree genetic programming crossovers,” Genetic Programming and Evolvable Machines, vol. 1, no. 1/2, pp. 95–119, 2000.

10. B. Lewin, Genes VII, Oxford University Press, 2000.

11. M. O’Neill and C. Ryan, “Under the hood of grammatical evolution,” in GECCO ’99: Proc. of the Genetic & Evolutionary Computation Conference 1999, W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith (eds.), Morgan Kaufmann: San Francisco, CA, 1999, vol. 2, pp. 1143–1148.

12. P. Nordin, F. Francone, and W. Banzhaf, “Explicitly defined introns and destructive crossover in genetic programming,” in Advances in Genetic Programming 2, P. J. Angeline and K. E. Kinnear, Jr. (eds.), MIT Press: Cambridge, MA, 1996, chap. 6, pp. 111–134.

13. M. O’Neill, Automatic Programming in an Arbitrary Language: Evolving Programs with Grammatical Evolution, Ph.D. thesis, University of Limerick, 2001.

14. M. O’Neill and C. Ryan, “Crossover in grammatical evolution: A smooth operator?” in Genetic Programming, Proceedings of EuroGP’2000, vol. 1802 of Lecture Notes in Computer Science, R. Poli, W. Banzhaf, W. B. Langdon, J. F. Miller, P. Nordin, and T. C. Fogarty (eds.), Springer-Verlag: Berlin, 2000, pp. 149–162.

15. M. O’Neill and C. Ryan, “Grammatical evolution,” IEEE Transactions on Evolutionary Computation, vol. 5, no. 4, 2001.

16. M. O’Neill, C. Ryan, M. Keijzer, and M. Cattolico, “Crossover in grammatical evolution: The search continues,” in Genetic Programming, Proceedings of EuroGP’2001, vol. 2038 of Lecture Notes in Computer Science, J. Miller, M. Tomassini, P. L. Lanzi, C. Ryan, A. G. B. Tettamanzi, and W. B. Langdon (eds.), Springer-Verlag: Berlin, 2001, pp. 337–347.

17. R. Poli, “Is crossover a local search operator?” Position paper at the Workshop on Evolutionary Computation with Variable Size Representation at ICGA-97, 1997.

18. R. Poli and W. B. Langdon, “On the search properties of different crossover operators in genetic programming,” in Genetic Programming 1998: Proceedings of the Third Annual Conference, J. R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg, H. Iba, and R. Riolo (eds.), Morgan Kaufmann: San Francisco, CA, 1998, pp. 293–301.

19. C. Ryan, J. J. Collins, and M. O’Neill, “Grammatical evolution: Evolving programs for an arbitrary language,” in EuroGP’98: Proc. of the First European Workshop on Genetic Programming, vol. 1391 of Lecture Notes in Computer Science, Springer-Verlag: Paris, 1998, pp. 83–95.

20. P. A. Whigham, Grammatical Bias for Evolutionary Learning, Ph.D. thesis, School of Computer Science, University College, University of New South Wales, Australian Defence Force Academy, 1996.

21. D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, 1997.