natural essay finished

Upload: niall-deasy

Post on 10-Apr-2018


  • 8/8/2019 Natural Essay Finished


    Natural Selection in Genetic Algorithms - Effect of Elite Size and Tournament Size on Fitness

    Niall Deasy

    School of Computer Science & Informatics

    University College Dublin

    Abstract

    Natural selection, as proposed by Charles Darwin, is a widely accepted mechanism in nature by which the fittest survive and the weak eventually die out. In grammatical evolution, selection mechanisms are based on this theory in order to arrive at a good-enough solution. Two of these mechanisms are tournament selection and elitism. They work in very different ways, but the influence of each can be controlled by limiting the portion of the population it has access to, i.e. the tournament size and the elite size. What remains to be seen is whether there exists a correlation between the elite size and the tournament size that results in an optimal fitness value.

    1. Introduction

    In her book An Introduction to Genetic Algorithms, Melanie Mitchell describes fitness, in the context of genetic algorithms, as the probability that an individual will survive to reproduce [3]. Selection mechanisms use this fitness to decide which individuals become parents and which pass into the next generation. (Throughout this paper, the candidate solutions that make up the population are also referred to as algorithms.)

    Tournament selection is similar to rank selection but is much more efficient [3]. In the form described by Mitchell, two individuals are chosen from the population and a random operator determines whether the fitter or the less fit of the two is chosen as a parent. The two individuals are then returned to the population, where they are eligible for selection again [3]. More generally, a subset of the population, whose size is given by the tournament size, is evaluated and the individual with the best fitness value is chosen. Elitism in genetic algorithms refers to how many of the best competing individuals make it through to the next generation.

    Both tournament selection and elitism are important mechanisms for improving the fitness achieved by a GA. How much each mechanism is used can be set by defining how much of the population it acts on, i.e. the elite size and the tournament size.
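    The two selection mechanisms described above can be sketched as follows. This is a minimal illustration, not GEVA's implementation: it uses the common variant in which a tournament samples `tournament_size` individuals and returns the fittest, rather than the two-individual probabilistic form from [3]. It also assumes that lower fitness is better and that elites are carried over unchanged. The function names are hypothetical.

```python
import random


def tournament_select(population, fitness, tournament_size, rng=random):
    """Pick one parent: sample `tournament_size` individuals and return the fittest.

    Assumes lower fitness values are better.
    """
    contestants = rng.sample(population, tournament_size)
    return min(contestants, key=fitness)


def select_elites(population, fitness, elite_size):
    """Return the `elite_size` best individuals (lowest fitness first)."""
    return sorted(population, key=fitness)[:elite_size]
```

    With a tournament size equal to the population size, every tournament returns the single best individual; with a tournament size of 1, selection is purely random.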

    One grammatical evolution tool that makes effective use of these selection procedures is GEVA. GEVA was developed by UCD's Natural Computing Research and Applications (NCRA) group as an open source grammatical evolution platform [2]. It offers a graphical interface, which provides configuration options as well as text and graphed results. GEVA can also be run from the command line, taking its parameters as arguments, which allows for scripting.

    Four main components drive GEVA: the Backus Naur Form (BNF) grammar, the genotype search engine, the GE mapper and the fitness function evaluator. GEVA makes use of a BNF grammar, which expresses the grammar of a language as a set of production rules [2]. This grammar is made up of four components: a start symbol S, a set of non-terminals N, a set of terminals T and a set of production rules P. For example, the following grammar can describe any simple Boolean expression:

    N = { , , , }
    T = { and, or, xor, nand, not, true, false, (, ) }
    S = { }

    And P can be represented as:

    (A) ::= ( ) | |
    (B) <boolop> ::= and | or | xor | nand
    (C) ::= not
    (D) ::= true | false

    Each production rule offers a set of choices, separated by |. For example, production rule D gives two choices, true or false. GEVA works by mapping what are known as codons through this grammar using the following mapping function:

    Rule = c mod r

    where c is the codon (an integer value) and r is the number of rule choices for the current non-terminal symbol [2]. A codon is a piece of the DNA, where the DNA is a list of integers, and it determines which choice to take at a given state. For example, let's take the current node to be a boolop and the codon being read to have the integer value 6:

    <boolop> ::= and (0) | or (1) | xor (2) | nand (3)

    6 mod 4 = 2

    We determine which of the 4 choices to pick from boolop by using the mapping rule: there are 4 choices and the codon has the value 6. Therefore <boolop> is mapped to xor, because the rule gives 2. This process is known as genotype-to-phenotype mapping, and it continues until every non-terminal has been mapped to terminals. The result is an executable program whose fitness is then tested by the fitness function. The selection processes then take place before another generation is executed. This cycle, which is potentially infinite if the number of generations is unbounded, can be seen in figure 1.
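    The mapping rule above can be sketched in a few lines. This is an illustration under stated assumptions, not GEVA's mapper: only <boolop> is named in the text, so the other non-terminal names (<expr>, <notop>, <bool>) and the exact shape of rule A are hypothetical. The sketch also consumes one codon for every non-terminal, and reuses the codon list up to `max_wraps` times when it runs out.

```python
# Hypothetical Boolean grammar. Only <boolop> is named in the text;
# <expr>, <notop> and <bool> are assumed names for this sketch.
GRAMMAR = {
    "<expr>": [["(", "<expr>", "<boolop>", "<expr>", ")"],
               ["<notop>", "<expr>"],
               ["<bool>"]],
    "<boolop>": [["and"], ["or"], ["xor"], ["nand"]],
    "<notop>": [["not"]],
    "<bool>": [["true"], ["false"]],
}


def map_genotype(codons, start="<expr>", max_wraps=3):
    """Expand the leftmost non-terminal using choice = codon mod number_of_choices.

    Returns the phenotype string, or None if the codons (including
    wraps) run out before all non-terminals are mapped.
    """
    symbols = [start]
    output = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)  # terminal: emit as-is
            continue
        if used >= limit:
            return None  # mapping failed: too many wraps
        choices = GRAMMAR[symbol]
        codon = codons[used % len(codons)]  # wrap around the genome
        used += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return " ".join(output)


print(map_genotype([0, 2, 0, 6, 2, 1]))  # the codon 6 maps <boolop> to xor
```

    In this run the fourth codon, 6, is read while the current symbol is <boolop>, so 6 mod 4 = 2 selects xor, as in the worked example.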

    In GEVA's SantaFeAntTrail problem, which is used in the first experiments below, the fitness of an algorithm is determined using pieces of food, which the genetic algorithms need to collect. The number of food pieces is usually 89. If a genetic algorithm manages to find and eat 70 of those pieces, its fitness will be 19 (89-70). The best possible fitness is therefore 0 (89-89), and lower values are better. Other methods define fitness differently; for example, many particle swarm algorithms use a fitness range between 0 and 1.
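    The fitness arithmetic described above amounts to a single subtraction. The sketch below restates it; the function name is hypothetical and the default of 89 food pieces is taken from the text.

```python
TOTAL_FOOD = 89  # number of food pieces stated in the text


def ant_trail_fitness(pieces_eaten, total_food=TOTAL_FOOD):
    """Fitness is the number of food pieces left uneaten (0 is best)."""
    if not 0 <= pieces_eaten <= total_food:
        raise ValueError("pieces_eaten must be between 0 and total_food")
    return total_food - pieces_eaten
```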

    What I propose in this project is to test whether there exists a correlation between the size of the competition (tournament size) and the number of top competing algorithms that can go through to the next round (elite size).

    Figure 1


    2. Experiment Set-Up

    Population size = 100
    Generations = 100
    NCRA and SantaFeAntTrail Initialization: Operator.Operations.TournamentSelect
    Crossover Probability = 0.9
    Crossover-Point = variable
    Mutation Probability = 0.01
    Replacement type = generational
    Grow Probability = 0.5
    Max Wraps = 3
    Max Depth = 6
    Class path = bin/GUI.jar
    Main class = Main.Run
    Userpick size = 20
    Fixed point crossover = false
    Initial chromosome size = 200

    I decided to set my elite size and tournament size values based on the population size, since they are closely related. Both are tested as a percentage of the population; for example, an elite size of 50 represents an elite size that is 50 percent of the population size. I also decided to test the values as pairs, to determine whether a specific ratio of elite size to tournament size correlates with the fitness result.

    I hope to find the best combination of elite size and tournament size by exploring all of the possible combinations of the values 1, 25, 50, 75 and 100 percent. For example, the combination elite 1 percent, tournament 100 percent represents a test where the elite size is 1 percent of the population size and the tournament size equals the population size. I will run each combination 30 times and then calculate the average fitness over those 30 runs.
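    The set of configurations can be enumerated as in the sketch below. It assumes the percentage values 1, 25, 50, 75 and 100 that appear on the figure axes, and the population size of 100 from the set-up list. The function names and the rounding rule (at least one individual) are assumptions made for illustration.

```python
from itertools import product

POPULATION_SIZE = 100
PERCENTAGES = [1, 25, 50, 75, 100]  # values read from the figure axes


def percent_to_size(percent, population_size=POPULATION_SIZE):
    """Convert a percentage of the population into an absolute count (at least 1)."""
    return max(1, round(population_size * percent / 100))


def build_grid(percentages=PERCENTAGES, population_size=POPULATION_SIZE):
    """Return one configuration per (elite %, tournament %) pair."""
    return [
        {
            "elite_percent": elite,
            "tournament_percent": tournament,
            "elite_size": percent_to_size(elite, population_size),
            "tournament_size": percent_to_size(tournament, population_size),
        }
        for elite, tournament in product(percentages, repeat=2)
    ]
```

    Five values for each parameter give 25 pairs, so 30 runs per pair amounts to 750 runs per fitness function.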

    As mentioned before, GEVA accepts input both through its graphical user interface and through the command line. I took advantage of the command line interface, since I planned on running several hundred GEVA runs to gain an accurate set of results. The script I designed performed 30 GEVA runs per configuration and averaged the fitness over those runs. I ran into problems running GEVA through the command line, and for the purposes of replication it should be noted that the method described in the GEVA documentation does not mention the class path and main class arguments, which are required to run GEVA. I found that GEVA needed to have its mainclass set to Main.Run and its classpath set to bin/GUI.jar. An example command is below:

    java -jar GEVA.jar -mainclass Main.Run -classpath bin/GUI.jar
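    A script of the kind described above could be structured as in the following sketch. It is not the original script. The base command is the one quoted above; the `-elite_size` and `-tournament_size` flag names are assumptions that should be checked against the GEVA documentation, and `run_once` is a hypothetical callable that launches one run and returns its best fitness, since the format of GEVA's output is not described here.

```python
import statistics

# Base command exactly as quoted in the text.
BASE_COMMAND = ["java", "-jar", "GEVA.jar",
                "-mainclass", "Main.Run", "-classpath", "bin/GUI.jar"]


def build_command(elite_size, tournament_size):
    """Append the two parameters under test.

    The flag names are assumptions for illustration only.
    """
    return BASE_COMMAND + ["-elite_size", str(elite_size),
                           "-tournament_size", str(tournament_size)]


def average_fitness(run_once, command, runs=30):
    """Call run_once(command) `runs` times and return the mean fitness."""
    results = [run_once(command) for _ in range(runs)]
    return statistics.mean(results)
```

    Passing the runner in as a function keeps the averaging logic testable without launching GEVA; in practice `run_once` could wrap `subprocess.run` and parse the reported fitness.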

    3. Results

    I decided against using three-dimensional graphs simply because I find them difficult to interpret. I laid out all of my graphs in a unified structure. On the Y-axis I have the average fitness value determined over 30 runs. On the X-axis I have both the elite size and the tournament size, represented as pairs whose values are percentages of the population size, as described before.


    Experiment 1

    The first experiment I set out upon was to determine whether a correlation exists between elite size and fitness, tournament size and fitness, or a combination of the two. A pattern is clearly visible in the graph (figure 3.1.A): the fitness value increases drastically when the elite size is the same as the population size. The best-performing algorithm is indicated by the lowest fitness value, i.e. the value closest to zero, so an increase in fitness value means worse performance. This correlation between elite size and fitness can be seen in figure 3.1.A: where the green line representing elite size is at its highest, the fitness value for that tournament size range is also at its highest. It appears that when the elite size is set to over 75 percent of the population size, the fitness value increases dramatically. I defined this correlation through the following relation:

    (75 < Elite Size <= 100) => highest fitness => Worst Performance

    The next step was to plot the average fitness per elite size value, to determine which elite size gave the best fitness over all tournament sizes. I found a surprising pattern in which the fitness seems to oscillate every time the elite size is increased by 25 percent. This oscillation reaches its peak when the elite size equals the population size. From figure 3.1.B I can conclude that the best elite size is 25 percent of the population size, closely followed by 75 percent of the population size.

    I now needed to determine whether the tournament size has any substantial effect on the fitness outcome. This can be seen in figure 3.1.C, where I grouped the results by tournament size in increasing order and calculated the average fitness per group. The result was roughly flat, with no apparent indication of a correlation between the tournament size and the fitness.
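    The grouping used for figures 3.1.B and 3.1.C can be reproduced from the data labels of figure 3.1.A, as sketched below. The values are those printed on the SantaFeAntTrail chart; the sketch assumes they are listed in axis order with pairs read as tournament size : elite size, as the chart's axis label states. The function names are hypothetical.

```python
PERCENTAGES = [1, 25, 50, 75, 100]

# Average fitness over 30 runs (SantaFeAntTrail data labels).
# Rows: tournament size 1, 25, 50, 75, 100 percent.
# Columns: elite size 1, 25, 50, 75, 100 percent.
FITNESS = [
    [34.7, 19.93, 27.2, 24.13, 67.36],
    [33.83, 28.43, 31.8, 32.73, 69.66],
    [30.33, 30.63, 34.33, 33.06, 67.96],
    [31.66, 31.63, 34.46, 29.8, 71.0],
    [30.63, 30.46, 33.86, 30.0, 69.3],
]


def average_by_elite(table):
    """Mean fitness per elite size, taken over all tournament sizes."""
    return {
        PERCENTAGES[col]: sum(row[col] for row in table) / len(table)
        for col in range(len(PERCENTAGES))
    }


def average_by_tournament(table):
    """Mean fitness per tournament size, taken over all elite sizes."""
    return {
        PERCENTAGES[row]: sum(table[row]) / len(table[row])
        for row in range(len(table))
    }


by_elite = average_by_elite(FITNESS)
best_elite = min(by_elite, key=by_elite.get)    # lowest mean fitness
worst_elite = max(by_elite, key=by_elite.get)   # highest mean fitness
```

    On these values the lowest per-elite mean falls at 25 percent, followed by 75 percent, and the highest at 100 percent, which agrees with the pattern described above.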

    Figure 3.1.A Graph displaying the correlation between elite size, tournament size and fitness. The fitness function used was SantaFeAntTrail and the grammar used was also SantaFeAntTrail. On the X-axis we have tournament size and elite size pairs in that order (t:e), where both are represented as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs.

    [Plot for Figure 3.1.A: Average Fitness, SantaFeAntTrail. X-axis: Tournament-size % : Elite-size %, as percent of population; Y-axis: fitness. Series: Fitness, Elite Size, Tournament Size. Data labels, in axis order:]

    Tournament %   Elite 1%   Elite 25%   Elite 50%   Elite 75%   Elite 100%
    1              34.7       19.93       27.2        24.13       67.36
    25             33.83      28.43       31.8        32.73       69.66
    50             30.33      30.63       34.33       33.06       67.96
    75             31.66      31.63       34.46       29.8        71
    100            30.63      30.46       33.86       30          69.3


    Figure 3.1.B Graph displaying the correlation between elite size and average fitness. The fitness function used was SantaFeAntTrail and the grammar used was also SantaFeAntTrail. On the X-axis we have elite sizes grouped in ascending order, expressed as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs. The red line indicates the average fitness per elite size group. Each elite size group is made up of an elite size and all of the corresponding tournament sizes, e.g. group 1: (1:1, 1:25, 1:50, 1:75, 1:100), where elite size is to tournament size (e:t).

    Figure 3.1.C Graph displaying the correlation between tournament size and average fitness. The fitness function used was SantaFeAntTrail and the grammar used was also SantaFeAntTrail. On the X-axis we have tournament sizes grouped in ascending order, expressed as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs. The red line indicates the average fitness per tournament size group. Each tournament size group is made up of a tournament size and all of the corresponding elite sizes, e.g. group 1: (1:1, 1:25, 1:50, 1:75, 1:100), where tournament size is to elite size (t:e).

    [Plot for Figure 3.1.C: Tournament - Effect on Fitness, SantaFeAntTrail. X-axis: tournament size (percent of population size); Y-axis: fitness (average over 30 runs). Series: Fitness, Average.]

    [Plot for Figure 3.1.B: Elite Size - Effect on Fitness, SantaFeAntTrail. X-axis: elite size (percent of population); Y-axis: fitness (average over 30 runs). Series: Fitness, Average.]


    Experiment 2

    In my second experiment I set out to discover whether the 75 to 100 percent range for elite size could be narrowed down, i.e. to find the elite size between 75 and 100 percent of the population size at which the fitness changes most sharply. I adapted my runs to work within this range, with the elite size ranging from 75 to 100 percent of the population size in increments of 5 percent and the tournament size held constant at one percent of the population.

    We can see from figure 3.2.A that the rise in fitness only occurs when the elite size is greater than or equal to 95 percent of the population size. We can now adjust our relation between elite size and fitness:

    (94 < Elite Size <= 100) => highest fitness => Worst Performance

    Figure 3.2.A Graph displaying the correlation between elite size, within the 75 to 100 range, and fitness. The fitness function used was SantaFeAntTrail and the grammar used was also SantaFeAntTrail. On the X-axis we have an elite size and tournament size pair in that order, where both are represented as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs.

    Experiment 3

    It is now clear where the best and worst fitness values are achieved based on elite size for the SantaFeAntTrail scenario. I now aim to investigate whether the same effect exists across other fitness functions and grammars. As time is short, I can only test the theory against two further scenarios, in which I change only the fitness function and the grammar used. For these two configurations I picked the RoyalTree fitness function and grammar, and the Sudoku fitness function and grammar. Apart from these two variables, I kept the same configuration as described in experiment one.

    [Plot for Figure 3.2.A: SantaFeAntTrail, elite 75 to 100 range. X-axis: elite size : tournament size pairs 75:1, 80:1, 85:1, 90:1, 95:1, 100:1, as percent of population; Y-axis: fitness, average over 30 runs. Data labels as extracted: 23.2, 24.26, 25.83, 29.86, 26.06, 69.9.]


    The RoyalTree fitness test shows that the highest fitness values are all reached where the elite size is over 75 percent of the population size. The results are a lot more varied than those of the SantaFeAntTrail; however, the elite size still has a substantial effect on fitness where it is set above 75 percent of the population size. I decided to plot the average fitness per elite size again (figure 3.3.B). The oscillation that I found in the first test does not appear in this scenario. Instead, the best fitness is found when the elite size is set to 25 percent of the population size, and the fitness value rises in regular steps from there. The highest rise in fitness is again seen when the elite size is set to 100 percent of the population size.

    Since the same outcome was seen in experiment one, I will assume that the elite size has the same effect when it is over 95 percent of the population size. We can now determine that only two correlations between the elite size and the fitness hold true across both fitness functions and grammars:

    1. The best fitness is seen when the elite size is set to 25 percent of the population size.
    2. The worst fitness is seen when the elite size is set to over 95 percent of the population size.

    The Sudoku scenario revealed a more dramatic correlation between the elite size and the fitness values (figure 3.3.C, figure 3.3.D). We can clearly see that where the elite size is set to the population size, the fitness value is over double the average. As in the previous scenarios, the tournament size does not appear to have any substantial effect on the fitness value. The lowest average fitness value per elite size is again seen where the elite size is set to 25 percent of the population size, although in this scenario it is closely followed by the elite size at 50 percent of the population size, with a difference in fitness of 0.03.

    Figure 3.3.A Graph displaying the correlation between elite size, tournament size and fitness. The fitness function used was RoyalTree and the grammar used was also RoyalTree. On the X-axis we have tournament size and elite size pairs in that order (t:e), where both are represented as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs.

    [Plot for Figure 3.3.A: RoyalTree Fitness Test. X-axis: size pairs 1:1 to 100:100, as percent of population; Y-axis: fitness (over 30 runs). Data labels, assuming they appear in axis order with pairs read as tournament size : elite size:]

    Tournament %   Elite 1%   Elite 25%   Elite 50%   Elite 75%   Elite 100%
    1              3.15       1.38        1.38        1.81        2.79
    25             2.01       1.42        1.87        1.89        3.32
    50             1.62       1.65        2.15        1.87        2.65
    75             1.79       0.99        1.43        2.26        2.79
    100            2.08       1.27        1.37        1.29        4.02


    Figure 3.3.B Graph displaying the correlation between elite size and average fitness. The fitness function used was RoyalTree and the grammar used was also RoyalTree. On the X-axis we have elite sizes grouped in ascending order, expressed as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs. The red line indicates the average fitness per elite size group. Each elite size group is made up of an elite size and all of the corresponding tournament sizes, e.g. group 1: (1:1, 1:25, 1:50, 1:75, 1:100), where elite size is to tournament size (e:t).

    Figure 3.3.C Graph displaying the correlation between elite size, tournament size and fitness. The fitness function used was Sudoku and the grammar used was also Sudoku. On the X-axis we have tournament size and elite size pairs in that order (t:e), where both are represented as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs.

    [Plot for Figure 3.3.C: Average Fitness, Sudoku. X-axis: Tournament-size % : Elite-size %, as percent of population size; Y-axis: fitness, average over 30 runs. Data labels, in axis order:]

    Tournament %   Elite 1%   Elite 25%   Elite 50%   Elite 75%   Elite 100%
    1              43.77      32.47       31.4        31.3        64.37
    25             22.7       22.67       26.93       23.3        64.5
    50             23.97      25.57       21.93       26.27       63.93
    75             24.93      23.63       23.7        25.37       63.83
    100            24         22.93       23.3        24.87       64.4

    [Plot for Figure 3.3.B: Elite Size - Effect On Fitness, RoyalTree. X-axis: elite size (% of population); Y-axis: fitness. Series: Fitness, Average.]


    Figure 3.3.D Graph displaying the correlation between elite size and average fitness. The fitness function used was Sudoku and the grammar used was also Sudoku. On the X-axis we have elite sizes grouped in ascending order, expressed as a percentage of the population size. On the Y-axis we have the average fitness obtained over 30 runs. The red line indicates the average fitness per elite size group. Each elite size group is made up of an elite size and all of the corresponding tournament sizes, e.g. group 1: (1:1, 1:25, 1:50, 1:75, 1:100), where elite size is to tournament size (e:t).

    4. Conclusions & Future Work

    I started this paper as an attempt to find a correlation between tournament size, elite size and fitness. I ran a total of 2430 GEVA simulations (consistent with 25 combinations x 30 runs for each of the three scenarios, plus 6 x 30 runs for experiment 2), and from this extensive set of results I did find some patterns. Tournament size has no noticeable effect on fitness performance, as far as the above fitness functions are concerned. I did, however, find a very noticeable correlation between elite size and fitness performance. When the elite size is set between 95 and 100 percent of the population size, fitness performance decreases dramatically. In the SantaFeAntTrail, RoyalTree and Sudoku configurations, the worst fitness values are found where the elite size is over 94 percent of the population size. The fitness values in these cases are also at least double the average fitness values of the other combinations. I also discovered that when the elite size is set to 25 percent of the population size, the fitness is most likely to be at its best. Performance is also worse than average where both the tournament size and the elite size are 1 percent of the population size.

    Worst Fitness Performance
    o Where elite size is 1 percent of the population size and the tournament size is 1 percent of the population size.
    o Where elite size is greater than 94 percent of the population for all tournament sizes.

    Best Fitness Performance
    o Elite size is 25 percent of the population size regardless of the tournament size.

    For future research on this subject, I propose testing this 95-100 percent elite size range on other fitness functions to determine whether it holds across those also. I would also propose that the effect of the 95-100 percent elite size range on fitness be tested on genetic evolution simulators other than GEVA, to ensure that this effect is not unique to GEVA. If it is found that this effect holds across the board, I would suggest that the 95-100 percent range be examined further to determine whether a more precise range can be found. I would also suggest that this range be tested against as many different configurations of population size, generations, mutation probability etc. as possible, to determine whether this 95-100 percent elite size range is in fact a global effect in genetic evolution.

    [Plot for Figure 3.3.D: Elite size - effect on fitness, Sudoku. X-axis: elite size (percentage of population); Y-axis: fitness (average over 30 runs). Series: Fitness, Average.]

    References

    [1] Brabazon, A., O'Neill, M. (2006). Biologically Inspired Algorithms for Financial Modelling. Springer.

    [2] Brabazon, A., O'Neill, M. GEVA - Grammatical Evolution in Java.

    [3] Mitchell, M. An Introduction to Genetic Algorithms.