Genetic Programming Chapter 6

Upload: joella

Post on 17-Jan-2016


Page 1: Genetic  Programming

Genetic Programming

Chapter 6

Page 2: Genetic  Programming

A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing: Genetic Programming

GP quick overview

Developed: USA in the 1990s

Early names: J. Koza

Typically applied to:
– machine learning tasks (prediction, classification…)

Attributed features:
– competes with neural nets and alike
– needs huge populations (thousands)
– slow

Special:
– non-linear chromosomes: trees, graphs
– mutation possible but not necessary (disputed!)

Page 3: Genetic  Programming

GP technical summary tableau

Representation       Tree structures
Recombination        Exchange of subtrees
Mutation             Random change in trees
Parent selection     Fitness proportional
Survivor selection   Generational replacement

Page 4: Genetic  Programming

Introductory example: credit scoring

Bank wants to distinguish good from bad loan applicants

Model needed that matches historical data

ID     No of children   Salary   Marital status   OK?
ID-1   2                45000    Married          0
ID-2   0                30000    Single           1
ID-3   1                40000    Divorced         1

Page 5: Genetic  Programming

Introductory example: credit scoring

A possible model: IF (NOC = 2) AND (S > 80000) THEN good ELSE bad

In general: IF formula THEN good ELSE bad

Only unknown is the right formula, hence:
– Our search space (phenotypes) is the set of formulas
– Natural fitness of a formula: percentage of well classified cases of the model it stands for
– Natural representation of formulas (genotypes) is: parse trees
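The "natural fitness" above can be sketched in a few lines of Python. This is only an illustration: the three records are the rows of the historical data table, and reading OK? = 1 as "good" and 0 as "bad" is an assumption, since the slides do not state the encoding. The names `RECORDS`, `formula` and `fitness` are hypothetical.

```python
# Historical records: (no. of children, salary, marital status, OK?)
# Assumption: OK? = 1 marks a good applicant, 0 a bad one.
RECORDS = [
    (2, 45000, "Married", 0),
    (0, 30000, "Single", 1),
    (1, 40000, "Divorced", 1),
]


def formula(noc, salary, status):
    """The example model's condition: (NOC = 2) AND (S > 80000)."""
    return noc == 2 and salary > 80000


def fitness(candidate, records):
    """Percentage of records that 'IF candidate THEN good ELSE bad' classifies correctly."""
    correct = sum(int(candidate(n, s, m)) == ok for n, s, m, ok in records)
    return 100.0 * correct / len(records)


print(fitness(formula, RECORDS))
```

Under that reading of the OK? column, the example rule labels all three applicants "bad", so it agrees with the data only on ID-1, one case in three.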

Page 6: Genetic  Programming

Introductory example: credit scoring

IF (NOC = 2) AND (S > 80000) THEN good ELSE bad

can be represented by the following tree

          AND
         /   \
        =     >
       / \   / \
     NOC  2 S   80000
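One way to hold such a parse tree in code is as nested tuples of the form `(function, child, child)`, with variables as strings and constants as numbers. The sketch below is a minimal illustration under that assumed encoding; `FUNCTIONS`, `TREE` and `evaluate` are hypothetical names, not part of the slides.

```python
import operator

# Illustrative function set: symbol -> Python callable.
FUNCTIONS = {
    "AND": lambda a, b: a and b,
    "=": operator.eq,
    ">": operator.gt,
}

# (NOC = 2) AND (S > 80000) as a nested-tuple parse tree.
TREE = ("AND", ("=", "NOC", 2), (">", "S", 80000))


def evaluate(node, env):
    """Recursively evaluate a parse tree against variable bindings in env."""
    if isinstance(node, tuple):        # internal node: apply function to children
        func, *children = node
        return FUNCTIONS[func](*(evaluate(c, env) for c in children))
    if isinstance(node, str):          # variable terminal: look up its value
        return env[node]
    return node                        # constant terminal


print(evaluate(TREE, {"NOC": 2, "S": 45000}))
```

The genotype (the tuple) and the phenotype (the formula it computes) stay separate here: variation operators would edit the tuple, while fitness evaluation only calls `evaluate`.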

Page 7: Genetic  Programming

Tree based representation

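Trees are a universal form, e.g. consider

Arithmetic formula: $2 \cdot \pi + \left((x + 3) - \frac{y}{5 + 1}\right)$

Logical formula: $(x \wedge \mathit{true}) \rightarrow ((x \vee y) \vee (z \leftrightarrow (x \wedge y)))$

Program:

i = 1;
while (i < 20) {
    i = i + 1
}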

Page 8: Genetic  Programming

Tree based representation

[Figure: parse tree of the arithmetic formula $2 \cdot \pi + \left((x + 3) - \frac{y}{5 + 1}\right)$]

Page 9: Genetic  Programming

Tree based representation

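[Figure: parse tree of the logical formula $(x \wedge \mathit{true}) \rightarrow ((x \vee y) \vee (z \leftrightarrow (x \wedge y)))$]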

Page 10: Genetic  Programming

Tree based representation

[Figure: parse tree of the program]

i = 1;
while (i < 20) {
    i = i + 1
}

Page 11: Genetic  Programming

Tree based representation

In GA, ES, EP chromosomes are linear structures (bit strings, integer strings, real-valued vectors, permutations)

Tree shaped chromosomes are non-linear structures

In GA, ES, EP the size of the chromosomes is fixed

Trees in GP may vary in depth and width
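To make "vary in depth and width" concrete, the sketch below measures two trees over the same symbols. It assumes trees are nested tuples `(function, child, ...)` with non-tuple leaves; that encoding and the names `depth` and `size` are illustrative choices, not something the slides prescribe.

```python
def depth(tree):
    """Number of edges on the longest root-to-leaf path (a single leaf has depth 0)."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def size(tree):
    """Total number of nodes (functions and terminals)."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


small = ("+", "x", 1)
large = ("+", ("*", 2, "x"), ("-", "y", ("/", "x", 3)))

print(depth(small), size(small))
print(depth(large), size(large))
```

Both trees are valid individuals in the same population, whereas a fixed-length bit string or real-valued vector has the same number of genes in every individual.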

Page 12: Genetic  Programming

Tree based representation

Symbolic expressions can be defined by
– Terminal set T
– Function set F (with the arities of function symbols)

Adopting the following general recursive definition:
1. Every t ∈ T is a correct expression
2. f(e1, …, en) is a correct expression if f ∈ F, arity(f) = n and e1, …, en are correct expressions
3. There are no other forms of correct expressions

In general, expressions in GP are not typed (closure property: any f ∈ F can take any g ∈ F as argument)
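The recursive definition translates almost line by line into a checker. The sketch below assumes expressions are nested tuples `(f, e1, ..., en)`; the particular terminal set, function set and the name `is_correct` are made up for illustration.

```python
# Illustrative terminal set T and function set F (symbol -> arity).
TERMINALS = {"x", "y", 1, 2}
ARITY = {"+": 2, "*": 2, "neg": 1}


def is_correct(expr, terminals=TERMINALS, arity=ARITY):
    """Return True if expr is a correct expression over (T, F)."""
    if not isinstance(expr, tuple):
        return expr in terminals                      # rule 1: every t in T
    if not expr:
        return False                                  # rule 3: empty tuple is no expression
    f, *args = expr
    return (
        f in arity                                    # rule 2: f in F ...
        and arity[f] == len(args)                     # ... with arity(f) = n ...
        and all(is_correct(a, terminals, arity) for a in args)  # ... and correct e1..en
    )                                                 # rule 3: nothing else is accepted


print(is_correct(("+", "x", ("neg", "y"))))
print(is_correct(("+", "x")))
```

Because the expressions are untyped, the checker never asks what a subexpression returns: any correct expression may appear in any argument position, which is the closure property.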

Page 13: Genetic  Programming

Offspring creation scheme

Compare
– GA scheme using crossover AND mutation sequentially (be it probabilistically)
– GP scheme using crossover OR mutation (chosen probabilistically)
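The difference between the two schemes is easiest to see side by side. The following is a sketch only: `crossover` and `mutate` are placeholder callables supplied by the caller, the function names are hypothetical, and returning a one-element tuple from the mutation branch of the GP scheme is a design choice made here for illustration.

```python
import random


def ga_offspring(p1, p2, crossover, mutate, pc, rng=random):
    """GA scheme: crossover AND mutation, applied one after the other.

    Crossover happens with probability pc; mutate is then applied to both
    children and is assumed to handle its own per-gene probability.
    """
    c1, c2 = crossover(p1, p2) if rng.random() < pc else (p1, p2)
    return mutate(c1), mutate(c2)


def gp_offspring(p1, p2, crossover, mutate, pm, rng=random):
    """GP scheme: crossover OR mutation, exactly one of them per step.

    With probability pm a single parent is mutated; otherwise the two
    parents are recombined.
    """
    if rng.random() < pm:
        return (mutate(p1),)
    return crossover(p1, p2)


# Toy operators on strings, just to make the control flow visible.
toy_crossover = lambda a, b: (a + "x", b + "x")
toy_mutate = lambda a: a + "m"

print(ga_offspring("a", "b", toy_crossover, toy_mutate, pc=1.0))
print(gp_offspring("a", "b", toy_crossover, toy_mutate, pm=0.0))
```

In the GA version a child can carry the marks of both operators; in the GP version every child is the product of exactly one.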

Page 14: Genetic  Programming

[Figure: GP flowchart and GA flowchart]

Page 15: Genetic  Programming

Mutation

Most common mutation: replace randomly chosen subtree by randomly generated tree
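A minimal sketch of this subtree mutation follows, assuming trees are immutable nested tuples `(function, child, ...)`. The function and terminal sets, the 0.3 chance of stopping early in `random_tree`, and all helper names are illustrative assumptions rather than anything fixed by the slides.

```python
import random

FUNCTIONS = {"+": 2, "*": 2}    # illustrative function set: symbol -> arity
TERMINALS = ["x", "y", 1]       # illustrative terminal set


def random_tree(max_depth, rng):
    """Generate a random tree of depth at most max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    f = rng.choice(sorted(FUNCTIONS))
    return (f, *(random_tree(max_depth - 1, rng) for _ in range(FUNCTIONS[f])))


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node, root first."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def subtree_mutation(tree, rng, max_depth=2):
    """Replace a uniformly chosen subtree by a randomly generated tree."""
    path = rng.choice(list(paths(tree)))
    return replace_at(tree, path, random_tree(max_depth, rng))


rng = random.Random(0)
parent = ("+", "x", ("*", "y", 1))
print(subtree_mutation(parent, rng))
```

Since a leaf may be replaced by a tree of depth `max_depth`, the child can end up larger than its parent.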

Page 16: Genetic  Programming

Mutation cont’d

Mutation has two parameters:
– Probability pm to choose mutation vs. recombination
– Probability to choose an internal point as the root of the subtree to be replaced

Remarkably, pm is advised to be 0 (Koza ’92) or very small, like 0.05 (Banzhaf et al. ’98)

The size of the child can exceed the size of the parent

Page 17: Genetic  Programming

Recombination

Most common recombination: exchange two randomly chosen subtrees among the parents

Recombination has two parameters:
– Probability pc to choose recombination vs. mutation
– Probability to choose an internal point within each parent as crossover point

The size of offspring can exceed that of the parents
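Subtree crossover can be sketched as follows, again assuming nested-tuple trees whose functions all have at least one argument. The parameter `p_internal` stands for the second parameter listed above (the probability of picking an internal point); the helper names are hypothetical.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get_at(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def pick_point(tree, p_internal, rng):
    """Pick an internal node with probability p_internal, else a leaf."""
    all_paths = list(paths(tree))
    internal = [p for p in all_paths if isinstance(get_at(tree, p), tuple)]
    leaves = [p for p in all_paths if not isinstance(get_at(tree, p), tuple)]
    pool = internal if internal and rng.random() < p_internal else leaves
    return rng.choice(pool)


def subtree_crossover(p1, p2, p_internal, rng):
    """Exchange one randomly chosen subtree between the two parents."""
    a = pick_point(p1, p_internal, rng)
    b = pick_point(p2, p_internal, rng)
    return (replace_at(p1, a, get_at(p2, b)),
            replace_at(p2, b, get_at(p1, a)))


rng = random.Random(42)
p1 = ("+", "x", ("*", "y", 1))
p2 = ("-", ("+", "x", "x"), "y")
print(subtree_crossover(p1, p2, 0.9, rng))
```

The two children together always contain as many nodes as the two parents, but one child may receive the larger subtree and so exceed both parents in size.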

Page 18: Genetic  Programming

[Figure: recombination of Parent 1 and Parent 2 into Child 1 and Child 2]

Page 19: Genetic  Programming

Selection

Parent selection typically fitness proportionate

Over-selection in very large populations:
– rank population by fitness and divide it into two groups:
– group 1: best x% of population, group 2: other (100-x)%
– 80% of selection operations choose from group 1, 20% from group 2
– for pop. size = 1000, 2000, 4000, 8000: x = 32%, 16%, 8%, 4%
– motivation: to increase efficiency; %’s come from rule of thumb

Survivor selection:
– Typical: generational scheme (thus none)
– Recently steady-state is becoming popular for its elitism
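The over-selection rule can be sketched as below. Two simplifications are assumptions of this sketch, not of the slides: higher fitness is taken to be better, and the individual is drawn uniformly within the chosen group rather than fitness-proportionately. The names `TOP_PERCENT` and `over_select` are hypothetical.

```python
import random

# x% of the population that forms group 1, per population size (from the list above).
TOP_PERCENT = {1000: 32, 2000: 16, 4000: 8, 8000: 4}


def over_select(population, fitness, rng, p_top=0.8):
    """Select one parent: 80% of the time from group 1, 20% from group 2."""
    ranked = sorted(population, key=fitness, reverse=True)   # best first
    cut = len(ranked) * TOP_PERCENT[len(ranked)] // 100
    group1, group2 = ranked[:cut], ranked[cut:]
    group = group1 if rng.random() < p_top else group2
    return rng.choice(group)        # simplification: uniform within the group


rng = random.Random(1)
population = list(range(1000))      # toy individuals whose fitness is their own value
picks = [over_select(population, lambda i: i, rng) for _ in range(2000)]
print(sum(p >= 680 for p in picks) / len(picks))
```

For every population size in the table, group 1 holds 320 individuals (32% of 1000, 16% of 2000, and so on), so the rule concentrates most selections on a fixed-size elite as the population grows.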

Page 20: Genetic  Programming

Initialisation

Maximum initial depth of trees Dmax is set

Full method (each branch has depth = Dmax):
– nodes at depth d < Dmax randomly chosen from function set F
– nodes at depth d = Dmax randomly chosen from terminal set T

Grow method (each branch has depth ≤ Dmax):
– nodes at depth d < Dmax randomly chosen from F ∪ T
– nodes at depth d = Dmax randomly chosen from T

Common GP initialisation: ramped half-and-half, where grow & full method each deliver half of initial population
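The three initialisation methods can be sketched as follows for nested-tuple trees. The function and terminal sets are made up for illustration, and ramping the depth limit over 2..Dmax is a common convention assumed here; the slide itself only says that grow and full each deliver half of the population.

```python
import random

FUNCTIONS = {"+": 2, "*": 2, "neg": 1}   # illustrative function set F: symbol -> arity
TERMINALS = ["x", "y", 1]                # illustrative terminal set T


def full(d_max, rng):
    """Full method: functions above depth d_max, terminals exactly at d_max."""
    if d_max == 0:
        return rng.choice(TERMINALS)
    f = rng.choice(sorted(FUNCTIONS))
    return (f, *(full(d_max - 1, rng) for _ in range(FUNCTIONS[f])))


def grow(d_max, rng):
    """Grow method: any symbol from F ∪ T above d_max, terminals at d_max."""
    if d_max == 0:
        return rng.choice(TERMINALS)
    s = rng.choice(sorted(FUNCTIONS) + TERMINALS)
    if s in FUNCTIONS:
        return (s, *(grow(d_max - 1, rng) for _ in range(FUNCTIONS[s])))
    return s


def ramped_half_and_half(pop_size, d_max, rng):
    """Alternate full and grow; each full/grow pair shares one depth limit.

    Assumption: depth limits ramp over 2..d_max, so d_max must be at least 2.
    """
    depths = range(2, d_max + 1)
    population = []
    for i in range(pop_size):
        d = depths[(i // 2) % len(depths)]
        method = full if i % 2 == 0 else grow
        population.append(method(d, rng))
    return population


def depth(tree):
    """Longest root-to-leaf path length, used here only to inspect the result."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(c) for c in tree[1:])


rng = random.Random(3)
print([depth(t) for t in ramped_half_and_half(10, 4, rng)])
```

Mixing the two methods gives an initial population with both bushy trees of known depth and irregular, often smaller, ones.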

Page 21: Genetic  Programming

Bloat

Bloat = “survival of the fattest”, i.e., the tree sizes in the population are increasing over time

Ongoing research and debate about the reasons

Needs countermeasures, e.g.
– Prohibiting variation operators that would deliver “too big” children
– Parsimony pressure: penalty for being oversized
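Both countermeasures reduce to a few lines once tree size can be measured. The sketch below assumes nested-tuple trees and a fitness that is maximised; the linear penalty, its weight `alpha` and the function names are illustrative assumptions, since the slides do not specify a penalty form.

```python
def size(tree):
    """Total number of nodes in a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


def within_limit(child, max_size):
    """Countermeasure 1: reject children that variation made 'too big'."""
    return size(child) <= max_size


def penalised_fitness(raw_fitness, tree, alpha):
    """Countermeasure 2: parsimony pressure as a size-proportional penalty."""
    return raw_fitness - alpha * size(tree)


lean = ("+", "x", 1)
fat = ("+", ("*", "x", 1), ("-", "x", ("*", "x", 0)))   # computes the same function

print(penalised_fitness(0.9, lean, alpha=0.01))
print(penalised_fitness(0.9, fat, alpha=0.01))
```

With equal raw fitness, the leaner tree now ranks higher, so selection no longer favours growth for its own sake.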

Page 22: Genetic  Programming

Problems involving “physical” environments

Trees for data fitting vs. trees (programs) that are “really” executable

Execution can change the environment, and with it the calculation of fitness

Example: robot controller

Fitness calculations mostly by simulation, ranging from expensive to extremely expensive (in time)

But evolved controllers are often good to very good

Page 23: Genetic  Programming

Example application: symbolic regression

Given some points in R², (x1, y1), … , (xn, yn)

Find function f(x) s.t. ∀i = 1, …, n : f(xi) = yi

Possible GP solution:
– Representation by F = {+, -, /, sin, cos}, T = R ∪ {x}
– Fitness is the error $err(f) = \sum_{i=1}^{n} (f(x_i) - y_i)^2$
– All operators standard
– pop.size = 1000, ramped half-half initialisation
– Termination: n “hits” or 50000 fitness evaluations reached (where “hit” is if | f(xi) - yi | < 0.0001)
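The error measure and the "hit" count used for termination can be written down directly. In the sketch below the sample points and the two candidate functions are invented for illustration; only the squared-error sum and the 0.0001 tolerance come from the slide.

```python
import math


def err(f, points):
    """Sum of squared errors of f over the data points."""
    return sum((f(x) - y) ** 2 for x, y in points)


def hits(f, points, tol=0.0001):
    """Number of points that f reproduces to within tol."""
    return sum(abs(f(x) - y) < tol for x, y in points)


# Illustrative target: y = sin(x) + x, sampled at x = 0.0, 0.1, ..., 0.9.
points = [(x / 10, math.sin(x / 10) + x / 10) for x in range(10)]

perfect = lambda x: math.sin(x) + x      # an evolved tree might compute this
rough = lambda x: x                      # misses the sin term

print(err(perfect, points), hits(perfect, points))
print(err(rough, points), hits(rough, points))
```

A run would stop as soon as some individual scores `hits(f, points) == len(points)`, or when the evaluation budget is exhausted.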

Page 24: Genetic  Programming

Discussion

Is GP:

The art of evolving computer programs ?

A means of automated programming of computers?

GA with another representation?