
1

Competent Program Evolution

Dissertation Defense
Moshe Looks

December 11th, 2006

2

Synopsis

Competent optimization requires adaptive decomposition

This is problematic in program spaces

Thesis: we can do it by exploiting semantics

Results: it works

3

General Optimization

Find a solution s ∈ S, maximizing/minimizing f(s)

f: S → ℝ

To solve this faster than O(|S|), make assumptions about f

4

Near-Decomposability

Complete separability would be nice…

Near-decomposability (Simon, 1969) is more realistic

[Figure: a nearly decomposable system: stronger interactions within subsystems, weaker interactions between them]

5

Exploiting Separability

Separability = independence assumptions

Given a prior over the solution space, represented as a probability vector:

1. Sample solutions from the model
2. Update the model toward higher-scoring points
3. Iterate

Works well when interactions are weak
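A minimal runnable sketch of this loop, in the style of a univariate EDA such as PBIL; the population size, learning rate, truncation selection, and onemax-style scorer are illustrative assumptions, not details from the deck.

```python
import random

def univariate_eda(score, n_bits, pop_size=100, n_select=50, lr=0.1, iters=50):
    """Model = one independent probability per bit (full separability)."""
    p = [0.5] * n_bits                      # prior over the solution space
    best = None
    for _ in range(iters):
        # 1. sample solutions from the model
        pop = [tuple(random.random() < p[i] for i in range(n_bits))
               for _ in range(pop_size)]
        # 2. update the model toward higher-scoring points
        pop.sort(key=score, reverse=True)
        for s in pop[:n_select]:
            for i in range(n_bits):
                p[i] += lr * (s[i] - p[i])
        # 3. iterate, tracking the best sample seen so far
        if best is None or score(pop[0]) > score(best):
            best = pop[0]
    return best

print(univariate_eda(score=sum, n_bits=20))  # onemax: bits don't interact
```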

6

Exploiting Near-Decomposability

Bayesian optimization algorithm (BOA): represent the problem decomposition as a Bayesian network, learned greedily via a network scoring metric

Hierarchical BOA (hBOA) uses Bayesian networks with local structure: allows smaller model-building steps, leading to more accurate models; restricted tournament replacement promotes diversity (sketched below)

Solves the linkage problem. Competence: solving hard problems quickly, accurately, and reliably
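A sketch of restricted tournament replacement, under assumed conventions (a fitness function and a similarity metric such as Hamming distance supplied by the caller; the window size is a free parameter):

```python
import random

def rtr_insert(population, offspring, fitness, distance, window_size=20):
    """The offspring competes only against the most similar of `window_size`
    randomly chosen incumbents, so it can displace only a near neighbor;
    distant niches survive, which promotes diversity."""
    window = random.sample(range(len(population)), window_size)
    nearest = min(window, key=lambda i: distance(population[i], offspring))
    if fitness(offspring) > fitness(population[nearest]):
        population[nearest] = offspring

# e.g., for bitstrings:
hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
```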

7

Program Learning

Solutions encode executable programs; execution maps programs to behaviors

exec: P → B; find a program p ∈ P, maximizing/minimizing f(exec(p))

f: B → ℝ

To be useful, make assumptions about exec, P, and B

8

Properties of Program Spaces

Open-endedness

Over-representation: many programs map to the same behavior

Compositional hierarchy: intrinsically organized into subprograms

Chaotic execution: similar programs may have very different behaviors

9

Properties of Program Spaces

Simplicity prior: simpler programs are more likely

Simplicity preference: smaller programs are preferable

Behavioral decomposability: f: B → ℝ is separable / nearly decomposable

White-box execution: the execution function is known and constant

10

Thesis

Program spaces: not directly decomposable

Leverage properties of program spaces as inductive bias

Leading to competent program evolution

11

Representation-Building

Organize programs in terms of commonalities

Ignore semantically meaningless variation

Explore plausible variations

12

Representation-Building

Common regions must be aligned

Redundancy must be identified

Create knobs for plausible variations

13

Representation-Building

What about… changing the phase? averaging two inputs instead of picking one? …

[Figure: corresponding points in behavior (semantic) space and program (syntactic) space]

14

Statics & Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building (a toy rewriter for step 1 is sketched below):
1. Reduction to normal form (e.g., x + 0 → x)
2. Neighborhood enumeration (generate knobs)
3. Neighborhood reduction (get rid of some knobs)
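A toy fixpoint rewriter for step 1, over a made-up tuple-based expression language; the two rules shown are illustrative stand-ins for the dissertation's richer reduction rules.

```python
def rewrite(t):
    """One bottom-up pass of local rules over ('op', arg, arg) tuples."""
    if not isinstance(t, tuple):
        return t
    op, *args = t
    args = [rewrite(a) for a in args]
    if op == '+' and args[1] == 0:      # x + 0 -> x
        return args[0]
    if op == '*' and args[1] == 1:      # x * 1 -> x
        return args[0]
    return (op, *args)

def reduce_to_normal_form(t):
    """Apply the rules until a fixpoint is reached."""
    prev = None
    while t != prev:
        prev, t = t, rewrite(t)
    return t

assert reduce_to_normal_form(('+', ('*', 'x', 1), 0)) == 'x'
```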

Create demes to maintain a sample of many representations:

deme: a sample of programs living in a common representation

intra-deme optimization: use the hBOA

inter-deme: based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1. Create an initial deme based on a small set of knobs (i.e., the empty program) and random sampling in knob-space

2. Select a deme and run hBOA on it

3. Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4. For each such program:
   1. create a new representation centered around the program
   2. create a new random sample within this representation
   3. add as a deme

5. Repeat from step 2 (a simplified sketch of this loop follows)
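A deliberately simplified, runnable rendering of steps 1-5 over bitstrings: here a "representation" is just a center string whose knobs are single bit flips, plain hill-climbing stands in for hBOA, and novelty stands in for the deme-creation criterion.

```python
import random

def knob_neighbors(center):                  # knobs = single bit flips
    for i in range(len(center)):
        yield center[:i] + (1 - center[i],) + center[i + 1:]

def hill_climb(center, score):               # stand-in for hBOA (step 2)
    improved = True
    while improved:
        improved = False
        for n in knob_neighbors(center):
            if score(n) > score(center):
                center, improved = n, True
    return center

def moses_sketch(score, n_bits=16, iters=10):
    demes = [tuple(random.randint(0, 1) for _ in range(n_bits))]   # step 1
    for _ in range(iters):                                         # step 5
        deme = max(demes, key=score)         # step 2: select a deme
        winner = hill_climb(deme, score)     # step 2: optimize within it
        if winner not in demes:              # step 3: creation criterion
            demes.append(winner)             # step 4: new deme around it
    return max(demes, key=score)

print(moses_sketch(score=sum))
```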

16

Artificial Ant

Eat all food pellets within 600 steps

Existing evolutionary methods do not perform significantly better than random; the space contains many regularities

To apply MOSES:
- three reduction rules for normal form, e.g., left, left, left → right (see the sketch below)
- separate knobs for rotation, movement & conditionals
- no neighborhood reduction needed
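The quoted reduction rule, made concrete: three left-turns are behaviorally a single right-turn (rotations are mod 4), so a normalizer can canonicalize action sequences. The list-based action encoding here is an illustrative assumption.

```python
def normalize_turns(actions):
    """Rewrite left,left,left -> right wherever it appears."""
    out = []
    for a in actions:
        out.append(a)
        if out[-3:] == ['left', 'left', 'left']:
            out[-3:] = ['right']
    return out

assert normalize_turns(['move', 'left', 'left', 'left']) == ['move', 'right']
```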

17

Artificial Ant

How does MOSES do it?

Searches a greatly reduced space

Exploits key dependencies: "[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g., either a pair of Right or pair of Left terminals at a particular location may be important." – Langdon & Poli, "Why ants are hard"

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved, but with much higher variance; computational effort rises to 36,000

Technique                  Effort
Evolutionary Programming   136,000 (≈6x MOSES)
Genetic Programming        450,000 (≈20x MOSES)
MOSES                       23,000

18

Elegant Normal Form (Holman '90)

Hierarchical normal form for Boolean formulae

Reduction process takes time linear in formula size

99% of random 500-literal formulae reduced by over 98%
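Holman's ENF reduction itself is not reproduced here; as a rough stand-in, a generic Boolean simplifier (sympy's `simplify_logic`) illustrates the kind of size reduction being measured:

```python
from sympy import symbols
from sympy.logic.boolalg import And, Not, Or, simplify_logic

x1, x2 = symbols('x1 x2')
f = Or(And(x1, x2), And(x1, Not(x2)))     # redundant formula
g = simplify_logic(f)                     # reduces to x1
shrink = 1 - len(str(g)) / len(str(f))
print(g, f"(size reduced by {shrink:.0%})")
```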

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance?

5,000 unique random formulae of arity 10 with 30 literals each (qualitatively similar results for arity 5)

Computed the set of pairwise behavioral distances (truth-table Hamming distance) and syntactic distances (tree edit distance, normalized by tree size)

The same computation on the same formulae reduced to ENF
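A sketch of the behavioral-distance half of this computation, with formulae represented as Python predicates (the tree-edit syntactic distance is omitted):

```python
from itertools import product

def truth_table(f, arity):
    return [f(*bits) for bits in product([False, True], repeat=arity)]

def behavioral_distance(f, g, arity):
    """Hamming distance between truth tables."""
    return sum(a != b for a, b in zip(truth_table(f, arity),
                                      truth_table(g, arity)))

f = lambda x1, x2, x3: (x1 and x2) or x3
g = lambda x1, x2, x3: x1 or x3
print(behavioral_distance(f, g, arity=3))   # the tables differ on 1 of 8 rows
```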

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance?

[Plots: pairwise distances for random formulae vs. the same formulae reduced to ENF]

21

Neighborhoods & Knobs

What do neighborhoods look like behaviorally?

1,000 unique random formulae, arity 5, 100 literals each (qualitatively similar results for arity 10)

Enumerate all neighbors (edit distance < 2) and compute behavioral distance from the source

Neighborhoods in MOSES are defined based on ENF: neighbors are converted to ENF and compared to the original, which is used to heuristically reduce total neighborhood size

22

Neighborhoods & Knobs

What do neighborhoods look like behaviorally?

[Plots: neighbor behavioral distances for random formulae vs. reduced to ENF]

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1, computed from k1 parity functions of arity k2; total arity is k1·k2

Hypothesis: parity subfunctions will exhibit tighter linkages
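A sketch of the test function as described, with assumed conventions for bit layout and multiplexer addressing (so "2-parity-3-multiplexer" means k2 = 2, k1 = 3, total arity 6):

```python
def parity(bits):
    return sum(bits) % 2

def multiplexer(bits):
    """First a bits are the address, the rest data, with a + 2**a = len."""
    a = 0
    while a + 2 ** a < len(bits):
        a += 1
    addr = sum(b << i for i, b in enumerate(bits[:a]))
    return bits[a + addr]

def parity_multiplexer(bits, k1, k2):
    """k1 parity subfunctions of arity k2 feed a k1-multiplexer."""
    assert len(bits) == k1 * k2
    return multiplexer([parity(bits[i * k2:(i + 1) * k2]) for i in range(k1)])

print(parity_multiplexer((1, 0, 1, 1, 0, 0), k1=3, k2=2))
```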

24

Hierarchical Parity-Multiplexer

Computational effort decreases 42% with model-building (on the 2-parity-3-multiplexer)

Parity subfunctions (adjacent pairs) have the tightest linkages

Hypothesis validated

25

Program Growth

5-parity: minimal program size ≈ 53

26

Program Growth

11-multiplexer: minimal program size ≈ 27

27

Where do the Cycles Go?

Problem          hBOA         Representation-Building   Program Evaluation
5-Parity         28%          43%                       29%
11-multiplexer    5%           5%                       89%
CFS              80%          10%                       11%
Complexity       O(N·l²·a²)   O(N·l·a)                  O(N·l·c)

N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a · program size); c is the number of test cases

28

Supervised Classification

Goals:
- accuracies comparable to SVM
- superior accuracy vs. GP
- simpler classifiers vs. both SVM and GP

29

Supervised Classification

How much simpler? Consider average-sized formulae learned for the 6-multiplexer:

MOSES: 21 nodes, max depth 4

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF): 50 nodes, max depth 7

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))
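A rough way to reproduce the size metrics quoted above, assuming each function symbol and variable counts as one node and depth is parenthesis nesting:

```python
import re

def nodes_and_depth(expr):
    nodes = depth = max_depth = 0
    for tok in re.findall(r'\w+|[()]', expr):
        if tok == '(':
            depth += 1
            max_depth = max(max_depth, depth)
        elif tok == ')':
            depth -= 1
        else:
            nodes += 1
    return nodes, max_depth

moses_formula = ("or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) "
                 "and(not(x1) x2 x5) and(x1 x2 x6))")
# prints (25, 4); folding each not(x) into its literal gives the slide's 21
print(nodes_and_depth(moses_formula))
```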

30

Supervised Classification

Datasets taken from recent computational biology papers:

Chronic fatigue syndrome (101 cases): based on 26 SNPs, each gene either in homozygosis, in heterozygosis, or not expressed; 56 binary features

Lymphoma (77 cases) & aging brains (19 cases): based on gene expression levels (continuous); the 50 most-differentiating genes selected and preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation
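The protocol, sketched with scikit-learn on synthetic stand-in data (the real features and labels come from the datasets above; SVC stands in for any of the compared classifiers):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(101, 56))   # stand-in: CFS-shaped binary data
y = rng.integers(0, 2, size=101)

# 10 independent runs of 10-fold cross-validation, averaging test accuracy
scores = [cross_val_score(SVC(), X, y,
                          cv=KFold(10, shuffle=True, random_state=run)).mean()
          for run in range(10)]
print(f"average test accuracy: {np.mean(scores):.3f}")
```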

31

Quantitative Results

Classification: average test accuracy

Technique   CFS     Lymphoma   Aging Brain
SVM         66.2%   97.5%      95.0%
GP          67.3%   77.9%      70.0%
MOSES       67.9%   94.6%      95.3%

32

Quantitative Results

Benchmark performance:

artificial ant: 6x less computational effort vs. EP, 20x less vs. GP

parity problems: 1.33x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)

multiplexer problems: 9x less vs. GP on the 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution: all requirements for competent optimization,

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty: program-level, adapted from the optimization case;

deme-level, a theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)

34

Qualitative Results

Representation-building for programs:

parameterization based on semantics

transforms program space properties to facilitate program evolution

Probabilistic modeling over sets of program transformations:

models compactly represent problem structure

35

Competent Program Evolution

Competent: not just good performance, but explainability of good results and robustness

Vision: representations are important; program learning is unique; representations must be specialized based on semantics

MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes

36

Committee

Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech, Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)


30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

8

Properties of Program Spaces Open-endedness

Over-representation many programs map to the same behavior

Compositional hierarchy intrinsically organized into subprograms

Chaotic Execution similar programs may have very different behaviors

9

Properties of Program Spaces Simplicity prior

simpler programs are more likely

Simplicity preference smaller programs are preferable

Behavioral decomposability fB is separable nearly decomposable

White box execution execution function is known and constant

10

Thesis

Program spaces not directly decomposable

Leverage properties of program spaces as inductive bias

Leading to competent program evolution

11

Representation-Building

Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations

12

Representation-Building

Common regions must be aligned Redundancy must be identified Create knobs for plausible variations

13

Representation-Building

What abouthellip changing the phase averaging two input instead of picking one hellip

behavior (semantic) space program (syntactic) space

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

24

Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-

building (on 2-parity-3-multiplexer)

Paritysubfunctions(adjacent pairs)have tightest linkages

Hypothesis validated

25

Program Growth

5-parity minimal program size ~ 53

26

Program Growth

11-multiplexer minimal program size ~ 27

27

Where do the Cycles Go

Problem hBOA Representation-Building

Program Evaluation

5-Parity 28 43 29

11-multiplex 5 5 89

CFS 80 10 11

Complexity O(Nl2a2) O(Nla) O(Nlc)

N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases

28

Supervised Classification

Goals accuracies comparable to SVM

superior accuracy vs GP

simpler classifiers vs SVM and GP

29

Supervised Classification

How much simpler Consider average-sized formulae learned for the 6-multiplexer

MOSES 21 nodes max depth 4

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF) 50 nodes max depth 7

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

9

Properties of Program Spaces Simplicity prior

simpler programs are more likely

Simplicity preference smaller programs are preferable

Behavioral decomposability fB is separable nearly decomposable

White box execution execution function is known and constant

10

Thesis

Program spaces not directly decomposable

Leverage properties of program spaces as inductive bias

Leading to competent program evolution

11

Representation-Building

Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations

12

Representation-Building

Common regions must be aligned Redundancy must be identified Create knobs for plausible variations

13

Representation-Building

What abouthellip changing the phase averaging two input instead of picking one hellip

behavior (semantic) space program (syntactic) space

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

24

Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-

building (on 2-parity-3-multiplexer)

Paritysubfunctions(adjacent pairs)have tightest linkages

Hypothesis validated

25

Program Growth

5-parity minimal program size ~ 53

26

Program Growth

11-multiplexer minimal program size ~ 27

27

Where do the Cycles Go

Problem hBOA Representation-Building

Program Evaluation

5-Parity 28 43 29

11-multiplex 5 5 89

CFS 80 10 11

Complexity O(Nl2a2) O(Nla) O(Nlc)

N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases

28

Supervised Classification

Goals accuracies comparable to SVM

superior accuracy vs GP

simpler classifiers vs SVM and GP

29

Supervised Classification

How much simpler Consider average-sized formulae learned for the 6-multiplexer

MOSES 21 nodes max depth 4

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF) 50 nodes max depth 7

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

10

Thesis

Program spaces not directly decomposable

Leverage properties of program spaces as inductive bias

Leading to competent program evolution

11

Representation-Building

Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations

12

Representation-Building

Common regions must be aligned Redundancy must be identified Create knobs for plausible variations

13

Representation-Building

What abouthellip changing the phase averaging two input instead of picking one hellip

behavior (semantic) space program (syntactic) space

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

24

Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-

building (on 2-parity-3-multiplexer)

Paritysubfunctions(adjacent pairs)have tightest linkages

Hypothesis validated

25

Program Growth

5-parity minimal program size ~ 53

26

Program Growth

11-multiplexer minimal program size ~ 27

27

Where do the Cycles Go

Problem hBOA Representation-Building

Program Evaluation

5-Parity 28 43 29

11-multiplex 5 5 89

CFS 80 10 11

Complexity O(Nl2a2) O(Nla) O(Nlc)

N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases

28

Supervised Classification

Goals accuracies comparable to SVM

superior accuracy vs GP

simpler classifiers vs SVM and GP

29

Supervised Classification

How much simpler Consider average-sized formulae learned for the 6-multiplexer

MOSES 21 nodes max depth 4

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF) 50 nodes max depth 7

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

11

Representation-Building

Organize programs in terms of commonalities Ignore semantically meaningless variation Explore plausible variations

12

Representation-Building

Common regions must be aligned Redundancy must be identified Create knobs for plausible variations

13

Representation-Building

What abouthellip changing the phase averaging two input instead of picking one hellip

behavior (semantic) space program (syntactic) space

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

24

Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-

building (on 2-parity-3-multiplexer)

Paritysubfunctions(adjacent pairs)have tightest linkages

Hypothesis validated

25

Program Growth

5-parity minimal program size ~ 53

26

Program Growth

11-multiplexer minimal program size ~ 27

27

Where do the Cycles Go

Problem hBOA Representation-Building

Program Evaluation

5-Parity 28 43 29

11-multiplex 5 5 89

CFS 80 10 11

Complexity O(Nl2a2) O(Nla) O(Nlc)

N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases

28

Supervised Classification

Goals accuracies comparable to SVM

superior accuracy vs GP

simpler classifiers vs SVM and GP

29

Supervised Classification

How much simpler Consider average-sized formulae learned for the 6-multiplexer

MOSES 21 nodes max depth 4

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF) 50 nodes max depth 7

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

12

Representation-Building

Common regions must be aligned Redundancy must be identified Create knobs for plausible variations

13

Representation-Building

What abouthellip changing the phase averaging two input instead of picking one hellip

behavior (semantic) space program (syntactic) space

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

24

Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-

building (on 2-parity-3-multiplexer)

Paritysubfunctions(adjacent pairs)have tightest linkages

Hypothesis validated

25

Program Growth

5-parity minimal program size ~ 53

26

Program Growth

11-multiplexer minimal program size ~ 27

27

Where do the Cycles Go

Problem hBOA Representation-Building

Program Evaluation

5-Parity 28 43 29

11-multiplex 5 5 89

CFS 80 10 11

Complexity O(Nl2a2) O(Nla) O(Nlc)

N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases

28

Supervised Classification

Goals accuracies comparable to SVM

superior accuracy vs GP

simpler classifiers vs SVM and GP

29

Supervised Classification

How much simpler Consider average-sized formulae learned for the 6-multiplexer

MOSES 21 nodes max depth 4

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF) 50 nodes max depth 7

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

13

Representation-Building

What abouthellip changing the phase averaging two input instead of picking one hellip

behavior (semantic) space program (syntactic) space

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

24

Hierarchical Parity-Multiplexer Computational effort decreases 42 with model-

building (on 2-parity-3-multiplexer)

Paritysubfunctions(adjacent pairs)have tightest linkages

Hypothesis validated

25

Program Growth

5-parity minimal program size ~ 53

26

Program Growth

11-multiplexer minimal program size ~ 27

27

Where do the Cycles Go

Problem hBOA Representation-Building

Program Evaluation

5-Parity 28 43 29

11-multiplex 5 5 89

CFS 80 10 11

Complexity O(Nl2a2) O(Nla) O(Nlc)

N is population size O(n105)l is program size a is the arity of the spacen is representation size O(aprogram size)c is number of test cases

28

Supervised Classification

Goals accuracies comparable to SVM

superior accuracy vs GP

simpler classifiers vs SVM and GP

29

Supervised Classification

How much simpler Consider average-sized formulae learned for the 6-multiplexer

MOSES 21 nodes max depth 4

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF) 50 nodes max depth 7

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

14

Statics amp Dynamics

Representations span a limited subspace of programs

Conceptual steps in representation-building1 reduction to normal form (x x + 0 rarr x)2 neighborhood enumeration (generate knobs)3 neighborhood reduction (get rid of some knobs)

Create demes to maintain a sample of many representations deme a sample of programs living in a common representation intra-deme optimization use the hBOA inter-deme

based on dominance relationships

15

Meta-Optimizing Semantic Evolutionary Search (MOSES)

1 Create an initial deme based on a small set of knobs (ie empty program) and random sampling in knob-space

2 Select a deme and run hBOA on it

3 Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes)

4 For each such program1 create a new representation centered around the program2 create a new random sample within this representation3 add as a deme

5 Repeat from step 2

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

rarr

17

Artificial Ant

How does MOSES do it

Searches a greatly reduced space

Exploits key dependencies ldquo[t]hese symmetries lead to essentially the same solutions

appearing to be the opposite of each other Eg either a pair of Right or pair of Left terminals at a particular location may be importantrdquo ndash Langdon amp Poli ldquoWhy ants are hardrdquo

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved but with much higher variance computational effort rises to 36000

Technique Effort

Evolutionary Programming

136000 x

Genetic Programming

450000 x

MOSES 23000

18

Elegant Normal Form (Holman rsquo90)

Hierarchical normal form for Boolean formulae Reduction process takes time linear in formula size 99 of random 500-literal formulae reduced over 98

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

5000 unique random formulae of arity 10 with 30 literals each qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) syntactic distances (tree edit distance normalized by tree size)

The same computation on the same formulae reduced to ENF

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance

Random Formulae Reduced to ENF

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

1000 unique random formulae arity 5 100 literals each qualitatively similar results for arity 10

Enumerate all neighbors (edit distances lt2) compute behavioral distance from source

Neighborhoods in MOSES defined based on ENF neighbors are converted to ENF compared to original used to heuristically reduce total neighborhood size

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally

Random formulae Reduced to ENF

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity function of arity k2

total arity is k1k2

Hypothesis parity subfunctions will exhibit tighter linkages

4 For each such program:
  1 create a new representation centered around the program
  2 create a new random sample within this representation
  3 add as a deme

5 Repeat from step 2 (a runnable sketch of this loop appears below)
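To make the loop concrete, here is a toy, runnable Python sketch: bit-strings stand in for programs, hill-climbing stands in for hBOA, and a simple improvement test stands in for the deme-creation criterion. All names (score, knob_settings, optimize, moses_loop) are illustrative, not the original implementation.

```python
import random

def score(program):                    # toy objective: count of 1s
    return sum(program)

def knob_settings(program):            # all single-knob variations
    return [program[:i] + (1 - program[i],) + program[i + 1:]
            for i in range(len(program))]

def optimize(program):                 # stand-in for running hBOA on a deme
    while True:
        best = max(knob_settings(program), key=score)
        if score(best) <= score(program):
            return program
        program = best

def moses_loop(arity=12, iterations=10, max_demes=5, seed=0):
    random.seed(seed)
    # step 1: initial deme from random sampling in knob-space
    demes = [tuple(random.randint(0, 1) for _ in range(arity))]
    for _ in range(iterations):
        deme = max(demes, key=score)          # step 2: select a deme...
        winner = optimize(deme)               # ...and optimize within it
        if all(score(winner) >= score(d) for d in demes):  # step 3: criterion
            demes.append(winner)              # step 4: spawn a new deme
        demes = sorted(demes, key=score)[-max_demes:]      # displacement
    return max(demes, key=score)              # step 5 is the loop itself

print(moses_loop())                           # -> (1, 1, ..., 1) on this toy
```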

16

Artificial Ant

Eat all food pellets within 600 steps Existing evolutionary methods not

significantly than random Space contains many regularities

To apply MOSES three reductions rules for normal form

eg left left left rarr right separate knobs for rotation

movement amp conditionals no neighborhood reduction

needed

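As a concrete instance of the first reduction rule, this sketch (illustrative Python, not the actual rule set) canonicalizes a run of turns by its net rotation mod 4:

```python
def normalize_turns(turns):
    """Rewrite a run of left/right turns into its canonical shortest form."""
    net = sum(1 if t == "left" else -1 for t in turns) % 4
    return {0: [], 1: ["left"], 2: ["left", "left"], 3: ["right"]}[net]

print(normalize_turns(["left", "left", "left"]))  # ['right']
print(normalize_turns(["right", "left"]))         # [] (turns cancel)
```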

17

Artificial Ant

How does MOSES do it?

Searches a greatly reduced space

Exploits key dependencies: "[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g., either a pair of Right or pair of Left terminals at a particular location may be important" – Langdon & Poli, "Why ants are hard"

hBOA modeling learns linkage between rotation knobs

Eliminate modeling and the problem still gets solved, but with much higher variance; computational effort rises to 36,000

Technique                  Computational Effort
Evolutionary Programming   136,000 (~6x MOSES)
Genetic Programming        450,000 (~20x MOSES)
MOSES                       23,000

18

Elegant Normal Form (Holman '90)

Hierarchical normal form for Boolean formulae
Reduction process takes time linear in formula size
99% of random 500-literal formulae are reduced by over 98%
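Holman's reduction procedure is too involved to reproduce here; as a hedged illustration of the measurement itself (how much a normal-form reducer shrinks random formulae), the sketch below uses sympy's simplify_logic as a stand-in reducer. It is not ENF, so the exact shrinkage will differ.

```python
import random
from sympy import And, Not, Or, count_ops, symbols
from sympy.logic import simplify_logic

xs = symbols("x1:9")                  # eight Boolean variables

def random_literal():
    v = random.choice(xs)
    return v if random.random() < 0.5 else Not(v)

random.seed(0)
# a random formula of 20 conjunctive clauses, 3 literals each
formula = Or(*[And(*[random_literal() for _ in range(3)]) for _ in range(20)])
reduced = simplify_logic(formula)     # stand-in reducer, not Holman's ENF
print(count_ops(formula), "->", count_ops(reduced))
```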

19

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance?

5000 unique random formulae of arity 10, with 30 literals each; qualitatively similar results for arity 5

Computed the set of pairwise behavioral distances (truth-table Hamming distance) and syntactic distances (tree edit distance, normalized by tree size)

Repeated the same computation on the same formulae after reduction to ENF
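The behavioral-distance computation is straightforward; a minimal sketch, with formulae as plain Python predicates:

```python
from itertools import product

def truth_table(f, arity):
    """Evaluate f on all 2^arity input rows (cheap for small arity)."""
    return [f(*row) for row in product((False, True), repeat=arity)]

def behavioral_distance(f, g, arity):
    """Hamming distance between the two truth tables."""
    return sum(a != b for a, b in zip(truth_table(f, arity),
                                      truth_table(g, arity)))

f = lambda x1, x2, x3: (x1 and not x2) or x3
g = lambda x1, x2, x3: (x1 and x2) or x3
print(behavioral_distance(f, g, 3))   # 2: the formulae differ on two rows
```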

20

Syntactic vs Behavioral Distance

Is there a correlation between syntactic and behavioral distance?

(Plots: random formulae vs. reduced to ENF)

21

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally?

1000 unique random formulae of arity 5, with 100 literals each; qualitatively similar results for arity 10

Enumerate all neighbors (edit distance < 2) and compute their behavioral distance from the source

Neighborhoods in MOSES are defined based on ENF: neighbors are converted to ENF and compared to the original, which is used to heuristically reduce total neighborhood size
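A sketch of the neighborhood measurement, assuming a simplified move set (literal negation flips and variable renamings, an illustrative subset of knob types; ENF conversion is omitted):

```python
from itertools import product

# a formula is a nested tuple: ("and"|"or", child, ...) or ("lit", var, neg)
def evaluate(node, env):
    if node[0] == "lit":
        _, var, neg = node
        return bool(env[var]) != neg
    vals = [evaluate(child, env) for child in node[1:]]
    return all(vals) if node[0] == "and" else any(vals)

def neighbors(node, arity):
    """Yield one-edit variants: flip a literal's negation or rename its var."""
    if node[0] == "lit":
        _, var, neg = node
        yield ("lit", var, not neg)
        for v in range(arity):
            if v != var:
                yield ("lit", v, neg)
        return
    for i, child in enumerate(node[1:], start=1):
        for alt in neighbors(child, arity):
            yield node[:i] + (alt,) + node[i + 1:]

def behavioral_distance(f, g, arity):
    rows = [dict(enumerate(bits)) for bits in product((0, 1), repeat=arity)]
    return sum(evaluate(f, r) != evaluate(g, r) for r in rows)

src = ("or", ("lit", 0, False), ("and", ("lit", 1, False), ("lit", 2, True)))
dists = [behavioral_distance(src, n, 3) for n in neighbors(src, 3)]
print(sorted(dists))   # spread of behavioral distances across the neighborhood
```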

22

Neighborhoods amp Knobs

What do neighborhoods look like behaviorally?

(Plots: random formulae vs. reduced to ENF)

23

Hierarchical Parity-Multiplexer

Study decomposition in a Boolean domain

Multiplexer function of arity k1 computed from k1 parity functions of arity k2; total arity is k1·k2

Hypothesis: parity subfunctions will exhibit tighter linkages
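A concrete definition of the test function (the address/data layout of the multiplexer below is one common convention; the slides do not fix it):

```python
from itertools import product

def parity(bits):
    return sum(bits) % 2

def multiplexer(inputs, address_bits):
    address, data = inputs[:address_bits], inputs[address_bits:]
    index = sum(b << i for i, b in enumerate(reversed(address)))
    return data[index]

def hierarchical_pm(raw, k1, k2, address_bits):
    """k1-input multiplexer over k1 parities of k2 raw bits each."""
    assert len(raw) == k1 * k2
    mid = [parity(raw[i * k2:(i + 1) * k2]) for i in range(k1)]
    return multiplexer(mid, address_bits)

# 2-parity-3-multiplexer: k1 = 3 (1 address + 2 data lines), k2 = 2, arity 6
ones = sum(hierarchical_pm(row, 3, 2, 1) for row in product((0, 1), repeat=6))
print(ones, "of", 2 ** 6, "inputs map to 1")   # 32 of 64: a balanced function
```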

24

Hierarchical Parity-Multiplexer
Computational effort decreases 42% with model-building (on the 2-parity-3-multiplexer)

Parity subfunctions (adjacent pairs) have the tightest linkages

Hypothesis validated

25

Program Growth

5-parity: minimal program size ~53

26

Program Growth

11-multiplexer: minimal program size ~27

27

Where do the Cycles Go?

Problem          hBOA         Representation-Building   Program Evaluation
5-Parity         28%          43%                       29%
11-multiplexer    5%           5%                       89%
CFS              80%          10%                       11%
Complexity       O(N·l²·a²)   O(N·l·a)                  O(N·l·c)

N is population size, O(n^1.05); l is program size; a is the arity of the space; n is representation size, O(a · program size); c is the number of test cases

28

Supervised Classification

Goals: accuracies comparable to SVM,
superior accuracy vs. GP,
simpler classifiers vs. both SVM and GP

29

Supervised Classification

How much simpler? Consider average-sized formulae learned for the 6-multiplexer

MOSES: 21 nodes, max depth 4

or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))

GP (after reduction to ENF): 50 nodes, max depth 7

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3)))))

30

Supervised Classification

Datasets taken from recent computational biology papers
Chronic fatigue syndrome (101 cases): based on 26 SNPs; genes either in homozygosis, in heterozygosis, or not expressed; 56 binary features
Lymphoma (77 cases) & aging brains (19 cases): based on gene expression levels (continuous); the 50 most-differentiating genes selected and preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation
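The protocol in code, sketched with scikit-learn on placeholder data (X, y, and the SVC baseline are stand-ins for the actual datasets and learners):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(101, 56))  # placeholder: 101 cases, 56 binary features
y = rng.integers(0, 2, size=101)        # placeholder labels

scores = []
for run in range(10):                   # 10 independent runs...
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=run)
    scores.extend(cross_val_score(SVC(), X, y, cv=folds))  # ...of 10-fold CV
print("average test accuracy:", np.mean(scores))
```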

31

Quantitative Results

Classification: average test accuracy

Technique   CFS     Lymphoma   Aging Brain
SVM         66.2%   97.5%      95.0%
GP          67.3%   77.9%      70.0%
MOSES       67.9%   94.6%      95.3%

32

Quantitative Results

Benchmark performance:
artificial ant: 6x less computational effort vs. EP, 20x less vs. GP
parity problems: 1.33x less vs. EP and 4x less vs. GP on 5-parity; found solutions to 6-parity (none found by EP or GP)
multiplexer problems: 9x less vs. GP on the 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution: all requirements for competent optimization,

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty: program-level, adapted from the optimization case;
deme-level theory, based on global properties of the space (deme-level neutrality, deceptiveness, etc.)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent: not just good performance, but explainability of good results and robustness
Vision: representations are important; program learning is unique; representations must be specialized based on semantics
MOSES: meta-optimizing semantic evolutionary search, exploiting semantics and managing demes

36

Committee

Dr. Ron Loui (WashU, chair)
Dr. Guy Genin (WashU)
Dr. Ben Goertzel (Virginia Tech / Novamente LLC)
Dr. David E. Goldberg (UIUC)
Dr. John Lockwood (WashU)
Dr. Martin Pelikan (UMSL)
Dr. Robert Pless (WashU)
Dr. William Smart (WashU)


models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

30

Supervised Classification

Datasets taken from recent comp bio papers

Chronic fatigue syndrome (101 cases) based on 26 SNPs genes either in homozygosis in heterozygosis or not expressed 56 binary features

Lymphoma (77 cases) amp aging brains (19 cases) based on gene expression levels (continuous) 50 most-differentiating genes selected preprocessed into binary features based on medians

All experiments based on 10 independent runs of 10-fold cross-validation

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

31

Quantitative Results

Classification average test accuracy

Technique CFS Lymphoma Aging Brain

SVM 662 975 950

GP 673 779 700

MOSES 679 946 953

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

32

Quantitative Results

Benchmark performance artificial ant

6x less computational effort vs EP 20x less vs GP

parity problems 133x less vs EP 4x less vs GP on 5-parity found solutions to 6-parity (none found by EP or GP)

multiplexer problems 9x less vs GP on 11-multiplexer

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

33

Qualitative Results

Requirements for competent program evolution all requirements for competent optimization

+ exploit semantics

+ recombine programs only within bounded subspaces

Bipartite conception of problem difficulty program-level adapted from the optimization case

deme-level theory based on global properties of the space (deme-level neutrality deceptiveness etc)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

34

Qualitative Results

Representation-building for programs

parameterization based on semantics

transforms program space properties to facilitate program evolution

probabilistic modeling over sets of program transformations

models compactly represent problem structure

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

35

Competent Program Evolution

Competent not just good performance explainability of good results robustness

Vision representations are important program learning is unique representations must be specialized based on semantics

MOSES meta-optimizing semantic evolutionary search exploiting semantics and managing demes

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)

36

Committee

Dr Ron Loui (WashU chair)

Dr Guy Genin (WashU) Dr Ben Goertzel (Virginia Tech

Novamente LLC) Dr David E Goldberg (UIUC) Dr John Lockwood (WashU) Dr Martin Pelikan (UMSL) Dr Robert Pless (WashU) Dr William Smart (WashU)