Optimisation Methods: Simulated Annealing (reading: Section 10.9 of NRC) and Genetic Algorithms
2
Simulated Annealing
The optimisation methods covered to date only find the minimum of the current basin on the hyper-surface
SA (and GAs) are optimisation methods that can handle multiple local minima.
Analogy with thermodynamics: the cooling and annealing of metals, and the cooling and freezing of liquids
Must provide the following elements:
  A description of possible system states
  A generator of random changes in the system (options for next system state)
  An objective function (analogue of system energy)
  A control parameter (analogue of temperature) and a cooling schedule which describes how the control parameter is lowered from high to low values.
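The four elements above can be sketched in a few lines of Python. This is only an illustrative sketch: the Metropolis acceptance rule, the geometric cooling schedule and every parameter name (`t_start`, `alpha`, `steps_per_t`, ...) are assumptions chosen for the example, not details given on the slide.

```python
import math
import random

def simulated_annealing(energy, neighbour, state, t_start=1.0, t_end=1e-3,
                        alpha=0.95, steps_per_t=100, rng=None):
    """Minimise `energy`, starting from `state`.

    energy    -- objective function (analogue of system energy)
    neighbour -- generator of random changes: neighbour(state, rng) -> new state
    t_start, t_end, alpha -- control parameter and geometric cooling schedule
                             (assumed here; other schedules are possible)
    """
    rng = rng or random.Random()
    current, e_current = state, energy(state)
    best, e_best = current, e_current
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = neighbour(current, rng)
            e_candidate = energy(candidate)
            delta = e_candidate - e_current
            # Metropolis rule (assumed): always accept downhill moves,
            # accept uphill moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        t *= alpha  # lower the control parameter
    return best, e_best
```

For example, `simulated_annealing(lambda x: (x - 2) ** 2, lambda x, r: x + r.uniform(-0.5, 0.5), 10.0)` should return a point close to x = 2. Accepting occasional uphill moves is what lets the search leave the current basin.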
11
Genetic Algorithms in a slide
Premise:
  Evolution worked once (it produced us!), so it might work again
Basics:
  Pool of solutions
  Mate existing solutions to produce new solutions
  Mutate current solutions for long-term diversity
  Cull population
12
Genetic Algorithms in a slide
randomly initialise a pool of solutions
Next_Generation
  Mutation_Loop
    select a solution from pool using relative fitness
    mutate solution and save
  end_Loop
  Crossover_Loop
    select pairs of solutions from pool using relative fitness
    crossover and save both child solutions
  end_Loop
  Termination_Check
    if not finished
      create new pool from saved solutions
      goto Next_Generation
13
Originator
John Holland
Seminal work Adaptation in Natural and Artificial Systems
introduced main GA concepts, 1975
14
Introduction
Computing pioneers (especially in AI) looked to natural systems as guiding metaphors
Evolutionary computation:
  Any biologically motivated computing activity simulating natural evolution
  Genetic Algorithms are one form of this activity
Original goals:
  Formal study of the phenomenon of adaptation (John Holland)
  An optimization tool for engineering problems
15
Main idea
Take a population of candidate solutions to a given problem
Use operators inspired by the mechanisms of natural genetic variation
Apply selective pressure toward certain properties
Evolve a more fit solution
16
Why evolution as a metaphor
Ability to efficiently guide a search through a large solution space
Ability to adapt solutions to changing environments
“Emergent” behavior is the goal
“The hoped-for emergent behavior is the design of high-quality solutions to difficult problems and the ability to adapt these solutions in the face of a changing environment”
Melanie Mitchell, An Introduction to Genetic Algorithms
17
Evolutionary terminology
Abstractions imported from biology:
  Chromosomes, Genes, Alleles
  Fitness, Selection
  Crossover, Mutation
18
GA terminology
In the spirit – but not the letter – of biology
GA chromosomes are strings of genes
Each gene has a number of alleles; i.e., settings
Each chromosome is an encoding of a solution to a problem
A population of such chromosomes is operated on by a GA
19
Encoding
A data structure for representing candidate solutions
Often takes the form of a bit string
Usually has internal structure (i.e., different parts of the string represent different aspects of the solution)
20
Crossover
Mimics biological recombination
Some portion of genetic material is swapped between chromosomes
Typically the swapping produces an offspring
Mechanism for the dissemination of “building blocks” (schemas)
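As a rough illustration of the swap described above, here is a one-point crossover on two equal-length strings. The single random cut point is an assumption; the slide does not fix how the swapped portion is chosen.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=None):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    rng = rng or random.Random()
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # The cut lies strictly inside the string, so each child gets
    # material from both parents.
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b
```

Any contiguous run of genes that lies entirely on one side of the cut is passed on intact, which is how short building blocks survive recombination.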
21
Mutation
Selects a random locus – gene location – with some probability and alters the allele at that locus
The intuitive mechanism for the preservation of variety in the population
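A minimal sketch of this operator for a bit-string chromosome follows. The per-locus probability `p_mut` and its default value are illustrative assumptions.

```python
import random

def mutate(chromosome, p_mut=0.01, rng=None):
    """Flip each bit of a '0'/'1' string independently with probability p_mut."""
    rng = rng or random.Random()
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_mut else bit
        for bit in chromosome
    )
```

With a small `p_mut` most chromosomes pass through unchanged, but every allele keeps a non-zero chance of reappearing even after selection has removed it from the population.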
22
Fitness
A measure of the goodness of the organism
Expressed as the probability that the organism will live another cycle (generation)
Basis for the natural selection simulation
Organisms are selected to mate with probabilities proportional to their fitness
Probabilistically, better solutions have a better chance of conferring their building blocks to the next generation (cycle)
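Selection with probability proportional to fitness is often implemented as a "roulette wheel". The sketch below assumes non-negative fitness values with a positive sum; the function name is illustrative.

```python
import random

def roulette_select(population, fitnesses, rng=None):
    """Pick one individual with probability proportional to its fitness."""
    rng = rng or random.Random()
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off
```

An individual with three times the fitness of another is chosen about three times as often, so fitter solutions contribute more building blocks to the next generation without the weaker ones being excluded outright.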
23
A Simple GA
Generate initial population
do
  Calculate the fitness of each member
  // simulate another generation
  do
    Select parents from current population
    Perform crossover, add offspring to the new population
  while new population is not full
  Merge new population into the current population
  Mutate current population
while not converged
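The loop above might look as follows in Python for bit-string chromosomes. Several choices here are assumptions made only to keep the sketch short: the new population simply replaces the old one in the "merge" step, the run stops after a fixed number of generations rather than on a convergence test, and the fitness function is assumed to be non-negative.

```python
import random

def simple_ga(fitness, n_bits=20, pop_size=30, p_mut=0.01,
              generations=60, rng=None):
    """Evolve bit lists towards high `fitness`; returns the best one seen."""
    rng = rng or random.Random()
    # Generate initial population
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Calculate the fitness of each member (tiny offset avoids all-zero weights)
        weights = [fitness(c) + 1e-9 for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Select parents with probability proportional to fitness
            mum, dad = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, n_bits)         # one-point crossover
            new_pop.append(mum[:cut] + dad[cut:])
        # "Merge" (here: replace), then mutate each bit with probability p_mut
        pop = [[1 - g if rng.random() < p_mut else g for g in c] for c in new_pop]
        best = max(pop + [best], key=fitness)
    return best
```

Calling `simple_ga(sum)` maximises the number of ones in the string, a common toy problem for checking that the loop makes progress.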
24
How do GAs work
The structure of a GA is simple to comprehend, but the dynamic behavior is complex
Holland has done significant work on the theoretical foundations of GAs
“GAs work by discovering, emphasizing, and recombining good ‘building blocks’ of solutions in a highly parallel fashion.”
Melanie Mitchell, paraphrasing John Holland
Using formalism:
  Notion of a building block is formalized as a schema
  Schemas are propagated or destroyed according to the laws of probability
25
Genetic algorithm (GA), I
GA works by using a large population to explore many options in parallel.
The state of a GA is given by a population, with each member of the population being a complete set of parameters
26
Genetic algorithm - Overview
The whole population is updated in generations by four steps:
  Fitness: evaluate the function being searched
  Reproduction: the members of the new population are selected based on their fitness. Members with a low fitness might disappear, and one with a high fitness can be duplicated.
  Crossover: after two parents are randomly chosen based on their fitness, the offspring gets its parameter values based on some kind of random selection from the parents.
  Mutation: randomly or in some other way change the parameter values
27
Genetic algorithm - Overview
[Figure: distribution of individuals in generation 0 and in generation N]
28
Example - Cumulative Selection
Target sentence: Methinks it is like a weasel
28 characters including blanks, each drawn from 27 symbols (26 letters plus the blank): 27^28 random cases.
Starting from random sentences, we can find the desired sentence by the following procedure:
1. Generate 10 sentences of 28 randomly chosen characters
2. Select the sentence that has the most correct letters
3. Duplicate this best sentence ten times
4. For each duplicate, randomly replace a few letters (mutation rate)
5. Repeat steps 2-4 until the target sentence is matched.
The Weasel Applet: http://home.pacbell.net/s-max/scott/weasel.html
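The procedure can be tried out with a short Python sketch. The per-letter mutation probability `p_mut = 0.02` and the generation cap are assumptions picked so that the run finishes quickly; the slide only says "a few letters".

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 26 letters plus the blank

def score(sentence):
    """Number of positions that already match the target."""
    return sum(a == b for a, b in zip(sentence, TARGET))

def weasel(n_copies=10, p_mut=0.02, max_generations=20_000, rng=None):
    rng = rng or random.Random()
    # Step 1: random sentences of the same length as the target
    pool = ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(n_copies)]
    for generation in range(max_generations):
        best = max(pool, key=score)           # step 2: most correct letters
        if best == TARGET:
            return generation, best
        # Steps 3-4: duplicate the best sentence, mutating each copy
        pool = [
            "".join(rng.choice(ALPHABET) if rng.random() < p_mut else ch
                    for ch in best)
            for _ in range(n_copies)
        ]
    return max_generations, max(pool, key=score)
```

`weasel()` returns the number of generations used and the final sentence. The point of the exercise is that cumulative selection needs vastly fewer trials than the 27^28 cases a blind search would face.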
29
Genetic algorithm - Overview
Maximization problem in 2D space (0<x<1, 0<y<1)
Encoding:
  Individual: a point at (x, y)
    P1=(0.14429628, 0.72317247), P2=(0.71281369, 0.83459991)
  Encoding to chromosome-like string:
    P1=“1442962872317247”, P2=“7128136983459991”
30
Genetic algorithm - Overview
Breeding
Crossover:
  P1 = 1442962872317247    O1 = 1448136983459991
  P2 = 7128136983459991    O2 = 7122962872317247
Mutation:
  O2 = 7122962872317247 -> O2 = 7122962878317247
Decoding:
  O1 = 1448136983459991 -> (0.14481369, 0.83459991)
  O2 = 7122962878317247 -> (0.71229628, 0.78317247)
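The encode, breed and decode steps of this example can be reproduced with a few helper functions. The function names are illustrative, and the cut point (after the third digit) and the mutated locus (the tenth digit) are read off from the numbers above rather than stated on the slide.

```python
def encode(x, y, digits=8):
    """Concatenate the first `digits` decimals of x and y (assumes 0 < x, y < 1)."""
    return f"{x:.{digits}f}"[2:] + f"{y:.{digits}f}"[2:]

def decode(chromosome, digits=8):
    """Split the digit string back into an (x, y) point."""
    return float("0." + chromosome[:digits]), float("0." + chromosome[digits:])

def crossover_at(p1, p2, cut):
    """Swap everything after position `cut` between the two parents."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate_at(chromosome, locus, new_digit):
    """Replace the digit at 0-based position `locus`."""
    return chromosome[:locus] + new_digit + chromosome[locus + 1:]

p1 = encode(0.14429628, 0.72317247)
p2 = encode(0.71281369, 0.83459991)
o1, o2 = crossover_at(p1, p2, 3)
o2 = mutate_at(o2, 9, "8")
print(decode(o1), decode(o2))
```

Because the cut falls inside the x digits, both offspring inherit a y value from one parent unchanged and an x value mixed from both.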
31
Genetic algorithm - Overview
Elitism:
  Store away the parameters defining the fittest member of the current population, and later copy it intact into the offspring population
Variable mutation rate:
  At any given time, keep track of the fitness value of the fittest population member and of the median-ranked member
  The fitness difference Δf between those two individuals is a measure of population convergence
  If Δf becomes too small, increase the mutation rate. If Δf becomes too large, decrease the mutation rate.
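One way to express the variable mutation rate in code is sketched below. The normalisation of the fitness difference, the thresholds `low` and `high`, the multiplicative `factor` and the rate bounds are all assumptions for illustration; the slide gives only the qualitative rule. Fitness values are assumed to be positive.

```python
def adapt_mutation_rate(fitnesses, p_mut, low=0.05, high=0.25,
                        factor=1.5, p_min=0.0005, p_max=0.25):
    """Return an adjusted mutation rate based on population convergence."""
    ranked = sorted(fitnesses, reverse=True)
    best, median = ranked[0], ranked[len(ranked) // 2]
    # Normalised fitness difference between the fittest and the median member
    spread = best + median
    delta_f = (best - median) / spread if spread else 0.0
    if delta_f < low:        # population has converged: explore more
        p_mut = min(p_max, p_mut * factor)
    elif delta_f > high:     # population is spread out: exploit more
        p_mut = max(p_min, p_mut / factor)
    return p_mut
```

Called once per generation, this nudges the rate up when the population has collapsed onto one solution and back down once diversity returns.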
32
Genetic algorithm - Overview
Hamming wall & creep mutation
Example: optimal = 21000, current = 19994
Choose a digit
Instead of replacing the digit, add either 1 or -1
Example: creep mutation hitting the middle “9” with +1 (the carry propagates): 19994 -> 20094
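A creep mutation on a decimal-digit chromosome can be sketched by treating the string as an integer, so that the carry propagates as in the example. The function name and the clamping at the ends of the range are assumptions.

```python
def creep_mutate(digits, position, step):
    """Add `step` (+1 or -1) to the digit at 0-based `position`, with carry.

    `digits` is a string of decimal digits; the result has the same length
    and is clamped to the representable range (an assumption of this sketch).
    """
    place = 10 ** (len(digits) - 1 - position)
    mutated = int(digits) + step * place
    upper = 10 ** len(digits) - 1
    mutated = min(max(mutated, 0), upper)
    return str(mutated).zfill(len(digits))

print(creep_mutate("19994", 2, +1))  # the middle "9" hit with +1
```

A single creep step changes several digits at once here, which an ordinary one-digit replacement cannot do.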
33
Schema
A template, much like a regular expression, describing a set of strings
The set of strings represented by a given schema characterizes a set of candidate solutions sharing a property
This property is the encoded equivalent of a building block
34
Example
0 or 1 represents a fixed bit
Asterisk represents a “don’t care”
11****00 is the set of all solutions encoded in 8 bits, beginning with two ones and ending with two zeros
Solutions in this set all share the same variants of the properties encoded at these loci
35
Schema qualifiers
Length:
  The inclusive distance between the two fixed bits in a schema which are furthest apart (the defining length of the previous example is 8; under the more common convention, last fixed position minus first, it is 7)
Order:
  The number of fixed bits in a schema (the order of the previous example is 4)
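These definitions are easy to check in code. The sketch below represents a schema as a string over `0`, `1` and `*`; the function names are illustrative, and the `inclusive` flag switches between the inclusive distance used on this slide and the more common "last minus first" convention.

```python
def matches(schema, bits):
    """True if the bit string is an instance of the schema."""
    return len(schema) == len(bits) and all(
        s in ("*", b) for s, b in zip(schema, bits)
    )

def order(schema):
    """Number of fixed (non-'*') positions."""
    return sum(ch != "*" for ch in schema)

def defining_length(schema, inclusive=False):
    """Distance between the outermost fixed positions."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    if not fixed:
        return 0
    span = fixed[-1] - fixed[0]
    return span + 1 if inclusive else span

print(order("11****00"), defining_length("11****00", inclusive=True))
```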
36
Not just sum of the parts
GAs explicitly evaluate and operate on whole solutions
GAs implicitly evaluate and operate on building blocks
  Existing schemas may be destroyed or weakened by crossover
  New schemas may be spliced together from existing schemas
Crossover includes no notion of a schema – only of the chromosomes
37
Why do they work
Schemas can be destroyed or conserved
So how are good schemas propagated through generations?
  Conserved – good – schemas confer higher fitness on the offspring inheriting them
  Fitter offspring are probabilistically more likely to be chosen to reproduce
38
Approximating schema dynamics
Let H be a schema with at least one instance present in the population at time t
Let m(H, t) be the number of instances of H at time t
Let x be an instance of H and f(x) be its fitness
The expected number of offspring of x is f(x)/f(pop), where f(pop) is the average fitness of the population (by fitness-proportionate selection)
To know E(m(H, t+1)) (the expected number of instances of schema H at the next time unit), sum f(x)/f(pop) for all x in H
A GA never explicitly calculates the average fitness of a schema, but schema proliferation depends on its value
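Written out, the sum above shows why the average fitness of a schema matters. In the block below, \bar{f}(t) stands for what the slide calls f(pop), and \hat{u}(H, t) is introduced here (as an assumed symbol) for the observed average fitness of the instances of H:

```latex
E[m(H, t+1)] \;=\; \sum_{x \in H} \frac{f(x)}{\bar{f}(t)}
            \;=\; \frac{\hat{u}(H, t)}{\bar{f}(t)}\, m(H, t),
\qquad
\hat{u}(H, t) \;=\; \frac{1}{m(H, t)} \sum_{x \in H} f(x)
```

So under selection alone, a schema whose instances are fitter than the population average is expected to gain instances, and one below average is expected to lose them.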
39
Approximating schema dynamics
Approximation can be refined by taking into account the operators
  Schemas of long defining length are less likely to survive crossover
    Offspring are less likely to be instances of such schemas
  Schemas of higher order are less likely to survive mutation
Effects can be used to bound the approximate rates at which schemas proliferate
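The resulting bound is usually stated as the Schema Theorem. The form below follows the standard textbook statement for one-point crossover and bitwise mutation; the symbols are assumptions not used on the slides: p_c is the crossover probability, p_m the per-bit mutation probability, l the string length, d(H) the defining length (last fixed position minus first), o(H) the order, \hat{u}(H, t) the observed average fitness of H, and \bar{f}(t) the average fitness of the population.

```latex
E[m(H, t+1)] \;\ge\; \frac{\hat{u}(H, t)}{\bar{f}(t)}\, m(H, t)
\left(1 - p_c \frac{d(H)}{l - 1}\right) (1 - p_m)^{o(H)}
```

The crossover factor shrinks as d(H) grows and the mutation factor shrinks as o(H) grows, which is the quantitative version of the two statements above. The result is only a lower bound because it ignores the cases where the operators create new instances of H.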
40
Implications
Instances of short, low-order schemas whose average fitness tends to stay above the mean will increase exponentially
Changing the semantics of the operators can change the selective pressures toward different types of schemas
41
Theoretical Foundations
Empirical observation:
  GAs can work
Goal:
  Learn how to best use the tool
Strategy:
  Understand the dynamics of the model
  Develop performance metrics in order to quantify success
42
Theoretical Foundations
Issues surrounding the dynamics of the model:
  What laws characterize the macroscopic behavior of GAs?
  How do microscopic events give rise to this macroscopic behavior?
43
Theoretical Foundation
Holland’s motivation:
  Construct a theoretical framework for adaptive systems as seen in nature
  Apply this framework to the design of artificial adaptive systems
Issues in performance evaluation:
  According to what criteria should GAs be evaluated?
  What does it mean for a GA to do well or poorly?
  Under what conditions is a GA an appropriate solution strategy for a problem?
44
Theoretical Foundation
Holland’s observations:
  An adaptive system must persistently identify, test, and incorporate structural properties hypothesized to give better performance in some environment
  Adaptation is impossible in a sufficiently random environment
45
Theoretical Foundation
Holland’s intuition:
  A GA is capable of modeling the necessary tasks in an adaptive system
  It does so through a combination of explicit computation and implicit estimation of state, combined with incremental change of state in directions motivated by these calculations
46
Theoretical Foundation
Holland’s assertion:
  The ‘identify and test’ requirement is satisfied by the calculation of the fitnesses of various schemas
  The ‘incorporate’ requirement is satisfied by implication of the Schema Theorem
47
Theoretical Foundation
How does a GA identify and test properties?
  A schema is the formalization of a property
  A GA explicitly calculates fitnesses of individuals and thereby schemas in the population
  It implicitly estimates fitnesses of hypothetical individuals sharing known schemas
  In this way it efficiently manages information regarding the entire search space
48
Theoretical Foundation
How does a GA incorporate observed good properties into the population?
  Implication of the Schema Theorem:
    Short, low-order schemas of higher-than-average fitness will receive exponentially increasing numbers of samples over time
49
Theoretical Foundation
Lemmas to the Schema Theorem:
  Selection focuses the search
  Crossover combines good schemas
  Mutation is the insurance policy
50
Theoretical Foundation
Holland’s characterization:
  Adaptation in natural systems is framed by a tension between exploration and exploitation
  Any move toward the testing of previously unseen schemas, or of those with instances of low fitness, takes away from the wholesale incorporation of known high-fitness schemas
  But without exploration, schemas of even higher fitness cannot be discovered
51
Theoretical Foundation
Goal of Holland’s first offering:
  The original GA was proposed as an “adaptive plan” for accomplishing a proper balance between exploration and exploitation
52
Theoretical Foundation
GA does in fact model this
  Given certain assumptions, the balance is achieved
  A key assumption is that the observed and actual fitnesses of schemas are correlated
  This assumption creates a stumbling block to which we will return
53
Traveling Salesperson Problem
Find the minimum distance tour around a set of cities, visiting each city only once and ending back where you started from.
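The objective function for this problem is the length of the closed tour. A minimal sketch follows, assuming cities are given as 2D coordinates looked up by city label (the `coords` mapping and the Euclidean metric are assumptions; the slides do not give city positions).

```python
import math

def tour_length(tour, coords):
    """Total Euclidean length of a closed tour.

    tour   -- sequence of city labels, each visited once
    coords -- mapping from city label to an (x, y) position
    """
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )
```

The modulo index closes the loop from the last city back to the first. A GA would minimise this length, for example by using its negative or its reciprocal as the fitness.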
54
Initial Population for TSP
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (4,3,6,2,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
55
Select Parents
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (4,3,6,2,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
Try to pick the better ones.
56
Create Off-Spring – 1 point
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (4,3,6,2,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(3,4,5,6,2)
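Plain one-point crossover would produce tours with repeated cities, so the offspring must be repaired or built differently. One common variant, sketched here as an assumption since the slide does not say which operator or which parents were used, keeps the head of one parent and fills the rest with the missing cities in the order they appear in the other parent.

```python
def one_point_order_crossover(parent_a, parent_b, cut):
    """Keep parent_a up to `cut`, then append the remaining cities
    in the order in which they occur in parent_b."""
    head = tuple(parent_a[:cut])
    tail = tuple(city for city in parent_b if city not in head)
    return head + tail

# One pair from the population that would yield the offspring shown
# (a guess: the slide does not identify the parents).
print(one_point_order_crossover((3, 4, 5, 2, 6), (3, 5, 4, 6, 2), 3))
```

Every offspring built this way is a valid permutation when both parents contain the same set of cities.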
57
(3,4,5,6,2)
Create More Offspring
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (4,3,6,2,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(5,4,2,6,3)
58
(3,4,5,6,2) (5,4,2,6,3)
Mutate
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (4,3,6,2,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
59
Mutate
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (2,3,6,4,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(3,4,5,6,2) (5,4,2,6,3)
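The change visible in the population above, (4,3,6,2,5) becoming (2,3,6,4,5), is consistent with a swap mutation that exchanges two cities, which keeps the tour a valid permutation. A minimal sketch, with the positions passed in explicitly (in practice they would be chosen at random):

```python
def swap_mutation(tour, i, j):
    """Return a copy of the tour with the cities at positions i and j exchanged."""
    mutated = list(tour)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return tuple(mutated)

print(swap_mutation((4, 3, 6, 2, 5), 0, 3))
```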
60
Eliminate
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (2,3,6,4,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
Tend to kill off the worst ones.
(3,4,5,6,2) (5,4,2,6,3)
61
Integrate
(5,3,4,6,2) (2,4,6,3,5)
(2,3,6,4,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(3,4,5,6,2)
(5,4,2,6,3)
62
Restart
(5,3,4,6,2) (2,4,6,3,5)
(2,3,6,4,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(3,4,5,6,2)
(5,4,2,6,3)
63
Genetic Algorithms
Facts:
  Very robust but slow
  Can make simulated annealing seem fast
  In the limit, optimal
64
Other GA-TSP Possibilities
Ordinal Representation
Partially-Mapped Crossover
Edge Recombination Crossover
Problem:
  Operators are not sufficiently exploiting the proper “building blocks” used to create new solutions.
65
Genetic Algorithms
Some ideas:
  Parallelism
  Punctuated equilibria
  Jump starting
  Problem-specific information
  Synthesize with simulated annealing
  Perturbation operator