Unconventional Optimization
David Corne, School of Mathematical & Computer Sciences (MACS), Heriot-Watt University, Edinburgh, UK
• About Heriot-Watt University
- born: 1821
- address: Edinburgh, UK
- very focussed on science and technology
MACS by numbers
~50 Mathematicians (academics)
~25 Computer scientists (academics)
~50 current funded projects (~£18M)
~30 current industry collaborators
~2 Mathematics Depts
~1 world-leading paper published every 6 weeks
~1 world-class paper published every 2 weeks
International Collaborations
(MACS)
Broadly speaking, we do ...
High performance computing (languages, numerical analysis) 4M
Mathematical Physics (structure of universe, & other things) 0.5M
Mathematical Biology (spread of diseases, pop dynamics, tumour growth) 0.3M
Financial Mathematics (actuarial, etc.) 0.5M
Stochastic Systems (energy supply/demand, carbon markets,...) 1.2M
Voice/Dialogue Interaction (and – touch, gesture, multi-) 5M
Intelligent Systems (designing, optimising, planning, virtual reality) 3.5M
Pervasive, Ubiquitous & Mobile Computing (World: “Hello”) 3M
Analysis (PDEs, dynamical systems, nonlinearity) 0.4M
And, of course, much much more ...
Enhancing recovery from oil wells (using various methods from optimisation and machine learning)
Modelling energy consumption and social interaction in an eco-village (social networks save energy)
Multiobjective probability collectives
Self-organisation in LTE networks
Topic and structure discovery in research texts
Forecasting behaviour of large-scale IT systems
Framework: problem instance k → Algorithm → solution to problem instance k
Two CEC 2011 Plenaries / Two Visions of Optimization
1. A tested vision from Holger Hoos – automated optimizer parameter/configuration tuning; humans are very bad at this – algorithms are good at it
2. A less-tested vision from Nat Krasnogor - algorithms will self-generate to work well in their environment, like a network of veins grows and adapts to irrigate its environment
A less conventional vision
• Holger and Nat were (mainly) talking about designing standard optimizers (that in themselves work in a conventional way), but new ways to design them, exploiting increasingly available processing power
• My ‘vision’ is different:
– Step changes in processing power and memory suggest a different kind of optimizer may be possible
– One that exploits memory
Conventional Optimization
Real problem instances
→ formulate as single-objective problems
→ develop slow algorithms that do well on test instances
... with no guarantees on how well the algorithms generalise

Unconventional Optimization
Real problem instances
→ formulate as multi-objective
→ develop fast algorithms that do well on spaces of test instances
→ with a principled understanding of how they generalize
Unconventional Optimization
Develop fast algorithms that do well on spaces of test instances:
• Hyper-heuristics / Super-heuristics
• Optimization via Precomputation
Unconventional Optimization
• Hyper-heuristics: now ~30,000 hits on Google Scholar
• Much industry uptake
• Rooted in “Fang, Corne, Ross” work in the mid-90s
• Basic idea: don’t evolve a solution to a given problem instance; evolve an algorithm that builds a solution to any given instance
• Pursued in two directions:
– Not very interesting: use it as a fancy encoding, just solving one instance at a time
– Very, very interesting: evolve very fast algorithms that give workable solutions for entire subspaces of problems (Ross, Terashima-Marin, Vella, Corne, ...)
Hyper-heuristics / Super-heuristics
Hyper-Heuristics 1990-2008
Unconventional Computing 1960-2008
`Most’ problems in industry …
• Need to be solved hourly, daily or weekly
• Need to be solved almost instantly (also think of green computing)
Near-optimality is not very important:
• Data/conditions will change
• The problem solved is not the real problem anyway
motivation for hyper (super)-heuristics
The space of optimisation problem instances (including machine learning) in industry, science, commerce, etc.
[Figure: axes are “how fast we solve an instance” (very slow → very fast) and “how good the solution needs to be” (poor/random → OK → best possible)]
We have supplied lots of the slow, best-possible corner, but it is not what is really needed in a lot of scenarios.
The fast, OK region is supplied by super-heuristics (also OvP).
Notes...
1. Extremely fast, but low-power-consumption, algorithms are a huge challenge for lots of problems. Think of greedy, constructive algorithms which achieve good, maybe optimal, solutions to large problems with the first solution they construct.
2. If such existed, they could of course be engineered to be slower, but more optimal …
3. The techniques involved can give us real, new scientific insights into the relationships between problems, heuristics and solutions. This and point 2 are salient for all problems.
Hoos space: standard human-designed ingredients for standard algorithms (broadly speaking!) -- configured and optimised automatically to give better or faster solutions.
Corne space: very fast algorithms that give `good enough’ solutions. Generalisation: design of training instances for the target space of real instances.
Slow algorithms, like EAs, SA, ILS, etc. do have their place, and I use them all the time, but I think the best use for them is as the learning method in SH.
Early days: Evolving Heuristic Choice
Fang, Ross, Corne, “A promising …”, Proc. ECAI, 1994
An example Open-Shop Scheduling problem solved to optimality
Early days: Evolving Heuristic Choice
Fang et al used an encoding like this:
a, b, c, d, … means:
use heuristic a to choose a task from the bth uncompleted job, and schedule it in the earliest place it can go;
use heuristic c to choose …
Examples of heuristics are:
LPT (choose the task with the longest processing time)
SPT (choose the task with the shortest processing time)
SG-LPT (consider operations that can fit in a gap; choose the one with the longest processing time), etc.
Results were ‘promising’
For a brief time, this was the best recorded result on the 10x10 benchmark
We noticed ...
5, 7, 2, 9, 7, 1, 1, 4, 5, 3
is a ‘chromosome’ that specifies a solution to our 5-job OSSP.
It is also an algorithm that can be applied to any 5-job OSSP.
So, we had evolved fast algorithms, but not particularly good ones: restricted to a fixed number of tasks, and seeming to ignore instance specifics.
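The chromosome-as-algorithm observation can be made concrete with a small sketch. This is not Fang et al’s exact decoder: the heuristic set, the cyclic job indexing, and all names here are illustrative assumptions; it only shows how the same gene string can be replayed against any instance with the same number of jobs.

```python
# Sketch of heuristic-choice decoding (illustrative, not Fang et al's exact scheme).
# A chromosome is read as (heuristic, job) pairs: "use this heuristic to pick a
# task from that uncompleted job, then schedule it next". 'jobs' maps a job index
# to its remaining task processing times.

HEURISTICS = {
    0: lambda tasks: max(tasks),   # LPT: longest processing time
    1: lambda tasks: min(tasks),   # SPT: shortest processing time
}

def decode(chromosome, jobs):
    """Turn a chromosome into a greedy task sequence for the given instance."""
    remaining = {j: list(ts) for j, ts in jobs.items()}
    sequence = []
    for h, j in zip(chromosome[0::2], chromosome[1::2]):
        # j indexes the j-th *uncompleted* job, cyclically over those left
        live = [job for job in sorted(remaining) if remaining[job]]
        if not live:
            break
        job = live[j % len(live)]
        task = HEURISTICS[h % len(HEURISTICS)](remaining[job])
        remaining[job].remove(task)
        sequence.append((job, task))
    return sequence

jobs = {0: [3, 7], 1: [2, 5], 2: [4, 4]}
print(decode([0, 2, 1, 0, 0, 1], jobs))  # [(2, 4), (0, 3), (1, 5)]
```

The same chromosome, replayed on a different 3-job instance, yields a different schedule: the genes encode decisions, not positions.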
Better choices for fast constructive algorithms
Example of a problem state vector: 5 items remaining (3 large, 2 small), 0 bins more than 50% packed, etc.
Another problem state vector: 3 items remaining (1 large, 2 small), 1 bin more than 50% packed, etc.
Rules for Problem State → Heuristic Choice
Problem state: for bin packing, maybe:
• a vector describing the distribution of sizes of items remaining to pack;
• a vector describing the current state of the partly packed bins;
• etc. It is easy to think of corresponding characterisations for other problems.
Sets of State -> Heuristic rules
If state = S1 use heuristic H3
If state = S2 use heuristic H1
If state = S3 use heuristic H7
If state = S4 use heuristic H4
If state = S5 use heuristic H2
Default: either use the best-matching state, or use a default heuristic, e.g. H1
This is the basis of a constructive algorithm.
A constructive algorithm using a <problem state> → <heuristic> ruleset:
  Initialise solution
  Identify initial problem state S
  Repeat nitems times:
    determine rule R whose LHS best matches problem state S
    {schedule/place/set} next item using the heuristic on the RHS of R
    update state S
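The loop above can be sketched for one-dimensional bin packing. The state features, the ruleset, and the two heuristics here are illustrative assumptions, not Ross et al’s actual ones; the point is only the closest-state-match dispatch.

```python
# Minimal state -> heuristic constructive loop for 1-D bin packing.
# State vector, rules and heuristics are illustrative assumptions.

def state(items, bins, cap):
    """Tiny state vector: (fraction of large items, fraction of bins >50% full)."""
    large = sum(1 for x in items if x > cap / 2) / len(items) if items else 0
    full = sum(1 for b in bins if sum(b) > cap / 2) / len(bins) if bins else 0
    return (large, full)

def first_fit(item, bins, cap):
    for b in bins:
        if sum(b) + item <= cap:
            b.append(item)
            return
    bins.append([item])

def new_bin(item, bins, cap):
    bins.append([item])

# Ruleset: state vector (LHS) -> heuristic (RHS), applied by nearest-state match.
RULES = [((1.0, 0.0), new_bin), ((0.0, 0.0), first_fit), ((0.0, 1.0), first_fit)]

def construct(items, cap):
    bins, items = [], sorted(items, reverse=True)
    while items:
        s = state(items, bins, cap)
        # pick the rule whose LHS is closest to the current state
        _, h = min(RULES, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], s)))
        h(items.pop(0), bins, cap)
    return bins

bins = construct([6, 5, 4, 3, 2], cap=10)  # packs all items into 2 bins
```

Note the speed: one pass over the items, one nearest-rule lookup per item, and the first solution built is the answer.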
About such ‘super-heuristic’ encodings
• Fast, constructive algorithms
• Very, very fast compared with something like evolutionary algorithms or simulated annealing
• We could just use this as a sophisticated encoding for an EA solver that solves one instance at a time, but that’s wasteful and blinkered.
• The idea of SH is to set up an appropriate search or learning method (EA, classifier system, NN, etc.) to find a single ruleset that works well over a given space of instances.
Ross et al evolved algorithms that looked like this.
The evolved algorithms are rule-sets. The interpretation is:
Repeat until no more items: {execute the rule with the closest state match to the current problem state}
An example rule-set with 12 rules.
Some of Ross et al’s results:
Note: three different SH tested (GA, XCSs and XCSm); reported for the best result of each GA or XCS run.
E.g. on 5.5% of the test problems, the SH found by the GA found a solution with 1 bin fewer than the best found by the four heuristics (LFD etc.)
Terashima-Marin et al: stock-cutting with SH.
The power of automated algorithm configuration
Hoos and co-workers use “ParamILS” (a fairly standard iterated local search) to optimise algorithm parameters and decision variables; originally ParamILS was used to improve “SPEAR”, an algorithm for SAT problems. Here tested on real software verification benchmarks:
[Results figure]
• http://people.cs.ubc.ca/~hoos/Talks/lion-4-tutorial-slides.pdf
Unconventional Optimization: Optimization via Precomputation
Computers are faster and faster, storage is cheap. Here is a really simple idea.
The conventional route:
• Client has a daily VRP
• Develop an EA/PSO/SA/etc. to solve instances of the client’s VRP (3 months, say)
• Deliver an app that takes ~10m to find a near-optimal solution
The precomputation route:
• Client has a daily VRP
• Generate and solve every problem the client might face (3 months, say)
• Deliver an app that looks up the solution to the instance presented
Why not?
• Customers: 30; each day’s problem is a selection of 20. 30C20 is only ~30,000,000 (almost 0). We can just solve/store them all.
• Customers: 50; each day’s problem is a selection of 20. 50C20 is ~47,000,000,000,000 (> 0). Not so easy to solve/store all, but we can solve/store some.
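The two counts above are just binomial coefficients, easy to check:

```python
# How many distinct daily problems (subsets of 20 customers) exist
# for a 30-customer vs a 50-customer client.
from math import comb

print(comb(30, 20))  # 30045015        (~30 million: feasible to pre-solve all)
print(comb(50, 20))  # 47129212243960  (~4.7e13: can only pre-solve a sample)
```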
The continuum of opportunity
• all possible instances can be solved
• 10^-1 of instances can be solved
• 10^-2 of instances can be solved
• ...
• 10^-5 of instances can be solved
Given a new instance to solve: at one extreme, table lookup; otherwise, exploit the ‘closest’ solved instances.
Suppose a problem instance has m features, each with k possible values; we have a solved instance a and a new instance b.
The probability that a and b coincide in at least c of the m features (assuming features independent and uniform) is
P = sum over i from c to m of C(m,i) (1/k)^i (1 - 1/k)^(m-i)
When k=5, the chance of a and b sharing 2/3 of their features is ~3.8×10^-8.
So if we have 100M pre-solved instances, ~3 will share 67%.
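The quoted figure can be reproduced numerically. The slide does not state m; taking m=30 features (so 2/3 shared means c=20) is an assumption, chosen because it matches the 3.8×10^-8 figure:

```python
# Probability that two instances, uniform over k values per feature,
# agree on at least c of m features. m=30 is an assumed illustration.
from math import comb

def p_share_at_least(m, c, k):
    p = 1 / k
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(c, m + 1))

prob = p_share_at_least(30, 20, 5)
print(f"{prob:.2e}")       # ~3.8e-08
print(100_000_000 * prob)  # ~3.8: expected matches among 100M pre-solved instances
```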
Nota Bene ...
– the instance space may be far smaller than an instance’s search space
– there are often good-quality priors over instances
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
– E.g. 10-job single-machine job shop problem, instances defined by processing times and due dates:
• Order tasks by processing time, normalize the resulting due-date vector; Euclidean distance on that?
• Distance between a 12-job and a 10-job instance?
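The single-machine suggestion can be sketched directly; the normalization choice and function names here are assumptions, not a stated design:

```python
# Sketch of the suggested SMT instance distance: order tasks by processing
# time, normalize the resulting due-date vector, compare by Euclidean distance.
import math

def signature(tasks):
    """tasks: list of (processing_time, due_date). Returns the due-date
    vector ordered by processing time, normalized to sum to 1."""
    dues = [d for _, d in sorted(tasks)]
    total = sum(dues) or 1
    return [d / total for d in dues]

def instance_distance(a, b):
    # only defined for same-size instances; the 12-job vs 10-job case is open
    return math.dist(signature(a), signature(b))

a = [(3, 10), (1, 4), (2, 6)]
b = [(2, 5), (3, 12), (1, 4)]
print(instance_distance(a, b))
```

The open question on the slide remains open in the sketch: comparing instances with different numbers of jobs needs some alignment or padding scheme.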
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
– E.g. 30-customer vehicle routing problem
• Set overlap between the subsets of customers in the two instances?
• An optimized geographical alignment between the two sets of customers?
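The set-overlap suggestion is the simplest of the two: Jaccard distance between the customer subsets of two daily instances (the choice of Jaccard, rather than raw overlap, is an assumption):

```python
# Instance distance for a daily VRP as set overlap between customer subsets.
def jaccard_distance(a, b):
    """1 - |intersection| / |union| over two sets of customer ids."""
    return 1 - len(a & b) / len(a | b)

today = {1, 2, 3, 4, 5}
solved = {1, 2, 3, 6, 7}
print(jaccard_distance(today, solved))  # 1 - 3/7 ~= 0.571
```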
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
• How to exploit the close pre-computed instances
– Seed a population-based search with the solutions of the closest k, perhaps with mutations?
– Use the closest k to bias neighbourhood operators? To impose constraints?
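The first exploitation option can be sketched as follows; the round-robin fill, swap mutation and permutation encoding are illustrative assumptions:

```python
# Seed a population from mutated copies of the solutions of the k closest
# pre-solved instances (permutation solutions, routing-style problem).
import random

def mutate(tour, swaps=2):
    t = list(tour)
    for _ in range(swaps):
        i, j = random.sample(range(len(t)), 2)
        t[i], t[j] = t[j], t[i]
    return t

def seed_population(closest_solutions, pop_size):
    """Round-robin over the nearest solved instances' solutions."""
    return [mutate(closest_solutions[i % len(closest_solutions)])
            for i in range(pop_size)]

nearest = [[0, 1, 2, 3, 4], [0, 2, 1, 4, 3]]   # solutions of the 2 closest instances
pop = seed_population(nearest, pop_size=10)
```

A population-based search then starts from this pool instead of random tours; whether that actually helps is exactly what the seeding experiments later in the talk measure.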
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
• How to exploit the close pre-computed instances
• Mapping the solution of instance a into a candidate solution for instance b (in some cases far from obvious!)
• Fast retrieval of the closest instances
Of course, OvP is similar in nature to Case-Based Reasoning (CBR), but completely different in application context and research issues.
Potential Applicability
• Where regular instances come from a ‘stable’ with its own characteristic space, e.g. a specific florist in Helsinki, a button manufacturer in Ningbo province, a commercial TV station in the southern USA
• Where there is a need to quickly change solutions to adapt to new circumstances
• Wherever else there is a great need or desire to reduce the time taken to find solutions
the crucial element ...
OvP can only be effective if, given a pre-solved instance a and a new instance n, the solution of a is helpful in solving n.
To what extent is it helpful to seed a population with the solution to a similar instance?
[Figure: SMT (single-machine tardiness, 50 tasks): success of seeding (out of 1000) vs. number of tasks, by perturbation distance between instance and seed instance]
[Figure: VRP (50 deliveries): success of seeding (out of 1000) vs. number of deliveries]
Realistic VRPs
@Marta_Vallejo’s work
Full-featured VRP: several vehicles, 1,000 customers in a 25 km² region
Instance-distance metric and solution-to-solution mapping, mediated by geographical customer alignment
Good news: it’s over
• Industry (etc.) usually needs very fast solvers that produce fairly good solutions
• Humans are very, very, very bad at designing and configuring such solvers, but they can often propose good ingredients
• Directions:
– Automated algorithm configuration: very successful, but not necessarily oriented towards fast solvers
– Hyper-heuristics / super-heuristics: very successful
– Optimisation via Precomputation: promising ...
• Theoretical issues highly under-explored in each case:
– E.g. generalisation performance of solvers (I am looking at computational learning theory for this)
– Instance-distance / exploitation landscapes around an optimum