Unconventional Optimization
David Corne, School of Mathematical & Computer Sciences (MACS), Heriot-Watt University, Edinburgh, UK
• About Heriot-Watt University
- born: 1821
- address: Edinburgh, UK
- very focussed on science and technology
MACS by numbers
~50 Mathematicians (academics)
~25 Computer scientists (academics)
~50 current funded projects (~£18M)
~30 current industry collaborators
~2 Mathematics Depts
~1 world-leading paper published every 6 weeks
~1 world-class paper published every 2 weeks
International Collaborations
(MACS)
Broadly speaking, we do ...
High performance computing (languages, numerical analysis) 4M
Mathematical Physics (structure of universe, & other things) 0.5M
Mathematical Biology (spread of diseases, pop dynamics, tumour growth) 0.3M
Financial Mathematics (actuarial, etc.) 0.5M
Stochastic Systems (energy supply/demand, carbon markets,...) 1.2M
Voice/Dialogue Interaction (and – touch, gesture, multi-) 5M
Intelligent Systems (designing, optimising, planning, virtual reality) 3.5M
Pervasive, Ubiquitous & Mobile Computing (World: “Hello”) 3M
Analysis (PDEs, dynamical systems, nonlinearity) 0.4M
And, of course, much much more ...
Enhancing recovery from oil wells (using various methods from optimisation and machine learning)
Modelling energy consumption and social interaction in an eco-village (social networks save energy)
Multiobjective probability collectives
Self-organisation in LTE networks
Topic and structure discovery in research texts
Forecasting behaviour of large-scale IT systems
Framework: problem instance k → Algorithm → solution to problem instance k
Two CEC 2011 Plenaries / Two Visions of Optimization
1. A tested vision from Holger Hoos – automated optimizer parameter/configuration tuning; humans are very bad at this – algorithms are good at it
2. A less-tested vision from Nat Krasnogor - algorithms will self-generate to work well in their environment, like a network of veins grows and adapts to irrigate its environment
A less conventional vision
• Holger and Nat were (mainly) talking about designing standard optimizers (that in themselves work in a conventional way), but new ways to design them, exploiting increasingly available processing power
• My ‘vision’ is different:
– Step changes in processing power and memory suggest a different kind of optimizer may be possible
– One that exploits memory
Conventional Optimization
Real problem instances
→ formulate as single-objective problems
→ develop slow algorithms that do well on test instances
... with no guarantees on how well the algorithms generalise

Unconventional Optimization
Real problem instances
→ formulate as multi-objective
→ develop fast algorithms that do well on spaces of test instances
→ with a principled understanding of how they generalize
Unconventional Optimization
Develop fast algorithms that do well on spaces of test instances:
• Hyper-heuristics / Super-heuristics
• Optimization via Precomputation
Unconventional Optimization
• Hyper-heuristics: now ~30,000 hits on Google Scholar
• Much industry uptake
• Rooted in “Fang, Corne, Ross” work in the mid-90s
• Basic idea: don’t evolve a solution to a given problem instance; evolve an algorithm that builds a solution to any given instance
• Pursued in two directions:
– Not very interesting: use it as a fancy encoding, just solving one instance at a time
– Very, very interesting: evolve very fast algorithms that give workable solutions for entire subspaces of problems (Ross, Terashima-Marin, Vella, Corne, ...)
Hyper-heuristics / Super-heuristics
Hyper-Heuristics 1990-2008
Unconventional Computing 1960-2008
`Most’ problems in industry …
• Need to be solved hourly, daily or weekly
• Need to be solved almost instantly (also think of green computing)
Near-optimality is not very important:
• Data/conditions will change
• The problem solved is not the real problem anyway
motivation for hyper (super)-heuristics
The space of optimisation problem instances (including machine learning) in industry, science, commerce, etc.
[Figure: axes are “how fast we solve an instance” (very slow → very fast) and “how good the solution needs to be” (poor/random → OK → best possible)]
We have supplied lots of the slow, best-possible corner, but it is not what is really needed in a lot of scenarios.
The fast, OK region is supplied by super-heuristics (also OvP).
Notes...
1. Extremely fast, but low-power-consumption, algorithms are a huge challenge for lots of problems. Think of greedy, constructive algorithms which achieve good, maybe optimal, solutions to large problems with the first solution they construct.
2. If such existed, they could of course be engineered to be slower, but more optimal …
3. The techniques involved can give us real, new scientific insights into the relationships between problems, heuristics and solutions. This and point 2 are salient for all problems.
Hoos space: standard human-designed ingredients for standard algorithms (broadly speaking!) -- configured and optimised automatically to give better or faster solutions.
Corne space: very fast algorithms that give `good enough’ solutions. Generalisation: design of training instances for the target space of real instances.
Slow algorithms, like EAs, SA, ILS, etc. do have their place, and I use them all the time, but I think the best use for them is as the learning method in SH.
Early days: Evolving Heuristic Choice
Fang, Ross, Corne, “A promising …”, Proc. ECAI, 1994
An example Open-Shop Scheduling problem solved to optimality
Early days: Evolving Heuristic Choice
Fang et al used an encoding like this:
a, b, c, d, … means:
use heuristic a to choose a task from the bth uncompleted job, and schedule it in the earliest place it can go;
use heuristic c to choose …
Examples of heuristics are:
LPT (choose the task with the longest processing time)
SPT (choose the task with the shortest processing time)
SG-LPT (consider operations that can fit in a gap; choose the one with the longest processing time), etc.
Results were ‘promising’
For a brief time, this was the best recorded result on the 10x10 benchmark
We noticed ...
5, 7, 2, 9, 7, 1, 1, 4, 5, 3
is a ‘chromosome’ that specifies a solution to our 5-job OSSP.
It is also an algorithm that can be applied to any 5-job OSSP.
So, we had evolved fast algorithms, but not particularly good ones: restricted to a fixed number of tasks, and seeming to ignore instance specifics.
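The chromosome-as-algorithm observation can be made concrete with a small sketch. This is not Fang et al’s exact decoder: the heuristic set, the cyclic job indexing, and all names here are illustrative assumptions; it only shows how the same gene string can be replayed against any instance with the same number of jobs.

```python
# Sketch of heuristic-choice decoding (illustrative, not Fang et al's exact scheme).
# A chromosome is read as (heuristic, job) pairs: "use this heuristic to pick a
# task from that uncompleted job, then schedule it next". 'jobs' maps a job index
# to its remaining task processing times.

HEURISTICS = {
    0: lambda tasks: max(tasks),   # LPT: longest processing time
    1: lambda tasks: min(tasks),   # SPT: shortest processing time
}

def decode(chromosome, jobs):
    """Turn a chromosome into a greedy task sequence for the given instance."""
    remaining = {j: list(ts) for j, ts in jobs.items()}
    sequence = []
    for h, j in zip(chromosome[0::2], chromosome[1::2]):
        # j indexes the j-th *uncompleted* job, cyclically over those left
        live = [job for job in sorted(remaining) if remaining[job]]
        if not live:
            break
        job = live[j % len(live)]
        task = HEURISTICS[h % len(HEURISTICS)](remaining[job])
        remaining[job].remove(task)
        sequence.append((job, task))
    return sequence

jobs = {0: [3, 7], 1: [2, 5], 2: [4, 4]}
print(decode([0, 2, 1, 0, 0, 1], jobs))  # [(2, 4), (0, 3), (1, 5)]
```

The same chromosome, replayed on a different 3-job instance, yields a different schedule: the genes encode decisions, not positions.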
Better choices for fast constructive algorithms
Example of a problem state vector: 5 items remaining (3 large, 2 small), 0 bins more than 50% packed, etc.
Another problem state vector: 3 items remaining (1 large, 2 small), 1 bin more than 50% packed, etc.
Rules for Problem State → Heuristic Choice
Problem state: for bin packing, maybe:
• a vector describing the distribution of sizes of items remaining to pack;
• a vector describing the current state of the partly packed bins;
• etc. It is easy to think of corresponding characterisations for other problems.
Sets of State -> Heuristic rules
If state = S1 use heuristic H3
If state = S2 use heuristic H1
If state = S3 use heuristic H7
If state = S4 use heuristic H4
If state = S5 use heuristic H2
Default: either use the best-matching state, or use a default heuristic, e.g. H1
This is the basis of a constructive algorithm.
A constructive algorithm using a <problem state> → <heuristic> ruleset:
  Initialise solution
  Identify initial problem state S
  Repeat nitems times:
    determine rule R whose LHS best matches problem state S
    {schedule/place/set} next item using the heuristic on the RHS of R
    update state S
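The loop above can be sketched for one-dimensional bin packing. The state features, the ruleset, and the two heuristics here are illustrative assumptions, not Ross et al’s actual ones; the point is only the closest-state-match dispatch.

```python
# Minimal state -> heuristic constructive loop for 1-D bin packing.
# State vector, rules and heuristics are illustrative assumptions.

def state(items, bins, cap):
    """Tiny state vector: (fraction of large items, fraction of bins >50% full)."""
    large = sum(1 for x in items if x > cap / 2) / len(items) if items else 0
    full = sum(1 for b in bins if sum(b) > cap / 2) / len(bins) if bins else 0
    return (large, full)

def first_fit(item, bins, cap):
    for b in bins:
        if sum(b) + item <= cap:
            b.append(item)
            return
    bins.append([item])

def new_bin(item, bins, cap):
    bins.append([item])

# Ruleset: state vector (LHS) -> heuristic (RHS), applied by nearest-state match.
RULES = [((1.0, 0.0), new_bin), ((0.0, 0.0), first_fit), ((0.0, 1.0), first_fit)]

def construct(items, cap):
    bins, items = [], sorted(items, reverse=True)
    while items:
        s = state(items, bins, cap)
        # pick the rule whose LHS is closest to the current state
        _, h = min(RULES, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], s)))
        h(items.pop(0), bins, cap)
    return bins

bins = construct([6, 5, 4, 3, 2], cap=10)  # packs all items into 2 bins
```

Note the speed: one pass over the items, one nearest-rule lookup per item, and the first solution built is the answer.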
About such ‘super-heuristic’ encodings
• Fast, constructive algorithms
• Very, very fast compared with something like evolutionary algorithms or simulated annealing
• We could just use this as a sophisticated encoding for an EA solver that solves one instance at a time, but that’s wasteful and blinkered.
• The idea of SH is to set up an appropriate search or learning method (EA, classifier system, NN, etc.) to find a single ruleset that works well over a given space of instances.
Ross et al evolved algorithms that looked like this.
The evolved algorithms are rule-sets. The interpretation is:
Repeat until no more items: {execute the rule with the closest state match to the current problem state}
An example rule-set with 12 rules.
Some of Ross et al’s results:
Note: three different SH tested (GA, XCSs and XCSm); reported for the best result of each GA or XCS run.
E.g. on 5.5% of the test problems, the SH found by the GA found a solution with 1 bin fewer than the best found by the four heuristics (LFD etc.)
Terashima-Marin et al: stock-cutting with SH.
The power of automated algorithm configuration
Hoos and co-workers use “ParamILS” (a fairly standard iterated local search) to optimise algorithm parameters and decision variables; originally ParamILS was used to improve “SPEAR”, an algorithm for SAT problems. Here tested on real software verification benchmarks:
[Results figure]
• http://people.cs.ubc.ca/~hoos/Talks/lion-4-tutorial-slides.pdf
Unconventional Optimization: Optimization via Precomputation
Computers are faster and faster, storage is cheap. Here is a really simple idea.
The conventional route:
• Client has a daily VRP
• Develop an EA/PSO/SA/etc. to solve instances of the client’s VRP (3 months, say)
• Deliver an app that takes ~10m to find a near-optimal solution
The precomputation route:
• Client has a daily VRP
• Generate and solve every problem the client might face (3 months, say)
• Deliver an app that looks up the solution to the instance presented
Why not?
• Customers: 30; each day’s problem is a selection of 20. 30C20 is only ~30,000,000 (almost 0). We can just solve/store them all.
• Customers: 50; each day’s problem is a selection of 20. 50C20 is ~47,000,000,000,000 (> 0). Not so easy to solve/store all, but we can solve/store some.
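The two counts above are just binomial coefficients, easy to check:

```python
# How many distinct daily problems (subsets of 20 customers) exist
# for a 30-customer vs a 50-customer client.
from math import comb

print(comb(30, 20))  # 30045015        (~30 million: feasible to pre-solve all)
print(comb(50, 20))  # 47129212243960  (~4.7e13: can only pre-solve a sample)
```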
The continuum of opportunity
• all possible instances can be solved
• 10^-1 of instances can be solved
• 10^-2 of instances can be solved
• ...
• 10^-5 of instances can be solved
Given a new instance to solve: at one extreme, table lookup; otherwise, exploit the ‘closest’ solved instances.
Suppose a problem instance has m features, each with k possible values; we have a solved instance a and a new instance b.
The probability that a and b coincide in at least c of the m features (assuming features independent and uniform) is
P = sum over i from c to m of C(m,i) (1/k)^i (1 - 1/k)^(m-i)
When k=5, the chance of a and b sharing 2/3 of their features is ~3.8×10^-8.
So if we have 100M pre-solved instances, ~3 will share 67%.
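The quoted figure can be reproduced numerically. The slide does not state m; taking m=30 features (so 2/3 shared means c=20) is an assumption, chosen because it matches the 3.8×10^-8 figure:

```python
# Probability that two instances, uniform over k values per feature,
# agree on at least c of m features. m=30 is an assumed illustration.
from math import comb

def p_share_at_least(m, c, k):
    p = 1 / k
    return sum(comb(m, i) * p**i * (1 - p)**(m - i) for i in range(c, m + 1))

prob = p_share_at_least(30, 20, 5)
print(f"{prob:.2e}")       # ~3.8e-08
print(100_000_000 * prob)  # ~3.8: expected matches among 100M pre-solved instances
```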
Nota Bene ...
– the instance space may be far smaller than an instance’s search space
– there are often good-quality priors over instances
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
– E.g. 10-job single-machine job shop problem, instances defined by processing times and due dates:
• Order tasks by processing time, normalize the resulting due-date vector; Euclidean distance on that?
• Distance between a 12-job and a 10-job instance?
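The single-machine suggestion can be sketched directly; the normalization choice and function names here are assumptions, not a stated design:

```python
# Sketch of the suggested SMT instance distance: order tasks by processing
# time, normalize the resulting due-date vector, compare by Euclidean distance.
import math

def signature(tasks):
    """tasks: list of (processing_time, due_date). Returns the due-date
    vector ordered by processing time, normalized to sum to 1."""
    dues = [d for _, d in sorted(tasks)]
    total = sum(dues) or 1
    return [d / total for d in dues]

def instance_distance(a, b):
    # only defined for same-size instances; the 12-job vs 10-job case is open
    return math.dist(signature(a), signature(b))

a = [(3, 10), (1, 4), (2, 6)]
b = [(2, 5), (3, 12), (1, 4)]
print(instance_distance(a, b))
```

The open question on the slide remains open in the sketch: comparing instances with different numbers of jobs needs some alignment or padding scheme.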
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
– E.g. 30-customer vehicle routing problem
• Set overlap between the subsets of customers in the two instances?
• An optimized geographical alignment between the two sets of customers?
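The set-overlap suggestion is the simplest of the two: Jaccard distance between the customer subsets of two daily instances (the choice of Jaccard, rather than raw overlap, is an assumption):

```python
# Instance distance for a daily VRP as set overlap between customer subsets.
def jaccard_distance(a, b):
    """1 - |intersection| / |union| over two sets of customer ids."""
    return 1 - len(a & b) / len(a | b)

today = {1, 2, 3, 4, 5}
solved = {1, 2, 3, 6, 7}
print(jaccard_distance(today, solved))  # 1 - 3/7 ~= 0.571
```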
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
• How to exploit the close pre-computed instances
– Seed a population-based search with the solutions of the closest k, perhaps with mutations?
– Use the closest k to bias neighbourhood operators? To impose constraints?
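The first exploitation option can be sketched as follows; the round-robin fill, swap mutation and permutation encoding are illustrative assumptions:

```python
# Seed a population from mutated copies of the solutions of the k closest
# pre-solved instances (permutation solutions, routing-style problem).
import random

def mutate(tour, swaps=2):
    t = list(tour)
    for _ in range(swaps):
        i, j = random.sample(range(len(t)), 2)
        t[i], t[j] = t[j], t[i]
    return t

def seed_population(closest_solutions, pop_size):
    """Round-robin over the nearest solved instances' solutions."""
    return [mutate(closest_solutions[i % len(closest_solutions)])
            for i in range(pop_size)]

nearest = [[0, 1, 2, 3, 4], [0, 2, 1, 4, 3]]   # solutions of the 2 closest instances
pop = seed_population(nearest, pop_size=10)
```

A population-based search then starts from this pool instead of random tours; whether that actually helps is exactly what the seeding experiments later in the talk measure.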
Optimization via Precomputation: opportunities and issues
• The distance measure between problem instances
• How to exploit the close pre-computed instances
• Mapping the solution of instance a into a candidate solution for instance b (in some cases far from obvious!)
• Fast retrieval of the closest instances
Of course, OvP is similar in nature to Case-Based Reasoning (CBR), but completely different in application context and research issues.
Potential Applicability
• Where regular instances come from a ‘stable’ with its own characteristic space, e.g. a specific florist in Helsinki, a button manufacturer in Ningbo province, a commercial TV station in the southern USA
• Where there is a need to quickly change solutions to adapt to new circumstances
• Wherever else there is a great need or desire to reduce the time taken to find solutions
the crucial element ...
OvP can only be effective if, given a pre-solved instance a and a new instance n, the solution of a is helpful in solving n.
To what extent is it helpful to seed a population with the solution to a similar instance?
[Figure: SMT (single-machine tardiness, 50 tasks): success of seeding (out of 1000) vs. number of tasks, by perturbation distance between instance and seed instance]
[Figure: VRP (50 deliveries): success of seeding (out of 1000) vs. number of deliveries]
Realistic VRPs
@Marta_Vallejo’s work
Full-featured VRP: several vehicles, 1,000 customers in a 25 km² region
Instance-distance metric and solution-to-solution mapping, mediated by geographical customer alignment
Good news: it’s over
• Industry (etc.) usually needs very fast solvers that produce fairly good solutions
• Humans are very, very, very bad at designing and configuring such solvers, but they can often propose good ingredients
• Directions:
– Automated algorithm configuration: very successful, but not necessarily oriented towards fast solvers
– Hyper-heuristics / super-heuristics: very successful
– Optimisation via Precomputation: promising ...
• Theoretical issues highly under-explored in each case:
– E.g. generalisation performance of solvers (I am looking at computational learning theory for this)
– Instance-distance / exploitation landscapes around an optimum