Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA


Upload: martin-pelikan

Post on 17-May-2015


DESCRIPTION

An automated technique has recently been proposed to transfer learning in the hierarchical Bayesian optimization algorithm (hBOA) based on distance-based statistics. The technique enables practitioners to improve hBOA efficiency by collecting statistics from probabilistic models obtained in previous hBOA runs and using the obtained statistics to bias future hBOA runs on similar problems. The purpose of this paper is threefold: (1) test the technique on several classes of NP-complete problems, including MAXSAT, spin glasses and minimum vertex cover; (2) demonstrate that the technique is effective even when previous runs were done on problems of different size; (3) provide empirical evidence that combining transfer learning with other efficiency enhancement techniques can often yield nearly multiplicative speedups.

TRANSCRIPT

Page 1: Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA

Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA

Martin Pelikan Missouri Estimation of Distribution Algorithms

Laboratory (MEDAL) University of Missouri, St. Louis, MO

E-mail: [email protected] WWW: http://martinpelikan.net/

Mark W. Hauschild Missouri Estimation of Distribution Algorithms

Laboratory (MEDAL) University of Missouri, St. Louis, MO

E-mail: [email protected]

Pier Luca Lanzi Dipartimento di Elettronica e Informazione

Politecnico di Milano Milano, Italy

E-mail: [email protected] WWW: http://www.pierlucalanzi.net/

Background
• Model-directed optimizers (MDOs), such as estimation of distribution algorithms, learn and use models to solve difficult optimization problems scalably and reliably.
• MDOs often provide more than the solution; they provide a set of models that reveal information about the problem. Why not use that information in future runs?

Purpose
• Combine prior models with a problem-specific distance metric to solve new problem instances with increased speed, accuracy, and reliability.
• Focus on the hBOA algorithm and additively decomposable functions, although the approach can be generalized to other MDOs and other problem classes.
• Extend previous work, mainly to demonstrate that:
  • Previous MDO runs on smaller problems can be used to bias runs on larger problems.
  • Previous MDO runs for one problem class can be used to bias runs for another problem class.

Hierarchical Bayesian optimization algorithm, hBOA
• Models allow hBOA to learn and use problem structure.
• To build models, hBOA uses Bayesian metrics that combine the current population and prior knowledge (via a prior distribution over structures and parameters).
(Diagram: current population → selected population → Bayesian network → new population.)
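To make the flow in the diagram concrete, here is a minimal, illustrative sketch of one generation of a model-directed optimizer. The univariate model below is only a placeholder standing in for hBOA's Bayesian network with local structures, and the function and parameter names are hypothetical:

```python
import random

def eda_generation(population, fitness, n_selected, n_offspring):
    """One generation of a model-directed optimizer (simplified):
    select promising solutions, build a model, sample new solutions.
    The univariate model here is only a placeholder; hBOA learns a
    Bayesian network with local structures instead."""
    # Truncation selection: keep the best n_selected solutions.
    selected = sorted(population, key=fitness, reverse=True)[:n_selected]

    # "Model building": estimate the probability of a 1 at each position.
    n = len(selected[0])
    p_one = [sum(ind[i] for ind in selected) / len(selected) for i in range(n)]

    # Sampling: generate new candidate solutions from the model.
    offspring = [[1 if random.random() < p_one[i] else 0 for i in range(n)]
                 for _ in range(n_offspring)]

    # Replacement: elitist replacement is used here for simplicity
    # (hBOA uses restricted tournament replacement).
    combined = population + offspring
    return sorted(combined, key=fitness, reverse=True)[:len(population)]

# Example: maximize the number of ones (onemax).
if __name__ == "__main__":
    random.seed(1)
    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(50)]
    for _ in range(30):
        pop = eda_generation(pop, fitness=sum, n_selected=25, n_offspring=50)
    print(max(sum(ind) for ind in pop))
```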

Basic idea of the approach
• Outline of the approach:
  1. Define distances between problem variables.
  2. Mine probabilistic models from previous runs for model regularities with respect to distances.
  3. Use the obtained information to bias model building when solving new problem instances.
• Apply the approach to hBOA and additively decomposable functions (ADFs).
• Distance metric developed for ADFs, but other metrics can be used.
• Bias incorporated by modifying prior probabilities of network structures when building the models.
• Strength of bias can be controlled using a parameter κ > 0 (a schematic sketch of steps 2 and 3 follows below).
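A minimal sketch of steps 2 and 3 under simplifying assumptions: `prior_runs` is a hypothetical list of dependency structures (sets of variable-index pairs) taken from models saved in earlier runs, `dist` is a distance function such as the ADF metric described next, and the bias enters as a log-prior bonus for candidate edges scaled by κ. The exact form of hBOA's soft structural prior differs; this only illustrates the idea:

```python
import math
from collections import defaultdict

def mine_distance_stats(prior_runs, dist, n_vars):
    """Step 2: estimate, for each distance d, how often a dependency
    between variables at that distance appeared in prior-run models."""
    seen, possible = defaultdict(int), defaultdict(int)
    for edges in prior_runs:                      # one model per prior run
        for i in range(n_vars):
            for j in range(i + 1, n_vars):
                d = dist(i, j)
                possible[d] += 1
                if (i, j) in edges or (j, i) in edges:
                    seen[d] += 1
    return {d: seen[d] / possible[d] for d in possible}

def edge_log_prior(i, j, dist, stats, kappa, eps=1e-6):
    """Step 3: bias model building by adding a log-prior term for the
    candidate edge (i, j); kappa > 0 controls the strength of the bias."""
    p = stats.get(dist(i, j), eps)
    return kappa * math.log(max(p, eps))
```

During model building, such a term would be added to the scoring metric of any candidate network containing the edge; κ near zero leaves the search essentially unbiased, while larger κ favors dependency patterns seen in prior runs.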

Distance metric for ADFs
• ADF: f(X_1, …, X_n) = Σ_{i=1}^{m} f_i(S_i), where {S_i} are subsets of variables and {f_i} are arbitrary functions.
• Create a graph for the ADF with one node per variable, connecting variables that are in the same subproblem.
• The number of edges along the shortest path between two nodes defines their distance; for disconnected variables the distance is equal to the number of variables.
• Other distance metrics can be used (e.g., for QAP and scheduling).

Selected results
• Problem classes:
  • Nearest-neighbor NK landscapes.
  • Spin glasses (2D and 3D).
  • MAXSAT for transformed graph coloring.
  • Minimum vertex cover (MVC) for random graphs.
• Use bias from smaller problems on bigger problems, compared to bias from problems of the same size. (Panels: NK landscapes, MVC, spin glass 2D.)
• Use bias from another problem class. (Panels: NK landscapes, MVC easier, MVC harder.)
• Summary of results (many results omitted here):
  • Significant speedups obtained in all test cases.
  • Bias based on runs on smaller problems and on other problem classes yields significant speedups as well.
  • This demonstrates the flexibility, practicality, and potential of the approach.
  • The approach can be adapted to other MDOs (LTGA, ECGA, DSMGA, …).

Figure 9: Speedups obtained on NK landscapes and 2D spin glasses without using local search. (a) NK landscapes with nearest neighbors (n = 50, 60, 70; k = 5); (b) 2D ±J Ising spin glass (n = 64, 100, 144). Plotted quantities: multiplicative speedup w.r.t. CPU time and percentage of instances with improved execution time versus kappa (strength of bias), and average multiplicative CPU speedup versus problem size n for kappa = 2, 4, 6, 8, 10.

Figure 10: Speedups obtained on all test problems except for MAXSAT with bias based on models obtained from problems of smaller size, compared to the base case with bias based on the same problem size. (a) NK landscapes with nearest neighbors, n = 200, k = 5 (bias from n = 150 vs. n = 200); (b) minimum vertex cover, n = 200, c = 2; (c) minimum vertex cover, n = 200, c = 4; (d) 2D ±J Ising spin glass, n = 20 × 20 = 400 (bias from n = 324 vs. n = 400); (e) 3D ±J Ising spin glass, n = 7 × 7 × 7 = 343 (bias from n = 216 vs. n = 343). All panels plot multiplicative speedup w.r.t. CPU time versus kappa (strength of bias).


Figure 11: Speedups obtained on NK landscapes and minimum vertex cover with the bias from models on another problem class compared to the base case (the base case corresponds to 10-fold cross-validation using the models obtained on the problems from the same class). (a) NK landscapes with nearest neighbors, n = 200, k = 5; (b) minimum vertex cover, n = 200, c = 2; (c) minimum vertex cover, n = 200, c = 4. Each panel plots multiplicative speedup w.r.t. CPU time versus kappa (strength of bias) for models from NK landscapes and from MVC with c = 2 and c = 4.

of experiments; the results of these experiments can be found in fig. 11. Specifically, the figure presents the results of using bias from the NK landscapes while solving the minimum vertex cover instances for the two sets of graphs (with c = 2 and c = 4), the results of using bias from the minimum vertex cover instances for both classes of graphs when solving the NK landscapes, and also the results of using bias from the minimum vertex cover on another class of graphs (with a different c). It can be seen that significant speedups were obtained even when using models from another problem class, although in some cases the speedups obtained when using the same problem class were significantly better. Some combinations of problem classes discussed in this paper were omitted because of the high computational demands of testing every pair of problems against each other. Despite that, the results presented here indicate that even a bias based on problems from a different class can be beneficial.

5.3.5 Combining Distance-Based Bias and Sporadic Model Building

While most efficiency enhancement techniques provide significant speedups in their own right (Hauschild and Pelikan, 2008; Hauschild et al., 2012; Lozano et al., 2001; Mendiburu et al., 2007; Ocenasek, 2002; Pelikan et al., 2008; Sastry et al., 2006), their combinations are usually significantly more powerful because the speedups often multiply. For example, with a speedup of 4 due to sporadic model building (Pelikan et al., 2008) and a speedup of 20 due to parallelization using 20 computational nodes (Lozano et al., 2001; Ocenasek and Schwarz, 2000), combining the two techniques can be expected to yield speedups of about 80; obtaining comparable speedups using parallelization on its own would require at least 4 times as many computational nodes. How will the distance-based bias from prior hBOA runs combine with other efficiency enhancements?

To illustrate the benefits of combining the distance-based bias discussed in this paper with other efficiency enhancement techniques, we combined the approach with sporadic model building and tested the two efficiency enhancement techniques separately and in combination. Fig. 12 shows the results of these experiments. The results show that even in cases where the speedup due to distance-based bias is only moderate, the overall speedup increases substantially due to the combination of the two techniques. Nearly multiplicative speedups can be observed in most scenarios.

6 Thoughts on Extending the Approach to Other Model-Directed Optimizers

6.1 Extended Compact Genetic Algorithm

The extended compact genetic algorithm (ECGA) (Harik, 1999) uses marginal product models (MPMs) to guide optimization. An MPM splits the variables into disjoint linkage groups. The distribution of variables in each group is encoded by a probability table that specifies a probability for each combination of values of these variables. The overall probability distribution is given by the product of the distributions of the individual linkage groups. In comparison with Bayesian network models, instead of connecting variables that influence each other strongly using directed edges, ECGA puts such variables into the same linkage group.
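To illustrate the structure just described, here is a minimal, schematic sketch of an MPM: linkage groups are given as lists of variable indices, each group has a probability table over the joint values of its variables, and the probability of a full solution is the product of the group probabilities. The data and helper names are made up for illustration; this is not ECGA's actual implementation.

```python
from itertools import product

def estimate_mpm(selected, groups):
    """Estimate an MPM from a selected population of binary strings:
    one probability table per disjoint linkage group."""
    tables = []
    for group in groups:
        table = {}
        for values in product([0, 1], repeat=len(group)):
            count = sum(1 for ind in selected
                        if tuple(ind[i] for i in group) == values)
            table[values] = count / len(selected)
        tables.append(table)
    return tables

def mpm_probability(individual, groups, tables):
    """Probability of a full solution = product of its groups' probabilities."""
    p = 1.0
    for group, table in zip(groups, tables):
        p *= table[tuple(individual[i] for i in group)]
    return p

# Example: variables {0, 1} and {2} form two linkage groups.
selected = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
groups = [[0, 1], [2]]
tables = estimate_mpm(selected, groups)
print(mpm_probability([1, 1, 1], groups, tables))  # P(X0=1, X1=1) * P(X2=1)
```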


More information
• Download MEDAL Reports No. 2012004 and 2012007:
  http://medal-lab.org/files/2012004.pdf
  http://medal-lab.org/files/2012007.pdf

Acknowledgments
• NSF grants ECS-0547013 and IIS-1115352.
• ITS at the University of Missouri in St. Louis.
• University of Missouri Bioinformatics Consortium.
• Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

http://medal-lab.org

in hBOA for additively decomposable problems based on a problem-specific distance metric. However, note that the framework can be applied to many other model-directed optimization techniques and the distance function can be defined in many other ways. To illustrate this, we outline how this approach can be extended to several other model-directed optimization techniques in section 6.

4 Distance-Based Bias

4.1 Additively Decomposable Functions

For many optimization problems, the objective function (fitness function) can be expressed in the form of an additively decomposable function (ADF) of m subproblems:

f(X_1, \ldots, X_n) = \sum_{i=1}^{m} f_i(S_i), \qquad (5)

where (X_1, \ldots, X_n) are the problem's decision variables, f_i is the i-th subfunction, and S_i ⊆ {X_1, X_2, …, X_n} is the subset of variables contributing to f_i (the subsets {S_i} can overlap). While there may often exist multiple ways of decomposing the problem using additive decomposition, one would typically prefer decompositions that minimize the sizes of the subsets {S_i}. As an example, consider the following objective function for a problem with 6 variables:

f_{\mathrm{example}}(X_1, X_2, X_3, X_4, X_5, X_6) = f_1(X_1, X_2, X_5) + f_2(X_3, X_4) + f_3(X_2, X_5, X_6).

In the above objective function, there are three subsets of variables, S_1 = {X_1, X_2, X_5}, S_2 = {X_3, X_4}, and S_3 = {X_2, X_5, X_6}, and three subfunctions {f_1, f_2, f_3}, each of which can be defined arbitrarily.
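As a concrete illustration of this decomposition, the following sketch evaluates the example ADF above; the three subfunctions are made up purely for illustration, since the paper allows them to be arbitrary.

```python
# Each subproblem is a (variable-index tuple, subfunction) pair (0-based indices).
# The subfunctions here are arbitrary illustrative choices.
subproblems = [
    ((0, 1, 4), lambda x1, x2, x5: x1 + x2 * x5),   # f1(X1, X2, X5)
    ((2, 3),    lambda x3, x4: 2 * x3 - x4),        # f2(X3, X4)
    ((1, 4, 5), lambda x2, x5, x6: x2 ^ x5 ^ x6),   # f3(X2, X5, X6)
]

def adf(solution):
    """Evaluate an additively decomposable function: the fitness is the
    sum of subfunction values over their (possibly overlapping) subsets."""
    return sum(f(*(solution[i] for i in indices)) for indices, f in subproblems)

print(adf([1, 0, 1, 1, 0, 1]))  # X1..X6 = 1, 0, 1, 1, 0, 1
```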

It is of note that the difficulty of ADFs is not fully determined by the order (size) of subproblems, butalso by the definition of the subproblems and their interaction. In fact, there exist a number of NP-completeproblems that can be formulated as ADFs with subproblems of order 2 or 3, such as MAXSAT for 3-CNFformulas. On the other hand, one may easily define ADFs with subproblems of order n that can be solvedby a simple bit-flip hill climbing in low-order polynomial time.

4.2 Measuring Variable Distances for ADFs

The definition of a distance between two variables of an ADF used in this paper is based on the work of Hauschild and Pelikan (2008) and Hauschild et al. (2012). Given an additively decomposable problem with n variables, we define the distance between two variables using a graph G of n nodes, one node per variable. For any two variables X_i and X_j in the same subset S_k, that is, X_i, X_j ∈ S_k, we create an edge in G between the nodes X_i and X_j. See fig. 2 for an example of an ADF and the corresponding graph. Denoting by l_{i,j} the number of edges along the shortest path between X_i and X_j in G, we define the distance between two variables as

D(X_i, X_j) = \begin{cases} l_{i,j} & \text{if a path between } X_i \text{ and } X_j \text{ exists,} \\ n & \text{otherwise.} \end{cases} \qquad (6)

Fig. 2 illustrates the distance metric on a simple example. The above distance measure makes variables in the same subproblem close to each other, whereas for the remaining variables, the distances correspond to the length of the chain of subproblems that relate the two variables. The distance is maximal for variables that are completely independent (the value of a variable does not influence the contribution of the other variable in any way).
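A minimal sketch of this metric, assuming the ADF is given as a list of variable-index subsets S_i; distances are computed by breadth-first search on the graph G, with n used for disconnected pairs as in eq. (6):

```python
from collections import deque

def adf_distance(subsets, n):
    """Build the ADF graph (one node per variable, edges within each subset)
    and return D(i, j) for all pairs via BFS; D = n if no path exists."""
    adjacent = [set() for _ in range(n)]
    for subset in subsets:
        for i in subset:
            adjacent[i].update(j for j in subset if j != i)

    dist = [[n] * n for _ in range(n)]        # default: disconnected -> n
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if dist[start][v] == n and v != start:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

# Example ADF from section 4.1: S1={X1,X2,X5}, S2={X3,X4}, S3={X2,X5,X6}.
D = adf_distance([[0, 1, 4], [2, 3], [1, 4, 5]], n=6)
print(D[0][5])  # X1 and X6: shortest path X1-X2-X6, distance 2
print(D[0][2])  # X1 and X3: disconnected, distance n = 6
```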

Since interactions between problem variables are encoded mainly in the subproblems of the additive problem decomposition, the above distance metric should typically correspond closely to the likelihood of dependencies between problem variables in probabilistic models discovered by EDAs. Specifically, variables located closer with respect to the metric should be more likely to interact with each other. Fig. 3 illustrates this on two ADFs discussed later in this paper: the NK landscape with nearest neighbor interactions and the two-dimensional Ising spin glass (for a description of these problems, see section 5.1). The figure analyzes probabilistic models discovered by hBOA in 10 independent runs on each of the 1,000 random instances for each problem and problem size. For a range of distances d between problem variables, the figure
