
Source: synergy-twinning.eu/files/bart16ccomplete-170104-1438.pdf (2017-01-04)

A Survey of Model-based Methods for Global Optimization

Thomas Bartz-Beielstein

SPOTSeven Lab, www.spotseven.de

Technology Arts Sciences, TH Köln

BIOMA

Bartz-Beielstein MBO 1 / 94


Keywords

I Abelson & Sussman
I Wolpert
I Netflix
I Deep Learning

Bartz-Beielstein MBO 2 / 94


Agenda

I Part 1: Basics
I Part 2: Example
I Part 3: Considerations
I Part 4: Example
I Part 5: Conclusion

Bartz-Beielstein MBO 3 / 94


Introduction

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations

Bartz-Beielstein MBO 4 / 94


Introduction

Model-based optimization (MBO)

I Prominent role in today's modeling, simulation, and optimization processes
I Most efficient technique for expensive and time-demanding real-world optimization problems
I In the engineering domain, MBO is an important practice
I Recent advances in
  I computer science,
  I statistics, and
  I engineering,
  in combination with progress in high-performance computing, provide
I Tools for handling problems considered unsolvable only a few decades ago

Bartz-Beielstein MBO 5 / 94


Introduction

Global optimization (GO)

I GO can be categorized based on different criteria
I Properties of problems:
  I continuous versus combinatorial
  I linear versus nonlinear
  I convex versus multimodal, etc.
I We present an algorithmic view, i.e., properties of algorithms
I The term GO is used in this talk for algorithms that try to find and explore globally optimal solutions of complex, multimodal objective functions [Preuss, 2015]
I GO problems are difficult: nearly no structural information (e.g., number of local extrema) is available
I GO problems belong to the class of black-box functions, i.e., the analytic form is unknown
I The class of black-box functions also contains functions that are easy to solve, e.g., convex functions

Bartz-Beielstein MBO 6 / 94


Introduction

Problem

I Optimization problem given by

  Minimize: f(~x) subject to ~xl ≤ ~x ≤ ~xu,

  where f : Rn → R is referred to as the objective function and ~xl and ~xu denote the lower and upper bounds of the search space (region of interest), respectively
I This setting arises in many real-world systems:
  I when the explicit form of the objective function f is not readily available,
  I e.g., the user has no access to the source code of a simulator
I We cover stochastic (random) search algorithms; deterministic GO algorithms are not further discussed
I Random and stochastic are used synonymously
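As a concrete sketch of this setting (the objective f below is a hypothetical stand-in for a black-box simulator; the optimizer only ever sees function values and the box bounds):

```python
import numpy as np
from scipy.optimize import differential_evolution

def f(x):
    # hypothetical black-box objective: only function values are available,
    # no gradients and no analytic form are assumed by the optimizer
    return np.sum(x**2) + np.sin(3.0 * x[0])

# lower and upper bounds ~xl, ~xu define the region of interest
bounds = [(-5.0, 5.0), (-5.0, 5.0)]

# a stochastic global search suited to the black-box setting
result = differential_evolution(f, bounds, seed=1)
best_x, best_f = result.x, result.fun
```

The model-based methods surveyed below aim to reach comparable solutions with far fewer evaluations of f.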

Bartz-Beielstein MBO 7 / 94


Introduction

Taxonomy of model-based approaches in GO

[1] Deterministic
[2] Random Search
  [2.1] Instance based
  [2.2] Model based optimization (MBO)
    [2.2.1] Distribution based
    [2.2.2] Surrogate Model Based Optimization (SBO)
      [2.2.2.1] Single surrogate based
      [2.2.2.2] Multi-fidelity based
      [2.2.2.3] Evolutionary surrogate based
      [2.2.2.4] Ensemble surrogate based

Bartz-Beielstein MBO 8 / 94


Stochastic Search Algorithms

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations

Bartz-Beielstein MBO 9 / 94


Stochastic Search Algorithms

Random Search

I Stochastic search algorithm: iterative search algorithm that uses a stochastic procedure to generate the next iterate
I The next iterate can be
  I a candidate solution to the GO problem or
  I a probabilistic model, from which solutions can be drawn
I Stochastic search algorithms do not depend on any structural information of the objective function such as gradient information or convexity ⇒ robust and easy to implement
I Stochastic search algorithms can further be categorized as
  I instance-based or
  I model-based algorithms [Zlochin et al., 2004]

Bartz-Beielstein MBO 10 / 94


Stochastic Search Algorithms

[2.1] Instance-based Algorithms

I Instance-based algorithms use a single solution, ~x, or a population, P(t), of candidate solutions
I Construction of new candidates depends explicitly on previously generated solutions
I Examples: simulated annealing, evolutionary algorithms

1: t = 0. InitPopulation(P).
2: Evaluate(P).
3: while not TerminationCriterion() do
4:   Generate new candidate solutions P'(t) according to a specified random mechanism.
5:   Update the current population P(t+1) based on population P(t) and candidate solutions in P'(t).
6:   Evaluate(P(t+1)).
7:   t = t + 1.
8: end while
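The loop above can be sketched as a minimal runnable example (a toy illustration, assuming Gaussian mutation as the random mechanism and truncation selection as the update, on a hypothetical sphere objective):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical cheap test objective (sphere function)
    return np.sum(x**2, axis=-1)

# t = 0: initialize and evaluate the population P(t)
P = rng.uniform(-5, 5, size=(20, 2))
for t in range(100):
    # generate candidates P'(t) from previous solutions (Gaussian mutation)
    P_new = P + rng.normal(scale=0.5, size=P.shape)
    # update P(t+1) from P(t) and P'(t): keep the 20 best of the union
    union = np.vstack([P, P_new])
    P = union[np.argsort(f(union))[:20]]

best = float(f(P).min())
```

Note that new candidates are built directly from previous solutions, the defining property of instance-based search.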

Bartz-Beielstein MBO 11 / 94


Stochastic Search Algorithms

[2.2] MBO: Model-based Algorithms

I MBO algorithms generate a population of new candidate solutions P'(t) by sampling from a model
I In statistics: model ≡ distribution
I The model (distribution) reflects structural properties of the underlying true function, say f
I By adapting the model (or the distribution), the search is directed into regions with improved solutions
I One of the key ideas: replacement of expensive, high-fidelity, fine-grained function evaluations, f(~x), with evaluations, f̂(~x), of an adequate cheap, low-fidelity, coarse-grained model, M

Bartz-Beielstein MBO 12 / 94


Stochastic Search Algorithms

[2.2.1] Distribution-based Approaches

I The metamodel is a distribution
I Generate a sequence of iterates (probability distributions) {p(t)} with the hope that

  p(t) → p* as t → ∞,

  where p* is the limiting distribution, which assigns most of its probability mass to the set of optimal solutions
I The probability distribution is propagated from one iteration to the next
I Instance-based algorithms propagate candidate solutions instead

1: t = 0. Let p(t) be a probability distribution.
2: while not TerminationCriterion() do
3:   Randomly generate a population of candidate solutions P(t) from p(t).
4:   Evaluate(P(t)).
5:   Update the distribution using population (samples) P(t) to generate a new distribution p(t+1).
6:   t = t + 1.
7: end while
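A toy sketch of this loop, assuming p(t) is an isotropic Gaussian whose mean is re-estimated from the best samples (the fixed shrink factor for sigma is a simplification; real EDAs estimate the spread from the samples as well):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical test objective (sphere function)
    return np.sum(x**2, axis=-1)

# t = 0: p(0) is an isotropic Gaussian, started far from the optimum at 0
mean, sigma = np.full(2, 3.0), 2.0
for t in range(40):
    # randomly generate a population P(t) from p(t)
    P = mean + sigma * rng.standard_normal((50, 2))
    # evaluate P(t) and update the distribution from the best samples
    elite = P[np.argsort(f(P))[:10]]
    mean = elite.mean(axis=0)
    sigma *= 0.9  # simple fixed shrink; real EDAs adapt this too
```

Only the distribution parameters (mean, sigma) are carried from one iteration to the next, not the candidate solutions themselves.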

Bartz-Beielstein MBO 13 / 94


Stochastic Search Algorithms

[2.2.1] Estimation of distribution algorithms (EDA)

I EDA: very popular in the field of evolutionary algorithms (EA)
I Variation operators such as mutation and recombination are replaced by a distribution-based procedure:
  I A probability distribution is estimated from promising candidate solutions of the current population ⇒ generate new population
I Larrañaga and Lozano [2002] review different ways of using probabilistic models
I Hauschild and Pelikan [2011] discuss advantages and outline many of the different types of EDAs
I Hu et al. [2012] present recent approaches and a unified view

Bartz-Beielstein MBO 14 / 94


Stochastic Search Algorithms

[2.2.2] Focus on Surrogates

I Although distribution-based approaches play an important role in GO, they will not be discussed further in this talk
I We will concentrate on surrogate model-based approaches
I Their origin lies in the statistical design and analysis of experiments, especially in response surface methodology [G E P Box, 1951, Montgomery, 2001]

Bartz-Beielstein MBO 15 / 94


Stochastic Search Algorithms

[2.2.2] Surrogate Model-based Approaches

I In general, surrogates are used when the outcome of a process cannot be directly measured
I They imitate the behavior of the real model as closely as possible while being computationally cheaper to evaluate
I Surrogate models are also known as
  I the cheap model,
  I a response surface,
  I meta model,
  I approximation, or
  I coarse-grained model
I Simple surrogate models are constructed using a data-driven approach
I They are refined by integrating additional points or domain knowledge, e.g., constraints

Bartz-Beielstein MBO 16 / 94


Stochastic Search Algorithms

[2.2.2] Surrogate Model-based Approaches

Initial design → Build metamodel → Validate metamodel → Optimize on metamodel → Sample design space → (back to: Build metamodel)

I The validation step (e.g., via CV) is optional
I Samples are generated iteratively to improve the surrogate model accuracy

Bartz-Beielstein MBO 17 / 94


Stochastic Search Algorithms

[2.2.2] Surrogate Model Based Optimization (SBO) Algorithm

1: t = 0. InitPopulation(P(t))
2: Evaluate(P(t))
3: while not TerminationCriterion() do
4:   Use P(t) to build a cheap model M(t)
5:   P'(t+1) = GlobalSearch(M(t))
6:   Evaluate(P'(t+1))
7:   P(t+1) ⊆ P(t) + P'(t+1)
8:   t = t + 1
9: end while
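The loop can be sketched with scikit-learn (a toy sketch: the expensive f is simulated by a cheap stand-in, and GlobalSearch is approximated by taking the best of many evaluations of the cheap model M):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

def f_expensive(x):
    # stand-in for the expensive objective function
    return np.sum(x**2, axis=-1)

# steps 1-2: initialize and evaluate P(0)
X = rng.uniform(-5, 5, size=(10, 2))
y = f_expensive(X)
for t in range(10):
    # step 4: use P(t) to build a cheap model M(t)
    M = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    # step 5: GlobalSearch(M), here the best of many cheap model evaluations
    cand = rng.uniform(-5, 5, size=(1000, 2))
    x_new = cand[np.argmin(M.predict(cand))]
    # steps 6-7: evaluate only the chosen point expensively, extend P(t+1)
    X = np.vstack([X, x_new])
    y = np.append(y, f_expensive(x_new))
```

Per iteration, the expensive function is evaluated once, while the cheap model absorbs the thousand-point search.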

Bartz-Beielstein MBO 18 / 94


Stochastic Search Algorithms

[2.2.2] Surrogates

I A wide range of surrogates has been developed in the last decades ⇒ complex design decisions [Wang and Shan, 2007]:
  I (a) Metamodels
  I (b) Designs
  I (c) Model fit
I (a) Metamodels:
  I classical regression models such as polynomial regression or response surface methodology [G E P Box, 1951, Montgomery, 2001]
  I support vector machines (SVM) [Vapnik, 1998]
  I neural networks [Zurada, 1992]
  I radial basis functions [Powell, 1987], or
  I Gaussian process (GP) models, design and analysis of computer experiments, Kriging [Schonlau, 1997], [Büche et al., 2005], [Antognini and Zagoraiou, 2010], [Kleijnen, 2009], [Santner et al., 2003]
I A comprehensive introduction to SBO is given in [Forrester et al., 2008]

Bartz-Beielstein MBO 19 / 94


Stochastic Search Algorithms

[2.2.2] Surrogates: Popular metamodeling techniques

I (b) Designs [Wang and Shan, 2007]:
  I Classical
    I Fractional factorial
    I Central composite
    I Box-Behnken
    I A-, D-optimal (alphabetical optimality)
    I Plackett-Burman
  I Space filling
    I Simple grids
    I Latin hypercube
    I Orthogonal
    I Uniform
    I Minimax and Maximin
  I Hybrid methods
  I Random or human selection
  I Sequential methods

Bartz-Beielstein MBO 20 / 94


Stochastic Search Algorithms

[2.2.2] Surrogates: Popular metamodeling techniques

I (c) Model fitting [Wang and Shan, 2007]:
  I Weighted least squares regression
  I Best linear unbiased predictor (BLUP)
  I Likelihood
  I Multipoint approximation
  I Sequential metamodeling
  I Neural networks: backpropagation
  I Decision trees: entropy

Bartz-Beielstein MBO 21 / 94


Stochastic Search Algorithms

[2.2.2] Applications of SBO

I One of the most popular application areas for SBO:
  I simulation-based design of complex engineering problems
    I computational fluid dynamics (CFD)
    I finite element modeling (FEM) methods
I Exact solutions ⇒ solvers require a large number of expensive computer simulations
I Two variants of SBO:
  I (i) metamodel [2.2.2.1]: uses one or several different metamodels
  I (ii) multi-fidelity approximation [2.2.2.2]: uses several instances with different parameterizations of the same metamodel

Bartz-Beielstein MBO 22 / 94


Stochastic Search Algorithms

[2.2.2.1] Applications of Metamodels and [2.2.2.2] Multi-fidelity Approximation

I Meta-modeling approaches
  I 31-variable helicopter rotor design [Booker et al., 1998]
  I Aerodynamic shape design problem [Giannakoglou, 2002]
  I Multi-objective optimal design of a liquid rocket injector [Queipo et al., 2005]
  I Airfoil shape optimization with CFD [Zhou et al., 2007]
  I Aerospace design [Forrester and Keane, 2009]
I Multi-fidelity approximation
  I Several simulation models with different grid sizes in FEM [Huang et al., 2015]
  I Sheet metal forming process [Sun et al., 2011]
I "How far have we really come?" [Simpson et al., 2012]

Bartz-Beielstein MBO 23 / 94


Stochastic Search Algorithms

[2.2.2.3] Surrogate-assisted Evolutionary Algorithms

I Surrogate-assisted EAs: EAs that decouple the evolutionary search from the direct evaluation of the objective function
I A cheap surrogate model replaces evaluations of the expensive objective function

Bartz-Beielstein MBO 24 / 94


Stochastic Search Algorithms

[2.2.2.3] Surrogate-assisted Evolutionary Algorithms

I Combination of a genetic algorithm and neural networks for aerodynamic design optimization [Hajela and Lee, 1997]
I Approximate model of the fitness landscape using Kriging interpolation to accelerate the convergence of EAs [Ratle, 1998]
I Evolution strategy (ES) with neural network based fitness evaluations [Jin et al., 2000]
I Surrogate-assisted EA framework with online learning [Zhou et al., 2007]
I Do not evaluate every candidate solution (individual), but just estimate the objective function value of some of the neighboring individuals [Branke and Schmidt, 2005]
I Survey of surrogate-assisted EA approaches [Jin, 2003]
I SBO approaches for evolution strategies [Emmerich et al., 2002]

Bartz-Beielstein MBO 25 / 94


Stochastic Search Algorithms

[2.2.2.4] Multiple Models

I Instead of using only one surrogate model, several models Mi, i = 1, 2, . . . , p, are generated and evaluated in parallel
I Each model Mi : X → y uses
  I the same candidate solutions, X, from the population P and
  I the same results, y, from expensive function evaluations
I Multiple models can also be used to partition the search space:
  I The treed Gaussian process (TGP) uses regression trees to partition the search space and fits local GP surrogates in each region [Gramacy, 2007]
  I Tree-based partitioning of an aerodynamic design space, with independent Kriging surfaces in each partition [Nelson et al., 2007]
  I Combination of an evolutionary model selection (EMS) algorithm with the expected improvement (EI) criterion: select the best performing surrogate model type at each iteration of the EI algorithm [Couckuyt et al., 2011]

Bartz-Beielstein MBO 26 / 94


Stochastic Search Algorithms

[2.2.2.4] Multiple Models: Ensembles

I Ensembles of surrogate models have gained popularity:
  I Adaptive weighted average model of the individual surrogates [Zerpa et al., 2005]
  I Use the best surrogate model or a weighted average surrogate model instead [Goel et al., 2006]
  I Weighted-sum approach for the selection of model ensembles [Sanchez et al., 2006]
    I Models for the ensemble are chosen based on their performance
    I Weights are adaptive and inversely proportional to the local modeling errors

Bartz-Beielstein MBO 27 / 94


Quality Criteria: How to Select Surrogates

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations

Bartz-Beielstein MBO 28 / 94


Quality Criteria: How to Select Surrogates

Model Refinement: Selection Criteria for Sample Points

I An initial model is refined during the optimization ⇒ adaptive sampling
I Identify new points, so-called infill points
I Balance between
  I exploration, i.e., improving the model quality (related to the model, global), and
  I exploitation, i.e., improving the optimization and determining the optimum (related to the objective function, local)
I Expected improvement (EI): popular adaptive sampling method [Mockus et al., 1978], [Jones et al., 1998]
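For a surrogate with Gaussian predictive distribution N(mu, s²) at a candidate point, EI for minimization has the well-known closed form EI = (f_min − mu) Φ(z) + s φ(z), with z = (f_min − mu)/s and f_min the best observed value; a sketch:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, s, f_min):
    """EI for minimization under a Gaussian prediction N(mu, s**2)."""
    s = np.maximum(s, 1e-12)  # guard against zero predicted variance
    z = (f_min - mu) / s
    return (f_min - mu) * norm.cdf(z) + s * norm.pdf(z)

# exploration: a large predictive spread s gives positive EI even
# where the predicted mean mu is worse than the best value f_min
ei_explore = expected_improvement(mu=1.0, s=2.0, f_min=0.5)
# exploitation: mu below f_min gives large EI even for small s
ei_exploit = expected_improvement(mu=0.0, s=0.1, f_min=0.5)
```

The two calls illustrate how a single criterion balances exploration and exploitation: both uncertain points and promising points receive positive EI.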

Bartz-Beielstein MBO 29 / 94


Quality Criteria: How to Select Surrogates

Model Selection Criteria

I The EI approach handles the initialization and refinement of a surrogate model
I But not the selection of the model itself
I The popular efficient global optimization (EGO) algorithm uses a Kriging model
  I because Kriging inherently determines the prediction variance (necessary for the EI criterion)
I But there is no proof that Kriging is the best choice
I Alternative surrogate models, e.g., neural networks, regression trees, support vector machines, or lasso and ridge regression, may be better suited
I An a priori selection of the best suited surrogate model is conceptually impossible in the framework treated in this talk, because of the black-box setting

Bartz-Beielstein MBO 30 / 94


Quality Criteria: How to Select Surrogates

Single or Ensemble

I Regarding the model choice, the user can decide whether to useI one single model, i.e., one unique global model orI multiple models, i.e., an ensemble of different, possibly local, models

The static SBO uses a single, global surrogate model, usually refined byadaptive sampling, but did not change⇒ category [2.2.2.1]

Bartz-Beielstein MBO 31 / 94


Quality Criteria: How to Select Surrogates

Criteria for Selecting a Surrogate

I Here, we do not consider the selection of a new sample point (as done in EI)
I Instead: criteria for the selection of one (or several) surrogate models
I Usually, surrogate models are chosen according to their estimated true error [Jin et al., 2001], [Shi and Rasheed, 2010]
I Commonly used performance metrics:
  I mean absolute error (MAE)
  I root mean square error (RMSE)
I Generally, attaining a surrogate model that has minimal error is the desired feature
I Methods from statistics, statistical learning [Hastie, 2009], and machine learning [Murphy, 2012]:
  I Simple holdout
  I Cross-validation
  I Bootstrap
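For instance, cross-validated RMSE can be used to pick between candidate surrogates (a sketch on synthetic data; the two models shown are illustrative choices, not a prescribed portfolio):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = np.sum(X**2, axis=1) + rng.normal(scale=0.1, size=80)  # toy data

# estimate each surrogate's RMSE by cross-validation, then pick the best
models = {"linear": LinearRegression(),
          "forest": RandomForestRegressor(random_state=0)}
rmse = {name: float(np.sqrt(-cross_val_score(
            m, X, y, cv=5, scoring="neg_mean_squared_error").mean()))
        for name, m in models.items()}
best_name = min(rmse, key=rmse.get)
```

On this quadratic toy landscape the nonlinear surrogate wins; on a different landscape the ranking can flip, which is exactly why the error is estimated rather than assumed.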

Bartz-Beielstein MBO 32 / 94


Examples

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations

Bartz-Beielstein MBO 33 / 94


Examples

Criteria for Selecting a Surrogate: Evolvability

I Model error is not the only criterion for selecting surrogate modelsI Evolvability learning of surrogates approach (EvoLS) [Le et al., 2013]:

I Use fitness improvement for determining the quality of surrogate modelsI EvoLS belongs to the category of surrogate-assisted evolutionary

algorithms ([2.2.2.3])I Distributed, local information

Bartz-Beielstein MBO 34 / 94


Examples

Evolvability Learning of Surrogates

I EvoLS: select a surrogate models that enhance search improvement inthe context of optimization

I Process information about theI (i) different fitness landscapes,I (ii) state of the search, andI (iii) characteristics of the search algorithm to statistically determine the

so-called evolvability of each surrogate modelI Evolvability of a surrogate model estimates the expected improvement of

the objective function value that the new candidate solution has gainedafter a local search has been performed on the related surrogatemodel [Le et al., 2013]

Bartz-Beielstein MBO 35 / 94


Examples

Evolvability

I Local search: after recombination and mutation, a local search is performed
I It uses an individual local meta-model, M, for each offspring
I The local optimizer, ϕM, takes an offspring ~y as input and returns ~y* as the refined offspring
I The evolvability measure can be estimated as follows [Le et al., 2013]:

  EvM(~x) = f(~x) − ∑_{i=1}^{K} f(~y*_i) × wi(~x),

  with weights (selection probabilities of the offspring):

  wi(~x) = P(~yi | P(t), ~x) / ∑_{j=1}^{K} P(~yj | P(t), ~x)
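Numerically, the estimate is a weighted sum of refined-offspring fitness values subtracted from f(~x); a toy sketch with made-up fitness values and selection probabilities:

```python
import numpy as np

def evolvability(f_x, f_refined, p):
    # Ev_M(x) = f(x) - sum_i f(y*_i) * w_i(x), with w_i = p_i / sum_j p_j
    w = np.asarray(p, dtype=float)
    w /= w.sum()  # normalize the selection probabilities
    return f_x - float(np.dot(np.asarray(f_refined, dtype=float), w))

# hypothetical numbers: parent fitness 1.0, three refined offspring y*_i
ev = evolvability(f_x=1.0, f_refined=[0.4, 0.6, 0.8], p=[0.5, 0.3, 0.2])
```

A large positive value means local search on this surrogate is expected to improve markedly on the parent, so the surrogate is ranked highly.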

Bartz-Beielstein MBO 36 / 94


Examples

SPO

I EvoLS: distributed, local information. Now: more centralized, global information ⇒ sequential parameter optimization (SPO)
I Goal: analysis and understanding of algorithms
I Early versions of SPO [Bartz-Beielstein, 2003, Bartz-Beielstein et al., 2005] combined methods from
  I design of experiments (DOE) [Pukelsheim, 1993]
  I response surface methodology (RSM) [Box and Draper, 1987, Montgomery, 2001]
  I design and analysis of computer experiments (DACE) [Lophaven et al., 2002, Santner et al., 2003]
  I regression trees [Breiman et al., 1984]
I Also: SPO as an optimizer

Bartz-Beielstein MBO 37 / 94


Examples

SPO

I SPO: sequential, model-based approach to optimization
I Nowadays: established parameter tuner and optimization algorithm
I Extended in several ways:
  I For example, Hutter et al. [2013] benchmark an SPO derivative, the so-called sequential model-based algorithm configuration (SMAC) procedure, on the BBOB set of blackbox functions
  I With a small budget of 10 × d evaluations of d-dimensional functions, SMAC in most cases outperforms the state-of-the-art blackbox optimizer CMA-ES

Bartz-Beielstein MBO 38 / 94


Examples

SPO

I The most recent version, SPO2, is currently under development
I Integration of state-of-the-art ensemble learners
I SPO2 ensemble engine:
  I Portfolio of surrogate models:
    I regression trees and random forest, least angle regression (lars), and Kriging
  I Uses cross-validation to select an improved model from the portfolio of candidate models
  I Creates a weighted combination of several surrogate models to build the improved model
  I Uses stacked generalization to combine several level-0 models of different types with one level-1 model into an ensemble [Wolpert, 1992]
  I Level-1 training algorithm: simple linear model
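The stacking idea can be sketched with scikit-learn's StackingRegressor (an illustrative sketch on synthetic data, not the actual SPO2 implementation): level-0 models of different types are combined by a level-1 linear model fitted on their cross-validated predictions:

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 2))
y = np.sum(X**2, axis=1) + rng.normal(scale=0.1, size=80)  # toy data

# level-0 models of different types; the level-1 combiner is a linear model
stack = StackingRegressor(
    estimators=[("tree", DecisionTreeRegressor(random_state=0)),
                ("forest", RandomForestRegressor(random_state=0))],
    final_estimator=LinearRegression(),
    cv=5)  # level-1 model is fit on cross-validated level-0 predictions
stack.fit(X, y)
r2 = stack.score(X, y)
```

Fitting the combiner on out-of-fold predictions (cv=5) is what keeps the level-1 weights honest; fitting it on in-sample predictions would overweight the most overfitted level-0 model.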

Bartz-Beielstein MBO 39 / 94


Examples

SPO

I Promising preliminary results
I The SPO2 ensemble engine can lead to significant performance improvements
I Rebolledo Coy et al. [2016] present a comparison of different data-driven modeling methods:
  I Bayesian model
  I Several linear regression models
  I Kriging model
  I Genetic programming
I Models are built on industrial data for the development of a robust gas sensor
I Limited amount of samples and a high variance

Bartz-Beielstein MBO 40 / 94


Examples

SPO

I Two sensors are compared
I 1st sensor (MSE):
  I Linear model (0.76), OLS (0.79), Lasso (0.56), Kriging (0.57), Bayes (0.79), and genetic programming (0.58)
  I SPO2: 0.38
I 2nd sensor (MSE):
  I Linear model (0.67), OLS (0.80), Lasso (0.49), Kriging (0.49), Bayes (0.79), and genetic programming (0.27)
  I SPO2: 0.29

Bartz-Beielstein MBO 41 / 94


Examples

Begin: Jupyter Interactive Document

I The following part of this talk is based on an interactive jupyternotebook [Pérez and Granger, 2007]

I The next slides (43 - 56) summarize the output from the jupyter notebook

BIOMA

Bartz-Beielstein MBO 42 / 94


Examples DSPO: Deep Sequential Parameter Optimization

Preparation [jupyter]

I Basically:
  I 1. import libraries and
  I 2. set the SPO2 parameters, i.e., the number of folds
I The code implements ideas from Wolpert [1992], based on Olivetti [2012]
I Libraries are shown explicitly, because we will comment on this topic later!

import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats
from sklearn.linear_model import LassoCV, LassoLarsCV, LassoLarsIC
...
import statsmodels.formula.api as sm

Bartz-Beielstein MBO 43 / 94


Examples Data

The Complete Data Set [jupyter]

I Problem description in Rebolledo Coy et al. [2016]
I One training data set and one test data set

In [2]: dfTrain = read_csv('/Users/bartz/workspace/svnbib/Python.d/projects/pyspot2/data/training.csv')
        dfTest = read_csv('/Users/bartz/workspace/svnbib/Python.d/projects/pyspot2/data/validation.csv')

RangeIndex: 80 entries, 0 to 79
Data columns (total 9 columns):
X1    80 non-null float64
X2    80 non-null float64
..
X7    80 non-null float64
Y1    80 non-null float64
Y2    80 non-null float64
dtypes: float64(9)
memory usage: 5.7 KB

     X1         X2         X3   ...
0    -1.132054  -1.206144  ...
...

Bartz-Beielstein MBO 44 / 94


Examples Data

Data Used in this Study [jupyter]

I Here, we consider data from the second sensor
I There are seven input values and one output value (y)
I The goal of this study: predict the outcome y using the seven input measurements (X1, . . . , X7)
I Output (y) plotted against each input (X1, . . . , X7):

[Figure: seven scatter plots of the output y against each input X1, . . . , X7; y ranges roughly from −3 to 3, the inputs from −2 to 5]

Bartz-Beielstein MBO 45 / 94


Examples Data

2. Cross Validation (CV Splits) [jupyter]

I The training data are split into foldsI KFold divides all the samples in k = nfolds folds, i.e., groups of samples of

equal sizes (if possible)I If k = n, this is equivalent to the Leave One Out strategyI Prediction function is learned using (k − 1) folds, and the fold left out is

used for test
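In scikit-learn this split looks as follows (a small sketch with 10 toy samples and k = 5):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples

kf = KFold(n_splits=5)
# each fold leaves 2 of the 10 samples out for testing, 8 for fitting
sizes = [(len(train), len(test)) for train, test in kf.split(X)]
```

The out-of-fold predictions collected this way form the training data for the level-1 model of the ensemble.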

Bartz-Beielstein MBO 46 / 94


Examples Data

3. Models in the Ensemble [jupyter]

I Linear Regression
  I 1. Normalized predictors
    I LinearRegression(normalize=False)
    I LinearRegression(normalize=True)
  I 2. Intercept
    I LinearRegression(normalize=False, fit_intercept=False)
    I LinearRegression(normalize=True, fit_intercept=False)
I Random Forest
  I RandomForestRegressor(n_estimators=10, random_state=0)
  I RandomForestRegressor(n_estimators=100, oob_score=True, random_state=2)
I Lasso
  I Lasso(alpha=0.1, fit_intercept=False)
  I LassoCV(positive=True)


Examples Data

3. Models in the Ensemble [jupyter]

I Gaussian Processes
  I The kernel specifies the covariance function of the GP
  I Parameterized with different kernels, for example RBF, Matern, RationalQuadratic, ExpSineSquared, DotProduct, ConstantKernel
  I If none is passed, the kernel RBF() is used as default
  I The kernel's hyperparameters are optimized during fitting
  I Kernel combinations are allowed
  I Example:
    kern = 1.0 * RBF(length_scale=100.0, length_scale_bounds=(1e-2, 1e3)) \
           + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))
    ⇒ GaussianProcessRegressor(kern)


Examples Data

3. Models in the Ensemble [jupyter]

In [5]: clfs = [LinearRegression(),
                RandomForestRegressor()
                # , GaussianProcessRegressor()
                ]


Examples Data

Matrix Preparation, Dimensions [jupyter]

I n: size of the training set (samples): X.shape[0]
I k: number of folds for CV: n_folds
I p: number of models: len(clfs)
I m: size of the test data (samples): XTest.shape[0]
I We will use two matrices:
  I 1. YCV is an (n × p)-matrix. It stores the results from the cross validation for each model. The training set is partitioned into k folds (n_folds = k).
  I 2. YBT is an (m × p)-matrix. It stores the aggregated results from the cross-validation models on the test data. For each fold, p separate models are built, which are used for prediction on the test data. The predicted values from the k folds are averaged for each model, which results in (m × p) different values.

In [6]: YCV = np.zeros((X.shape[0], len(clfs)))
        YBT = np.zeros((XTest.shape[0], len(clfs)))


Examples Data

Cross-Validation [jupyter]

I Each of the p algorithms is run separately on the k folds
I The training data set is split into folds
I Each fold contains training data (train) and validation data (val)
I For each fold, a model is built using the train data
I The model is used to make predictions on
  I 1. the validation data from the k-th fold; these values are stored in the matrix YCV
  I 2. the test data ⇒ YBT[i]
I The averages of these predictions are stored in the matrix YBT
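The procedure above can be sketched as a nested loop over models and folds. The data shapes (80 training and 60 test samples) match the study, but the data and the model list here are illustrative stand-ins for the notebook's X, Y, XTest, and clfs:

```python
# Sketch of the CV loop: out-of-fold predictions go to YCV,
# fold-averaged test predictions go to YBT. Data are synthetic.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = rng.randn(80, 7), rng.randn(80)
XTest = rng.randn(60, 7)
clfs = [LinearRegression(),
        RandomForestRegressor(n_estimators=10, random_state=0)]

kf = KFold(n_splits=5)
YCV = np.zeros((X.shape[0], len(clfs)))
YBT = np.zeros((XTest.shape[0], len(clfs)))

for j, clf in enumerate(clfs):
    fold_preds = np.zeros((XTest.shape[0], kf.get_n_splits()))
    for i, (train, val) in enumerate(kf.split(X)):
        clf.fit(X[train], y[train])
        YCV[val, j] = clf.predict(X[val])      # out-of-fold predictions
        fold_preds[:, i] = clf.predict(XTest)  # per-fold test predictions
    YBT[:, j] = fold_preds.mean(axis=1)        # average over the k folds
```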


Examples Data

The YCV Matrix [jupyter]

I The YCV matrix has p columns and n rows
I The i-th row contains the predictions for the i-th sample from the training data set

(80, 2)
[[ 0.98940238  1.72943761]
 [ 0.14445345  0.17159034]
 [-0.0969411   0.86240816]
 [ 1.24820656  0.27214466]
 [ 0.0897393   0.1542136 ]
 ...
 [-0.56801858 -0.50319222]]


Examples Data

The YBT Matrix [jupyter]

I The YBT matrix has p columns and m rows
I The i-th row contains the predictions for the i-th sample from the test data set
I These values are the mean values of the k fold predictions calculated with the training data
I Each fold generates one model, which is used for prediction on the test data
I The mean values of these k predictions are stored in the matrix YBT

(60, 2)
[[  6.85725928e-01   1.18237211e+00]
 [  1.19810293e+00  -2.93516856e-01]
 [ -8.16703631e-01  -1.87229011e+00]
 [  2.06323037e+00   7.25281970e-01]
 [  1.37816630e+00  -1.35272556e-01]
 ...
 [  3.22714124e+00   1.36502285e+00]]


Examples Blending: Level-1 Model

Model Building [jupyter]

I The level-1 model is a function mapping the CV-values of each model to the known training y-values
I It provides an estimate of the influence of the single models
I For example, if a linear level-1 model is used, the coefficient βi represents the effect of the i-th model


Examples Blending: Level-1 Model

Model Prediction [jupyter]

I The level-1 model is used for predictions on the YBT data, i.e., on the averaged predictions of the CV-models
I It is constructed using the effects of the predicted values of the single models (determined by linear regression) on the true values of the training data
I If a model predicts a value similar to the true value during the CV, then it has a strong effect
I The final predictions are made by applying the coefficients (weights) of the single models to the YBT data
I YBT data: predicted values from the corresponding models on the final test data

[[ 0.98940238  1.72943761  0.55617508]
 [ 0.14445345  0.17159034  0.28026176]
 ...
 [-0.56801858 -0.50319222 -0.28009123]]
Intercept: -0.0199327428866
Coefficients: [ 0.34705597  0.68483785]
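A minimal sketch of this blending step, with illustrative stand-ins for YCV, YBT, and the training targets y (the notebook's actual matrices are not reproduced here):

```python
# Fit a linear level-1 model on the out-of-fold predictions YCV against
# the true training targets y, then predict on the averaged test
# predictions YBT. Data are synthetic: the first level-0 "model" is
# deliberately more accurate than the second.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
y = rng.randn(80)
YCV = np.column_stack([y + 0.3 * rng.randn(80),   # accurate level-0 model
                       y + 1.0 * rng.randn(80)])  # weaker level-0 model
YBT = rng.randn(60, 2)

level1 = LinearRegression()
level1.fit(YCV, y)            # coefficients = weights of the single models
yPred = level1.predict(YBT)   # final ensemble predictions

print(level1.intercept_, level1.coef_)
```

Because the first column tracks y more closely, its coefficient dominates, illustrating the "strong effect" remark above.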


Examples Blending: Level-1 Model

Comparing MSE [jupyter]

I Comparison of the mean squared errors of the SPO2 ensemble and the single models:

SPO2 (MSE): 0.284948273406
L (MSE): 0.673695001324
R (MSE): 0.367652881967
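MSE values like these can be computed with scikit-learn's mean_squared_error; the arrays below are illustrative, not the study's data:

```python
# Mean squared error between measured and predicted values.
import numpy as np
from sklearn.metrics import mean_squared_error

yTrue = np.array([0.0, 1.0, 2.0, 3.0])
yHat = np.array([0.1, 0.9, 2.2, 2.8])
mse = mean_squared_error(yTrue, yHat)
print(mse)  # mean of the squared residuals, ≈ 0.025
```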

[Figure: predicted vs. measured values for the SPO2 ensemble and the single models L and R]


Examples Blending: Level-1 Model

End: Jupyter Interactive Document

I End of the interactive jupyter notebook [Pérez and Granger, 2007]
I The next slides (59–64) are based on static slides


Ensembles: Considerations

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations


Ensembles: Considerations

Why are Ensembles Better?

I The following considerations are based on van Veen [2015]

Example 1 (Error Correcting Codes)
I A signal in the form of a binary string like 0111101010100000111001010101 gets corrupted with just one bit flipped, as in the following: 0110101010100000111001010101
I Repetition code: the simplest solution
I Repeat the signal multiple times in equally sized chunks and take a majority vote
  I 1. Original signal: 001010101
  I 2. Encoded (relay multiple times): 000000111000111000111000111
  I 3. Decoding: take three bits and calculate the majority


Ensembles: Considerations

A Simple Machine Learning Example

I Test set of ten samples
I The ground truth is all positive: 1111111111
I Three binary classifiers (A, B, C) with 70% accuracy each
I View the classifiers as pseudo-random number generators that
  I output a 1 70% of the time and
  I a 0 30% of the time
I These pseudo-classifiers are able to obtain 78% accuracy through a voting ensemble [van Veen, 2015]


Ensembles: Considerations

A Simple Machine Learning Example

I All three are correct: 0.7 × 0.7 × 0.7 = 0.343
I Two are correct: 0.7 × 0.7 × 0.3 + 0.7 × 0.3 × 0.7 + 0.3 × 0.7 × 0.7 = 0.441
I Two are wrong: 0.3 × 0.3 × 0.7 + 0.3 × 0.7 × 0.3 + 0.7 × 0.3 × 0.3 = 0.189
I All three are wrong: 0.3 × 0.3 × 0.3 = 0.027
I This majority-vote ensemble will be correct on average ≈ 78% of the time (0.343 + 0.441 = 0.784)
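The same probabilities can be verified by enumerating all eight outcomes (an illustrative check, not from the slides):

```python
# Probability that at least 2 of 3 independent 70%-accurate
# classifiers are correct, by enumerating all 2^3 outcomes.
from itertools import product

p = 0.7
total = 0.0
for outcome in product([True, False], repeat=3):
    prob = 1.0
    for correct in outcome:
        prob *= p if correct else (1 - p)
    if sum(outcome) >= 2:   # the majority of the three is correct
        total += prob

print(round(total, 3))  # 0.784
```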


Ensembles: Considerations

Correlation

I Uncorrelated models clearly do better when ensembled than correlated ones
I Three highly correlated models:
  I 1111111100 = 80% accuracy
  I 1111111100 = 80% accuracy
  I 1011111100 = 70% accuracy
I The models are highly correlated in their predictions; the majority vote results in no improvement:
  I 1111111100 = 80% accuracy
I Now three lower-performing, but highly uncorrelated models:
  I 1111111100 = 80% accuracy
  I 0111011101 = 70% accuracy
  I 1000101111 = 60% accuracy
I Ensembling with a majority vote results in:
  I 1111111101 = 90% accuracy
I Lower correlation between ensemble members seems to result in an increase in the error-correcting capability [van Veen, 2015]
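Both majority votes above can be reproduced with a small helper (the prediction strings are taken from the slide):

```python
# Bitwise majority vote over equally long 0/1 prediction strings.
def majority_vote(*predictions):
    n = len(predictions)
    return "".join(
        "1" if sum(int(p[i]) for p in predictions) > n // 2 else "0"
        for i in range(len(predictions[0]))
    )

correlated = majority_vote("1111111100", "1111111100", "1011111100")
uncorrelated = majority_vote("1111111100", "0111011101", "1000101111")
print(correlated, uncorrelated)  # 1111111100 1111111101
```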


Ensembles: Considerations

Beyond Averaging: Stacking

I Averaging, i.e., taking the mean of individual model predictions, works well for a wide range of problems (classification and regression) and metrics
I Often referred to as bagging
I Averaging predictions often reduces overfitting [van Veen, 2015]
I Wolpert [1992] introduced stacked generalization before bagging was proposed by Breiman [1996]
I Wolpert is famous for the No Free Lunch theorems in search and optimization
I Basic idea behind stacking: use a pool of base classifiers, then use another classifier to combine their predictions


Ensembles: Considerations

2-fold Stacking

I The stacker model gets information on the problem space by using the first-stage predictions as features [van Veen, 2015]:
  I 1. Split the train set in two parts: train_a and train_b
  I 2. Fit a first-stage model on train_a and create predictions for train_b
  I 3. Fit the same model on train_b and create predictions for train_a
  I 4. Finally, fit the model on the entire train set and create predictions for the test set
  I 5. Now train a second-stage stacker model on the predictions from the first-stage model(s)

⇒ Python Example
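The five steps can be sketched as follows, with a single first-stage model and illustrative data (the slide's actual Python example lives in the accompanying notebook):

```python
# 2-fold stacking with one first-stage model; data and models synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X, y = rng.randn(100, 3), rng.randn(100)
XTest = rng.randn(40, 3)

# 1. split the train set in two parts: train_a and train_b
idx_a, idx_b = np.arange(0, 50), np.arange(50, 100)

first = RandomForestRegressor(n_estimators=10, random_state=0)
oof = np.empty(len(X))
# 2. fit on train_a, create predictions for train_b
first.fit(X[idx_a], y[idx_a])
oof[idx_b] = first.predict(X[idx_b])
# 3. fit on train_b, create predictions for train_a
first.fit(X[idx_b], y[idx_b])
oof[idx_a] = first.predict(X[idx_a])
# 4. fit on the entire train set, create predictions for the test set
first.fit(X, y)
test_feature = first.predict(XTest)
# 5. train the second-stage stacker on the first-stage predictions
stacker = LinearRegression()
stacker.fit(oof.reshape(-1, 1), y)
yPred = stacker.predict(test_feature.reshape(-1, 1))
```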


Ensembles: Considerations

Begin: Jupyter Interactive Document

I The following part of this talk is based on an interactive jupyter notebook [Pérez and Granger, 2007]
I The next slides (67–85) summarize the output from the jupyter notebook


SPO2 Part 2

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations


SPO2 Part 2 Artificial Test Functions

Function Definitions [jupyter]

I Motivated by van der Laan and Polley [2010], we consider six test functions
I All simulations involve a univariate X drawn from a uniform distribution in [−4, +4]
I Test functions:
  I f1(x): return -2 * I(x < -3) + 2.55 * I(x > -2) - 2 * I(x > 0) + 4 * I(x > 2) - 1 * I(x > 3) + ε
  I f2(x): return 6 + 0.4 * x - 0.36 * x * x + 0.005 * x * x * x + ε
  I f3(x): return 2.83 * np.sin(math.pi/2 * x) + ε
  I f4(x): return 4.0 * np.sin(3 * math.pi * x) * I(x >= 0) + ε
  I f5(x): return x + ε
  I f6(x): return np.random.normal(0, 1, len(x)) + ε
I I(·) indicator function, ε drawn from an independent standard normal distribution, sample size r = 100 (repeats)
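Runnable versions of the six functions might look as follows; `ind(·)` stands in for the indicator I(·), `eps` adds the noise ε, and the random seed is illustrative:

```python
# The six test functions, vectorized over a 1-D numpy array x.
import numpy as np

def ind(cond):
    # indicator function I(.): 1.0 where the condition holds, else 0.0
    return cond.astype(float)

rng = np.random.RandomState(0)

def eps(n):
    # independent standard-normal noise
    return rng.normal(0, 1, n)

def f1(x): return (-2 * ind(x < -3) + 2.55 * ind(x > -2) - 2 * ind(x > 0)
                   + 4 * ind(x > 2) - 1 * ind(x > 3) + eps(len(x)))
def f2(x): return 6 + 0.4 * x - 0.36 * x**2 + 0.005 * x**3 + eps(len(x))
def f3(x): return 2.83 * np.sin(np.pi / 2 * x) + eps(len(x))
def f4(x): return 4.0 * np.sin(3 * np.pi * x) * ind(x >= 0) + eps(len(x))
def f5(x): return x + eps(len(x))
def f6(x): return rng.normal(0, 1, len(x)) + eps(len(x))

x = rng.uniform(-4, 4, 100)
samples = [f(x) for f in (f1, f2, f3, f4, f5, f6)]
```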


I f1: Step function

[Figure: plot of f1 (step function) on [−4, 4]]

I f2: Polynomial function

[Figure: plot of f2 (polynomial) on [−4, 4]]

I f3: Sine function

[Figure: plot of f3 (sine) on [−4, 4]]

I f4: Composite function

[Figure: plot of f4 (composite) on [−4, 4]]

I f5: Linear function

[Figure: plot of f5 (linear) on [−4, 4]]

I f6: Noise function

[Figure: plot of f6 (noise) on [−4, 4]]

SPO2 Part 2 Experiment 1: Step Function

f1: Coefficients of the Level-1 Model [jupyter]

I The coefficients can be interpreted as weights in the linear combination of the models. 0 = intercept; 1, 2, and 3 denote the β1, β2, and β3 values, respectively

[Figure: box plots of the level-1 model coefficients for f1]

f1: R2 Values [jupyter]

I R2 (larger values are better) and standard deviation:
  I SPO: 0.78211976, 0.03308847
  I L: 0.4024831, 0.07134356
  I R: 0.78556947, 0.03187105
  I G: 0.76547433, 0.03564519

[Figure: box plots of the R2 values for f1]

SPO2 Part 2 Experiment 2: Polynomial Function

f2: Coefficients of the Level-1 Model [jupyter]

I The coefficients can be interpreted as weights in the linear combination of the models. 0 = intercept; 1, 2, and 3 denote the β1, β2, and β3 values, respectively

[Figure: box plots of the level-1 model coefficients for f2]

f2: R2 Values [jupyter]

I R2 (larger values are better) and standard deviation:
  I SPO: 0.79514735, 0.03602018
  I L: 0.21445917, 0.07656562
  I R: 0.79488344, 0.03604606
  I G: 0.79514727, 0.03602018

[Figure: box plots of the R2 values for f2]

SPO2 Part 2 Experiment 3: Sine Function

f3: Coefficients of the Level-1 Model [jupyter]

I The coefficients can be interpreted as weights in the linear combination of the models. 0 = intercept; 1, 2, and 3 denote the β1, β2, and β3 values, respectively

[Figure: box plots of the level-1 model coefficients for f3]

f3: R2 Values [jupyter]

I R2 (larger values are better) and standard deviation:
  I SPO: 0.7939634, 0.02777211
  I L: 0.11677184, 0.05688847
  I R: 0.79244941, 0.02743085
  I G: 0.79396338, 0.02777211

[Figure: box plots of the R2 values for f3]

SPO2 Part 2 Experiment 4: Linear-Sine Function

f4: Coefficients of the Level-1 Model [jupyter]

I The coefficients can be interpreted as weights in the linear combination of the models. 0 = intercept; 1, 2, and 3 denote the β1, β2, and β3 values, respectively

[Figure: box plots of the level-1 model coefficients for f4]

f4: R2 Values [jupyter]

I R2 (larger values are better) and standard deviation:
  I SPO: 0.74144195, 0.05779718
  I L: 0.00651219, 0.01489886
  I R: 0.75301025, 0.05133169
  I G: 0.31721598, 0.07939812

[Figure: box plots of the R2 values for f4]

SPO2 Part 2 Experiment 5: Linear Function

f5: Coefficients of the Level-1 Model [jupyter]

I The coefficients can be interpreted as weights in the linear combination of the models. 0 = intercept; 1, 2, and 3 denote the β1, β2, and β3 values, respectively

[Figure: box plots of the level-1 model coefficients for f5]

f5: R2 Values [jupyter]

I R2 (larger values are better) and standard deviation:
  I SPO: 0.8362937, 0.02381472
  I L: 0.8362937, 0.02381472
  I R: 0.83628043, 0.02374492
  I G: 0.8362937, 0.02381472

[Figure: box plots of the R2 values for f5]

SPO2 Part 2 Experiment 6: Random Noise (normal)

f6: Coefficients of the Level-1 Model [jupyter]

I The coefficients can be interpreted as weights in the linear combination of the models. 0 = intercept; 1, 2, and 3 denote the β1, β2, and β3 values, respectively

[Figure: box plots of the level-1 model coefficients for f6]

f6: R2 Values [jupyter]

I R2 (larger values are better) and standard deviation:
  I SPO: -0.02025601, 0.10308039
  I L: -0.00035958, 0.01505964
  I R: 0.3586063, 0.06232495
  I G: 0.10037904, 0.05356867

[Figure: box plots of the R2 values for f6]

End: Jupyter Interactive Document

I End of the interactive jupyter notebook [Pérez and Granger, 2007]
I The next slides (88–93) are based on static slides


Stacking: Considerations

Overview

Introduction

Stochastic Search Algorithms

Quality Criteria: How to Select Surrogates

Examples

Ensembles: Considerations

SPO2 Part 2

Stacking: Considerations


Stacking: Considerations

Data

[Figure: data-flow diagram for stacking — the training set {(Xi, yi)}i=1,...,n is split into Val_Training {(Xi, yi)}i=1,...,q and Val_Test {(Xi, yi)}i=q+1,...,n; level-0 models A1 and A2 produce validation predictions yAV1 and yAV2, which feed the level-1 combiner A3; the combined model is then applied to the test set {(Xti, yti)}i=1,...,m]


Stacking: Considerations

Blending and Meta-Meta Models

I Continue with the summary of van Veen [2015]:
I Blending is a word introduced by the Netflix winners
I Instead of creating out-of-fold predictions for the train set: use a small holdout set of, say, 10% of the train set
  I Benefit: simpler than stacking
  I Con: the final model may overfit to the holdout set
I Further improvements: combine multiple ensembled models
I Use averaging or voting on manually selected well-performing ensembles:
  I Start with a base ensemble
  I Add a model when it increases the train set score the most
  I By allowing put-back of models, a single model may be picked multiple times (weighting)
I Use genetic algorithms with CV-scores as the fitness function
I van Veen [2015] proposes a fully random method:
  I Create 100 or so ensembles from randomly selected ensembles (without put-back)
  I Then pick the highest-scoring model
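The greedy add-with-put-back heuristic above can be sketched as follows (an illustrative toy version; the candidate predictions and the number of rounds are assumptions, not from the slides):

```python
# Greedy forward selection with put-back: in each round, add whichever
# candidate's predictions most improve the ensemble mean's train score.
import numpy as np

rng = np.random.RandomState(0)
y = rng.randn(50)
# candidate model predictions of varying quality (noise std 0.2/0.5/1.0)
candidates = [y + s * rng.randn(50) for s in (0.2, 0.5, 1.0)]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

chosen = []
for _ in range(5):  # five greedy rounds; put-back allows repeats
    scores = [mse(np.mean(chosen + [c], axis=0), y) for c in candidates]
    chosen.append(candidates[int(np.argmin(scores))])

ensemble = np.mean(chosen, axis=0)  # implicit weighting via repeats
```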


Stacking: Considerations

“Frankenstein Ensembles"

I Why stacking and combining 1000s of models and computational hours?
I van Veen [2015] lists the following pros for these “monster models”:
  I Win competitions (Netflix, Kaggle, . . .)
  I Beat most state-of-the-art academic benchmarks with a single approach
  I Transfer knowledge from the ensemble back to a simpler shallow model
  I Loss of one model is not fatal for creating good predictions
  I Automated large ensembles don’t require much tuning or selection
  I A 1% increase in accuracy may push an investment fund from making a loss into making a little less loss. More seriously: improving healthcare screening methods helps save lives


Stacking: Considerations

Video Sequence

I The following part of this talk refers to a sequence from the video “Gerald Jay Sussman on Flexible Systems, The Power of Generic Operations”, which is available at https://vimeo.com/151465912
I 1:01:42 – 1:04:25 (to be deleted from the videolectures.net version)


Stacking: Considerations

Summary: Structure and Interpretation of ComputerPrograms (SICP)

I PROGRAMMING BY POKING: WHY MIT STOPPED TEACHING SICP http://www.posteriorscience.net/?p=206
  I 1. Hal Abelson and Gerald Jay Sussman got tired of teaching it
  I 2. The SICP curriculum no longer prepared engineers for what engineering is like today
I In the 80s and 90s, engineers built complex systems by combining simple and well-understood parts
I Programming today is
  “[m]ore like science. You grab this piece of library and you poke at it. You write programs that poke it and see what it does. And you say, ‘Can I tweak it to do the thing I want?’”
I MIT chose Python as an alternative for SICP


Stacking: Considerations

Summary

I Keywords:
  I Abelson & Sussman ✓
  I Wolpert ✓
  I Netflix ✓
  I Deep learning ✓
I Related report: “Stacked Generalization of Surrogate Models—A Practical Approach” [Bartz-Beielstein, 2016]
I More: http://www.spotseven.de


Stacking: Considerations

Acknowledgement

I This work has been supported by the Bundesministerium für Wirtschaft und Energie under the grants KF3145101WM3 and KF3145103WM4.
I This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 692286.


References

Alessandro Baldi Antognini and Maroussa Zagoraiou. Exact optimal designsfor computer experiments via Kriging metamodelling. Journal of StatisticalPlanning and Inference, 140(9):2607–2617, September 2010.

Thomas Bartz-Beielstein. Experimental Analysis of EvolutionStrategies—Overview and Comprehensive Introduction. Technical report,November 2003.

Thomas Bartz-Beielstein. Stacked Generalization of Surrogate Models—A Practical Approach. Technical Report 05/2016, Cologne Open Science, Cologne, 2016. URL https://cos.bibl.th-koeln.de/solrsearch/index/search/searchtype/series/id/8.

Thomas Bartz-Beielstein, Christian Lasarczyk, and Mike Preuß. SequentialParameter Optimization. In B McKay et al., editors, Proceedings 2005Congress on Evolutionary Computation (CEC’05), Edinburgh, Scotland,pages 773–780, Piscataway NJ, 2005. IEEE Press.

Andrew J Booker, J E Dennis Jr, Paul D Frank, David B Serafini, and VirginiaTorczon. Optimization Using Surrogate Objectives on a Helicopter TestExample. In Computational Methods for Optimal Design and Control, pages49–58. Birkhäuser Boston, Boston, MA, 1998.

G E P Box and N R Draper. Empirical Model Building and ResponseSurfaces. Wiley, New York NY, 1987.


J Branke and C Schmidt. Faster convergence by means of fitness estimation. Soft Computing, 9(1):13–20, January 2005.

L Breiman, J H Friedman, R A Olshen, and C J Stone. Classification andRegression Trees. Wadsworth, Monterey CA, 1984.

Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996. ISSN 1573-0565. doi: 10.1023/A:1018054314350. URL http://dx.doi.org/10.1023/A:1018054314350.

D Büche, N N Schraudolph, and P Koumoutsakos. Accelerating EvolutionaryAlgorithms With Gaussian Process Fitness Function Models. IEEETransactions on Systems, Man and Cybernetics, Part C (Applications andReviews), 35(2):183–194, May 2005.

Ivo Couckuyt, Filip De Turck, Tom Dhaene, and Dirk Gorissen. Automaticsurrogate model type selection during the optimization of expensiveblack-box problems. In 2011 Winter Simulation Conference - (WSC 2011),pages 4269–4279. IEEE, 2011.

Michael Emmerich, Alexios Giotis, Mutlu Özdemir, Thomas Bäck, and Kyriakos Giannakoglou. Metamodel-assisted evolution strategies. In J J Merelo Guervós, P Adamidis, H G Beyer, J L Fernández-Villacañas, and H P Schwefel, editors, Parallel Problem Solving from Nature—PPSN VII,


Proceedings Seventh International Conference, Granada, pages 361–370, Berlin, Heidelberg, New York, 2002. Springer.

Alexander Forrester, András Sóbester, and Andy Keane. Engineering Designvia Surrogate Modelling. Wiley, 2008.

Alexander I J Forrester and Andy J Keane. Recent advances insurrogate-based optimization. Progress in Aerospace Sciences, 45(1-3):50–79, January 2009.

G E P Box and K B Wilson. On the Experimental Attainment of Optimum Conditions. Journal of the Royal Statistical Society. Series B (Methodological), 13(1):1–45, 1951.

K C Giannakoglou. Design of optimal aerodynamic shapes using stochasticoptimization methods and computational intelligence. Progress inAerospace Sciences, 38(1):43–76, January 2002.

Tushar Goel, Raphael T Haftka, Wei Shyy, and Nestor V Queipo. Ensemble of surrogates. Struct. Multidisc. Optim., 33(3):199–216, September 2006.

Robert B Gramacy. tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models. Journal of Statistical Software, 19(9):1–46, June 2007.


P Hajela and E Lee. Topological optimization of rotorcraft subfloor structures for crashworthiness considerations. Computers & Structures, 64(1-4):65–76, July 1997.

Trevor Hastie. The elements of statistical learning: data mining, inference, and prediction. Springer, New York, 2nd edition, 2009.

Mark Hauschild and Martin Pelikan. An introduction and survey of estimation of distribution algorithms. Swarm and Evolutionary Computation, 1(3):111–128, September 2011.

Jiaqiao Hu, Yongqiang Wang, Enlu Zhou, Michael C Fu, and Steven I Marcus. A Survey of Some Model-Based Methods for Global Optimization. In Daniel Hernández-Hernández and J Adolfo Minjárez-Sosa, editors, Optimization, Control, and Applications of Stochastic Systems, pages 157–179. Birkhäuser Boston, Boston, 2012.

Edward Huang, Jie Xu, Si Zhang, and Chun Hung Chen. Multi-fidelity Model Integration for Engineering Design. Procedia Computer Science, 44:336–344, 2015.

Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. An Evaluation of Sequential Model-based Optimization for Expensive Blackbox Functions. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, pages 1209–1216, New York, NY, USA, 2013. ACM.

R Jin, W Chen, and T W Simpson. Comparative studies of metamodelling techniques under multiple modelling criteria. Struct. Multidisc. Optim., 23(1):1–13, December 2001.

Y Jin. A comprehensive survey of fitness approximation in evolutionary computation. Soft Computing, 9(1):3–12, October 2003.

Y Jin, M Olhofer, and B Sendhoff. On Evolutionary Optimization with Approximate Fitness Functions. GECCO, 2000.

D R Jones, M Schonlau, and W J Welch. Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13:455–492, 1998.

Jack P C Kleijnen. Kriging metamodeling in simulation: A review. European Journal of Operational Research, 192(3):707–716, February 2009.

P Larrañaga and J A Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer, Boston MA, 2002.

Minh Nghia Le, Yew Soon Ong, Stefan Menzel, Yaochu Jin, and Bernhard Sendhoff. Evolution by adapting surrogates. Evolutionary Computation, 21(2):313–340, 2013.


S N Lophaven, H B Nielsen, and J Søndergaard. DACE—A Matlab Kriging Toolbox. Technical report, 2002.

J Mockus, V Tiesis, and A Zilinskas. Bayesian Methods for Seeking the Extremum. In L C W Dixon and G P Szegö, editors, Towards Global Optimization, pages 117–129. Amsterdam, 1978.

D C Montgomery. Design and Analysis of Experiments. Wiley, New York NY, 5th edition, 2001.

K P Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

Andrea Nelson, Juan Alonso, and Thomas Pulliam. Multi-Fidelity Aerodynamic Optimization Using Treed Meta-Models. In Fluid Dynamics and Co-located Conferences. American Institute of Aeronautics and Astronautics, Reston, Virginia, June 2007.

Emanuele Olivetti. blend.py, 2012. Code available on https://github.com/emanuele/kaggle_pbr/blob/master/blend.py. Published under a BSD 3 license.

Fernando Pérez and Brian E. Granger. IPython: a system for interactive scientific computing. Computing in Science and Engineering, 9(3):21–29, May 2007. ISSN 1521-9615. doi: 10.1109/MCSE.2007.53. URL http://ipython.org.

M J D Powell. Radial Basis Functions. Algorithms for Approximation, 1987.


Mike Preuss. Multimodal Optimization by Means of Evolutionary Algorithms. Natural Computing Series. Springer International Publishing, Cham, 2015.

F Pukelsheim. Optimal Design of Experiments. Wiley, New York NY, 1993.

Nestor V Queipo, Raphael T Haftka, Wei Shyy, Tushar Goel, Rajkumar Vaidyanathan, and P Kevin Tucker. Surrogate-based analysis and optimization. Progress in Aerospace Sciences, 41(1):1–28, January 2005.

Alain Ratle. Parallel Problem Solving from Nature — PPSN V: 5th International Conference, Amsterdam, The Netherlands, September 27–30, 1998, Proceedings, pages 87–96. Springer Berlin Heidelberg, Berlin, Heidelberg, 1998.

Margarita Alejandra Rebolledo Coy, Sebastian Krey, Thomas Bartz-Beielstein, Oliver Flasch, Andreas Fischbach, and Jörg Stork. Modeling and Optimization of a Robust Gas Sensor. Technical Report 03/2016, Cologne Open Science, Cologne, 2016.

E Sanchez, S Pintos, and N V Queipo. Toward an Optimal Ensemble of Kernel-based Approximations with Engineering Applications. In The 2006 IEEE International Joint Conference on Neural Network Proceedings, pages 2152–2158. IEEE, 2006.

T J Santner, B J Williams, and W I Notz. The Design and Analysis of Computer Experiments. Springer, Berlin, Heidelberg, New York, 2003.


M Schonlau. Computer Experiments and Global Optimization. PhD thesis, University of Waterloo, Ontario, Canada, 1997.

L Shi and K Rasheed. A Survey of Fitness Approximation Methods Applied in Evolutionary Algorithms. In Computational Intelligence in Expensive Optimization Problems, pages 3–28. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.

Timothy Simpson, Vasilli Toropov, Vladimir Balabanov, and Felipe Viana. Design and Analysis of Computer Experiments in Multidisciplinary Design Optimization: A Review of How Far We Have Come - Or Not. In 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, pages 1–22, Reston, Virginia, June 2012. American Institute of Aeronautics and Astronautics.

G Sun, G Li, S Zhou, W Xu, X Yang, and Q Li. Multi-fidelity optimization for sheet metal forming process. Structural and Multidisciplinary . . . , 2011.

M J van der Laan and E C Polley. Super Learner in Prediction. UC Berkeley Division of Biostatistics Working Paper . . . , 2010.

Henk van Veen. Using Ensembles in Kaggle Data Science Competitions, 2015. URL http://www.kdnuggets.com/2015/06/ensembles-kaggle-data-science-competition-p1.html.

V N Vapnik. Statistical learning theory. Wiley, 1998.



G Gary Wang and S Shan. Review of Metamodeling Techniques in Support of Engineering Design Optimization. Journal of Mechanical . . . , 129(4):370–380, 2007.

David H Wolpert. Stacked generalization. Neural Networks, 5(2):241–259, January 1992.

Luis E Zerpa, Nestor V Queipo, Salvador Pintos, and Jean-Louis Salager. An optimization methodology of alkaline–surfactant–polymer flooding processes using field scale numerical simulation and multiple surrogates. Journal of Petroleum Science . . . , 47(3-4):197–208, June 2005.

Z Zhou, Y S Ong, P B Nair, A J Keane, and K Y Lum. Combining Global and Local Surrogate Models to Accelerate Evolutionary Optimization. IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 37(1):66–76, 2007.

Mark Zlochin, Mauro Birattari, Nicolas Meuleau, and Marco Dorigo. Model-Based Search for Combinatorial Optimization: A Critical Survey. Annals of Operations Research, 131(1-4):373–395, 2004.

J M Zurada. Analog implementation of neural networks. IEEE Circuits and Devices Magazine, 8(5):36–41, 1992.
