
Struct Multidisc Optim (2012) 45:53–64. DOI 10.1007/s00158-011-0658-3

RESEARCH PAPER

An efficient class of direct search surrogate methods for solving expensive optimization problems with CPU-time-related functions

Mark A. Abramson · Thomas J. Asaki · John E. Dennis Jr. · Raymond Magallanez Jr. · Matthew J. Sottile

Received: 26 July 2010 / Revised: 1 April 2011 / Accepted: 11 April 2011 / Published online: 28 May 2011
© Springer-Verlag (outside the USA) 2011

Abstract In this paper, we characterize a new class of computationally expensive optimization problems and introduce an approach for solving them. In this class of problems, objective function values may be directly related to the computational time required to obtain them, so that, as the optimal solution is approached, the computational time required to evaluate the objective is significantly less than at points farther away from the solution. This is motivated by an application in which each objective function

M. A. Abramson (B)
The Boeing Company, PO Box 3707, MC 7L-21, Seattle, WA 98124-2207, USA
e-mail: [email protected]

T. J. Asaki
Department of Mathematics, Washington State University, PO Box 643113, Neill Hall 103, Pullman, WA 99164-3113, USA
e-mail: [email protected]
URL: http://geometricanalysis.org/TomAsaki

J. E. Dennis Jr.
Department of Computational and Applied Mathematics, Rice University, 8419 42nd Avenue SW, Seattle, WA 98136-2360, USA
e-mail: [email protected]
URL: http://www.rice.edu/~dennis

R. Magallanez Jr.
Department of Mathematical Sciences, United States Air Force Academy, Colorado Springs, CO, USA
e-mail: [email protected]

M. J. Sottile
Galois, Inc., 421 SW 6th Ave. Suite 300, Portland, OR 97204, USA
e-mail: [email protected]

evaluation requires both a numerical fluid dynamics simulation and an image registration process, and the goal is to find the parameter values of a predetermined reference image by comparing the flow dynamics from the numerical simulation and the reference image through the image comparison process. In designing an approach to numerically solve the more general class of problems in an efficient way, we make use of surrogates based on CPU times of previously evaluated points, rather than their function values, all within the search step framework of mesh adaptive direct search algorithms. Because of the expected positive correlation between function values and their CPU times, a time cutoff parameter is added to the objective function evaluation to allow its termination during the comparison process if the computational time exceeds a specified threshold. The approach was tested using the NOMADm and DACE MATLAB® software packages, and results are presented.

Keywords Surrogate optimization · Derivative-free optimization · Black box optimization · Mesh Adaptive Direct Search (MADS) · Pattern search · Image registration · Kriging

1 Introduction

In this paper, we introduce a new class of optimization problems and a novel approach for numerically solving them. This class consists of minimizing an objective function f : X ⊂ ℝⁿ → ℝ that is computationally expensive to evaluate, but becomes significantly less so as the solution is approached. That is, there is a reasonably strong correlation between objective function values and the CPU


time required to compute them. We will refer to this class of problems as CPU-time-related. No assumption is made regarding the precise nature of the correlation, since it will generally not be known. (This is in contrast with work in which the correlation is measured; e.g., see Romero et al. 2008 and Marin et al. 2009.) Typically, the objective function involves some type of engineering simulation or process modeling for which the computational time for simulation runs on a large set of problem instances is either infeasible (exceeds problem requirements) or impractical (months, years, or worse). While tackling the extreme computational requirements of large simulations is our ultimate goal, our primary motivation in this paper comes from an application whose objective function involves both a fluid dynamics simulation and an image registration process, the latter of which becomes much less expensive close to the solution. For this application, the feasible region X is defined by simple bounds on the design variables, but the approach described here is sufficiently general to cover problems with general nonlinear constraints.

We note that other applications exist in which function evaluations are positively correlated with their required CPU times. In general, certain parameter estimation problems that require the numerical solution of differential equations at each evaluated point may require fewer iterations (and hence less CPU time) when the parameters are close to their true values. This property is also present in theater-level combat simulations, where one function evaluation represents a simulation given certain parameter settings. In the scenario of an enemy invasion, one common objective would be the employment of defensive forces in a way that minimizes enemy penetration (in distance). Enemy penetration is often highly correlated (though not perfectly) with the time it takes to stop the penetration. Since the expensive simulation would typically run until the enemy advance is halted (or shortly thereafter), the computational expense of an objective function evaluation (i.e., one simulation run) is highly correlated with the objective function value.

The positive correlation between objective function value and computational time leads us to a solution approach that integrates CPU runtime measures into the optimization process, allowing us to better utilize computational resources and significantly reduce computational time. Because of the computational expense of the function evaluations, our approach also involves the iterative use of surrogates. In this context, a surrogate can be thought of as a much less expensive replacement for, but not necessarily a good approximation to, the objective function. In fact, in this paper we introduce surrogates based on CPU times in addition to those based on objective function values. An alternative approach is to make use of coarser representations of the simulation codes when further away from a solution (Bethea 2008). The present work is independent of that course of action; i.e., there is no reason why both ideas could not be implemented together (a topic for future research). For this reason, we fix the model fidelity here and confine our focus to exploiting the relationship between objective function value and CPU time; we hope to address model fidelity in a separate paper.

To maintain rigorous convergence properties of our approach, the use of surrogates is incorporated into the search step of the class of mesh adaptive direct search (MADS) algorithms. This is consistent with the surrogate management framework (SMF) introduced by Booker et al. (1999). This is an important distinction, as the poll step of the algorithm ensures convergence to a point satisfying certain necessary conditions for optimality, while the search step (which makes use of surrogates) is only used to make the process of convergence significantly more efficient for computationally expensive problems.

The remainder of the paper is organized as follows. In Section 2, we further motivate our work by discussing the details of our application. We present the MADS algorithm in Section 3 and discuss surrogates in more detail in Section 4, including some specific surrogate types and initialization strategies. In Section 5, we introduce new strategies for incorporating surrogates to efficiently solve our target class of problems. Numerical results on a specific instance of our application are given in Section 6, followed by some concluding remarks in Section 7.

2 An applicable class of optimization problems

The class of problems we target is motivated by an application in which each objective function evaluation requires both a fluid dynamics simulation and an image registration process. We consider the fluid dynamics application of discovering optimal simulation model parameters which most accurately reproduce a given experimentally obtained template image. Image registration is used to measure the degree to which any particular model output differs from this template data. For our initial study, we consider noisy template data constructed from simulation output.

The movement of fluids in a region Ω ⊂ ℝⁿ, n ∈ {2, 3}, is governed by the well-known Navier-Stokes equations:

$$\partial_t v + (v \cdot \nabla)v + \nabla p = \frac{1}{Re}\,\Delta v + (1 - \beta T)g,$$

$$\frac{\partial T}{\partial t} + v \cdot \nabla T = \frac{1}{Re}\,\frac{1}{Pr}\,\Delta T + q''',$$

$$\operatorname{div} v = 0,$$

where v is a velocity field on ℝⁿ, p is the pressure field in Ω, g indicates body forces in Ω, Re ∈ ℝ is the Reynolds


number of the flow, Pr ∈ ℝ is the Prandtl number of the flow, β ∈ ℝ is the coefficient of thermal expansion, q''' is the heat source, T is the temperature, and Δ denotes the Laplace operator (the sum of the unmixed second partial derivatives).

Our test example is that of the well-known lid-driven cavity problem (Griebel et al. 1998), in which an initially stationary two-dimensional fluid in a rectangular container is subject to forces imposed by the top boundary (lid) moving at a uniform horizontal velocity. This causes a circular pattern of flow to appear within the fluid over time. Since the Navier-Stokes equations cannot be solved analytically, they must be solved numerically using a finite element method and an associated finite differencing scheme. We use the method of Griebel et al. (1998). Figure 1(a) shows a snapshot of the fluid velocity at some positive time. Figure 1(b) shows the corresponding representation of the heat function H, which we will use as reference data. The heat function defines the two-dimensional heat flux q⃗ = ∇ × H, and is analogous to the hydrodynamic stream function, which defines the (two-dimensional) velocity. For each combination of Reynolds number and simulation length (or time), the velocity and viscosity of the fluid form a different circular heat flux pattern throughout the region. The goal will be to recover from the snapshot the (unknown) Reynolds number and simulation length.

Image registration is the process of estimating some optimal transformation u* between two images. Thus, a transformation u is realized as a path through the space of images. A particular choice of u will depend on the needs of the application. For example, in medical imaging it is desirable to compare images with minimal distortion, ∇ × u. Other image comparison tasks benefit from minimizing the work required to "move" the intensities from one image to another. Different types of transformations are described in Modersitzki (2004). If we consider the classical inner product space L²(Ω) of square Lebesgue-integrable functions with its standard induced norm, a transformation of an image T is given by T_u(x) = T(x − u(x)), where u(x) is the displacement of the point x. The objective is to minimize the distance between a reference image R and a template image T through an optimal warp transformation T_u ∈ L²(Ω), as defined by some distance measurement D, and a smoothing or regularizing term S. This problem is given by

$$\min_u \; D[R, T_u] + \alpha S[u], \tag{1}$$

where α > 0 governs the relative contributions of the two terms. We choose to illustrate our techniques using the curvature registration method of Fischer and Modersitzki (2003), where

$$D[R, T_u] = \frac{1}{2}\,\|T_u - R\|_{L^2(\Omega)}, \tag{2}$$

$$S[u] = \frac{1}{2} \sum_{i=1}^{n_d} \int_\Omega (\Delta u_i)^2 \, dx, \tag{3}$$

and n_d is the dimension of the data space Ω. Equation (3) can be viewed as an approximation to the curvature of the transformation u. The optimal transformation u* is the one that simultaneously minimizes image differences and transformation curvature.

The Euler-Lagrange equation for (1) is

$$(T_u(x) - R(x))\,\nabla T_u(x) + \alpha \Delta^2 u(x) = 0, \quad x \in \Omega, \tag{4}$$

where we have applied Neumann boundary conditions ∇u_i = ∇Δu_i = 0, i = 1, 2, …, n_d, on the boundary ∂Ω of Ω. One method of solution is to take the associated time-dependent equation,

$$u_t(x, t) + \alpha \Delta^2 u(x, t) = (R(x) - T_u(x, t))\,\nabla T_u(x, t), \tag{5}$$

and compute the steady-state solution u*, satisfying u_t(x, t) = 0, using the iteration

$$u^{(k+1)}(x, t) = u^{(k)}(x, t) + \tau\, u_t^{(k)}(x, t), \tag{6}$$

with an appropriately chosen artificial time step τ.

Figure 2 shows an example of an image registration for images of the temperature of two different parameterizations of a fluid flow simulation.

Fig. 1 Example time snapshots of a lid-driven cavity simulation. The velocity field (a) at some time t > 0 shows a vortical structure. The heat function (b) at time t serves as our example reference image


Fig. 2 Image registration example. The noisy intensities of the template image (b) are transformed into a registered image (c) intended to match a reference image (a). The optimal transformation u* is determined by (1). The residual R − T_u is shown in (d)

The top left picture is the reference image R for a specific set of (unknown) parameter values (e.g., Reynolds number of the fluid). For a different set of parameter values, the simulation is run, resulting in the template image T shown in the top right picture. The image registration is then applied, using (6) to solve (1)–(3). The resulting warped template image T_u and the difference between R and T_u are shown in the bottom left and right images, respectively.

Given pixel points {x_i}_{i=1}^{n_p} ⊂ ℝⁿ, where n_p is the number of pixels in the image, the objective measure of the goodness of simulation parameter choices is given by the optimal transformation; namely,

$$f = \|u^* - \bar{u}\|_2, \qquad \bar{u} = \frac{1}{n_p} \sum_{i=1}^{n_p} u^*(x_i). \tag{7}$$

The subtraction of the mean in (7) allows for a zero objective function value for images that are identical except for translational alignment issues. If the images are very similar, the numerical image registration scheme (6) requires only a few iterations to transform T into R, resulting in less computational time as well as a small distance value. On the other hand, images that are relatively dissimilar require more iterations, increasing the computational time. Thus, the objective function value f and its associated computational time are expected to be strongly correlated. We should also point out that the computational cost of an image registration is not trivial. When image deformation methods are applied instead of simpler affine transformations, the cost due to the additional computational workload can be significant.
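To make the per-evaluation cost structure concrete, the following MATLAB sketch performs one explicit step of the iteration (6) on a pixel grid. It is a minimal illustration under our own assumptions (images stored as matrices, simple finite differences via del2 and gradient, bilinear interpolation for T_u, and only roughly approximated Neumann boundaries), not the registration code used by the authors.

```matlab
% One explicit step of the fixed-point iteration (6) for the curvature
% registration model (1)-(3). Assumptions (ours): R, T are m-by-n double
% arrays, the displacement u = (u1, u2) is stored as two m-by-n arrays,
% and 4*del2 approximates the Laplacian on a unit grid.
function [u1, u2] = registrationStep(R, T, u1, u2, alpha, tau)
    [m, n] = size(R);
    [X, Y] = meshgrid(1:n, 1:m);
    % Warped template T_u(x) = T(x - u(x)), by bilinear interpolation
    Tu = interp2(X, Y, T, X - u1, Y - u2, 'linear', 0);
    [Tx, Ty] = gradient(Tu);            % gradient of the warped template
    F = R - Tu;                         % image mismatch term from (5)
    % Biharmonic term Delta^2 u, as the Laplacian of the Laplacian
    B1 = 4*del2(4*del2(u1));
    B2 = 4*del2(4*del2(u2));
    % Explicit Euler step of (5): u_t = (R - T_u) grad T_u - alpha Delta^2 u
    u1 = u1 + tau * (F .* Tx - alpha * B1);
    u2 = u2 + tau * (F .* Ty - alpha * B2);
end
```

Running such steps until a convergence test is met is exactly the loop whose iteration count, and hence CPU time, tracks image dissimilarity.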

3 Mesh Adaptive Direct Search (MADS)

The class of mesh adaptive direct search (MADS) algorithms was introduced by Audet and Dennis (2006) as a way of extending generalized pattern search (GPS) (Audet and Dennis 2003; Lewis and Torczon 1999, 2000; Torczon 1997) to optimization problems with nonlinear constraints without the use of penalty parameters (Lewis and Torczon 2002) and with a stronger theoretical justification than is available with filters (Audet and Dennis 2004). Each iteration of GPS and MADS algorithms consists of a search and a poll step performed on a mesh formed by a set of n_D directions D that positively spans ℝⁿ. These algorithms are applied not to the objective function f, but to the barrier objective f_X ≡ f + ψ_X, where ψ_X is the indicator function for X; it is zero at x ∈ X and ∞ for x ∉ X.

The mesh at iteration k, which is not actually constructed, can be expressed as

$$M_k = \bigcup_{x \in S_k} \{x + \Delta_k^m D z : z \in \mathbb{N}^{n_D}\},$$

where S_k is the finite set of points where the objective function f had been evaluated by the start of iteration k (so S_0 is the set of initial feasible points), and the mesh size parameter Δ_k^m controls how coarse or fine the mesh is.

The search step is very flexible, as it consists of evaluating f_X at any finite number of mesh points. One could choose to sample randomly, sample at points generated by an experimental design, run a few iterations of a favorite heuristic, such as a genetic algorithm, or simply do nothing. For computationally expensive functions, the search step usually consists of constructing (or recalibrating) surrogate functions and solving a surrogate optimization problem


on the mesh. The resulting solution, along with any other promising points, is then evaluated by f_X.

If the search step fails to yield a mesh point with a lower objective function value, the poll step is performed. It consists of evaluating the set of adjacent mesh points P_k, called the poll set; namely,

$$P_k = \{x_k + \Delta_k^m d : d \in D_k\} \subset M_k,$$

where the current iterate x_k is called the frame center, and D_k is a positive spanning set of directions satisfying certain properties that will ensure appropriate convergence properties of the algorithm. In GPS methods, the condition D_k ⊆ D must hold at each iteration, and D_k is chosen to include directions that conform to the boundary of any nearby constraint (Lewis and Torczon 2000). This is sufficient for problems with a finite number of linear constraints because we include in D all possible conforming directions at any point in the feasible region. However, for more general problems with nonlinear constraints, this condition is insufficient to ensure convergence.

In the more general class of MADS algorithms, a new parameter Δ_k^p, called the poll size parameter, is introduced, which satisfies Δ_k^m ≤ Δ_k^p for all k, and

$$\lim_{k \in K} \Delta_k^m = 0 \;\Leftrightarrow\; \lim_{k \in K} \Delta_k^p = 0 \quad \text{for any infinite subset of indices } K. \tag{8}$$

Under this construction, GPS now becomes the specific instance of MADS with Δ_k = Δ_k^m = Δ_k^p. In MADS, D_k is not a subset of D, but instead must satisfy the following properties:

• Each nonzero d ∈ D_k can be written as a nonnegative integer combination of the directions in D; i.e., d = Du for some vector u ∈ ℕ^{n_D} that may depend on the iteration number k.
• The distance from the frame center x_k to a poll point x_k + Δ_k^m d is bounded by a constant times the poll size parameter; i.e., Δ_k^m ‖d‖ ≤ Δ_k^p max{‖d′‖ : d′ ∈ D}.
• Limits (as defined in Coope and Price 2000) of the normalized sets D_k are positive spanning sets.

The idea in MADS is that Δ_k^m approaches zero faster than Δ_k^p, which increases the number of possible directions from which to select for inclusion in D_k. This is illustrated in Fig. 3, where GPS and MADS frames (in rows 1 and 2, respectively), constructed from the standard set of 2n positive and negative standard coordinate directions, are depicted in two dimensions. In each case, the thick-lined box is called a frame (with frame center x_k), and the points where it intersects the mesh are at a relative distance of Δ_k^p from the frame center x_k. In MADS, any set of positive spanning directions that yields mesh points inside the frame (e.g., p1, p2, and p3) may be chosen.

Fig. 3 GPS and MADS frames in ℝ²


Fig. 4 A general MADS algorithm

If either the search or poll succeeds in finding an improved mesh point, it becomes the new iterate x_{k+1} ∈ X, and the mesh size is retained or increased. If neither step succeeds, then the current iterate is retained and the mesh size is reduced. More specifically, given a fixed rational number τ > 1 and two integers w⁻ ≤ −1 and w⁺ ≥ 0, Δ_k^m is updated according to the rule

$$\Delta_{k+1}^m = \tau^{w_k} \Delta_k^m \quad \text{for some } w_k \in \begin{cases} \{0, 1, \ldots, w^+\} & \text{if an improved mesh point is found,} \\ \{w^-, w^- + 1, \ldots, -1\} & \text{otherwise.} \end{cases} \tag{9}$$

A general MADS algorithm is given in Fig. 4.

Convergence of MADS depends on selecting directions so that the union of normalized poll directions used becomes asymptotically dense on the unit sphere (Audet and Dennis 2006). Specific instances of MADS have been shown to achieve this condition with probability one (by randomly selecting D_k) (Audet and Dennis 2006) and deterministically (through the use of Halton sequences) (Abramson et al. 2009).

More details, including proofs of convergence to appropriately defined first-order stationary points (even for nonsmooth functions), are given for GPS and MADS in Audet and Dennis (2003) and (2006), respectively. Second-order stationarity results for GPS and MADS are studied in Abramson (2005) and Abramson and Audet (2006), respectively.
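Since Fig. 4 itself is not reproduced in this transcript, the following schematic MATLAB sketch summarizes the search/poll/update iteration just described. It is our paraphrase under simplifying assumptions (the GPS special case Δ_k^p = Δ_k^m, mesh updates (9) with w_k = ±1, and user-supplied handles searchPts and pollDirs), not the NOMADm implementation.

```matlab
% Schematic sketch of one GPS/MADS-style run (our paraphrase of Fig. 4).
% fX:        barrier objective handle (returns Inf outside X)
% searchPts: handle (x, dm) -> matrix whose columns are search trial points
% pollDirs:  handle (x, dm) -> positive spanning set D_k (columns)
function [xk, fk] = madsSketch(fX, x0, searchPts, pollDirs, tau, maxIter)
    xk = x0(:);
    fk = fX(xk);
    dm = 1;                                   % mesh size parameter
    for k = 1:maxIter
        improved = false;
        % SEARCH step: evaluate any finite set of mesh points (may be empty)
        S = searchPts(xk, dm);
        for j = 1:size(S, 2)
            if fX(S(:, j)) < fk
                xk = S(:, j); fk = fX(xk); improved = true; break
            end
        end
        % POLL step: performed only if the search step fails
        if ~improved
            D = pollDirs(xk, dm);
            for j = 1:size(D, 2)
                p = xk + dm * D(:, j);        % adjacent mesh point
                if fX(p) < fk
                    xk = p; fk = fX(p); improved = true; break
                end
            end
        end
        % Mesh update (9): coarsen on success, refine on failure
        if improved, dm = tau * dm; else, dm = dm / tau; end
    end
end
```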

4 Surrogate functions

The idea for surrogates first appeared as "approximation concepts" in the work of Schmit and Miura (1976). Booker et al. (1999) characterize a class of problems for which surrogate functions would be an appropriate approach, suggest a surrogate composition, and set forth a general Surrogate Management Framework (SMF) for using surrogates to numerically solve optimization problems. More information about surrogate-based optimization can be found in Forrester and Keane (2009), Viana et al. (2010a), and Wang and Shan (2007).

Most surrogates are one of two types: simplified physics or response-based. A simplified physics model, also known as a low-fidelity model, makes certain physical assumptions that significantly reduce the computational cost by eliminating complex equations and even reducing the number of variables. Although several novel approaches exist in the literature for treating this class of surrogates (e.g., see Alexandrov et al. 1998 or Robinson et al. 2006), the actual construction of the models is problem-dependent.

To handle our target class of problems, we use as surrogates a class of response-based kriging approximation models (Sacks et al. 1989) (see also Kleijnen 2009, Martin and Simpson 2005, and Stein 1999). These are not the only response-based models, and in fact, there have even been cases where multiple surrogate types are applied to the same data (Ginsbourger et al. 2008; Viana et al. 2010b; Viana and Haftka 2008; Voutchkov and Keane 2006) in an effort to improve performance.

In kriging, given a set of known data points (or sites) {s_i}_{i=1}^{n_s} ⊂ ℝⁿ and their deterministic response or function values y_s ∈ ℝ^{n_s} (i.e., [y_s]_i = y(s_i) for i = 1, 2, …, n_s), the deterministic function y(z) is modeled as a realization of a stochastic process,

$$Y(z) = \sum_{j=1}^{p} \beta_j \phi_j(z) + Z(z) = \beta^T \phi(z) + Z(z),$$

where Y(z) is the sum of a regression model with coefficients β = [β₁, β₂, …, β_p] ∈ ℝᵖ and basis functions φ = [φ₁, φ₂, …, φ_p], and a random variable Z(z), where Z : ℝⁿ → ℝ, with mean zero and covariance V(w, z) = σ²R(θ, w, z) between Z(w) and Z(z), σ² is the process variance, and R(θ, w, z) is the correlation function of w and z. The parameter θ ∈ ℝⁿ controls the shape of the correlation function. Kriging produces an approximate function value at an unknown point s₀ ∈ ℝⁿ using weights on known responses; namely,

$$\hat{y}(s_0) = c(s_0)^T y_s,$$

where c(s₀) ∈ ℝ^{n_s} is a vector of weights. Details for computing c(s₀) are given in Lophaven et al. (2002) or Sacks et al. (1989), for example.
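As a concrete illustration, kriging surrogates of this kind can be fit and evaluated with the DACE MATLAB toolbox (Lophaven et al. 2002) that the paper uses later. The sites, responses, and θ bounds below are illustrative assumptions, not data from the paper.

```matlab
% Minimal sketch: fit a kriging model with the DACE toolbox and predict
% at a new point. The sites mimic a 9-point CCD on the bounds used in
% Section 6 ([0,5000] x [3,8]); the responses ys are hypothetical.
S  = [   0 3;    0 8; 5000 3; 5000 8; ...
      2500 3; 2500 8;    0 5.5; 5000 5.5; 2500 5.5];
ys = [210; 180; 160; 140; 120; 115; 190; 150; 95];  % made-up f values
theta0 = [0.1, 0.1];                  % initial correlation parameters
lob = [1e-4, 1e-4];                   % lower bounds for the theta fit
upb = [1e2, 1e2];                     % upper bounds for the theta fit
dmodel = dacefit(S, ys, @regpoly2, @corrgauss, theta0, lob, upb);
[yhat, mse] = predictor([3000, 6], dmodel);  % prediction + error estimate
```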


The parameter θ is set via an optimization process which requires the computation of R(θ, x, x)⁻¹. However, this is difficult numerically because R can become ill-conditioned as points cluster together during the convergence process of MADS (Booker 2000). The increase in the condition number of R(θ, x, x) (which can be estimated and monitored) can greatly impact the computed value of θ or prevent the calculation of the regression coefficients altogether. Booker (2000) alleviates this problem in practice by introducing a second correlation function for all the points that are generated after the initial surrogate is formed. He also suggests the use of additional correlation functions if the ill-conditioning recurs. An alternative approach for alleviating numerical instability due to the clustering of points was studied in Martin (2010). For problems with CPU-time-related functions, since the clustering of points coincides with smaller CPU times, we can also choose to simply skip the search step whenever an estimate of the condition number (based on Hager 1984; Higham and Tisseur 2000) of the kriging correlation matrix becomes too large.
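This skip rule can be sketched as follows, continuing the S and dmodel of the previous sketch. Rebuilding R from the raw sites and the 1e10 tolerance are our assumptions (DACE internally works with normalized sites, so the true internal matrix differs slightly).

```matlab
% Sketch of the safeguard above: estimate the condition number of the
% kriging correlation matrix R and skip the surrogate search step when
% it is too large. condest implements a Hager/Higham-Tisseur estimate.
condTol = 1e10;                         % assumed tolerance
ns = size(S, 1);
R = zeros(ns);
for i = 1:ns
    d = S - repmat(S(i, :), ns, 1);     % differences from site i
    R(:, i) = corrgauss(dmodel.theta, d);   % Gaussian correlations
end
doSearch = (condest(R) <= condTol);     % otherwise use an empty search step
```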

5 New surrogate strategies

In this section, we introduce a new search step for handling CPU-time-related functions. First, if objective function values and CPU times are positively correlated, then improvement in the objective function should not be expected once the computational time exceeds a certain amount. To avoid wasting unnecessary CPU time, we introduce a CPU time cutoff parameter t_k^cut > 0 to allow a function evaluation to be aborted if it is taking too long to perform. A value of t_k^cut = ∞ means that the function is evaluated normally without being aborted.

At each point x ∈ X that is evaluated by f, we record its function value z = f(x) and the CPU time t = t(x) required to evaluate f(x). For a specified time cutoff parameter value of t_k^cut, we can represent a function evaluation by [z, t] = f(x, t_k^cut). Once the time for computing the function value exceeds the value specified by t_k^cut, evaluation of f(x) is aborted without returning a value for z (or z is set to be infinity or an arbitrarily large number) and with t set to t_k^cut. One possible approach we considered is to set t_{k+1}^cut = αt_k, α ≥ 1, where t_k is the recorded CPU time for the current best iterate x_k (i.e., in our notation, [z_k, t_k] = f(x_k, t_k^cut)).

k ).The CPU time relation also means that a surrogate based

on either the objective function values or CPU times wouldprobably be a good predictor of decrease in the objective. Infact, a surrogate on the CPU time has the added advantagethat it always returns a value, whereas, the objective func-tion would be aborted if tcut

k is exceeded at iteration k. Wedenote these surrogates on objective values and CPU timesby fk(·) and tk(·) (at iteration k), respectively, and we con-sider the following four surrogate optimization problems,whose solutions we would expect to be good subsequenttrial points for the true objective function:

minx∈X

fk(x), (10)

minx∈X

tk(x), (11)

minx∈X

fk(x), s.t. tk(x) ≤ tcutk + ε, (12)

minx∈X

tk(x), s.t. fk(x) ≤ zk . (13)

The parameter ε > 0 added to the constraint in (12) is a small constant offset to allow for variability in computational time. We do this to prevent the situation where unfortunate variations in CPU time result in feasible points that are flagged as infeasible with respect to (12).

We anticipate that the amount of correlation between function values and CPU times may be different for different problems. If the correlation is not as strong, then the time-based surrogates will not predict improvement as well. From the standpoint of convergence, the MADS poll step overcomes poor correlation, but at a higher cost. In fact, the class of MADS algorithms has been shown to be robust in solving problems, even when the surrogate predicts poorly (Booker et al. 1998).

Fig. 5 MADS search step k for CPU-time-related functions


The surrogate optimization problem is typically solved using a recursive call to MADS (though theoretically, the surrogate optimization problem could be solved by many optimization codes). Constraints in (12)–(13) are treated by the extreme barrier approach of only allowing feasible points (or setting the function value at any infeasible trial point to infinity).

We should note that the combination of MADS with a barrier and the use of t_k^cut in the original optimization problem causes a dilemma when using surrogate functions. Using the parameter t_k^cut to stop unprofitable function evaluations is good for saving computational time, but it results in infinite or arbitrarily large function values, which cannot be used to construct a good surrogate. To overcome this, we simply set the function value to the largest value seen thus far in the iteration process whenever the time cutoff threshold is exceeded. Our algorithm can be summarized as simply MADS with the specific kth search step given in Fig. 5.
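Putting the pieces together, the following schematic MATLAB sketch shows a search step of the kind Fig. 5 describes (our paraphrase; Fig. 5 itself is not reproduced in this transcript). The history arrays, the choice of problem (13), the fixed θ, and the helper madsSolve are our assumptions, not the authors' NOMADm code.

```matlab
% Sketch of a surrogate search step for CPU-time-related functions.
% Xhist, zhist, thist: evaluated points, function values, and CPU times;
% zk: incumbent value; madsSolve: hypothetical recursive MADS call.
function xTrial = timeSurrogateSearch(Xhist, zhist, thist, zk, x0)
    theta = ones(1, size(Xhist, 2));          % fixed correlation parameters
    fmodel = dacefit(Xhist, zhist, @regpoly2, @corrgauss, theta);
    tmodel = dacefit(Xhist, thist, @regpoly2, @corrgauss, theta);
    % Surrogate problem (13), with the extreme barrier on its constraint:
    % minimize predicted CPU time among points predicted to improve on zk.
    function v = barrier(x)
        if predictor(x, fmodel) > zk
            v = Inf;                          % infeasible trial point
        else
            v = predictor(x, tmodel);         % predicted CPU time
        end
    end
    xTrial = madsSolve(@barrier, x0);         % hypothetical recursive solve
end
```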

6 A numerical example

We now present some numerical results from a specific example of the class of problems described in Section 2, the lid-driven cavity problem (Griebel et al. 1998). Additional results and discussion are given in Magallanez (2007). At one particular Reynolds number and simulation length, a reference image of the heat flux pattern is captured, and then noise is added to the image, so as to represent what one might see in experimentally obtained physical measurements. As stated in Section 2, the goal is to run a simulation for different Reynolds numbers and simulation lengths, capture the template image, and compare the template and reference images of heat in an attempt to determine the original Reynolds number and simulation length set for the reference image. Figure 2 actually shows reference and template images for this very problem.

Since this problem has only bound constraints, we applied GPS with the search step described by Fig. 5. To do this, we used two MATLAB® software packages, NOMADm (Abramson 2008) for the implementation of MADS/GPS, and DACE (Nielsen 2007) to build the kriging surrogates (with some custom-built files) for the search step. The use of kriging functions as surrogates requires specification of the data sites, and regression and correlation functions. For constructing an initial surrogate, the set of initial data sites can be chosen via experimental design (Santner et al. 2003) or by sampling a set of "space-filling" points, such as Latin hypercube designs (Stein 1987; Tang 1993) or orthogonal arrays (Owen 1992). Experimental designs generally require more function evaluations, especially in larger dimensions. However, since our problem is of small dimension, we chose to use a 9-point central composite design (CCD). This also allowed us to use a second-order regression function for the kriging model, which we found to be more accurate than lower-order functions.
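For reference, one way to generate such a 9-point CCD on the bounds used below is sketched here; ccdesign is from the MATLAB Statistics Toolbox, and the scaling is our own (a hand-built 3 × 3 grid of corner, edge-midpoint, and center points would serve equally well).

```matlab
% Sketch: 9-point face-centered central composite design scaled to the
% bound constraints used in this section ([0,5000] x [3,8]).
coded = ccdesign(2, 'type', 'faced', 'center', 1);  % 9 points in [-1,1]^2
lb = [0, 3];
ub = [5000, 8];
sites = (coded + 1) / 2 .* (ub - lb) + lb;          % rescale to the bounds
```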

The surrogate optimization problem was solved by a recursive call to MADS from within the search step. An estimate of the condition number of the kriging correlation matrix was monitored using the MATLAB condest function, which is based on Hager (1984) and Higham and Tisseur (2000). The bounds on Reynolds number and simulation length (in seconds) were set to be [0, 5000] and [3, 8], respectively.

For each scenario, different variations of the algorithm are applied and compared to a base case. The base case implementation uses GPS with an empty search step (i.e., it is skipped), a single initial point or set of CCD points, and t_k^cut = ∞ for all k. This allows for a full evaluation of all points and a comparative analysis of the proposed algorithm. The other cases use the partial and full implementation of the search step presented in Fig. 5.

We first made some preliminary runs using a strategy of setting t_{k+1}^cut = t_k at each iteration. However, these runs turned out to be unsuccessful in forming good surrogates (i.e., surrogates that routinely found good trial points to evaluate) because our main assumption of CPU time correlation turned out not to hold in certain parts of the domain. If the template image is too dissimilar from the reference image, the image registration process actually terminates prematurely, with a lower CPU time and a much worse objective function value. This is seen in Fig. 6, which shows the image registration time and objective function value for each trial point evaluated in the subsequent run, in which we set t_{k+1}^cut = 2t_k at each iteration. This choice seemed to rectify the situation. Note that we only include image registration time here because the simulation time was roughly constant over all evaluated points, and for this particular problem, the image registration time was actually the more dominant time.

We also experienced ill-conditioning of the correlation matrix as a solution is approached, which is caused by trial points becoming more clustered together. This was remedied by invoking an empty search (i.e., not optimizing the surrogate) whenever the matrix became ill-conditioned. This is not unreasonable in this case because the CPU time correlation means that function values are probably much less expensive at this point in the iteration process, and the use of surrogates is then not as important.

The results for each case are shown in Table 1, where the column headings denote, respectively, the type of search step executed (Search), the type of initial points used (x0), the final solution (x*), the number of iterations (nIter), the number of function evaluations (nFunc), the CPU minutes required (CPU), and the ratio of successful to total surrogate search steps


Fig. 6 Time correlation for the lid-driven cavity problem (image comparison time, in minutes, plotted against objective function value)

executed (Successes). For the latter, a success refers to a surrogate optimizer that actually improves the objective function (as opposed to one that does not, due to poor prediction of the surrogate). Except for the "None" designation, the search types refer to the four surrogate strategies shown in (10)–(13). The first letter indicates the objective function (f(x) vs. t(x)), and the second (if present) indicates the constraint. The initial point types consist of using a single initial point in the geometric center of the bound-constrained feasible region (center), a randomly chosen initial feasible point (random), or a central composite design (CCD). The final solution is expressed as [Reynolds number, simulation length (seconds)]. The first three runs are base cases with no search step or time cutoff parameter used, while the last seven cases are different variations of the new search step, the first three being similar to the base case (no search step) except for the use of the time cutoff parameter to abort expensive function evaluations. (The random point is the same random initial point that was used for the base case.) The final four cases employ one of the surrogate optimization problems (10)–(13), as just described.

All runs successfully found the optimal solution at essentially the same parameter values (with f(x*) = 0.60 in all cases). As hoped for, the CPU time was significantly lower for the two time-based surrogates ((11) and (13)) than for all the other cases, despite costing more function evaluations than many other cases, including all three base cases. When using the t_k^cut parameter and a CCD design, the time-based surrogates achieved a 32.3% reduction in CPU time. Since the time-based surrogates achieved this despite more function evaluations, this also indicates that for this class of problems, the number of function evaluations is not a good measure of the efficiency of the different implementations.

The results with no search illustrate the importance of finding an appropriate initial point to set up the time cutoff parameter. With only one initial point, this parameter

Table 1 GPS: lid-driven cavity results

Search     x0      x*               nIter  nFunc  CPU (min)  Successes
Full-Time (t_k^cut = +∞):
None       center  [134.13, 4.76]   56     123    126.73
None       random  [134.12, 4.76]   58     118    178.31
None       CCD     [134.13, 4.76]   40     107    196.58
Cut-Time (t_k^cut = 2·t_k):
None       center  [134.13, 4.76]   80     157    257.67
None       random  [134.12, 4.76]   92     162    796.40
None       CCD     [134.13, 4.76]   40     107    109.73
f s.t. t   CCD     [134.19, 4.76]   73     162    135.78     23/44
f          CCD     [134.19, 4.76]   72     165    182.89     15/34
t s.t. f   CCD     [134.19, 4.76]   52     133     74.00      7/15
t          CCD     [134.19, 4.76]   49     127     74.48      5/13


Fig. 7 Decreasing function value

needs several iterations to build enough slack to allow the sequence of points to overcome the local nature of the CPU time correlation. The extra iterations resulted in a different path to a solution, which required significantly more CPU time. However, when using an initial CCD (with an empty search step), the results are identical except for the CPU time. In this case, using the time cutoff parameter saved almost 90 CPU minutes.

Figure 7 further illustrates the performance of the four surrogate strategies, with each color representing a different strategy (corresponding to (10)–(13)), and each shape (diamond or square) representing the source (search or poll step, respectively) of the improvement. The figure shows that the surrogates based on computational time achieve lower function values more quickly than the surrogates based on function values. Furthermore, regardless of the surrogate objective, the use of a constraint (see (12) and (13)) in each case resulted in faster initial convergence than the corresponding unconstrained version ((10) and (11), respectively).

7 Concluding remarks

This paper represents a first attempt at numerically solving the challenging class of optimization problems in which function values and the CPU times required to compute them are correlated. Exploiting acquired knowledge about the relationship between objective function values and their CPU times appears to be a useful and efficient means of solving this class of problems. One challenge is dealing with the extent to which the CPU time correlation property holds in practice, which may not be fully understood. The implementation of the time cutoff parameter was a useful way to reduce the time required to find a numerical solution. However, while it can be used to stop the image registration algorithm, it cannot stop the numerical simulation, since the image registration requires the image obtained from the full simulation.

Since t_k^cut is controlled by the user, one potential improvement would be a more systematic approach to updating it, rather than the trial-and-error approach used here. Since function values and computational times are stored in order to construct surrogates, they can also be used to measure the correlation between the two. Higher values of t_k^cut can be assigned whenever the correlation is low, and vice versa.
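A hedged sketch of what such an update might look like follows; the particular map from the sample correlation to the multiplier is our invention, not something the paper specifies.

```matlab
% Sketch: adapt the cutoff multiplier to the measured f/time correlation,
% using the stored histories zhist (function values) and thist (CPU times).
C = corrcoef(zhist, thist);        % sample correlation matrix
rho = C(1, 2);
alphaK = 2 + 2 * max(0, 1 - rho);  % invented rule: weaker correlation -> looser cutoff
tcut = alphaK * tk;                % tk: CPU time of the current incumbent
```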

Not using the surrogates when ill-conditioning occurs was a simple tactic that made sense for the particular problem we solved. However, a more effective means to combat this problem may be the use of a trust region (e.g., see Alexandrov et al. 1998), both to constrain the optimization of the surrogate and to screen out points used in the construction of the surrogate. The size of the trust region could be based on the frame or mesh size parameter.

Acknowledgments The authors wish to thank David Bethea and two anonymous referees for some useful comments and discussions. Support for the first author was provided by Los Alamos National Laboratory (LANL). Support for the third author was provided by LANL, Air Force Office of Scientific Research F49620-01-1-0013, The Boeing Company, and ExxonMobil Upstream Research Company.


The views expressed in this document are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, United States Government, or corporate affiliations of the authors.

References

Abramson MA (2005) Second-order behavior of pattern search. SIAM J Optim 16(2):315–330

Abramson MA (2008) NOMADm optimization software. http://www.gerad.ca/NOMAD/Abramson/NOMADm.html

Abramson MA, Audet C (2006) Second-order convergence of mesh adaptive direct search. SIAM J Optim 17(2):606–619

Abramson MA, Audet C, Dennis JE Jr, Le Digabel S (2009) OrthoMADS: a deterministic instance with orthogonal directions. SIAM J Optim 20(2):948–966

Alexandrov N, Dennis JE Jr, Lewis R, Torczon V (1998) A trust region framework for managing the use of approximation models in optimization. Struct Optim 15:16–23

Audet C, Dennis JE Jr (2003) Analysis of generalized pattern searches. SIAM J Optim 13(3):889–903

Audet C, Dennis JE Jr (2004) A pattern search filter method for nonlinear programming without derivatives. SIAM J Optim 14(4):980–1010

Audet C, Dennis JE Jr (2006) Mesh adaptive direct search algorithms for constrained optimization. SIAM J Optim 17(2):188–217

Bethea D (2008) Improving mixed variable optimization of computational and model parameters using multiple surrogate functions. Master's thesis, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, OH

Booker AJ (2000) Well-conditioned Kriging models for optimization of computer simulations. Technical Report M&CT-TECH-00-002, Boeing Computer Services, Research and Technology, M/S 7L-68, Seattle, Washington 98124

Booker AJ, Dennis JE Jr, Frank PD, Moore DW, Serafini DB (1998) Managing surrogate objectives to optimize a helicopter rotor design—further experiments. AIAA Paper 1998-4717, presented at the 8th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St. Louis

Booker AJ, Dennis JE Jr, Frank PD, Serafini DB, Torczon V, Trosset MW (1999) A rigorous framework for optimization of expensive functions by surrogates. Struct Optim 17(1):1–13

Coope ID, Price CJ (2000) Frame-based methods for unconstrained optimization. J Optim Theory Appl 107(2):261–274

Fischer B, Modersitzki J (2003) Curvature based image registration. J Math Imaging Vis 18(1):81–85

Forrester AIJ, Keane AJ (2009) Recent advances in surrogate-based optimization in aerospace sciences. Prog Aerosp Sci 45(1–3):50–79

Ginsbourger D, Helbert C, Carraro L (2008) Discrete mixtures of kernels for kriging-based optimization. Qual Reliab Eng Int 24(6):681–691

Griebel M, Dornseifer T, Neunhoeffer T (1998) Numerical simulation in fluid dynamics: a practical introduction. SIAM, New York

Hager WW (1984) Condition estimates. SIAM J Sci Statist Comput 5(2):311–316

Higham NJ, Tisseur F (2000) A block algorithm for matrix 1-norm estimation with an application to 1-norm pseudospectra. SIAM J Matrix Anal Appl 21(4):1185–1201

Kleijnen JPC (2009) Kriging metamodeling in simulation: a review. Eur J Oper Res 192(3):707–716

Lewis RM, Torczon V (1999) Pattern search algorithms for bound constrained minimization. SIAM J Optim 9(4):1082–1099

Lewis RM, Torczon V (2000) Pattern search methods for linearly constrained minimization. SIAM J Optim 10(3):917–941

Lewis RM, Torczon V (2002) A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM J Optim 12(4):1075–1089

Lophaven SN, Nielsen HB, Søndergaard J (2002) Aspects of the MATLAB toolbox DACE. Technical Report IMM-TR-2002-13, Technical University of Denmark, Copenhagen

Magallanez R (2007) Surrogate strategies for computationally expensive optimization problems with CPU-time correlated functions. Master's thesis, Graduate School of Engineering and Management, Air Force Institute of Technology

Marin VE, Rincon JA, Romero DA (2009) A comparison of metamodel-assisted prescreening criteria for multi-objective genetic algorithms. In: ASME 2009 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, San Diego, CA, 30 Aug–2 Sep 2009. DETC2009-87736

Martin JD (2010) Robust kriging models. In: 51st AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, 12–15 April 2010. AIAA 2010-2854

Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43(4):853–863

Modersitzki J (2004) Numerical methods for image registration. Oxford University Press

Nielsen HB (2007) DACE surrogate models. http://www2.imm.dtu.dk/~hbn/dace

Owen AB (1992) Orthogonal arrays for computer experiments, integration, and visualization. Statist Sin 2:439–452

Robinson TD, Eldred MS, Willcox KE, Haimes R (2006) Strategies for multifidelity optimization with variable dimensional hierarchical models. In: Proc 47th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conf. (2nd AIAA Multidisciplinary Design Optimization Specialist Conf.), Newport, Rhode Island

Romero DA, Amon CH, Finger S (2008) A study of covariance functions for multi-response metamodeling for simulation-based design and optimization. In: ASME 2008 International Design Engineering Technical Conference, Brooklyn, NY, 3–6 August 2008. DETC2008-50061

Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4(4):409–435

Santner TJ, Williams BJ, Notz WI (2003) The design and analysis of computer experiments. Springer

Schmit LA Jr, Miura H (1976) Approximation concepts for efficient structural synthesis. Technical Report CR-2552, NASA

Stein M (1987) Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2):143–151

Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer

Tang B (1993) Orthogonal array-based Latin hypercubes. J Am Stat Assoc 88(424):1392–1397

Torczon V (1997) On the convergence of pattern search algorithms. SIAM J Optim 7(1):1–25

Viana FAC, Haftka RT (2008) Using multiple surrogates for metamodeling. In: 7th ASMO-UK/ISSMO International Conference on Engineering Design Optimization, Bath, UK, 7–8 July 2008


Viana FAC, Gogu C, Haftka RT (2010a) Making the most out of surrogate models: tricks of the trade. In: ASME 2010 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, Montreal, Canada, 16–18 August 2010. DETC2010-28813

Viana FAC, Haftka RT, Watson LT (2010b) Why not run the efficient global optimization algorithm with multiple surrogates? In: 51st AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, 12–15 April 2010. AIAA 2010-3090

Voutchkov I, Keane AJ (2006) Multiobjective optimization using surrogates. In: 7th International Conference on Adaptive Computing in Design and Manufacture, Bristol, UK, pp 167–175

Wang GG, Shan S (2007) Review of metamodeling techniques in support of engineering design optimization. J Mech Des 129(4):370–380