Struct Multidisc Optim (2012) 45:53–64. DOI 10.1007/s00158-011-0658-3
RESEARCH PAPER
An efficient class of direct search surrogate methods for solving expensive optimization problems with CPU-time-related functions
Mark A. Abramson · Thomas J. Asaki · John E. Dennis Jr. · Raymond Magallanez Jr. · Matthew J. Sottile
Received: 26 July 2010 / Revised: 1 April 2011 / Accepted: 11 April 2011 / Published online: 28 May 2011. © Springer-Verlag (outside the USA) 2011
Abstract In this paper, we characterize a new class of computationally expensive optimization problems and introduce an approach for solving them. In this class of problems, objective function values may be directly related to the computational time required to obtain them, so that, as the optimal solution is approached, the computational time required to evaluate the objective is significantly less than at points farther away from the solution. This is motivated by an application in which each objective function
M. A. Abramson (B)
The Boeing Company, PO Box 3707, MC 7L-21, Seattle, WA 98124-2207, USA
e-mail: [email protected]

T. J. Asaki
Department of Mathematics, Washington State University, PO Box 643113, Neill Hall 103, Pullman, WA 99164-3113, USA
e-mail: [email protected]
URL: http://geometricanalysis.org/TomAsaki

J. E. Dennis Jr.
Department of Computational and Applied Mathematics, Rice University, 8419 42nd Avenue SW, Seattle, WA 98136-2360, USA
e-mail: [email protected]
URL: http://www.rice.edu/~dennis

R. Magallanez Jr.
Department of Mathematical Sciences, United States Air Force Academy, Colorado Springs, CO, USA
e-mail: [email protected]

M. J. Sottile
Galois, Inc., 421 SW 6th Ave. Suite 300, Portland, OR 97204, USA
e-mail: [email protected]
evaluation requires both a numerical fluid dynamics simulation and an image registration process, and the goal is to find the parameter values of a predetermined reference image by comparing the flow dynamics from the numerical simulation and the reference image through the image comparison process. In designing an approach to numerically solve the more general class of problems in an efficient way, we make use of surrogates based on CPU times of previously evaluated points, rather than their function values, all within the search step framework of mesh adaptive direct search algorithms. Because of the expected positive correlation between function values and their CPU times, a time cutoff parameter is added to the objective function evaluation to allow its termination during the comparison process if the computational time exceeds a specified threshold. The approach was tested using the NOMADm and DACE MATLAB® software packages, and results are presented.
Keywords Surrogate optimization · Derivative-free optimization · Black box optimization · Mesh Adaptive Direct Search (MADS) · Pattern search · Image registration · Kriging
1 Introduction
In this paper, we introduce a new class of optimization problems and a novel approach for numerically solving them. This class consists of minimizing an objective function f : X ⊂ R^n → R that is computationally expensive to evaluate, but becomes significantly less so as the solution is approached. That is, there is a reasonably strong correlation between objective function values and the CPU
time required to compute them. We will refer to this class of problems as CPU-time-related. No assumption is made regarding the precise nature of the correlation, since it will generally not be known. (This is in contrast with work in which the correlation is measured; e.g., see Romero et al. 2008 and Marin et al. 2009.) Typically, the objective function involves some type of engineering simulation or process modeling for which the computational time for simulation runs on a large set of problem instances is either infeasible (exceeds problem requirements) or impractical (months, years, or worse). While tackling the extreme computational requirements of large simulations is our ultimate goal, our primary motivation in this paper comes from an application whose objective function involves both a fluid dynamics simulation and an image registration process, the latter of which becomes much less expensive close to the solution. For this application, the feasible region X is defined by simple bounds on the design variables, but the approach described here is sufficiently general to cover problems with general nonlinear constraints.
We note that other applications exist in which function evaluations are positively correlated with their required CPU times. In general, certain parameter estimation problems that require the numerical solution of differential equations at each evaluated point may require fewer iterations (and hence less CPU time) when the parameters are close to their true values. This property is also present in theater-level combat simulations, where one function evaluation represents a simulation given certain parameter settings. In the scenario of an enemy invasion, one common objective would be the employment of defensive forces in a way that minimizes enemy penetration (in distance). Enemy penetration is often highly correlated (though not perfectly) with the time it takes to stop the penetration. Since the expensive simulation would typically run until the enemy advance is halted (or shortly thereafter), the computational expense of an objective function evaluation (i.e., one simulation run) is highly correlated with the objective function value.
The positive correlation between objective function value and computational time leads us to a solution approach that integrates CPU runtime measures into the optimization process, allowing us to better utilize computational resources and significantly reduce computational time. Because of the computational expense of the function evaluations, our approach also involves the iterative use of surrogates. In this context, a surrogate can be thought of as a much less expensive replacement for, but not necessarily a good approximation to, the objective function. In fact, in this paper we introduce surrogates based on CPU times in addition to those based on objective function values. An alternative approach is to make use of coarser representations of the simulation codes when farther away from a solution (Bethea 2008). The present work is independent of that course of action; i.e., there is no reason why both ideas could not be implemented together (a topic for future research). For this reason, we fix the model fidelity here and confine our focus to exploiting the relationship between objective function value and CPU time; we hope to address model fidelity in a separate paper.
To maintain rigorous convergence properties of our approach, the use of surrogates is incorporated into the search step of the class of mesh adaptive direct search (MADS) algorithms. This is consistent with the surrogate management framework (SMF) introduced by Booker et al. 1999. This is an important distinction, as the poll step of the algorithm ensures convergence to a point satisfying certain necessary conditions for optimality, while the search step (which makes use of surrogates) is only used to make the process of convergence significantly more efficient for computationally expensive problems.
The remainder of the paper is as follows. In Section 2, we further motivate our work by discussing the details of our application. We present the MADS algorithm in Section 3 and discuss surrogates in more detail in Section 4, including some specific surrogate types and initialization strategies. In Section 5, we introduce new strategies for incorporating surrogates to efficiently solve our target class of problems. Numerical results on a specific instance of our application are given in Section 6, followed by some concluding remarks in Section 7.
2 An applicable class of optimization problems
The class of problems we target is motivated by an application in which each objective function evaluation requires both a fluid dynamics simulation and an image registration process. We consider the fluid dynamics application of discovering optimal simulation model parameters which most accurately reproduce a given experimentally obtained template image. Image registration is used to measure the degree to which any particular model output differs from this template data. For our initial study, we consider noisy template data constructed from simulation output.
The movement of fluids in a region Ω ⊂ R^n, n ∈ {2, 3}, is governed by the well-known Navier-Stokes equations:

$$\frac{\partial v}{\partial t} + (v \cdot \nabla)v + \nabla p = \frac{1}{Re}\,\Delta v + (1 - \beta T)\,g,$$

$$\frac{\partial T}{\partial t} + v \cdot \nabla T = \frac{1}{Re}\,\frac{1}{Pr}\,\Delta T + q''',$$

$$\operatorname{div} v = 0,$$
where v is a velocity field on R^n, p is the pressure field in Ω, g indicates body forces in Ω, Re ∈ R is the Reynolds
number of the flow, Pr ∈ R is the Prandtl number of the flow, β ∈ R is the coefficient of thermal expansion, q''' is the heat source, T is the temperature, and Δ denotes the Laplace operator (the sum of the unmixed second partial derivatives).
Our test example is the well-known lid-driven cavity problem (Griebel et al. 1998), in which an initially stationary 2d fluid in a rectangular container is subject to forces imposed by the top boundary (lid) moving at a uniform horizontal velocity. This causes a circular pattern of flow to appear within the fluid over time. Since the Navier-Stokes equations cannot be solved analytically, they must be solved numerically using a finite element method and associated finite differencing scheme. We use the method of Griebel et al. (1998). Figure 1(a) shows a snapshot of the fluid velocity at some positive time. Figure 1(b) shows the corresponding representation of the heat function H, which we will use as reference data. The heat function defines the two-dimensional heat flux q⃗ = ∇ × H, and is analogous to the hydrodynamic stream function which defines the (two-dimensional) velocity. For each combination of Reynolds number and simulation length (or time), the velocity and viscosity of the fluid form a different circular heat flux pattern throughout the region. The goal will be to recover from the snapshot the (unknown) Reynolds number and simulation length.
Image registration is the process of estimating some optimal transformation u∗ between two images. Thus, a transformation u is realized as a path through the space of images. A particular choice of u will depend on the needs of the application. For example, in medical imaging it is desirable to compare images with minimal distortion, ∇ × u. Other image comparison tasks benefit from minimizing the work required to "move" the intensities from one image to another. Different types of transformations are described in Modersitzki (2004). If we consider the classical inner product space L²(Ω) of squared Lebesgue-integrable functions with its standard induced norm, a transformation of an image T is given by T_u(x) = T(x − u(x)), where u(x) is the displacement of the point x. The objective is to minimize the distance between a reference image R and a template image T through an optimal warp transformation T_u ∈ L²(Ω), as defined by some distance measurement D, and a smoothing or regularizing term S. This problem is given by
$$\min_u \; D[R, T_u] + \alpha S[u], \qquad (1)$$

where α > 0 governs the relative contributions of the two terms. We choose to illustrate our techniques using the curvature registration method of Fischer and Modersitzki (2003), where

$$D[R, T_u] = \tfrac{1}{2}\,\|T_u - R\|_{L^2(\Omega)}^2, \qquad (2)$$

$$S[u] = \tfrac{1}{2} \sum_{i=1}^{n_d} \int_\Omega (\Delta u_i)^2 \, dx, \qquad (3)$$
and n_d is the dimension of the data space Ω. Equation (3) can be viewed as an approximation to the curvature of the transformation u. The optimal transformation u∗ is the one that simultaneously minimizes image differences and transformation curvature.
The Euler-Lagrange equation for (1) is

$$(T_u(x) - R(x))\,\nabla T_u(x) + \alpha \Delta^2 u(x) = 0, \quad x \in \Omega, \qquad (4)$$

where we have applied Neumann boundary conditions ∇u_i = ∇Δu_i = 0, i = 1, 2, …, n_d, on the boundary ∂Ω of Ω. One method of solution is to take the associated time-dependent equation,

$$u_t(x, t) + \alpha \Delta^2 u(x, t) = (R(x) - T_u(x, t))\,\nabla T_u(x, t), \qquad (5)$$

and compute the steady-state solution, at which u_t(x, t) = 0, using the iteration

$$u^{(k+1)}(x, t) = u^{(k)}(x, t) + \tau\, u_t^{(k)}(x, t), \qquad (6)$$

with an appropriately chosen artificial time step τ.

Figure 2 shows an example of an image registration for
images of the temperature of two different parameterizations of a fluid flow simulation. The top left picture is the
Fig. 1 Example time snapshots of a lid-driven cavity simulation. The velocity field (a) at some time t > 0 shows a vortical structure. The heat function (b) at time t serves as our example reference image
Fig. 2 Image registration example. The noisy intensities of the template image (b) are transformed into a registered image (c) intended to match a reference image (a). The optimal transformation u∗ is determined by (1). The residual R − T_u is shown in (d)
reference image R for a specific set of (unknown) parameter values (e.g., Reynolds number of the fluid). For a different set of parameter values, the simulation is run, resulting in the template image T shown in the top right picture. The image registration is then applied using (6) to solve (1)–(3). The resulting warped template image T_u and the difference between R and T_u are shown in the bottom left and right images, respectively.
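The fixed-point iteration (6) can be sketched numerically. The toy below registers two 1-D signals; the function name, the spectral treatment of the biharmonic term, and the warping convention T_u(x) = T(x + u(x)) (chosen so that the force (R − T_u)∇T_u is a descent direction under this sign convention) are illustrative assumptions, not the paper's 2-D implementation.

```python
import numpy as np

def register_1d(R, T, alpha=0.005, tau=1.0, n_iter=500):
    """Toy sketch of iteration (6) for 1-D signals (hypothetical helper,
    not the paper's 2-D curvature registration code).

    Convention here: T_u(x) = T(x + u(x)), so that (R - T_u) * dT_u/dx
    is a descent direction for the data term.  The biharmonic smoothing
    term alpha * Laplacian^2(u) is applied spectrally via the FFT.
    """
    n = R.size
    x = np.arange(n, dtype=float)
    u = np.zeros(n)
    lap2 = (2.0 * np.pi * np.fft.fftfreq(n)) ** 4   # symbol of Laplacian^2
    for _ in range(n_iter):
        Tu = np.interp(x + u, x, T)                 # warped template T_u
        force = (R - Tu) * np.gradient(Tu)          # data-term force, cf. (5)
        smooth = np.real(np.fft.ifft(lap2 * np.fft.fft(u)))
        u = u + tau * (force - alpha * smooth)      # explicit Euler step (6)
    return u
```

With a small artificial time step the residual ‖T_u − R‖ shrinks as the template is warped toward the reference; large steps destabilize the explicit scheme, which is why the paper calls τ "appropriately chosen".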
Given pixel points {x_i}_{i=1}^{n_p} ⊂ R^n, where n_p is the number of pixels in the image, the objective measure of the goodness of simulation parameter choices is given by the optimal transformation; namely,

$$f = \|u^* - \bar u\|_2, \qquad \bar u = \frac{1}{n_p} \sum_{i=1}^{n_p} u^*(x_i). \qquad (7)$$
The subtraction of the mean in (7) allows for a zero objective function value for images that are identical except for translational alignment issues. If the images are very similar, the numerical image registration scheme (6) requires only a few iterations to transform T into R, resulting in less computational time as well as a small distance value. On the other hand, images that are relatively dissimilar require more iterations, increasing the computational time. Thus, the objective function value f and its associated computational time are expected to be strongly correlated. We should also point out that the computational cost of an image registration is not trivial. When image deformation methods are applied instead of simpler affine transformations, the cost due to the additional computational workload can be significant.
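In array form, objective (7) is just the norm of the mean-subtracted optimal displacement field. A minimal sketch (the function name and the stacked-array layout are assumptions for illustration):

```python
import numpy as np

def registration_objective(u_star):
    """Objective (7): the 2-norm of the optimal displacement field after
    subtracting its mean over all pixels, so that images identical up to
    a pure translation score zero.

    u_star : array of shape (n_p, n) holding u*(x_i) at each pixel x_i.
    """
    u_bar = u_star.mean(axis=0)        # mean displacement (1/n_p) sum u*(x_i)
    return float(np.linalg.norm(u_star - u_bar))
```

A constant field (a pure translation) gives f = 0, while any nonuniform warp gives f > 0, matching the intent of the mean subtraction.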
3 Mesh Adaptive Direct Search (MADS)
The class of mesh adaptive direct search (MADS) algorithms was introduced by Audet and Dennis (2006) as a way of extending generalized pattern search (GPS) (Audet and Dennis 2003; Lewis and Torczon 1999, 2000; Torczon 1997) to optimization problems with nonlinear constraints without the use of penalty parameters (Lewis and Torczon 2002) and with a stronger theoretical justification than is available with filters (Audet and Dennis 2004). Each iteration of GPS and MADS algorithms consists of a search and a poll step performed on a mesh formed by a set of n_D directions D that positively spans R^n. These algorithms are applied not to the objective function f, but to the barrier objective f_X ≡ f + ψ_X, where ψ_X is the indicator function for X; it is zero at x ∈ X and ∞ for x ∉ X.
The mesh at iteration k, which is not actually constructed, can be expressed as

$$M_k = \bigcup_{x \in S_k} \{\, x + \Delta_k^m D z : z \in \mathbb{N}^{n_D} \,\},$$

where S_k is the finite set of points where the objective function f has been evaluated by the start of iteration k (so S_0 is the set of initial feasible points), and the mesh size parameter Δ_k^m controls how coarse or fine the mesh is.

The search step is very flexible, as it consists of evaluating f_X at any finite number of mesh points. One could choose to sample randomly, sample at points generated by an experimental design, run a few iterations of a favorite heuristic, such as a genetic algorithm, or simply do nothing. For computationally expensive functions, the search step usually consists of constructing (or recalibrating) surrogate functions and solving a surrogate optimization problem
on the mesh. The resulting solution, along with any other promising points, is then evaluated by f_X.
If the search step fails to yield a mesh point with lower objective function value, the poll step is performed. It consists of evaluating the set of adjacent mesh points P_k, called the poll set; namely,

$$P_k = \{\, x_k + \Delta_k^m d : d \in D_k \,\} \subset M_k,$$

where the current iterate x_k is called the frame center, and D_k is a positive spanning set of directions satisfying certain properties that will ensure appropriate convergence properties of the algorithm. In GPS methods, the condition D_k ⊂ D must hold at each iteration, and D_k is chosen to include directions that conform to the boundary of any nearby constraint (Lewis and Torczon 2000). This is sufficient for problems with a finite number of linear constraints because we include in D all possible conforming directions at any point in the feasible region. However, for more general problems with nonlinear constraints, this condition is insufficient to ensure convergence.
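For bound-constrained problems polled with coordinate directions, the poll set is easy to enumerate. A sketch (the function name is illustrative; the default D_k = [I, −I] is the standard 2n-direction GPS choice):

```python
import numpy as np

def poll_set(x_k, delta_m, D_k=None):
    """Enumerate the poll set P_k = {x_k + delta_m * d : d in D_k}.

    By default D_k is the maximal positive basis formed by the 2n
    positive and negative coordinate directions, the classic GPS choice.
    """
    x_k = np.asarray(x_k, dtype=float)
    if D_k is None:
        n = x_k.size
        D_k = np.vstack([np.eye(n), -np.eye(n)])   # rows positively span R^n
    return [x_k + delta_m * d for d in D_k]
```

In R^2 with Δ_k^m = 0.5 this produces the four axis neighbors of the frame center, exactly the GPS frame depicted in row 1 of Fig. 3.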
In the more general class of MADS algorithms, a new parameter Δ_k^p, called the poll size parameter, is introduced, which satisfies Δ_k^m ≤ Δ_k^p for all k, and

$$\lim_{k \in K} \Delta_k^m = 0 \;\Longleftrightarrow\; \lim_{k \in K} \Delta_k^p = 0 \quad \text{for any infinite subset of indices } K. \qquad (8)$$
Under this construction, GPS now becomes the specific instance of MADS with Δ_k = Δ_k^m = Δ_k^p. In MADS, D_k is not a subset of D, but instead must satisfy the following properties:

• Each nonzero d ∈ D_k can be written as a nonnegative integer combination of the directions in D; i.e., d = Du for some vector u ∈ N^{n_D} that may depend on the iteration number k.
• The distance from the frame center x_k to a poll point x_k + Δ_k^m d is bounded by a constant times the poll size parameter; i.e., Δ_k^m ‖d‖ ≤ Δ_k^p max{‖d′‖ : d′ ∈ D}.
• Limits (as defined in Coope and Price 2000) of the normalized sets D_k are positive spanning sets.
The idea in MADS is that Δ_k^m approaches zero faster than Δ_k^p, which increases the number of possible directions from which to select for inclusion in D_k. This is illustrated in Fig. 3, where GPS and MADS frames (in rows 1 and 2, respectively), constructed from the standard set of 2n positive and negative coordinate directions, are depicted in two dimensions. In each case, the thick-lined box is called a frame (with frame center x_k), and the points where it intersects the mesh are at a relative distance of Δ_k^p from the frame center x_k. In MADS, any set of positive spanning directions that yields mesh points inside the frame (e.g., p_1, p_2, and p_3) may be chosen.
Fig. 3 GPS and MADS frames in R^2
Fig. 4 A general MADS algorithm
If either the search or poll succeeds in finding an improved mesh point, it becomes the new iterate x_{k+1} ∈ X, and the mesh size is retained or increased. If neither step succeeds, then the current iterate is retained and the mesh size is reduced. More specifically, given a fixed rational number τ > 1 and two integers w⁻ ≤ −1 and w⁺ ≥ 0, Δ_k^m is updated according to the rule

$$\Delta_{k+1}^m = \tau^{w_k} \Delta_k^m \quad \text{for some } w_k \in \begin{cases} \{0, 1, \ldots, w^+\} & \text{if an improved mesh point is found,} \\ \{w^-, w^- + 1, \ldots, -1\} & \text{otherwise.} \end{cases} \qquad (9)$$
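Update rule (9) can be sketched as follows; fixing w_k = 0 on success and w_k = −1 on failure (keep or halve the mesh with τ = 2) is one common instance, not the only choice the rule permits:

```python
def update_mesh_size(delta_m, improved, tau=2.0, w_plus=0, w_minus=-1):
    """Mesh size update (9): delta_{k+1}^m = tau**w_k * delta_k^m, with
    w_k in {0, ..., w+} when an improved mesh point is found and
    w_k in {w-, ..., -1} otherwise.  Here w_k is fixed at the endpoints
    of those ranges, a simple and common instance of the rule.
    """
    w_k = w_plus if improved else w_minus
    return delta_m * tau ** w_k
```

Since τ > 1 is rational and w_k is an integer, every updated mesh size remains a rational multiple of the initial one, which is what keeps all iterates on a common underlying mesh.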
A general MADS algorithm is given in Fig. 4.

Convergence of MADS depends on selecting directions so that the union of normalized poll directions used becomes asymptotically dense on the unit sphere (Audet and Dennis 2006). Specific instances of MADS have been shown to achieve this condition with probability one (by randomly selecting D_k) (Audet and Dennis 2006) and deterministically (through the use of Halton sequences) (Abramson et al. 2009).
More details, including proofs of convergence to appropriately defined first-order stationary points (even for nonsmooth functions), are given for GPS and MADS in Audet and Dennis (2003) and (2006), respectively. Second-order stationarity results for GPS and MADS are studied in Abramson (2005) and Abramson and Audet (2006), respectively.
4 Surrogate functions
The idea for surrogates first appeared as "approximation concepts" in the work of Schmit and Miura (1976). Booker et al. (1999) characterize a class of problems for which surrogate functions would be an appropriate approach, suggest a surrogate composition, and set forth a general Surrogate Management Framework (SMF) for using surrogates to numerically solve optimization problems. More information about surrogate-based optimization can be found in Forrester and Keane (2009), Viana et al. (2010a), and Wang and Shan (2007).
Most surrogates are one of two types: simplified physics or response-based. A simplified physics model, also known as a low-fidelity model, makes certain physical assumptions that significantly reduce the computational cost by eliminating complex equations and even reducing the number of variables. Although several novel approaches exist in the literature for treating this class of surrogates (e.g., see Alexandrov et al. 1998 or Robinson et al. 2006), the actual construction of the models is problem-dependent.
To handle our target class of problems, we use as surrogates a class of response-based kriging approximation models (Sacks et al. 1989) (see also Kleijnen 2009; Martin and Simpson 2005; Stein 1999). These are not the only response-based models, and in fact, there have even been cases where multiple surrogate types are applied to the same data (Ginsbourger et al. 2008; Viana et al. 2010b; Viana and Haftka 2008; Voutchkov and Keane 2006) in an effort to improve performance.
In kriging, given a set of known data points (or sites) {s_i}_{i=1}^{n_s} ⊂ R^n and their deterministic response or function values y_s ∈ R^{n_s} (i.e., [y_s]_i = y(s_i) for i = 1, 2, …, n_s), the deterministic function y(z) is modeled as a realization of a stochastic process,

$$Y(z) = \sum_{j=1}^{p} \beta_j \phi_j(z) + Z(z) = \beta^T \phi(z) + Z(z),$$

where Y(z) is the sum of a regression model with coefficients β = [β_1, β_2, …, β_p] ∈ R^p and basis functions φ = [φ_1, φ_2, …, φ_p], and a random variable Z(z), where Z : R^n → R, with mean zero and covariance V(w, z) = σ²R(θ, w, z) between Z(w) and Z(z); σ² is the process variance, and R(θ, w, z) is the correlation function of w and z. The parameter θ ∈ R^n controls the shape of the correlation function. Kriging produces an approximate function value at an unknown point s_0 ∈ R^n using weights on known responses; namely,

$$y(s_0) = c(s_0)^T y_s,$$

where c(s_0) ∈ R^{n_s} is a vector of weights. Details for computing c(s_0) are given in Lophaven et al. (2002) or Sacks et al. (1989), for example.
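A minimal sketch of this predictor with a constant regression term (ordinary kriging) and a Gaussian correlation function follows; the helper names and the tiny nugget added for numerical stability are assumptions, and production use would rely on a toolbox such as DACE:

```python
import numpy as np

def gauss_corr(theta, A, B):
    """Gaussian correlation R(theta, w, z) = exp(-sum_j theta_j (w_j - z_j)^2)."""
    d2 = (((A[:, None, :] - B[None, :, :]) ** 2) * theta).sum(axis=-1)
    return np.exp(-d2)

def kriging_predict(s0, S, ys, theta):
    """Sketch of y(s0) = c(s0)^T ys with a constant regression term
    (ordinary kriging).

    S : (ns, n) data sites, ys : (ns,) responses, theta : (n,) shape
    parameters of the correlation function.
    """
    ns = len(S)
    R = gauss_corr(theta, S, S) + 1e-10 * np.eye(ns)   # nugget for stability
    r0 = gauss_corr(theta, S, np.atleast_2d(np.asarray(s0, float)))[:, 0]
    one = np.ones(ns)
    # Generalized least-squares estimate of the constant trend.
    beta = (one @ np.linalg.solve(R, ys)) / (one @ np.linalg.solve(R, one))
    return float(beta + r0 @ np.linalg.solve(R, ys - beta * one))
```

Note the interpolation property: at a data site the prediction reproduces the observed response, which is exactly what makes kriging attractive as a search-step surrogate.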
The parameter θ is set via an optimization process which requires the computation of R(θ, x, x)⁻¹. However, this is difficult numerically because R can become ill-conditioned as points cluster together during the convergence process of MADS (Booker 2000). The increase in the condition number of R(θ, x, x) (which can be estimated and monitored) can greatly impact the computed value of θ or prevent the calculation of the regression coefficients altogether. Booker (2000) alleviates this problem in practice by introducing a second correlation function for all the points that are generated after the initial surrogate is formed. He also suggests the use of additional correlation functions if the ill-conditioning recurs. An alternative approach for alleviating numerical instability due to the clustering of points was studied in Martin (2010). For problems with CPU-time-related functions, since the clustering of points coincides with smaller CPU times, we can also choose to simply skip the search step whenever an estimate of the condition number (based on Hager 1984; Higham and Tisseur 2000) of the kriging correlation matrix becomes too large.
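The guard just described is one line in practice. A sketch (the `cond_max` threshold is an illustrative choice, and NumPy's 1-norm condition number stands in for a condest-style estimate):

```python
import numpy as np

def surrogate_search_allowed(R, cond_max=1e12):
    """Return False (i.e., skip the surrogate search step) when the
    kriging correlation matrix R is ill-conditioned, as happens when
    MADS trial points cluster near a solution."""
    return bool(np.linalg.cond(R, p=1) <= cond_max)
```

Skipping the search in this regime is cheap insurance: by the CPU-time correlation, function evaluations near the solution are inexpensive anyway, so little is lost by falling back on the poll step alone.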
5 New surrogate strategies
In this section, we introduce a new search step for handling CPU-time-related functions. First, if objective function values and CPU times are positively correlated, then improvement in the objective function should not be expected once the computational time exceeds a certain amount. To avoid wasting unnecessary CPU time, we introduce a CPU time cutoff parameter t_k^cut > 0 to allow a function evaluation to be aborted if it is taking too long to perform. A value of t_k^cut = ∞ means that the function is evaluated normally without being aborted.

At each point x ∈ X that is evaluated by f, we record its function value z = f(x) and the CPU time t = t(x) required to evaluate f(x). For a specified time cutoff parameter value of t_k^cut, we can represent a function evaluation by [z, t] = f(x, t_k^cut). Once the time for computing the function value exceeds the value specified by t_k^cut, evaluation of f(x) is aborted without returning a value for z (or z is set to infinity or an arbitrarily large number) and with t set to t_k^cut. One possible approach we considered is to set t_{k+1}^cut = αt_k, α ≥ 1, where t_k is the recorded CPU time for the current best iterate x_k (i.e., in our notation, [z_k, t_k] = f(x_k, t_k^cut)).

The CPU time relation also means that a surrogate based on either the objective function values or CPU times would probably be a good predictor of decrease in the objective. In fact, a surrogate on the CPU time has the added advantage that it always returns a value, whereas the objective function evaluation would be aborted if t_k^cut is exceeded at iteration k. We denote these surrogates on objective values and CPU times (at iteration k) by f_k(·) and t_k(·), respectively, and we consider the following four surrogate optimization problems, whose solutions we would expect to be good subsequent trial points for the true objective function:
$$\min_{x \in X} \; f_k(x), \qquad (10)$$

$$\min_{x \in X} \; t_k(x), \qquad (11)$$

$$\min_{x \in X} \; f_k(x), \quad \text{s.t. } t_k(x) \le t_k^{cut} + \epsilon, \qquad (12)$$

$$\min_{x \in X} \; t_k(x), \quad \text{s.t. } f_k(x) \le z_k. \qquad (13)$$
The parameter ε > 0 added to the constraint in (12) is a small constant offset to allow for variability in computational time. We do this to prevent the situation where unfortunate variations in CPU time result in feasible points that are flagged as infeasible with respect to (12).
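The abortable evaluation [z, t] = f(x, t_k^cut) can be mimicked portably if the simulation exposes its intermediate iterates; a real implementation would interrupt the solver asynchronously. The generator-based wrapper below is a sketch under that assumption (the names are illustrative):

```python
import time

def evaluate_with_cutoff(f_iterates, x, t_cut=float("inf")):
    """Sketch of [z, t] = f(x, t_cut): evaluate f at x but abort once the
    elapsed time exceeds t_cut.

    f_iterates(x) is assumed to be a generator yielding successively
    refined objective values, so the wrapper can stop it between
    iterations.  Returns (inf, t_cut) on abort, else (z, elapsed).
    """
    start = time.perf_counter()
    z = float("inf")
    for z_partial in f_iterates(x):
        if time.perf_counter() - start > t_cut:
            return float("inf"), t_cut     # aborted: no value, t = t_cut
        z = z_partial
    return z, time.perf_counter() - start
```

Setting `t_cut=float("inf")` recovers a normal, uninterrupted evaluation, matching the t_k^cut = ∞ base case used in the numerical tests.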
We anticipate that the amount of correlation between function values and CPU times may be different for different problems. If the correlation is not as strong, then the time-based surrogates will not predict improvement as well. From the standpoint of convergence, the MADS poll step overcomes poor correlation, but at a higher cost. In fact, the class of MADS algorithms has been shown to be robust in solving problems, even when the surrogate predicts poorly (Booker et al. 1998).
Fig. 5 MADS search step k for CPU-time-related functions
The surrogate optimization problem is typically solved using a recursive call to MADS (though theoretically, the surrogate optimization problem could be solved by many optimization codes). Constraints in (12)–(13) are treated by the extreme barrier approach of only allowing feasible points (or setting the function value at any infeasible trial point to infinity).
We should note that the combination of MADS with a barrier and the use of t_k^cut in the original optimization problem causes a dilemma when using surrogate functions. Using the parameter t_k^cut to stop unprofitable function evaluations is good for saving computational time, but it results in infinite or arbitrarily large function values, which cannot be used to construct a good surrogate. To overcome this, we simply set the function value to the largest value seen thus far in the iteration process whenever the time cutoff threshold is exceeded. Our algorithm can be summarized as simply MADS with the specific kth search step given in Fig. 5.
6 A numerical example
We now present some numerical results from a specific example of the class of problems described in Section 2, the lid-driven cavity problem (Griebel et al. 1998). Additional results and discussion are given in Magallanez (2007). At one particular Reynolds number and simulation length, a reference image of the heat flux pattern is captured, and then noise is added to the image, so as to represent what one might expect to see in experimentally obtained physical measurements. As stated in Section 2, the goal is to run a simulation for different Reynolds numbers and simulation lengths, capture the template image, and compare the template and reference images of heat in an attempt to determine the original Reynolds number and simulation length set for the reference image. Figure 2 actually shows reference and template images for this very problem.
Since this problem has only bound constraints, we applied GPS with the search step described by Fig. 5. To do this, we used two MATLAB® software packages: NOMADm (Abramson 2008) for the implementation of MADS/GPS, and DACE (Nielsen 2007) to build the kriging surrogates (with some custom-built files) for the search step. The use of kriging functions as surrogates requires specification of the data sites and the regression and correlation functions. For constructing an initial surrogate, the set of initial data sites can be chosen via experimental design (Santner et al. 2003) or by sampling a set of "space-filling" points, such as Latin hypercube designs (Stein 1987; Tang 1993) or orthogonal arrays (Owen 1992). Experimental designs generally require more function evaluations, especially in larger dimensions. However, since our problem is of small dimension, we chose to use a 9-point central composite design (CCD). This also allowed us to use a second-order regression function for the kriging model, which we found to be more accurate than lower-order functions.
The surrogate optimization problem was solved by a recursive call to MADS from within the search step. An estimate of the condition number of the kriging correlation matrix was monitored using the MATLAB condest function, which is based on Hager (1984) and Higham and Tisseur (2000). The bounds on Reynolds number and simulation length (in seconds) were set to [0, 5000] and [3, 8], respectively.
For each scenario, different variations of the algorithm are applied and compared to a base case. The base case implementation uses GPS with an empty search step (i.e., it is skipped), a single initial point or set of CCD points, and t_k^cut = ∞ for all k. This allows for a full evaluation of all points and a comparative analysis of the proposed algorithm. The other cases use partial and full implementations of the search step presented in Fig. 5.
We first made some preliminary runs using a strategy of setting t_{k+1}^cut = t_k at each iteration. However, these runs turned out to be unsuccessful in forming good surrogates (i.e., surrogates that routinely found good trial points to evaluate) because our main assumption of CPU time correlation turned out not to hold in certain parts of the domain. If the template image is too dissimilar from the reference image, the image registration process actually terminates prematurely, with a lower CPU time and a much worse objective function value. This is seen in Fig. 6, which shows the image registration time and objective function value for each trial point evaluated in the subsequent run, in which we set t_{k+1}^cut = 2t_k at each iteration. This choice seemed to rectify the situation. Note that we only include image registration time here because the simulation time was roughly constant over all evaluated points, and for this particular problem, the image registration time was actually the more dominant cost.
We also experienced ill-conditioning of the correlation matrix as a solution is approached, which is caused by trial points becoming more clustered together. This was remedied by invoking an empty search (i.e., not optimizing the surrogate) whenever the matrix became ill-conditioned. This is not unreasonable in this case, because the CPU time correlation means that function evaluations are probably much less expensive at this point in the iteration process, and the use of surrogates is then not as important.
The results for each case are shown in Table 1, where the column headings denote, respectively, the type of search step executed (Search), type of initial points used (x0), final solution (x*), number of iterations (nIter), number of function evaluations (nFunc), CPU minutes required (CPU), and the ratio of successful to total surrogate search steps
Fig. 6 Time correlation for the lid-driven cavity problem (image comparison time, in minutes, vs. objective function value)
executed (Successes). For the latter, a success refers to a surrogate optimizer that actually improves the objective function (as opposed to one that does not, due to poor prediction of the surrogate). Except for the "None" designation, the search types refer to the four surrogate strategies shown in (10)–(13). The first letter indicates the objective function (f(x) vs. t(x)), and the second (if present) indicates the constraint. The initial point types consist of using a single initial point in the geometric center of the bound constrained feasible region (center), a randomly chosen initial feasible point (random), or central composite design (CCD). The final solution is expressed as [Reynolds number, simulation length (seconds)]. The first three runs are base cases with no search step or time cutoff parameter used, while the last seven cases are different variations of the new search step, the first three being similar to the base case (no search step) except for the use of the time cutoff parameter to abort expensive function evaluations. (The random point is the same random initial point that was used for the base case.) The final four cases employ one of the surrogate optimization problems (10)–(13), as just described.
All runs successfully found the optimal solution at essentially the same parameter values (with f(x*) = 0.60 in all cases). As hoped for, the CPU time was significantly lower for the two time-based surrogates ((11) and (13)) than all the other cases, despite costing more function evaluations than many other cases, including all three base cases. When using the t_k^cut parameter and a CCD design, the time-based surrogates achieved a 32.3% reduction in CPU time. Since the time-based surrogates achieved this despite more function evaluations, this also indicates that for this class of problems, the number of function evaluations is not a good measure of the efficiency of the different implementations.
The results with no search illustrate the importance of finding an appropriate initial point to set up the time cutoff parameter. With only one initial point, this parameter
Table 1 GPS: lid-driven cavity results

Search      x0      x*               nIter   nFunc   CPU (min)   Successes

Full-Time (t_k^cut = +∞):
None        center  [134.13, 4.76]   56      123     126.73
None        random  [134.12, 4.76]   58      118     178.31
None        CCD     [134.13, 4.76]   40      107     196.58

Cut-Time (t_k^cut = 2 × t_k):
None        center  [134.13, 4.76]   80      157     257.67
None        random  [134.12, 4.76]   92      162     796.40
None        CCD     [134.13, 4.76]   40      107     109.73
f s.t. t    CCD     [134.19, 4.76]   73      162     135.78      23/44
f           CCD     [134.19, 4.76]   72      165     182.89      15/34
t s.t. f    CCD     [134.19, 4.76]   52      133     74.00       7/15
t           CCD     [134.19, 4.76]   49      127     74.48       5/13
Fig. 7 Decreasing function value
needs several iterations to build enough slack to allow the sequence of points to overcome the local nature of the CPU time correlation. The extra iterations resulted in a different path to a solution, which required significantly more CPU time. However, when using an initial CCD (with an empty search step), the results are identical except for the CPU time. In this case, using the time cutoff parameter saved almost 90 CPU minutes.
Figure 7 further illustrates the performance of the four surrogate strategies, with each color representing a different strategy (corresponding to (10)–(13)), and each shape (diamond or square) representing the source (search or poll step, respectively) of the improvement. The figure shows that the surrogates based on computational time achieve lower function values more quickly than the surrogates based on function values. Furthermore, regardless of the surrogate objective, the use of a constraint (see (12) and (13)) in each case resulted in faster initial convergence than the corresponding unconstrained version ((10) and (11), respectively).
7 Concluding remarks
This paper represents a first attempt at numerically solving the challenging class of optimization problems in which function values and the CPU times required to compute them are correlated. Exploiting acquired knowledge about the relationship between objective function values and their CPU times appears to be a useful and efficient means of solving this class of problems. One challenge is dealing with
the extent to which the CPU time correlation property holds in practice, which may not be fully understood. The implementation of the time cutoff parameter was a useful way to reduce the time required to find a numerical solution. However, while it can be used to stop the image registration algorithm, it cannot stop the numerical simulation, since the image registration requires the image obtained from the full simulation.
Since t_k^cut is controlled by the user, one potential improvement would be a more systematic approach to updating it, rather than the trial-and-error approach used here. Since function values and computational times are stored in order to construct surrogates, they can also be used to measure the correlation between the two. Higher values of t_k^cut can be assigned whenever the correlation is low, and vice versa.
Not using the surrogates when ill-conditioning occurs was a simple tactic that made sense for the particular problem we solved. However, a more effective means to combat this problem may be the use of a trust region (e.g., see Alexandrov et al. 1998), both to constrain the optimization of the surrogate and to screen out points used in the construction of the surrogate. The size of the trust region could be based on the frame or mesh size parameter.
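The screening half of this idea could look like the following Python sketch: when (re)building the surrogate, keep only cached points within a radius proportional to the current mesh size around the incumbent. The proportionality factor of 10 is an assumption for illustration, not a value from the paper.

```python
import numpy as np

def screen_points(cache_X, incumbent, mesh_size, radius_factor=10.0):
    """Return cached points inside the ball of radius radius_factor * mesh_size
    centered at the incumbent (a simple trust-region screen)."""
    X = np.asarray(cache_X, dtype=float)
    dist = np.linalg.norm(X - np.asarray(incumbent, dtype=float), axis=1)
    return X[dist <= radius_factor * mesh_size]
```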
Acknowledgments The authors wish to thank David Bethea and two anonymous referees for some useful comments and discussions. Support for the first author was provided by Los Alamos National Laboratory (LANL). Support for the third author was provided by LANL, Air Force Office of Scientific Research F49620-01-1-0013, The Boeing Company, and ExxonMobil Upstream Research Company.
The views expressed in this document are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, United States Government, or corporate affiliations of the authors.
References
Abramson MA (2005) Second-order behavior of pattern search. SIAM J Optim 16(2):315–330
Abramson MA (2008) NOMADm optimization software. http://www.gerad.ca/NOMAD/Abramson/NOMADm.html
Abramson MA, Audet C (2006) Second-order convergence of mesh adaptive direct search. SIAM J Optim 17(2):606–619
Abramson MA, Audet C, Dennis JE Jr, Le Digabel S (2009) OrthoMADS: a deterministic instance with orthogonal directions. SIAM J Optim 20(2):948–966
Alexandrov N, Dennis JE Jr, Lewis R, Torczon V (1998) A trust region framework for managing the use of approximation models in optimization. Struct Optim 15:16–23
Audet C, Dennis JE Jr (2003) Analysis of generalized pattern searches. SIAM J Optim 13(3):889–903
Audet C, Dennis JE Jr (2004) A pattern search filter method for nonlinear programming without derivatives. SIAM J Optim 14(4):980–1010
Audet C, Dennis JE Jr (2006) Mesh adaptive direct search algorithms for constrained optimization. SIAM J Optim 17(2):188–217
Bethea D (2008) Improving mixed variable optimization of computational and model parameters using multiple surrogate functions. Master's thesis, Graduate School of Engineering and Management, Air Force Institute of Technology, Wright-Patterson AFB, OH
Booker AJ (2000) Well-conditioned Kriging models for optimization of computer simulations. Technical Report M&CT-TECH-00-002, Boeing Computer Services, Research and Technology, M/S 7L–68, Seattle, Washington 98124
Booker AJ, Dennis JE Jr, Frank PD, Moore DW, Serafini DB (1998) Managing surrogate objectives to optimize a helicopter rotor design—further experiments. AIAA Paper 1998–4717, presented at the 8th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St. Louis
Booker AJ, Dennis JE Jr, Frank PD, Serafini DB, Torczon V, Trosset MW (1999) A rigorous framework for optimization of expensive functions by surrogates. Struct Optim 17(1):1–13
Coope ID, Price CJ (2000) Frame-based methods for unconstrained optimization. J Optim Theory Appl 107(2):261–274
Fischer B, Modersitzki J (2003) Curvature based image registration. J Math Imaging Vis 18(1):81–85
Forrester AIJ, Keane AJ (2009) Recent advances in surrogate-based optimization in aerospace sciences. Prog Aerosp Sci 45(1–3):50–79
Ginsbourger D, Helbert C, Carraro L (2008) Discrete mixtures of kernels for kriging-based optimization. Qual Reliab Eng Int 24(6):681–691
Griebel M, Dornseifer T, Neunhoeffer T (1998) Numerical simulation in fluid dynamics: a practical introduction. SIAM, New York
Hager WW (1984) Condition estimates. SIAM J Sci Statist Comput 5(2):311–316
Higham NJ, Tisseur F (2000) A block algorithm for matrix 1-norm estimation with an application to 1-norm pseudospectra. SIAM J Matrix Anal Appl 21(4):1185–1201
Kleijnen JPC (2009) Kriging metamodeling in simulation: a review. Eur J Oper Res 192(3):707–716
Lewis RM, Torczon V (1999) Pattern search algorithms for bound constrained minimization. SIAM J Optim 9(4):1082–1099
Lewis RM, Torczon V (2000) Pattern search methods for linearly constrained minimization. SIAM J Optim 10(3):917–941
Lewis RM, Torczon V (2002) A globally convergent augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM J Optim 12(4):1075–1089
Lophaven SN, Nielsen HB, Søndergaard J (2002) Aspects of the MATLAB toolbox DACE. Technical Report IMM-TR-2002-13, Technical University of Denmark, Copenhagen
Magallanez R (2007) Surrogate strategies for computationally expensive optimization problems with CPU-time correlated functions. Master's thesis, Graduate School of Engineering and Management, Air Force Institute of Technology
Marin VE, Rincon JA, Romero DA (2009) A comparison of metamodel-assisted prescreening criteria for multi-objective genetic algorithms. In: ASME 2009 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, San Diego, CA, 30 Aug–2 Sep 2009. DETC2009-87736
Martin JD (2010) Robust kriging models. In: 51st AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, 12–15 April 2010. AIAA 2010-2854
Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43(4):853–863
Modersitzki J (2004) Numerical methods for image registration. Oxford University Press
Nielsen HB (2007) DACE surrogate models. http://www2.imm.dtu.dk/~hbn/dace
Owen AB (1992) Orthogonal arrays for computer experiments, integration, and visualization. Statist Sin 2:439–452
Robinson TD, Eldred MS, Willcox KE, Haimes R (2006) Strategies for multifidelity optimization with variable dimensional hierarchical models. In: Proc 47th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conf (2nd AIAA Multidisciplinary Design Optimization Specialist Conf), Newport, Rhode Island
Romero DA, Amon CH, Finger S (2008) A study of covariance functions for multi-response metamodeling for simulation-based design and optimization. In: ASME 2008 International Design Engineering Technical Conference, Brooklyn, NY, 3–6 August 2008. DETC2008-50061
Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4(4):409–435
Santner TJ, Williams BJ, Notz WI (2003) The design and analysis of computer experiments. Springer
Schmit LA Jr, Miura H (1976) Approximation concepts for efficient structural synthesis. Technical Report CR-2552, NASA
Stein M (1987) Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2):143–151
Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer
Tang B (1993) Orthogonal array-based Latin hypercubes. J Am Stat Assoc 88(424):1392–1397
Torczon V (1997) On the convergence of pattern search algorithms. SIAM J Optim 7(1):1–25
Viana FAC, Haftka RT (2008) Using multiple surrogates for metamodeling. In: 7th ASMO-UK/ISSMO International Conference on Engineering Design Optimization, Bath, UK, 7–8 July 2008
Viana FAC, Gogu C, Haftka RT (2010a) Making the most out of surrogate models: tricks of the trade. In: ASME 2010 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, Montreal, Canada, 16–18 August 2010. DETC2010-28813
Viana FAC, Haftka RT, Watson LT (2010b) Why not run the efficient global optimization algorithm with multiple surrogates? In: 51st AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Orlando, FL, 12–15 April 2010. AIAA 2010-3090
Voutchkov I, Keane AJ (2006) Multiobjective optimization using surrogates. In: 7th International Conference on Adaptive Computing in Design and Manufacture, Bristol, UK, pp 167–175
Wang GG, Shan S (2007) Review of metamodeling techniques in support of engineering design optimization. J Mech Des 129(4):370–380