

Aerodynamic Shape Optimization
Part I: Theoretical Background

John C. Vassberg ∗
Phantom Works, The Boeing Company
Huntington Beach, CA 92647-2099, USA

Antony Jameson †
Department of Aeronautics & Astronautics, Stanford University
Stanford, CA 94305-3030, USA

Von Karman Institute, Brussels, Belgium
8 March, 2006

Nomenclature

A        Hessian Matrix / Operator
a        Integrand of Integral Hessian Operator
C        Constant
E        Error of Discretization
F        Integrand of Cost Function
G        Gradient of Cost Function
Ḡ        Implicitly-Smoothed Gradient
g        Acceleration due to Gravity ≃ 32.2 ft/sec²
GRMS     Root-Mean-Square of Gradient Vector
H        Estimate of Inverse Hessian Matrix
I        Objective or Cost Function
IR       Discrete I based on Rectangle Rule
ITERS    Iterations needed for 10⁻⁶ Reduction
i        Matrix Row Index
j        Spatial Index; Matrix Column Index
k        Generic Index
K        Upper Limit of Generic Index k
N        Number of Design Variables
n        Iteration Index
NMESH    Number of Multigrid Levels
NX       Number of Mesh Intervals
P        Multigrid Forcing Function
P        Error Vector of Rank-One Scheme
Q        Multigrid Transfer Operator
STEP     Input Parameter Similar to CFL Number
Surplus  I_R^C − I_R^D
s        Arc Length
T        Total Time; Multigrid Transfer Operator
t        Parametric Variable; Time
v        Velocity
x        Independent Spatial Variable
y        Design Variable
YERR     L2 Norm of (y − y_exact) Vector
α        Tridiagonal Coefficient of Implicit Scheme
β        Coefficient of Parabolic Term
γ        Recombination Coefficients
ε        Implicit Smoothing Parameter
λ        Time-Step Parameter
ξ        Free Variable
π        3.141592654...
∞        Infinity
δ∗       First Variation of
∆∗       Change in
∂∗       Partial of
O(∗)     Order of
(∗)^T    Matrix Transpose of
(∗)^{−1} Inverse Matrix of
∗′       First Derivative of, ∂/∂x
∗′′      Second Derivative of, ∂²/∂x²

∗ Boeing Technical Fellow
† T. V. Jones Professor of Engineering

Copyright © 2006 by Vassberg & Jameson. Published by the VKI with permission.

1 Introduction

This is the first of two lectures prepared by the authors for the von Karman Institute that deal with the subject of aerodynamic shape optimization. In this lecture we introduce some theoretical background on optimization techniques commonly used in the industry, apply some of the approaches to a couple of very simple model problems, and compare the results of these schemes. We also discuss their merits and deficiencies as they relate to the class of aerodynamic shape optimization problems the authors deal with on a daily basis. In the second lecture, we provide a set of sample applications. Before we continue with the simple model problems, let's first review some properties of the aerodynamic shape optimization studies the authors are regularly concerned with.

In an airplane design environment, there is no need for an optimization based purely on the aerodynamics of the aircraft. The driving force behind (almost) every design change is related to how the modification improves the vehicle, not how it enhances any one of the many disciplines that comprise the design. And although we focus our lectures on the aerodynamics of an airplane, we also include the means by which other disciplines are linked into and affect the aerodynamic shape optimization subtask; these will be addressed in detail in the second lecture. Another characteristic of the problems we typically (but not always) work on is that the baseline configuration is itself within 1-2% of what may be possible, given the set of constraints that we are asked to satisfy. This is certainly true for commercial transport jet aircraft whose designs have been constantly evolving for the past half century or more.

Quite often the problem is very constrained; this is the case when the shape change is required to be a retrofittable modification that can be applied to aircraft already in service. Occasionally, we can begin with a clean slate, such as in the design of an all-new airplane. And the problems cover the full spectrum of studies in between these two extremes. Let's note a couple of items about this setting. First, in order to realize a true improvement to the baseline configuration, a high-fidelity and very accurate computational fluid dynamics (CFD) method must be employed to provide the aerodynamic metrics of lift, drag, pitching moment, span-load, etc. Even with this, measures should be taken to estimate the possible error band of the final analyses: this discussion is beyond the scope of these lectures. The second item to consider is related to the definition of the design space. A common practice is to use a set of basis functions which either describe the absolute shape of the geometry, or define a perturbation relative to the baseline configuration. In order to realize an improvement to the baseline shape, the design space should not be artificially constrained by the choice of the set of basis functions. This can be accomplished with either a small set of very-well-chosen basis functions, or with a large set of reasonably-chosen basis functions. The former approach places the burden on the user to establish an adequate design space; the latter approach places the burden on the optimization software to economically accommodate problems with large degrees of freedom. Over the past decade, the authors have focused on solving the problem of aerodynamic shape optimization utilizing a design space of very large dimension. The interested reader can find copious examples of the alternative approaches throughout the literature.

With some understanding of where we are headed, let's now return to the simple model problems, review various aspects of the optimization process, and discuss how they relate to the aerodynamic shape optimization problem at hand. The first model problem introduces some of the basics; the second one is a classic example in mathematical history.

2 The Spider & The Fly

In our first model problem, we will discuss how to set up a design space, and how to navigate this design space from an initial state towards a local optimum. We will also talk about some traps to avoid when setting up a problem of optimization.

The original spider and fly problem was first introduced in Reference [1]. In our version of the spider-fly problem, we have a wooden block with dimensions of 4 in wide, by 4 in tall, by 12 in long; the bottom of this block is resting on a solid flat surface. See Figure 1. On one of the square ends sits a spider, located 1 in from the top and centered left-to-right. On the opposite side a fly is trapped in the spider's web; the fly is located 1 in from the bottom and centered left-to-right. The spider considers the path where he would initially travel 1 in straight up to the top, then 12 in axially across the top face, then 3 in downward to the fly; the length of this path is 16 in.

As it turns out, the spider was a mathematician in a former lifetime, so he wonders if this is the minimum-length path possible. In order to solve this enigma, the spider sets up a problem of optimization. To cast this optimization problem into a mathematical formulation, the spider must somehow constrain his motion to the surface of the wooden block, and furthermore, he knows he cannot traverse the bottom side of the block as it is resting on the solid flat surface. It is clear that the aforementioned 16 in path is a local optimum, and due to the symmetry of the problem, there are really only two other types of paths that need to be studied. In one type, the spider moves laterally 2+ in toward the right/left side of the block, then 12+ in across that side towards the back, then 2+ in to the trapped fly. Here, the length of the path is definitely in excess of 16 in, and therefore it cannot be the global optimum. The remaining path type allows the spider to move upward 1+ in to the top of the block, continue diagonally towards the right/left side, then diagonally across and downward towards the back face, and finally 2+ in to the fly. See Figure 2. It is not immediately obvious that the local optimum of this path type cannot also be the global optimum. Hence, the spider must investigate further.

To set up the design space for this problem, the spider adopts a cartesian coordinate system aligned with the wooden block such that the origin coincides with the front-lower-left corner of the block. The x coordinate measures positive to the right, the y coordinate measures positive along the long side of the block away from the front face, and the z coordinate measures positive upward in the vertical direction. In this coordinate system, the spider's initial position on the front face is (XS, YS, ZS) = (2, 0, 3), and the fly is trapped on the back face at (XF, YF, ZF) = (2, 12, 1).

The path type to optimize can be partitioned into four segments, corresponding to the four block faces to be traversed. The first segment is described by end points (2, 0, 3) and (X, 0, 4). The second segment's end points are (X, 0, 4) and (4, Y, 4). The third, (4, Y, 4) and (4, 12, Z). The fourth, (4, 12, Z) and (2, 12, 1). Hence, the complete path can be described as the piecewise linear curve that connects (2, 0, 3), (X, 0, 4), (4, Y, 4), (4, 12, Z), and (2, 12, 1). In this design space, there are precisely three design variables (X, Y, Z). Further, the design space is constrained by the inequalities:

0 ≤ X ≤ 4,   0 ≤ Y ≤ 12,   0 ≤ Z ≤ 4.    (1)

Hence, the design space as defined in this problem is constrained to the interior of the wooden block.

The length of each segment is given as:

S1 = [1 + (X − 2)²]^{1/2},
S2 = [(X − 4)² + Y²]^{1/2},
S3 = [(Y − 12)² + (Z − 4)²]^{1/2},
S4 = [(Z − 1)² + 4]^{1/2}.    (2)

The total path length (or objective/cost function) is defined as:

I ≡ S = S1 + S2 + S3 + S4.    (3)

The statement of optimization is to minimize I subject to the constraints of Eqn (1).

There are many ways in which one can proceed to solve this problem of optimization. For example, one approach is to use evolution theory, where a population of random guesses of (X, Y, Z)_i is evaluated for their associated set of cost-function values, I_i. This establishes a generation of information which can be used to coerce subsequent generations towards the optimum location within the design space. An improvement to basic evolution theory is the Genetic Algorithm (GA). GAs attempt to speed the evolution process by combining the genes of promising pairs from one generation to procreate the next. GAs also allow some fraction of mutations to occur in order to improve the chance of finding a global optimum. However in general, this is not guaranteed. These methods are relatively easy to set up and program as they do not require any gradient information, and in fact may be the best choice if the cost function does not smoothly vary throughout the design space. Unfortunately, they can be computationally very expensive, even for a problem with a modest number of design variables. Nonetheless, solving the spider-fly problem with evolution methods can be entertaining, and we recommend it to the ambitious student as a follow-up exercise to this lecture.

Let's now consider gradient-based optimization techniques. These optimization methods use the derivative of the objective function with respect to the design space. One can approximate this gradient by finite differences, where the amplitude of each design variable is independently perturbed by a small delta from the current state and the cost function reevaluated; a small sketch of this is given below. This requires N + 1 function evaluations to approximate the gradient, where N is the number of design variables. (N is also referred to as the number of degrees of freedom, or as the dimension of the design space.) Because of the nature of the optimizations the authors are concerned with, this approach is too costly to establish the gradient. So alternative schemes will be reviewed. One example is included next.
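To make the cost of this brute-force approach concrete, here is a minimal sketch of a one-sided finite-difference gradient for a generic cost function; the function name `cost` and the step size `h` are illustrative choices, not values from the lecture.

```python
import numpy as np

def fd_gradient(cost, X, h=1.0e-6):
    """One-sided finite-difference gradient: requires N + 1 cost evaluations."""
    X = np.asarray(X, dtype=float)
    g = np.zeros_like(X)
    I0 = cost(X)                      # baseline evaluation
    for i in range(X.size):           # one extra evaluation per design variable
        Xp = X.copy()
        Xp[i] += h
        g[i] = (cost(Xp) - I0) / h
    return g
```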

Let's return to the definition of the cost function given by Eqns (2-3). It is straightforward to find the first variation of the cost function as:

δI = I_X δX + I_Y δY + I_Z δZ ≡ G δX    (4)

where G is the gradient vector, X is the design space vector, and the partial derivatives with respect to the design space are given as:

I_X = (X − 2)/S1 + (X − 4)/S2,
I_Y = Y/S2 + (Y − 12)/S3,    (5)
I_Z = (Z − 4)/S3 + (Z − 1)/S4.

Now find the Hessian matrix (second derivatives) for this problem; it will be needed to navigate the design space with a Newton iteration. The Hessian matrix is:

      | I_XX  I_YX  I_ZX |
A =   | I_XY  I_YY  I_ZY |    (6)
      | I_XZ  I_YZ  I_ZZ |

where,

I_XX = 1/S1³ + Y²/S2³
I_XY = I_YX = (4 − X) Y / S2³
I_XZ = I_ZX = 0
I_YY = (X − 4)²/S2³ + (Z − 4)²/S3³    (7)
I_YZ = I_ZY = (Y − 12)(4 − Z) / S3³
I_ZZ = (Y − 12)²/S3³ + 4/S4³

One can use the gradient to establish a steepest descent trajectory of the cost function through the design space. This trajectory begins with the initial state (baseline) and ends at a local minimum. From Eqn (4) it can be seen that a reduction in the cost function is realized if a sufficiently small step is taken in the negative gradient direction, and a local minimum is found when the magnitude of the gradient vanishes. Accordingly we can iteratively update the value of the design variables in the following manner.

X^{n+1} = X^n + δX^n,    (8)

where n is the iterative step number, and

δX^n = −λ G.    (9)

Here, λ > 0 is a step-size parameter. Thus,

δI^n = G δX^n = −λ G² ≤ 0.    (10)

In a steepest descent procedure, the value of the free parameter λ is usually limited by the stability of the iterative process. We will discuss this further in our second model problem. Through some experimentation, a value of λ = 1.885 yields about the fastest convergence to the optimum state for the spider-fly problem. Using 64-bit mathematical operations, the exact minimum-distance path is found within machine-level-zero in 295 iterations. The convergence of the gradient is shown in Figure 3. Here, the magnitude of the gradient has been reduced more than 7.5 orders of magnitude. Figure 4 provides the trajectory of the optimization process through the design space from start to finish. Notice the top view of this trajectory. Here one can see the high-frequency zig-zag nature of this navigation, but it appears that the general trend of the trajectory is correct. If the high-frequency behavior could be filtered and the low-frequency trend amplified, then convergence to the optimum could be accelerated. Such a technique will be addressed in our second model problem.
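As a concrete illustration of Eqns (2)-(5) and (8)-(10), the following is a minimal sketch of the steepest-descent iteration for the spider-fly problem. The starting guess, step size, and tolerance are illustrative choices rather than values taken from the lecture; the text's tuned value λ ≈ 1.885 is noted in a comment.

```python
import numpy as np

def segments(X, Y, Z):
    """Segment lengths S1..S4 of the piecewise-linear path, Eqn (2)."""
    S1 = np.sqrt(1.0 + (X - 2.0)**2)
    S2 = np.sqrt((X - 4.0)**2 + Y**2)
    S3 = np.sqrt((Y - 12.0)**2 + (Z - 4.0)**2)
    S4 = np.sqrt((Z - 1.0)**2 + 4.0)
    return S1, S2, S3, S4

def cost(v):
    """Total path length I = S1 + S2 + S3 + S4, Eqn (3)."""
    return sum(segments(*v))

def gradient(v):
    """Analytic gradient (I_X, I_Y, I_Z), Eqn (5)."""
    X, Y, Z = v
    S1, S2, S3, S4 = segments(X, Y, Z)
    return np.array([(X - 2.0)/S1 + (X - 4.0)/S2,
                     Y/S2 + (Y - 12.0)/S3,
                     (Z - 4.0)/S3 + (Z - 1.0)/S4])

# Steepest descent, Eqns (8)-(10)
v = np.array([2.0, 6.0, 2.0])        # an arbitrary starting guess inside the bounds
lam = 1.0                            # conservative step (the text reports lambda = 1.885 as about the fastest)
for n in range(2000):
    G = gradient(v)
    if np.linalg.norm(G) < 1.0e-10:  # stop once the gradient has essentially vanished
        break
    v = v - lam * G

print(n, v, cost(v))                 # approaches (7/3, 5, 5/3) with I = sqrt(250)
```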

Now let us solve this optimization problem with a Newton iteration. To develop a Newton iteration, simply replace the variation of the design variables of Eqn (9) with

δX^n = −A^{−1} G,    (11)

where A^{−1} is the inverse of the Hessian matrix of Eqns (6-7). Using this approach, the exact solution to machine-level-zero is found in just 3 steps. Figure 5 provides the convergence of the gradient, and as expected, this convergence has a quadratic behavior. Figure 6 illustrates the corresponding navigation through the design space.
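For completeness, here is a matching sketch of the Newton iteration of Eqns (6), (7) and (11). It repeats the small gradient routine so the listing runs on its own; the starting guess and tolerance are again illustrative choices.

```python
import numpy as np

def grad_and_hessian(v):
    """Gradient (Eqn 5) and Hessian (Eqn 7) of the spider-fly path length."""
    X, Y, Z = v
    S1 = np.sqrt(1.0 + (X - 2.0)**2)
    S2 = np.sqrt((X - 4.0)**2 + Y**2)
    S3 = np.sqrt((Y - 12.0)**2 + (Z - 4.0)**2)
    S4 = np.sqrt((Z - 1.0)**2 + 4.0)
    G = np.array([(X - 2.0)/S1 + (X - 4.0)/S2,
                  Y/S2 + (Y - 12.0)/S3,
                  (Z - 4.0)/S3 + (Z - 1.0)/S4])
    A = np.array([
        [1.0/S1**3 + Y**2/S2**3,
         (4.0 - X)*Y/S2**3,
         0.0],
        [(4.0 - X)*Y/S2**3,
         (X - 4.0)**2/S2**3 + (Z - 4.0)**2/S3**3,
         (Y - 12.0)*(4.0 - Z)/S3**3],
        [0.0,
         (Y - 12.0)*(4.0 - Z)/S3**3,
         (Y - 12.0)**2/S3**3 + 4.0/S4**3]])
    return G, A

v = np.array([2.0, 6.0, 2.0])             # illustrative starting guess
for n in range(20):
    G, A = grad_and_hessian(v)
    if np.linalg.norm(G) < 1.0e-14:
        break
    v = v + np.linalg.solve(A, -G)        # Newton step, Eqn (11)

print(n, v)                               # converges to (7/3, 5, 5/3) in a few steps
```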

In this model problem, the superior performance of the Newton iteration over that of the steepest descent method may lead one to abandon steepest descent in favor of an approach that utilizes the Hessian. However, let's investigate the requirements of a Newton method in more depth. First, the construction of the Hessian matrix can cost O(N) times that of the gradient, unless the Hessian happens to be a sparse matrix. In the case of the spider-fly problem, this additional computational work trades very favorably (i.e., 3(1 + 3) << 295). However, when N is moderately large, this may no longer be the case. A second issue is that the terms of the Hessian matrix may not be explicitly available, and in fact this is usually the case for aerodynamic shape optimization problems. In our second model problem we will show how one can construct the Hessian as part of the design-space navigation, but this requires O(N) iterations. Hence, for the class of optimization problems where N is very large, we conclude that a Newton iteration may not be the best choice. Instead, we revisit steepest descent as the foundation of our optimization process, and develop techniques that accelerate the convergence of the steepest-descent trajectory.

In the case of the spider-fly problem, the setup is straightforward. In fact, it is somewhat difficult to envision how to set it up otherwise. Our definition of the design space is directly linked to the intersection of the spider's path with the three edges of the wooden block. All possible straight-line-segment paths that cross these three edges are represented in the constrained design space. The human mind is an amazing thing; it routinely filters information that does not apply to a given situation. Unfortunately, at times it hides some pertinent data. In the case of the spider-fly problem, this may indeed occur. To explain what we mean by this, we define the spider-fly problem from a different perspective.

The spider-fly problem falls within the class of problems which seek the geodesic between two points on an arbitrary surface. The geodesic is the minimum-length path that connects the two points and is confined to the surface. For example, the geodesic between two cities on Earth is given as the great-circle arc (assuming the Earth is a perfect sphere). Now replace the wooden block with a super-ellipsoid surface given by:

[ |x − 2| / 2 ]^p + [ |y − 6| / 6 ]^p + [ |z − 2| / 2 ]^p = 1    (12)

where p ≥ 2 is an arbitrary power. The spider's initial position is:

XS = 2,
YS = 6 [ 1 − ( 1 − 1/2^p )^{1/p} ],    (13)
ZS = 3,

and the location of the trapped fly is:

XF = 2,
YF = 12 − YS,    (14)
ZF = 1.

In the limit as p → ∞, the super-ellipsoid surface approaches that of the original wooden block.

Yet one would probably set up the corresponding optimization problem very differently than we did before. Most likely the geodesic would be approximated by a discrete path of N + 1 piecewise-linear segments. This can be done in any number of ways. For example, one can seek a uniform segment-length path, or possibly a path with constant ∆y segments. If the constant ∆y approach is chosen, then the discrete design space can be defined by N angular positions at each of the interior y stations.

Let's make some observations regarding this setup of the geodesic problem. First, the evaluation of the discrete cost function (total length of the discrete path) will not be equal to that of the continuum geodesic. As a consequence, the nodes of the optimum discrete path will not fall on the continuum geodesic. The accuracy of the discrete problem can be improved by increasing the dimension of the design space, but the computational costs will escalate as well. In this approximation of the spider-fly problem, the value of the discrete cost function will be less than the length of the continuum geodesic. This is due to the fact that each discrete line segment takes a short cut from end-point to end-point as compared with the curved continuum geodesic. Furthermore, for the constant ∆y representation, the error will degrade to first-order accuracy in ∆y in the limit as p → ∞. Given these observations, one should wonder if there may be a better way to seek the discrete geodesic besides driving the gradient of the discrete cost function to zero. One approach that addresses this will be introduced in our next model problem.

Before we finally leave the spider-fly problem, has the reader determined how to directly compute the exact length of the geodesic on the wooden block? The solution to this problem can be found quite simply if one thinks of the wooden block instead as a cardboard box which can be unfolded and flattened out as illustrated by Figures 7-8. Here, I_min = (250)^{1/2} ≈ 15.811 in. The corresponding optimum position in our design space is precisely (X, Y, Z)_opt = (7/3, 5, 5/3).

This concludes our discussion on the spider-fly model problem. We hope this exercise has piqued the reader's interest, as well as challenged him/her to think beyond the norm. Our second model problem is based on the classic brachistochrone.

3 Brachistochrone Problem

The second model problem that we will study in this lecture is the classic brachistochrone. The authors' original work on this problem is documented in References [2]-[3].

Much of the fundamental theory of the branch of mathematics known as the calculus of variations is attributable to the Bernoulli brothers, John and James, who were friendly rivals. They would design new mathematical problems to stimulate each other in the form of challenges. One of these is known as the brachistochrone.

The brachistochrone problem [4] is the determination of the path y(x) connecting initial and final points (x0, y0) and (x1, y1) such that the time taken by a particle traversing this path, subject only to the force of gravity, is a minimum.

[Sketch: the path y(x) from (x0, y0) to (x1, y1), with arc length s measured along the path, y measured downward in the direction of gravity g, and x measured horizontally.]

The total time is given by

T = ∫_{x0}^{x1} ds / v

where the velocity of a particle falling under the influence of gravity, g, and starting from rest at y = 0, is

v = √(2gy)

Denoting dy/dx by y′, and setting ds = √(1 + y′²) dx, one finds that T = I / √(2g), where

I = ∫_{x0}^{x1} F(y, y′) dx

with

F(y, y′) = √[ (1 + y′²) / y ].

Then

G = ∂F/∂y − d/dx ( ∂F/∂y′ )    (15)
  = −√(1 + y′²) / (2 y^{3/2}) − d/dx [ y′ / √( y (1 + y′²) ) ]

which may be simplified to

G = −[ 1 + y′² + 2 y y′′ ] / [ 2 ( y (1 + y′²) )^{3/2} ].    (16)

[Note that Eqn (15) is the Euler-Lagrange equation when set to zero.] In this case, since F is not a function of x,

( y′ ∂F/∂y′ − F )′ = y′′ ∂F/∂y′ + y′ d/dx ( ∂F/∂y′ ) − ∂F/∂y′ y′′ − ∂F/∂y y′
                   = y′ [ d/dx ( ∂F/∂y′ ) − ∂F/∂y ]
                   = −y′ G.

On the optimal path G = 0 and hence

( y′ ∂F/∂y′ − F )  is constant.

Here

( y′ ∂F/∂y′ − F ) = −1 / √( y (1 + y′²) ).

Hence it follows that

√( y (1 + y′²) ) = C

where C is a constant. The classical solution is obtained by the substitution

y(t) = (1/2) C² (1 − cos(t))
     = C² sin²(t/2).    (17)

Then

y′² = C²/y − 1

y′ = √( (C² − y) / y ) = cot(t/2)

and

x = ∫ dy / y′ = ∫ tan(t/2) (dy/dt) dt

where

dy/dt = C² sin(t/2) cos(t/2).

Thus, if one chooses x0 = 0,

x(t) = C² ∫ sin²(t/2) dt
     = (1/2) C² ∫ (1 − cos(t)) dt
     = (1/2) C² (t − sin(t)).    (18)

The optimal path described by Eqns (17) & (18) is a cycloid. However, when a numerical method is to be adopted, a discussion regarding the discrete problem is in order. The next sections discuss numerical procedures for solving the brachistochrone problem. Section 4 discusses the approximation of the gradient, while Section 5 discusses alternative descent procedures.
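Since the exact solution of Eqns (17) & (18) is used later as the accuracy reference, here is a small sketch that tabulates the cycloid for a chosen constant C. The choice C = 1 and the parameter range t ∈ [π/2, π] match the model problem analyzed in Section 6, while the number of sample points is arbitrary.

```python
import numpy as np

def cycloid(C=1.0, t0=0.5*np.pi, t1=np.pi, num=33):
    """Exact brachistochrone path, Eqns (17)-(18): points (x(t), y(t)) on a cycloid."""
    t = np.linspace(t0, t1, num)
    x = 0.5 * C**2 * (t - np.sin(t))
    y = 0.5 * C**2 * (1.0 - np.cos(t))
    return x, y

x_exact, y_exact = cycloid()
print(x_exact[0], y_exact[0])    # left boundary point (x0, y0)
print(x_exact[-1], y_exact[-1])  # right boundary point (x1, y1)
```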

4 Gradient Calculations

This section describes two approaches to approximating the gradient of the objective function, I. The first is based on deriving the gradient of the continuous problem, then approximating this continuous gradient through numerical discretization. We will refer to this as the continuous gradient. The second scheme determines the exact gradient of the discrete cost function. We will refer to this as the discrete gradient. In either case, the resulting gradient calculations are only approximations of the exact gradient of the objective function of the continuous problem. A discussion at the end of this section addresses the benefits and deficiencies of each approach.

4.1 Continuous Gradient

Numerical optimization methods use the gradient G as the basis of a search method. In developing the gradient for the brachistochrone problem, let the trajectory be represented by the discrete values

y_j = y(x_j)

at

x_j = j ∆x

where ∆x is the mesh interval, 0 ≤ j ≤ N + 1 and N is the number of design variables. The gradient is evaluated by applying a second-order finite difference approximation to the continuous gradient, Eqn (16), by substituting

y′_j = ( y_{j+1} − y_{j−1} ) / (2 ∆x)

y′′_j = ( y_{j+1} − 2 y_j + y_{j−1} ) / ∆x²

for y′ and y′′ at the jth point. This yields the following approximation of the continuous gradient at the discrete points j.

G_j = −[ 1 + y′_j² + 2 y_j y′′_j ] / [ 2 ( y_j (1 + y′_j²) )^{3/2} ]    (19)
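A minimal sketch of Eqn (19) follows, assuming the interior nodes y_1 ... y_N are the design variables and the two end values are held fixed; the array layout and function name are illustrative choices.

```python
import numpy as np

def continuous_gradient(y, dx):
    """Discretized continuous gradient, Eqn (19).

    y  : array of length N + 2 holding y_0 ... y_{N+1} (end points fixed)
    dx : mesh interval
    Returns an array of length N with G_j at the interior nodes.
    """
    yp  = (y[2:] - y[:-2]) / (2.0 * dx)              # central difference for y'
    ypp = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dx**2   # central difference for y''
    yj  = y[1:-1]
    return -(1.0 + yp**2 + 2.0 * yj * ypp) / (2.0 * (yj * (1.0 + yp**2))**1.5)
```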

4.2 Discrete Gradient

In the second approach, the discrete cost function is calculated by approximating I by the rectangle rule of integration.

IR = Σ_{j=0}^{N} F_{j+1/2} ∆x

where

F_{j+1/2} = √[ ( 1 + y′²_{j+1/2} ) / y_{j+1/2} ]

with

y_{j+1/2} = (1/2) ( y_{j+1} + y_j )

y′_{j+1/2} = ( y_{j+1} − y_j ) / ∆x.

The rectangle rule of integration gives second-order accuracy for a smooth integrand. In this case, its use allows the evaluation of I even when y_0 = 0 at the left boundary, where v = 0 and F becomes unbounded.

The discrete gradient is now evaluated by directly differentiating IR,

G_j = ∂IR / ∂y_j

It may be verified that

G_j = B_{j−1/2} − B_{j+1/2} − (∆x/2) ( A_{j+1/2} + A_{j−1/2} )    (20)

where A and B are calculated by evaluating

A_{j+1/2} = √(1 + y′²) / ( 2 y^{3/2} )

B_{j+1/2} = y′ / √( y (1 + y′²) )

with the values y_{j+1/2} and y′_{j+1/2}.
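The companion sketch below evaluates the discrete cost IR and the discrete gradient of Eqn (20) under the same assumptions as the previous listing (interior nodes as design variables, fixed end points); the function names are again illustrative.

```python
import numpy as np

def discrete_cost(y, dx):
    """Rectangle-rule cost IR built from midpoint values y_{j+1/2}, y'_{j+1/2}."""
    ym  = 0.5 * (y[1:] + y[:-1])
    ypm = (y[1:] - y[:-1]) / dx
    return np.sum(np.sqrt((1.0 + ypm**2) / ym)) * dx

def discrete_gradient(y, dx):
    """Exact gradient of IR with respect to the interior nodes, Eqn (20)."""
    ym  = 0.5 * (y[1:] + y[:-1])                 # y_{j+1/2} for each interval
    ypm = (y[1:] - y[:-1]) / dx                  # y'_{j+1/2} for each interval
    A = np.sqrt(1.0 + ypm**2) / (2.0 * ym**1.5)
    B = ypm / np.sqrt(ym * (1.0 + ypm**2))
    # for interior node j, interval j-1/2 has index j-1 and interval j+1/2 has index j
    return B[:-1] - B[1:] - 0.5 * dx * (A[1:] + A[:-1])
```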

4.3 Continuous vs Discrete Gradient

Consider for a moment that an optimization of the continuous problem is the actual goal. In general, the optimum state of the discrete problem will not coincide with that of the continuous system, but of course, one hopes that it will be "close."

An advantage of the discrete gradient is that it directly relates to the objective function as it is evaluated numerically. Consequently, if a local optimum is found by driving the discrete gradient to zero, it may be easily verified that this is indeed a local optimum by making small perturbations and re-evaluating the objective function. If the search procedure involves line searches to find the minimum along a given search direction, the searches will also be compatible with the calculated gradient.

When the continuous gradient is driven to zero, local perturbations of the discrete cost function do not necessarily verify that a local minimum has been obtained. However, we offer a conjecture that an optimization based on the continuous gradient may be closer to the optimum of the continuous problem than that based on the discrete gradient. Our reasoning is based on the following argument. Consider a smooth curve defined by I = f(y), where f(y) is a known function. Now assume that f(y) cannot be evaluated directly, only measured inexactly. Through these measurements, f(y) is approximated by f̃(y), where

f̃(y) = f(y) + E_f

and E_f is the error associated with discretization of f(y). This error can become further amplified if a derivative of f̃(y) is sought, especially if E_f is a high-frequency error. Conversely, if the quantity g(y) = f′(y) can be measured, we have

g̃(y) = g(y) + E_g

where E_g is the error associated with the discretization of g(y).

Now if the inequality

||E_g|| < || (d/dy) E_f ||

holds true, then g̃(y) represents the gradient more accurately than f̃′(y), and it should follow that an optimization based on g̃(y) will be more accurate than one driven by f̃′(y).

In the next section, several search methods are discussed which have been under investigation in this effort.

5 Search Methods

A variety of search methods have been evaluated, including steepest descent, modified steepest descent with smoothing, implicit descent, multigrid steepest descent, Krylov acceleration, and quasi-Newton methods. These are outlined below. In all cases, the values y_j are regarded as the design variables.

5.1 Steepest Descent

Here a simple step is taken in the negative gradient direction. Such a strategy is reviewed further in Reference [5]. Denoting the iterations with the superscript n, we have

y_j^{n+1} = y_j^n − λ G_j^n.

This may be regarded as a forward Euler discretization of a time dependent process with λ = ∆t. Hence,

∂y/∂t = −G.

Substituting for G from Eqn (16) it can be seen that y is the solution of the nonlinear parabolic equation

∂y/∂t = [ 1 + y′² + 2 y y′′ ] / [ 2 ( y (1 + y′²) )^{3/2} ].    (21)

Thus it is possible to estimate the time step limit for stable integration. This is dominated by the parabolic term β y′′ where

β = y / ( y (1 + y′²) )^{3/2}.

This gives the following estimate on the time step limit for a stable forward Euler explicit scheme.

∆t⋆ = ∆x² / (2β).

Thus the number of iterations can be expected to grow as the square of the number of design variables. Hence, this approach is prohibitively expensive to apply to large scale engineering problems of interest.

5.2 Smoothed (Sobolev) Descent

In trajectory optimization problems such as the brachistochrone problem, one may anticipate that the optimum solution will be a smooth curve. This suggests that neighboring points defining the trajectory should not be perturbed independently during the optimization process, but rather be moved so that the modified trajectory remains smooth. In the case of aerodynamic design, the described aerodynamic shapes will generally be smooth (except at occasional corners such as the trailing edge of the wing or at component intersections). This motivates the introduction of smoothing directly into the optimization process.

Here, the gradient G is replaced by a smoothed gradient Ḡ defined by the implicit smoothing equation.

Ḡ − ∂/∂x ( ε ∂Ḡ/∂x ) = G.    (22)

Now one sets

δy = −λ Ḡ    (23)

where

λ = ∆t ≡ STEP ∆t⋆

and STEP is a CFL-like input parameter. Then to first order the variation in I is

δI = ∫_{x0}^{x1} G δy dx = −λ ∫_{x0}^{x1} [ Ḡ − ∂/∂x ( ε ∂Ḡ/∂x ) ] Ḡ dx.

Then, integrating by parts and using the fact that the end points are fixed,

δI = −λ ∫_{x0}^{x1} [ Ḡ² + ε ( ∂Ḡ/∂x )² ] dx.

Thus descent is assured with an arbitrarily large value of the smoothing parameter ε.
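On the discrete trajectory, the implicit smoothing of Eqn (22) reduces to a symmetric tridiagonal solve for Ḡ at the interior nodes. The sketch below assembles and solves that system with a dense solver for clarity (a tridiagonal solver would normally be used); the continuous_gradient routine is the one sketched in Section 4.1, repeated so that the listing stands alone, and the time-step handling assumes β of order one, which is only an illustrative simplification.

```python
import numpy as np

def continuous_gradient(y, dx):
    """Discretized continuous gradient, Eqn (19), at the interior nodes."""
    yp  = (y[2:] - y[:-2]) / (2.0 * dx)
    ypp = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dx**2
    yj  = y[1:-1]
    return -(1.0 + yp**2 + 2.0 * yj * ypp) / (2.0 * (yj * (1.0 + yp**2))**1.5)

def smoothed_gradient(G, eps, dx):
    """Solve (I - eps d2/dx2) Gbar = G, Eqn (22), with Gbar = 0 at the end points."""
    n = len(G)
    a = eps / dx**2                       # alpha of Eqn (26)
    M = (np.diag((1.0 + 2.0 * a) * np.ones(n))
         + np.diag(-a * np.ones(n - 1), 1)
         + np.diag(-a * np.ones(n - 1), -1))
    return np.linalg.solve(M, G)

def smoothed_descent_step(y, dx, step=100.0):
    """One smoothed (Sobolev) descent update, Eqns (22)-(23)."""
    G = continuous_gradient(y, dx)
    eps = 0.5 * step * dx**2              # the choice that mimics the implicit scheme (Sec. 5.3)
    dt = 0.5 * step * dx**2               # lambda = STEP * dt*, assuming beta of order one
    y_new = y.copy()
    y_new[1:-1] -= dt * smoothed_gradient(G, eps, dx)
    return y_new
```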

The smoothing acts as a preconditioner. In practice it is noted that λ (or ∆t) can be doubled when ε is doubled over a wide range, and it is possible to increase ∆t to very large values. It should also be noted that if the optimum shape turns out not to be a smooth shape, this preconditioner still allows the optimum to be recovered. However, the preconditioner forces the trajectory to approach the non-smooth optimum shape from a set of increasingly-conforming smooth shapes. Next, it is shown that the implicit smoothing operator is equivalent to casting the gradient in a Sobolev space.

Define a modified Sobolev inner product

⟨u, v⟩ = ∫_Ω ( u v + ε ∇u · ∇v ) dΩ,

then

⟨u, v⟩ = (u, v) + (ε ∇u, ∇v)

where (u, v) is the standard inner product in L2. Integration by parts yields

⟨u, v⟩ = ( u − ∇(ε ∇u), v ) + ∫_∂Ω ε v ∂u/∂n d∂Ω.

Using the inner product notation the variation of the cost function I can be expressed as

δI = (G, δy) = ⟨Ḡ, δy⟩ = ( Ḡ − ∇(ε ∇Ḡ), δy ).

Therefore we can solve implicitly for Ḡ

Ḡ − ∇(ε ∇Ḡ) = G.

Then, if one sets δy = −λ Ḡ,

δI = −λ ⟨Ḡ, Ḡ⟩ = −λ ( G, Ḡ ),

and an improvement is assured if λ is sufficiently small and positive, unless the process has already reached a stationary point at which G = 0 (and therefore Ḡ = 0).

and an improvement is assured if λ is sufficientlysmall and positive, unless the process has alreadyreached a stationary point at which G = 0 (andtherefore G = 0).

5.3 Implicit Descent

As it turns out, if the gradient is dominated by a y′′ term, as is the case with the brachistochrone, see Eqn (21), the smoothed descent given by Eqn (23) can be shown to be equivalent to an implicit stepping scheme if the appropriate choice of ε is used.

Consider the parabolic equation

∂y/∂t = β ∂²y/∂x²

where β is variable. Solving this with an implicit scheme, the resulting system is

−α δy_{j−1} + (1 + 2α) δy_j − α δy_{j+1} = −∆t G_j    (24)

where δy_j is the correction to y_j,

α = β ∆t / ∆x² = ∆t / (2 ∆t⋆) = (1/2) STEP    (25)

and

G_j = (β / ∆x²) ( y_{j−1}^n − 2 y_j^n + y_{j+1}^n ).

Combining Eqns (22 & 23), the discrete smoothed descent method assumes the form of Eqn (24) with

α = ε / ∆x².    (26)

Comparing Eqn (25) with Eqn (26), one can see using the smoothed gradient is equivalent to an implicit time stepping scheme if ε = (1/2) STEP ∆x². Furthermore, a Newton iteration is recovered as ∆t → ∞.

While it is fortunate that the brachistochrone problem can be solved by an implicit time stepping scheme, in general, this may not be the case with the more complicated optimization of aerodynamic design. However, in this paper, we will use the performance of the implicit scheme to establish a goal for the multigrid descent methods described below.

5.4 Multigrid Descent

Radical improvements in the rate of convergence to a steady state can be realized by the multigrid time-stepping technique. The concept of acceleration by the introduction of multiple grids was first proposed by Fedorenko [6]. There is now a fairly well-developed theory of multigrid methods for elliptic equations based on the concept that the updating scheme acts as a smoothing operator on each grid [7, 8]. In the case of a time-dependent problem, one may expect that it should be possible to accelerate the evolution of a hyperbolic system to steady state by using large time steps on coarse grids so that disturbances will be more rapidly expelled through the outer boundary. Various multigrid time-stepping schemes have been designed to take advantage of this effect [9, 10, 11, 12, 13]. The present work adapts the scheme devised by the first author to solve the Euler equations [11] to the equation dy/dt = −G.

A sequence of K meshes is generated by eliminating alternate points along each coordinate direction of mesh-level k to produce mesh-level k+1. Note that k = 1 refers to the finest mesh of the sequence. In order to give a precise description of the multigrid scheme, subscripts may be used to indicate grid level. Several transfer operations need to be defined. First, the solution vector, y, on grid k must be initialized as

y_k^{(0)} = T_{k,k−1} y_{k−1} ,   2 ≤ k ≤ K

where y_{k−1} is the current value of the solution on grid k−1, and T_{k,k−1} is a transfer operator. Next it is necessary to transfer a residual forcing function, P, such that the solution on grid k is driven by the residuals of grid k − 1. This can be accomplished by setting

P_k = Q_{k,k−1} G_{k−1}(y_{k−1}) − G_k(y_k^{(0)}),

where Q_{k,k−1} is another transfer operator. Now, G_k is replaced by G_k + P_k in the time-stepping such that

y_k^+ = y_k^{(0)} − ∆t_k [ G_k(y_k) + P_k ]

where the superscript + denotes the updated value. The resulting solution vector, y_k^+, then provides the initial data for grid k + 1. Finally, the accumulated correction on grid k has to be transferred back to grid k−1 with the aid of an interpolation operator, I_{k−1,k}. Thus one sets

y_{k−1}^{++} = y_{k−1}^+ + I_{k−1,k} ( y_k^{++} − y_k^{(0)} )

where the superscript ++ denotes the result of both the time step on grid k and the interpolated correction from grid k + 1.

A W-cycle of the type shown below proves to be a particularly effective strategy for managing the work split between the meshes. In this figure, the solid nodes indicate that full evaluations are being performed while the open nodes indicate that only transfers of solution updates are computed.

[Diagram: a three-level multigrid W-cycle over grid levels k = 1, 2, 3, and the recursive stencil for a K-level W-cycle, which is built from two (K − 1)-level W-cycles.]

In a three-dimensional setting, the number of cells is reduced by a factor of 8 on each coarser grid. By examination of these illustrations, it can be verified that the work measured in units which correspond to an iteration on the finest grid is of the order of

1 + 2/8 + 4/64 + ... + 1/4^K < 4/3.

Since Eqn (21) is a parabolic equation, its convergence can be accelerated by standard multigrid techniques. This has been implemented here, with options for both V and W cycles.
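To make the grid-transfer bookkeeping concrete, here is a simplified, self-contained sketch of one recursive cycle of the descent scheme just described. It uses the continuous gradient of Eqn (19), simple injection and linear interpolation as the transfer operators, a plain explicit update on every level, and a V-cycle rather than the W-cycle of the figure; all of these simplifications, the function names, and the time-step handling are assumptions made only for illustration.

```python
import numpy as np

def gradient(y, dx):
    """Continuous gradient, Eqn (19); zero at the fixed end points."""
    g = np.zeros_like(y)
    yp  = (y[2:] - y[:-2]) / (2.0 * dx)
    ypp = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / dx**2
    g[1:-1] = -(1.0 + yp**2 + 2.0 * y[1:-1] * ypp) / (2.0 * (y[1:-1] * (1.0 + yp**2))**1.5)
    return g

def restrict(u):
    """Injection onto the next coarser grid (keeps both end points)."""
    return u[::2]

def prolong(u):
    """Linear interpolation back to the next finer grid."""
    v = np.zeros(2 * len(u) - 1)
    v[::2] = u
    v[1::2] = 0.5 * (u[:-1] + u[1:])
    return v

def mg_cycle(y, dx, forcing, levels, dt):
    """One V-cycle for dy/dt = -(G + P): update, recurse on the coarse grid, add correction."""
    y = y - dt * (gradient(y, dx) + forcing)
    if levels > 1 and (len(y) - 1) % 2 == 0:
        y0 = restrict(y)                                                   # y_k^(0) = T y_{k-1}
        p  = restrict(gradient(y, dx) + forcing) - gradient(y0, 2.0 * dx)  # forcing P_k
        yc = mg_cycle(y0, 2.0 * dx, p, levels - 1, 4.0 * dt)               # coarser grid, larger step
        y  = y + prolong(yc - y0)                                          # interpolated correction
    return y
```

A driver would call this repeatedly on the finest grid, e.g. y = mg_cycle(y, dx, np.zeros_like(y), levels, dt), with dt chosen against the stability estimate ∆t⋆ of Section 5.1; the W-cycle described above revisits each coarse grid twice per cycle, which this sketch does not attempt.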

5.5 Krylov Acceleration

Consider that K linearly-independent solution-gradient states are known. This system spans a K-dimensional Krylov subspace of the N-dimensional problem. If the system is linear, then the states supported by this subspace can be described by the following recombination of known y − G pairs.

y⋆ = Σ_{k=1}^{K} γ_k y_k ,   G⋆ = Σ_{k=1}^{K} γ_k G_k

where

Σ_{k=1}^{K} γ_k = 1.

Convergence of the descent optimization procedures can be accelerated by minimizing the L2 Norm of G⋆ to determine the appropriate recombination coefficients γ_k. See references [14, 15]. Now, the iterative process becomes

y_j^{n+1} = y_j^⋆ − λ G_j^⋆.

It should be noted that the above acceleration can fail if the preconditioned state of the current iteration is not linearly independent of the previous Krylov subspace. This can occur if the previous Krylov subspace happens to contain the zero-gradient state, and this "failure" is fortuitous since the converged solution is prematurely determined. Another cause for failure is a poor choice of preconditioner, however, this should not be the case with the method of steepest descent as the gradient at y⋆ should be nearly perpendicular to the previous Krylov subspace. In a linear setting, it is guaranteed that the new state as given by the preconditioner is not contained within the previous Krylov subspace if the L2 Norm of the gradient of the new state is less than that of G⋆ of the previous iteration.
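A minimal sketch of the recombination step follows: the constrained minimization of ||G⋆|| over the coefficients γ_k (with Σ γ_k = 1) is written as a small KKT system and solved directly. The function names and the use of a dense solver are illustrative choices.

```python
import numpy as np

def krylov_recombination(G_list):
    """Coefficients gamma_k minimizing ||sum_k gamma_k G_k|| subject to sum_k gamma_k = 1."""
    G = np.array(G_list)             # K stored gradient vectors, shape (K, N)
    K = G.shape[0]
    M = G @ G.T                      # K x K Gram matrix of the gradients
    kkt = np.zeros((K + 1, K + 1))   # KKT system for the equality-constrained minimization
    kkt[:K, :K] = 2.0 * M
    kkt[:K, K] = 1.0
    kkt[K, :K] = 1.0
    rhs = np.zeros(K + 1)
    rhs[K] = 1.0
    return np.linalg.solve(kkt, rhs)[:K]

def krylov_step(y_list, G_list, lam):
    """Accelerated update y^{n+1} = y* - lambda G* built from stored (y, G) pairs."""
    gamma = krylov_recombination(G_list)
    y_star = sum(g * y for g, y in zip(gamma, y_list))
    G_star = sum(g * G for g, G in zip(gamma, G_list))
    return y_star - lam * G_star
```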

5.6 Quasi-Newton Methods

Quasi-Newton methods are widely regarded as the method of choice for general optimization problems whenever the gradient can be readily calculated. These methods estimate the Hessian (A) or its inverse (A^{−1}) from changes in the gradient (δG) during the search steps. By the definition of A, to first order

δG = A δy.

Let H^n be an estimate of A^{−1} at the nth step. Then it should be required to satisfy

H^n δG^n = δy^n.    (27)

This can be satisfied by various recursive updating formulas for H, which also have the hereditary property that if A is constant, as is the case of a quadratic form, Eqn (27) holds for all of the previous steps δy^k, k < n. Consequently H becomes an exact estimate of A^{−1} in N steps.

Three quasi-Newton methods have been tested in this work. The first is the rank one (R1) correction

H^{n+1} = H^n + P^n (P^n)^T / [ (P^n)^T δG^n ],    (28)

where

P^n = δy^n − H^n δG^n.

The second method is the Davidon-Fletcher-Powell (DFP) rank-two updating formula

H^{n+1} = H^n + δy^n (δy^n)^T / [ (δy^n)^T δG^n ] − H^n δG^n (δG^n)^T H^n / [ (δG^n)^T H^n δG^n ].    (29)

In order to assure N-step convergence for a quadratic form, this method requires the use of exact line searches to find the minimum objective function in the search direction at each step. This is because the hereditary property depends on the orthogonality of each new gradient with the previous search direction, i.e., (G^{n+1})^T δy^n = 0. Hence, many more evaluations of the objective function are required for DFP than R1. However, DFP has an advantage in that the updated estimates of the inverse Hessian, H^n, remain positive-definite.

The third quasi-Newton method tested uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating formula

H^{n+1} = H^n + [ 1 + (δG^n)^T H^n δG^n / ( (δG^n)^T δy^n ) ] δy^n (δy^n)^T / [ (δG^n)^T δy^n ]
        − [ H^n δG^n (δy^n)^T + δy^n (δG^n)^T H^n ] / [ (δG^n)^T δy^n ].    (30)

This corresponds to the DFP formula applied to estimate A rather than A^{−1}. As with DFP, BFGS also requires exact line searches to assure N-step convergence for a quadratic form. In practice however, BFGS is usually preferred over DFP because it is considered to be more tolerant of inexact line searches, and therefore a more robust procedure.

We should note that the line searches performed for the Rank-2 quasi-Newton methods were implemented to locate the position where the local gradient is orthogonal to the search direction. For the discrete gradient, this location coincides with the minimum measurable cost function along the line. However, for the continuous gradient, this may not be the case. The motivation for this choice of line search (as opposed to finding the minimum measurable cost) is to ensure that the Rank-2 systems remain positive-definite for the continuous gradient optimizations. (See Reference [16], pp 54-55).
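The three update formulas, Eqns (28)-(30), translate directly into code. The sketch below gives the rank-one, DFP, and BFGS updates of the inverse-Hessian estimate H; the search loop, line search, and safeguarding of small denominators are deliberately omitted, and the function names are illustrative.

```python
import numpy as np

def update_r1(H, dy, dG):
    """Rank-one correction, Eqn (28)."""
    P = dy - H @ dG
    return H + np.outer(P, P) / (P @ dG)

def update_dfp(H, dy, dG):
    """Davidon-Fletcher-Powell rank-two update, Eqn (29)."""
    HdG = H @ dG
    return H + np.outer(dy, dy) / (dy @ dG) - np.outer(HdG, HdG) / (dG @ HdG)

def update_bfgs(H, dy, dG):
    """BFGS update of the inverse-Hessian estimate, Eqn (30)."""
    HdG = H @ dG
    d = dG @ dy
    return (H
            + (1.0 + (dG @ HdG) / d) * np.outer(dy, dy) / d
            - (np.outer(HdG, dy) + np.outer(dy, HdG)) / d)
```

A quasi-Newton search then alternates a step δy = −H G (scaled by a line search) with one of these updates, starting from H = I.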

6 Results

The various methods described in this paper have been exercised to better understand their accuracy and performance. Specifically, three items are addressed. The first is whether the discrete or the continuous gradient is necessarily better than the other, either by yielding more accurate solutions, or by reducing the cost of the search procedure. The second is whether an alternate search strategy can be devised which outperforms the quasi-Newton approaches when applied to trajectory and shape optimization problems, where the solution is expected to be smooth. And the third item discussed is the robustness of the various methods as applied to the brachistochrone problem.

6.1 Accuracy

Both the continuous and discrete gradient-based optimizations have been applied to the brachistochrone problem over a wide range of mesh sizes. The model problem analyzed in this work is given by Eqns (17 & 18) with C = 1. Furthermore, to minimize the effects of the left-hand singularity, the boundary conditions were determined by using t0 = π/2 and t1 = π. A summary of this data related to accuracy is provided in Figures 9-13.

Figures 9 & 10 illustrate the convergence of the optimization process being driven by the continuous and discrete gradients, respectively. Here, the number of design variables is N = 31 and convergence is achieved using the implicit stepping approach. Figure 11 is analogous to Figure 9 for the case of N = 511 design variables. In these plots, the L2 Norm of both gradients and a measure of the converged path's accuracy are provided. The accuracy of these results is monitored with the root-mean-square difference of the optimized discrete trajectory with that of the exact solution.

YERR = [ (1/N) Σ_{j=1}^{N} ( y_j − y_exact(x_j) )² ]^{1/2}

Whether the optimization is driven by the continuous or discrete gradient, the convergence histories look very similar. The main difference between these results is that the converged value of YERR levels off at a lower value for the continuous gradient (10^{−4.619}) than it does for the discrete gradient (10^{−4.332}). This character persists over the complete range of mesh sizes tested in this work as shown in Figure 12. Here, NX is the number of mesh intervals, thus N = NX − 1. While the advantage of the continuous gradient is apparent for the brachistochrone problem, in general, this may not be the case. Yet, these data clearly illustrate that an optimization based on the discrete gradient is not necessarily the best choice. In either case, Figure 12 illustrates that both approaches are second-order accurate.

Using YERR as a measure of accuracy is only possible because the exact optimal solution is known for this problem. Typically, in problems of optimization the exact solution is not known, and one must therefore rely on another figure of merit, usually the measurable cost function. For the brachistochrone problem, this is IR. Even though Figure 12 clearly illustrates that the continuous gradient is more accurate than the discrete gradient, we know that I_R^C will be greater than (or equal to) I_R^D by definition. Here I_R^C is the measured cost function as optimized by the continuous gradient, while I_R^D is that optimized by the discrete gradient. This surplus, defined by

Surplus = I_R^C − I_R^D

is illustrated in Figure 13 as a function of mesh size. Interpreting this surplus as a disadvantage of the continuous gradient relative to the discrete gradient is unwarranted, as evidenced by Figure 12. However, since both approaches are second-order accurate, it should come as no surprise that this surplus also diminishes in a second-order manner with increasing number of design variables.

6.2 Performance

The performances of the various optimization techniques are verified in this section. Detailed illustrations of path histories as well as gradient convergences are given in Figures 14-30 for the different strategies on problems involving N = 31 & 511 design variables.

Since an implicit stepping for the brachistochrone is possible, its performance is used as a goal to achieve in the design of an explicit scheme. As shown in Figures 14 & 16, the transient path shapes of the implicit scheme rapidly converge to the final state within a few iterations, regardless of problem dimension. The rapid convergence of this scheme is also exemplified by the quick decay of the corresponding gradients as shown in Figures 15 & 17.

Figures 18-21 provide path histories and gradient convergences for the three quasi-Newton methods. It is interesting to note that the character of the paths from iteration to iteration is distinctly different than that of the implicit scheme shown before in Figures 14 & 16. The main cause for this difference is that the quasi-Newton methods effectively take a uniform time step, whereas the other descent schemes investigated in this work employ a non-uniform time step dictated only by stability considerations. It is also interesting to note that for N = 31, the two methods which estimate A^{−1} (Rank-1 and DFP) follow nearly identical convergence histories, while the BFGS method requires about two extra iterations to converge the gradient norm to an equivalent level. For example, notice in Figure 19 that the Rank-1 and DFP schemes required 38 iterations to converge the gradient 6 orders of magnitude, while BFGS took about 40 cycles. These quasi-Newton methods were tested on problems up to N = 511 design variables; they consistently exhibited a convergence behavior where the gradient norm remains relatively flat until N iterations have been performed, then rapidly converges to machine-level zero in about 10 more steps. This behavior is precisely what the theory suggests should occur and this trend is definitely supported by Figures 19 & 21 for N = 31 & 511, respectively. However, the reader is reminded that the Rank-1 algorithm does not require a line search each iteration, and therefore may evaluate the gradient up to an order of magnitude fewer times than either of the Rank-2 schemes. Robustness issues aside, this can result in a significant difference in the computational times between the various quasi-Newton methods.

Figures 22-23 give the results of the steepest descent approach. Recall that in Section 5.1 the stability analysis of this stepping concluded that the number of iterations required for convergence was directly proportional to N². Even for the small case of N = 31 shown in these figures, more than 512 cycles were performed before the path visually converged and almost 6000 iterations were needed before machine-level zero was attained. (Because of the CPU requirements involved, a solution for N = 511 was not even attempted.) Clearly, this approach is unacceptable in the setting of large engineering problems of interest. In this solution, STEP = 1.0 was used as it provides the fastest convergence rates for the steepest descent approach. A test with STEP = 1.01 verified our stability analysis as the solution quickly diverged.

In contrast with the previous results, the path history for a smoothed descent is illustrated in Figure 24. In this solution, STEP = 100 was used and to the eye, the final path is determined in less than 8 cycles. Figure 25 gives the convergence histories for a variety of values of STEP = 12.5, 25, 50, 100. These data confirm that a doubling of the step size essentially halves the number of iterations required for convergence, up to a point.

Figures 26-27 provide the results of applying the smoothed descent scheme coupled with Krylov acceleration. In this case, the path history looks to settle down after only 4 iterations while the gradient norm exhibits a quadratic-like convergence behavior. Figure 27 indicates that the Krylov accelerated convergence is almost invariant of the value of STEP in the preconditioned step. To obtain 6 orders of magnitude reduction requires between 12 & 14 steps for STEP ≥ 25. For reference, this figure also includes the best convergence curve of Figure 25 to better illustrate the positive effect that this acceleration scheme has on the convergence of the optimization process.

Application of a multigrid W-cycle to drive the convergence of the brachistochrone is now studied. Figures 28-29 give the corresponding path history and gradient convergence plots as seen before. Amazingly, the path of the second cycle is almost within symbol width of the final path. The effect of number of mesh levels, NMESH, used in the multigrid convergence is shown in Figure 29. Here, with NMESH = 1, a smoothed descent scheme is recovered; increasing the number of multigrid levels significantly increases the convergence rate. Furthermore, while this figure depicts the convergence for NX = 32, it was observed that the multigrid convergence history was grid independent if one always used NMESH = log2(NX).

The enhancements incorporated beyond the original steepest descent method have systematically improved the convergence of the optimization process to the point where the performance of the implicit scheme has been achieved. The final enhancement to the multigrid scheme is to incorporate the Krylov acceleration. Figure 30 compares the complete ensemble of enhancements to our explicit scheme with the performance of the implicit stepping. For all practical purposes, the convergence rates of these two schemes are identical. For comparison, this figure also includes the NMESH = 5 multigrid convergence of Figure 29, which is noticeably slower than that with the Krylov acceleration. However, the convergence histories depicted in this figure, for these three schemes, are essentially independent of the dimensionality, N. (See Figure 31.)

Figure 31 compares the performance of several of the various schemes. Here, ITERS is the number of iterations required for the gradient norm to be reduced at least six orders of magnitude. This figure represents problems ranging in size from N = 3 design variables up to N = 8191. (In this figure, NX is the number of mesh intervals, thus N = NX − 1.) The basic steepest descent approach exhibits an O(N²) behavior in the number of iterations required, while the quasi-Newton methods indicate O(N) as expected. Using the full number of multigrid levels and Krylov acceleration provides a grid independent result, requiring about 8 iterations to converge 6 orders. Finally, this is shown to compare favorably with the implicit stepping performance which also yields a grid independent character, requiring only 7 iterations.

6.3 Robustness

We include a brief, qualitative synopsis of the ro-bustness of the various schemes studied herein.While no attempt was made to establish a met-ric which one could monitor a scheme’s level ofrobustness, the successful application of some ofthese methods proved to be sensitive to certain im-plementation issues. For example, and previouslynoted, the Krylov acceleration can “bomb” if amachine-level exact solution is found in less thanN iterations. Figure 31 clearly shows that this oc-curred for all cases where N > 8.

The behavior of the robustness of the variousquasi-Newton schemes seemed contrary to popularunderstanding. The Rank-1 scheme had no issueconverging any of the cases tested. This includedoptimizations driven by both the continuous anddiscrete gradients for problems with 3 ≤ N ≤ 511design variables. However, the Rank-2 schemeswere fairly sensitive to the level of convergence ofthe line searches, and surprisingly, BFGS seemedmore sensitive to “inexact” line searches than DFP.

Subject only to the stability limits discussed in § 5.1 & 5.2, the explicit time-stepping schemes of steepest descent, smoothed steepest descent, and multigrid descent proved to be very robust. These optimizations converged as anticipated for every case tested. This included optimizations based on both the continuous and discrete gradients for problems up to N = 8191 design variables.
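As an illustration of the kind of explicit step involved, the sketch below implements one smoothed-descent update in which the smoothed gradient G-bar is obtained from (I - eps*D2) G-bar = G, with D2 the second-difference operator, via a tridiagonal (Thomas) solve, and the design variables are then advanced by y <- y - lam*G-bar. The zero end conditions and parameter names are assumptions consistent with the nomenclature rather than a transcription of the lecture's scheme.

import numpy as np

def smoothed_descent_step(y, grad, eps, lam):
    # one explicit step of smoothed steepest descent: solve (I - eps*D2) gbar = g,
    # with D2 the second-difference operator (zero end conditions), then y <- y - lam*gbar
    g = grad(y)
    n = g.size
    a = -eps * np.ones(n)                 # sub-diagonal of (I - eps*D2)
    b = (1.0 + 2.0 * eps) * np.ones(n)    # main diagonal
    c = -eps * np.ones(n)                 # super-diagonal
    d = g.astype(float).copy()
    for i in range(1, n):                 # Thomas algorithm: forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    gbar = np.empty(n)                    # back substitution
    gbar[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        gbar[i] = (d[i] - c[i] * gbar[i + 1]) / b[i]
    return y - lam * gbar                 # descend along the smoothed gradient

With eps = 0 the step reduces to plain steepest descent; increasing eps is what allows the much larger STEP-like values of lam, consistent with the settings shown in Figures 24-27.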

7 Conclusions

In this lecture, two model problems of optimization have been discussed. The first model problem, the spider-and-fly, was used to introduce some basic concepts of setting up an optimization; it also illustrated some mental traps to avoid. The second model problem, the brachistochrone, was used to develop various aspects of the optimization process in much greater detail.

In the spider-fly problem, we showed how one can set up the objective function and its associated design space, including constraints. Here, the cost function of the discrete problem was equivalent to that of the continuum; this is normally not the case. Direct differentiation of the cost function was used to establish the gradient and Hessian matrix, which were also equivalent to those of the continuum. Both steepest descent and Newton iteration were demonstrated. One trap avoided concerned the choice of which of these methods is most appropriate for optimizations that utilize design spaces of very large dimension. Another trap discussed was how a developer envisions the set-up of the discrete problem.

A variety of optimization techniques have been applied to the brachistochrone problem. These include gradient approaches based on both the continuous and discrete forms. Results show that, at least for the brachistochrone problem, the continuous gradient yields a slightly more accurate solution than does the discrete gradient; a possible reason for this outcome is included.

The solution of the nonlinear optimization problem is accomplished with explicit and implicit time-stepping schemes, which are also compared with several quasi-Newton algorithms. The data presented herein illustrate that an explicit time-stepping scheme can be constructed whose convergence properties are invariant with the dimensionality of the problem. Furthermore, the performance of this explicit scheme rivals that of the implicit time stepping. These trends have been verified on problems ranging from N = 3 design variables up to N = 8191 unknowns.

Finally, the performance of quasi-Newton methods is consistent with theoretical estimates in that their convergence (i.e., iterations required) is linearly dependent on the number of design variables.

For large engineering optimization problems of interest, N > O(10³), it is obvious that quasi-Newton methods can become prohibitively expensive in terms of both computational and memory requirements. However, it is encouraging that the possibility exists of constructing an efficient explicit scheme based on the straightforward techniques investigated in this research.


Figures 1-31 appear in the original document as full-page plots; only the captions and a few in-figure annotations are recoverable from the extracted text and are listed below.

Figure 1: Obvious local-minimum path between Spider and Fly (block size 4" x 4" x 12"; path length 16.00").
Figure 2: Non-obvious local-minimum path between Spider and Fly (path length Sqrt(250.0)" ~ 15.81").
Figure 3: Convergence of Gradient for Steepest Descent (Step: 1.885; axes: LOG_10(GRMS) vs. Iteration).
Figure 4: Steepest-Descent Trajectory through the Design Space (Top and Side views; Baseline and Optimum).
Figure 5: Convergence of Gradient for Newton Iteration.
Figure 6: Newton-Iteration Trajectory through the Design Space (Top and Side views; Baseline and Optimum).
Figure 7: Obvious local-minimum path between Spider and Fly on flattened box (block size 4" x 4" x 12").
Figure 8: Non-obvious global-minimum path between Spider and Fly on flattened box (path length Sqrt(250.0)" ~ 15.81").
Figure 9: Convergence histories of implicit stepping using continuous gradient with N=31.
Figure 10: Convergence histories of implicit stepping using discrete gradient with N=31.
Figure 11: Convergence histories of implicit stepping using continuous gradient with N=511.
Figure 12: Computed path errors as a function of mesh size (continuous vs. discrete gradients, with 1st-, 2nd-, and 3rd-order reference slopes).
Figure 13: Difference of measurable cost function between the continuous and discrete gradients.
Figure 14: History of paths of implicit stepping with N=31.
Figure 15: Convergence history of implicit stepping with N=31.
Figure 16: History of paths of implicit stepping with N=511.
Figure 17: Convergence history of implicit stepping with N=511.
Figure 18: History of paths for Rank-1 quasi-Newton with N=31.
Figure 19: Comparison of quasi-Newton convergence histories with N=31 (Rank One, DFP, BFGS).
Figure 20: History of paths for Rank-1 quasi-Newton with N=511.
Figure 21: Comparison of quasi-Newton convergence histories with N=511 (Rank One, DFP, BFGS).
Figure 22: History of paths of steepest descent with N=31.
Figure 23: Convergence history of steepest descent with N=31.
Figure 24: History of paths of smoothed descent with N=31 & STEP=100.
Figure 25: Convergence history of smoothed descent with N=31 (STEP = 100, 50, 25, 12.5).
Figure 26: History of paths for Krylov acceleration with N=31 & STEP=100.
Figure 27: Convergence history of Krylov acceleration with N=31 (STEP = 100, 50, 25, 12.5; with and without Krylov acceleration).
Figure 28: History of paths for multigrid acceleration with N=31 (STEP = 2.0, SMOO = 0.75, NMESH = 5).
Figure 29: Convergence history of multigrid acceleration with N=31 (STEP = 2.0, SMOO = 0.75; NMESH = 1-5).
Figure 30: Comparison of grid-independent convergence histories (multigrid with and without Krylov acceleration, and implicit stepping).
Figure 31: Comparison of convergence properties with dependence on dimensionality (steepest descent, Rank-1 quasi-Newton, multigrid W-cycle, multigrid with Krylov acceleration, and implicit stepping).