efficient nonlinear optimizationvia multiscale gradient

14
Efcient Nonlinear Optimization via Multiscale Gradient Filtering Tobias Martin, Pushkar Joshi, Mikl´ os Bergou, Nathan Carr UUCS-12-001 School of Computing University of Utah Salt Lake City, UT 84112 USA January 17, 2012 Abstract We present a method for accelerating the convergence of gradient-based nonlinear opti- mization algorithms. We start with the theory of the Sobolev gradient, which we analyze from a signal processing viewpoint. By varying the order of the Laplacian used in dening the Sobolev gradient, we can effectively lter the gradient and retain only components at certain scales. We use this idea to adaptively change the scale of features being optimized in order to arrive at a solution that is optimal across multiple scales. This is in contrast to traditional descent-based methods, for which the rate of convergence often stalls early once the high frequency components have been optimized. Our method is conceptually similar to multigrid in that it can be used to smooth errors at multiple scales in a problem, but we do not require a hierarchy of representations during the optimization process. We demon- strate how to integrate our method into multiple nonlinear optimization algorithms, and we show a variety of optimization results in variational shape modeling, parameterization, and physical simulation.

Upload: others

Post on 20-Dec-2021

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient Nonlinear Optimizationvia Multiscale Gradient

Efficient Nonlinear Optimization viaMultiscale Gradient Filtering

Tobias Martin, Pushkar Joshi, MiklosBergou, Nathan Carr

UUCS-12-001

School of ComputingUniversity of Utah

Salt Lake City, UT 84112 USA

January 17, 2012

Abstract

We present a method for accelerating the convergence of gradient-based nonlinear opti-mization algorithms. We start with the theory of the Sobolev gradient, which we analyzefrom a signal processing viewpoint. By varying the order of the Laplacian used in definingthe Sobolev gradient, we can effectively filter the gradient and retain only components atcertain scales. We use this idea to adaptively change the scale of features being optimizedin order to arrive at a solution that is optimal across multiple scales. This is in contrast totraditional descent-based methods, for which the rate of convergence often stalls early oncethe high frequency components have been optimized. Our method is conceptually similarto multigrid in that it can be used to smooth errors at multiple scales in a problem, but wedo not require a hierarchy of representations during the optimization process. We demon-strate how to integrate our method into multiple nonlinear optimization algorithms, and weshow a variety of optimization results in variational shape modeling, parameterization, andphysical simulation.

Page 2: Efficient Nonlinear Optimizationvia Multiscale Gradient

Efficient Nonlinear Optimization via Multiscale Gradient Filtering

Tobias Martin1, Pushkar Joshi2, Miklos Bergou3, Nathan Carr31University of Utah, 2Motorola Mobility, 3Adobe Systems

Abstract

We present a method for accelerating the convergenceof gradient-based nonlinear optimization algorithms.We start with the theory of the Sobolev gradient,which we analyze from a signal processing viewpoint.By varying the order of the Laplacian used in definingthe Sobolev gradient, we can effectively filter the gradi-ent and retain only components at certain scales. Weuse this idea to adaptively change the scale of featuresbeing optimized in order to arrive at a solution thatis optimal across multiple scales. This is in contrastto traditional descent-based methods, for which therate of convergence often stalls early once the high fre-quency components have been optimized. Our methodis conceptually similar to multigrid in that it can beused to smooth errors at multiple scales in a problem,but we do not require a hierarchy of representationsduring the optimization process. We demonstratehow to integrate our method into multiple nonlinearoptimization algorithms, and we show a variety ofoptimization results in variational shape modeling,parameterization, and physical simulation.

1 Introduction

Shape optimization is a problem of fundamental im-portance to computer graphics. Several graphics taskscan be posed as shape optimization problems, wherean input shape is optimized by iteratively modifyingits parameters so that an energy associated with theshape is minimized. The choice of the energy dependson the application at hand. Frequently occurring ex-amples include the Willmore energy for variationalshape design [35], an angle- and area-preserving energyfor surface parameterization [28], and a strain-based

energy for statics problems [34]. In all these cases, theshape optimization problem boils down to a numeri-cal optimization of the appropriate energy, subject toproper boundary conditions.

In many cases of practical importance, including theabove examples, finding an optimal solution amountsto solving a set of nonlinear equations. Linear approxi-mations (e.g., [3]) do not always produce the expectedresult [20], and the minimization of nonlinear energiesis often necessary. The numerical optimization of suchnonlinear energies can be a lengthy process, usuallyperformed iteratively and off-line for starting shapesthat are not already close to optimal. Typically, afterthe first few iterations, the rate of energy decrease issignificantly reduced, and the progress towards theminimum is slow.

The reduction in the rate of energy decrease dur-ing the optimization process is not necessarily a signthat the shape is close to optimal. More often, thisis an indication that the energy being minimized isill-conditioned. While minimizing an ill-conditionedenergy, the rapid initial decrease is due to the opti-mization of shape features of high spatial frequency,which requires just a few iterations and results in alarge reduction of the energy. In other words, theshape becomes locally close to optimal. The subse-quent progress towards the desired minimum is slowbecause features of low spatial frequency must be opti-mized, which require more global changes to the shape.This phenomenon is commonly observed in denoisingapplications, where the high frequency noise can berapidly removed in just a few iterations, while the lowfrequency features take a long time to smooth out [33].The bottom row of Figure 1 shows that after the highfrequency details of the dragon model quickly becomesmooth, the progress towards the global minimum

1

Page 3: Efficient Nonlinear Optimizationvia Multiscale Gradient

Figure 1: A nonlinear, hinge-based bending energy [9] is optimized using gradient descent for a genus-0triangle mesh. Depending on the minimization state, our algorithm chooses what feature size to minimize.High frequency features are rapidly removed in the initial optimization steps, and lower frequency featuresare removed in the later optimization steps. By contrast, a non-preconditioned optimization (shown in thelower row) produces very little shape change as evident in snapshots taken at similar computational times.

becomes negligible.

Contributions Ill-conditioned optimization pro-cesses can be sped up by preconditioning, one form ofwhich is to modify the search direction at each step ofthe optimization process in order to find a directionthat allows larger steps towards the minimum. In thispaper, we study the Sobolev preconditioner [13]. Inparticular, we present

• an analysis that shows the connection betweenthe Sobolev gradient and filter design, leadingto an intuitive explanation of the behavior ofthe Sobolev preconditioner in the context of op-timization;

• a novel, multi-scale optimization algorithm thatuses the Sobolev preconditioner in a general non-linear optimization pipeline suitable for graphicstasks;

• a novel algorithm for filtering the forces in aphysical simulation, which allows larger and morestable timesteps; and

• a comparison that shows the performance bene-fits of incorporating the Sobolev preconditionerin standard optimization algorithms, such as

gradient descent, the nonlinear conjugate gra-dient method, and the limited memory BFGS(L-BFGS) algorithm [21].

Overview Consider the gradient descent optimiza-tion algorithm, in which the gradient of the energyis used as the search direction along which to movetowards the minimum. At a high level, the Sobolevpreconditioner smooths the standard gradient by re-moving high frequency components, which allows theoptimizer to take steps that change the lower fre-quency, or more global, components of the shape. Weextend this to a multi-scale setting, tuning the fre-quency response of the preconditioner at each stepof the optimization in order to accelerate the overallconvergence of the algorithm. This preconditioner canbe easily incorporated into an existing implementa-tion of an optimization algorithm by simply applyingthe Sobolev preconditioner to the search direction ateach iteration and leaving the rest of the optimizerunchanged.

We use the Sobolev preconditioner in the contextsof variational shape modeling, parameterization, andelasticity simulation. The results indicate that evenfor cases where traditional methods tend to stall anddo not produce optimal results, we are able to achievefar more optimal shapes in reasonable runtimes (see

2

Page 4: Efficient Nonlinear Optimizationvia Multiscale Gradient

Figure 2: The user-specified tangent constraints (red triangles) are intended to force the input shape (left) toinflate into a hemisphere. A linear method, such as solving the bi-Laplacian in x, y and z, is not able toreconstruct the hemisphere (middle). With our framework, the nonlinear Willmore energy is minimized in afew steps to create the desired shape (right).

Table 1).

2 Related Work

Sobolev preconditioning for elliptic partial differentialequations is a well-known approach with a substantialamount of study [13]. In this section we will focus onprevious works that use the Sobolev gradient for sci-entific computing tasks commonly found in computergraphics.

Renka and Neuberger proposed the use of theSobolev gradient for curve and surface smoothing[25, 26]. Charpiat et al. [5] and Sundaramoorthi etal. [32] used the Sobolev gradient for speeding upcurve flows used for active contours in a more generalframework that also considered other inner products.Eckstein et al. [7] solved the surface flow equivalent ofthe previous two papers. These authors all describehow they get significant performance benefits in theirflows by using the H1 Sobolev gradient instead of thestandard L2 gradient. Inspired by these works, weextend this idea to show how higher-order Sobolev gra-dients can be used as components in preconditioningfilters.

The multigrid method [4] is a conceptually closelyrelated, and commonly used, technique for solvingill-conditioned optimization problems. It has success-fully been used in a wide variety of contexts, suchas finite element methods [27], parameterization [1],

and mesh deformation [30]. The multigrid methodrequires a hierarchical representation for the shape.At each level of the hierarchy, an iterative smootheris used to remove the highest frequency errors. Sincethe scale of features removed depends on what fre-quencies can be represented by the discretization, onecan use a hierarchy that spans from coarse to finediscretizations to reduce errors across multiple scales.While multi-resolution preconditioners are effective,representing the input shape hierarchically for generalrepresentations can be challenging [10].

In Section 5, we demonstrate several applications ofour method. We include a discussion of work relatedto the applications in that section.

3 Mathematical Framework

In this section, we formally introduce the relation-ship between the gradient of an energy and the innerproduct on the space of shape deformations. We startwith the L2 gradient, and we show how other gra-dients can be constructed. For one particular formof this construction, we provide an interpretation ofthis operation as a filter applied to the L2 gradient.In Section 4, we exploit this filtering viewpoint tocreate a shape optimization algorithm in the spirit ofa multigrid approach.

3

Page 5: Efficient Nonlinear Optimizationvia Multiscale Gradient

Figure 3: Each of the filters shown on the right can be used to select which eigenmodes of the Laplacian tokeep in the gradient during the optimization process. The figures on the left, in which we applied the variousfilters to the L2 gradient of the Willmore energy, show that the eigenmodes correspond to surface features atdifferent scales.

Shape and Tangent Space Let Ω be the space ofpossible shapes for the problem of interest, and letx ∈ Ω be a point in this space. The tangent space,TxΩ, contains all tangent vectors to Ω at the pointx. Given a sufficiently smooth path, γ : R → Ω, suchthat γ(t) = x, the tangent space represents the spaceof all possible velocities γ′(t) that the path could haveat x.

We discretize Ω using a piecewise polynomial ba-sis. In the discrete setting, a point in shape space isgiven by x =

∑ni=1 xiφi, where {φi}ni=0 is the set of

basis functions and {xi}ni=0 is the set of coefficientsthat defines the shape. For example, one possiblediscretization of a surface embedded in R

3 is a tri-angle mesh, for which {xi}ni=0 is the set of vertexpositions. We can write any path in the discretesetting as γ(t) =

∑ni=1 xi(t)φi with corresponding

tangent vector given by γ′(t) =∑n

i=1 x′i(t)φi. As in

the continuous setting, the tangent space at a pointis the space that contains all possible velocities thata path could have at that point.

Inner Product on Tangent Space We can equipthe tangent space with an inner product, 〈u,v〉 foru,v ∈ TxΩ. The canonical L2 inner product is givenby 〈u,v〉L2 =

∫u · v.

In the discrete setting, the L2 inner product fortwo tangent vectors u and v can be written in matrix

form as

〈u,v〉L2 = 〈n∑

i=1

uiφi,

n∑j=1

viφi〉L2 =

n∑i=1

n∑j=1

uivj〈φi, φj〉L2

= uTMv ,

where in the second line, u and v are the vectors ofcoefficients {vi}ni=1 and {ui}ni=1, and M is commonlycalled the “mass matrix” whose entries are defined byMij = 〈φi, φj〉L2 .

As noted by Eckstein et al. [7], we can define alter-nate inner products using

〈u,v〉A = 〈Au,v〉L2 = 〈u,Av〉L2 , (1)

where A : TxΩ → TxΩ is a self-adjoint, positivedefinite linear operator.The choice of inner product is key and, as we

will show, has a large impact on the performanceof descent-based algorithms. Our goal in Section 4will be to tailor A to the task of optimizing the per-formance of these algorithms.

Gradient Vector Field Given a smooth energy,E : Ω → R, that assigns a real value to a given shape,the gradient of E at x is defined as the vector field,g(x) ∈ TxΩ, that satisfies

〈g(x),v〉 = limε→0

E(x+ εv)− E(x)

ε

for all v ∈ TxΩ (see, e.g., [6]). In other words, thegradient is the vector field whose inner product with

4

Page 6: Efficient Nonlinear Optimizationvia Multiscale Gradient

v yields the rate of change of the energy along v.Since the gradient is defined with respect to an in-ner product, each inner product leads to a differentvector field corresponding to the gradient. We canrelate the A gradient to the L2 gradient by combining〈gA(x),v〉A = 〈AA−1gL2(x),v〉L2 with (1) to obtain

gA(x) = A−1gL2(x) . (2)

Therefore, if we have an expression for the L2 gradientof the energy, we can solve for the A gradient usingthis equation.

In the discrete setting, the energy E depends on thecoefficients {xi}ni=1, which we write using an abuse ofnotation as E(x). We can evaluate the rate of changeof E at x in the direction of v as

n∑i=1

∂E(x)

∂xivi =

∂E(x)

∂x· v = 〈gL2(x),v〉L2

= (gL2(x))TMv,

which, using the symmetry of M and the fact thatthis equation must hold for arbitrary v, leads to

gL2(x) = M−1 ∂E(x)

∂x.

From (2), we obtain that for a linear operator A,

gA(x) = A−1M−1 ∂E(x)

∂x(3)

Frequency Response of Gradient Filter Moti-vated by the work of Kim and Rossignac [14] on con-structing filters for mesh processing, we choose A =B−1A, where B and A are both linear combinationsof powers of the Laplace-Beltrami operator, Δ, givenby A =

∑pi=0(−1)iaiΔ

i and B =∑q

i=0(−1)ibiΔi.

Here, {ai}pi=0 and {bi}qi=0 are the sets of scalar coeffi-cients that define each linear combination, and Δ0 = Iis the identity operator. Note that using this defini-tion for A, the resulting inner product is no longersymmetric. However, for the purposes of optimization,this is a valid choice because the resulting gradientcan still be used as a descent direction.

In order to construct the A gradient in the discretesetting, we require a discrete version of the Laplace-Beltrami operator, denoted by −L. For example, we

use the well-known cotan operator [23] for optimiza-tion problems involving triangle meshes. In general,we define the discrete Laplace-Beltrami operator as−L = −M−1K, where M is the mass matrix definedas before by Mij = 〈φi, φj〉 and Kij = 〈∇φi,∇φj〉L2

is the stiffness matrix.Given a discrete Laplace-Beltrami operator, we can

write

A =

p∑i=0

aiLi and B =

q∑i=0

biLi (4)

and use (3) to obtain

AgA = BM−1 ∂E(x)

∂x. (5)

Kim and Rossignac [14] show that, under some rea-sonable assumptions, the eigendecompositon of L canbe written as

L = QΛQ−1 ,

where Λ is a diagonal matrix whose diagonal elements,λi = Λii, are the eigenvalues of L, and Q is a matrixwhose columns are the eigenvectors of L. Using thisdecomposition, we can write (5) as(

p∑i=0

aiΛi

)Q−1gA =

(q∑

i=0

biΛi

)Q−1gL2 .

Each row of this equation corresponds to an eigenmodeof L, which we can use to examine what happens tothe corresponding eigenmode of L that is present ingL2 when we compute gA. We rewrite row j as(

Q−1gA)j= h(λj)

(Q−1gL2

)j,

where h(λ) is the transfer function given by

h(λ) =

∑qi=0 biλ

i∑pi=0 aiλ

i. (6)

This transfer function encodes the amount to whichthe eigenmode with eigenvalue λ is amplified or at-tenuated when computing gA from gL2 . Thus, byvarying the coefficients ai and bi, we can control thefrequency response of A−1, which we exploit in thefollowing section when designing our optimizationprocedure.

5

Page 7: Efficient Nonlinear Optimizationvia Multiscale Gradient

Figure 4: Left: One cycle (k = 0, 1, 2, 3, 2, 1), where k is the filter index, to minimize bending energy on theinput mesh (top left). The index is increased when the magnitude of changes to the shape falls below agiven threshold in order to minimize more global features and then decreased to remove small-scale featuresintroduced by the lower pass filters. Right: Multiscale minimizers based on our implementation of thenonlinear conjugate gradient method and the L-BFGS method exhibit similar convergence rates. WhileL-BFGS requires fewer iterations, it requires more time per iteration.

4 Algorithm

The goal of shape optimization is to find a path froma given initial shape x0 = γ(0) to xmin = limt→∞ γ(t)such that E(xmin) is a minimum. Perhaps the sim-plest such algorithm is gradient descent, which choosesthe velocity of the path to be aligned with the gradi-ent of the energy, so that γ′(t) = −g(x). A variantof this is the nonlinear conjugate gradient method,which uses information from previous search direc-tions to guide the optimization towards a solution.The Newton-type methods, such as L-BFGS [21], com-pute a second-order approximation to the energy ateach step to perform the optimization. Regardless ofthe particulars of each algorithm, most descent-basedmethods can be described as iteratively computing asearch direction and performing a line search that ap-proximately minimizes the energy along this direction.We apply our filtering to this descent direction.

Given a tuple (A,B), where A and B are definedas in the previous section, Algorithm 1 shows howwe modify the nonlinear conjugate gradient methodto minimize E(x) . Traditionally, A = B = I sothat g0 and g1 correspond to the L2 gradient. In thiscase, the algorithm very quickly minimizes the high

frequencies features in the shape. However, lowerfrequency features are minimized only very slowly,and the algorithm tends to stall as the convergencerate slows down.Let F = {(Ak,Bk)}mk=1 be a set consisting of m

tuples, with each element of F corresponding to a dif-ferent choice for the coefficients {ai}pi=0 and {bi}qi=0

in (4). Each tuple in F also corresponds to a differ-ent transfer function hk(λ). We set A0 = B0 = I.Our algorithm is designed such that each one of thesetransfer functions attenuates a different frequencyrange. For the transfer function in (6), Kim andRossignac [14] provide a method for computing coeffi-cients based on an intuitive set of parameters availableto the user. In our case, we have found that the al-gorithm works well with only three filters (m = 3),shown in Figure 3, which can be viewed as a highpass,bandpass, and lowpass filter. The coefficients are givenby {a0 = 1, a1 = 1, b1 = 1}, {a0 = 1, a2 = 1, b1 = 2},and {a0 = 1, a2 = 1, b0 = 1} for the highpass, band-pass, and lowpass filters, with all other coefficientsequal to zero. The form of the transfer function in (6)is general enough so that more sophisticated filterscould be designed if more levels are required for theoptimization.

6

Page 8: Efficient Nonlinear Optimizationvia Multiscale Gradient

Figure 5: Starting from a C2-continuous B-spline surface generated from two input curves (left), we minimizethe second-order Willmore energy and the third-order Mehlum-Tarrou energy subject to positional andtangential constraints. A traditional descent-based algorithm using the L2 gradient stalls after a few iterations(middle), whereas the Multiscale Gradient Filtering approach converges to a more optimal solution for bothenergies (right).

Algorithm 2 summarizes the procedure we use toiteratively optimize the shape at multiple scales. Westart by optimizing the high frequency componentswith k = 0, and then optimize lower and lower fre-quency components by increasing k until the max-imum is reached. While increasing k will optimizemore global features, it may introduce higher fre-quency features during the minimization process (seeinset in Figure 4–left). Thus, one cycle of Algorithm 2involves first increasing k and then decreasing k as wemove up and down the hierarchy of frequencies duringoptimization to ensure that the shape is optimizedat all scales, similar to the multigrid V-cycle. Thealgorithm terminates when the magnitude of changesto the shape is below a given threshold or when themaximum number of cycles has been reached.

Figure 4–left shows the convergence plot for mini-mizing the Willmore energy of a genus-1 object. In-terestingly, the minimum of the energy is not unique,and the resulting shape is not necessarily a symmetrictorus. Note that as we increase k, even though the en-ergy drops only very slowly, the shape still undergoeslarge, global deformations.

The call to Algorithm 1 in Algorithm 2 can bereplaced with other nonlinear optimization algorithmssuch as gradient descent or L-BFGS. Figure 4–rightshows a comparison between our implementation ofthe nonlinear conjugate gradient method, shown inAlgorithm 1, and the L-BFGS algorithm, for whichwe use a library that can be found online [22]. While

L-BFGS requires fewer steps to reach the minimum,each step requires more computation. The result forgradient descent looks very similar to these plots andhas been omitted for clarity.

Our implementation of the nonlinear conjugate gra-dient method uses Brent’s linesearch method [24],which requires an upper bound, ε, on the stepsize. IfBrent’s method uses a stepsize close to ε, we increasethe stepsize to 2 ε. Conversely, if the maximum num-ber of iterations is reached, it is called again with anupper bound of ε/2. A starting value of ε = 0.1 isused for all examples in this paper. The sparse matriximplementation of the Boost library is used to store{Ak,Bk}mk=1. We use PARDISO [29] to solve thelinear system in (4) that defines the preconditionedgradient. Note that we could use an iterative solver,such as the conjugate gradient method, instead of adirect solver, which would avoid having to computeand store higher-order Laplacians.

5 Applications

5.1 Variational Modeling

The variational modeling of surfaces is the process ofconstructing aesthetically pleasing smooth shapes byoptimizing some curvature-based energy. The inputsare generally user-specified positional and tangentialconstraints.

7

Page 9: Efficient Nonlinear Optimizationvia Multiscale Gradient

A commonly used surface energy is the second-order bending energy that computes the area integralof the square of the mean curvature of the surface,E =

∫H2dA, also known as the Willmore energy [35],

where H is the mean curvature. Given appropriateboundary conditions, variants of this formulation (e.g.,∫(κ1

2+κ22)dA, where κi are the principal curvatures,

and∫H2dA+

∫KdA where K is the Gaussian curva-

ture) differ only by constants that depend on topolog-ical type and are equivalent in terms of optimization.In this paper, we use the dihedral-angle-based discreteoperator [9] for approximating the Willmore energyfor triangle meshes (Figure 1).

Minimizing the Willmore energy has the effect thatduring the minimization process, the surface gets moreand more spherical (see Figures 1, 2, and 4). Oftenhowever, this behavior is undesirable when the userdoes not want the shape to bulge out (see Figure 5).If the bulgy results using the Willmore energy are

undesirable, a different energy has to be defined toprevent this behavior. Mehlum and Tarrou [17] pro-posed to minimize the squared variation in normalcurvature integrated over all directions in the tangentplane:

1

π

∫ (∫ π

0

κ′n(θ)

2dθ

)dA (7)

This is a third-order energy since it requires tak-ing the derivative of the normal curvature, wherethe resulting expression contains third-order partialderivatives. Third-order energies tend to be evenmore ill-conditioned than second-order energies such

Algorithm 1 : MinimizeCG((A,B), x)

g0 ← A−1BM−1 ∂E(x)∂x

s0 ← −g0

for i = 1 → max iterations doα ← LineSearch(x, s0)x ← x+ α s0g1 ← A−1BM−1 ∂E(x)

∂xβ ← 〈g1,g1〉/〈g0,g0〉s0 ← β s0 − g1

g0 ← g1

end forreturn x

as the Willmore energy, which can make their op-timization prohibitively expensive using traditionalnonlinear minimization algorithms. Other exampleswhere higher-order energies are desired are discussedin [12].

Figure 5 shows a comparison between the Willmoreenergy and the Mehlum-Tarrou energy as defined in(7) using our approach. The input to the optimiza-tion is a C2-continuous non-uniform bi-cubic B-splinethat is fit through two B-spline curves with tangen-tial constraints. The energy and gradient for thissurface representation is straightforward to computeusing the equations in [17]. In order to achieve theoptimal configuration, the shape needs to move glob-ally into a curved cylinder. The figure also showsa comparison to the state of the surfaces minimizedwithout preconditioning the gradients, demonstratingthat our method yields a more optimal shape thanthe traditional approach in the same computationaltime.

5.2 Parameterization

Computing high quality surface parameterizations foruse in applications such as texture mapping is a wellstudied topic (see, e.g., [8]). The classic aim is to finda suitable embedding from R

3 → R2 that minimizes

some form of measured distortion. Numerous pro-posals have been made for defining distortion metricswhich produce pleasing results. Perhaps the most pop-ular has been conformal mapping, which attempts topreserve angular distortion. Such conformal solutionsare efficient to compute, requiring a linear solve, but

Algorithm 2 : MinimizeMultiScale(x)

for j = 1 → number of cycles dofor k = 0 → m do

x ← MinimizeCG((Ak,Bk),x

)end forfor k = m− 1 → 0 dox ← MinimizeCG

((Ak,Bk),x

)end for

end forreturn x

8

Page 10: Efficient Nonlinear Optimizationvia Multiscale Gradient

2

Figure 6: Starting from the LSCM parameterization, the Multiscale Gradient Filtering approach is used tominimize texture stretch. By varying the filter, a minimum is achieved much faster than by just minimizingthe energy using the L2 gradient.

unfortunately do a poor job of preserving area (seeFigure 6 for a comparison).When considering preserving both area and angle

during flattening one must turn to nonlinear metrics.A family of such metrics have been proposed basedon the singular values of the 3 × 2 Jacobian matrixJi that maps each triangle Ti from 3D into the 2Dparametric domain [28, 31, 36]. A perfect isometricmapping will have singular values equal to 1. Largeand small singular values imply stretching and shrink-ing respectively.For our experiments we use the L2 texture stretch

metric from [28]. Given a triangle Ti from mesh M ,its root-mean-square stretch error can be defined asfollows: E(Ti) =

√(τ + γ)/2, where τ and γ are

the singular values of the triangle’s Jacobian matrixJi. The entire parametrization can be computed byminimizing the following expression over the mesh:√ ∑

Ti∈M

E(Ti)2A(Ti)/∑

Ti∈M

A(Ti) (8)

where A(Ti) is the surface area of the triangle in 3D.When a triangle’s area in 2D approaches zero (e.g.,becomes degenerate), its stretch error E(Ti) goes toinfinity. Thus, relatively small perturbations of theparameterization can result in large changes in expres-sion (8). For this reason the nonlinear system is quite

stiff, producing a set of gradients during optimizationthat may greatly vary in magnitude. The effect ofthis is a system that is challenging to solve efficiently.We applied our gradient preconditioning method tosolve this problem. Figure 6 shows the results of ourmethod for minimizing texture stretch. We start withan initial parametrization using a Least Squares Con-formal Map [15], shown on the left. We then run ouroptimization procedure on the nonlinear energy toobtain the results shown in the middle, demonstratingthat we obtain a solution with much less distortion.On the right, we compare the energy behavior to nopreconditioning, demonstrating that we obtain theoptimal shape with far less computation.

5.3 Nonlinear Elasticity

To this point, we have considered typical shape op-timization problems, where the user inputs a shapethat is then optimized with respect to a given energyusing the algorithm discussed in Section 4. In theseapplications, the user is generally only interested inthe end state and not in the path taken to get there.However, we demonstrate that our proposed frame-work can also be applied to the simulation of thedynamics of elastic objects, where the user is indeedinterested in the path.

We begin with Hooke’s law for continuum mechanics

9

Page 11: Efficient Nonlinear Optimizationvia Multiscale Gradient

Figure 8: Top row: Animation sequence of an elongated bar undergoing deformations based on the infinitesimalstrain tensor. The lack of nonlinear terms causes large distortions and ghost forces in the shape. Bottom row:By filtering high frequencies from the forces based on the finite strain tensor, a larger timestep and nonlineardeformation behavior can be achieved.

Figure 7: We minimize elastic potential energy basedon nonlinear strain on an unstructured hexahedralmesh with a trilinear Bezier basis. Multiscale GradientFiltering quickly propagates the boundary conditionsthrough the shape, whereas a minimizer based on theL2 gradient stalls.

(see, e.g., [16]), which states that the stress, σ, foran isotropic elastic material is linearly related to thestrain, ε, by

σ = 2με+ λTr (ε) I ,

where λ and μ are the Lame first and second pa-rameters that determine the elastic response of thematerial, Tr (ε) =

∑k εkk is the trace of the strain

tensor, and I is the identity. The finite strain ten-sor encodes the deformation of the material from itsundeformed configuration at x0 to its deformed con-figuration x and is defined as ε = 1

2

(FTF− I

), where

F is the deformation gradient, which maps tangent

vectors from the undeformed to the deformed config-uration. The elastic potential energy, a function of x,is defined as

U (x) =1

2

∫Tr (σε)dA , (9)

where the integration is over the undeformed config-uration. Figure 7 shows an example where U(x) isminimized on an hexahedral mesh using a trilinearBezier basis. Similar to the previous cases, the precon-ditioned gradients help to propagate the deformationsto the rest of the mesh much faster.The equations of motion for the elastic material

with Rayleigh damping are

Mx+Cx− fint (x) = fext (x, x) , (10)

where x and x are the first and second derivativeof x with respect to time, M is the physical massmatrix, C is the Rayleigh damping matrix, fextis a vector of external forces such as gravity, andfint (x) = − ∂U(x)/∂x is a vector of internal forcesthat depends on the elastic potential energy. The equa-tions of motion define a coupled system of ordinarydifferential equations that must be integrated in timeto solve for the path of the object. For stiff materials,implicit integrators are often used because they offersuperior stability over explicit methods [2], resultingin a costly system of nonlinear equations that mustbe solved at each timestep. Furthermore, since theequations are nonlinear, even implicit integrators are

10

Page 12: Efficient Nonlinear Optimizationvia Multiscale Gradient

not unconditionally stable, resulting in a restrictionon the maximum allowable timestep size [11].

Various strategies exist for reducing the amountof computation needed to integrate the equations ofmotion. Since many solid materials, such as metaland wood, usually undergo small deformations, the in-finitesimal strain tensor, given by ε = 1

2

(FT + F

)− I,can be used instead of the finite strain tensor in thedefinition of the elastic potential energy. In the caseof large deformation, the naıve use of the infinites-imal strain tensor results in undesirable distortionsand ghost forces, so corotational approaches [18,19],which factor out rotations from the motion on a per-element or per-vertex basis, are often employed. Bothfor the small displacement case and the corotationalapproaches, replacing the finite strain tensor with theinfinitesimal strain tensor and using an implicit inte-grator results in having to solve only a linear ratherthan a nonlinear system per timestep, saving a sig-nificant amount of computation. In the case of smalldeformations, the resulting system is also uncondi-tionally stable, which is very attractive since it allowslarge timesteps to be used.

We propose an alternative for allowing one to takelarger timesteps while integrating the equations ofmotion based on our framework. We modify Equa-tion 10 to filter the internal forces, so the modifiedequations are given by

Mx+Cx−A−1fint(x) = fext(x, x) , (11)

where A−1 is designed to remove the high frequencycomponents from the internal forces. By filtering theforces, the physics of the system is changed; however,for applications where the timestep size must remainfixed, we can choose A to filter those frequencies thatcannot be resolved, allowing a larger timestep. Asshown in Figure 8, this approach does not exhibitany of the distortions apparent when using the smalldeformation assumption and, like the corotationalapproach, allows for large timesteps, even if an explicitintegrator is used. We believe that this approachcan also be used in conjunction with a corotationalmethod, allowing larger timesteps within that contextand for implicit integrators in general. In addition,this approach is not limited to the linear constitutive

Example Energy Rep. # Elems. Basis TimeFig. 1 W. T. 153K C(0) 4hFig. 2 W. T. 720 C(0) 1mFig. 4-left W. T. 26K C(0) 6mFig. 4-right W. T. 6K C(0) 3mFig. 5 W. B.S. 320 C(2) 5mFig. 5 M.T. B.S. 320 C(2) 30mFig. 6 Stretch T. 16K C(0) 1mFig. 7 E.P. B.V. 3.5K C(0) 1hBunny (*) W. T. 10K C(0) 3mBuddha (*) W. T. 30K C(0) 6mTeardrop (*) W. T. 3.8K C(0) 5mPin (*) W. T. 3.3K C(0) 5m

Table 1: Performance of the various optimizations ofWillmore (W.), Mehlum-Tarrou (M.T.) and ElasticPotential (E.P.) energy, peformend on triangles (T.),B-spline surfaces (B.S.) and B-spline volumes (B.V.).Timings were taken on a quadcore machine. Datasetsdenoted with (∗) are shown in the supplementaryvideo. In each case, optimization is stopped whenshape reaches an acceptable state.

equation given by Hooke’s law but can be applied tofilter out undesirable frequencies from any forces.

6 Conclusions and Discussion

In this paper, we provide a general framework fornonlinear shape optimization. We show applicationsof our method for optimizing ill-conditioned, nonlinearenergies in a variety of different contexts, in eachcase demonstrating an improvement in the quality ofthe results and a reduction in the time required tocompute a solution (see Table 1). We also provide anextension of our method to the case of simulating thedynamics of a physical system, which enables the useof larger timesteps during the course of the simulation.We believe that for many cases in which practitionersresort to approximations, sacrificing quality for speed,our framework can be used to obtain superior resultsat comparable runtimes.Our method is conceptually similar to a multigrid

method in that it can be used to smooth errors atmultiple scales. However, there is an important limi-

11

Page 13: Efficient Nonlinear Optimizationvia Multiscale Gradient

tation of our method: a coarser level in a multigridapproach usually requires less computation, whereasin our framework, smoothing errors at larger scalesincreases the requisite computation because of thehigher-order Laplacians involved. On the other hand,the implementation of our method does not require ahierarchy of representations during the optimizationprocess and can therefore be easily integrated intoexisting optimization systems. Therefore, we believethat this framework provides a valuable benchmarkthat can be used to evaluate the practicality of non-linear formulations for many problems.

References

[1] Aksoylu, B., Khodakovsky, A., andSchroder, P. Multilevel Solvers for Unstruc-tured Surface Meshes. SIAM J. Sci. Comput. 26(April 2005), 1146–1165.

[2] Baraff, D., and Witkin, A. P. Large Steps inCloth Simulation. In Proceedings of SIGGRAPH98 (July 1998), Computer Graphics Proceedings,Annual Conference Series, pp. 43–54.

[3] Botsch, M., and Kobbelt, L. An intuitiveframework for real-time freeform modeling. ACMTransactions on Graphics 23, 3 (Aug. 2004), 630–634.

[4] Briggs, W. L., Henson, V. E., and Mc-Cormick, S. F. A Multigrid Tutorial, 2nd ed.SIAM, 2000.

[5] Charpiat, G., Keriven, R., Pons, J.-P., andFaugeras, O. Designing Spatially CoherentMinimizing Flows for Variational Problems Basedon Active Contours. In ICCV (2005), IEEEComputer Society, pp. 1403–1408.

[6] Do Carmo, M. P. Riemannian Geometry.Birkhauser Boston, 1992.

[7] Eckstein, I., Pons, J.-P., Tong, Y., Kuo,C.-C. J., and Desbrun, M. Generalized sur-face flows for mesh processing. In SGP (2007),Eurographics Association, pp. 183–192.

[8] Floater, M. S., and Hormann, K. SurfaceParameterization: a Tutorial and Survey. InAdvances in Multiresolution for Geometric Mod-elling, Mathematics and Visualization. SpringerBerlin Heidelberg, 2005, pp. 157–186.

[9] Grinspun, E., Hirani, A. N., Desbrun, M.,and Schroder, P. Discrete shells. In SCA(2003), Eurographics Association, pp. 62–67.

[10] Grinspun, E., Krysl, P., and Schroder, P.CHARMS: A Simple Framework for AdaptiveSimulation. ACM Transactions on Graphics 21,3 (July 2002), 281–290.

[11] Hauth, M., Etzmuss, O., and Straßer, W.Analysis of numerical methods for the simulationof deformable models. The Visual Computer 19,7-8 (2003), 581–600.

[12] Joshi, P. Minimizing Curvature Variation forAesthetic Surface Design. PhD thesis, EECSDepartment, University of California, Berkeley,Oct 2008.

[13] Karatson, J., and Farago, I. Precondition-ing operators and Sobolev gradients for nonlinearelliptic problems. Comput. Math. Appl. 50 (Oc-tober 2005), 1077–1092.

[14] Kim, B., and Rossignac, J. GeoFilter: Geo-metric Selection of Mesh Filter Parameters. Com-puter Graphics Forum 24, 3 (Sept. 2005), 295–302.

[15] Levy, B., Petitjean, S., Ray, N., and Mail-lot, J. Least Squares Conformal Maps for Au-tomatic Texture Atlas Generation. ACM Trans-actions on Graphics 21, 3 (July 2002), 362–371.

[16] Marsden, J. E., and Hughes, T. J. R. Math-ematical Foundations of Elasticity. Dover Publi-cations, 1994.

[17] Mehlum, E., and Tarrou, C. Invariantsmoothness measures for surfaces. Advances inComputational Mathematics 8, 1-2 (1998), 49–63.

12

Page 14: Efficient Nonlinear Optimizationvia Multiscale Gradient

[18] Muller, M., Dorsey, J., McMillan, L.,Jagnow, R., and Cutler, B. Stable Real-Time Deformations. In SCA (2002), pp. 49–54.

[19] Muller, M., and Gross, M. H. InteractiveVirtual Materials. In Graphics Interface 2004(May 2004), pp. 239–246.

[20] Nealen, A., Igarashi, T., Sorkine, O., andAlexa, M. FiberMesh: Designing FreeformSurfaces with 3D Curves. ACM Transactions onGraphics 26, 3 (July 2007), 41:1–41:9.

[21] Nocedal, J. Updating Quasi-Newton Matriceswith Limited Storage. Mathematics of Computa-tion 35, 151 (1980), 773–782.

[22] Okazaki, N. libLBFGS: a libraryof Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS), 2010.http://www.chokkan.org/software/liblbfgs/.

[23] Pinkall, U., and Polthier, K. ComputingDiscrete Minimal Surfaces and Their Conjugates.Experimental Mathematics 2, 1 (1993), 15–36.

[24] Press, W. H., Teukolsky, S. A., Vetter-ling, W. T., and Flannery, B. P. NumericalRecipes in C: The Art of Scientific Computing,2nd ed. Cambridge University Press, 1992.

[25] Renka, R. J. Constructing fair curves and sur-faces with a Sobolev gradient method. ComputerAided Geometric Design 21, 2 (2004), 137 – 149.

[26] Renka, R. J., and Neuberger, J. W. Min-imal surfaces and Sobolev gradients. SIAM J.Sci. Comput. 16 (November 1995), 1412–1427.

[27] Rivara, M.-C. Local modification of meshes foradaptive and/or multigrid finite-element meth-ods. Journal of Computational and Applied Math-ematics 36, 1 (1991), 79–89. Special Issue onAdaptive Methods.

[28] Sander, P. V., Snyder, J., Gortler, S. J.,and Hoppe, H. Texture Mapping ProgressiveMeshes. In Proceedings of ACM SIGGRAPH2001 (Aug. 2001), Computer Graphics Proceed-ings, Annual Conference Series, pp. 409–416.

[29] Schenk, O., and Gartner, K. Solving un-symmetric sparse systems of linear equationswith PARDISO. Future Gener. Comput. Syst. 20(April 2004), 475–487.

[30] Shi, L., Yu, Y., Bell, N., and Feng, W.-W.A fast multigrid algorithm for mesh deformation.ACM Transactions on Graphics 25, 3 (July 2006),1108–1117.

[31] Sorkine, O., Cohen-Or, D., Goldenthal,R., and Lischinski, D. Bounded-distortionpiecewise mesh parameterization. In VIS (2002),IEEE Computer Society, pp. 355–362.

[32] Sundaramoorthi, G., Yezzi, A., and Men-nucci, A. C. Sobolev active contours. Inter-national Journal of Computer Vision 73 (2005),109–120.

[33] Taubin, G. A Signal Processing Approach toFair Surface Design. In Proceedings of SIG-GRAPH 95 (Aug. 1995), Computer GraphicsProceedings, Annual Conference Series, pp. 351–358.

[34] Terzopoulos, D., Platt, J., Barr, A., andFleischer, K. Elastically Deformable Mod-els. In Computer Graphics (Proceedings of SIG-GRAPH 87) (July 1987), pp. 205–214.

[35] Willmore, T. J. Mean Curvature of Rieman-nian Immersions. Journal of the London Mathe-matical Society s2-3, 2 (1971), 307–310.

[36] Zhang, E., Mischaikow, K., and Turk, G.Feature-based surface parameterization and tex-ture mapping. ACM Transactions on Graphics24, 1 (Jan. 2005), 1–27.

13