towards optimal solvers for pde-constrained optimization volkan akcelik omar ghattas volkan akcelik...

Towards Optimal Solvers for PDE-Constrained Optimization

Volkan Akcelik, Omar Ghattas (Carnegie Mellon University)

George Biros (New York University)

TOPS Winter Meeting, Livermore, CA, January 25-26, 2002

Simulation vs. Optimization

• PDE model

• Simulation (forward) problem
  o Given “data” x (e.g., material coefficients, domain or boundary sources, geometry), find state variables u (velocity, stress, temperature, magnetic or electric field, displacement, etc.)

• Optimization (inverse) problem
  o Given a desired goal involving u, find x
  o Optimal design, optimal control, or parameter estimation

PDE-Constrained Optimization

First-order Optimality Conditions

Examples

• Optimal design
  o Shape optimization of viscous flow system
    • N_d = O(1)

• Optimal control
  o Boundary flow control
    • N_d = O(N_u^(2/3))

• Parameter estimation
  o Heterogeneous inverse wave propagation
    • N_d = O(N_u)

(N_d: number of decision variables; N_u: number of state variables)

Optimal Design: Artificial Heart Shape Optimization 1

Goal: Find pump geometry that minimizes blood damage (J. Antaki & G. Burgreen, Univ. of Pittsburgh Medical Center)

G. Burgreen

Optimal Design: Artificial Heart Shape Optimization 2

Burgreen & Antaki, 1996

Optimal Control: Boundary Flow Control 1

MHD flow control using boundary magnetic field

Optimal Control: Boundary Flow Control 2

Goal: find boundary suction/injection that minimizes energy dissipation in a viscous incompressible flow

Optimal Control: Viscous Boundary Flow Control 2

(G. Biros, 2000)

Optimal Control: Viscous Boundary Flow Control 3

No control vs. optimal control

Parameter Estimation: Seismic Inversion 1

• Forward problem: Given soil material and earthquake source parameters, find earthquake ground motion.

• Inverse problem: Given earthquake observations, estimate material and source parameters.

• Part of the Quake KDI Project at CMU (J. Bielak, D. O’Hallaron), Berkeley (J. Shewchuk), and SDSU (S. Day, H. Magistrale)


Simulation of 1994 Northridge Earthquake

Surface Visualization

Lagrange-Newton-Krylov-Schur method

• Consider discrete form of problem:

  $\min_{x_s,\,x_d} f(x_s, x_d)$ subject to $c(x_s, x_d) = 0$

  where $x_s$ are the states, $x_d$ the controls, $f$ the objective function, and $c(x_s, x_d) = 0$ the state equations (PDEs)

Optimality Conditions

• Lagrangian function:

• First order optimality conditions:

• Define:
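For reference, here is one standard way to write these three items. It assumes the discrete problem $\min f(x_s, x_d)$ subject to $c(x_s, x_d) = 0$, with states $x_s$, decision (control) variables $x_d$, and Lagrange multipliers $\lambda$. The symbols $g_s, g_d$ (objective gradients), $A$ (state Jacobian, the PDE operator), $C$ (decision Jacobian), and $W$ (Hessian of the Lagrangian) are notation assumed here.

```latex
\begin{aligned}
\mathcal{L}(x_s, x_d, \lambda) &= f(x_s, x_d) + \lambda^{T} c(x_s, x_d) \\
\nabla_{x_s}\mathcal{L} &= g_s + A^{T}\lambda = 0 && \text{(adjoint equations)} \\
\nabla_{x_d}\mathcal{L} &= g_d + C^{T}\lambda = 0 && \text{(decision equations)} \\
\nabla_{\lambda}\mathcal{L} &= c(x_s, x_d) = 0 && \text{(state equations)} \\[4pt]
A = \frac{\partial c}{\partial x_s}, \quad C = \frac{\partial c}{\partial x_d}, &\quad
g_s = \frac{\partial f}{\partial x_s}, \quad g_d = \frac{\partial f}{\partial x_d}, \quad
W = \nabla^{2}\mathcal{L}
\end{aligned}
```

Solving the adjoint equations for $\lambda$ is one solve with the transposed PDE operator $A^{T}$.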

Newton’s Method for Optimality Conditions (SQP, Newton-Lagrange)

• A Newton step (KKT system):

• Two possibilities for solution:
  • Full space method: solve KKT system with a Krylov method; needs a good preconditioner
  • Reduced space method: eliminate state variables and Lagrange multipliers
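A common way to write the Newton step and its reduced-space form uses the following notation, all assumed here. The states are $x_s$, the decisions $x_d$, the objective gradients $g_s, g_d$, and the constraint residual $c$. The constraint Jacobians are $A = \partial c/\partial x_s$ and $C = \partial c/\partial x_d$. The Lagrangian Hessian $W$ has blocks $W_{ss}, W_{sd}, W_{ds}, W_{dd}$. Eliminating $p_s$ and the new multipliers $\lambda_{+}$ uses the null-space basis $Z$ of $[A \;\, C]$:

```latex
\begin{gathered}
\begin{bmatrix}
W_{ss} & W_{sd} & A^{T} \\
W_{ds} & W_{dd} & C^{T} \\
A      & C      & 0
\end{bmatrix}
\begin{bmatrix} p_s \\ p_d \\ \lambda_{+} \end{bmatrix}
= -\begin{bmatrix} g_s \\ g_d \\ c \end{bmatrix},
\qquad
Z = \begin{bmatrix} -A^{-1} C \\ I \end{bmatrix},
\quad
y = \begin{bmatrix} -A^{-1} c \\ 0 \end{bmatrix}
\\[6pt]
W_z = Z^{T} W Z, \qquad g_z = Z^{T} g = g_d - C^{T} A^{-T} g_s
\\[6pt]
W_z\, p_d = -\,g_z - Z^{T} W y, \qquad
p_s = -A^{-1}\left(c + C p_d\right), \qquad
A^{T}\lambda_{+} = -\left(g_s + W_{ss}\, p_s + W_{sd}\, p_d\right)
\end{gathered}
```

Forming $W_z$ explicitly needs $A^{-1}C$, one PDE solve per decision variable, which is the cost noted for reduced SQP.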

Quasi-Newton Reduced SQP

• LU factorization of (permuted) KKT matrix

• Problems:
  o Reduced Hessian W_z needs m (number of decision variables) PDE solves per optimization iteration
  o 2nd derivatives needed

• Quasi-Newton RSQP:
  o Replace reduced Hessian by quasi-Newton approximation
  o Discard other 2nd derivative terms

• Properties
  o Just 2 PDE solves required per optimization iteration
  o 2-step superlinear convergence
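Without the reduced Hessian, what remains in each reduced-space iteration is a gradient evaluation. The reduced gradient $g_z = g_d - C^T A^{-T} g_s$ costs one forward (state) solve and one adjoint solve. The sketch below is a minimal Python illustration on a hypothetical 1D linear-quadratic model. The model, the class `ToyControlProblem`, and all parameter values are assumptions. It counts the two solves and checks the adjoint gradient against finite differences.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system without pivoting.
    lower[i] multiplies x[i-1] (lower[0] unused); upper[i] multiplies x[i+1] (upper[-1] unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


class ToyControlProblem:
    """Hypothetical model: min h/2*|u - d|^2 + beta*h/2*|z|^2  s.t.  c(u, z) = A u - z = 0.
    A (the 'PDE' Jacobian) is a nonsymmetric convection-diffusion stencil; C = dc/dz = -I."""

    def __init__(self, n=20, beta=1e-3, peclet=0.3):
        self.n, self.beta = n, beta
        self.h = 1.0 / (n + 1)
        s = 1.0 / self.h ** 2
        self.lower = [(-1.0 - peclet) * s] * n
        self.diag = [2.0 * s] * n
        self.upper = [(-1.0 + peclet) * s] * n
        self.d = [math.sin(math.pi * (i + 1) * self.h) for i in range(n)]  # target state
        self.pde_solves = 0

    def solve_state(self, z):
        """Forward PDE solve: A u = z."""
        self.pde_solves += 1
        return thomas(self.lower, self.diag, self.upper, z)

    def solve_adjoint(self, rhs):
        """Adjoint PDE solve: A^T lam = rhs (transposing swaps the off-diagonals)."""
        self.pde_solves += 1
        lower_t = [0.0] + self.upper[:-1]
        upper_t = self.lower[1:] + [0.0]
        return thomas(lower_t, self.diag, upper_t, rhs)

    def reduced_objective(self, z):
        u = self.solve_state(z)
        misfit = sum((ui - di) ** 2 for ui, di in zip(u, self.d))
        return 0.5 * self.h * misfit + 0.5 * self.beta * self.h * sum(zi * zi for zi in z)

    def reduced_gradient(self, z):
        """g_z = g_d + C^T lam with A^T lam = -g_s: one forward and one adjoint solve."""
        u = self.solve_state(z)
        g_s = [self.h * (ui - di) for ui, di in zip(u, self.d)]
        lam = self.solve_adjoint([-g for g in g_s])
        return [self.beta * self.h * zi - li for zi, li in zip(z, lam)]  # C = -I


prob = ToyControlProblem()
z = [0.1 * (i % 3) for i in range(prob.n)]
prob.pde_solves = 0
g = prob.reduced_gradient(z)
print(prob.pde_solves)  # 2
```

In quasi-Newton RSQP, a gradient like this feeds the quasi-Newton update, so the reduced Hessian itself is never formed.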

• Use Schur-based approximate LU factorization as KKT preconditioner

• Since this is only a preconditioner, PDE solves can be replaced by applications of a PDE preconditioner

• Properties:
  o No PDE solves per iteration (2 PDE preconditioner applications required per inner Krylov iteration)
  o Newton convergence for outer iteration

• Inspiration:
  o Schur complement domain decomposition preconditioners (Keyes & Gropp, 1987)

Approximate QN-RSQP Preconditioner
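The sketch below shows where the two PDE-type solves per application come from. It assumes a particular preconditioner form: the constraint Jacobians are kept, the Hessian blocks $W_{ss}, W_{sd}, W_{ds}$ are dropped, and $W_{dd}$ is replaced by a reduced-Hessian approximation $B_z$. That gives $P = [[0, 0, A^T], [0, B_z, C^T], [A, C, 0]]$. Applying $P^{-1}$ is then block back-substitution: one solve with $A^T$, one with $B_z$, and one with $A$. The function names and the tiny dense matrices are hypothetical. Exact solves stand in for the approximate PDE solves (PDE preconditioner applications) described on the slides.

```python
def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (small matrices only)."""
    n = len(b)
    a = [row[:] + [bi] for row, bi in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def transpose(M):
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def apply_schur_preconditioner(A_solve, At_solve, Bz_solve, C, r_s, r_d, r_lam):
    """Solve P [p_s; p_d; p_lam] = [r_s; r_d; r_lam] for the assumed block matrix
    P = [[0, 0, A^T], [0, B_z, C^T], [A, C, 0]] by block back-substitution."""
    p_lam = At_solve(r_s)                                        # A^T p_lam = r_s (adjoint-type)
    p_d = Bz_solve([rd - v for rd, v in zip(r_d, matvec(transpose(C), p_lam))])
    p_s = A_solve([rl - v for rl, v in zip(r_lam, matvec(C, p_d))])  # A p_s = r_lam - C p_d
    return p_s, p_d, p_lam


# Tiny hypothetical example: 3 states, 2 decision variables, exact solves everywhere.
A = [[4.0, -1.0, 0.0], [-2.0, 4.0, -1.0], [0.0, -2.0, 4.0]]
C = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Bz = [[2.0, 0.0], [0.0, 3.0]]
p_s, p_d, p_lam = apply_schur_preconditioner(
    lambda b: solve(A, b), lambda b: solve(transpose(A), b), lambda b: solve(Bz, b),
    C, [1.0, 2.0, 3.0], [0.5, -1.0], [0.0, 1.0, -2.0])
```

For symmetric $B_z$ this $P$ is symmetric but indefinite. With approximate `A_solve`/`At_solve`, it only approximates the KKT matrix.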

Effectiveness of Schur Preconditioner

• Preconditioned KKT system (with exact A)

• Schur preconditioner results in:
  o Reduced condition number
  o Clustering of eigenvalues
  o But not a normal matrix: convergence rate bounds depend on condition number of eigenvector matrix
  o In practice, works very well

How to Precondition Reduced Hessian?

• Requirements
  o Cannot compute reduced Hessian, only its (approximate) action on a vector

• Possibilities
  o Quasi-Newton formula (limited memory for large m)
  o Fixed small number of stationary iterations
  o Sparse approximate inverse preconditioner (SPAI)
  o Discrete approximation of underlying continuous W_z
  o Sherman-Morrison-Woodbury, if reduced Hessian is of form “compact operator + SPD”
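A limited-memory quasi-Newton preconditioner needs only the action of an approximate inverse reduced Hessian on a vector. Below is a minimal Python sketch of the standard L-BFGS two-loop recursion. It assumes the correction pairs $(s_k, y_k)$ come from (approximate) reduced-Hessian matvecs, $y_k \approx W_z s_k$. By default it uses the common scaling $H_0 = \gamma I$. The optional `apply_h0` hook, a hypothetical parameter, lets another operator (for example a fixed number of stationary iterations) play the role of $H_0$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_apply(pairs, v, apply_h0=None):
    """Return H v, where H is the L-BFGS approximation of the inverse (reduced) Hessian
    built from pairs = [(s_0, y_0), ..., (s_{k-1}, y_{k-1})], oldest first, each with y^T s > 0.
    apply_h0 optionally supplies the initial inverse-Hessian action H0 (default: gamma * I)."""
    q = list(v)
    coeffs = []
    for s, y in reversed(pairs):                              # first loop: newest to oldest
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        q = [qi - a * yi for qi, yi in zip(q, y)]
        coeffs.append((rho, a))
    if apply_h0 is not None:
        r = apply_h0(q)
    else:
        if pairs:
            s_last, y_last = pairs[-1]
            gamma = dot(s_last, y_last) / dot(y_last, y_last)
        else:
            gamma = 1.0
        r = [gamma * qi for qi in q]
    for (s, y), (rho, a) in zip(pairs, reversed(coeffs)):     # second loop: oldest to newest
        b = rho * dot(y, r)
        r = [ri + (a - b) * si for ri, si in zip(r, s)]
    return r


# Hypothetical diagonal SPD reduced Hessian and three steps; y_k = W s_k.
W = [1.0, 2.0, 3.0, 4.0]
steps = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
pairs = [(s, [wi * si for wi, si in zip(W, s)]) for s in steps]
print(lbfgs_apply(pairs, pairs[-1][1]))  # close to [1.0, 1.0, 0.0, 0.0]: secant condition H y = s
```

Storage and work grow with the number of stored pairs, not with m. This is what makes the limited-memory form usable when the number of decision variables is large.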

2-step Stationary Iterations as Reduced Hessian Preconditioner

• Properties
  • Same complexity as CG but a constant preconditioner
  • Needs only matvecs (here with approximate reduced Hessian)
  • Needs estimates of extremal eigenvalues

• Use to initialize L-BFGS preconditioner (effectively H_0)
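As an illustration of "same complexity as CG but a constant preconditioner", here is a minimal sketch of one two-step stationary method: second-order Richardson (heavy-ball) iteration with fixed parameters. Whether this exact variant is the one used on the slides is an assumption. The function name and the diagonal test operator are hypothetical. With eigenvalue bounds $\lambda_{\min}, \lambda_{\max}$, the asymptotic rate is $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, as for CG. Because the parameters are fixed and the start is $x = 0$, a fixed number of sweeps is a fixed linear operator in $b$, unlike CG.

```python
import math


def two_step_stationary(matvec, b, lam_min, lam_max, iters):
    """Approximate A^{-1} b for SPD A with `iters` second-order Richardson (heavy-ball)
    iterations, using fixed parameters from eigenvalue bounds lam_min <= eig(A) <= lam_max.
    Starting from x = 0, the result is a fixed linear function of b."""
    s_min, s_max = math.sqrt(lam_min), math.sqrt(lam_max)
    beta = 4.0 / (s_max + s_min) ** 2                   # weight on the residual
    alpha = ((s_max - s_min) / (s_max + s_min)) ** 2    # weight on the previous update
    x_prev = [0.0] * len(b)
    x = [0.0] * len(b)
    for _ in range(iters):
        r = [bi - ai for bi, ai in zip(b, matvec(x))]
        x_new = [xi + alpha * (xi - xpi) + beta * ri for xi, xpi, ri in zip(x, x_prev, r)]
        x_prev, x = x, x_new
    return x


# Hypothetical SPD "reduced Hessian": diagonal with eigenvalues in [1, 10].
D = [1.0, 2.0, 5.0, 10.0]


def apply_D(v):
    return [d * vi for d, vi in zip(D, v)]


x = two_step_stationary(apply_D, [1.0, 1.0, 1.0, 1.0], lam_min=1.0, lam_max=10.0, iters=60)
```

In an LNKS-style setting, the eigenvalue bounds would come from a Lanczos estimate, and a few sweeps could serve as the $H_0$ of an L-BFGS preconditioner.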

Inexact Solution of KKT System

• Solve KKT system iteratively to a tolerance of η × (norm of Lagrangian gradient)
• Eisenstat & Walker inexact Newton theory applies for reduction of the norm of the Lagrangian gradient
• But will it satisfy merit function criteria?
• Can show that in the vicinity of a local minimum (for augmented Lagrangian merit function):
  o If η is small enough, the Armijo condition is satisfied with unit steplengths
  o If η is O(Lagrangian gradient), quadratic convergence is preserved
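The effect of choosing η = O(gradient norm) can be seen on a scalar stand-in for the Lagrangian-gradient equation. In this minimal sketch, the equation $F(x) = x^2 - 2$ and the function name are assumptions. Each "linear solve" is made inexact on purpose, so that its residual is exactly $\eta\,|F(x)|$, with forcing term $\eta = \min(0.5, |F(x)|)$.

```python
import math


def inexact_newton(F, dF, x0, eta_max=0.5, tol=1e-14, max_iter=50):
    """Scalar illustration of inexact Newton with a residual-based forcing term.
    The step s satisfies |F(x) + dF(x) * s| = eta * |F(x)| with eta = min(eta_max, |F(x)|)."""
    x = x0
    history = []
    for _ in range(max_iter):
        fx = F(x)
        history.append(abs(fx))
        if abs(fx) < tol:
            break
        eta = min(eta_max, abs(fx))          # forcing term, eta = O(|F|) near the solution
        s = -(1.0 - eta) * fx / dF(x)        # leaves linearized residual eta * F(x)
        x += s
    return x, history


root, hist = inexact_newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)   # close to sqrt(2) = 1.41421356...
print(hist)   # residuals shrink roughly quadratically once |F| is small
```

With a constant η instead, the same loop converges only linearly. This matches the slide's condition for preserving quadratic convergence.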

Putting it all together: Lagrange-Newton-Krylov-Schur Method

• Continuation loop
  o Optimization iteration (Lagrange-Newton)
    • Estimate extremal eigenvalues of (approximate) reduced Hessian using Lanczos (retreat to QN-RSQP if negative)
    • Inexact KKT solution via symmetric QMR (Krylov)
      – Quasi-Newton RSQP preconditioner (Schur)
        » 2-step stationary iterations + L-BFGS approximation of inverse reduced Hessian
        » PDE solve replaced by PDE preconditioner
    • Backtracking line search on augmented Lagrangian or l1 merit function
    • If no sufficient descent, use QN-RSQP
    • Compute derivatives, objective function, and residuals
    • Update solution, tighten tolerances

Application: Optimal Boundary Control of Viscous Flow

Goal: find boundary suction/injection that minimizes energy dissipation in a viscous incompressible flow

Fixed-size efficiency

Veltisto + PETSc implementation

[Figure: algorithmic, parallel, and overall efficiency (n = 120,000)]

Isogranular algorithmic efficiency

“Textbook” Newton mesh independence

Mesh independence of Krylov iterations with exact PDE solves (implies the reduced Hessian preconditioner is effective)

Moderate growth of Krylov iterations with approximate PDE solves (implies the PDE preconditioner is moderately effective)

4x cost of PDE solve

Veltisto: A Library for PDE-Constrained Optimization

• PETSc extension
• Object oriented
• Nonlinearly constrained optimization
• Targets steady PDE constraints
• Supports domain-based parallelism
• Matrix-free implementation
• In use at Sandia-Albuquerque

Veltisto Features

• Optimizers: LNKS, LNKS-IP, QN-RSQP, N-RSQP
• Preconditioners: LM-BFGS, SR1, Broyden, 2-step stationary
• Merit functions: augmented Lagrangian, L1, L1 + second order correction
• Symmetric QMR (allows indefinite preconditioning)

Veltisto Example

Conclusions

• Schur-complementing the optimality system permits capitalizing on investment in PDE solvers

• Lagrange-Newton-Krylov-Schur methods can exhibit mesh-independent outer iterations for mesh-based control spaces

• Inner iteration potentially mesh independent
  o Depends on effective PDE and reduced Hessian preconditioners

• Very good scalability
• >10x speedup over quasi-Newton
• Optimization in small multiple of simulation cost

Future Work

• Approximate block elimination preconditioners
  o Other reduced Hessian preconditioners

• Direct preconditioning of KKT system
  o DD preconditioners directly on KKT system
  o Multigrid on KKT system (Barry)
  o Multigrid preconditioner on block diagonal approximation of KKT system

• Reformulation of KKT system to make it SPD
  o FOSLS?
  o Augmented Lagrangian?

• Time dependence and inequalities remain a challenge

Acoustic inversion

First order optimality conditions

First order optimality conditions, cont.

Summary: first order optimality conditions (Karush-Kuhn-Tucker)

Newton solution of the KKT conditions

Schur complement method

• Schur complement is self-adjoint & positive definite near minimum
• Action of Schur complement on q requires 1 forward & 1 adjoint wave propagation

Gauss-Newton-Schur method

• GN Schur complement is self-adjoint & positive definite everywhere
• Quadratic convergence for good fit problems, linear otherwise
• Action of Schur complement on q requires 1 forward & 1 adjoint wave propagation
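The "1 forward & 1 adjoint" cost of a Schur complement product shows up on a much simpler stand-in. This minimal sketch assumes a hypothetical steady, linear model rather than the wave equation: the parameter enters as a source, $A u = q$, and the data are $u$ at a few receiver nodes. The GN Schur complement is then $H = J^T J$ with $J = O A^{-1}$. $H\hat q$ is formed matrix-free from one forward solve and one adjoint solve. The operator, the receivers, and the function name are assumptions.

```python
import math


def thomas(lower, diag, upper, rhs):
    """Tridiagonal solve without pivoting (lower[0] and upper[-1] unused)."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def gn_schur_matvec(q_hat, lower, diag, upper, receivers, counter):
    """Return H q_hat with H = J^T J, J = O A^{-1}, without forming H or J."""
    u_hat = thomas(lower, diag, upper, q_hat)         # forward solve: A u_hat = q_hat
    counter["solves"] += 1
    rhs = [0.0] * len(q_hat)
    for i in receivers:                               # O^T O u_hat: keep receiver values only
        rhs[i] = u_hat[i]
    lower_t = [0.0] + upper[:-1]                      # tridiagonal storage of A^T
    upper_t = lower[1:] + [0.0]
    counter["solves"] += 1
    return thomas(lower_t, diag, upper_t, rhs)        # adjoint solve: A^T lam = O^T O u_hat


n = 12
lower, diag, upper = [-1.3] * n, [2.0] * n, [-0.7] * n   # hypothetical nonsymmetric operator
receivers = [2, 5, 9]                                   # hypothetical receiver nodes
counter = {"solves": 0}
v = [math.cos(0.7 * i) for i in range(n)]
w = [1.0 / (1 + i) for i in range(n)]
Hv = gn_schur_matvec(v, lower, diag, upper, receivers, counter)
Hw = gn_schur_matvec(w, lower, diag, upper, receivers, counter)
print(counter["solves"])  # 4: one forward and one adjoint solve per product
```

Symmetry and positive semidefiniteness of $H$ follow from its $J^T J$ form. A Tikhonov term would add a positive definite part on top.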

Spectrum of GN Schur complement (just least squares term; no regularization)

[Figure: 400 eigenvalues and the first 12 eigenvalues, annotated “smooth eigenvectors” and “rough eigenvectors”]

(Gauss-)Newton-Schur-CG Solver

• Multiscale continuation over u, λ, q grids and source frequency
  o (Gauss-)Newton nonlinear iteration
    • Conjugate gradient solution of Schur-decomposed linear system
      – Currently no preconditioner
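Because only products with the (GN) Schur complement are available, CG is applied matrix-free. Below is a minimal unpreconditioned sketch. The test operator, a 1D Laplacian stencil plus a small Tikhonov-like shift, is a hypothetical stand-in for a regularized Schur complement.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, rtol=1e-10, max_iter=None):
    """Unpreconditioned CG for a symmetric positive definite operator given only as matvec.
    Stops when ||r|| <= rtol * ||b||; returns (x, iterations)."""
    n = len(b)
    max_iter = 10 * n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)                       # residual b - A x for x = 0
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    iters = 0
    while math.sqrt(rr) > rtol * b_norm and iters < max_iter:
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iters += 1
    return x, iters


def shifted_laplacian(v, shift=0.01):
    """Hypothetical SPD stand-in: tridiag(-1, 2, -1) + shift * I applied to v."""
    n = len(v)
    return [(2.0 + shift) * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


b = [1.0] * 20
x, iters = conjugate_gradient(shifted_laplacian, b)
```

Without a preconditioner, the iteration count is governed by the spectrum of the operator. This is why the inner iterations on these slides depend on the regularization and the mesh.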

2D/3D Acoustic Wave Equation Example

• Piecewise bi(tri)linear approximation of pressure u, adjoint λ, and square of wave speed q
• Explicit time integration
• Up to 512x512 and 64x64x64 grids, 2000 time steps
• 50 receivers, 1 harmonic source
• Homogeneous initial guess
• PETSc (www.petsc.anl.gov) implementation
• Performance (Tikhonov regularization)
  o 6 hours on 64 processors of T3E-900 at PSC
  o Number of inner conjugate gradient iterations: 10~20
  o Number of outer Newton iterations: 10~20
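To show what explicit time integration involves, here is a minimal sketch under simplifying assumptions: a 1D constant-coefficient model $u_{tt} = q\,u_{xx}$ on $[0,1]$ with $u = 0$ at both ends, and finite differences instead of the bilinear finite elements listed above. The function name and parameters are hypothetical. It shows the leapfrog update and the CFL limit that ties the time step to the grid spacing.

```python
import math


def leapfrog_wave_1d(q, n_cells, t_final, u0, cfl=0.5):
    """Explicit central-difference (leapfrog) integration of u_tt = q * u_xx on [0, 1],
    u = 0 at both ends, initial displacement u0(x), zero initial velocity.
    q is a constant squared wave speed. Returns (grid, u at t_final, number of steps)."""
    h = 1.0 / n_cells
    dt_max = cfl * h / math.sqrt(q)          # CFL condition: sqrt(q) * dt / h <= 1
    n_steps = math.ceil(t_final / dt_max)
    dt = t_final / n_steps                   # land exactly on t_final
    r2 = q * dt * dt / (h * h)
    x = [i * h for i in range(n_cells + 1)]
    u_prev = [u0(xi) for xi in x]
    u_prev[0] = u_prev[-1] = 0.0
    # First step from a Taylor expansion: u(dt) ~ u0 + dt^2/2 * q * u0_xx (zero initial velocity).
    u = [0.0] * (n_cells + 1)
    for i in range(1, n_cells):
        u[i] = u_prev[i] + 0.5 * r2 * (u_prev[i + 1] - 2.0 * u_prev[i] + u_prev[i - 1])
    for _ in range(n_steps - 1):
        u_next = [0.0] * (n_cells + 1)
        for i in range(1, n_cells):
            u_next[i] = 2.0 * u[i] - u_prev[i] + r2 * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        u_prev, u = u, u_next
    return x, u, n_steps


# Standing-wave check: for q = 4 (wave speed 2) the exact solution is sin(pi x) cos(2 pi t).
x, u, n_steps = leapfrog_wave_1d(4.0, 100, 0.3, lambda s: math.sin(math.pi * s))
exact = [math.sin(math.pi * xi) * math.cos(2.0 * math.pi * 0.3) for xi in x]
print(n_steps, max(abs(a - b) for a, b in zip(u, exact)))
```

Each step costs one stencil sweep. Halving the grid spacing also halves the stable time step, which is why fine grids need thousands of steps.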

Effect of different types of regularization

Total variation regularization, finer grids

Effect of different regularization weights (Tikhonov regularization)

Multilevel vs. single level continuation (Tikhonov regularization)

[Figure panels: multilevel solution; single level solution]

Effect of data noise on solution

3D Inversion

Performance: linear and nonlinear iterations

Performance: objective function value

Objective function iteration/grid history

Conclusions

• Multilevel continuation forces successive velocity estimates to remain within basin of attraction of global minimum

• Outer iterations are mesh-independent for Newton method

• Inner iterations weakly mesh-dependent for Tikhonov regularization (but scale poorly for TV regularization)

• Preconditioner in progress
  o Limited memory quasi-Newton for least squares term
  o Multigrid for TV term
  o Use Sherman-Morrison-Woodbury to combine

Computation of time integrals over [0, T] that involve both the adjoint λ and the state u

• Adjoint problem is a final value problem
• Forward problem is an initial value problem
• But the integral requires simultaneous availability of λ and u
• Options
  o Store everything: O(n) storage and O(n) work
  o Store nothing: O(1) storage, O(n^2) work
  o Direct sensitivities: O(1) storage, O(n^4) work
  o Metrics:
    • Work unit = work per time step
    • Storage unit = storage per time step
    • n = number of time steps (assume explicit method)
• Griewank’s checkpointing algorithm
  o Requires O(log n) storage, O(n log n) work

Checkpointing Algorithm for Time-dependent Adjoints

[Figure: progression of the algorithm and its binary tree structure. The time interval is recursively bisected into segments of n/2, n/4, n/8, ... steps: log n levels, (n/2) log n work, log n storage]
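The counts in the binary tree can be checked with a minimal sketch of the recursive bisection scheme. This is a simple variant, not Griewank's optimal binomial schedule. The function name and the toy integer "state" are hypothetical. Starting from a stored u_0, it yields the forward states in reverse order for the adjoint sweep. It counts recomputed time steps (the initial forward sweep is not counted) and the peak number of checkpoints held.

```python
def reverse_states(step, state, i, j, stats, live=1):
    """Yield u_{j-1}, u_{j-2}, ..., u_i (reverse time order) given u_i = state.

    Recursive bisection: advance from the checkpoint u_i to the midpoint u_m,
    keep u_m as a checkpoint, reverse the second half, drop u_m, then reverse
    the first half from u_i. `live` counts checkpoints currently held."""
    if j - i == 1:
        yield state
        return
    m = (i + j) // 2
    mid = state
    for _ in range(m - i):                    # recompute forward from the checkpoint
        mid = step(mid)
        stats["work"] += 1
    stats["max_live"] = max(stats["max_live"], live + 1)
    yield from reverse_states(step, mid, m, j, stats, live + 1)
    yield from reverse_states(step, state, i, m, stats, live)


# Toy "time step": the state is just the step index, so the output order is easy to check.
n = 8
stats = {"work": 0, "max_live": 1}
order = list(reverse_states(lambda k: k + 1, 0, 0, n, stats))
print(order)   # [7, 6, 5, 4, 3, 2, 1, 0]
print(stats)   # {'work': 12, 'max_live': 4}
```

For n a power of two, the recurrence W(n) = n/2 + 2 W(n/2) gives (n/2) log2 n recomputed steps. Storage is log2 n + 1 checkpoints, matching the figure's counts.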

Behavior of Least Squares Objective

[Figure: behavior of the least-squares objective for combinations of source frequency (low, medium, high) and frequency of perturbation of wave speed (low, medium, high); annotated “Multiple minima” and “Ill-conditioning”]

A Multiscale Algorithm

• Determine material frequency components separately

• Solve inversion problem using source frequency and grid resolution that are consistent with material frequency
  o Ill-conditioning: Regularize against material frequency changes greater than appropriate for current grid
  o Multiple minima: Rely on good initial guesses from coarser grids to remain in basins of attraction of global minima

• Refine source and material frequency components and grid resolution
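The multiple-minima problem and the continuation remedy can be reproduced on a far smaller problem. This minimal sketch assumes a hypothetical one-parameter model: recover a time shift τ from a sinusoid of angular frequency ω sampled at $N$ equispaced points over one period. It is not the wave-speed inversion itself. For this sampling the misfit is exactly $J(\tau) = N\,(1-\cos\omega(\tau-\tau^*))$, so the global basin narrows like $2\pi/\omega$. A high-frequency fit from a poor initial guess stalls in a neighboring minimum. Solving from low to high frequency, each answer seeding the next level, stays in the global basin. Grid refinement and regularization are not modeled.

```python
import math


def misfit_gradient(tau, omega, times, data):
    """dJ/dtau for J(tau) = sum_i (sin(omega * (t_i - tau)) - d_i)**2."""
    return sum(-2.0 * omega * (math.sin(omega * (t - tau)) - d) * math.cos(omega * (t - tau))
               for t, d in zip(times, data))


def fit_shift(tau0, omega, tau_true, times, iters=100):
    """Gradient descent on J, with synthetic noise-free data generated from tau_true."""
    data = [math.sin(omega * (t - tau_true)) for t in times]
    step = 1.0 / (len(times) * omega ** 2)   # 1 / (bound on J'') for this sampling
    tau = tau0
    for _ in range(iters):
        tau -= step * misfit_gradient(tau, omega, times, data)
    return tau


N = 64
times = [2.0 * math.pi * i / N for i in range(N)]   # equispaced samples over one period
tau_true, tau0 = 1.0, 0.0

direct = fit_shift(tau0, 8, tau_true, times)        # high frequency only
tau = tau0
for omega in (1, 2, 4, 8):                          # low-to-high frequency continuation
    tau = fit_shift(tau, omega, tau_true, times)

print(direct)   # trapped near tau_true - 2*pi/8, a neighboring local minimum
print(tau)      # recovers tau_true
```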
