Towards Optimal Solvers for PDE-Constrained Optimization
Volkan Akcelik, Omar Ghattas, Carnegie Mellon University
George Biros, New York University
TOPS Winter Meeting, Livermore, CA, January 25-26, 2002
Simulation vs. Optimization
• PDE model
• Simulation (forward) problem
  o Given “data” x (e.g. material coefficients, domain or boundary sources, geometry), find state variables u (velocity, stress, temperature, magnetic or electric field, displacement, etc.)
• Optimization (inverse) problem
  o Given desired goal involving u, find x
  o Optimal design, optimal control, or parameter estimation
PDE-Constrained Optimization
First-order Optimality Conditions
Examples
• Optimal design
  o Shape optimization of viscous flow system
  o N_d = O(1)
• Optimal control
  o Boundary flow control
  o N_d = O(N_u^{2/3})
• Parameter estimation
  o Heterogeneous inverse wave propagation
  o N_d = O(N_u)
Optimal Design: Artificial Heart Shape Optimization 1
Goal: Find pump geometry that minimizes blood damage (J. Antaki & G. Burgreen, Univ Pittsburgh Medical Center)
G. Burgreen
Optimal Design: Artificial Heart Shape Optimization 2
Burgreen & Antaki, 1996
Optimal Control: Boundary Flow Control 1
MHD flow control using boundary magnetic field
Optimal Control: Boundary Flow Control 2
Goal: find boundary suction/injection that minimizes energy dissipation in a viscous incompressible flow
Optimal Control: Viscous Boundary Flow Control 2
(G. Biros, 2000)
Optimal Control: Viscous Boundary Flow Control 3
[Figure panels: No control, Optimal control]
Parameter Estimation: Seismic Inversion 1
• Forward Problem: Given soil material and earthquake source parameters, find earthquake ground motion.
• Inverse Problem: Given earthquake observations, estimate material and source parameters.
• Part of Quake KDI Project at CMU (J. Bielak, D. O’Hallaron), Berkeley (J. Shewchuk), and SDSU (S. Day, H. Magistrale)
Parameter Estimation: Seismic Inversion 2
Parameter Estimation: Seismic Inversion 3
Simulation of 1994 Northridge Earthquake
Parameter Estimation: Seismic Inversion 4
Surface Visualization
Lagrange-Newton-Krylov-Schur method
• Consider discrete form of problem:
[Equation not recovered; its annotations label the states, the controls, the objective function, and the state equations (PDEs)]
Optimality Conditions
• Lagrangian function:
• First order optimality conditions:
• Define:
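The equations on these slides did not survive extraction. As a reading aid only, a generic equality-constrained form that matches the surviving labels (states, controls, objective function, state equations) is sketched here. The symbols f, c, u, z, and λ are assumed notation, not recovered from the slides.

```latex
% Discrete problem: u = states, z = controls (assumed symbols)
\min_{u,\,z} \; f(u, z) \quad \text{subject to} \quad c(u, z) = 0

% Lagrangian function with multipliers (adjoints) \lambda
\mathcal{L}(u, z, \lambda) = f(u, z) + \lambda^{T} c(u, z)

% First order optimality conditions
\partial_u \mathcal{L} = \partial_u f + (\partial_u c)^{T} \lambda = 0   % adjoint equation
\partial_z \mathcal{L} = \partial_z f + (\partial_z c)^{T} \lambda = 0   % control equation
\partial_\lambda \mathcal{L} = c(u, z) = 0                               % state equation
```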
Newton’s Method for Optimality Conditions (SQP, Newton-Lagrange)
• A Newton step: (KKT system)
• Two possibilities for solution:
  o Full space method: solve KKT system with a Krylov method; need good preconditioner
  o Reduced space method: eliminate state variables and Lagrange multipliers
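The two solution strategies can be compared on a toy problem. The sketch below assumes a quadratic objective and a linear "state equation" A u + B z = r, with every matrix and size invented for illustration. It checks that eliminating the states and multipliers (reduced space) gives the same Newton step as solving the full KKT system. A dense direct solve stands in for the Krylov method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2                                   # states, controls (toy sizes, assumed)
A = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))  # invertible stand-in for the PDE operator
B = rng.standard_normal((n, m))               # control-to-state coupling
r = rng.standard_normal(n)
d = rng.standard_normal(n)                    # target state
beta = 1e-2                                   # control penalty weight

# Objective 0.5*|u - d|^2 + 0.5*beta*|z|^2, constraint A u + B z - r = 0.
u, z, lam = np.zeros(n), np.zeros(m), np.zeros(n)
g_u = (u - d) + A.T @ lam                     # Lagrangian gradient w.r.t. states
g_z = beta * z + B.T @ lam                    # Lagrangian gradient w.r.t. controls
c = A @ u + B @ z - r                         # constraint residual
W_uu, W_zz = np.eye(n), beta * np.eye(m)      # Hessian blocks of the Lagrangian

# Full space: assemble and solve the KKT system in one shot.
K = np.block([[W_uu, np.zeros((n, m)), A.T],
              [np.zeros((m, n)), W_zz, B.T],
              [A, B, np.zeros((n, n))]])
step_full = np.linalg.solve(K, -np.concatenate([g_u, g_z, c]))

# Reduced space: eliminate states and multipliers.
S = -np.linalg.solve(A, B)                    # m solves with A, one per control
W_z = S.T @ W_uu @ S + W_zz                   # reduced Hessian
u_p = -np.linalg.solve(A, c)                  # state step that restores feasibility
dz = np.linalg.solve(W_z, -(g_z + S.T @ (g_u + W_uu @ u_p)))
du = u_p + S @ dz
dlam = -np.linalg.solve(A.T, g_u + W_uu @ du) # adjoint solve
step_reduced = np.concatenate([du, dz, dlam])

print(np.allclose(step_full, step_reduced))
```

Forming S column by column is what makes the exact reduced Hessian cost one solve with A per control variable.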
Quasi-Newton Reduced SQP
• LU factorization of (permuted) KKT matrix
• Problems:
  o Reduced Hessian W_z needs m (number of decision variables) PDE solves per optimization iteration
  o 2nd derivatives needed
• Quasi-Newton RSQP:
  o Replace reduced Hessian by quasi-Newton approximation
  o Discard other 2nd derivative terms
• Properties
  o Just 2 PDE solves required per optimization iteration
  o 2-step superlinear convergence
• Use Schur-based approximate LU factorization as KKT preconditioner
• Since it is only a preconditioner, PDE solves can be replaced by actions of a PDE preconditioner
• Properties:
  o No PDE solves per iteration (2 preconditioner applications required per inner Krylov iteration)
  o Newton convergence for outer iteration
• Inspiration:
  o Schur complement domain decomposition preconditioners (Keyes & Gropp, 1987)
Approximate QN-RSQP Preconditioner
Effectiveness of Schur Preconditioner
• Preconditioned KKT system (w/exact A)
• Schur preconditioner results in:
  o Reduced condition number
  o Clustering of eigenvalues
  o But not a normal matrix: convergence rate bounds depend on condition number of eigenvector matrix
  o In practice, works very well
How to Precondition Reduced Hessian?
• Requirements
  o Cannot compute reduced Hessian, only its (approximate) action on a vector
• Possibilities
  o Quasi-Newton formula (limited memory for large m)
  o Fixed small number of stationary iterations
  o Sparse approximate inverse preconditioner (SPAI)
  o Discrete approximation of underlying continuous W_z
  o Sherman-Morrison-Woodbury, if reduced Hessian is of form “compact operator + SPD”
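The Sherman-Morrison-Woodbury option can be illustrated with a small sketch. It assumes the compact part has been approximated by a low-rank term U Uᵀ and that a solver for the SPD part S is available. The name `smw_solve` and the test matrices are hypothetical.

```python
import numpy as np

def smw_solve(solve_spd, U, b):
    """Solve (S + U U^T) x = b using only solves with the SPD part S."""
    k = U.shape[1]
    Sb = solve_spd(b)                                          # S^{-1} b
    SU = np.column_stack([solve_spd(U[:, j]) for j in range(k)])  # S^{-1} U
    small = np.eye(k) + U.T @ SU                               # k-by-k capacitance matrix
    return Sb - SU @ np.linalg.solve(small, U.T @ Sb)

# Toy check: diagonal SPD part plus a rank-3 term (sizes are illustrative).
rng = np.random.default_rng(1)
s = rng.uniform(1.0, 2.0, size=40)
U = rng.standard_normal((40, 3))
b = rng.standard_normal(40)
x = smw_solve(lambda v: v / s, U, b)
print(np.allclose((np.diag(s) + U @ U.T) @ x, b))
```

Only k + 1 solves with S and one small k-by-k solve are needed, which is the appeal when k is much smaller than the number of decision variables.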
2-step Stationary Iterations as Reduced Hessian Preconditioner
• Properties
  o Same complexity as CG but a constant preconditioner
  o Needs only matvecs (here with approximate reduced Hessian)
  o Needs estimates of extremal eigenvalues
• Use to initialize L-BFGS preconditioner (effectively H_0)
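One classical two-step stationary scheme with these properties is the second-order Richardson (heavy-ball) iteration, sketched below. Whether this is the exact variant used in the talk is an assumption, and the function name and test operator are hypothetical. With a zero initial guess and a fixed step count the map from b to x is linear, so it can serve as a constant preconditioner.

```python
import numpy as np

def two_step_stationary(matvec, b, lam_min, lam_max, steps):
    """Approximate A^{-1} b with a fixed number of two-step stationary iterations.

    Needs only matvecs with the SPD operator A and estimates of its
    extremal eigenvalues lam_min and lam_max.
    """
    s_min, s_max = np.sqrt(lam_min), np.sqrt(lam_max)
    alpha = 4.0 / (s_max + s_min) ** 2                   # step length
    beta = ((s_max - s_min) / (s_max + s_min)) ** 2      # momentum weight
    x_prev = np.zeros_like(b)
    x = np.zeros_like(b)
    for _ in range(steps):
        x_next = x + alpha * (b - matvec(x)) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Toy SPD operator with spectrum in [1, 100] (illustrative only).
diag = np.linspace(1.0, 100.0, 50)
A_mv = lambda v: diag * v
b = np.ones(50)
x = two_step_stationary(A_mv, b, 1.0, 100.0, steps=200)
print(np.allclose(x, b / diag))
```

In a preconditioner one would use only a handful of steps; the 200 steps here are just to show that the iteration converges to the exact solve.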
Inexact Solution of KKT System
• Solve KKT system iteratively to a tolerance dependent on the Lagrangian gradient
• Eisenstat & Walker inexact Newton theory applies for reduction of norm of Lagrangian gradient
• But will it satisfy merit function criteria?
• Can show that in vicinity of local minimum (for augmented Lagrangian merit function):
  o If the tolerance is small enough, Armijo condition satisfied with unit steplengths
  o If the tolerance is O(Lagrangian gradient), quadratic convergence preserved
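The idea of tying the inner tolerance to the gradient norm can be sketched on an unconstrained toy problem. The gradient of a convex function stands in for the Lagrangian gradient, and plain CG stands in for the KKT Krylov solver. The test function, the forcing rule `eta = min(0.5, |g|)`, and all names are assumptions made for illustration.

```python
import numpy as np

def cg(matvec, rhs, rel_tol, max_iter=200):
    """Plain conjugate gradients, stopped at a relative residual tolerance."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rr = r @ r
    stop = rel_tol * np.sqrt(rr)
    for _ in range(max_iter):
        if np.sqrt(rr) <= stop:
            break
        Ap = matvec(p)
        a = rr / (p @ Ap)
        x += a * p
        r -= a * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def inexact_newton(A, b, tol=1e-10, max_outer=50):
    """Minimize 0.5 x^T A x + 0.25 sum(x^4) - b^T x with inexact Newton steps."""
    x = np.zeros_like(b)
    history = []
    for _ in range(max_outer):
        g = A @ x + x**3 - b                    # gradient
        gnorm = np.linalg.norm(g)
        history.append(gnorm)
        if gnorm < tol:
            break
        eta = min(0.5, gnorm)                   # forcing term of order |g|
        hess = lambda v, x=x: A @ v + 3.0 * x**2 * v
        x = x + cg(hess, -g, eta)               # unit steplength
    return x, history

n = 20
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, history = inexact_newton(A, b)
print(len(history), history[-1])
```

Early outer iterations need only a loose linear solve, while the tolerance tightens automatically as the gradient shrinks.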
Putting it all together: Lagrange-Newton-Krylov-Schur Method
• Continuation loop
  o Optimization iteration (Lagrange-Newton)
    • Estimate extremal eigenvalues of (approximate) reduced Hessian using Lanczos (retreat to QN-RSQP if negative)
    • Inexact KKT solution via symmetric QMR (Krylov)
      – Quasi-Newton RSQP preconditioner (Schur)
        » 2-step stationary iterations + L-BFGS approximation of inverse reduced Hessian
        » PDE solve replaced by PDE preconditioner
    • Backtracking line search on augmented Lagrangian or l1 merit function
    • If no sufficient descent use QN-RSQP
    • Compute derivatives, objective function, and residuals
    • Update solution, tighten tolerances
Application: Optimal Boundary Control of Viscous Flow
Goal: find boundary suction/injection that minimizes energy dissipation in a viscous incompressible flow
Fixed-size efficiency (n=120,000)
Veltisto + PETSc implementation
[Table not recovered; column headings: algorithmic, parallel, overall]
Isogranular algorithmic efficiency
“Textbook” Newton mesh independence
Mesh independence of Krylov iteration with exact PDE solves (implies reduced Hessian preconditioner is effective)
Moderate growth of Krylov iterations with approximate PDE solves (implies PDE preconditioner is moderately effective)
4x cost of PDE solve
Veltisto: A Library for PDE-Constrained Optimization
• PETSc extension
• Object oriented
• Nonlinearly constrained optimization
• Targets steady PDE constraints
• Supports domain-based parallelism
• Matrix-free implementation
• In use at Sandia-Albuquerque
Veltisto Features
• Optimizers: LNKS, LNKS-IP, QN-RSQP, N-RSQP
• Preconditioners: LM-BFGS, SR1, Broyden, 2-step stationary
• Merit functions: augmented Lagrangian, L1, L1+second order correction
• Symmetric QMR (allows indefinite preconditioning)
Veltisto Example
Veltisto Example (cont.)
Conclusions
• Schur-complementing the optimality system permits capitalizing on investment in PDE solvers
• Lagrange-Newton-Krylov-Schur methods can exhibit mesh-independent outer iterations for mesh-based control spaces
• Inner iteration potentially mesh independent
  o Depends on effective PDE and reduced Hessian preconditioners
• Very good scalability
• >10x speedup over quasi-Newton
• Optimization in small multiple of simulation cost
Future Work
• Approximate block elimination preconditioners
  o Other reduced Hessian preconditioners
• Direct preconditioning of KKT system
  o DD preconditioners directly on KKT system
  o Multigrid on KKT system (Barry)
  o Multigrid preconditioner on block diagonal approximation of KKT system
• Reformulation of KKT system to make it SPD
  o FOSLS?
  o Augmented Lagrangian?
• Time dependence and inequalities remain a challenge
Acoustic inversion
First order optimality conditions
First order optimality conditions, cont.
Summary: first order optimality conditions (Karush-Kuhn-Tucker)
Newton solution of the KKT conditions
Schur complement method
• Schur complement is self-adjoint & positive definite near minimum
• Action of Schur complement on q requires 1 forward & 1 adjoint wave propagation
Gauss-Newton-Schur method
• GN Schur complement is self-adjoint & positive definite everywhere
• Quadratic convergence for good fit problems, linear otherwise
• Action of Schur complement on q requires 1 forward & 1 adjoint wave propagation
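A matrix-free Gauss-Newton iteration of this flavor can be sketched on a toy least squares problem. Here the product J v plays the role of the forward solve and Jᵀ y the role of the adjoint solve, and CG is applied to the positive semidefinite operator JᵀJ + βI. The model F(q) = G exp(q), the names, and the sizes are all invented for illustration and have nothing to do with the wave equation itself.

```python
import numpy as np

def cg(matvec, rhs, rel_tol=1e-12, max_iter=100):
    """Plain conjugate gradients for an SPD operator given by its action."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rr = r @ r
    stop = rel_tol * np.sqrt(rr)
    for _ in range(max_iter):
        if np.sqrt(rr) <= stop:
            break
        Ap = matvec(p)
        a = rr / (p @ Ap)
        x += a * p
        r -= a * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def gauss_newton(G, d, q0, beta=0.0, tol=1e-10, max_outer=30):
    """Minimize 0.5*|G exp(q) - d|^2 + 0.5*beta*|q|^2 by Gauss-Newton-CG."""
    q = q0.copy()
    grad_norms = []
    for _ in range(max_outer):
        w = np.exp(q)
        misfit = G @ w - d
        jvp = lambda v, w=w: G @ (w * v)       # "forward" action J v
        jtvp = lambda y, w=w: w * (G.T @ y)    # "adjoint" action J^T y
        grad = jtvp(misfit) + beta * q
        grad_norms.append(np.linalg.norm(grad))
        if grad_norms[-1] < tol:
            break
        gn_hess = lambda v: jtvp(jvp(v)) + beta * v   # one forward + one adjoint per matvec
        q = q + cg(gn_hess, -grad)
    return q, grad_norms

rng = np.random.default_rng(2)
G = rng.standard_normal((30, 5))
q_true = 0.3 * rng.standard_normal(5)
d = G @ np.exp(q_true)                         # noise-free data: a "good fit" problem
q, grad_norms = gauss_newton(G, d, np.zeros(5))
print(np.allclose(q, q_true))
```

Because the data are noise-free, the dropped second-derivative term vanishes at the solution, which is the good-fit case where fast convergence is expected.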
Spectrum of GN Schur complement (just least squares term; no regularization)
[Figure panels: first 12 eigenvalues; 400 eigenvalues. Annotations: smooth eigenvectors, rough eigenvectors]
(Gauss)-Newton-Schur-CG Solver
• Multiscale continuation over u, adjoint, and q grids and source frequency
  o (Gauss)-Newton nonlinear iteration
    • Conjugate gradient solution of Schur-decomposed linear system
      – Currently no preconditioner
2D/3D Acoustic Wave Equation Example
• Piecewise bi(tri)linear approximation of pressure u, adjoint, and square of wave speed q
• Explicit time integration
• Up to 512x512 and 64x64x64 grids, 2000 time steps
• 50 receivers, 1 harmonic source
• Homogeneous initial guess
• PETSc (www.petsc.anl.gov) implementation
• Performance (Tikhonov regularization)
  o 6 hours on 64 processors of T3E-900 at PSC
  o Number of inner conjugate gradient iterations 10~20
  o Number of outer Newton iterations 10~20
Effect of different types of regularization
Total variation regularization, finer grids
Effect of different regularization weights (Tikhonov regularization)
Multilevel vs. single level continuation (Tikhonov regularization)
multilevel solution
single level solution
Effect of data noise on solution
3D Inversion
Performance: linear and nonlinear iterations
Performance, objective function value
Objective function iteration/grid history
Conclusions
• Multilevel continuation forces successive velocity estimates to remain within basin of attraction of global minimum
• Outer iterations are mesh-independent for Newton method
• Inner iterations weakly mesh-dependent for Tikhonov regularization (but scale poorly for TV regularization)
• Preconditioner in progress
  o Limited memory quasi-Newton for least squares term
  o Multigrid for TV term
  o Use Sherman-Morrison-Woodbury to combine
Computation of terms like the time integral over [0, T] of products of the adjoint and u
• Adjoint problem is a final value problem
• Forward problem is an initial value problem
• But integral requires simultaneous availability of the adjoint and u
• Options
  o Store everything: O(n) storage and O(n) work
  o Store nothing: O(1) storage, O(n^2) work
  o Direct sensitivities: O(1) storage, O(n^4) work
  o Metrics:
    • Work unit = work per time step
    • Storage unit = storage per time step
    • n = number of time steps (assume explicit method)
• Griewank’s checkpointing algorithm
  o Requires O(log n) storage, O(n log n) work
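The storage/work trade-off can be illustrated with a simple recursive-bisection checkpointing sketch. This is a simplified stand-in, not Griewank's optimal schedule: it recomputes forward states from the nearest checkpoint so that they become available in reverse time order alongside a backward-marching adjoint. The time stepper, the toy adjoint recurrence, and all names are hypothetical.

```python
def reverse_sweep(step, u0, n, visit):
    """Call visit(k, u_k) for k = n-1, ..., 0 using recursive bisection."""
    stats = {"forward_steps": 0, "live_states": 0, "peak_states": 0}

    def recurse(u_lo, lo, hi):
        # u_lo is the state at index lo; handle indices lo..hi-1 in reverse.
        # One state is counted per active recursion level (an upper bound).
        stats["live_states"] += 1
        stats["peak_states"] = max(stats["peak_states"], stats["live_states"])
        if hi - lo == 1:
            visit(lo, u_lo)
        else:
            mid = (lo + hi) // 2
            u_mid = u_lo
            for _ in range(mid - lo):          # recompute forward to the midpoint
                u_mid = step(u_mid)
                stats["forward_steps"] += 1
            recurse(u_mid, mid, hi)            # later half first (adjoint runs backward)
            recurse(u_lo, lo, mid)             # then the earlier half
        stats["live_states"] -= 1

    recurse(u0, 0, n)
    return stats

n = 1024
step = lambda u: 0.5 * u + 1.0                 # toy explicit time step (assumed)
acc = {"lam": 1.0, "integral": 0.0, "order": []}

def visit(k, u_k):
    acc["integral"] += acc["lam"] * u_k        # adjoint-state product at step k
    acc["lam"] *= 0.9                          # toy adjoint marching backward in time
    acc["order"].append(k)

stats = reverse_sweep(step, 0.0, n, visit)
print(stats["forward_steps"], stats["peak_states"])   # 5120 11, i.e. (n/2)*log2(n) and log2(n)+1
```

For n = 1024 this performs 5120 forward steps while holding at most 11 states, against 1023 steps and 1024 stored states for the store-everything option.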
Checkpointing Algorithm for Time-dependent Adjoints
[Figure: progression of the algorithm shown as a binary tree structure, with intervals of length n/2, n/4, n/8 at successive levels; log n levels, n/2 log n work, log n storage]
Behavior of Least Squares Objective
[Figure: grid of objective function plots; axes: source frequency (low, medium, high) and frequency of perturbation of wave speed (low, med, high); annotated regions: multiple minima, ill-conditioning]
A Multiscale Algorithm
• Determine material frequency components separately
• Solve inversion problem using source frequency and grid resolution that are consistent with material frequency
  o Ill-conditioning: Regularize against material frequency changes greater than appropriate for current grid
  o Multiple minima: Rely on good initial guesses from coarser grids to remain in basins of attraction of global minima
• Refine source and material frequency components and grid resolution