TRANSCRIPT
Scalable Solvers and Software for PDE Applications
David E. Keyes
Department of Applied Physics & Applied Mathematics
Columbia University
Institute for Scientific Computing Research
Lawrence Livermore National Laboratory
www.tops-scidac.org
IMI Lecture, 26 Jan 2004
Motivation
Solver performance is a major concern for parallel simulations based on PDE formulations … including many of those of the U.S. DOE Scientific Discovery through Advanced Computing (SciDAC) program.
For target applications, implicit solvers may require 50% to 95% of execution time … at least, before “expert” overhaul for algorithmic optimality and implementation performance.
Even after a “best manual practice” overhaul, the solver may still require 20% to 50% of execution time.
The solver may hit both the processor scalability limit and the memory bandwidth limitation of a PDE-based application before any other part of the code … the first of these is not fundamental, but the second is.
Presentation plan
Overview of the SciDAC initiative
Brief review of scalable implicit methods (domain-decomposed multilevel iterative methods): algorithms; software components (PETSc, Hypre, etc.)
Overview of the Terascale Optimal PDE Simulations (TOPS) project
Three “war stories” from the SciDAC magnetically confined fusion energy portfolio
Some advanced research directions: physics-based preconditioning; nonlinear Schwarz
On the horizon
SciDAC apps and infrastructure
4 projects in high energy and nuclear physics
5 projects in fusion energy science
14 projects in biological and environmental research
10 projects in basic energy sciences
18 projects in scientific software and network infrastructure
“Enabling technologies” groups to develop reusable software and partner with application groups
From the 2001 start-up, 51 projects share $57M/year: approximately one-third for applications, a third for “integrated software infrastructure centers,” and a third for grid infrastructure and collaboratories
Plus, multi-Tflop/s IBM SP machines at NERSC and ORNL are available to SciDAC researchers
Unclassified resources for DOE science
“Cheetah” (Oak Ridge): IBM Power4 Regatta, 32 procs per node, 864 procs total, 4.5 Tflop/s
“Seaborg” (Berkeley): IBM Power3+ SMP, 16 procs per node, 6656 procs total, 10 Tflop/s
Designing a simulation code (from 2001 SciDAC report)
[Figure: design cycle with a V&V loop and a performance loop]
A “perfect storm” for simulation
[Figure: applications resting on hardware infrastructure and architectures, at the confluence of scientific models, numerical algorithms, computer architecture, and scientific software engineering; symbolic dates 1686, 1947, 1976]
“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE
Imperative: multiple-scale applications
Multiple spatial scales: interfaces, fronts, and layers thin relative to domain size, $\delta \ll L$
Multiple temporal scales: fast waves with small transit times relative to convection or diffusion, $\tau \ll T$
Analyst must isolate the dynamics of interest and model the rest in a system that can be discretized over a more modest range of scales
May lead to an infinitely “stiff” subsystem requiring special treatment by the solution method
[Figure: Richtmyer-Meshkov instability, c/o A. Mirin, LLNL]
Examples: multiple-scale applications
Biopolymers, nanotechnology: 10^12 range in time, from 10^-15 s (quantum fluctuation) to 10^-3 s (molecular folding time); the typical computational model ignores the smallest scales and works on classical dynamics only, but scientists increasingly want both
Galaxy formation: 10^20 range in space, from binary star interactions to the diameter of the universe; heroic computational models handle all scales with localized adaptive meshing
Supernova simulation: massive ranges in time and space scales for radiation, turbulent convection, diffusion, chemical reaction, and nuclear reaction
[Figure: supernova simulation, c/o A. Mezzacappa, ORNL]
SciDAC portfolio characteristics
Multiple temporal scales
Multiple spatial scales
Linear ill-conditioning
Complex geometry and severe anisotropy
Coupled physics, with essential nonlinearities
Ambition for uncertainty quantification, parameter estimation, and design
Need a toolkit of portable, extensible, tunable implicit solvers, not “one size fits all”
TOPS starting point codes
PETSc (ANL), Hypre (LLNL), Sundials (LLNL), SuperLU (LBNL), PARPACK (LBNL*), TAO (ANL), Veltisto (CMU)
Many interoperability connections between these packages predated SciDAC
Many application collaborators predated SciDAC
TOPS participants
TOPS labs (3): ANL, LBNL, LLNL
TOPS universities (7): CMU, CU, CU-B, NYU, ODU, UC-B, UT-K
In the old days, see the “Templates” guides:
www.netlib.org/templates (124 pp.)
www.netlib.org/etemplates (410 pp.)
… these are good starts, but not adequate for SciDAC scales!
“Integrated software infrastructure centers”
34 applications groups
7 ISIC groups (4 CS, 3 Math)
10 grid, data collaboratory groups
adaptive gridding, discretization
solvers (TOPS): $f(\dot{x}, x, t, p) = 0$; $F(x, p) = 0$; $Ax = b$; $Ax = \lambda Bx$; $\min_u \phi(x, u)$ s.t. $F(x, u) = 0$
systems software, component architecture, performance engineering, data management
software integration
performance optimization
Keyword: “Optimal”
Convergence rate nearly independent of discretization parameters: multilevel schemes for rapid linear convergence of linear problems; Newton-like schemes for quadratic convergence of nonlinear problems
Convergence rate as independent as possible of physical parameters: continuation schemes; physics-based preconditioning
Optimal convergence plus a scalable loop body yields a scalable solver
[Figure: time to solution vs. problem size (increasing with number of processors, 1 to 1000): the unscalable curve grows with problem size, the scalable curve stays nearly flat]
But where to go past O(N)?
Since O(N) is already optimal, there is nowhere further “upward” to go in efficiency; one must extend optimality “outward,” to more general problems
Hence, for instance, algebraic multigrid (AMG), which seeks to obtain O(N) in indefinite, anisotropic, or inhomogeneous problems on irregular grids
AMG framework (in $\mathbb{R}^n$): error easily damped by pointwise relaxation vs. algebraically smooth error; choose coarse grids, transfer operators, and smoothers to eliminate these “bad” (algebraically smooth) components within a smaller-dimensional space, and recur
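To make the recursion concrete, here is the standard two-grid error-propagation identity (standard multigrid notation, not from the slide itself: smoother $I - M^{-1}A$, restriction $R$, prolongation $P$, Galerkin coarse operator $A_c = RAP$):

% Standard two-grid cycle: pre-smooth, coarse-grid correction, post-smooth
e_{\mathrm{new}} = (I - M^{-1}A)\,\bigl(I - P A_c^{-1} R A\bigr)\,(I - M^{-1}A)\,e_{\mathrm{old}},
\qquad A_c = R A P
% AMG chooses R, P, and the smoother so that the middle factor removes
% exactly the "algebraically smooth" error the smoother leaves behind.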
Toolchain for PDE solvers in the TOPS project
Design and implementation of “solvers”:
Time integrators (w/ sens. anal.): $f(\dot{x}, x, t, p) = 0$
Nonlinear solvers (w/ sens. anal.): $F(x, p) = 0$
Constrained optimizers: $\min_u \phi(x, u)$ s.t. $F(x, u) = 0$, $u \geq 0$
Linear solvers: $Ax = b$
Eigensolvers: $Ax = \lambda Bx$
Software integration
Performance optimization
[Diagram: dependences among Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, and Linear solver; arrows indicate dependence]
Dominant data structures are grid-based: finite differences, finite elements, finite volumes
All lead to problems with sparse Jacobian matrices; many tasks can leverage an efficient set of tools for manipulating distributed sparse data structures (a small illustration follows)
[Figure: sparse Jacobian J; row i corresponds to grid node i]
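As a sketch of this grid-induced sparsity (SciPy for illustration, not TOPS code; the grid size n is arbitrary), the 5-point finite-difference Laplacian on an n-by-n grid yields one Jacobian row per grid node, with at most five nonzeros per row:

import scipy.sparse as sp

n = 8                                                 # the grid is n x n, one unknown per node
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))   # 1D [-1, 2, -1] stencil
I = sp.identity(n)
# 5-point 2D Laplacian via a Kronecker sum: row i of J couples grid node i
# to its four neighbors, mirroring the grid connectivity
J = sp.kron(I, T) + sp.kron(T, I)
print(J.shape, J.nnz)                                 # (64, 64), at most 5 nonzeros per row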
Newton-Krylov-Schwarz: a PDE applications “workhorse”
Newton: nonlinear solver, asymptotically quadratic
$F(u) \approx F(u_c) + F'(u_c)\,\delta u = 0$, $\quad u = u_c + \delta u$
Krylov: accelerator, spectrally adaptive
$J\,\delta u = -F$, $\quad \delta u = \mathrm{argmin}_{x \in V}\,\|Jx + F\|$, $\quad V = \mathrm{span}\{F, JF, J^2F, \dots\}$
Schwarz: preconditioner, parallelizable
$M^{-1}J\,\delta u = -M^{-1}F$, $\quad M^{-1} = \sum_i R_i^T (R_i J R_i^T)^{-1} R_i$
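The Schwarz formula above is easy to prototype. A minimal dense-matrix sketch (the index sets and the toy tridiagonal Jacobian are invented for the example):

import numpy as np

def additive_schwarz_apply(J, subdomains, r):
    # Apply M^{-1} r = sum_i R_i^T (R_i J R_i^T)^{-1} R_i r
    z = np.zeros_like(r)
    for idx in subdomains:
        Ji = J[np.ix_(idx, idx)]                  # subdomain block R_i J R_i^T
        z[idx] += np.linalg.solve(Ji, r[idx])     # local solve, then prolong and add
    return z

# Toy usage: 1D Laplacian with two overlapping subdomains
n = 10
J = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
subdomains = [np.arange(0, 6), np.arange(4, 10)]  # overlap on indices 4, 5
r = np.ones(n)
print(additive_schwarz_apply(J, subdomains, r))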
SPMD parallelism w/ domain decomposition
Partitioning of the grid induces block structure on the Jacobian
[Figure: a grid partitioned among procs 1, 2, 3; Jacobian blocks A21, A22, A23 form the rows assigned to proc “2”]
Time-implicit Newton-Krylov-Schwarz
For accommodation of unsteady problems, and for nonlinear robustness in steady ones, the NKS iteration is wrapped in a pseudo-time-stepping loop:

for (l = 0; l < n_time; l++) {              // pseudo-time loop
  select time step
  for (k = 0; k < n_Newton; k++) {          // nonlinear (Newton) loop
    compute nonlinear residual and Jacobian
    for (j = 0; j < n_Krylov; j++) {        // linear (Krylov/NKS) loop
      forall (i = 0; i < n_Precon; i++) {
        solve subdomain problems concurrently
      } // end of loop over subdomains
      perform Jacobian-vector product
      enforce Krylov basis conditions
      update optimal coefficients
      check linear convergence
    } // end of linear solver
    perform DAXPY update
    check nonlinear convergence
  } // end of nonlinear loop
} // end of time-step loop
(N)KS kernel in parallel
[Figure: per-iteration timeline on processors P1 … Pn: local scatter, Jacobian-vector multiply, preconditioner sweep, DAXPY, inner product]
The bulk synchronous model leads to easy scalability analyses and projections: each phase can be considered separately. What happens if, for instance, in this (schematized) iteration, arithmetic speed is doubled, the scalar all-gather is quartered, and the local scatter is cut by one-third?
Estimating scalability of stencil computations
Given complexity estimates of the leading terms of: the concurrent computation (per iteration phase), the concurrent communication, and the synchronization frequency
And a bulk synchronous model of the architecture, including: internode communication (network topology and protocol, reflecting horizontal memory structure) and on-node computation (effective performance parameters, including vertical memory structure)
One can estimate optimal concurrency and optimal execution time, on a per-iteration basis or overall (by taking into account any granularity-dependent convergence rate), based on problem size N and concurrency P: simply differentiate the time estimate in terms of (N, P) with respect to P, equate to zero, and solve for P in terms of N (a worked instance follows)
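A worked instance of that recipe, under an illustrative bulk-synchronous cost model (the coefficients a, b, c and the 3D-stencil scalings are assumptions for the example, not measurements):

% Illustrative cost model: flops scale as N/P, nearest-neighbor halo exchange
% as the subdomain surface (N/P)^{2/3}, tree-based all-reduce as log P
T(N, P) = a\,\frac{N}{P} + b\left(\frac{N}{P}\right)^{2/3} + c\,\log_2 P
% Differentiating in P and keeping the dominant flop and reduction terms:
\frac{\partial T}{\partial P} \approx -a\,\frac{N}{P^2} + \frac{c}{P \ln 2} = 0
\quad\Longrightarrow\quad P_{\mathrm{opt}} \approx \frac{a \ln 2}{c}\,N
% Optimal concurrency grows linearly with N: the tree-based case on the next slide.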
Scalability results for DD stencil computations
With tree-based (logarithmic) global reductions and scalable nearest-neighbor hardware: the optimal number of processors scales linearly with problem size
With 3D torus-based global reductions and scalable nearest-neighbor hardware: the optimal number of processors scales as the three-fourths power of problem size (almost “scalable”)
With a common network bus (heavy contention): the optimal number of processors scales as the one-fourth power of problem size (not “scalable”); bad news for conventional Beowulf clusters, but see the 2000 Bell Prize “price-performance awards” for multiple NICs
NKS efficiently implemented in PETSc’s MPI-based distributed data structures
[Figure: user code supplies application initialization, function evaluation, Jacobian evaluation, and post-processing; the PETSc side layers Timestepping Solvers (TS) over Nonlinear Solvers (SNES) over Linear Solvers (SLES), built on KSP and PC]
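A minimal sketch of this user-code/PETSc division of labor, using the modern petsc4py bindings (which postdate this 2004 talk); the pointwise test problem is invented, and a working PETSc + petsc4py installation is assumed:

from petsc4py import PETSc

n = 100

def form_function(snes, x, f):
    # User-supplied nonlinear residual F(x) = x^3 + 2x - 1, componentwise
    xx = x.getArray(readonly=True)
    f.getArray()[:] = xx**3 + 2.0*xx - 1.0

x = PETSc.Vec().createSeq(n)            # solution vector (serial, for clarity)
r = x.duplicate()                       # residual work vector
snes = PETSc.SNES().create()
snes.setFunction(form_function, r)
snes.setUseMF(True)                     # Jacobian-vector products matrix-free, by FD
snes.getKSP().getPC().setType('none')   # no preconditioner for this toy problem
snes.setFromOptions()                   # honor -snes_monitor, -ksp_type, ...
snes.solve(None, x)
print(x.getArray()[:3])                 # root of x^3 + 2x - 1 = 0 is ~0.4534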
User code / PETSc library interactions
[Figure: the same call structure as above; the user-supplied function and Jacobian evaluation routines can be AD-generated code]
1999 Bell Prize for unstructured grid computational aerodynamics
Implemented in PETSc: www.mcs.anl.gov/petsc
[Figure: transonic “lambda” shock, Mach contours on surfaces; mesh c/o D. Mavriplis, ICASE]
Fixed-size parallel scaling results
Four orders of magnitude in 13 years
128 nodes, 43 min → 3072 nodes, 2.5 min, 226 Gflop/s; 11M unknowns, 70% efficient
c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes and B. Smith
Three “war stories” from magnetic fusion energy applications in SciDAC
Physical models based on fluid-like magnetohydrodynamics (MHD):
$\partial B/\partial t = -\nabla\times E$, $\quad \nabla\cdot B = 0$
$E + V\times B = \eta J$, $\quad \mu_0 J = \nabla\times B$
$\partial n/\partial t + \nabla\cdot(nV) = \nabla\cdot(D\nabla n)$
$\rho\,(\partial V/\partial t + V\cdot\nabla V) = J\times B - \nabla p + \nabla\cdot(\rho\nu\nabla V)$
$\partial(nT)/\partial t + \nabla\cdot(nTV) = \nabla\cdot\bigl(n[\chi_\parallel \hat{b}\hat{b} + \chi_\perp(I - \hat{b}\hat{b})]\cdot\nabla T\bigr) + Q$
Challenges in magnetic fusion
Conditions of interest possess two properties that pose great challenges to numerical approaches: anisotropy and stiffness.
Anisotropy produces subtle balances of large forces, and vastly different parallel and perpendicular transport properties.
Stiffness reflects the vast range of time scales in the system: the targeted physics is slow (~transport scale) compared to the waves.
Tokamak/stellarator simulations
Center for Extended MHD Modeling (based at Princeton Plasma Physics Lab): M3D code
Realistic toroidal geometry, unstructured mesh, hybrid FE/FD discretization
Fields expanded in scalar potentials and streamfunctions
Operator-split, linearized, w/ 11 potential solves in each poloidal cross-plane per step (90% of execution time)
Parallelized w/ PETSc (Tang et al., SIAM PP01; Chen et al., SIAM AN02; Jardin et al., SIAM CSE03)
Want from TOPS:
Now: scalable linear implicit solver for much higher resolution (and for AMR)
Later: fully nonlinearly implicit solvers and coupling to other codes
Provided new solvers across existing interfaces
Hypre in PETSc: codes with a PETSc interface (like CEMM’s M3D) can now invoke Hypre routines as solvers or preconditioners with a command-line switch (see the sketch below)
SuperLU_DIST in PETSc: as above, with SuperLU_DIST
Hypre in the AMR Chombo code: so far, Hypre is a level-solver only; its AMG will ultimately be useful as a bottom-solver, since it can be coarsened indefinitely without attention to loss of nested geometric structure; FAC is also being developed for AMR uses, like Chombo
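For instance, a hypothetical M3D-style run might switch between solver stacks purely from the command line (option names follow later PETSc releases; the executable name is a stand-in):

mpiexec -n 64 ./m3d -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg
mpiexec -n 64 ./m3d -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type superlu_dist

The same binary, two different solver stacks, no recompilation.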
Hypre: multilevel preconditioning
[Figure: a multigrid V-cycle: a smoother is applied on the finest grid; restriction transfers the residual from the fine grid to the first coarse grid (fewer cells, less work and storage); the idea is applied recursively until the problem is easy to solve; prolongation transfers corrections from coarse back to fine grids]
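A minimal 1D geometric V-cycle sketch of this idea (weighted-Jacobi smoothing, full-weighting restriction, linear prolongation; illustrative NumPy, not Hypre code):

import numpy as np

def apply_A(u, h):
    # Matrix-free 1D Laplacian (-u'') with zero Dirichlet boundaries
    up = np.pad(u, 1)
    return (2*up[1:-1] - up[:-2] - up[2:]) / h**2

def smooth(u, f, h, sweeps=2, omega=2.0/3.0):
    # Weighted-Jacobi relaxation (the "smoother")
    for _ in range(sweeps):
        u = u + omega * (h**2 / 2) * (f - apply_A(u, h))
    return u

def v_cycle(u, f, h):
    # One V-cycle; assumes len(u) = 2^k - 1 interior points
    if len(u) == 1:
        return f * h**2 / 2                       # coarsest grid: exact solve
    u = smooth(u, f, h)                           # pre-smooth
    r = f - apply_A(u, h)                         # residual
    rc = (r[0:-2:2] + 2*r[1:-1:2] + r[2::2]) / 4  # restriction (full weighting)
    ec = v_cycle(np.zeros(len(rc)), rc, 2*h)      # recur on the coarse grid
    ecp = np.pad(ec, 1)
    e = np.empty_like(u)
    e[1::2] = ec                                  # prolongation: inject at coarse
    e[0::2] = (ecp[:-1] + ecp[1:]) / 2            # points, interpolate between
    return smooth(u + e, f, h)                    # post-smooth

n = 2**7 - 1
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.pi**2 * np.sin(np.pi * x)                  # exact solution: sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print(np.abs(u - np.sin(np.pi * x)).max())        # down to discretization error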
Hypre’s AMG in M3D
The PETSc-based PPPL code M3D has been retrofit with Hypre’s algebraic MG solver of Ruge-Stüben type
Iteration count results below are averaged over 19 different PETSc SLESSolve calls in the initialization and one timestep loop of this operator-split unsteady code; the abscissa is the number of procs in a scaled problem, with problem size ranging from 12K to 303K unknowns (approx. 4K per processor)
[Figure: average iteration counts, 0 to 700, vs. procs (3, 12, 27, 48, 75) for ASM-GMRES and AMG-FGMRES]
Hypre’s AMG in M3D
Scaled speedup timing results below are summed over 19 different PETSc SLESSolve calls in the initialization and one timestep loop of this operator-split unsteady code
The majority of the AMG cost is coarse-grid formation (preprocessing), which does not scale as well as the inner-loop V-cycle phase; in production, these coarse hierarchies will be saved for reuse (the same linear systems are solved in each timestep loop), making AMG much less expensive and more scalable
[Figure: execution time, 0 to 60, vs. procs (3, 12, 27, 48, 75) for ASM-GMRES, AMG-FGMRES, and AMG inner (est.)]
Hypre’s “Conceptual Interfaces”
[Figure: linear system interfaces map data layouts (structured, composite, block-structured, unstructured, CSR) to linear solvers (GMG, FAC, Hybrid, AMGe, ILU, ...); slide c/o E. Chow, LLNL]
SuperLU in NIMROD
NIMROD is another MHD code in the CEMM collaboration; it employs high-order elements on unstructured grids
Very poor convergence with the default Krylov solver on 2D poloidal crossplane linear solves
TOPS wired in SuperLU, just to try a sparse direct solver
Speedup of more than 10 in serial, and about 8 on a modest parallel cluster (24 procs)
PI Dalton Schnack (General Atomics) thought he had entered a time machine
SuperLU is not a “final answer,” but a sanity check; parallel ILU under Krylov should be superior
2D Hall MHD sawtooth instability (PETSc examples /snes/ex29.c and /sles/ex31.c)
Model equations: (Porcelli et al., 1993, 1999)
[Figures, c/o A. Bhattacharjee, CMRS: equilibrium; vorticity at an early time; vorticity at a later time, with zoom]
PETSc’s DMMG in the Hall MR application
[Figure: implicit code (snes/ex29.c) versus explicit code (sles/ex31.c), both with second-order integration in time; and the implicit code with first- versus second-order integration in time]
Abstract Gantt chart for TOPS
[Figure: rows for Algorithmic Development (e.g., ASPIN), Research Implementations (e.g., TOPSLib), Hardened Codes (e.g., PETSc), Applications Integration, and Dissemination, plotted against time]
Each color module represents an algorithmic research idea on its way to becoming part of a supported community software tool. At any moment (vertical time slice), TOPS has work underway at multiple levels. While some codes are in applications already, they are being improved in functionality and performance as part of the TOPS research agenda.
Jacobian-free Newton-Krylov
In the Jacobian-free Newton-Krylov (JFNK) method, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products
These are approximated by the Fréchet derivative
$Jv \approx \frac{1}{\epsilon}\,[F(u + \epsilon v) - F(u)]$
(where $\epsilon$ is chosen with a fine balance between approximation error and floating-point rounding error) or by automatic differentiation, so that the actual Jacobian elements are never explicitly needed
One builds the Krylov space on a true $F'(u)$ (to within numerical approximation)
Philosophy of Jacobian-free NK
To evaluate the linear residual, we use the true $F'(u)$, giving a true Newton step and asymptotic quadratic Newton convergence
To precondition the linear residual, we do anything convenient that uses understanding of the dominant physics/mathematics in the system and respects the limitations of the parallel computer architecture and the cost of various operations:
Jacobian of a lower-order discretization; Jacobian with “lagged” values for expensive terms; Jacobian stored in lower precision; Jacobian blocks decomposed for parallelism; Jacobian of a related discretization; operator-split Jacobians; physics-based preconditioning
Recall the idea of preconditioning
Krylov iteration is expensive in memory and in function evaluations, so the subspace dimension $k$ must be kept small in practice, through preconditioning the Jacobian with an approximate inverse, so that the product matrix has a low condition number, as in
$(B^{-1}A)\,x = B^{-1}b$
Given the ability to apply the action of $B^{-1}$ to a vector, preconditioning can be done on either the left, as above, or the right, as in, e.g., for the matrix-free case:
$JB^{-1}v \approx \frac{1}{\epsilon}\,[F(u + \epsilon B^{-1}v) - F(u)]$
Physics-based preconditioning
In Newton iteration, one seeks to obtain a correction (“delta”) to the solution, by inverting the Jacobian matrix on (the negative of) the nonlinear residual:
$\delta u^k = -[J(u^k)]^{-1} F(u^k)$
A typical operator-split code also derives a “delta” to the solution, by some implicitly defined means, through a series of implicit and explicit substeps
This implicitly defined mapping $F(u^k) \mapsto \delta u^k$ is a natural preconditioner
Software must accommodate this!
Physics-based preconditioning
We consider a standard “dynamical core,” the shallow-water wave splitting algorithm, as a solver
It leaves a first-order-in-time splitting error
In the Jacobian-free Newton-Krylov framework, this solver, which maps a residual into a correction, can instead be regarded as a preconditioner
The true Jacobian is never formed, yet the time-implicit nonlinear residual at each time step can be made as small as needed for nonlinear consistency in long time integrations
Example: shallow water equations
Continuity (*): $\partial\phi/\partial t + \partial(u\phi)/\partial x = 0$
Momentum (**): $\partial(u\phi)/\partial t + \partial(u^2\phi)/\partial x + g\,\partial(\phi^2/2)/\partial x = 0$
These equations admit a fast gravity wave, as can be seen by cross-differentiating, e.g., (*) by t and (**) by x, and subtracting:
$\partial^2\phi/\partial t^2 - g\phi\,\partial^2\phi/\partial x^2 = \text{(other terms)}$
1D shallow water equations, cont.
Wave equation for geopotential: $\partial^2\phi/\partial t^2 - g\phi\,\partial^2\phi/\partial x^2 = \text{(other terms)}$
Gravity wave speed: $\sqrt{g\phi}$
Typically $u \ll \sqrt{g\phi}$, but stability restrictions would require timesteps based on the Courant-Friedrichs-Lewy (CFL) criterion for the fastest wave, for an explicit method
One can solve fully implicitly, or one can filter out the gravity wave by solving semi-implicitly
1D shallow water equations, cont.
Continuity (*): $(\phi^{n+1} - \phi^n)/\Delta t + \partial(u\phi)^{n+1}/\partial x = 0$
Momentum (**): $((u\phi)^{n+1} - (u\phi)^n)/\Delta t + \partial(u^2\phi)^n/\partial x + g\,\phi^n\,\partial\phi^{n+1}/\partial x = 0$
Solving (**) for $(u\phi)^{n+1}$ and substituting into (*),
$\phi^{n+1} - (\Delta t)^2\,\frac{\partial}{\partial x}\Bigl(g\phi^n\,\frac{\partial\phi^{n+1}}{\partial x}\Bigr) = \phi^n - \Delta t\,\frac{\partial S^n}{\partial x}$, where $S^n = (u\phi)^n - \Delta t\,\partial(u^2\phi)^n/\partial x$
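A NumPy sketch of one such semi-implicit step (periodic grid, centered differences; the initial fields and step size are illustrative):

import numpy as np

N, L, g = 64, 1.0, 9.81
dx = L / N
dt = 1e-2                 # ~2x the explicit gravity-wave CFL limit dx/sqrt(g*phi)
x = np.linspace(0.0, L, N, endpoint=False)

# Periodic centered-difference first-derivative matrix, D ~ d/dx
D = (np.roll(np.eye(N), 1, axis=1) - np.roll(np.eye(N), -1, axis=1)) / (2*dx)

phi = 1.0 + 0.1*np.sin(2*np.pi*x/L)   # geopotential (illustrative initial state)
uphi = np.zeros(N)                    # momentum u*phi, initially at rest

def semi_implicit_step(phi, uphi):
    u = uphi / phi
    S = uphi - dt * (D @ (u*uphi))            # S^n = (u phi)^n - dt d(u^2 phi)^n/dx
    A = np.eye(N) - dt**2 * (D @ (g*phi[:, None] * D))  # I - dt^2 d/dx(g phi^n d/dx)
    phi_new = np.linalg.solve(A, phi - dt*(D @ S))      # scalar parabolic solve
    uphi_new = S - dt * g * phi * (D @ phi_new)         # scalar explicit update
    return phi_new, uphi_new

for _ in range(50):
    phi, uphi = semi_implicit_step(phi, uphi)
print(phi.mean())         # mean geopotential (mass) is preserved by the scheme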
1D shallow water equations, cont.
After the parabolic equation is spatially discretized and solved for $\phi^{n+1}$, then $(u\phi)^{n+1}$ can be found from
$(u\phi)^{n+1} = S^n - \Delta t\,g\phi^n\,\partial\phi^{n+1}/\partial x$
One scalar parabolic solve and one scalar explicit update replace an implicit hyperbolic system
This semi-implicit operator splitting is foundational to multiple-scales problems in geophysical modeling
Similar tricks are employed in aerodynamics (sound waves), MHD (multiple Alfvén waves), reacting flows (fast kinetics), etc.
Temporal truncation error remains due to the lagging of the advection terms $\phi^n$ and $(u\phi)^n$ in (**): to be dealt with shortly
1D shallow water preconditioning
Define a continuity residual for each timestep:
$R_\phi \equiv (\phi^{n+1} - \phi^n)/\Delta t + \partial(u\phi)^{n+1}/\partial x$
Define a momentum residual for each timestep:
$R_{u\phi} \equiv ((u\phi)^{n+1} - (u\phi)^n)/\Delta t + \partial(u^2\phi)^n/\partial x + g\,\phi^n\,\partial\phi^{n+1}/\partial x$
Continuity delta form (*): $\delta\phi/\Delta t + \partial(\delta(u\phi))/\partial x = -R_\phi$
Momentum delta form (**): $\delta(u\phi)/\Delta t + g\,\phi^n\,\partial(\delta\phi)/\partial x = -R_{u\phi}$
1D shallow water preconditioning, cont.
Solving (**) for $\delta(u\phi)$ and substituting into (*),
$\delta\phi - (\Delta t)^2\,\frac{\partial}{\partial x}\Bigl(g\phi^n\,\frac{\partial(\delta\phi)}{\partial x}\Bigr) = -\Delta t\,R_\phi + (\Delta t)^2\,\frac{\partial R_{u\phi}}{\partial x}$
After this parabolic equation is solved for $\delta\phi$, we have
$\delta(u\phi) = -\Delta t\,\bigl[g\phi^n\,\partial(\delta\phi)/\partial x + R_{u\phi}\bigr]$
This completes the application of the preconditioner for one Newton-Krylov iteration at one timestep
Of course, the parabolic solve need not be done exactly; one sweep of multigrid can be used
See the paper by Mousseau et al. (2002) for impressive results for long-time weather integration
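A NumPy sketch of this preconditioner application, companion to the time-step sketch above (same illustrative discretization):

import numpy as np

N, L, g, dt = 64, 1.0, 9.81, 1e-2
dx = L / N
D = (np.roll(np.eye(N), 1, axis=1) - np.roll(np.eye(N), -1, axis=1)) / (2*dx)

def apply_preconditioner(R_phi, R_uphi, phi_n):
    # Map residuals (R_phi, R_uphi) to corrections (d_phi, d_uphi):
    # solve  d_phi - dt^2 d/dx(g phi^n d(d_phi)/dx) = -dt R_phi + dt^2 dR_uphi/dx,
    # then   d_uphi = -dt (g phi^n d(d_phi)/dx + R_uphi)
    A = np.eye(N) - dt**2 * (D @ (g*phi_n[:, None] * D))
    rhs = -dt * R_phi + dt**2 * (D @ R_uphi)
    d_phi = np.linalg.solve(A, rhs)       # exact here; one multigrid sweep suffices
    d_uphi = -dt * (g * phi_n * (D @ d_phi) + R_uphi)
    return d_phi, d_uphi

# Inside a JFNK loop, this map serves as M^{-1} applied to the stacked residual.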
Physics-based preconditioning update
So far, physics-based preconditioning has been applied to several codes at Los Alamos, in an effort led by D. Knoll
Summarized in a new J. Comp. Phys. paper by Knoll & Keyes (Jan 2004)
PETSc’s “shell preconditioner” is designed for inserting physics-based preconditioners, and PETSc’s solvers underneath are building blocks
Nonlinear Schwarz preconditioning
Nonlinear Schwarz has Newton both inside and outside and is fundamentally Jacobian-free
It replaces $F(u) = 0$ with a new nonlinear system possessing the same root, $\mathcal{F}(u) = 0$
Define a correction $\delta_i(u)$ to the $i$th partition (e.g., subdomain) of the solution vector by solving the following local nonlinear system:
$R_i\,F(u + \delta_i(u)) = 0$
where $\delta_i(u) \in \mathbb{R}^n$ is nonzero only in the components of the $i$th partition
Then sum the corrections: $\mathcal{F}(u) \equiv \sum_i \delta_i(u)$ to get an implicit function of $u$
Nonlinear Schwarz – picture
[Figure: a 1D grid of unknowns u and residuals F(u); the restriction R_i selects a contiguous block of components (marked 1, others 0), giving subdomain unknowns R_i u and residuals R_i F]
Nonlinear Schwarz – picture
[Figure: two restrictions R_i and R_j select neighboring blocks of u and F(u), giving R_i u, R_i F and R_j u, R_j F]
Nonlinear Schwarz – picture
[Figure: each local problem F_i'(u_i) is solved for its correction; the summed corrections δ_i u + δ_j u form the new global residual]
Nonlinear Schwarz, cont.
It is simple to prove that if the Jacobian of $F(u)$ is nonsingular in a neighborhood of the desired root, then $\mathcal{F}(u) = 0$ and $F(u) = 0$ have the same unique root
To lead to a Jacobian-free Newton-Krylov algorithm we need to be able to evaluate, for any $u, v \in \mathbb{R}^n$:
the residual $\mathcal{F}(u) = \sum_i \delta_i(u)$
the Jacobian-vector product $\mathcal{F}'(u)\,v$
Remarkably (Cai-Keyes, 2000), it can be shown that
$\mathcal{F}'(u)\,v \approx \sum_i R_i^T J_i^{-1} R_i\,J\,v$
where $J = F'(u)$ and $J_i = R_i J R_i^T$
All required actions are available in terms of $F(u)$!
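A toy evaluation of the nonlinear Schwarz residual $\mathcal{F}(u) = \sum_i \delta_i(u)$, with SciPy’s root-finder standing in for the local Newton solves (the 1D test problem and the partitions are invented):

import numpy as np
from scipy.optimize import fsolve

def F(u):
    # Toy global nonlinear residual: a 1D nonlinear Poisson-like system
    up = np.pad(u, 1)                        # zero Dirichlet boundaries
    return 2*u - up[:-2] - up[2:] + 0.1*u**3 - 1.0

def schwarz_residual(u, subdomains):
    # Evaluate script-F(u) = sum_i delta_i(u), where each delta_i solves
    # the local system R_i F(u + delta_i) = 0, supported on subdomain i
    out = np.zeros_like(u)
    for idx in subdomains:
        def local(d_loc):
            d = np.zeros_like(u)
            d[idx] = d_loc
            return F(u + d)[idx]             # R_i F(u + delta_i)
        out[idx] += fsolve(local, np.zeros(len(idx)))
    return out

subdomains = [np.arange(0, 6), np.arange(4, 10)]   # two overlapping partitions
u = np.zeros(10)
print(schwarz_residual(u, subdomains))       # outer Newton-Krylov drives this to 0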
Experimental example of nonlinear Schwarz
Newton’s method: difficulty at critical Re, stagnation beyond critical Re
Additive Schwarz Preconditioned Inexact Newton (ASPIN): convergence for all Re
[Figure: convergence comparison across Reynolds numbers]
The 2003 SCaLeS initiative
Workshop on a Science-based Case for Large-scale Simulation
Arlington, VA
24-25 June 2003
Charge (April 2003, W. Polansky, DOE):
“Identify rich and fruitful directions for the computational sciences from the perspective of scientific and engineering applications”
Build a “strong science case for an ultra-scale computing capability for the Office of Science”
“Address major opportunities and challenges facing computational sciences in areas of strategic importance to the Office of Science”
“Report by July 30, 2003”
Volume 1:
Chapter 1. Introduction
Chapter 2. Scientific Discovery through Advanced Computing: a Successful Pilot Program
Chapter 3. Anatomy of a Large-scale Simulation
Chapter 4. Opportunities at the Scientific Horizon
Chapter 5. Enabling Mathematics and Computer Science Tools
Chapter 6. Recommendations and Discussion
Volume 2 (due out early 2004):
11 chapters on applications
8 chapters on mathematical methods
8 chapters on computer science and infrastructure
First fruits!
“There will be opened a gateway and a road to a large and excellent science, into which minds more piercing than mine shall penetrate to recesses still deeper.”
Galileo (1564-1642) (on ‘experimental mathematical analysis of nature’, appropriated here for ‘simulation science’)
Related URLs
TOPS project: http://www.tops-scidac.org
SciDAC initiative: http://www.science.doe.gov/scidac
SCaLeS report: http://www.pnl.gov/scales