TRANSCRIPT
Sidney Fernbach
Director of Computing at Lawrence Livermore National Laboratory, 1952-1982
Fernbach Awardees
1993 David Bailey · 1994 Charles Peskin · 1995 Paul Woodward · 1996 Gary Glatzmaier · 1997 Charbel Farhat · 1998 Phillip Colella · 1999 Michael Norman · 2000 Stephen Attaway · 2002 Robert Harrison · 2003 Jack Dongarra · 2004 Marsha Berger · 2005 John Bell · 2006 Ed Seidel · 2007
David Keyes, William Gropp, Mitchell Smooke, Tony Chan, Xiao-Chuan Cai, Barry Smith, David Young, Dana Knoll, M. Driss Tidriri, V. Venkatakrishnan, Dimitri Mavriplis, C. Timothy Kelley, Omar Ghattas, Lois C. McInnes, Dinesh Kaushik, John Shadid, Kyle Anderson, Carol Woodward, Florin Dobrian, Daniel Reynolds, Yuan He
David Keyes
Applied Physics & Applied Mathematics, Columbia University
& Towards Optimal Petascale Simulations (TOPS) Center, U.S. DOE SciDAC Program
The nonlinearly implicit manifesto*
2007 Sidney Fernbach Lecture
www.columbia.edu/~kd2112/Fernbach_2007.pdf
* a public declaration of principles and intentions
got implicitness?
Going implicit? Why you would, if you could
  multiscale problems with good scale separation
  coupled problems (“multiphysics”)
  problems with uncertain or controllable inputs (optimization: design, control, inversion, assimilation)
You can, so you should
  optimal and scalable algorithms known
  freely available software
  reasonable learning curve that harvests legacy code
got Jacobians?
circa 2002
Current focus on Jacobian-free implicit methods
Two stories to track in supercomputing
raise the peak capability
lower the entry threshold
higher capability for hero users
best practices for all users
New York Blue at BNL (#10 on the Top 500)
first frontier
“new” frontier
The Jacobian: a steep price, in terms of coding
  very valuable to have, but not necessary
  approximations thereto often sufficient
  meanwhile, automatic differentiation tools are lowering the threshold
Recent “E3” report highlights weaknesses of explicit methods
“The dominant computational solution strategy over the past 30 years has been the use of first-order-accurate operator-splitting, semi-implicit and explicit time integration methods, and decoupled nonlinear solution strategies. Such methods have not provided the stability properties needed to perform accurate simulations over the dynamical time-scales of interest. Moreover, in most cases, numerical errors and means for controlling such errors are understood heuristically at best.”
(E3 report, 2007)
Recent E3 report highlights opportunities for implicit methods
“Research in linear and nonlinear solvers remains a critical focus area because the solvers provide the foundation for more advanced solution methods. In fact, as modeling becomes more sophisticated to increasingly include optimization, uncertainty quantification, perturbation analysis, and more, the speed and robustness of the linear and nonlinear solvers will directly determine the scope of feasible problems to be solved.”
(E3 report, 2007)
Some reports predicated on scalable implicit solvers …
[Report covers shown, 2002-2007: reports from 2002, 2003, 2003-2004 (2 vols.), 2004, 2006, and 2006; the Fusion Simulation Project (June 2007); and Mathematical Challenges for the Department of Energy (December 2007)]
among many other developments
Plan of presentation
Motivations for implicit solvers
  one-dimensional model problems, linear and nonlinear
  portfolio of real-world problems
State of the art for large-scale nonlinearly implicit solvers
  brief look at algorithms and software
  intuition about how they scale
Illustrative stories from the trenches
  an undergraduate semester project “gone Broadway”
  community code simulations supporting the international magnetically confined fusion energy program
Invitation and acknowledgments
“Explicit” versus “implicit”
Model problem: the 1-D heat equation ∂u/∂t = ∂²u/∂x², discretized with mesh ratio ν ≡ Δt/(Δx)².

Explicit methods evaluate a function of state data at the prior time, to update each component of the current state independently:
  U_j^{n+1} = U_j^n + ν (U_{j+1}^n − 2 U_j^n + U_{j−1}^n)
equivalent to matrix-vector multiplication, in linear problems.

Implicit methods solve a function of state data at the current time, to update all components simultaneously:
  U_j^{n+1} − ν (U_{j+1}^{n+1} − 2 U_j^{n+1} + U_{j−1}^{n+1}) = U_j^n
equivalent to inverting a matrix, in linear problems.
Explicit methods can be unstable – linear example
  Implicit: U_j^{n+1} − ν (U_{j+1}^{n+1} − 2 U_j^{n+1} + U_{j−1}^{n+1}) = U_j^n — stable for all ν
  Explicit: U_j^{n+1} = U_j^n + ν (U_{j+1}^n − 2 U_j^n + U_{j−1}^n) — unstable for ν > 1/2
c/o K. Morton & D. Mayers, 2005
[Figure: solution snapshots (initial data, after 1 step, after 25 steps, after 50 steps) for Δt = 0.0012 vs. Δt = 0.0013]
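A minimal sketch (assuming Python with numpy) of the two update rules above. The grid spacing Δx = 0.05 and the hat-function initial data are illustrative assumptions, not values from the slides; they put the two Δt values on either side of the ν = 1/2 stability limit, and the run is extended beyond 50 steps so the blow-up is visible in a single printed number.

    # Sketch: explicit vs. implicit time stepping for u_t = u_xx (1-D heat equation).
    # Assumptions (not from the slides): domain [0,1], dx = 0.05, hat-function initial data.
    import numpy as np

    dx = 0.05
    x = np.arange(0.0, 1.0 + dx, dx)
    n = len(x)

    def explicit_step(U, nu):
        """U_j^{n+1} = U_j^n + nu*(U_{j+1}^n - 2 U_j^n + U_{j-1}^n); Dirichlet BCs."""
        Unew = U.copy()
        Unew[1:-1] = U[1:-1] + nu * (U[2:] - 2.0 * U[1:-1] + U[:-2])
        return Unew

    def implicit_step(U, nu):
        """Solve the tridiagonal system (I - nu*D2) U^{n+1} = U^n at each step."""
        A = np.eye(n)
        for j in range(1, n - 1):
            A[j, j - 1], A[j, j], A[j, j + 1] = -nu, 1.0 + 2.0 * nu, -nu
        return np.linalg.solve(A, U)

    U0 = np.where(x <= 0.5, 2 * x, 2 * (1 - x))   # hat-function initial data

    for dt in (0.0012, 0.0013):                   # nu = 0.48 and 0.52 with dx = 0.05
        nu = dt / dx**2
        Ue, Ui = U0.copy(), U0.copy()
        for _ in range(500):                      # longer than the slides' 50 steps, to expose the instability
            Ue = explicit_step(Ue, nu)
            Ui = implicit_step(Ui, nu)
        print(f"nu={nu:.2f}: explicit max|U|={np.abs(Ue).max():.3g}, "
              f"implicit max|U|={np.abs(Ui).max():.3g}")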
Explicit methods can be unphysically oscillatory – nonlinear example

Nonlinear model problem (diffusivity with a critical gradient):
  ∂u/∂t = ∂/∂x ( κ(u,x) ∂u/∂x ),   κ(u,x) ≡ α(x) κ(s),   s ≡ u_x,
  κ(s) = κ₀ for |s| ≤ s_crit,   κ(s) = κ₀ + k (|s| − s_crit)^{1/2} for |s| > s_crit

Linearly implicit, nonlinearly explicit (diffusivity lagged at time level n):
  U_j^{n+1} − ν [ κ_{j+1/2}^n (U_{j+1}^{n+1} − U_j^{n+1}) − κ_{j−1/2}^n (U_j^{n+1} − U_{j−1}^{n+1}) ] = U_j^n

Linearly and nonlinearly implicit (diffusivity evaluated at time level n+1):
  U_j^{n+1} − ν [ κ_{j+1/2}^{n+1} (U_{j+1}^{n+1} − U_j^{n+1}) − κ_{j−1/2}^{n+1} (U_j^{n+1} − U_{j−1}^{n+1}) ] = U_j^n
[Figures: solution history at station 10 — oscillatory for the linearly implicit, nonlinearly explicit scheme; non-oscillatory for the fully implicit scheme]
However – implicit methods can be unruly and expensive

                     Explicit              Naïve Implicit
  Complexity         O(N)                  O(N^c), e.g., c = 7/3
  Workspace          O(N)                  O(N^w), e.g., w = 5/3
  Communication      nearest neighbor*     global, in principle
  Synchronization    once per step         many times per step
  Concurrency        O(N)                  limited
  Performance        predictable           data-dependent
  Reliability        robust when stable    uncertain

  * plus the estimation of the stable step size
Many simulation opportunities are multiscale
Multiple spatial scales
  interfaces, fronts, layers
  thin relative to domain size, δ << L
Multiple temporal scales
  fast waves
  small transit times relative to convection or diffusion, τ << T
Analyst must isolate the dynamics of interest and model the rest in a system that can be discretized over a more modest range of scales
  often involves filtering of high-frequency modes, quasi-equilibrium assumptions, etc.
  may lead to infinitely “stiff” subsystems requiring implicit treatment
Richtmyer-Meshkov instability, c/o A. Mirin, LLNL
e.g., DOE’s SciDAC* portfolio is multiscale
[Diagram: many Applications drive, and common technologies (CS, Math) respond]
* Scientific Discovery through Advanced Computing
Examples of scale-separated features of multiscale problems
Gravity surface waves in global climate
Alfvén waves in tokamaks
Acoustic waves in aerodynamics
Fast transients in detailed-kinetics chemical reactions
Bond vibrations in protein folding (?)
Explicit methods are restricted to marching out the long-scale dynamics in short steps. Implicit methods can “step over” or “filter out,” with equilibrium assumptions, the dynamically irrelevant short scales, ignoring stability bounds. (Accuracy bounds must still be satisfied; for long time steps, one can use high-order temporal integration schemes!)
IBM’s BlueGene/P: 72K quad-core procs w/ 2 FMADD @ 850 MHz = 1.008 Pflop/s
  Chip:          4 processors, 13.6 GF/s, 8 MB EDRAM
  Compute Card:  1 chip, 13.6 GF/s, 2 GB DDRAM
  Node Card:     32 compute cards, 435 GF/s, 64 GB
  Rack:          32 node cards, 14 TF/s, 2 TB
  System:        72 racks, 1 PF/s, 144 TB
Thread concurrency: 288K (294,912) processors
On the floor at Argonne National Laboratory by early 2009
What’s “big iron” for, if not multiscale?
“You go to [petascale] with the [computers] you have, not with the [computers] you want.”paraphrase of a recent U.S. Secretary of Defense
Review: two definitions of scalability
“Strong scaling”
  execution time decreases in inverse proportion to the number of processors
  fixed-size problem overall
  often instead graphed as its reciprocal, “speedup”
“Weak scaling”
  execution time remains constant as problem size and processor number are increased in proportion
  fixed-size problem per processor
  also known as “Gustafson scaling”
[Plots: T vs. p — strong scaling (N constant): slope −1 on a log-log plot is good; weak scaling (N ∝ p): slope 0 is good]
Explicit methods do not weak scale!
Illustrate for CFL-limited explicit time stepping.*

Parallel wall-clock time: T_wall ∝ T S^{1+α/d} P^{α/d}
  (since τ ∝ h^α ∝ 1/n^α = 1/N^{α/d} = 1/(PS)^{α/d})

  d     dimension of the domain, length scale L (d+1-dimensional space-time, time scale T)
  h     computational mesh cell size
  τ     computational time step size; τ = O(h^α) stability bound on the time step
  n     = L/h, number of mesh cells in each dimension
  N     = n^d, number of mesh cells overall
  M     = T/τ, number of time steps overall
  O(N)  total work to perform one time step; O(MN) total work to solve the problem
  P     number of processors
  S     storage per processor; PS = total storage on all processors (= N)
  O(MN/P) parallel wall-clock time ∝ (T/τ)(PS)/P ∝ T S^{1+α/d} P^{α/d}
Example: explicit wave problem in 3D (α = 1, d = 3)
  Domain:     10³×10³×10³   10⁴×10⁴×10⁴   10⁵×10⁵×10⁵
  Exe. time:  1 day          10 days        3 months

Example: explicit diffusion problem in 2D (α = 2, d = 2)
  Domain:     10³×10³   10⁴×10⁴    10⁵×10⁵
  Exe. time:  1 day      3 months   27 years
*assuming dynamics needs to be followed only on coarse scales
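A back-of-envelope sketch (assuming Python) that reproduces both tables from the relation above: under weak scaling the storage per processor S is fixed and P ∝ n^d, so growing the domain by 10× per dimension multiplies the explicit wall-clock time by 10^α. The 1-day normalization of the smallest case is the slides' own baseline.

    # Sketch: explicit-method wall-clock growth under weak scaling,
    # T_wall ∝ S^(1+alpha/d) * P^(alpha/d), with S fixed and P ∝ N = n^d.
    cases = [("3-D wave problem (alpha=1, d=3)", 1),
             ("2-D diffusion problem (alpha=2, d=2)", 2)]

    for name, alpha in cases:
        print(name)
        for n in (10**3, 10**4, 10**5):
            # relative to the n = 10^3 baseline (1 day), time grows as (n / 10^3)^alpha
            days = (n / 10**3) ** alpha
            print(f"  n = {n:>6} per dimension: ~{days:g} days "
                  f"({days / 30.4:.1f} months, {days / 365.25:.1f} years)")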
“blackboard”
Many simulation opportunities are multiphysics
Interfacial coupling
  Ocean-atmosphere coupling in climate
  Core-edge coupling in tokamaks
  Fluid-structure vibrations in aerodynamics
  Boundary layer-bulk phenomena in fluids
  Surface-bulk phenomena in solids
Bulk-bulk coupling
  Radiation-hydrodynamics
  Magneto-hydrodynamics
SST Anomalies, c/o A. Czaja, MIT
Coupled systems may admit destabilizing modes not present in either system alone
Many simulation opportunities face uncertainty
  Climate prediction
  Subsurface contaminant transport or petroleum recovery, and seismology
  Medical imaging
  Stellar dynamics, e.g., supernovae
  Nondestructive evaluation of structures
Uncertainty can be in
  constitutive laws
  initial conditions
  boundary conditions
Subsurface property estimation, c/o Roxar
Sensitivity, optimization, parameter estimation, boundary control require the ability to apply the inverse action of the Jacobian – available in all Newton-like implicit methods
Forward vs. inverse problems
forward problem: model + parameters + forcing/BCs + ICs → solution
inverse problem: model + solution + forcing/BCs + ICs (+ regularization) → parameters
Applications requiring scalable solvers – conventional and progressive
Magnetically confined fusion
  Poisson problems
  nonlinear coupling of multiple physics codes
Accelerator design
  Maxwell eigenproblems
  shape optimization subject to PDE constraints
Porous media flow
  div-grad Darcy problems
  parameter estimation
The TOPS “Center for Enabling Technology” spans 4 labs & 5 universities
Towards Optimal Petascale Simulations
Mission: enable scientists and engineers to take full advantage of petascale hardware by overcoming the scalability bottlenecks of traditional solvers, and assist users to move beyond “one-off” simulations, to validation and optimization (1 of 3 such centers)
[Map of TOPS institutions — legend: TOPS lab (4), TOPS university (5): Columbia University (CU), University of Colorado at Boulder (CU-B), University of Texas (UT), University of California at San Diego (UCSD), UCB/LBNL, ANL, LLNL, SNL]
TOPS is building a toolchain of proven solver components that interoperate
We aim to carry users from “one-off” solutions to the full scientific agenda of sensitivity, stability, and optimization (from heroic point studies to systematic parametric studies), all in one software suite
TOPS solvers are nested, from applications-hardened linear solvers outward, leveraging common distributed data structures
Communication and performance-oriented details are hidden, so users deal with mathematical objects throughout
TOPS features these trusted packages, whose functional dependences are illustrated below: Hypre, PETSc, ScaLAPACK, SUNDIALS, SuperLU, TAO, Trilinos
[Diagram of functional dependences among the Optimizer, Sensitivity Analyzer, Time integrator, Nonlinear solver, Eigensolver, and Linear solver; arrows indicate dependence]
These are in use and actively debugged in dozens of high-performance computing environments, in dozens of applications domains, by thousands of user groups around the world.
Adams Cai Demmel Falgout Ghattas Heroux
Hu Husbands Kaushik Knepley Li Manteuffel
Reynolds Serban Smith Woodward C. Yang U. Yang
McCormick McInnes Moré Munson Ng Norris
TOPS software is brought to you by … many of the finest computational scientists alive!
It’s all about algorithms (at the petascale)
Given, for example:
  a “physics” phase that scales as O(N)
  a “solver” phase that scales as O(N^{3/2})
  computation is almost all solver after several doublings of problem size
Most applications groups have not yet “felt” this curve in their gut
  as users actually get into queues with more than 4K processors, this will change
[Plot: fraction of execution time in the solver vs. number of processors (1 to 128K), weak scaling with 100% assumed efficiency in both phases — the solver takes 50% of the time on 128 procs and 97% of the time on 128K procs]
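A sketch (assuming Python) of the arithmetic behind that plot: with physics work O(N), solver work O(N^{3/2}), N ∝ P, and the two phases taking equal time at 128 processors, the solver time per processor grows like √P and its share of the run time climbs from 50% to roughly 97% at 128K processors.

    # Sketch: solver fraction of run time under weak scaling, physics O(N) vs. solver O(N^{3/2}).
    P0 = 128                                   # processor count where the phases take equal time
    for P in (1, 4, 16, 64, 128, 1024, 131072):
        # weak scaling: N ∝ P, so physics time/proc is constant while
        # solver time/proc grows like N^{3/2}/P = sqrt(N) ∝ sqrt(P)
        physics = 1.0
        solver = (P / P0) ** 0.5
        frac = solver / (physics + solver)
        print(f"P = {P:>7}: solver takes {100 * frac:5.1f}% of the time")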
Legacy solvers will not port to the petascale
  Data layouts: structured, composite, block-structured, unstructured, CSR
  Linear solvers: GMG, …, FAC, …, hybrid, …, AMGe, …, ILU, …
  Linear system interfaces
Nonscalable solution!
Reminder: solvers evolve underneath “Ax = b”
Advances in algorithmic efficiency rival advances in hardware architecture.
Consider Poisson’s equation, ∇²u = f, on a cube of size N = n³ (e.g., 64 × 64 × 64):

  Year   Method         Reference                  Storage   Flops
  1947   GE (banded)    Von Neumann & Goldstine    n⁵        n⁷
  1950   Optimal SOR    Young                      n³        n⁴ log n
  1971   CG-MILU        Reid                       n³        n³·⁵ log n
  1984   Full MG        Brandt                     n³        n³

If n = 64, this implies an overall reduction in flops of ~16 million*
*Six months is reduced to 1 second
[Plot: relative speedup vs. year]
Algorithms and Moore’s Law
This advance took place over a span of about 36 years, or 24 doubling times for Moore’s Law; 2²⁴ ≈ 16 million ⇒ the same as the factor from algorithms alone!
A ~16 million speedup from each — algorithmic and architectural advances work together!
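A one-line check (assuming Python) of the two factors quoted above for n = 64: the flop reduction from banded Gaussian elimination, O(n⁷), to full multigrid, O(n³), is n⁴ ≈ 16.8 million, essentially the same as 24 Moore's-law doublings, 2²⁴ ≈ 16.8 million.

    # Sketch: algorithmic speedup (banded GE -> full multigrid, n = 64) vs. 24 Moore doublings.
    n = 64
    algorithmic = n**7 / n**3          # ratio of leading-order flop counts = n^4
    moore = 2**24                      # ~36 years / 1.5-year doubling time = 24 doublings
    print(f"algorithmic factor: {algorithmic:.3g}, Moore factor: {moore:.3g}")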
Implicit methods can scale! 2004 Gordon Bell “special” prize
Cortical bone
Trabecular bone
2004 Bell Prize in “special category” went to an implicit, unstructured grid bone mechanics simulation
0.5 Tflop/s sustained on 4 thousand procs of ASCI White
large-deformation analysis
in production in bone mechanics lab
c/o M. Adams, Columbia
Transonic “Lambda” Shock, Mach contours on surfaces
1999 Bell Prize in “special category” went to implicit, unstructured grid aerodynamics problems
0.23 Tflop/s sustained on 3 thousand processors of Intel’s ASCI Red
11 million degrees of freedom
incompressible and compressible Euler flow
employed in NASA analysis/design missions

Implicit methods can scale! 1999 Gordon Bell “special” prize
SPMD parallelism w/ domain decomposition puts off the limitation of Amdahl in weak scaling
Partitioning of the grid induces block structure on the system matrix (Jacobian)
Computation scales with area; communication scales with perimeter; ratio fixed in weak scaling
[Diagram: subdomains Ω1, Ω2, Ω3; matrix rows A21, A22, A23 assigned to proc “2”]
DD is relevant to any local stencil formulation: finite differences, finite elements, finite volumes
• these lead to sparse Jacobian matrices J, in which row i couples node i only to its stencil neighbors
• however, the inverses are generally dense; even the factors suffer unacceptable fill-in in 3D
• want to solve in subdomains only, and use the result to precondition the full sparse problem
No “scalable” without “optimal”
“Optimal,” for a theoretical numerical analyst, means a method whose floating-point complexity grows at most linearly in the data of the problem, N, or (more practically, and almost as good) linearly times a polylog term
For iterative methods, this means that both the cost per iteration and the number of iterations must be O(N logᵖ N)
Cost per iteration must include communication cost as the processor count increases in weak scaling, P ∝ N
  BlueGene, for instance, permits this with its log-diameter hardware global reduction
The number of iterations comes from the condition number for linear iterative methods; Newton’s superlinear convergence is important for nonlinear iterations
Why optimal algorithms?
The more powerful the computer, the greater the importance of optimality.
Example:
  Suppose Alg1 solves a problem in time C N², where N is the input size
  Suppose Alg2 solves the same problem in time C N log₂ N
  Suppose Alg1 and Alg2 parallelize perfectly on a machine of 1,000,000 processors
In constant time (compared to serial), Alg1 can run a problem 1,000× larger, whereas Alg2 can run a problem nearly 65,000× larger
Components of scalable solvers for PDEs
Subspace solvers — alone unscalable: either too many iterations or too much fill-in
  elementary smoothers
  incomplete factorizations
  full direct factorizations
Global linear preconditioners — optimal combinations of subspace solvers
  Schwarz and Schur methods
  multigrid
Linear accelerators — mat-vec algorithms
  Krylov methods
Nonlinear rootfinders — vec-vec algorithms + linear solves
  Newton-like methods
Newton-Krylov-Schwarz: a PDE applications “workhorse”
Newton — nonlinear solver, asymptotically quadratic:
  F(u) ≈ F(u_c) + F′(u_c) δu = 0,   u = u_c + λ δu

Krylov — accelerator, spectrally adaptive:
  J δu = −F,   δu = argmin_{x ∈ V} ||Jx + F||,   V ≡ span{F, JF, J²F, …}

Schwarz — preconditioner, parallelizable:
  M⁻¹ J δu = −M⁻¹ F,   M⁻¹ = Σ_i R_iᵀ (R_i J R_iᵀ)⁻¹ R_i
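A small sketch (assuming Python with numpy and scipy) of the Schwarz piece of this workhorse: a one-level additive Schwarz preconditioner M⁻¹ = Σ_i R_iᵀ(R_i A R_iᵀ)⁻¹ R_i built from overlapping index blocks, used to precondition GMRES on a 1-D Poisson matrix. The subdomain count and overlap are illustrative choices, not values from the lecture.

    # Sketch: one-level additive Schwarz preconditioning of GMRES on a 1-D Poisson system.
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 200
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    Ad = A.toarray()                  # dense copy, only for extracting the small subdomain blocks
    b = np.ones(n)

    # Overlapping subdomains: illustrative choice of 8 blocks with overlap 4.
    nblocks, overlap = 8, 4
    size = n // nblocks
    blocks = []
    for i in range(nblocks):
        idx = np.arange(max(0, i * size - overlap), min(n, (i + 1) * size + overlap))
        blocks.append((idx, np.linalg.inv(Ad[np.ix_(idx, idx)])))

    def apply_asm(r):
        """M^{-1} r = sum_i R_i^T (R_i A R_i^T)^{-1} R_i r (one-level additive Schwarz)."""
        z = np.zeros_like(r)
        for idx, Ainv in blocks:
            z[idx] += Ainv @ r[idx]
        return z

    M = spla.LinearOperator((n, n), matvec=apply_asm)
    x, info = spla.gmres(A, b, M=M)
    print("gmres info:", info, " residual:", np.linalg.norm(b - A @ x))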
“Secret sauce” #1: iterative correction w/ each step O(N)
The most basic idea in iterative methods for Ax = b
Evaluate the residual accurately, but solve approximately, where B⁻¹ is an approximate inverse to A:
  x ← x + B⁻¹ (b − A x)
A sequence of complementary solves can be used; e.g., with B₁ first and then B₂, one has
  x ← x + [ B₁⁻¹ + B₂⁻¹ − B₂⁻¹ A B₁⁻¹ ] (b − A x)
Optimal polynomials of B⁻¹A lead to various preconditioned Krylov methods
Scaling this recurrence, e.g., with B₂⁻¹ = Rᵀ (R A Rᵀ)⁻¹ R, leads to multilevel methods
“Secret sauce” #2: treat each error component in an optimal subspace

[Diagram: a multigrid V-cycle — apply the smoother on the finest grid; restriction transfers the residual from the fine to the first coarse grid (the coarser grid has fewer cells, hence less work and storage); recursively apply this idea until we have an easy problem to solve; prolongation transfers the correction from coarse back to fine grid]
c/o R. Falgout, LLNL
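A compact sketch (assuming Python with numpy) of one two-grid version of that V-cycle for the 1-D Poisson problem: smooth on the fine grid, restrict the residual, solve the coarse problem, prolong the correction, and smooth again. Grid size and smoothing counts are illustrative.

    # Sketch: geometric two-grid V-cycles for -u'' = f on a uniform 1-D grid (Dirichlet BCs).
    import numpy as np

    def poisson(n, h):
        return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
                - np.diag(np.ones(n - 1), -1)) / h**2

    def smooth(A, x, b, nsweeps, omega=2.0 / 3.0):
        D = np.diag(A)
        for _ in range(nsweeps):
            x = x + omega * (b - A @ x) / D          # weighted Jacobi smoothing
        return x

    nf = 127                                          # fine-grid interior points
    hf = 1.0 / (nf + 1)
    Af, Ac = poisson(nf, hf), poisson((nf - 1) // 2, 2 * hf)
    b, x = np.ones(nf), np.zeros(nf)

    for cycle in range(8):
        x = smooth(Af, x, b, 3)                       # pre-smooth
        r = b - Af @ x
        rc = 0.25 * (r[0:-2:2] + 2 * r[1:-1:2] + r[2::2])   # full-weighting restriction
        ec = np.linalg.solve(Ac, rc)                  # coarse-grid solve (exact here)
        e = np.zeros(nf)                              # linear-interpolation prolongation
        e[1::2] = ec
        e[0::2] = 0.5 * (np.r_[0.0, ec] + np.r_[ec, 0.0])
        x = smooth(Af, x + e, b, 3)                   # correct, then post-smooth
        print(f"cycle {cycle + 1}: ||r|| = {np.linalg.norm(b - Af @ x):.2e}")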
“Secret sauce” #3: skip the Jacobian

In the Jacobian-Free Newton-Krylov (JFNK) method for F(u) = 0, a Krylov method solves the linear Newton correction equation, requiring Jacobian-vector products
These are approximated by the Fréchet derivative
  J(u) v ≈ [ F(u + εv) − F(u) ] / ε
(where ε is chosen with a fine balance between approximation and floating-point rounding error) or by automatic differentiation, so that the actual Jacobian elements are never explicitly needed
One builds the Krylov space on a true F′(u) (to within numerical approximation)
[Portrait: Carl Jacobi]
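A minimal JFNK sketch (assuming Python with numpy and scipy): the Jacobian-vector product is replaced by the Fréchet-derivative difference above, wrapped in a LinearOperator, and the Newton correction is solved with GMRES. The test problem (a small nonlinear reaction-diffusion equation) and the choice of ε are illustrative assumptions.

    # Sketch: Jacobian-free Newton-Krylov for F(u) = 0, with J(u)v ~ [F(u + eps*v) - F(u)]/eps.
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    n, h = 50, 1.0 / 51

    def F(u):
        """Discrete -u'' + u^3 - 1 = 0 on (0,1), homogeneous Dirichlet BCs (illustrative)."""
        up = np.r_[0.0, u, 0.0]
        return -(up[2:] - 2 * up[1:-1] + up[:-2]) / h**2 + u**3 - 1.0

    u = np.zeros(n)
    for it in range(10):
        Fu = F(u)
        if np.linalg.norm(Fu) < 1e-9:
            break
        eps = 1e-7 * (1.0 + np.linalg.norm(u))        # crude balance of truncation vs. rounding

        def jv(v, u=u, Fu=Fu, eps=eps):
            nv = np.linalg.norm(v)
            if nv == 0.0:
                return np.zeros_like(v)
            return (F(u + (eps / nv) * v) - Fu) / (eps / nv)   # Frechet-derivative approximation

        J = LinearOperator((n, n), matvec=jv)
        du, info = gmres(J, -Fu)                      # Krylov solve of the Newton correction
        u = u + du                                    # full Newton step (no line search in this sketch)
        print(f"Newton it {it + 1}: ||F|| = {np.linalg.norm(F(u)):.2e}")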
“Secret sauce” #4: use the user’s solver to precondition

Almost any code to solve F(u) = 0 computes a residual and invokes some process M to compute an update δu to u based on the residual:
  M: F(u^k) ↦ δu,   u^{k+1} ← u^k + δu
This defines a (possibly weakly converging) nonlinear method
M is, in effect, a preconditioner and can be applied directly within a Jacobian-free Newton context
This is the “physics-based preconditioning” strategy discussed in the E3 report
Example: fast spin-up of ocean circulation model using Jacobian-free Newton-Krylov
State vector: u(t). Propagation operator (this is any code): Φ(u,t), with u(t) = Φ(u(0), t)
  here, a single-layer quasi-geostrophic ocean forced by surface Ekman pumping, damped with biharmonic hyperviscosity
Task: find a state u that repeats every period T (assumed known)
Difficulty: direct integration (DI) to find the steady state may require thousands of years of physical time
Innovation: pose as a Jacobian-free NK rootfinding problem, F(u) = 0, where F(u) ≡ u − Φ(u(0), T)
Jacobian is dense, would never think of forming!
[Figures: converged streamfunction; difference between DI and NK (~10⁻¹⁴)]
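A toy analogue (assuming Python with numpy and scipy) of the spin-up trick: the propagator Φ below is a stand-in damped, forced linear map, not the quasi-geostrophic model, and the spun-up state is found as a root of F(u) = u − Φ(u, T) with scipy's Jacobian-free newton_krylov, then compared against direct integration.

    # Sketch: steady-state "spin-up" as Jacobian-free rootfinding, F(u) = u - Phi(u, T) = 0.
    import numpy as np
    from scipy.optimize import newton_krylov

    n, nsteps, dt = 40, 200, 0.05
    rng = np.random.default_rng(0)
    L = -np.eye(n) + 0.1 * rng.standard_normal((n, n)) / np.sqrt(n)   # stand-in damped dynamics
    f = rng.standard_normal(n)                                        # stand-in steady forcing

    def phi(u):
        """Propagate u over one 'period' T = nsteps*dt with forward Euler (any code would do)."""
        for _ in range(nsteps):
            u = u + dt * (L @ u + f)
        return u

    F = lambda u: u - phi(u)                  # fixed point of the propagator = spun-up state
    u_nk = newton_krylov(F, np.zeros(n), f_tol=1e-10)

    # Direct integration for comparison: keep propagating until transients die out.
    u_di = np.zeros(n)
    for _ in range(50):
        u_di = phi(u_di)

    print("||F(u_nk)||     =", np.linalg.norm(F(u_nk)))
    print("||u_nk - u_di|| =", np.linalg.norm(u_nk - u_di))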
Example: fast spin-up of ocean circulation model using Jacobian-free Newton-Krylov
2-3 orders of magnitude speedup of Jacobian-free NK relative to Direct Integration (DI)
OGCM: Helfrich-Holland integrator
Implemented in PETSc as undergraduate research project
c/o T. Merlis, Caltech, Dept. Environmental Science & Engineering
SciDAC’s Fusion Simulation Project: support of the international fusion program
J. Fusion Energy 20: 135-196 (2001)
updated in June 2007, Kritz & Keyes, eds.
ITER: $12B — “the way” (Latin)
Fusion by 2017; criticality by 2022
“Big Iron” meets “Big Copper”
Φ(u) = 0
Scaling fusion simulations up to ITER
c/o S. Jardin, PPPL
10¹² needed (explicit uniform baseline)
International Thermonuclear Experimental Reactor
2017 – first experiments, in Cadarache, France
Small tokamak
Large tokamak
Huge tokamak
1.5 orders: increased processor speed and efficiency
1.5 orders: increased concurrency
1 order: higher-order discretizations
  same accuracy can be achieved with many fewer elements
1 order: flux-surface following gridding
  less resolution required along than across field lines
4 orders: adaptive gridding
  zones requiring refinement are <1% of ITER volume, and resolution requirements away from them are ~10² less severe
3 orders: implicit solvers
  mode growth time is 9 orders longer than the Alfvén-limited CFL
Where to find 12 orders of magnitude in 10 years? Hardware: 3; Software: 9
Algorithmic improvements bring a yottascale (10²⁴) calculation down to petascale (10¹⁵)!
Comments on the ITER simulation roadmap
Increased processor speed
  10 years is 6.5 Moore doubling times
Increased concurrency
  BG/L is already 2¹⁷ procs; MHD now routinely runs at ca. 2¹²
Higher-order discretizations
  low-order preconditioning of high-order discretizations
Flux-surface following gridding
  in SciDAC, this is ITAPS; evolve the mesh to approximately follow flux surfaces
Adaptive gridding
  in SciDAC, this is APDEC; Cartesian AMR
Implicit solvers
  in SciDAC, this is TOPS; Newton-Krylov w/ multigrid preconditioning
Illustrations from computational MHD
M3D code (Princeton)
  multigrid replaces a block Jacobi/ASM preconditioner, for optimality
  a new algorithm callable across the Ax=b interface
NIMROD code (General Atomics)
  direct elimination replaces a PCG solver, for robustness
  a scalable implementation of an old algorithm for Ax=b
The fusion community may use more cycles on unclassified U.S. DOE computers than any other (e.g., 32% of all cycles at NERSC in 2003). Well over 90% of these cycles are spent solving linear systems in M3D and NIMROD, which are prime U.S. code contributions to the designing of ITER.
NIMROD: direct elimination for robustness
NIMROD code
  high-order finite elements in each poloidal crossplane
  complex, nonsymmetric linear systems with 10K-100K unknowns in 2D (>90% of execution time)
TOPS collaboration
  replacement of diagonally scaled Krylov with SuperLU, a supernodal parallel sparse direct solver
  2D tests run 100× faster; 3D production runs are ~5× faster
c/o D. Schnack, et al.
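A small illustration (assuming Python with scipy, whose sparse direct solver is in fact a SuperLU binding) of the same kind of swap on a toy nonsymmetric sparse system: a diagonally scaled GMRES solve versus a sparse LU factor-and-solve. The matrix size and entries are illustrative, not NIMROD data.

    # Sketch: diagonally scaled GMRES vs. sparse direct (SuperLU) solve of a nonsymmetric system.
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 10000
    # Toy nonsymmetric sparse matrix: 1-D convection-diffusion-like stencil (illustrative).
    A = sp.diags([-1.2, 2.0, -0.8], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    # (a) Diagonally scaled Krylov: M^{-1} = diag(A)^{-1}
    Minv = spla.LinearOperator((n, n), matvec=lambda r: r / A.diagonal())
    x_kry, info = spla.gmres(A, b, M=Minv, maxiter=500)
    print("GMRES info:", info, " residual:", np.linalg.norm(b - A @ x_kry))

    # (b) Sparse direct: factor once, then solve (scipy's splu wraps the SuperLU library)
    lu = spla.splu(A)
    x_lu = lu.solve(b)
    print("SuperLU residual:", np.linalg.norm(b - A @ x_lu))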
M3D: multigrid for optimality
M3D code
  unstructured mesh, hybrid FE/FD discretization with C0 elements in each poloidal crossplane
  real linear systems (>90% of execution time)
TOPS collaboration
  replacement of the additive Schwarz (ASM) preconditioner with algebraic multigrid (AMG) from Hypre
  achieved a mesh-independent convergence rate; ~5× improvement in execution time
[Plot: execution time (0-700) vs. scale (3, 12, 27, 48, 75), comparing ASM-GMRES and AMG-FGMRES]
c/o S. Jardin, et al.
Algebraic multigrid: a key algorithmic technology
The discrete operator is defined for the finest grid by the application itself, and for many recursively derived levels with successively fewer degrees of freedom, for solver purposes
Unlike geometric multigrid, AMG is not restricted to problems with “natural” coarsenings derived from the grid alone
Optimality (cost per cycle) is intimately tied to the ability to coarsen aggressively
Convergence scalability (number of cycles) and parallel efficiency are also sensitive to the rate of coarsening
c/o U. M. Yang, LLNL
Solvers are scaling: algebraic multigrid (AMG) on BG/L (hypre)
Figure shows a weak scaling result for AMG out to 120K processors, with one 25×25×25 block per processor (up to 1.875B dofs)
While much research and development remains, multigrid will clearly be practical at BG/P-scale concurrency
[Plot: −Δu = f; AMG time (sec, 0-20) vs. number of processors (0-120,000+); 15.6K dofs per processor, up to ~2B dofs total]
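A usage sketch of algebraic multigrid on the same model problem −Δu = f (assuming Python with the pyamg package, which provides classical Ruge-Stüben AMG; this illustrates AMG usage generally, not hypre/BoomerAMG or the BG/L run itself):

    # Sketch: algebraic multigrid on a 2-D Poisson problem, -Laplace(u) = f.
    import numpy as np
    import pyamg                               # assumes the pyamg package is installed

    A = pyamg.gallery.poisson((500, 500), format="csr")   # 250,000-dof 5-point Laplacian
    b = np.ones(A.shape[0])

    ml = pyamg.ruge_stuben_solver(A)           # classical (Ruge-Stuben) AMG hierarchy
    print(ml)                                  # shows levels, operator complexity, coarsening

    residuals = []
    x = ml.solve(b, tol=1e-10, residuals=residuals)
    print("V-cycles:", len(residuals) - 1,
          " final residual:", np.linalg.norm(b - A @ x))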
Resistive MHD prototype implicit solver
Magnetic reconnection: the breaking and reconnecting of oppositely directed magnetic field lines in a plasma, replacing the hot plasma core with cool plasma and halting the fusion process
Replace explicit updates with implicit Newton-Krylov from SUNDIALS, for a factor of ~5× in execution time
Current (J = ∇ × B)
J. Birn et al., “Geospace Environmental Modeling (GEM) magnetic reconnection challenge,” J. Geophys. Res. 106 (2001) 3715-3719.
c/o D. Reynolds, UCSD
Resistive MHD: implicit solver, ex #2
Magnetic reconnection: the previous example was compressible (primitive variable); this example is incompressible (streamfunction/vorticity)
Replace explicit updates with implicit Newton-Krylov from PETSc, for a factor of ~5× speedup
c/o F. Dobrian, ODU
“What changed were simulations that showed that the new ITER design will, in fact, be capable of achieving and sustaining burning plasma.”
– Ray Orbach,Undersecretary of Energy
& CETS
The U.S. role in multi-billion-dollar international projects will increasingly depend upon large-scale simulation, as exemplified by the 2003 Congressional testimony of Ray Orbach, above.
TOPS’ wishlist for MHD collaborations — “Asymptopia”
Engage at a higher level than Ax=b
  Newton-Krylov-Schwarz/MG on the coupled nonlinear system
Sensitivity analyses
  validation studies
Stability analyses
  “routine” outer loop on steady-state solutions
Optimization
  parameter identification
  design of facilities (accelerators, tokamaks, power plants, etc.)
  control of experiments
A “perfect storm” for scientific simulation
[Diagram: Applications, atop a hardware infrastructure of architectures, draw on scientific models (1686), numerical algorithms (1947), computer architecture (1976), and scientific software engineering (1992); dates are symbolic]
TOPS dreams that users will…
Understand the range of algorithmic options, with tradeoffs
  e.g., memory vs. time, computation vs. communication, inner iteration work vs. outer
Try all reasonable options “easily”
  without recoding or extensive recompilation
Know how their solvers are performing
  with access to detailed profiling information
Intelligently drive solver research
  e.g., publish joint papers with algorithm researchers
Simulate truly new physics, free from solver limits
  e.g., finer meshes, complex coupling, full nonlinearity
User’s Rights
People who share in the 2007 Fernbach award
1982 William Gropp, UIUC · 1984 Mitchell Smooke, Yale · 1984 Tony Chan, UCLA · 1989 Xiao-Chuan Cai, CU-Boulder · 1990 Barry Smith, Argonne · 1991 David Young, Boeing · 1992 Dana Knoll, Idaho Nat Lab · 1992 M. Driss Tidriri, Iowa State · 1993 V. Venkatakrishnan, Boeing · 1993 Dimitri Mavriplis, U. Wyoming
1995 C. Timothy Kelley, NCSU · 1995 Omar Ghattas, UT-Austin · 1995 Lois C. McInnes, Argonne · 1996 Dinesh Kaushik, Argonne · 1997 John Shadid, Sandia · 1997 Kyle Anderson, UT-C · 1997 Carol Woodward, LLNL · 2001 Florin Dobrian, ODU · 2002 Daniel Reynolds, UCSD · 2006 Yuan He, Columbia
(with the year in which we began substantive collaborations)
Primary support through the years
DOE/SC SciDAC ISIC/CET project (2001-present)
DOE/NNSA ASCI Level-2 center (1998-2001)
NSF Multidisciplinary Computational Challenges center (1995-1998)
NASA Computational Aerosciences project (1995-1998)
NSF Presidential Young Investigator (1989-1994)
Sidney Fernbach
Director of Computing at Lawrence Livermore National Laboratory, 1952-1982
EOF