DFT requirements for leadership-class computers
N. Schunck, Department of Physics and Astronomy, University of Tennessee, Knoxville, TN-37996, USA
Physics Division, Oak Ridge National Laboratory, Oak Ridge, TN-37831, USA
http://unedf.org
The 3rd LACM-EFES-JUSTIPEN Workshop, JIHIR, Oak Ridge National Laboratory, February 23-25, 2009
A. Baran, J. Dobaczewski, J. McDonnell, J. Moré, W. Nazarewicz, N. Nikolov, H. H. Nam, J. Pei, J. Sarich, J. Sheikh, A. Staszczak, M. V. Stoitsov, S. Wild
Nuclear DFT: Why supercomputing?
DFT: a global theory
• "No limit" theory: from light nuclei to the physics of neutron stars
• Rich physics
• Fast and reliable: the ground state of an even-even nucleus can be computed in a matter of minutes on a standard laptop, so why bother with supercomputing?
Open issues:
• Principle: average out individual degrees of freedom. How are correlations treated?
• Current lack of quantitative predictions at the ~100 keV level
• Extrapolability?
Supercomputers: DFT at full power…
Why supercomputers:
• Large-scale problems (LACM): fission, shape coexistence, time-dependent problems
• Systematic restoration of broken symmetries and correlations "made easy" (QRPA, GCM?, etc.)
• Optimization of extended functionals on larger sets of experimental data
Classes of DFT Solvers

| | 1D | 2D | 3D |
|---|---|---|---|
| r-space | 1 min, 1 core (HFBRAD) | 5 hours, 70 cores (HFBAX) | – |
| HO basis | – | 2 min, 1 core (HFBTHO) | 5 hours, 1 core (HFODD) |

Computational packages used and developed at ORNL, with an estimate of the resources needed for a standard HFB calculation.

• Coordinate space: direct integration of the HFB equations
– Accurate: provides the "exact" result
– Slow and CPU/memory intensive for 2D-3D geometries
• Configuration space: expansion of the solutions on a basis (usually HO)
– Fast and amenable to beyond-mean-field extensions
– Truncation effects: source of divergences/renormalization issues
– Wrong asymptotics unless different bases are used (WS, PTG, Gamow, etc.)
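The truncation effect of a configuration-space solver can be seen on a toy problem. The sketch below is not an HFB solver: it assumes a 1D anharmonic oscillator H = p²/2 + x²/2 + λx⁴ (units ħ = m = ω = 1) and a hypothetical helper `anharmonic_ground_state`, and diagonalizes H in the lowest `n_basis` HO states. Because each smaller basis is a subspace of the larger one, the lowest eigenvalue can only decrease as states are added, and the quantity to monitor is how fast it stabilizes.

```python
import numpy as np

def anharmonic_ground_state(n_basis: int, lam: float = 0.1) -> float:
    """Lowest eigenvalue of H = p^2/2 + x^2/2 + lam*x^4 in a truncated HO basis.

    Units hbar = m = omega = 1. Toy model for basis truncation, not an HFB solver.
    """
    # Build x in a slightly larger space so that the x^4 matrix elements kept
    # below are exact; the only approximation left is the basis truncation.
    n_big = n_basis + 4
    a = np.diag(np.sqrt(np.arange(1, n_big)), k=1)  # annihilation: a|n> = sqrt(n)|n-1>
    x = (a + a.T) / np.sqrt(2.0)
    h_big = np.diag(np.arange(n_big) + 0.5) + lam * np.linalg.matrix_power(x, 4)
    h = h_big[:n_basis, :n_basis]                   # keep only n_basis HO states
    return float(np.linalg.eigvalsh(h)[0])

for n in (2, 4, 8, 16, 32):
    print(n, anharmonic_ground_state(n))
```

With two states, only the first-order value 0.5 + 3λ/4 survives; larger bases lower the energy until it converges.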
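In both cases, solving the HFB equations is a non-linear integro-differential fixed-point problem: the HFB matrix depends on the densities built from its own eigenvectors.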
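A fixed-point problem of this kind is usually solved by iterating "densities → fields → new densities" until input and output agree. The minimal sketch below assumes a generic map `F` standing in for one HFB iteration; the function name `mix_to_self_consistency` and the toy map are hypothetical. It uses simple linear mixing, which already shows why damping matters: the toy map diverges under plain iteration but converges once input and output are mixed.

```python
import numpy as np

def mix_to_self_consistency(F, rho0, alpha=0.5, tol=1e-10, max_iter=1000):
    """Iterate rho -> (1 - alpha) * rho + alpha * F(rho) until rho = F(rho)."""
    rho = np.asarray(rho0, dtype=float)
    for it in range(1, max_iter + 1):
        residual = F(rho) - rho          # vanishes at self-consistency
        if np.linalg.norm(residual) < tol:
            return rho, it
        rho = rho + alpha * residual     # linear mixing of input and output
    raise RuntimeError(f"not self-consistent after {max_iter} iterations")

# Toy map whose plain iteration (alpha = 1) oscillates and diverges:
F = lambda r: -2.0 * r + 3.0
rho, n_iter = mix_to_self_consistency(F, np.zeros(1), alpha=0.5)
print(rho, n_iter)  # converges to the fixed point rho = 1
```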
Recent physics achievements
• Even-even, odd-even and odd-odd mass tables
• Nuclear fission
• Systematics of odd-proton states in odd nuclei
Cf. talks by M. Stoitsov, S. Wild and J. Moré
Online resources:
http://massexplorer.org/
http://unedf.org/
Petascale and beyond
• Hardware constraints (see R. Lusk's and J. Vary's talks):
– Many cores (100,000+) stacked into sockets: currently 4 cores/socket, evolving toward 8 cores/socket and more
– Small memory per core (shared memory per socket)
– Short, crash-prone, expensive runtime
• Consequences for the architecture of DFT solvers:
– Optimize the time of one HFB calculation: reduce the number of iterations, use symmetries smartly by improving/interfacing codes, parallelization, etc.
– Work on the parallel wrapper: load balancing, checkpoints, error-control mechanisms, etc.
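With short, crash-prone runs, an iterative solver should be able to resume from its last saved state. The sketch below is a single-process illustration under stated assumptions: the JSON format, the helper names (`save_checkpoint`, `load_checkpoint`, `run_iterations`) and the toy update (a Newton step toward √2 standing in for one HFB iteration) are all hypothetical. The checkpoint is written to a temporary file and then renamed, so a crash during the write never destroys the previous checkpoint.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state atomically: a crash mid-write leaves the old checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic rename over the previous checkpoint

def load_checkpoint(path):
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)

def run_iterations(path, n_total, every=5, crash_at=None):
    """Run a toy iteration, checkpointing every `every` steps; resume if possible."""
    state = load_checkpoint(path) or {"iteration": 0, "x": 1.0}
    while state["iteration"] < n_total:
        if crash_at is not None and state["iteration"] == crash_at:
            raise RuntimeError("simulated node failure")
        state["x"] = 0.5 * (state["x"] + 2.0 / state["x"])  # toy update (Newton step for sqrt(2))
        state["iteration"] += 1
        if state["iteration"] % every == 0:
            save_checkpoint(path, state)
    return state
```

A rerun after a failure loses at most `every - 1` iterations; the checkpoint interval trades I/O cost against recomputation.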
Optimization – Interface HFBTHO/HFODD
• Restarting HFODD from HFB-THO means:
– Tremendous gain in computation time
– Increased numerical stability
– Taking advantage of existing mass tables
• Procedure:
– Coordinate + phase transformation (both unitary)
– Modify HFODD to restart from HFB matrix elements instead of density fields on the Gauss-Hermite mesh
• Interface fully working for spherical HO bases (precision of restart at 10⁻⁴ to 10⁻⁶)
• Memory issue for deformed bases

| | HFB-THO | HFODD |
|---|---|---|
| Symmetry | Axial | Symmetry-unrestricted |
| Coordinates | Cylindrical | Cartesian |
| Time-reversal symmetry | Yes | No |
| Basis and diagonalization | j-block diagonalization | Y-simplex eigenbasis, full diagonalization |
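Restarting from matrix elements works because a unitary change of basis preserves the spectrum and maps eigenvectors onto eigenvectors. The sketch below is a deliberately simplified stand-in: it models the transformation as a reordering of basis states plus phase factors (the hypothetical `change_basis`), whereas the real cylindrical-to-Cartesian HO mapping mixes states within each oscillator shell. The random Hermitian matrix is only a placeholder for an HFB matrix.

```python
import numpy as np

def change_basis(h, perm, phases):
    """Rewrite Hermitian matrix h in a reordered, rephased basis.

    New coefficients are c_new[i] = phases[i] * c_old[perm[i]]; the matrix U
    doing this is unitary, so h_new = U h U^dagger has the same spectrum.
    """
    n = h.shape[0]
    u = np.zeros((n, n), dtype=complex)
    u[np.arange(n), perm] = phases   # one unit-modulus entry per row and column
    return u @ h @ u.conj().T, u

rng = np.random.default_rng(0)
n = 6
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
h_old = a + a.conj().T                              # placeholder Hermitian matrix
perm = rng.permutation(n)                           # relabelling of basis states
phases = np.exp(1j * rng.uniform(0, 2 * np.pi, n))  # change of phase convention
h_new, u = change_basis(h_old, perm, phases)
```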
Optimization – HFODD Profiling
• Broyden routine: storage of N_Broyden fields on the 3D Gauss-Hermite mesh
• Temporary array allocation for HFB matrix diagonalization
[Figure: memory usage for neutrons and protons, compared with the safe memory-per-core limit on Jaguar/Franklin; calculations by J. McDonnell]
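The memory cost of the Broyden routine comes from the history of vectors it keeps. The sketch below shows one textbook limited-memory variant, the inverse ("bad") Broyden update with a restart when the history buffer is full; the production codes may use a different (modified) Broyden scheme, and the names `broyden_solve`, `alpha` and `history` are hypothetical. Each stored step costs two vectors of the size of the fields, which is the quantity that grows with N_Broyden.

```python
import numpy as np

def broyden_solve(F, x0, alpha=0.5, history=8, tol=1e-10, max_iter=200):
    """Solve x = F(x) with a limited-memory inverse ('bad') Broyden update.

    The inverse Jacobian of g(x) = F(x) - x is approximated as
    H = -alpha*I + sum_k u_k v_k^T, so at most 2*history vectors are stored.
    """
    x = np.asarray(x0, dtype=float)
    g = F(x) - x
    us, vs = [], []

    def apply_h(y):
        out = -alpha * y
        for u, v in zip(us, vs):
            out = out + u * (v @ y)
        return out

    for it in range(1, max_iter + 1):
        if np.linalg.norm(g) < tol:
            return x, it
        dx = -apply_h(g)                  # first step equals linear mixing
        x_new = x + dx
        g_new = F(x_new) - x_new
        dg = g_new - g
        denom = dg @ dg
        if denom > 0.0:
            if len(us) == history:        # buffer full: restart from -alpha*I
                us.clear()
                vs.clear()
            us.append((dx - apply_h(dg)) / denom)  # enforces secant condition H dg = dx
            vs.append(dg)
        x, g = x_new, g_new
    raise RuntimeError(f"Broyden iteration did not converge in {max_iter} steps")

x, n_it = broyden_solve(lambda r: -2.0 * r + 3.0, np.zeros(1))
print(x, n_it)
```

On the toy linear map that plain iteration cannot solve, the secant information recovers the exact inverse Jacobian after one update.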
Optimization – HFODD Parallelization
• Two levels of parallelism handled by a simple MPI group structure:
– Nuclear configuration (Z, N, interaction, {Qλμ}, etc.)
– HFB solver
• Standard PBLAS and ScaLAPACK libraries for distributed linear algebra
• Natural splitting of the HFB matrix (OpenMP): perhaps not scalable enough
• Splitting:
– HFB matrix into N blocks
– Eigenfunctions conserve the same N-block splitting
– Densities must be reconstructed piecewise
• Challenges:
– Identify a self-contained set of all matrices required for one iteration
– Handling of conserved symmetries: gives a different block structure
– Identify and replace all BLAS calls by PBLAS equivalents
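When the matrix is block-diagonal, each block can be diagonalized independently, and the density is then rebuilt block by block from the occupied eigenvectors. The sketch below assumes an exactly block-diagonal single-particle Hamiltonian and simple occupation of the lowest `n_occ` states (no pairing), so it is a simplification of the HFB case; `blockwise_density` is a hypothetical name and the blocks are processed sequentially here, although they could go to different process groups.

```python
import numpy as np

def blockwise_density(blocks, n_occ):
    """Occupy the n_occ lowest states of a block-diagonal h, block by block."""
    sizes = [b.shape[0] for b in blocks]
    offsets = np.cumsum([0] + sizes)
    eig = [np.linalg.eigh(b) for b in blocks]        # independent per block
    all_e = np.concatenate([e for e, _ in eig])
    fermi = np.sort(all_e)[n_occ - 1]                # assumes no degeneracy at the Fermi level
    rho = np.zeros((offsets[-1], offsets[-1]), dtype=complex)
    for (e, v), lo, hi in zip(eig, offsets[:-1], offsets[1:]):
        occ = v[:, e <= fermi]                       # occupied eigenvectors of this block
        rho[lo:hi, lo:hi] = occ @ occ.conj().T       # density rebuilt piecewise
    return all_e, rho

rng = np.random.default_rng(1)

def random_hermitian(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return a + a.conj().T

blocks = [random_hermitian(n) for n in (3, 4, 2)]
energies, rho = blockwise_density(blocks, n_occ=4)
```

The result matches a full diagonalization of the assembled matrix, while no process ever needs more than one block in memory.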
Optimization – Finite-size spin instabilities
• Response of the nucleus to a perturbation with finite momentum q studied in RPA theory
• Channels: scalar-isoscalar, scalar-isovector, vector-isoscalar, vector-isovector, etc.
Modern Skyrme functionals are highly unstable with respect to finite-size spin perturbations!
[Figure: convergence of the HFB calculation of 100 blocked states in ¹⁵⁷⁻¹⁶⁵Ba, with the region of instability marked]
T. Lesinski et al., Phys. Rev. C 74, 044315 (2006); D. Davesne et al., arXiv:0906.1927 (2009)
Warning for the next generation of functionals: stability must be assessed!
Work in progress – Fission
• Example of challenges for next-generation DFT: microscopic description of nuclear fission
• Degrees of freedom at the HFB level: deformation, temperature
• Potential energy surfaces depend critically on the interaction/functional and on pairing correlations
Static HFB prerequisites:
• Computational tools
– Augmented Lagrangian Method
– Broyden method
• Precision tools
– Large bases
– Benchmarks
• Distributed computing tools
– MPI wrapper
– Load balancing
– Efficient, independent, constrained calculations
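Constrained calculations along a fission path fix a collective coordinate (e.g., a multipole moment) while the energy is minimized in all other directions. The sketch below applies the augmented Lagrangian method to a hypothetical two-variable toy surface E(x) = (x₀² − 1)² + x₁², with the constraint g(x) = x₀ − c = 0 playing the role of a fixed deformation; the plain gradient-descent inner loop and the parameter values are assumptions chosen for brevity, not the method used inside the HFB codes.

```python
import numpy as np

def energy_grad(x):
    # Gradient of the hypothetical toy surface E(x) = (x0^2 - 1)^2 + x1^2
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def constrained_minimum(c, mu=10.0, outer=30, inner=500, step=0.05):
    """Minimize E(x) subject to g(x) = x0 - c = 0 with the augmented Lagrangian method.

    L(x, lam) = E(x) + lam * g(x) + (mu / 2) * g(x)**2
    """
    x = np.zeros(2)
    lam = 0.0
    for _ in range(outer):
        for _ in range(inner):           # inner loop: plain gradient descent on L
            grad = energy_grad(x)
            grad[0] += lam + mu * (x[0] - c)
            x = x - step * grad
        lam += mu * (x[0] - c)           # first-order multiplier update
    return x, lam

for c in (-0.8, 0.0, 0.5):
    x, lam = constrained_minimum(c)
    print(c, x, lam, (x[0] ** 2 - 1.0) ** 2 + x[1] ** 2)
```

Unlike a pure quadratic penalty, the multiplier update lets the constraint be met exactly with a moderate μ, so the inner problem stays well conditioned.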
DFT Computing Infrastructure
• Interfacing codes
• Parallelize solver
• Load balancing
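Independent constrained HFB runs can take very different times to converge, which is why a static split of the work wastes cores. The sketch below compares, for hypothetical task durations, a static round-robin assignment with a dynamic work queue in which each task goes to the first worker that becomes free. It only simulates the schedule; it is not ADLB or any MPI library, and the function names are hypothetical.

```python
import heapq

def makespan_static(durations, n_workers):
    """Round-robin: task i goes to worker i % n_workers, decided up front."""
    loads = [0.0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)

def makespan_dynamic(durations, n_workers):
    """Work queue: each task goes to whichever worker becomes free first."""
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)

durations = [10, 1, 1, 1, 10, 1, 1, 1]    # hypothetical run times (hours)
print(makespan_static(durations, 4))       # 20.0
print(makespan_dynamic(durations, 4))      # 11.0
```

In the static split, both long runs land on the same worker; the queue sends the second one to a worker that has just finished a short run.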
Deliverables Year 2-3

| Workplan Year 2-3 | Current status |
|---|---|
| Have a DFT package combining HFB-THO and HFODD available for large-scale calculations | Done (for spherical bases): large-scale calculations up to 14,112 cores (2 hours) |
| Optimize full diagonalization of "large" (4,000 × 4,000) matrices in HFODD: take advantage of the N-core architecture; increase speed for large bases (fission, heavy nuclei); overcome current memory limitations | Well on target: parallelization of the HFODD core (PBLAS, ScaLAPACK), which will solve issues related to speed, memory and precision; change of iteration cycle: updating HFB matrix elements instead of fields |
| Optimize the Broyden method (cf. Jorge's talk) to improve stability/convergence | Done: numerical instabilities of large-scale calculations can be tracked down to physical instabilities built into current functionals (see Mario's talk) |
| Papers on odd nuclei: 1. Methodology and theoretical models; 2. Systematics and comparison with experiment | Delayed by the problem of instabilities: Paper 1 ready to be published; Paper 2 in preparation; additional Paper 3 on finite-size spin instabilities in preparation |
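The memory limitation for "large" matrices follows from simple arithmetic: one dense complex double-precision matrix of dimension n takes 16n² bytes. The sketch below (hypothetical helper names) gives 256 MB for n = 4,000, and a diagonalization needs at least the matrix and its eigenvectors, so the total must be compared with the memory available per core. Spreading the matrices evenly over processes, as a distributed (e.g. block-cyclic) layout roughly does, divides the per-process share.

```python
def dense_matrix_bytes(n, complex_valued=True, copies=1):
    """Memory for `copies` dense n x n double-precision matrices."""
    bytes_per_element = 16 if complex_valued else 8
    return n * n * bytes_per_element * copies

def per_process_bytes(n, n_procs, complex_valued=True, copies=1):
    """Approximate share per process when the matrices are spread evenly."""
    return dense_matrix_bytes(n, complex_valued, copies) / n_procs

print(dense_matrix_bytes(4000) / 1e6)                 # 256.0 (MB, one complex matrix)
print(dense_matrix_bytes(4000, copies=2) / 1e6)       # 512.0 (matrix + eigenvectors)
print(per_process_bytes(4000, 16, copies=2) / 1e6)    # 32.0 (MB per process over 16)
```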
Work Plan (Year 4)
• Physics
– Optimization of DME-based functionals: genetic algorithm + Argonne optimizer (cf. Mario's talk)
– Applications of DME functionals: UNEDF-1
• Computing
– Implement DME functionals in HFODD (study of time-odd channels)
– Complete version 1.0 of the parallel HFODD core; demonstrate the efficiency and scalability of the code; first applications: N-dimensional potential energy surfaces, fission pathways
– Improve the parallel interface to HFODD. Optimistic: it should be a good application of ADLB ("moderately long to long" work units of 1-2 hours, little communication). Realistic: remove the master and have it work like a slave (French revolution spirit)
– Replace sequential I/O by parallel I/O for HFODD records (used as checkpoints)
Remainder of the year
• New version of HFODD: HFBTHO interface, shell correction, finite temperature, Augmented Lagrangian Method, matrix-element mixing, parallel interface, etc.
• 2 papers on odd nuclei and 1 on spin instabilities in preparation
December 10, 2008 Slide 14
Nuclear Structure and Nuclear Interactions
Forefront Questions in Nuclear Science and the Role of High Performance Computing January 26-28, 2009 · Washington, D.C.
Microscopic Description of Nuclear Fission
Scientific and computational challenges
• Describe dynamics with novel energy functionals and ab initio methods: 1) adiabatic approach, 2) non-adiabatic/early stochastic, 3) full time-dependent dynamics
• Develop ultra-scale techniques for the description of fission
• Build a spectroscopic-precision nuclear energy density functional
• Perform constrained minimization on a multi-dimensional potential energy surface
• Find the full spectrum of dense, million-sized matrices
• Predict half-lives, mass and kinetic-energy distributions of fission fragments, and fission cross-sections
• Analyze the fission process through visualization of its time evolution
• Develop scalable application software for time-dependent many-body dynamics
• Societal impact: Nuclear Energy programs, threat reduction, NNSA Stockpile Stewardship Program
• Time-dependent many-body dynamics: low-energy heavy-ion collisions and nucleon- and photon-induced reactions, neutron star quakes, vortex dynamics in quantum superfluids
Summary of research direction
Expected scientific and computational outcomes
Potential impact on nuclear science
Our Holy Grail…