unedf 2011 annual/final meeting
![Page 1: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/1.jpg)
UNEDF 2011 ANNUAL/FINAL MEETING
Progress report on the BIGSTICK configuration-interaction code
Calvin Johnson (1)
Erich Ormand (2)
Plamen Krastev (1,2,3)
(1) San Diego State University, (2) Lawrence Livermore Lab, (3) Harvard University
Supported by DOE Grants DE-FG02-96ER40985, DE-FC02-09ER41587, and DE-AC52-07NA27344
![Page 2: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/2.jpg)
We have good news and bad news...
...both the same thing...
...the postdoc (Plamen Krastev) got a permanent staff position in scientific computing at Harvard.
![Page 3: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/3.jpg)
BIGSTICK:
General purpose M-scheme configuration interaction (CI) code
On-the-fly calculation of the many-body Hamiltonian
Fortran 90, MPI and OpenMP
35,000+ lines in 30+ files and 200+ subroutines
Faster set-up
Faster Hamiltonian application
Rewritten for “easy” parallelization
New parallelization scheme
REDSTICK BIGSTICK
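As a rough illustration of what an M-scheme basis is (a sketch in Python, not BIGSTICK's Fortran): basis states are Slater determinants with a fixed total M, and the basis can be counted by pairing proton and neutron determinants whose M values add up to the target. The function names and the choice of the sd shell are illustrative assumptions, and parity is ignored because all sd-shell orbits share the same parity.

```python
from collections import Counter
from itertools import combinations

# 2j for the sd-shell orbits 0d5/2, 1s1/2, 0d3/2 (integers avoid half-integer arithmetic)
SD_SHELL_2J = [5, 1, 3]

def single_particle_2m(two_j_list):
    """List 2*m for every single-particle m-state in the given orbits."""
    return [two_m for two_j in two_j_list for two_m in range(-two_j, two_j + 1, 2)]

def dims_by_2M(states_2m, n_particles):
    """Count Slater determinants of n_particles identical fermions, grouped by total 2*M."""
    return Counter(sum(occupied) for occupied in combinations(states_2m, n_particles))

def mscheme_dim(two_j_list, n_protons, n_neutrons, two_M=0):
    """M-scheme dimension as a sum over proton/neutron pairs with 2*Mp + 2*Mn = 2*M."""
    states = single_particle_2m(two_j_list)
    proton_dims = dims_by_2M(states, n_protons)
    neutron_dims = dims_by_2M(states, n_neutrons)
    return sum(count * neutron_dims.get(two_M - two_Mp, 0)
               for two_Mp, count in proton_dims.items())

dim_two_neutrons = mscheme_dim(SD_SHELL_2J, 0, 2)   # two valence neutrons
dim_two_plus_two = mscheme_dim(SD_SHELL_2J, 2, 2)   # two valence protons and two neutrons
print(dim_two_neutrons, dim_two_plus_two)
```

For two valence neutrons in the sd shell this counts 14 states with M = 0, and for two protons plus two neutrons it counts 640. Only the small proton and neutron lists are enumerated; the full basis is never built explicitly, which is the spirit of a proton-neutron factorization.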
![Page 4: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/4.jpg)
BIGSTICK:
Flexible truncation scheme: handles ‘no core’ ab initio Nħω truncation, valence-shell (sd & pf shell) orbital truncation, np-nh truncations, and more.
Applied to ab initio calculations, valence shell calculations (in particular level densities, random interaction studies, and benchmarking projected HF), cold atoms, and electronic structure of atoms (benchmarking RPA and HF for atoms).
Version 6.5 is available at NERSC: unedf/lcci/BIGSTICK/v650/
![Page 5: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/5.jpg)
BIGSTICK uses a factorization algorithm that reduces storage of Hamiltonian arrays
| Nuclide | Space | Basis dim | Matrix store | Factorization |
|---|---|---|---|---|
| 56Fe | pf | 501 M | 290 Gb | 0.72 Gb |
| 7Li | Nmax=12 | 252 M | 3600 Gb | 96 Gb |
| 7Li | Nmax=14 | 1200 M | 23 Tb | 624 Gb |
| 12C | Nmax=6 | 32 M | 196 Gb | 3.3 Gb |
| 12C | Nmax=8 | 590 M | 5000 Gb | 65 Gb |
| 12C | Nmax=10 | 7800 M | 111 Tb | 1.4 Tb |
| 16O | Nmax=6 | 26 M | 142 Gb | 3.0 Gb |
| 16O | Nmax=8 | 990 M | 9700 Gb | 130 Gb |
Comparison of nonzero matrix storage with factorization
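To make the comparison concrete, the storage figures quoted on this slide can be turned into reduction factors. This is only arithmetic on the quoted numbers; the conversion 1 Tb = 1000 Gb is an assumption, since the slide does not say which convention is used.

```python
GB_PER_TB = 1000  # assumption: the slide does not say whether 1 Tb is 1000 or 1024 Gb

# (nuclide, space, full nonzero-matrix storage in Gb, factorized storage in Gb)
STORAGE = [
    ("56Fe", "pf", 290, 0.72),
    ("7Li", "Nmax=12", 3600, 96),
    ("7Li", "Nmax=14", 23 * GB_PER_TB, 624),
    ("12C", "Nmax=6", 196, 3.3),
    ("12C", "Nmax=8", 5000, 65),
    ("12C", "Nmax=10", 111 * GB_PER_TB, 1.4 * GB_PER_TB),
    ("16O", "Nmax=6", 142, 3.0),
    ("16O", "Nmax=8", 9700, 130),
]

def reduction_factors(rows):
    """Map (nuclide, space) to full storage divided by factorized storage."""
    return {(nuclide, space): full / factorized
            for nuclide, space, full, factorized in rows}

FACTORS = reduction_factors(STORAGE)
for (nuclide, space), factor in FACTORS.items():
    print(f"{nuclide:>5} {space:<8} storage reduced by a factor of about {factor:.0f}")
```

With these numbers the reduction ranges from roughly 37 (7Li) to roughly 400 (56Fe in the pf shell).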
TRIUMF – Feb 2011
![Page 6: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/6.jpg)
BIGSTICK:
Micah Schuster, Physics MS project
![Page 7: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/7.jpg)
BIGSTICK:
Joshua Staker, Physics MS project
![Page 8: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/8.jpg)
![Page 9: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/9.jpg)
![Page 10: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/10.jpg)
![Page 11: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/11.jpg)
![Page 12: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/12.jpg)
Major accomplishment as of last year: excellent scaling of mat-vec multiply
This demonstrates that our factorization algorithm, as predicted, facilitates efficient distribution of mat-vec operations
![Page 13: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/13.jpg)
Major accomplishments after last UNEDF meeting:
Rebalanced workload with additional constraint for dimension of local Lanczos vectors (Krastev)
Fully distributed Lanczos vectors with hermiticity on (Krastev)
Major steps towards distributing Lanczos vectors with suppressed hermiticity (Krastev)
OpenMP implementations in matrix-vector multiply (Ormand & Johnson)
Significant progress in 3-body implementation (Johnson & Ormand)
Added restart option (Johnson)
Implemented in-lined 1-body density matrices (Johnson)
![Page 14: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/14.jpg)
Highlighting accomplishments for 2010-2011:
Add OpenMP
Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
Speed up reorthogonalization
-- I/O is the bottleneck
![Page 15: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/15.jpg)
Highlighting accomplishments for 2010-2011:
Add OpenMP
-- Crude 1st generation by Johnson (about 70-80% efficiency)
-- 2nd generation by Ormand (nearly 100% efficiency)
Hybrid OpenMP+MPI implemented; full testing delayed due to reorthogonalization issues
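The slide does not define "efficiency". A minimal sketch, assuming the usual definition (speedup divided by the number of threads), shows how a figure like 70-80% would be computed. The timings are hypothetical and are not BIGSTICK measurements.

```python
def parallel_efficiency(t_serial, t_parallel, n_threads):
    """Speedup (t_serial / t_parallel) divided by the number of threads."""
    return t_serial / (n_threads * t_parallel)

# Hypothetical timings: 100 s on one thread, 16 s on 8 threads.
example_efficiency = parallel_efficiency(100.0, 16.0, 8)
print(f"efficiency = {example_efficiency:.0%}")
```

An efficiency near 100% would mean the 8-thread run takes close to 12.5 s in this hypothetical case.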
![Page 16: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/16.jpg)
Highlighting accomplishments for 2010-2011:
Add OpenMP
Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
We break up the Lanczos vectors so that only part of each vector resides on each node
Future: separate forward/backward multiplication
![Page 17: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/17.jpg)
Lanczos vectors distribution:
[Figure: Vin and Vout each split into sectors 1-4, shown against the proton and neutron sectors]
![Page 18: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/18.jpg)
![Page 19: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/19.jpg)
![Page 20: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/20.jpg)
Lanczos vectors distribution: Hermiticity on
[Figure: Vin and Vout each split into sectors 1-4, shown against the proton and neutron sectors]
Forward and backward application of H
Each compute node needs at a minimum TWO sectors from the initial and TWO sectors from the final Lanczos vector
![Page 21: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/21.jpg)
![Page 22: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/22.jpg)
![Page 23: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/23.jpg)
Lanczos vectors distribution: Hermiticity off
[Figure: Vin and Vout each split into sectors 1-2, shown against the proton and neutron sectors]
Forward application of H on one node and backward application of H on another node
Each compute node needs ONE sector from the initial and ONE sector from the final Lanczos vector
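The difference between the two schemes can be sketched with a small block matrix. This is one reading of the slides, not BIGSTICK's actual implementation: with hermiticity on, only blocks H[i][j] with i <= j are stored and each off-diagonal block is applied both forward and backward, so the work unit touches two sectors of Vin and two of Vout; with hermiticity off, every block is stored and applied forward only, so each work unit touches one sector of each. All names and numbers below are illustrative assumptions.

```python
def matvec(block, vec):
    """Dense block times vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in block]

def transpose(block):
    return [list(col) for col in zip(*block)]

def add_into(target, update):
    for k, value in enumerate(update):
        target[k] += value

# A symmetric 4x4 matrix split into 2x2 blocks over two sectors (0 and 1).
H = {
    (0, 0): [[2.0, 1.0], [1.0, 3.0]],
    (0, 1): [[0.5, 0.0], [1.5, -1.0]],
    (1, 1): [[4.0, 2.0], [2.0, 1.0]],
}
H[(1, 0)] = transpose(H[(0, 1)])  # only needed when hermiticity is off
vin = [[1.0, 2.0], [3.0, 4.0]]    # input vector, one list per sector

def apply_hermiticity_on(blocks, vin):
    """Use only blocks with i <= j; off-diagonal blocks act forward and backward."""
    vout = [[0.0] * len(sector) for sector in vin]
    for (i, j), block in blocks.items():
        if i > j:
            continue  # the lower triangle is not stored in this scheme
        add_into(vout[i], matvec(block, vin[j]))                 # forward: needs vin[j], vout[i]
        if i != j:
            add_into(vout[j], matvec(transpose(block), vin[i]))  # backward: needs vin[i], vout[j]
    return vout

def apply_hermiticity_off(blocks, vin):
    """Use every block, forward only; each work unit needs one vin and one vout sector."""
    vout = [[0.0] * len(sector) for sector in vin]
    for (i, j), block in blocks.items():
        add_into(vout[i], matvec(block, vin[j]))
    return vout

result_on = apply_hermiticity_on(H, vin)
result_off = apply_hermiticity_off(H, vin)
print(result_on, result_off)
```

Both functions return the same product. The trade-off in this sketch is that hermiticity off stores the off-diagonal block twice, while each work unit touches fewer vector sectors.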
![Page 24: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/24.jpg)
Comparison of memory requirements for distributing Lanczos vectors:
| Nuclide | Space | Basis dim | Store | Hermiticity ON | Hermiticity OFF |
|---|---|---|---|---|---|
| 12C | Nmax = 10 | 7800M | 117GB | 8.44GB | 4.39GB |
| 60Zn | pf | 2300M | 34GB | 8.65GB | 4.45GB |
Memory required to store 2 Lanczos vectors (double precision) on a node
![Page 25: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/25.jpg)
The distribution scheme with suppressed hermiticity is the most memory efficient, so it is our scheme of choice.
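The undistributed storage figures can be checked with simple arithmetic: two double-precision vectors cost 16 bytes per basis state. The sketch below assumes "GB" on the slide means 2^30 bytes, which is an assumption but reproduces the quoted numbers to within the rounding of the basis dimensions.

```python
BYTES_PER_DOUBLE = 8
GIB = 2 ** 30  # assumption: "GB" on the slide is read as 2**30 bytes

def lanczos_storage_gib(basis_dim, n_vectors=2):
    """Memory in GiB for n_vectors double-precision vectors of length basis_dim."""
    return basis_dim * BYTES_PER_DOUBLE * n_vectors / GIB

full_12c = lanczos_storage_gib(7800e6)   # 12C, Nmax = 10
full_60zn = lanczos_storage_gib(2300e6)  # 60Zn, pf shell

print(f"12C : {full_12c:.1f} GiB undistributed (slide quotes 117GB)")
print(f"60Zn: {full_60zn:.1f} GiB undistributed (slide quotes 34GB)")
# Per-node savings implied by the quoted figures for 12C
print(f"hermiticity on : factor of about {117 / 8.44:.0f}")
print(f"hermiticity off: factor of about {117 / 4.39:.0f}")
```

For 12C the quoted per-node figures correspond to roughly 14 times (hermiticity on) and 27 times (hermiticity off) less memory than holding both full vectors on one node.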
![Page 26: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/26.jpg)
![Page 27: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/27.jpg)
Highlighting accomplishments for 2010-2011:
Add OpenMP
Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
Speed up reorthogonalization
-- I/O is the bottleneck
We (i.e. PK) spent time trying to make MPI-IO efficient for our needs via striping, etc.
Analysis by Rebecca Hartman-Baker (ORNL) suggests our I/O is still running sequentially rather than in parallel.
Now we will store all Lanczos vectors in memory à la MFDn (makes restarting an interrupted run difficult)
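Why reorthogonalization drives the I/O cost can be seen in a textbook Lanczos loop: every new vector is orthogonalized against all previous ones, so each iteration must read the whole stored set. The sketch below is a generic in-memory version in pure Python, not BIGSTICK's distributed implementation; the small tridiagonal test matrix is chosen only because the result is easy to check by hand.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(matrix, v):
    return [dot(row, v) for row in matrix]

def lanczos_in_memory(matrix, v0, n_iter):
    """Lanczos with full reorthogonalization; all vectors stay in a Python list (RAM)."""
    norm = math.sqrt(dot(v0, v0))
    vectors = [[x / norm for x in v0]]
    alphas, betas = [], []
    for k in range(n_iter):
        w = matvec(matrix, vectors[k])
        alpha = dot(vectors[k], w)
        alphas.append(alpha)
        w = [wi - alpha * vi for wi, vi in zip(w, vectors[k])]
        if k > 0:
            w = [wi - betas[k - 1] * vi for wi, vi in zip(w, vectors[k - 1])]
        # Full reorthogonalization: touches every stored vector on every iteration.
        for v in vectors:
            overlap = dot(v, w)
            w = [wi - overlap * vi for wi, vi in zip(w, v)]
        beta = math.sqrt(dot(w, w))
        if k == n_iter - 1 or beta < 1e-12:
            break  # requested iterations done, or the Krylov space is exhausted
        betas.append(beta)
        vectors.append([wi / beta for wi in w])
    return alphas, betas, vectors

# Small symmetric test matrix; starting from the first unit vector,
# Lanczos should reproduce its diagonal and off-diagonal entries.
H_TEST = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 5.0],
]
alphas, betas, vectors = lanczos_in_memory(H_TEST, [1.0, 0.0, 0.0, 0.0], 4)
print(alphas, betas)
```

If the vectors lived on disk instead of in the list, the inner reorthogonalization loop would reread every stored vector on each iteration, which is the bottleneck described above. Keeping them in memory removes that cost but means an interrupted run has nothing on disk to restart from.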
![Page 28: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/28.jpg)
Next steps for remainder of project period:
• Store Lanczos vectors in RAM (end of summer)
• Write paper on factorization algorithm (drafted, finish by 9/2011)
• Fully implement MPI/OpenMP hybrid code (11/2011)
• Write up paper for publication of code (early 2012)
![Page 29: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/29.jpg)
UNEDF Deliverables for BIGSTICK
• The LCCI project will deliver final UNEDF versions of LCCI codes; scripts and test cases will be completed and released.
-- Current version (6.5) at NERSC; expect final version by end of year; plans to publish in CPC or similar venue.
• Improve the scalability of BIGSTICK CI code up to 50,000 cores.
-- Main barrier was reorthogonalization; now putting Lanczos vectors in memory to minimize I/O.
• Use BIGSTICK code to investigate isospin breaking in pf shell.
-- Delayed due to problem with I/O hardware on Sierra.
![Page 30: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/30.jpg)
SciDAC-3 possible deliverables for BIGSTICK
(End of SciDAC-2: 3-body forces on 100,000 cores)
• Run with 3-body up to 1,000,000 cores on Sequoia, Nmax = 10/12 for 12,14C
• Add in 4-body forces; investigate alpha-clustering with effective 4-body forces (via SRG or Lee-Suzuki)
• Currently interfaces with Navratil’s TRDENS to generate densities, spectroscopic factors, etc., needed for RGM reaction calculations; will improve this: develop fast post-processing with factorization
• Investigate general unitary-transform effective interactions, adding constraint to observables
![Page 31: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/31.jpg)
Sample application: cold atomic gases at unitarity in a harmonic trap
Using only 1 generator (d/dr) (very much like UCOM)
Fit to A = 3: 1-, 0+
A = 4: 0+, 1+, 2+
UNEDF -- MSU June 2010
starting rms = 2.32
final rms = 0.58
![Page 32: UNEDF 2011 ANNUAL/FINAL MEETING](https://reader036.vdocuments.site/reader036/viewer/2022062408/56813a2c550346895da212f0/html5/thumbnails/32.jpg)
Cross-fertilization of LCCI project:
BIGSTICK, MFDn
On-the-fly construction of basis states and matrix elements
Reorthogonalization and Lanczos vector management
NuShellX
J-projected basis