cscads.rice.edu/talk09-edo-apra.pdf
TRANSCRIPT
Edoardo Aprà
Software needs for Quantum Chemistry Software
OAK RIDGE NATIONAL LABORATORY, U.S. DEPARTMENT OF ENERGY
Acknowledgments
• Part of this work is funded by the U.S. Department of Energy, Office of Advanced Scientific Computing Research and by the Division of Basic Energy Sciences, Office of Science, under contract DE-AC05-00OR22725 with Oak Ridge National Laboratory.
• This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory under contract DE-AC05-00OR22725.
Outline
• NWChem and MADNESS introduction
• List of software requirements
Why Was NWChem Developed?
• Developed as part of the construction of the Environmental Molecular Sciences Laboratory (EMSL) at Pacific Northwest National Laboratory
• Designed and developed to be a highly efficient and portable massively parallel computational chemistry package
• Provides computational chemistry solutions that are scalable with respect to chemical system size as well as MPP hardware size
• Provides major modeling and simulation capability for molecular science
What is NWChem used for?
• Major modeling and simulation capability for molecular science
−Broad range of molecules, including biomolecules, nanoparticles and heavy elements
− Electronic structure of molecules (non-relativistic, relativistic, structural optimizations and vibrational analysis)
− Increasingly extensive solid-state capability (DFT plane-wave, CPMD)
−Molecular dynamics, molecular mechanics
Molecular Science Software Group
GA Tools Overview
• Shared memory model in context of distributed dense arrays
• Complete environment for parallel code development
• Compatible with MPI
• Data locality control similar to distributed memory/message passing model
• Compatible with other libraries: ScaLAPACK, Peigs, …
• Parallel and local I/O library (Pario)
• De facto standard communication library for quantum chemistry software (NWChem, Molpro, Molcas, MPQC)
• Other fields have expressed interest: fusion, nuclear physics, astrophysics, climate
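The "shared memory model in context of distributed dense arrays" can be illustrated with a toy sketch. This is not the real GA API (the actual calls are NGA_Put/NGA_Get and friends); ToyGA and its methods are hypothetical stand-ins that mimic GA's index translation and one-sided put/get semantics:

```python
import bisect

class ToyGA:
    """Toy model of a logically shared n x n dense array whose rows are
    block-distributed across owners. Illustrative only; not the GA API."""

    def __init__(self, n, nprocs):
        self.n = n
        step = -(-n // nprocs)  # ceil(n / nprocs) rows per owner
        # Owner p holds global rows [lo[p], lo[p+1]).
        self.lo = [min(p * step, n) for p in range(nprocs + 1)]
        self.blocks = [
            [[0.0] * n for _ in range(self.lo[p + 1] - self.lo[p])]
            for p in range(nprocs)
        ]

    def _owner(self, i):
        # Index translation: global row -> (owning process, local row).
        p = bisect.bisect_right(self.lo, i) - 1
        return p, i - self.lo[p]

    def put(self, i, j, value):
        # One-sided write: any process may update any element.
        p, li = self._owner(i)
        self.blocks[p][li][j] = value

    def get(self, i, j):
        # One-sided read, regardless of which process owns the data.
        p, li = self._owner(i)
        return self.blocks[p][li][j]

ga = ToyGA(10, 3)
ga.put(7, 2, 3.5)
assert ga.get(7, 2) == 3.5
```

In the real library the blocks live in separate process address spaces and put/get move patches over the network without involving the remote CPU; here a single process stands in for the whole machine.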
Structure of GA
[Figure: layered architecture of Global Arrays]
• Application programming language interface: Fortran 77, C, C++, Python, Babel (F90, Java)
• Distributed arrays layer: memory management, index translation
• Message passing: global operations
• ARMCI: portable 1-sided communication (put, get, locks, etc.)
• Parallel and local I/O library (Pario)
• System-specific interfaces: LAPI, GM, threads, QSnet, IB, SHMEM, Portals
Global Arrays and MPI are completely interoperable; code can contain calls to both libraries.
NWChem Architecture
[Figure: layered architecture, from generic tasks down to the software development toolkit]
• Generic tasks: energy, structure, …; optimize, dynamics, …
• Molecular calculation modules: SCF energy, gradient, …; DFT energy, gradient, …; MD, NMR, solvation, …
• Molecular modeling toolkit: run-time database, integral API, geometry object, basis set object, Peigs, …
• Molecular software development toolkit: Global Arrays, Memory Allocator, Parallel I/O
• Object-oriented design
− abstraction, data hiding, APIs
• Parallel programming model
− non-uniform memory access, Global Arrays, MPI
• Infrastructure
− GA, Parallel I/O, RTDB, MA, Peigs, …
• Program modules
− communication only through the database
− persistence for easy restart
Gaussian DFT computational kernel: evaluation of the XC potential matrix element

  my_next_task = SharedCounter()
  do i = 1, max_i
    if (i .eq. my_next_task) then
      call ga_get()        ! fetch patch of density matrix D_μν
      (do work)
      call ga_acc()        ! accumulate into Fock matrix F_λσ
      my_next_task = SharedCounter()
    endif
  enddo
  barrier()

ρ(x_q) = Σ_μν D_μν χ_μ(x_q) χ_ν(x_q)
F_λσ += Σ_q w_q χ_λ(x_q) V_xc[ρ(x_q)] χ_σ(x_q)

Both GA operations (ga_get on D and ga_acc on F) depend strongly on the communication latency.
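The Fortran fragment above can be mirrored in runnable form. In the sketch below, Python threads stand in for GA processes and a lock-protected integer stands in for GA's shared counter (nxtval/NGA_Read_inc); the list append stands in for the ga_get / work / ga_acc body. It shows why the scheme balances load dynamically: tasks are claimed on demand, so fast workers simply claim more:

```python
import threading

class SharedCounter:
    """Lock-protected fetch-and-increment counter, standing in for GA's
    shared task counter."""
    def __init__(self):
        self._v = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            v = self._v
            self._v += 1
            return v

def worker(counter, max_i, done):
    # Same structure as the slide's loop: claim a task index, sweep i,
    # execute a task only when i matches the claim, then claim again.
    my_next_task = counter.next()
    for i in range(max_i):
        if i == my_next_task:
            done.append(i)              # stands in for ga_get/work/ga_acc
            my_next_task = counter.next()

counter = SharedCounter()
done = []
threads = [threading.Thread(target=worker, args=(counter, 100, done))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                            # stands in for barrier()

assert sorted(done) == list(range(100))  # every task done exactly once
```

Because the counter hands out each index exactly once and each worker's claims are strictly increasing, every task is executed by exactly one worker regardless of how the threads interleave.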
Multiresolution Adaptive Numerical Scientific Simulation
Ariana Beste¹, George I. Fann¹, Robert J. Harrison¹·², Rebecca Hartman-Baker¹, Shinichiro Sugiki¹
¹Oak Ridge National Laboratory
²University of Tennessee, Knoxville
In collaboration with
Gregory Beylkin⁴, Fernando Perez⁴, Lucas Monzon⁴, Martin Mohlenkamp⁵ and others
⁴University of Colorado
⁵Ohio University
Multiresolution chemistry objectives
• Complete elimination of the basis error
− One-electron models (e.g., HF, DFT)
− Pair models (e.g., MP2, CCSD, …)
− Bound and continuum states on equal footing
• Correct scaling of cost with system size
• General approach
− Readily accessible by students and researchers
− Higher level of composition
− Direct computation of chemical energy differences
• New computational approaches
− Fast algorithms with guaranteed precision
MADNESS Structure
• MADNESS applications – chemistry, physics, nuclear, ...
• MADNESS parallel runtime
• MADNESS math and numerics
• MPI, Global Arrays, ARMCI, GPC/GASNET

[Figure: MADNESS at the center of surrounding tools and libraries]
• Physics: Open Espresso
• Linear algebra: ScaLAPACK, LAPACK
• I/O: HDF5
• Visualization: VisIt, OpenDX, VMD
• RAD: Python, Octave, SciLab
• Solvers: Sundance, TOPS, PetSc
• Chemistry: NWChem, GAMESS, MPQC
• Parallel tools: MPI, Global Arrays, GPC/GASNET, Sengal
• Math: Sage, Maple
MADNESS Runtime Objectives
• Scalability to 1+M processors ASAP
• Runtime responsible for
− scheduling and placement,
− managing data dependencies,
− hiding latency, and
− medium- to coarse-grain concurrency
• Compatible with existing models
− MPI, Global Arrays
• Borrow successful concepts from Cilk, Charm++, Python
• Anticipating next-generation languages
Key elements
• Futures for
− hiding latency and
− automating dependency management
• Global names and name spaces
• Non-process-centric computing
− One-sided messaging between objects
− Retain place=process for MPI compatibility
• Dynamic load balancing
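A minimal illustration of the first key element, futures that hide latency: the caller submits work, keeps going, and only blocks when a value is actually needed. This uses the Python standard library; MADNESS has its own C++ futures that additionally chain dependencies between remote tasks, and none of that machinery appears here:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(x):
    # The sleep stands in for communication latency to a remote task.
    time.sleep(0.05)
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submitting returns immediately with futures, not values.
    futures = [pool.submit(slow_square, x) for x in range(8)]
    # ... other useful work could overlap with the latency here ...
    results = [f.result() for f in futures]   # force the futures

assert results == [x * x for x in range(8)]
```

The eight 50 ms "latencies" overlap across the four workers instead of being paid sequentially, which is exactly the effect the runtime wants at scale.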
I/O Requirements
• I/O models in NWChem
− local I/O (e.g., internal hard drive in a node of a cluster)
− parallel I/O
• I/O usage:
− checkpointing (size: O(N²), where N is ~10⁴)
− scratch data: computed and written once, read many times (size: as much as we can get)
• MPI-IO
• HDF5
• Reliability (parallel filesystems, …)
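A quick back-of-envelope check of the checkpoint size quoted above, assuming the O(N²) data are double-precision values (8 bytes each is an assumption; the slide gives only the scaling and N):

```python
# Checkpoint size for one N x N matrix with N ~ 1e4 basis functions,
# assuming 8-byte doubles.
N = 10**4
bytes_per_double = 8
checkpoint_bytes = N**2 * bytes_per_double
assert checkpoint_bytes == 800_000_000   # ~0.8 GB per matrix
print(f"{checkpoint_bytes / 1e9:.1f} GB")
```

A calculation checkpointing several such matrices quickly reaches multiple gigabytes per restart file, which is why parallel I/O and filesystem reliability appear on the requirements list.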
1-sided library requirements
• GA is used for the bulk of NWChem communication
• ARMCI
• GPC
• GASNET
MPI Requirements
• 64-bit integers (again!)
• Compatibility with other communication libraries in use
Multicore programming
• Task-based parallelism (Intel Threading Building Blocks?)
• Compatible with MPI and other communication libraries
• Profiling & debugging tools
Linear Algebra - Serial
• Serial (e.g., LAPACK/BLAS)
− Requirement: ease of installation
− Optimized performance over a large range of N
− 64-bit integers (if possible)
Linear Algebra - Parallel
− Distributed arrays: interface with GA
− Ease of installation
− Extensive use of dense symmetric eigensolvers
• MRRR algorithm in ScaLAPACK (PDSYEVR by C. Voemel)
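For illustration only, the dense symmetric eigenproblem the slide refers to can be made concrete with a classical Jacobi iteration. This is not the MRRR algorithm, and production codes use (Sca)LAPACK, but it shows the computation in a few self-contained lines:

```python
import math

def jacobi_eigenvalues(a, max_rot=100, tol=1e-12):
    """Eigenvalues of a small symmetric matrix by classical Jacobi
    rotations. Illustrative only; NWChem uses ScaLAPACK/Peigs."""
    n = len(a)
    a = [row[:] for row in a]          # work on a copy
    for _ in range(max_rot):
        # Find the largest off-diagonal element a[p][q].
        p, q, big = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > big:
                    p, q, big = i, j, abs(a[i][j])
        if big < tol:
            break
        # Rotation angle that zeroes a[p][q]: tan(2θ) = 2a_pq/(a_qq - a_pp).
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        # Apply the orthogonal similarity transform R A R^T.
        for k in range(n):
            apk, aqk = a[p][k], a[q][k]
            a[p][k] = c * apk - s * aqk
            a[q][k] = s * apk + c * aqk
        for k in range(n):
            akp, akq = a[k][p], a[k][q]
            a[k][p] = c * akp - s * akq
            a[k][q] = s * akp + c * akq
    return sorted(a[i][i] for i in range(n))

vals = jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]])
assert abs(vals[0] - 1.0) < 1e-9 and abs(vals[1] - 3.0) < 1e-9
```

MRRR reaches the same eigenvalues in O(n²) after tridiagonalization instead of Jacobi's repeated O(n²) sweeps, which is why PDSYEVR matters at the matrix sizes NWChem handles.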
Optimization/Tuning
• Profiling tools
− gprof, TAU
− PAPI
− Code coverage & instrumentation
Quality/Assurance?
• QA suite in NWChem
− Perl scripts
− Reference output files
− Checks results' correctness
− Database of performance on various hardware?
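The core of such a QA check, extracting a number from an output file and comparing it to a reference within a tolerance, can be sketched as follows. The output format, label, and function names here are illustrative, not NWChem's actual QA scripts:

```python
import re

def extract_energy(text, label="Total SCF energy"):
    """Pull the last '<label> = <number>' occurrence from an output file
    (hypothetical format, for illustration)."""
    matches = re.findall(rf"{re.escape(label)}\s*=\s*(-?\d+\.\d+)", text)
    return float(matches[-1]) if matches else None

def check(output_text, reference, tol=1e-6):
    # A run passes if the extracted value agrees with the stored
    # reference to within the tolerance.
    value = extract_energy(output_text)
    return value is not None and abs(value - reference) <= tol

fake_output = "iter 12 converged\nTotal SCF energy =   -76.026760\n"
assert check(fake_output, -76.026760)
assert not check(fake_output, -76.03)
```

Taking the last occurrence matters in practice: iterative codes print the quantity many times, and only the converged value should be compared.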
Debugging
• Parallel debuggers
• printf?
Thanks
• NWChem group and GA group - PNNL
• Robert Harrison
• NCCS & CCS - ORNL
• Workshop organizers
Backup Slides
NWChem HW and SW requirements
• low latency and high bandwidth for
− I/O
− communication
• availability of a large amount of aggregate memory and disk
• flat I/O model using local disk
• 64-bit addressing (required by highly correlated methods)
• extensive use of linear algebra:
− BLAS
− FFT
− eigensolvers
• use of other scientific libraries (e.g., MASS, ACML_vec for exp and erfc)
NWChem Porting issues on the XT3
• Re-use of existing serial port for x86_64 (compilers)
• Communication library: GA/ARMCI
− Two ARMCI ports: Cray SHMEM & Portals
• Made Pario library compatible with Catamount (using glibc calls)
NWChem Porting - II
• Stability issues with the SHMEM port
− Uncovered a Portals problem: the Cray SHMEM group provided a workaround
− NWChem crashes the whole XT when running large jobs
• Performance of the SHMEM port
− Latency ~ 10 μs
− Bandwidth
• Contiguous put/get ~ 2 GB/s
• Strided put/get ~ 0.3 GB/s
• ARMCI using Portals in progress