Sustainable Software for Computational Chemistry and Materials Modeling
Presented at the 1st Workshop on Maintainable Software Practices in e-Science, Chicago, 9 October 2012. Co-located with e-Science 2012.
Beverly Sanders, University of Florida
Outline
Overview
Challenges and Current State
Overcoming the Barriers—technical and cultural
Computational Chemistry
Long history – over 50 years
Underpins a broad array of scientific applications (Grand Challenges):
Efficient combustion systems
Drug design
Understanding biological systems
Semiconductor design
Water sustainability
CO2 Sequestration
Efficient lighting (quantum dots)
…
Full partner with experiments: "computational experiments" may be more reliable than lab measurements
Scientific Software Innovation Institute for Computational Chemistry and Materials Modeling (S2I2C2M2)
Collaboration between computational chemists,
computer scientists, applied mathematicians, and
computer engineers
Goal: overcome obstacles of algorithms and culture
and change the nature of computational chemistry
software development.
A year-long conceptualization phase has been funded by NSF
First meeting scheduled for January 2013
People
Daniel Crawford (Virginia Tech)
Robert Harrison (Tennessee, ORNL)
Anna I. Krylov (U.S.C.)
Theresa Windus (Iowa State)
Emily Carter (Princeton)
Edmund Chow (Georgia Tech)
Erik Deumens (Florida)
Mark Gordon (Iowa State)
Martin Head-Gordon (Berkeley)
Todd Martinez (Stanford)
David McDowell (Georgia Tech)
Vijay Pande (Stanford)
Manish Parashar (Rutgers)
Ram Ramanujam (LSU)
Beverly Sanders (Florida)
Bernhard Schlegel (Wayne State)
David Sherrill (Georgia Tech)
Lyudmila Slipchenko (Purdue)
Masha Sosonkina (Iowa State)
Edward Valeev (Virginia Tech)
Ross Walker (San Diego Supercomputer Center)
+ others
Current State of Computational Chemistry
Long history: the legacies of modern molecular dynamics and quantum chemistry packages span decades
Both open source and commercial
Amalgam of programming languages
Domain-specific methods
Multi-dimensional integral engines
General-purpose methods
Davidson method for computing a few eigenvalues of very large matrices (dimensions in the tens of billions)
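The Davidson method finds a few extreme eigenvalues while touching the matrix only through matrix-vector products, which is what lets it handle matrices far too large to store. The following is a minimal single-root sketch in Python with NumPy, assuming a symmetric, diagonally dominant matrix and the standard diagonal preconditioner; for clarity it uses a small dense `A`, whereas production codes compute the product `A @ v` on the fly. The function name `davidson_lowest` and its parameters are illustrative, not taken from any existing package.

```python
import numpy as np


def davidson_lowest(A, tol=1e-8, max_iter=100):
    """Lowest eigenpair of a symmetric, diagonally dominant matrix A (sketch)."""
    n = A.shape[0]
    diag = np.diag(A).copy()
    # Initial guess: unit vector on the smallest diagonal element.
    V = np.zeros((n, 1))
    V[np.argmin(diag), 0] = 1.0
    for _ in range(max_iter):
        # Rayleigh-Ritz step: project A onto the subspace spanned by the columns of V.
        AV = A @ V
        H = V.T @ AV
        theta, s = np.linalg.eigh(H)
        theta0, s0 = theta[0], s[:, 0]
        x = V @ s0                    # current Ritz vector (unit norm)
        r = AV @ s0 - theta0 * x      # residual A x - theta x
        if np.linalg.norm(r) < tol:
            return theta0, x
        # Davidson correction t = (theta - D)^-1 r, where D is the diagonal of A.
        denom = theta0 - diag
        denom[np.abs(denom) < 1e-12] = 1e-12
        t = r / denom
        # Orthogonalize against the current subspace (twice, for numerical stability).
        for _ in range(2):
            t -= V @ (V.T @ t)
        norm_t = np.linalg.norm(t)
        if norm_t < 1e-14:
            return theta0, x          # subspace can no longer grow
        V = np.hstack([V, (t / norm_t)[:, None]])
    raise RuntimeError("Davidson iteration did not converge")


# Usage: a diagonally dominant test matrix with a small symmetric perturbation.
rng = np.random.default_rng(0)
n = 200
B = 0.01 * rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1.0)) + (B + B.T)
val, vec = davidson_lowest(A)
print(val, np.linalg.eigvalsh(A)[0])  # the two values should agree closely
```

The subspace stays small (a handful of vectors here) even though `A` could in principle be enormous; only the products `A @ V` and the diagonal of `A` are ever needed.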
Software is extremely complex
Example: Modern ab initio quantum chemistry simulations
Computations scale as O(N^7) or higher
Where N represents size of molecular system (number of atoms, electrons, or basis functions)
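The practical meaning of this scaling follows from one line of arithmetic: if the cost grows as N^p, multiplying the system size by a factor f multiplies the cost by f^p. The helper below is a trivial sketch of that relation (its name is illustrative); p = 7 corresponds, for example, to the perturbative triples correction in CCSD(T).

```python
def cost_ratio(n_old, n_new, power=7):
    """Factor by which an O(N**power) computation grows when N goes from n_old to n_new."""
    return (n_new / n_old) ** power


print(cost_ratio(100, 200))    # doubling N: 128.0
print(cost_ratio(100, 1000))   # ten times larger: 10000000.0
```

So a calculation that takes one hour for a given molecule would take more than five days for a molecule twice as large, before any parallelization.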
Code complexity arises naturally from problems, but
is an obstacle to long-term sustainability
is an obstacle to exploitation of (ever changing) HPC hardware
hinders education of next generation of scientists
only a handful of very senior students can make a contribution
Much recent software development focused on
exploiting parallel architectures
Varying degrees of success
With a few exceptions, still not fully exploiting
available systems
Utilizing exascale will require rethinking of approach
Desperate need for tools to generate high-performance, massively parallel code from high-level specifications
Developers
Mostly grad students and post-docs
Training in software engineering left to individual
research groups
Extent to which this is done varies
Large burden for small groups: a community approach has potential benefits for both the software and the students
Education tends to be narrow: students learn about
software their advisors are involved with
Science Drivers
Catalysis
Catalysts facilitate control of chemical reactions by raising the rates at which chemical bonds are formed or broken
Improve selectivity and control over unwanted byproducts
Decreased energy consumption
Reduction of waste stream
Rational design of catalysts for a specific application is one of the Holy Grails of modern chemistry and chemical engineering
Requires quantitative information about transition states
Low concentrations and short lifetimes of intermediates thwart experimental evaluation
Will require state-of-the-art computation combined with experiments
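Why the information about transition states has to be quantitative can be seen from the transition-state-theory (Eyring) rate expression, a standard textbook result added here for illustration rather than taken from the talk. The rate constant depends exponentially on the free-energy barrier, so the ratio of two rates depends only on the difference between their barriers:

```latex
k = \frac{k_B T}{h}\,\exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right),
\qquad
\frac{k_1}{k_2} = \exp\!\left(\frac{\Delta G_2^{\ddagger} - \Delta G_1^{\ddagger}}{RT}\right)
```

At 298.15 K, RT is about 0.59 kcal/mol, so an error of only 1 kcal/mol in a computed barrier changes the predicted rate by a factor of roughly e^{1/0.59}, about 5.4.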
Science Drivers
Organic photovoltaic cells
Potential applications: thin-film transistors, LEDs, solar
cells, optical switches
Advantages
Devices can be flexible
Inexpensive to produce
Limitations
Reduced power conversion efficiency
But the processes leading to current generation are not well understood
Overcoming the Barriers
1 year conceptualization phase funded by NSF
First meeting Jan 2013
3 working groups
Highest priority
Portable parallel infrastructure
General-purpose tensor algebra algorithms
Protocols for information exchange and code
interoperability
Education and training
Portable parallel infrastructure
Technology trends
Massive concurrency on a chip
Massive number of sockets in largest supercomputers
Heterogeneity (CPU + GPU)
Deep, complex memory hierarchies
Memory and communication bandwidth limited
Bleeding-edge applications may need to
Coordinate over 10^9 threads
Tolerate faults
Explicitly manage energy consumption
Sustainability of large and widely distributed chemistry codes
Enable most computations in chemistry
Likely will run on leadership-class machines
Working group will include computational chemists, parallel programming experts, and representatives from major tech providers (NVIDIA, Intel, IBM)
Sustainability of software developed by smaller research groups
Need to understand programming models and tools
Need to understand how both community and software can be better organized
Accelerate testing of new ideas at sufficient scale to determine their worth
Key: being able to write code and integrate into existing software.
Currently, new developments take months or years to migrate from the developers' software to other packages
General-purpose tensor algebra algorithms
Tensor algebra is ubiquitous in science and engineering
Need new approaches for computing with high-dimensional tensors
Current software handles tensors with 8 or fewer dimensions
Emerging methods require 3N dimensions where N (the
number of electrons) may be O(100) or more.
Need common framework of reusable software
elements.
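A short calculation shows why dense storage cannot support the emerging methods. A dense tensor with d indices, each taking n values, has n^d entries. The sketch below assumes double precision (8 bytes per entry) and, purely for illustration, 100 values per index for the current 8-index case and only 2 values per index for the 3N-dimensional case; the function name is hypothetical.

```python
def dense_tensor_bytes(points_per_dim, n_dims, bytes_per_entry=8):
    """Memory needed to store every entry of a dense tensor (exact integer arithmetic)."""
    return points_per_dim ** n_dims * bytes_per_entry


# An 8-index tensor with 100 values per index: 8 * 10**16 bytes, about 80 petabytes.
print(dense_tensor_bytes(100, 8))

# 3N = 300 indices (N = 100 electrons), even with just 2 values per index.
big = dense_tensor_bytes(2, 300)
print(len(str(big)), "decimal digits")
```

Even the deliberately coarse second case needs a byte count with more than 90 digits, far beyond any conceivable machine; this is the sense in which new approaches, rather than larger computers, are needed.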
Challenges for high-rank tensors
Challenges
Develop robust implementations of algorithms
Standardize data structures and algorithms, APIs,
software elements, and frameworks
Automate the derivation, transformation, and
implementation of tensor expressions.
Will require cross-disciplinary collaborations
Infrastructure will include DSLs, runtimes, and compilers, as well as static and dynamic algorithm analyses
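A classic example of the kind of transformation that automated tools for tensor expressions would perform is the four-index integral transformation from an atomic-orbital to a molecular-orbital basis, (pq|rs) = sum over a, b, c, d of C[a,p] C[b,q] C[c,r] C[d,s] (ab|cd). Evaluated as written it costs O(n^8) operations; factored into four successive one-index ("quarter") transformations it costs O(n^5). The NumPy sketch below, with random arrays standing in for real integrals and coefficients, checks that both forms agree; a tensor-expression compiler would be expected to find the factored form automatically, whereas here it is written by hand.

```python
import numpy as np


def transform_naive(eri, C):
    """One 8-index sum, evaluated directly without reordering: O(n^8)."""
    return np.einsum("ap,bq,cr,ds,abcd->pqrs", C, C, C, C, eri, optimize=False)


def transform_staged(eri, C):
    """Same result from four quarter transformations, each O(n^5)."""
    t = np.einsum("abcd,ds->abcs", eri, C)   # transform index d -> s
    t = np.einsum("abcs,cr->abrs", t, C)     # transform index c -> r
    t = np.einsum("abrs,bq->aqrs", t, C)     # transform index b -> q
    return np.einsum("aqrs,ap->pqrs", t, C)  # transform index a -> p


def op_counts(n):
    """Approximate multiply-add counts: naive vs. staged."""
    return n ** 8, 4 * n ** 5


rng = np.random.default_rng(1)
n = 6
eri = rng.standard_normal((n, n, n, n))
C = rng.standard_normal((n, n))
print(np.allclose(transform_naive(eri, C), transform_staged(eri, C)))
print(op_counts(100))
```

For n = 100 basis functions the difference is roughly 10^16 versus 4 x 10^10 operations. Real codes also exploit the permutational symmetry of the integrals, which this sketch ignores.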
Protocols for Information Exchange and Code Interoperability
Historically, the culture has been competitive rather than collaborative
Sharing of code and data limited
Theoretical methods driving code towards greater
size and complexity
Currently, progress may require substantial duplication of effort: code that could be reused is not, because of a lack of standards
Impedes new science, wastes human labor
Information Sharing
Standards for data shared between codes
Standards (or methods to convert between
standards) for stored data to facilitate mining.
Data provenance
Cannot expect all codes to use the same format, but transformation between formats can introduce errors, computational inefficiencies, and complex interfaces
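One way to keep conversions between formats from silently introducing errors is to make units and provenance part of every exchanged record, so that each conversion is explicit and recorded. The sketch below is a hypothetical minimal record for a molecular geometry; the class, field names, and JSON layout are illustrative assumptions, not an existing community standard. The conversion factor is the CODATA 2018 Bohr radius.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 value of the Bohr radius in angstrom


@dataclass
class MoleculeRecord:
    """Hypothetical exchange record: geometry with explicit units plus provenance."""
    symbols: list[str]
    coordinates: list[list[float]]  # one [x, y, z] triple per atom
    units: str                      # "angstrom" or "bohr"
    provenance: dict = field(default_factory=dict)

    def to_angstrom(self) -> MoleculeRecord:
        """Return the record in angstrom, converting and noting the conversion if needed."""
        if self.units == "angstrom":
            return self
        if self.units != "bohr":
            raise ValueError(f"unsupported units: {self.units!r}")
        coords = [[c * BOHR_TO_ANGSTROM for c in xyz] for xyz in self.coordinates]
        prov = dict(self.provenance, converted_from="bohr")
        return MoleculeRecord(list(self.symbols), coords, "angstrom", prov)


def dumps(record: MoleculeRecord) -> str:
    return json.dumps(asdict(record), sort_keys=True)


def loads(text: str) -> MoleculeRecord:
    return MoleculeRecord(**json.loads(text))


# Usage: a hypothetical "code-A" writes H2 in bohr; another program reads it in angstrom.
h2 = MoleculeRecord(["H", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]], "bohr",
                    {"program": "code-A"})
received = loads(dumps(h2)).to_angstrom()
print(received.coordinates[1][2], received.provenance)
```

The receiving program never has to guess the units, and the provenance field keeps a trail of every conversion applied along the way.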
Code Sharing
Need to establish level of interoperability
Coarse-grained
Hartree-Fock code from one app, MP2 code from another
Fine-grained
Calculate most of the one-electron contributions to a Fock matrix in one program, and the relativistic and solvent terms in another
Need architecture
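Fine-grained sharing needs an agreed contract between programs. The sketch below assumes one very simple hypothetical contract: every contributing program supplies a function that takes the density matrix in a shared basis and returns a symmetric n x n matrix, and a thin driver validates and sums the terms. The provider names and toy terms are placeholders, not real programs or real physics.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical contract: a Fock-matrix term maps the density matrix to an n x n matrix
# expressed in the same basis and units.
FockTerm = Callable[[np.ndarray], np.ndarray]


def assemble_fock(density: np.ndarray, terms: Dict[str, FockTerm]) -> np.ndarray:
    """Sum contributions from independent providers, rejecting any that break the contract."""
    n = density.shape[0]
    fock = np.zeros((n, n))
    for name, term in terms.items():
        contribution = term(density)
        if contribution.shape != (n, n):
            raise ValueError(f"{name}: expected shape {(n, n)}, got {contribution.shape}")
        if not np.allclose(contribution, contribution.T):
            raise ValueError(f"{name}: contribution is not symmetric")
        fock += contribution
    return fock


# Toy usage: "program A" supplies a fixed one-electron term, "program B" a solvent-like term.
h_core = np.array([[-1.0, -0.5], [-0.5, -1.0]])
terms = {
    "program_A_one_electron": lambda D: h_core,
    "program_B_solvent": lambda D: 0.1 * D,
}
P = np.array([[1.0, 0.2], [0.2, 1.0]])
F = assemble_fock(P, terms)
print(F)
```

The driver knows nothing about how each term is computed, only what it must look like; agreeing on that shape (basis, units, symmetry) is the architectural decision the slide calls for.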
Education and Training
Mastering existing codes is a daunting task for grad students
Chemical education culture worse than many STEM fields
PhD only requires modest coursework
OK for most fields of chemistry where undergrad training is adequate preparation for hands-on lab research
Most students unprepared for research in computational chemistry
Ad hoc training by individual research groups is inefficient
How should students be prepared for 2020 and beyond?
Programming models and tools
Multidisciplinary foundation to computational science
Reasoning about software
Manipulating software with confidence and facility
Summer school
Summer school for grad students supported by the community
Fundamental algorithms of computational chemistry
Software best practices
If S2I2C2M2 is successful
Open access to new software tools and infrastructure
Training and educational opportunities for grad students and post-docs in
Algorithms
Code standards
Software best practices
NOT the goal to produce a monolithic computational chemistry package to replace existing ones
Healthy competition is good
Sets of robust and properly validated software components that can be shared will benefit the entire community