The US DOE Exascale Computing Project (ECP): Perspective for the HEP Community

Douglas B. Kothe (ORNL), ECP Director
Lori Diachin (LLNL), ECP Deputy Director
Erik Draeger (LLNL), ECP Deputy Director of Application Development
Tom Evans (ORNL), ECP Energy Applications Lead

Blueprint Workshop on A Coordinated Ecosystem for HL-LHC Computing R&D
Washington, DC
October 23, 2019
DOE Exascale Program: The Exascale Computing Initiative (ECI)

ECI partners: US DOE Office of Science (SC) and National Nuclear Security Administration (NNSA)

ECI mission: Accelerate R&D, acquisition, and deployment to deliver exascale computing capability to DOE national labs by the early- to mid-2020s

ECI focus: Delivery of an enduring and capable exascale computing capability for use by a wide range of applications of importance to DOE and the US

Three Major Components of the ECI:
• Exascale Computing Project (ECP)
• Exascale system procurement projects and facilities: ALCF-3 (Aurora), OLCF-5 (Frontier), ASC ATS-4 (El Capitan)
• Selected program office application development (BER, BES, NNSA)
ECP Mission and Vision: Enable US revolutions in technology development; scientific discovery; healthcare; energy, economic, and national security

ECP mission:
• Develop exascale-ready applications and solutions that address currently intractable problems of strategic importance and national interest.
• Create and deploy an expanded and vertically integrated software stack on DOE HPC exascale and pre-exascale systems, defining the enduring US exascale ecosystem.
• Deliver US HPC vendor technology advances and deploy ECP products to DOE HPC pre-exascale and exascale systems.

ECP vision: Deliver exascale simulation and data science innovations and solutions to national problems that enhance US economic competitiveness, change our quality of life, and strengthen our national security.
Vision: Exascale Computing Project (ECP) Lifts All US High Performance Computing to a New Trajectory

(Figure: capability vs. time, 2016-2027, with 5X and 10X capability markers.)
Relevant US DOE Pre-Exascale and Exascale Systems for ECP
The three technical areas in ECP have the necessary components to meet national goals

ECP is an aggressive RD&D project: hardware technology advances, mission applications with an integrated software stack, and deployment to DOE HPC facilities together yield performant mission and science applications at scale.

Application Development (AD): Develop and enhance the predictive capability of applications critical to the DOE; 24 applications spanning national security, energy, earth systems, economic security, materials, and data.

Software Technology (ST): Deliver an expanded and vertically integrated software stack to achieve the full potential of exascale computing; 70 unique software products spanning programming models and runtimes, math libraries, and data and visualization.

Hardware and Integration (HI): Integrated delivery of ECP products on targeted systems at leading DOE HPC facilities; 6 US HPC vendors focused on exascale node and system design, with application integration and software deployment to facilities.
ECP is a large, complex project: Effective project management with three technical focus areas designed to deliver a capable exascale ecosystem

• Project Management (PM): Measure progress and ensure execution within scope, schedule, and budget
• Application Development (AD): Develop and enhance the predictive capability of applications critical to DOE across the science, energy, and national security mission space
• Software Technology (ST): Build a comprehensive, coherent software stack that enables the productive development of highly parallel applications that effectively target diverse exascale architectures
• Hardware and Integration (HI): A capable exascale computing ecosystem made possible by integrating ECP applications, software, and hardware innovations within DOE facilities

Distinctive characteristics
• RD&D and software development in nature
• Two sponsoring DOE programs
• Numerous participating institutions
• Decentralized cost system
• External project dependence
• Broad and qualitative mission-need requirements
• Outcomes: products and solutions
• Key performance parameters require innovation
• Application of scope contingency
• End-of-project transition
ECP by the Numbers

• A seven-year, $1.7B R&D effort that launched in 2016
• Six core DOE national laboratories: Argonne, Lawrence Berkeley, Lawrence Livermore, Oak Ridge, Sandia, Los Alamos
  – Staff from most of the 17 DOE national laboratories take part in the project
• Four focus areas: Hardware and Integration, Software Technology, Application Development, Project Management
• More than 80 top-notch R&D teams (81 in all), with roughly 1,000 researchers
• Hundreds of consequential milestones delivered on schedule and within budget since project inception
ECP Organization

Exascale Computing Project: Doug Kothe (ORNL), Project Director; Lori Diachin (LLNL), Deputy Project Director; Al Geist (ORNL), Chief Technology Officer

DOE oversight: Dan Hoag, Federal Project Director; Barb Helland, ASCR Program Manager; Thuc Hoang, ASC Program Manager

Focus areas
• Application Development: Andrew Siegel (ANL), Director; Erik Draeger (LLNL), Deputy Director
• Software Technology: Mike Heroux (SNL), Director; Jonathan Carter (LBNL), Deputy Director
• Hardware & Integration: Terri Quinn (LLNL), Director; Susan Coghlan (ANL), Deputy Director
• Project Management: Kathlyn Boudwin (ORNL), Director; Manuel Vigil (LANL), Deputy Director; Doug Collins (ORNL), Associate Director

Project office support: Megan Fielden, Human Resources; Willy Besancenez, Procurement; Sam Howard, Export Control Analyst; Mike Hulsey, Business Management; Kim Milburn, Finance Officer; Susan Ochs, Partnerships; Michael Johnson, Legal; plus points of contact at the core laboratories

Operations: Julia White (ORNL), Technical Operations; Mike Bernhardt (ORNL), Communications; Doug Collins, IT & Quality; Monty Middlebrook, Project Controls & Risk

Governance and advisory: Industry Council, Dave Kepczynski (GE), Chair; Core Laboratories Board of Directors, Bill Goldstein (Director, LLNL), Chair, and Thomas Zacharia (Director, ORNL), Vice Chair; Laboratory Operations Task Force (LOTF); DOE HPC Facilities
ECP Work Breakdown Structure (WBS): Key leaders at WBS Levels 1, 2, and 3

2.0 Exascale Computing Project: Kothe (ORNL)

2.1 Project Management: Boudwin (ORNL)
• 2.1.1 Project Planning and Management: Boudwin (ORNL)
• 2.1.2 Project Controls and Risk Management: Middlebrook (ORNL)
• 2.1.3 Business Management: Hulsey (ORNL)
• 2.1.4 Procurement Management: Besancenez (ORNL)
• 2.1.5 Information Technology and Quality Management: Collins (ORNL)
• 2.1.6 Communications and Outreach: Bernhardt (ORNL)

2.2 Application Development: Siegel (ANL)
• 2.2.1 Chemistry and Materials Applications: Deslippe (LBNL)
• 2.2.2 Energy Applications: Evans (ORNL)
• 2.2.3 Earth and Space Science Applications: Dubey (ANL)
• 2.2.4 Data Analytics and Optimization Applications: Hart (SNL)
• 2.2.5 National Security Applications: Francois (LANL)
• 2.2.6 Co-Design: Germann (LANL)

2.3 Software Technology: Heroux (SNL)
• 2.3.1 Programming Models and Runtimes: Thakur (ANL)
• 2.3.2 Development Tools: Vetter (ORNL)
• 2.3.3 Mathematical Libraries: McInnes (ANL)
• 2.3.4 Data and Visualization: Ahrens (LANL)
• 2.3.5 Software Ecosystem and Delivery: Munson (ANL)
• 2.3.6 NNSA Software Technologies: Neely (LLNL)

2.4 Hardware and Integration: Quinn (LLNL)
• 2.4.1 PathForward: de Supinski (LLNL)
• 2.4.2 Hardware Evaluation: Pakin (LANL)
• 2.4.3 Application Integration at Facilities: Hill (ORNL)
• 2.4.4 Software Deployment at Facilities: Adamson (ORNL)
• 2.4.5 Facility Resource Utilization: White (ORNL)
• 2.4.6 Training and Productivity: Barker (ORNL)

81 WBS L4 subprojects have set their FY20-23 performance baseline, with scope and technical plans to execute on the RD&D objectives in ECP's Final Design.
ECP High Level Schedule and Access to Systems
ECP applications target national problems in DOE mission areas

Health care
• Accelerate and translate cancer research (partnership with NIH)

Energy security
• Turbine wind plant efficiency
• Design and commercialization of SMRs
• Nuclear fission and fusion reactor materials design
• Subsurface use for carbon capture, petroleum extraction, waste disposal
• High-efficiency, low-emission combustion engine and gas turbine design
• Scale-up of clean fossil fuel combustion
• Biofuel catalyst design

National security
• Next-generation stockpile stewardship codes
• Reentry-vehicle-environment simulation
• Multi-physics science simulations of high-energy-density physics conditions

Economic security
• Additive manufacturing of qualifiable metal parts
• Reliable and efficient planning of the power grid
• Seismic hazard risk assessment

Earth system
• Accurate regional impact assessments in Earth system models
• Stress-resistant crop analysis and catalytic conversion of biomass-derived alcohols
• Metagenomics for analysis of biogeochemical cycles, climate change, environmental remediation

Scientific discovery
• Cosmological probe of the standard model of particle physics
• Validate fundamental laws of nature
• Plasma wakefield accelerator design
• Light source-enabled analysis of protein and molecular structure and design
• Find, predict, and control materials and properties
• Predict and control magnetically confined fusion plasmas
• Demystify origin of chemical elements
Co-design Subprojects

Co-design helps to ensure that applications effectively utilize exascale systems
• Pulls software and hardware developments into applications
• Pushes application requirements into software and hardware RD&D
• Has evolved from a best practice to an essential element of the development cycle

Co-design (CD) centers focus on a unique collection of algorithmic motifs invoked by ECP applications
• Motif: an algorithmic method that drives a common pattern of computation and communication
• CD centers must address all high-priority motifs used by ECP applications, including the new motifs associated with data science applications

Co-design is an efficient mechanism for delivering next-generation community products with broad application impact
• Evaluate, deploy, and integrate exascale hardware-aware software designs and technologies for key crosscutting algorithmic motifs into applications

The co-design centers address computational motifs common to multiple application projects:
• CEED: finite element discretization
• AMReX: block-structured adaptive mesh refinement (AMR)
• COPA: particle/mesh methods
• CODAR: data and workflows
• ExaGraph: graph-based algorithms
• ExaLearn: machine learning
Department of Energy (DOE) Roadmap to Exascale Systems: An impressive, productive lineup of accelerated-node systems supporting DOE's mission

Pre-exascale systems (TOP500 rank in parentheses):
• 2012: Titan (12), ORNL, Cray/AMD/NVIDIA; Mira (24), ANL, IBM BG/Q; Sequoia (13), LLNL, IBM BG/Q
• 2016: Cori (14), LBNL, Cray/Intel Xeon/KNL; Trinity (7), LANL/SNL, Cray/Intel Xeon/KNL; Theta (28), ANL, Cray/Intel KNL
• 2018: Summit (1), ORNL, IBM/NVIDIA; Sierra (2), LLNL, IBM/NVIDIA
• 2020: NERSC-9 Perlmutter, LBNL, Cray/AMD/NVIDIA
To date, only NVIDIA GPUs.

First U.S. exascale systems (2021-2023):
• Aurora, ANL, Cray/Intel
• Frontier, ORNL, Cray/AMD
• El Capitan, LLNL, TBD
• LANL/SNL, TBD
Three different types of accelerators!
New hardware requires fully re-examining approaches

• Code porting
• Algorithmic restructuring
• Alternate choice of physical models
• New numerical approaches

This is not just a porting exercise; codes are being redesigned with heterogeneous computing and portability in mind.

Goal: Ensure exascale hardware impacts the DOE science and engineering mission
Approach: Significant investment in scientific applications well in advance of the exascale machines
ECP Software Technology Software Ecosystem

The ECP software stack connects ECP applications to facilities, vendors, and the broader HPC community through five technical areas: programming models and runtimes, development tools, mathematical libraries, data and visualization, and software ecosystem and delivery.

Details are available publicly at https://www.exascaleproject.org/wp-content/uploads/2019/02/ECP-ST-CAR.pdf
Programming Models & Runtimes
• Enhance and prepare the OpenMP and MPI programming models (hybrid programming models, deep memory copies) for exascale
• Development of performance portability tools (e.g., Kokkos and RAJA); a minimal portability sketch follows this list
• Support alternate models for potential benefits and risk mitigation: PGAS (UPC++/GASNet), task-based models (Legion, PaRSEC)
• Libraries for deep memory hierarchies and power management
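To make "performance portability" concrete, here is a minimal, self-contained sketch using Kokkos, one of the models named above. The axpy computation and sizes are invented for illustration; the point is that the same source targets either a host backend or a GPU backend at build time:

  #include <Kokkos_Core.hpp>

  int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
      const int N = 1 << 20;
      // Views allocate in the default execution space's memory
      // (host memory for OpenMP builds, device memory for CUDA/HIP builds).
      Kokkos::View<double*> x("x", N), y("y", N);
      Kokkos::deep_copy(x, 1.0);
      Kokkos::deep_copy(y, 2.0);
      const double a = 3.0;
      // The same loop body compiles to threaded CPU code or a GPU kernel.
      Kokkos::parallel_for("axpy", N, KOKKOS_LAMBDA(const int i) {
        y(i) = a * x(i) + y(i);
      });
      Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
  }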
Development Tools
• Continued, multifaceted capabilities in the portable, open-source LLVM compiler ecosystem to support expected ECP architectures, including support for F18
• Performance analysis tools that accommodate new architectures and programming models, e.g., PAPI and TAU
Math Libraries
• Linear algebra, iterative linear solvers, direct linear solvers, integrators and nonlinear solvers, optimization, FFTs, etc.
• Performance on new node architectures; extreme strong scalability
• Advanced algorithms for multi-physics, multiscale simulation and outer-loop analysis
• Increasing quality, interoperability, and complementarity of math libraries
Data and Visualization
• I/O libraries: HDF5, ADIOS, PnetCDF
• I/O via the HDF5 API (see the sketch below)
• Insightful, memory-efficient in situ visualization and analysis; data reduction via scientific data compression
• Checkpoint restart
• Filesystem support for emerging solid-state technologies
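As a small illustration of the HDF5 API bullet above, here is a self-contained sketch that writes a checkpoint-style array; the file name, dataset name, and sizes are made up for the example:

  #include <hdf5.h>
  #include <vector>

  int main() {
    const hsize_t n = 1024;
    std::vector<double> state(n);
    for (hsize_t i = 0; i < n; ++i) state[i] = static_cast<double>(i);

    // Create a file, a 1-D dataspace, and a double-precision dataset,
    // then write the array and release the handles in reverse order.
    hid_t file  = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, &n, nullptr);
    hid_t dset  = H5Dcreate2(file, "/state", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
             state.data());
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
  }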
Software Ecosystem
• Develop the features in Spack necessary to support all ST products in E4S and the AD projects that adopt it
• Development of Spack stacks for reproducible, turnkey deployment of large collections of software
• Optimization and interoperability of containers on HPC systems
• Regular E4S releases of the ST software stack and SDKs, with regular integration of new ST products
NNSA ST
• Projects that have both a mission role and an open science role
• Major technical areas: new programming abstractions, math libraries, data and visualization libraries
• Cover most ST technology areas
• Open-source NNSA software projects
• Subject to the same planning, reporting, and review processes

ECP software technologies are a fundamental underpinning in delivering on DOE's exascale mission.
Software Development Kits (SDKs): A key delivery vehicle for ECP

An SDK is a collection of related software products (packages) where coordination across package teams improves usability and practices, and fosters community growth among teams that develop similar and complementary capabilities.

• Domain scope: the collection makes functional sense
• Interaction model: how packages interact; compatible, complementary, interoperable
• Community policies: value statements that serve as criteria for membership
• Meta-infrastructure: invokes the build of all packages (Spack); shared test suites
• Coordinated plans: inter-package planning that augments autonomous package planning
• Community outreach: coordinated, combined tutorials, documentation, best practices

ECP ST SDKs group similar products for collaboration and usability:
• Programming Models & Runtimes Core
• Tools & Technologies
• Compilers & Support
• Math Libraries (xSDK)
• Viz Analysis and Reduction
• Data mgmt., I/O Services & Checkpoint/Restart

"Unity in essentials, otherwise diversity"
ECP ST SDKs will span all technology areas

Each group below is an SDK as defined in the initial breakdown process, using criteria developed for choosing logical and effective groupings based on experience with the xSDK.

PMR Core (17): UPC++, MPICH, Open MPI, Umpire, AML, RAJA, CHAI, PaRSEC*, DARMA, GASNet-EX, Qthreads, BOLT, SICM, Legion, Kokkos (support), QUO, Papyrus

Tools and Technology (11): PAPI, Program Database Toolkit, Search (random forests), Siboka, C2C, Sonar, Dyninst Binary Tools, Gotcha, Caliper, TAU, HPCToolkit

Compilers and Support (7): OpenMP V&V, Flang/LLVM Fortran compiler, LLVM, CHiLL autotuning compiler, LLVM OpenMP compiler, OpenARC, Kitsune

Math Libraries (xSDK) (16): MAGMA, DTK, Tasmanian, TuckerMPI, SUNDIALS, PETSc/TAO, libEnsemble, STRUMPACK, SuperLU, ForTrilinos, SLATE, MFEM, KokkosKernels, Trilinos, hypre, FleCSI

Visualization Analysis and Reduction (9): zfp, VisIt, ASCENT, Cinema, Catalyst, VTK-m, SZ, ParaView, ROVER

Data management, I/O Services, Checkpoint restart (12): Parallel netCDF, ADIOS, Darshan, UnifyCR, VeloC, IOSS, HXHIM, ROMIO, Mercury (Mochi suite), HDF5, SCR, FAODEL

Ecosystem/E4S at-large (12): BEE, FSEFI, Kitten Lightweight Kernel, COOLR, NRM, ArgoContainers, Spack, MarFS, GUFI, Intel GEOPM, mpiFileUtils, TriBITS
ST Ecosystem: From products to SDKs to an integrated stack

Levels of integration, product sources, and delivery:

ST Products (existed before ECP)
• Source: ECP L4 teams; non-ECP developers; standards groups
• Delivery: directly to apps; Spack; vendor stack; facility stack

SDKs (group similar products; make them interoperable; assure policy compliance; include external products)
• Source: ECP SDK teams; non-ECP products (policy compliant, Spackified)
• Delivery: directly to apps; spack install sdk; in the future, vendor and facility stacks

E4S (build all SDKs; build the complete stack; containerize binaries; standard workflow)
• Source: ECP E4S team; non-ECP products (all dependencies)
• Delivery: spack install e4s; containers; CI testing

Together these levels form the ECP ST open product integration architecture, built up from the individual ECP ST products.
Extreme-scale Scientific Software Stack (E4S): A Spack-based distribution of ECP ST products and related and dependent software, tested for interoperability and portability to multiple architectures. Lead: Sameer Shende, University of Oregon

• Provides a distinction between SDK goals (usability, general quality, community) and E4S goals (deployment, testing)
• Will leverage and enhance the SDK interoperability thrust
• Releases:
  – Oct: E4S 0.1, 24 full and 24 partial release products
  – Jan: E4S 0.2, 37 full and 10 partial release products
• Current primary focus: facilities deployment

http://e4s.io
Monte Carlo Transport on Accelerated Node Architectures: Recent efforts in the ECP ExaSMR Subproject

Thomas M. Evans

A Coordinated Ecosystem for HL-LHC Computing R&D
Catholic University, Oct 23, 2019
ExaSMR: Modeling and Simulation of Small Modular Reactors

• Small modular nuclear reactors present significant simulation challenges
  – Small size invalidates existing low-order models
  – Natural circulation flow requires high-fidelity fluid flow simulation
• ExaSMR will couple the most accurate available methods to perform "virtual experiment" simulations
  – Monte Carlo neutronics
  – CFD with turbulence models

(Image reproduced with permission.)
MC Neutronics, petascale:
• System-integrated responses
• Single physics
• Constant temperature
• Isotopic depletion on assemblies
• Reactor startup

MC Neutronics, exascale:
• Pin-resolved (and sub-pin) responses
• Coupled with T/H
• Variable temperatures
• Isotopic depletion on full core
• Full-cycle modeling

CFD, petascale:
• Single fuel assembly
• RANS
• Within-core flow

CFD, exascale:
• Full reactor core
• Hybrid LES/RANS
• Entire coolant loop
(Figure: fuel assembly mixing vane.)
Physical Problem Characteristics: Problem Parameters

• Core characteristics
  – Full-core representative SMR model containing 37 assemblies, with 17 × 17 pin positions and 264 fuel pins per assembly
  – 10^10 particles per eigenvalue iteration
  – Pin-resolved reaction rates with 3 radial tally regions and 50-100 axial levels
  – O(150) nuclides and O(8) reactions per nuclide in each tally region
• Geometry size
  – N_cells = 1.9 × 10^6 to 8.8 × 10^6
• Tally sizes
  – N_t,cells = 4.8 × 10^5 to 5.9 × 10^6
  – N_t,bins = 1.5 × 10^9 to 1.8 × 10^10 (a worked memory estimate follows below)

Pin geometry: r_fuel = 0.406 cm, r_gap = 0.414 cm, r_clad = 0.475 cm; fuel (UO2), gap (He), clad (Zr)
Pin pitch = 1.26 cm; assembly pitch = 21.5 cm; height = 227.56 cm
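To make the tally footprint concrete, here is a back-of-envelope worked example; the 8 bytes per bin is an assumption for illustration (a single double-precision accumulator per bin), not a statement of the actual tally layout in Shift:

\[
N_{t,\mathrm{bins}} \times 8\ \mathrm{B} \;\approx\; 1.5\times10^{9}\times 8\ \mathrm{B} = 12\ \mathrm{GB}
\quad\text{to}\quad
1.8\times10^{10}\times 8\ \mathrm{B} = 144\ \mathrm{GB}.
\]

Even the low end is a sizable fraction of a single GPU's memory, which is part of why the tally treatment (hardware atomics, direct-sort addressing) discussed below matters.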
Monte Carlo Neutron Transport Challenges

• MC neutronics is a stochastic method
  – Independent random walks are not readily amenable to SIMT algorithms, limiting on-node concurrency
  – Sampling data is randomly accessed
  – Sampling data is characterized by detailed structure
  – There is large variability in transport distributions, both within and between particle histories
Developing GPU Continuous-Energy Monte Carlo: Intra-Node

• Focus on high-level thread divergence
• Optimize for device occupancy
  – Separate geometry and physics kernels to increase occupancy
  – Boundary crossings (geometry)
  – Collisions (physics)
• Smaller kernels help address the variability in particle transport distributions
• Partition macroscopic cross section calculations between fuel and non-fuel regions, with separate kernels for each
• Use hardware atomics for tallies and direct-sort addressing
• Judicious use of texture memory
  – __ldg on data interpolation bounds

Simple event-based transport algorithm (a CUDA-flavored sketch of this loop follows below):

  get vector of source particles
  while any particles are alive do
    for each living particle do        // geometry kernel
      move particle: dist-to-collision, dist-to-boundary, move-to-next
    end for
    for each living particle do        // physics kernel
      process particle collision
    end for
    source particles
    sort/consolidate surviving particles
  end while
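As a concrete, deliberately simplified illustration of this event-based structure, here is a self-contained CUDA sketch. The 1-D geometry, constant cross section, toy RNG, and all names are invented for the example; the real Shift kernels are far richer, but the shape of the loop (separate, small geometry and physics kernels over a vector of particles, plus atomic tally updates) is the point:

  #include <cuda_runtime.h>
  #include <cstdio>
  #include <cmath>

  struct Particle { double x, dir, w; int alive; };

  // Toy per-particle LCG so the example is self-contained.
  __device__ double rng(unsigned long long* s) {
    *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*s >> 11) * (1.0 / 9007199254740992.0);
  }

  // Geometry-style kernel: advance every live particle to its next event.
  __global__ void move_kernel(Particle* p, unsigned long long* seed, int n,
                              double sigma_t, double length) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || !p[i].alive) return;
    double d = -log(1.0 - rng(&seed[i])) / sigma_t;  // distance to collision
    p[i].x += p[i].dir * d;
    if (p[i].x < 0.0 || p[i].x > length) p[i].alive = 0;  // leaked
  }

  // Physics-style kernel: process collisions and score a tally atomically.
  __global__ void collision_kernel(Particle* p, unsigned long long* seed,
                                   int n, double abs_frac, double* tally) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || !p[i].alive) return;
    atomicAdd(tally, p[i].w);                          // collision estimator
    if (rng(&seed[i]) < abs_frac) p[i].alive = 0;      // absorbed
    else p[i].dir = rng(&seed[i]) < 0.5 ? -1.0 : 1.0;  // isotropic in 1-D
  }

  int main() {
    const int n = 1 << 16;
    Particle* p; unsigned long long* seed; double* tally;
    cudaMallocManaged(&p, n * sizeof(Particle));
    cudaMallocManaged(&seed, n * sizeof(unsigned long long));
    cudaMallocManaged(&tally, sizeof(double));
    *tally = 0.0;
    for (int i = 0; i < n; ++i) {
      p[i] = {50.0, i % 2 ? 1.0 : -1.0, 1.0, 1};
      seed[i] = 1234ULL + i;
    }
    dim3 block(256), grid((n + 255) / 256);
    int alive = n;
    while (alive > 0) {                  // "while any particles are alive"
      move_kernel<<<grid, block>>>(p, seed, n, 1.0, 100.0);
      collision_kernel<<<grid, block>>>(p, seed, n, 0.5, tally);
      cudaDeviceSynchronize();
      alive = 0;                         // host-side check for brevity; a
      for (int i = 0; i < n; ++i)        // real code would sort/compact the
        alive += p[i].alive;             // surviving particles on the device
    }
    printf("collisions scored per particle: %f\n", *tally / n);
    cudaFree(p); cudaFree(seed); cudaFree(tally);
    return 0;
  }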
Production continuous-energy Monte Carlo transport solver on GPUs

• The petascale implementation did not use GPU hardware
• Enables three-dimensional, fully-depleted SMR core models simulated using continuous-energy physics and pin-resolved reaction rates with temperature dependence
• Algorithmic improvements offer a 10x speedup relative to the initial implementation and nearly a 60x per-node speedup over Titan
• Nearly perfect parallel scaling efficiency on ORNL's Summit supercomputer
• The GPU algorithm executes more than 20x faster than the CPU algorithm on Summit (per full node)
• The paper below describes the first production MC solver implementation on GPUs

Hamilton, S.P., Evans, T.M., 2019. Continuous-energy Monte Carlo neutron transport on GPUs in the Shift code. Annals of Nuclear Energy 128, 236-247. https://doi.org/10.1016/j.anucene.2019.01.012

(Figures: total reaction rate in the SMR core; increase in particle tracking rate across GPU computing architectures.)
Cross section calculations

• Computing transport cross sections requires contributions from the various constituent nuclides:

\[
\Sigma(E) \;=\; \sum_{m=1}^{M} N_m\, \sigma_m(E)
\]

• Fuel compositions contain substantially more nuclides than non-fuel compositions
• Partition mixtures into fuel and non-fuel
  – Evaluate cross sections in separate kernels to reduce divergence (see the sketch below)
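A minimal sketch of the partitioning idea, with invented names and a flat per-cell nuclide-density layout; launching one kernel instance over fuel cells and another over non-fuel cells keeps loops of very different lengths out of the same warps:

  #include <cuda_runtime.h>

  // Macroscopic cross section: Sigma(E) = sum_m N_m * sigma_m(E).
  // 'dens' holds number densities N_m; 'micro' holds per-nuclide
  // microscopic cross sections already interpolated to energy E.
  __device__ double macro_xs(const double* dens, const double* micro, int m) {
    double sigma = 0.0;
    for (int j = 0; j < m; ++j) sigma += dens[j] * micro[j];
    return sigma;
  }

  // One launch per region type: n_nuc is ~O(150) for the fuel launch and
  // much smaller for the non-fuel launch, so threads within a launch run
  // near-identical loop counts and divergence stays low.
  __global__ void eval_xs(const int* cells, int n_cells,
                          const double* dens, const double* micro,
                          int n_nuc, double* sigma_out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_cells) return;
    int c = cells[i];  // cell index into the per-cell nuclide tables
    sigma_out[c] = macro_xs(dens + c * n_nuc, micro + c * n_nuc, n_nuc);
  }

  int main() {
    // Illustrative launch pattern for the fuel partition; a second launch
    // with a smaller n_nuc would cover the non-fuel cells.
    const int n_fuel = 1024, n_nuc_fuel = 150;
    int* fuel_cells; double *dens, *micro, *sigma;
    cudaMallocManaged(&fuel_cells, n_fuel * sizeof(int));
    cudaMallocManaged(&dens,  n_fuel * n_nuc_fuel * sizeof(double));
    cudaMallocManaged(&micro, n_fuel * n_nuc_fuel * sizeof(double));
    cudaMallocManaged(&sigma, n_fuel * sizeof(double));
    for (int i = 0; i < n_fuel; ++i) fuel_cells[i] = i;
    for (int i = 0; i < n_fuel * n_nuc_fuel; ++i) { dens[i] = 1e-3; micro[i] = 2.0; }
    eval_xs<<<(n_fuel + 255) / 256, 256>>>(fuel_cells, n_fuel, dens, micro,
                                           n_nuc_fuel, sigma);
    cudaDeviceSynchronize();
    cudaFree(fuel_cells); cudaFree(dens); cudaFree(micro); cudaFree(sigma);
    return 0;
  }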
Occupancy

• The flattened algorithm allows small, focused kernels
  – Split geometry/physics components to reduce register usage
  – Smaller kernels = higher occupancy (a worked example of the register arithmetic follows the table)

MC type             Algorithm      Registers  Occupancy
Multigroup          History-based  85         25%
Multigroup          Event-based    83         25%
Continuous-energy   History-based  168        12.5%
Continuous-energy   Event-based    62         50%
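To see how register pressure produces these occupancy numbers, here is a worked example. It assumes P100/V100-class SM limits (65,536 registers and 2,048 resident threads per SM), 256-thread blocks, and register allocation rounded up to a granularity of 8 per thread; the launch parameters are illustrative assumptions, not values quoted on the slide:

\[
\text{CE history-based: } 2 \times 256 \times 168 = 86{,}016 > 65{,}536
\;\Rightarrow\; \text{1 resident block} \;\Rightarrow\; \tfrac{256}{2048} = 12.5\%
\]
\[
\text{CE event-based: } 62 \to 64 \text{ regs},\quad
4 \times 256 \times 64 = 65{,}536
\;\Rightarrow\; \text{4 resident blocks} \;\Rightarrow\; \tfrac{1024}{2048} = 50\%
\]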
Effect of varying occupancy

• Artificially limit occupancy by allocating dynamic shared memory at launch (see the sketch below)
  – kernel<<<grids, blocks, shared_mem>>>(...)

Occupancy (%)  History-based  Event-based  Flattened event-based
12.5           3.7            3.4          8.2
25.0           -              5.8          13.3
37.5           -              -            14.5
50.0           -              -            16.9
62.5*          -              -            18.0

*Only applied to the "distance to collision" kernel.
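Here is a small, self-contained sketch of that occupancy-limiting trick. The kernel body is a placeholder, and cudaOccupancyMaxActiveBlocksPerMultiprocessor is used only to report the effect of the extra dynamic shared-memory request:

  #include <cuda_runtime.h>
  #include <cstdio>

  __global__ void work(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    extern __shared__ char pad[];  // never used: exists only to occupy SMEM
    (void)pad;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;  // placeholder computation
  }

  int main() {
    const int n = 1 << 20, block = 256;
    float* x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    // Sweep the dynamic shared-memory request: larger requests allow
    // fewer resident blocks per SM, i.e., lower occupancy.
    for (size_t smem = 0; smem <= 32 * 1024; smem += 8 * 1024) {
      int blocks_per_sm = 0;
      cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, work,
                                                    block, smem);
      work<<<(n + block - 1) / block, block, smem>>>(x, n);
      cudaDeviceSynchronize();
      printf("smem/block = %5zu B -> %d resident blocks/SM\n",
             smem, blocks_per_sm);
    }
    cudaFree(x);
    return 0;
  }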
CPU vs. GPU performance

(Figure: CPU tracking rate per core and GPU core-equivalents across architectures.)

GPU performance increases have outpaced the corresponding CPU improvements.
Device saturation

(Figure: particle tracking rate vs. particles per GPU for the depleted SMR core.)

The newest architectures remain unsaturated even at 1M particles per GPU.
Inter-node Scaling

• Weak scaling on Summit, with 1 GPU per MPI rank
• Domain-replication parallelism: each rank holds a full copy of the problem and tracks N/P of the particles (see the sketch below)
• Multi-set domain decomposition topology (in development for GPU)
  – Intra-set non-uniform blocking to address load balancing

Ellis, J.A., Evans, T.M., Hamilton, S.P., Kelley, C.T., Pandya, T.M., 2019. Optimization of processor allocation for domain decomposed Monte Carlo calculations. Parallel Computing 87, 77-86. https://doi.org/10.1016/j.parco.2019.06.001
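A minimal sketch of domain-replication parallelism, assuming the simplest possible scheme (every rank owns a full geometry copy, tracks its share of the histories, and the tallies are combined in a single reduction); the function names and toy tally are invented:

  #include <mpi.h>
  #include <cstdio>

  // Stand-in for a full on-rank transport sweep over n_local particles;
  // in the real code this is the GPU tracking loop (1 GPU per MPI rank).
  double track_particles(long n_local) {
    double tally = 0.0;
    for (long i = 0; i < n_local; ++i)
      tally += 1.0;  // placeholder score per history
    return tally;
  }

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_total = 1L << 24;    // N histories per iteration
    long n_local = n_total / size;    // each of P ranks tracks N/P histories
    double local = track_particles(n_local);

    // Replicated domains -> one reduction combines the distributed tallies.
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) printf("global tally: %f\n", global);
    MPI_Finalize();
    return 0;
  }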
On-the-Fly Doppler Broadening

• Cross section resonances broaden significantly due to the thermal motion of nuclei
• The cross section σ at any energy E and temperature T can be expressed as a summation over contributions from poles p_j and corresponding residues r_j:

\[
\sigma(E,T) \;=\; \frac{1}{E}\sqrt{\frac{A\pi}{k_B T}}\;\sum_{j} \Re\!\left[\, r_j\, W\!\left(\left(\sqrt{E}-p_j\right)\sqrt{A/k_B T}\right) \right]
\]

• A polynomial approximation can be used to reduce the number of W(·) evaluations:

\[
\sigma(E,T) \;=\; \frac{1}{E}\sqrt{\frac{A\pi}{k_B T}}\;\sum_{j} \Re\!\left[\, r_j\, W\!\left(\left(\sqrt{E}-p_j\right)\sqrt{A/k_B T}\right) \right] \;+\; \sum_{n=0}^{N-1} a_{w,n}\,\mathfrak{D}_n
\]
GPU Performance

• Performance testing with a quarter-core model of the forthcoming NuScale Small Modular Reactor (SMR)
• No significant sacrifice of accuracy compared to standard continuous-energy (CE) data
• Each GPU thread does individual Faddeeva evaluations (no vectorization over nuclides)
• Factor of 2-3 performance penalty on both the CPU and GPU for arbitrary temperature resolution

Test hardware: 2x IBM POWER8 + 4x NVIDIA Tesla P100
Geant-based proxy pilot

Goals
• Research and develop design patterns suitable for HEP transport on GPUs
• Produce a proxy app with limited but representative physics processes
• Execute and profile the proxy app at the scale needed by next-generation HEP experiments

Challenges
• Choosing a scope small enough to digest, yet able to emulate the level of complexity of a real simulation
• Reconciling the static (build-time) preference of GPU code with dynamic user requirements
• Effectively utilizing the GPU with a very broad, flat call graph (dozens of independent physics processes)
Geant-based proxy pilot: status

Complete
• Developed a requirements document for the proxy app
• Constructed the development framework (CMake/Docker/CUDA)
• Integrated the CUDA-enabled VecGeom geometry

In progress
• Iterating on the high-level code architecture and event loop
• Implementing physics kernels in CUDA
• Awaiting onboarding of a postdoc...

Future work
• Explore HIP in preparation for Frontier
• Evaluate ClangJIT for GPU-friendly dynamism