TRANSCRIPT
1
Department of Energy Office of High Energy and Nuclear Physics
Computational Science: present and projected potentials
David J. Dean, ORNL
NERSC-NUG/NUGeX meeting, 22-23 February 2001
Outline:
• Very general overview of HENP
• Some project overviews
  • Lattice QCD
  • PDSF
  • Nuclear Structure
  • Accelerator Design
  • Astrophysics
2
DOE has led the Nation in Developing Major Accelerator Facilities
• Fermi National Accelerator Laboratory: Tevatron
• Stanford Linear Accelerator Center: SLC, PEP-II
• Thomas Jefferson National Accelerator Facility: CEBAF
• Brookhaven National Laboratory: RHIC
• Lawrence Berkeley National Laboratory: ALS
• Argonne National Laboratory: ATLAS
• Los Alamos National Laboratory: LANSCE/Lujan
• Oak Ridge National Laboratory: SNS
• Light/neutron sources: SSRL, APS, NSLS, IPNS
From Rob Ryne
3
Some of the Science: HENP
[Figure: schematic of the physics landscape — quarks and gluons (QCD) through the quark-gluon plasma (RHIC); the vacuum and the Standard Model; weak decays and mesons (FNAL, SLAC); nucleons (CEBAF); few-body systems governed by the free NN force; many-body systems governed by the effective NN force, spanning few nucleons to heavy nuclei (RIA). SNS: neutrons and molecules.]
4
Lattice Quantum ChromoDynamics (LQCD)
Comprehensive method to extract, with controlled systematic errors, first-principles predictions from QCD for a wide range of important particle phenomena.
Scientific Motivations:
1) Tests of the Standard Model:
   • Quark mixing matrix elements: Vtd, Vts
   • CP-violating K-meson decays.
2) Quark and gluon distributions in hadrons
3) Phase transitions of QCD (in search of the quark-gluon plasma).
Concern I: Lattice spacing (x, y, z, t)
Concern II: Quenched approximation
5
2 (of many) LQCD Examples and people
NERSC involvement (PI’s):
370,000 (Toussaint)
210,000 (Gupta)
190,000 (Soni)
187,500 (Sinclair)
150,000 (Negele)
100,000 (Liu)
 40,000 (Lee)
-----------
1.2 million+ MPP hours
I: QGP formation
1999
2008
II: Lepton decay constants of B-mesons
Unitarity triangle: better LQCD calculations constrain physical parameters tremendously.
6
LQCD Computational Needs (from Doug Toussaint)
The lattice is in 4 dimensions (3 space, 1 time):
• Lattice spacing 1/sqrt(2) of current calculations.
• Implies 8X computer power.
• Would cut systematic errors in half.
• Scientific gain: push to smaller quark masses and study more complicated phenomena like flavor-singlet meson masses.
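The 8X figure above follows from a back-of-envelope cost model. One common rough assumption (not spelled out on the slide) is that cost grows as a^-6: four powers from the number of 4-D lattice sites and roughly two more from solver slowdown as the spacing shrinks; the exact exponent varies by algorithm.

```python
import math

def cost_factor(spacing_ratio, exponent=6):
    """Relative computer power needed when the new lattice spacing
    is spacing_ratio times the old one, assuming cost ~ a**-exponent."""
    return spacing_ratio ** (-exponent)

# Spacing reduced to 1/sqrt(2) of current calculations:
factor = cost_factor(1 / math.sqrt(2))
print(factor)  # 8.0, matching the "8X computer power" on the slide
```

(1/sqrt(2))^-6 = 2^3 = 8, so halving the spacing twice over would cost 64X under the same assumption.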
What is important to this community?
• Sustained memory bandwidth and cache performance (present performance on the SP at SDSC: 170 Mflops/processor; on the big problem: 70 Mflops/processor due to fewer cache hits).
• Node interconnect bandwidth and latency very important. Frequent global reductions (gsum).
• Tremendous potential here; may not be a NERSC issue.
Given the straw machine (60 Tflops)… the equation of state for high-temperature QCD using 3 dynamical flavors and a lattice spacing of 0.13 fm would be practical.
Main computational challenge: inversion of the fermion matrix (sparse matrix solution).
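Such inversions are typically done with iterative Krylov solvers. A minimal conjugate-gradient sketch on a toy sparse, symmetric positive-definite matrix (a stand-in for the fermion matrix, which in practice is attacked through its normal equations) might look like:

```python
import numpy as np
from scipy.sparse import diags

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for sparse symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy stand-in: a 1-D discrete Laplacian plus a mass-like diagonal term.
n = 100
A = diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))  # residual norm, effectively zero
```

The dominant operation is the sparse matrix-vector product each iteration, which is why memory bandwidth and cache behavior (noted below) dominate performance.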
7
Parallel Distributed Systems Facility (PDSF)
Today:
• BaBar (SLAC B-Factory): CP violation
• E871 (AGS): CP violation in hyperon decays
• CDF (Fermilab): proton-antiproton collider
• D0 (Fermilab)
• E895 (AGS): relativistic heavy ions
• E896 (AGS): relativistic heavy ions
• NA49 (CERN): relativistic heavy ions
• Phenix: RHIC at Brookhaven
• GC5: data mining for the quark-gluon plasma
• STAR: RHIC at Brookhaven (85%)
• AMANDA: Antarctic Muon and Neutrino Detector Array
• SNO (Sudbury): solar neutrinos
Evolving to ALICE at the LHC and ICE CUBE in the Antarctic.
Experimental example: one STAR event at RHIC
Computational characteristics:
• Processing independent event data is naturally parallel
• Large data sets
• Distributed or global nature of the complete computing picture
A theoretical point of view:
8
• Current STAR activity continues (very certain)
• Upgrade to STAR that increases the data rate by 3x around 2004
• Another large experiment (e.g., ALICE or ICE CUBE) chooses PDSF as a major center, with usage comparable to STAR, in 2005+
Planning Assumptions
Evolution of PDSF (from Doug Olson)
PDSF hours needed (1 PDSF hour = 1 T3E hour):
FY01: 1.2 M  FY02: 1.7 M  FY03: 2.3 M  FY04: 7.0 M  FY05: 20 M  FY06: 28 M
Disk storage capacity (terabytes):
FY01: 16  FY02: 32  FY03: 45  FY04: 134  FY05: 375  FY06: 527
Mass storage:
       TeraBytes   Millions of files
FY01:    16          1
FY02:    32          2
FY03:    45          3
FY04:   134          9
FY05:   376         20
FY06:   527         30
Throughput to NERSC (MB/s):
FY01: 5  FY02: 10  FY03: 15  FY04: 45  FY05: 120  FY06: 165
Other important factor: HENP experiments are moving towards data grid services; NERSC should plan to be a full-function site on the grid.
9
Computational Nuclear Structure
Towards a unified description of the nucleus
[Figure: chart of the nuclides (protons vs. neutrons) marking the magic numbers 2, 8, 20, 28, 50, 82, and 126, and showing where each theoretical approach applies — ab initio few-body calculations (A=10), No-Core Shell Model and G-matrix (A=12), 0-hbar-omega shell model (A~60), and density functional theory / self-consistent mean field for heavy nuclei — along with the r-process and rp-process paths and the limits of nuclear existence.]
10
Nuclear Structure Examples: Quantum Monte Carlo
For A=10, each state takes 1.5 Tflop-hours.
ANL/LANL/UIUC: NN + 3N interactions
Physics of medium-mass nuclei: nuclear shell model with effective NN interactions; application to SN-Ia nucleosynthesis.
Start with a realistic NN potential fit to low-energy NN scattering data, plus a 3-body potential; nuclear-structure calculations are then performed using GFMC techniques.
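GFMC itself is far beyond a slide-sized example, but the Metropolis sampling that quantum Monte Carlo methods build on fits in a few lines. A toy variational Monte Carlo for a 1-D harmonic oscillator (hbar = m = omega = 1, trial wavefunction psi = exp(-alpha x^2)); all parameter choices are illustrative:

```python
import math
import random

def local_energy(x, alpha):
    # E_L(x) = alpha + x^2 (1/2 - 2 alpha^2) for psi = exp(-alpha x^2)
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)

def vmc_energy(alpha, steps=200_000, step_size=1.0, seed=1):
    """Metropolis walk sampling |psi|^2, averaging the local energy."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        # Accept with probability min(1, |psi(x_new)|^2 / |psi(x)|^2)
        if rng.random() < math.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x = x_new
        total += local_energy(x, alpha)
    return total / steps

e = vmc_energy(alpha=0.5)
print(e)  # 0.5: the exact ground-state energy, since alpha=0.5 is exact
```

At alpha = 0.5 the trial wavefunction is exact, so the local energy is constant and the estimate has zero variance — a one-dimensional glimpse of why good trial wavefunctions matter for the A=10 calculations above.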
11
NERSC involvement (PI’s):
125,000 (Pieper)
 70,000 (Dean)
 60,000 (Carlson)
 60,000 (Alhassid)
-----------
0.32 million+ MPP hours
Projected needs for nuclear structure
Physics to be addressed using GFMC:
• 12C structure and 3-alpha burning
• nuclear matter at finite temperature
• asymmetric nuclear matter
FY    K-MPP hours (total)
02      400
03      700
04     1000
05     1700
06     3000
Memory needs are very important: 1 Gbyte memory/CPU by 2004.
Sustained memory bandwidth and/or cache performance is also very important. Pieper is seeing a drop in performance when more CPUs are clustered on a node.
Physics to be addressed using AFMC/NSM:
Nuclear structure of A=60-100 nuclei; studies of weak interactions, thermal properties, and r-process nucleosynthesis.
FY    K-MPP hours (NERSC only)
02      200
03      300
04      450
05      600
06      800
Memory needs: 0.25 Gbyte memory/CPU by 2004.
Cache performance important (many matrix-matrix multiplies).
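Cache blocking is the standard way to make those matrix-matrix multiplies cache-friendly: operating on small tiles keeps each tile resident in cache while it is reused. An illustrative tiled multiply (production codes would call a tuned BLAS dgemm instead):

```python
import numpy as np

def blocked_matmul(A, B, tile=32):
    """Tiled matrix multiply: each tile-sized product reuses operands
    that are already in cache, unlike a naive row-by-column sweep."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
print(np.allclose(blocked_matmul(A, B), A @ B))  # True: same result
```

The tile size would be tuned so three tiles fit in the target cache level, which is exactly the kind of machine-dependent tuning the memory-bandwidth concerns above point at.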
12
Next-generation machines will require extreme precision and control, pushing the frontiers of beam energy, beam intensity, and system complexity.
(supplied by Rob Ryne)
• Simulation requirements/issues:
– require high resolution
– are enormous in size
– CPU-intensive
– highly complex
• Physics issues:
– highly three-dimensional
– nonlinear
– multi-scale
– many-body
– multi-physics
Tau3P
Omega3P
IMPACT
• Terascale simulation codes are being developed to meet the challenges
13
Challenges in Electromagnetic Systems Simulation: Example – NLC Accelerator Structure (RDDS) Design
Require 0.01% accuracy in accelerating frequency to maintain structure efficiency (high-resolution modeling)
Parallel solvers needed to model large, complex 3D electromagnetic structures to high accuracy
• Start with cylindrical cell geometry
• Adjust geometry for maximum efficiency
• Add micron-scale variations from cell to cell to reduce wakefields
• Stack into a multi-cell structure
• Add damping manifold to suppress long-range wakefields and improve vacuum conductance, while preserving RDDS performance. Highly 3D structure.
Verify wake suppression in the entire 206-cell section (system-scale simulation)
14
Computer Science Issues
• Meshing: mesh generation, refinement, quality.
– Complex 3-D geometries – structured and unstructured meshes, and eventually overset meshes.
• Partitioning: domain decomposition.
• Load balancing.
• Impact of memory hierarchy on efficiency. Cache, locally-shared memory, remote memory.
• Visualization of large data sets.
PEP-II cavity model with mesh refinement – accurate wall-loss calculation needed to guide cooling-channel design.
• Performance, scalability, and tuning on terascale platforms
15
• Simulation size for 3D modeling of rf linacs:
– (128^3 to 512^3 grid points) x (~20 particles/point) = 40M-2B particles
– 2D linac simulations with 1M particles require a weekend on a PC
• A 100M-particle PC simulation, if it could be performed, would take 7 months
• New 3D codes already enable 100M-particle runs in 10 hours using 256 processors
• Intense beams in rings (PSR, AGS, SNS ring):
– 100 to 1000 times more challenging than linac simulations
Challenges in Beam Systems Simulation
NERSC involvement: 0.600+ million MPP hours (mainly Rob Ryne)
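The particle counts quoted for the 3-D rf-linac runs follow directly from grid size times particles per point; a quick sanity check:

```python
def particle_count(points_per_side, particles_per_point=20):
    """Total macroparticles: (points_per_side)^3 grid points,
    ~20 particles per grid point, per the slide's estimate."""
    return points_per_side ** 3 * particles_per_point

low = particle_count(128)    # ~42 million  -> the "40M" end
high = particle_count(512)   # ~2.7 billion -> the "2B" end
print(low, high)
```

So the quoted 40M-2B range corresponds to rounding 41.9M down and 2.68B down, consistent with the slide.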
16
Supernova simulations
Spherically symmetric simulations of core collapse, including Boltzmann neutrino transport, fail to explode. This indicates the need to a) improve nuclear physics inputs and b) move to 2- and 3-dimensional simulations. Calculations done on PVP platforms, presently moving to MPP.
From Mezzacappa
Core Collapse Supernova
Type Ia supernova (from Nugent)
17
Supernova simulations computational needs
Important computational issues:
• Optimized floating-point performance
• Memory performance is very important
• Infrequent 50-Mbyte messages (send/receive)
• Communications with global file systems
• Crucial need for global storage (GPFS) with many I/O nodes
• HPSS is very important for data storage
With the straw system:
• Chaotic velocity fields; 2D and maybe 3D calculations with good input physics.
• SMP somewhat useless for this application (CPUs on one node run independently using MPI).
Supernova Cosmology project
Core Collapse Supernova project (hydro + Boltzmann neutrino transport)
NERSC-4 platform:
Year   3-D MGFLD models         2-D MGBT models
       node hrs    Memory       node hrs     Memory
1      520,000     62G          --           --
2      260,000     62G          260,000      25G
3      --          --           750,000      25G
4      --          --           1,000,000    100G
5      ? (3-D MGBT?)            2,000,000    256G
From Peter Nugent
Assumptions:
Yr. 1: 3-D Newtonian MGFLD to understand convection when compared to 2-D
Yr. 2: general relativistic 3-D MGFLD to compare to Newtonian models
Yr. 3: 2-D MGBT at moderate resolution
Yr. 4: 2-D MGBT at high resolution with AMR technology
Yr. 5: may expand to 3-D MGBT… but will require growth of NERSC (NERSC-5 phase-in?)
From Doug Swesty
Current NERSC involvement:
 Nugent: 125,000 MPP
 Mezzacappa: 43,500 MPP
 Total: 0.15+ million MPP hours
18
People I left out
• Haiyan Gao (MIT): 3-body problem; relativistic effects in (e,e’) scattering. 24,000 MPP/year
• G. Malli / Walter Loveland (Simon Fraser): coupled-cluster methods for the chemical structure of superheavy elements. (15,000 PVP)
Big user who did not respond:
• Chan Joshi – 287,500 MPP hours (plasma-driven accelerators)
General conclusions:
• More CPU is good for most people.
• A Bytes/Flop ratio of 0.5 is okay for most people.
• Concern with memory access on a single node.
• Concern with access to large disk space and HPSS (important to several groups).
• Exciting physics portfolio matched with DOE facilities and requiring computation.