TRANSCRIPT
1
Department of Energy Office of High Energy and Nuclear Physics
Computational Science: present and projected potentials
David J. Dean, ORNL
NERSC-NUG/NUGeX meeting, 22-23 February 2001
Outline:
• Very general overview of HENP
• Some project overviews
  • Lattice QCD
  • PDSF
  • Nuclear Structure
  • Accelerator Design
  • Astrophysics
2
DOE has led the Nation in Developing Major Accelerator Facilities
• Fermi National Accelerator Laboratory: Tevatron
• Stanford Linear Accelerator Center: SLC, PEP-II
• Thomas Jefferson National Accelerator Facility: CEBAF
• Brookhaven National Laboratory: RHIC
• Lawrence Berkeley National Laboratory: ALS
• Argonne National Laboratory: ATLAS
• Los Alamos National Laboratory: LANSCE/Lujan
• Oak Ridge National Laboratory: SNS
• Light/neutron sources: SSRL, APS, NSLS, IPNS
From Rob Ryne
3
Some of the Science: HENP
[Figure: schematic of the physics landscape — quarks and gluons (QCD) through the quark-gluon plasma (RHIC); the vacuum and the Standard Model; weak decays and mesons (FNAL, SLAC); nucleons (CEBAF); few-body systems governed by the free NN force; many-body systems governed by the effective NN force, spanning few nucleons to heavy nuclei (RIA). SNS: neutrons and molecules.]
4
Lattice Quantum ChromoDynamics (LQCD)
Comprehensive method to extract, with controlled systematic errors, first-principles predictions from QCD for a wide range of important particle phenomena.
Scientific Motivations:
1) Tests of the Standard Model:
   • Quark mixing matrix elements: Vtd, Vts
   • CP-violating K-meson decays.
2) Quark and gluon distributions in hadrons
3) Phase transitions of QCD (in search of the quark-gluon plasma).
Concern I: Lattice spacing (x, y, z, t)
Concern II: Quenched approximation
5
2 (of many) LQCD Examples and people
NERSC involvement (PI’s):
370,000 (Toussaint)
210,000 (Gupta)
190,000 (Soni)
187,500 (Sinclair)
150,000 (Negele)
100,000 (Liu)
 40,000 (Lee)
-----------
1.2 million+ MPP hours
I: QGP formation
1999
2008
II: Lepton decay constants of B-mesons
Unitarity triangle: better LQCD calculations constrain physical parameters tremendously.
6
LQCD Computational Needs (from Doug Toussaint)
The lattice is in 4 dimensions (3 space, 1 time):
• Lattice spacing 1/sqrt(2) of current calculations.
• Implies 8X computer power.
• Would cut systematic errors in half.
• Scientific gain: push to smaller quark masses and study more complicated phenomena like flavor-singlet meson masses.
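The 8X figure above follows from a back-of-envelope cost model. One common rough assumption (not spelled out on the slide) is that cost grows as a^-6: four powers from the number of 4-D lattice sites and roughly two more from solver slowdown as the spacing shrinks; the exact exponent varies by algorithm.

```python
import math

def cost_factor(spacing_ratio, exponent=6):
    """Relative computer power needed when the new lattice spacing
    is spacing_ratio times the old one, assuming cost ~ a**-exponent."""
    return spacing_ratio ** (-exponent)

# Spacing reduced to 1/sqrt(2) of current calculations:
factor = cost_factor(1 / math.sqrt(2))
print(factor)  # 8.0, matching the "8X computer power" on the slide
```

(1/sqrt(2))^-6 = 2^3 = 8, so halving the spacing twice over would cost 64X under the same assumption.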
What is important to this community?
• Sustained memory bandwidth and cache performance (present performance on the SP at SDSC: 170 Mflops/processor; on the big problem: 70 Mflops/processor due to fewer cache hits).
• Node interconnect bandwidth and latency very important. Frequent global reductions (gsum).
• Tremendous potential here; may not be a NERSC issue.
Given the straw machine (60 Tflops)… the equation of state for high-temperature QCD using 3 dynamical flavors and a lattice spacing of 0.13 fm would be practical.
Main computational challenge: inversion of the fermion matrix (sparse matrix solution).
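Such inversions are typically done with iterative Krylov solvers. A minimal conjugate-gradient sketch on a toy sparse, symmetric positive-definite matrix (a stand-in for the fermion matrix, which in practice is attacked through its normal equations) might look like:

```python
import numpy as np
from scipy.sparse import diags

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for sparse symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy stand-in: a 1-D discrete Laplacian plus a mass-like diagonal term.
n = 100
A = diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n)).tocsr()
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))  # residual norm, effectively zero
```

The dominant operation is the sparse matrix-vector product each iteration, which is why memory bandwidth and cache behavior (noted below) dominate performance.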
7
Parallel Distributed Systems Facility (PDSF)
Today:
• BaBar (SLAC B-Factory): CP violation
• E871 (AGS): CP violation in hyperon decays
• CDF (Fermilab): proton-antiproton collider
• D0 (Fermilab)
• E895 (AGS): relativistic heavy ions
• E896 (AGS): relativistic heavy ions
• NA49 (CERN): relativistic heavy ions
• Phenix: RHIC at Brookhaven
• GC5: data mining for the quark-gluon plasma
• STAR: RHIC at Brookhaven (85%)
• AMANDA: Antarctic Muon and Neutrino Detector Array
• SNO (Sudbury): solar neutrinos
Evolving to ALICE at the LHC and ICE CUBE in the Antarctic.
Experimental example: one STAR event at RHIC
Computational characteristics:
• Processing independent event data is naturally parallel
• Large data sets
• Distributed or global nature of the complete computing picture
A theoretical point of view:
8
• Current STAR activity continues (very certain)
• Upgrade to STAR that increases the data rate by 3x around 2004
• Another large experiment (e.g., ALICE or ICE CUBE) chooses PDSF as a major center, with usage comparable to STAR, in 2005+
Planning Assumptions
Evolution of PDSF (from Doug Olson)
PDSF hours needed (1 PDSF hour = 1 T3E hour):
FY01: 1.2 M  FY02: 1.7 M  FY03: 2.3 M  FY04: 7.0 M  FY05: 20 M  FY06: 28 M
Disk storage capacity (terabytes):
FY01: 16  FY02: 32  FY03: 45  FY04: 134  FY05: 375  FY06: 527
Mass storage:
       TeraBytes   Millions of files
FY01:    16          1
FY02:    32          2
FY03:    45          3
FY04:   134          9
FY05:   376         20
FY06:   527         30
Throughput to NERSC (MB/s):
FY01: 5  FY02: 10  FY03: 15  FY04: 45  FY05: 120  FY06: 165
Other important factor: HENP experiments are moving towards data grid services; NERSC should plan to be a full-function site on the grid.
9
Computational Nuclear Structure
Towards a unified description of the nucleus
[Figure: chart of the nuclides (protons vs. neutrons) marking the magic numbers 2, 8, 20, 28, 50, 82, and 126, and showing where each theoretical approach applies — ab initio few-body calculations (A=10), No-Core Shell Model and G-matrix (A=12), 0-hbar-omega shell model (A~60), and density functional theory / self-consistent mean field for heavy nuclei — along with the r-process and rp-process paths and the limits of nuclear existence.]
10
Nuclear Structure Examples: Quantum Monte Carlo
For A=10, each state takes 1.5 Tflop-hours.
ANL/LANL/UIUC: NN + 3N interactions
Physics of medium-mass nuclei: nuclear shell model with effective NN interactions; application to SN-Ia nucleosynthesis.
Start with a realistic NN potential fit to low-energy NN scattering data, plus a 3-body potential; nuclear-structure calculations are then performed using GFMC techniques.
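GFMC itself is far beyond a slide-sized example, but the Metropolis sampling that quantum Monte Carlo methods build on fits in a few lines. A toy variational Monte Carlo for a 1-D harmonic oscillator (hbar = m = omega = 1, trial wavefunction psi = exp(-alpha x^2)); all parameter choices are illustrative:

```python
import math
import random

def local_energy(x, alpha):
    # E_L(x) = alpha + x^2 (1/2 - 2 alpha^2) for psi = exp(-alpha x^2)
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)

def vmc_energy(alpha, steps=200_000, step_size=1.0, seed=1):
    """Metropolis walk sampling |psi|^2, averaging the local energy."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)
        # Accept with probability min(1, |psi(x_new)|^2 / |psi(x)|^2)
        if rng.random() < math.exp(-2.0 * alpha * (x_new**2 - x**2)):
            x = x_new
        total += local_energy(x, alpha)
    return total / steps

e = vmc_energy(alpha=0.5)
print(e)  # 0.5: the exact ground-state energy, since alpha=0.5 is exact
```

At alpha = 0.5 the trial wavefunction is exact, so the local energy is constant and the estimate has zero variance — a one-dimensional glimpse of why good trial wavefunctions matter for the A=10 calculations above.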
11
NERSC involvement (PI’s):
125,000 (Pieper)
 70,000 (Dean)
 60,000 (Carlson)
 60,000 (Alhassid)
-----------
0.32 million+ MPP hours
Projected needs for nuclear structure
Physics to be addressed using GFMC:
• 12C structure and 3-alpha burning
• nuclear matter at finite temperature
• asymmetric nuclear matter
FY    K-MPP hours (total)
02      400
03      700
04     1000
05     1700
06     3000
Memory needs are very important: 1 Gbyte memory/CPU by 2004.
Sustained memory bandwidth and/or cache performance is also very important. Pieper is seeing a drop in performance when more CPUs are clustered on a node.
Physics to be addressed using AFMC/NSM:
Nuclear structure of A=60-100 nuclei; studies of weak interactions, thermal properties, and r-process nucleosynthesis.
FY    K-MPP hours (NERSC only)
02      200
03      300
04      450
05      600
06      800
Memory needs: 0.25 Gbyte memory/CPU by 2004.
Cache performance important (many matrix-matrix multiplies).
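Cache blocking is the standard way to make those matrix-matrix multiplies cache-friendly: operating on small tiles keeps each tile resident in cache while it is reused. An illustrative tiled multiply (production codes would call a tuned BLAS dgemm instead):

```python
import numpy as np

def blocked_matmul(A, B, tile=32):
    """Tiled matrix multiply: each tile-sized product reuses operands
    that are already in cache, unlike a naive row-by-column sweep."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
print(np.allclose(blocked_matmul(A, B), A @ B))  # True: same result
```

The tile size would be tuned so three tiles fit in the target cache level, which is exactly the kind of machine-dependent tuning the memory-bandwidth concerns above point at.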
12
Next-generation machines will require extreme precision and control, pushing the frontiers of beam energy, beam intensity, and system complexity.
(supplied by Rob Ryne)
• Simulation requirements/issues:
– require high resolution
– are enormous in size
– CPU-intensive
– highly complex
• Physics issues:
– highly three-dimensional
– nonlinear
– multi-scale
– many-body
– multi-physics
Tau3P
Omega3P
IMPACT
• Terascale simulation codes are being developed to meet the challenges
13
Challenges in Electromagnetic Systems Simulation: Example – NLC Accelerator Structure (RDDS) Design
Require 0.01% accuracy in accelerating frequency to maintain structure efficiency (high-resolution modeling)
Parallel solvers needed to model large, complex 3D electromagnetic structures to high accuracy
• Start with cylindrical cell geometry
• Adjust geometry for maximum efficiency
• Add micron-scale variations from cell to cell to reduce wakefields
• Stack into a multi-cell structure
• Add damping manifold to suppress long-range wakefields and improve vacuum conductance, while preserving RDDS performance. Highly 3D structure.
Verify wake suppression in the entire 206-cell section (system-scale simulation)
14
Computer Science Issues
• Meshing: mesh generation, refinement, quality.
– Complex 3-D geometries – structured and unstructured meshes, and eventually overset meshes.
• Partitioning: domain decomposition.
• Load balancing.
• Impact of memory hierarchy on efficiency. Cache, locally-shared memory, remote memory.
• Visualization of large data sets.
PEP-II cavity model with mesh refinement – accurate wall-loss calculation needed to guide cooling-channel design.
• Performance, scalability, and tuning on terascale platforms
15
• Simulation size for 3D modeling of rf linacs:
– (128^3 to 512^3 grid points) x (~20 particles/point) = 40M-2B particles
– 2D linac simulations with 1M particles require a weekend on a PC
• A 100M-particle PC simulation, if it could be performed, would take 7 months
• New 3D codes already enable 100M-particle runs in 10 hours using 256 processors
• Intense beams in rings (PSR, AGS, SNS ring):
– 100 to 1000 times more challenging than linac simulations
Challenges in Beam Systems Simulation
NERSC involvement: 0.600+ million MPP hours (mainly Rob Ryne)
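The particle counts quoted for the 3-D rf-linac runs follow directly from grid size times particles per point; a quick sanity check:

```python
def particle_count(points_per_side, particles_per_point=20):
    """Total macroparticles: (points_per_side)^3 grid points,
    ~20 particles per grid point, per the slide's estimate."""
    return points_per_side ** 3 * particles_per_point

low = particle_count(128)    # ~42 million  -> the "40M" end
high = particle_count(512)   # ~2.7 billion -> the "2B" end
print(low, high)
```

So the quoted 40M-2B range corresponds to rounding 41.9M down and 2.68B down, consistent with the slide.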
16
Supernova simulations
Spherically symmetric simulations of core collapse, including Boltzmann neutrino transport, fail to explode. This indicates the need to a) improve nuclear physics inputs and b) move to 2- and 3-dimensional simulations. Calculations done on PVP platforms, presently moving to MPP.
From Mezzacappa
Core Collapse Supernova
Type Ia supernova (from Nugent)
17
Supernova simulations computational needs
Important computational issues:
• Optimized floating-point performance
• Memory performance is very important
• Infrequent 50-Mbyte messages (send/receive)
• Communications with global file systems
• Crucial need for global storage (GPFS) with many I/O nodes
• HPSS is very important for data storage
With the straw system:
• Chaotic velocity fields; 2D and maybe 3D calculations with good input physics.
• SMP somewhat useless for this application (CPUs on one node run independently using MPI).
Supernova Cosmology project
Core Collapse Supernova project (hydro + Boltzmann neutrino transport)
NERSC-4 platform:
Year   3-D MGFLD models         2-D MGBT models
       node hrs    Memory       node hrs     Memory
1      520,000     62G          --           --
2      260,000     62G          260,000      25G
3      --          --           750,000      25G
4      --          --           1,000,000    100G
5      ? (3-D MGBT?)            2,000,000    256G
From Peter Nugent
Assumptions:
Yr. 1: 3-D Newtonian MGFLD to understand convection when compared to 2-D
Yr. 2: general relativistic 3-D MGFLD to compare to Newtonian models
Yr. 3: 2-D MGBT at moderate resolution
Yr. 4: 2-D MGBT at high resolution with AMR technology
Yr. 5: may expand to 3-D MGBT… but will require growth of NERSC (NERSC-5 phase-in?)
From Doug Swesty
Current NERSC involvement:
 Nugent: 125,000 MPP
 Mezzacappa: 43,500 MPP
 Total: 0.15+ million MPP hours
18
People I left out
• Haiyan Gao (MIT): 3-body problem; relativistic effects in (e,e’) scattering. 24,000 MPP/year
• G. Malli / Walter Loveland (Simon Fraser): coupled-cluster methods for the chemical structure of superheavy elements. (15,000 PVP)
Big user who did not respond:
• Chan Joshi – 287,500 MPP hours (plasma-driven accelerators)
General conclusions:
• More CPU is good for most people.
• A Bytes/Flop ratio of 0.5 is okay for most people.
• Concern with memory access on a single node.
• Concern with access to large disk space and HPSS (important to several groups).
• Exciting physics portfolio matched with DOE facilities and requiring computation.