Max-Planck-Institut für Plasmaphysik
Computational needs of fusion scientists
David Coster
Outline
• Why fusion
• Computational needs
• Laptops to Exaflops
• A few examples
• Some personal observations
David Coster | HPC Users Conference | Poznan | 2015-05 | Page 2
Fusion
• Energy source for the sun and other stars
• Provides a potential source of base load energy production
• Been working on this for more than 50 years
• Has turned out to be a very difficult problem
Fusion
• Two main lines of research
• Inertial confinement
• Implosion of small pellets
• NIF at LLNL
• Magnetic confinement
• Two main lines of research at the moment
– Stellarator – W7X
» Currently under construction in Greifswald in Germany
– Tokamak – ITER
» Under construction in Cadarache in France
ITER
Involves 7 partners representing more than 50% of the world population
Costs > 10 G$
Under construction in Cadarache, France
Key element on the path to fusion energy production
ITER
Parameter                 Value (units)
Plasma Major Radius       6.2 m
Plasma Minor Radius       2.0 m
Plasma Volume             840 m^3
Plasma Current            15.0 MA
Toroidal Field on Axis    5.3 T
Fusion Power              500 MW
Burn Flat Top             >400 s
Power Amplification       >10
[Photographs dated 2010-07-15 and 2015-04-16]
Computational needs vary tremendously
• At the low end, a laptop with a spreadsheet
• Experimental data acquisition
• Current experiments produce ~ 1 GB/s for ~ 10 s
• Next generation experiments will have pulse lengths of ~ 1000s
• Workflows in place to process acquired data
• Modelling needs
• Codes range from 0D – 6D
• Some can be run on that laptop
• Others require medium scale resources
• Others push the bounds of current technology
Fusion Experiment Use Case
Experimental data
• stored in the machine's experimental data system
• “raw” data is not versioned and is immutable
• derived data depends on raw data, other data (e.g. calibration data), programs
• derived data is versioned
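The dependency rule on this slide (raw data is immutable, derived data depends on raw data, calibration data and programs, and only derived data is versioned) can be sketched as a small record type. This is a minimal illustration and not the data system of any particular machine: the class names, the field names and the hash-based version identifier are all assumptions introduced here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: raw data cannot be modified after creation
class RawData:
    pulse: int
    signal: str
    payload: bytes


@dataclass(frozen=True)
class DerivedData:
    raw: RawData
    calibration_id: str   # hypothetical identifier of the calibration set
    program_version: str  # hypothetical version tag of the processing code

    @property
    def version(self) -> str:
        """Identifier that changes whenever any of the inputs changes."""
        h = hashlib.sha256()
        h.update(self.raw.payload)
        h.update(self.calibration_id.encode())
        h.update(self.program_version.encode())
        return h.hexdigest()[:12]


raw = RawData(pulse=12345, signal="example_signal", payload=b"\x00\x01\x02")
v1 = DerivedData(raw, calibration_id="cal-A", program_version="1.0")
v2 = DerivedData(raw, calibration_id="cal-B", program_version="1.0")
print(v1.version != v2.version)  # a new calibration gives a new version
```

Deriving the version from the inputs, rather than from a counter, means that two people who process the same raw data with the same calibration and program obtain the same identifier.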
Fusion Modelling Use Case
Simulation data
• might use experimental data as input
• might use other “standard” data
• might use other simulation data
• might be used for other simulations
Need to do a better job of capturing Provenance Data
• H2020 proposal:
• PROVENCE: PROVenance ENabled Collaborative Environment
• Involves a number of partners including PSNC
• Waiting to hear back from the Commission
• Failed
Simulations
• 1d
• 2d
• Real problem is 3d space, 2/3d velocity
Models describing the plasma vary in complexity
[Figure: plasma phenomena arranged by length scale (meters) and time scale (seconds), with axis ticks running from 10^-12 to 10^+3. Phenomena shown: Sheath, Atomic, Ion Turbulence, Electron Turbulence, ICRH, ECRH, Slowing Down, NTMs, AEs, Core Transport, Edge Transport, Erosion. Model dimensionalities shown: 1D, 2-3D, 3D, 4-6D, 5D.]
Paradigm shift in modelling: monolithic multiphysics
ETS Workflow
[Figure: workflow diagram with a time loop (T=T+dt, dt management) and an iteration loop with a “Converged” Yes/No test. Components shown: EQUIL, ETS, CORE2EQ, SOURCE_COMBINER, TRANSPORT_COMBINER, NBI, ICRH, ECRH, NEUTRALS, IMPURITIES, NEO, TURB, NTM(t), Sawteeth(t), Sawteeth(pr), ELM(t), ELM(pr), Pellets (pr), and a Free Boundary Equilibrium with shape, position, controller.]
European Transport Simulator
• Implemented in Kepler Scientific Workflow Engine
• Built on ontologies created by the European Fusion Development Agreement (EFDA) Task Force on Integrated Modelling
• Now EUROfusion Work Package on Code Development for Integrated Modelling (WPCD)
• Capable of using:
• Local (node) resources
• Local batch resources
• Connections to remote HPC facilities via UNICORE
• GRID computing resources
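The control structure named in the ETS workflow diagram (a time loop, an iteration loop, and a “Converged” test) can be illustrated with a toy problem. The sketch below is not ETS or Kepler code: it assumes the time loop wraps the iteration loop, and it replaces the coupled physics modules with a single scalar equation dT/dt = S - k*T, solved implicitly by fixed-point iteration. All function names and parameter values are hypothetical, and there is no dt management.

```python
def advance(value_old, dt, source=1.0, loss_rate=1.0, tol=1e-10, max_iter=100):
    """Iteration loop: repeat the implicit update until it stops changing."""
    value_new = value_old
    for iteration in range(1, max_iter + 1):
        candidate = value_old + dt * (source - loss_rate * value_new)
        if abs(candidate - value_new) < tol:   # "Converged? Yes"
            return candidate, iteration
        value_new = candidate                  # "Converged? No": iterate again
    raise RuntimeError("iteration loop did not converge")


def run(n_steps=200, dt=0.1):
    """Time loop: advance the state step by step with a fixed dt."""
    time, value = 0.0, 0.0
    for _ in range(n_steps):
        value, _ = advance(value, dt)
        time += dt                             # T = T + dt
    return time, value


end_time, final_value = run()
print(f"t = {end_time:.1f}, value = {final_value:.6f}")
```

In this toy problem the fixed-point iteration converges because dt * loss_rate is smaller than 1; the state relaxes towards source / loss_rate.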
Also exploring other methodologies
• MAPPER project
• MUSCLE framework
Multi-scale necessity
• For example, in the field of fusion, the “holy grail” of understanding the behaviour of current and future tokamaks is to determine the effect of micro-turbulence on the global behaviour of the plasma.
• Simulating ASDEX Upgrade (a tokamak with a major radius of 1.65 m) over the transport time-scale would require about 1.25x10^8 core-hours.
• ITER (with a major radius of 6.2 m) would require a small multiple of 3x10^10 core-hours.
• Using 80,000 cores and assuming perfect scaling, this translates to 43 years.
• On a machine with 1000 times this number of cores it would require 16 days.
• The multiscale approach planned for this proposal [COMPAT] will reduce this considerably.
• These numbers might, however, be on the optimistic side, since they assume that ion scale dynamics is dominant. If, as some people fear, electron scale dynamics is also important, then the direct scaling would require something like 3x10^13 core-hours for ASDEX Upgrade and 6x10^15 core-hours for ITER, making a multi-scale approach absolutely crucial!
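The 43-year and 16-day figures follow directly from dividing core-hours by core count. A minimal check, assuming perfect scaling as the slide does and taking 3x10^10 core-hours itself rather than the "small multiple" mentioned (the function name and the 365.25-day year are choices made here):

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25


def wall_clock_hours(core_hours, cores):
    """Wall-clock time in hours under the assumption of perfect scaling."""
    return core_hours / cores


iter_core_hours = 3e10
cores = 80_000

years = wall_clock_hours(iter_core_hours, cores) / HOURS_PER_DAY / DAYS_PER_YEAR
days = wall_clock_hours(iter_core_hours, 1000 * cores) / HOURS_PER_DAY

print(f"{years:.0f} years on {cores} cores")
print(f"{days:.0f} days on {1000 * cores} cores")
```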
Rough complexity estimates
                            Core       Pedestal   Separatrix
n                           1.00E+20   1.00E+20   4.00E+19
T                           2.00E+04   5.00E+03   2.00E+02
B                           5.00E+00   5.00E+00   5.00E+00
R                           6.20E+00   6.20E+00   6.20E+00
Aspect ratio                3.00E+00   3.00E+00   3.00E+00
Kappa                       1.50E+01   1.50E+01   1.50E+01
Area                        2.01E+02   2.01E+02   2.01E+02
Volume                      7.84E+03   7.84E+03   7.84E+03
time                        1.00E+03   1.00E+03   1.00E+03
electron plasma frequency   8.98E+10   8.98E+10   5.68E+10
debye length                1.05E-04   5.25E-05   1.66E-05
space units                 6.76E+15   5.41E+16   1.71E+18
time units                  8.98E+13   8.98E+13   5.68E+13
ion gyrofrequency           7.60E+07   7.60E+07   7.60E+07
ion gyroradius              2.88E-03   1.44E-03   2.88E-04
space units                 2.42E+07   9.67E+07   2.42E+09
electron gyrofrequency      1.40E+11   1.40E+11   1.40E+11
electron gyroradius         6.73E-05   3.37E-05   6.73E-06
space units                 4.44E+10   1.78E+11   4.44E+12
particles                   7.84E+23   7.84E+23   3.14E+23
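The rows of this table can be reproduced, to within about 1%, from textbook plasma formulas. The sketch below is a reconstruction, not the author's spreadsheet: the units (n in m^-3, T in eV, B in T, lengths in m, time in s), the choice of a proton as the ion, and the way each "space units" row is formed (volume divided by the Debye length cubed, area divided by the gyroradius squared) are assumptions that were chosen because they match the tabulated numbers. The inputs, including Kappa = 15, are taken from the table as given.

```python
import math

E = 1.602176634e-19       # elementary charge [C]
M_E = 9.1093837015e-31    # electron mass [kg]
M_P = 1.67262192369e-27   # proton mass [kg] (assumed ion species)
EPS0 = 8.8541878128e-12   # vacuum permittivity [F/m]


def gyro(mass, t_ev, b):
    """Return (gyrofrequency [Hz], thermal gyroradius [m])."""
    omega = E * b / mass
    v_thermal = math.sqrt(t_ev * E / mass)
    return omega / (2 * math.pi), v_thermal / omega


def estimates(n, t_ev, b=5.0, major_radius=6.2, aspect_ratio=3.0,
              kappa=15.0, duration=1.0e3):
    minor_radius = major_radius / aspect_ratio
    area = math.pi * minor_radius**2 * kappa
    volume = 2 * math.pi * major_radius * area
    f_pe = math.sqrt(n * E**2 / (EPS0 * M_E)) / (2 * math.pi)
    debye = math.sqrt(EPS0 * t_ev / (n * E))
    f_ci, rho_i = gyro(M_P, t_ev, b)
    f_ce, rho_e = gyro(M_E, t_ev, b)
    return {
        "area": area,
        "volume": volume,
        "electron plasma frequency": f_pe,
        "debye length": debye,
        "space units (debye)": volume / debye**3,
        "time units": duration * f_pe,
        "ion gyrofrequency": f_ci,
        "ion gyroradius": rho_i,
        "space units (ion)": area / rho_i**2,
        "electron gyrofrequency": f_ce,
        "electron gyroradius": rho_e,
        "space units (electron)": area / rho_e**2,
        "particles": n * volume,
    }


columns = {
    "Core": estimates(1.0e20, 2.0e4),
    "Pedestal": estimates(1.0e20, 5.0e3),
    "Separatrix": estimates(4.0e19, 2.0e2),
}
for name, values in columns.items():
    print(name)
    for key, value in values.items():
        print(f"  {key:26s} {value:.2E}")
```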
What resources are EU fusion scientists using
• Local resources
• IPP (resources vary depending on where you are; IPP is used here as an example)
• TOK-S cluster: 84 nodes with 20 (real) cores each, GBE
• TOK-P cluster: 42 nodes with 16 (real) cores each, IB
• MPG Hydra HPC:
– IPP has 15-20% of ~ 83,000 cores with a main memory of 280 TB and a peak performance of about 1.7 PetaFlop/s. The accelerator part of the HPC cluster has a peak performance of about 1 PetaFlop/s.
• JET
• 125 nodes with a total of 605 processor cores (738 Gigaflop/s)
• ITM/WPCD Gateway
• 20 nodes with 16 (real) cores each, IB
• HELIOS HPC in Japan
• EU has ~ 50% of
• 1.555 Pflop/s [4500 nodes with 16 (real) cores each]
• 0.412 Pflop/s [180 MIC nodes]
At the high end …
• “In November, the US government announced it will build Summit, a $325m supercomputer capable of performing 300 quadrillion calculations per second if you redline it.” [http://www.theregister.co.uk/2015/04/15/summit_projects/]
• “When installed at the Oak Ridge National Laboratory in 2017 and powered up by 2018, it will be the fastest computer in the world compared to its publicly known rivals as they stand today.” [http://www.theregister.co.uk/2015/04/15/summit_projects/]
• In preparation for the next-generation supercomputer Summit, the Oak Ridge Leadership Computing Facility (OLCF) selected 13 partnership projects for its Center for Accelerated Application Readiness (CAAR) program. [https://www.olcf.ornl.gov/caar/]
• Code: GTC
Science Domain: Plasma Physics
Title: Particle Turbulence Simulations for Sustainable Fusion Reactions in ITER
PI: Zhihong Lin, University of California–Irvine
• Code: XGC
Science Domain: Plasma Physics
Title: Multiphysics Magnetic Fusion Reactor Simulator, from Hot Core to Cold Wall
PI: C.S. Chang, Princeton Plasma Physics Laboratory, Princeton University
Other needs
• Help with optimizing codes:
• In one, admittedly extreme, example:
• Factor 60 speed-up in a scientist's code (going from 1 core to 20 cores)
• EUROfusion funded High Level Support Team
• Annual call for proposals
• One issue is that some of the big codes have been looked at by
• DEISA
• EUFORIA
• PRACE
• HLST
Significant improvements in these codes are hard to find
Some examples … SOLPS
• SOLPS
• Code in wide use to simulate the plasma in the edge of a Tokamak
• Combination of B2 (fluid plasma) + EIRENE (Monte-Carlo neutrals)
• Simulations for ITER take about 3 months each
• Would like to speed up the code by a factor of ~ 100
• Parallelization
• EIRENE 50-95% of time, MPI, “nearly perfect”
• B2 5-100% of time, OpenMP, factor 6 with 20 cores
• Also looking at other approaches
• Including
– Time parallelization (parareal)
– Reduced physics
In one slide …
David Coster | SOLPS-ITER Release Workshop | ITER | 2015-04-14 | Page 26
SOLPS Speed-Up
Column groups: Speed-Up | Gain | Eirene Fraction
N                   1 1 16 32 64 150 1 16 32 64 150
1e-7 -> 1e-5        100.00 100.00
1/4 grid cells      4.00 4.00
Bundling            2.14 8.43 8.43
Eirene MPI (95%)    1.00 9.14 12.55 15.42 1.00 0.57 0.39 0.24 0.95
Eirene MPI (80%)    1.00 4.00 4.44 4.71 1.00 0.25 0.14 0.07 0.80
Fluid neutrals      2.00 20.00 20.00 0.00 0.00 0.00
B2 OpenMP           1.00 6.00 6.00 6.00 1.00 0.38 0.19 0.09
B2-Eirene (95%)     1.00 14.77 26.30 43.15 1.00 0.92 0.82 0.67 0.95
B2-Eirene (80%)     1.00 12.00 17.14 21.82 1.00 0.75 0.54 0.34 0.80
B2-Eirene (50%)     1.00 8.73 10.11 10.97 1.00 0.55 0.32 0.17 0.50
Parareal            10.00 0.07
Better feedback     3.00 3.00
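The "Eirene MPI" and "B2-Eirene" speed-up rows are consistent with Amdahl-type estimates for N = 1, 16, 32 and 64. The sketch below reproduces them under assumptions inferred from the numbers rather than stated on the slide: for "Eirene MPI" only the EIRENE fraction is parallelized over N processes; for "B2-Eirene" the EIRENE fraction scales perfectly with N while the B2 remainder is sped up by the fixed OpenMP factor of 6; and "Gain" is the speed-up divided by N. The function names are hypothetical.

```python
def eirene_mpi_speedup(eirene_fraction, n):
    """Amdahl's law: only the EIRENE fraction is spread over n processes."""
    return 1.0 / ((1.0 - eirene_fraction) + eirene_fraction / n)


def b2_eirene_speedup(eirene_fraction, n, b2_speedup=6.0):
    """EIRENE scales with n; the B2 remainder is capped at b2_speedup."""
    return 1.0 / ((1.0 - eirene_fraction) / b2_speedup + eirene_fraction / n)


def gain(speedup, n):
    """Parallel efficiency: speed-up per process."""
    return speedup / n


process_counts = (1, 16, 32, 64)
for fraction in (0.95, 0.80):
    row = [eirene_mpi_speedup(fraction, n) for n in process_counts]
    print(f"Eirene MPI ({fraction:.0%}):", " ".join(f"{s:6.2f}" for s in row))
for fraction in (0.95, 0.80, 0.50):
    row = [b2_eirene_speedup(fraction, n) for n in process_counts[1:]]
    print(f"B2-Eirene ({fraction:.0%}): ", " ".join(f"{s:6.2f}" for s in row))
```

The two models make the slide's point visible: with EIRENE at 50% of the run time, going from 16 to 64 processes barely helps, because the B2 part dominates once EIRENE has been parallelized.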
Part of a parameter scan (species, power, DT-puff, Impurity puff)
• Full model: each point would take approximately 1 year
• Reduced model: each point takes less than a week
Some examples … JOREK
• With the current numerics, we get roughly the following estimate for a large simulation:
• 400 compute nodes on Helios for 150 hours
=> ~20 TB RAM
=> ~6000 cores
=> ~1M core-hours (~60k node-hours)
• If I assume that we would need to increase the resolution in each direction by a factor of two to three to get to the necessary resolution for ITER at realistic parameters, I get the following rough estimate (making rather optimistic assumptions on our scaling):
• 20 TB * 100 = 1 PB RAM
• number of nodes/cores to provide this amount of memory
• 1M * 1000 = 1G core-hours
• With better preconditioning, the memory consumption should drop a lot and the scalability should increase, but this has still to be tested and then implemented into the production code (order 3 years, I fear)
HELIOS Successor: Expert Group Recommendations (subset)
• purchase decision of an HPC platform be taken before the end of June 2015 for a start of operation in production phase by January 1st, 2017.
• computing capacity with a peak power of at least 8 PetaFlop/s dedicated to fusion research in Europe.
• the acquisition of the HPC system in two steps.
• the first step is the purchase of a 4 PFlop/s system to be installed by the end of 2016,
• to be followed by an extension up to 8 PFlop/s in 2018.
• computing capacity to be provided either
• by an HPC system to be hosted in an existing Computer Centre (CC) in Europe
• or, in the case where the Broader Approach (BA) agreement is extended beyond 2016, in the existing CC in Rokkasho with the investment and operation costs shared with Japan.
• The EG recommends initially considering the viability of the option of an HPC system hosted in a CC in Europe by issuing a Call for Expression of Interest in January 2015, with a deadline of the end of March 2015. This would allow 3 months (i.e. until end of June 2015) to examine other options.
• The EG recommends the system to be dominantly equipped with conventional processors only, but including some processing elements with new technology related to NVIDIA GPUs and Intel Xeon Phi systems.
Some observations from the HPC Questionnaire
David Coster, 2014-11-04
• 48 responses (some still coming in!)
• Estimated MCPU-hours
– Current (all HPC) 1505 (biased by 1 point; if dropped then 758.5)
– Current (HELIOS) 320
– Current (Needs) 1210
– Predicted (Needs) 5155
– Ratio (to HELIOS) 16
– Ratio (to current) 4.3
• HELIOS accounts for more than 60% of cycles for 64% of users.
• 72% of users estimated needs going up by a factor of 2 – 10
• More than half of codes can already do OpenMP + MPI
• Number of cores used
– Production (current, typical): 47% < 1024; 36% 1024-4095 (average: 2375)
– Production (current, maximum): 28% < 1024; 30% 1024-4095; 23% 4096-16383; 17% 16384-65535 (average: 11035) [factor 2.5 – 4.65 above current typical]
– Anticipated: 15% no improvement; 11% > 1048576 (average: 254627) [factor 4.65 – 23.1 above current max.]
• MIC/NVIDIA
– 9% / 5% currently ready
– 23% / 26% have plans before 2017
– 5% / 9% have plans after 2017
• HLST
– 33% will need support for more cores
– 65% will need support for MIC
– 63% will need support for NVIDIA
• 35% of codes need significantly more memory than currently available
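The quoted ratios can be recovered from the questionnaire totals. A short check, using only numbers from this slide (the variable names are chosen here; the lower bounds 2.5 and 4.65 of the bracketed ranges are not derived on the slide and are not reproduced):

```python
# Estimated MCPU-hours from the questionnaire
current_helios = 320
current_needs = 1210
predicted_needs = 5155

ratio_to_helios = predicted_needs / current_helios
ratio_to_current = predicted_needs / current_needs

# Average core counts: typical production, maximum production, anticipated
typical_avg, maximum_avg, anticipated_avg = 2375, 11035, 254627

max_over_typical = maximum_avg / typical_avg
anticipated_over_max = anticipated_avg / maximum_avg

print(f"ratio to HELIOS:      {ratio_to_helios:.1f}")
print(f"ratio to current:     {ratio_to_current:.1f}")
print(f"maximum / typical:    {max_over_typical:.2f}")
print(f"anticipated / max:    {anticipated_over_max:.1f}")
```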
The good, the bad, the ugly …
• Good
– One account, not 1 account per project
– Support for distributed computing, co-allocation, experimentation
– Support for data handling
» Shipping back results
» Long term storage (10 years)
» Open access ???
– Fast responses to user queries
– Transparent allocation of resources to projects
• The bad
– “export control”
– Inflexible operations
• The ugly
– 1 day outages every week
– Multiple week long outages each year
– “Unexpected behaviour”
» Running the same job twice produces substantial differences in run-time (or worse, results)
» Extrapolated MPI start up takes longer than the time allocation
» Extrapolated MPI memory usage larger than the available memory
– Appearance of conflicts of interest in resource allocation
End …
Thank you for your attention!
Are there questions?
Current US Allocations
• INCITE allocations
• Fusion (2014)
• 129 M processor hours XK7
• 150 M processor hours BG/Q
• CRESTA (EU Project)
• 42 M processor hours XK7