![Page 1: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/1.jpg)
The PRACE project and the Application Development Programme (WP8-2IP)
Claudio Gheller (ETH-CSCS)
![Page 2: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/2.jpg)
PRACE - Partnership for Advanced Computing in Europe
• PRACE aims to create a European Research Infrastructure that provides world-class systems and services and coordinates their use throughout Europe.
![Page 3: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/3.jpg)
PRACE History – An Ongoing Success Story
[Timeline figure, 2004–2013. Labels: HPCEUR; HET; creation of the Scientific Case; HPC part of the ESFRI Roadmap; creation of a vision involving 15 European countries; signature of the MoU; PRACE Initiative; creation of the PRACE Research Infrastructure (PRACE RI); PRACE Preparatory Phase Project; PRACE-1IP; PRACE-2IP; PRACE-3IP]
![Page 4: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/4.jpg)
PRACE-2IP
• 22 partners (21 countries), funding: 18 million €
• Preparation/Coordination: FZJ/JSC/PRACE PMO
• 1.9.2011 – 31.8.2013, extended to 31.8.2014 (selected WPs only)
• Main objectives:
– Provision of access to HPC resources
– Refactoring and scaling of major user codes
– Tier-1 integration (DEISA → PRACE)
– Consolidation of the Research Infrastructure
![Page 5: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/5.jpg)
PRACE-3IP
• Funding: 20 million €
• Started: summer 2013
• Objectives:
– Provision of access to HPC resources
– Planned: pre-commercial procurement exercise
– Planned: industry application focus
![Page 6: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/6.jpg)
Access to Tier-0 supercomputers
[Flow diagram: Open Call for Proposals → Technical Peer Review (technical experts in PRACE systems and software) → Scientific Peer Review (researchers with expertise in the scientific field of the proposal) → Access Committee (prioritisation + resource allocation) → Project + Final Report. Durations shown along the flow: ~2 months, ~3 months, ~1 year]
The PRACE director decides on the proposal of the Access Committee
![Page 7: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/7.jpg)
Distribution of resources
[Pie chart: share of resources by scientific domain. Domains: Astrophysics; Chemistry and Materials; Earth Sciences and Environment; Engineering and Energy; Fundamental Physics; Mathematics and Computer Science; Medicine and Life Sciences. Shares shown: 20%, 8%, 4%, 27%, 28%, 3%, 10%]
![Page 8: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/8.jpg)
PRACE-2IP WP8: Enabling Scientific Codes to the Next Generation of HPC Systems
![Page 9: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/9.jpg)
PRACE-2IP work packages
– WP1 Management
– WP2 Framework for Resource Interchange
– WP3 Dissemination
– WP4 Training
– WP5 Best Practices for HPC Systems Commissioning
– WP6 European HPC Infrastructure Operation and Evolution
– WP7 Scaling Applications for Tier-0 and Tier-1 Users
– WP8 Community Code Scaling (ETH leading the WP)
– WP9 Industrial Application Support
– WP10 Advancing the Operational Infrastructure
– WP11 Prototyping
– WP12 Novel Programming Techniques
![Page 10: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/10.jpg)
WP8: involved centers
![Page 11: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/11.jpg)
WP8 objectives
• Initiate a sustainable program in application development for the coming generation of supercomputing architectures, with a selection of community codes targeted at problems of high scientific impact that require HPC.
• Refactor community codes in order to optimally map applications to future supercomputing architectures.
• Integrate and validate these new developments into existing application communities.
![Page 12: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/12.jpg)
WP8 principles
• Scientific communities, with their high-end research challenges, are the main drivers for software development;
• Synergy between HPC experts and application developers from the communities;
• Supercomputing centers have to recast their service activities in order to support, guide and enable scientific program developers and researchers in refactoring codes and re-engineering algorithms;
• Strong commitment from the scientific community has to be guaranteed.
![Page 13: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/13.jpg)
WP8 workflow
[Workflow diagram, organised into Task 1, Task 2 and Task 3. Steps shown: Scientific Domains and Communities Selection; Scientific Communities Engagement; Codes screening; Codes Performance Analysis and Model; Communities build-up; Codes and kernels selection; Communities consolidation; Codes Refactoring; Prototypes experimentation; Code Validation and reintegration]
![Page 14: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/14.jpg)
Communities selection (task 1)
• the candidate community must have high impact on science and/or society;
• the candidate community must rely on and leverage high performance computing;
• WP8 can have a high impact on the candidate community;
• the candidate community must be willing to actively invest in software refactoring and algorithm re-engineering.
o Astrophysics
o Climate
o Material Science
o Particle Physics
o Engineering
![Page 15: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/15.jpg)
Codes and kernels selection methodology (task 1)
• Performance modelling methodology
• Objective and quantitative way to select codes and estimate possible performance improvements
– The goal of performance modelling is to gain insight into an application's performance on a given computer system.
– This is achieved first by measurement and analysis, and then by the synthesis of the application and computing system characteristics.
– The model also serves as a predictive tool: it estimates the behaviour on a different computing architecture and identifies the most promising areas for performance improvement.
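The predictive use of a performance model can be illustrated with a minimal Python sketch. It assumes that a profile is a list of kernels with measured times, and that each kernel has an assumed speed-up on the target architecture. The kernel names, times and speed-up factors below are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    measured_seconds: float  # time measured on the reference system
    assumed_speedup: float   # hypothetical speed-up on the target architecture


def predict_total(kernels):
    """Predicted run time on the target system: each kernel scaled by its assumed speed-up."""
    return sum(k.measured_seconds / k.assumed_speedup for k in kernels)


def rank_by_gain(kernels):
    """Kernels ordered by the absolute time they are predicted to save."""
    def saved(k):
        return k.measured_seconds - k.measured_seconds / k.assumed_speedup
    return sorted(kernels, key=saved, reverse=True)


# Illustrative numbers only (assumptions, not measurements).
profile = [
    Kernel("solver", 40.0, 4.0),
    Kernel("io", 30.0, 1.5),
    Kernel("communication", 30.0, 1.0),
]

measured = sum(k.measured_seconds for k in profile)
predicted = predict_total(profile)
print(f"measured {measured:.1f} s, predicted {predicted:.1f} s")
print("most promising kernels:", [k.name for k in rank_by_gain(profile)])
```

Ranking by absolute time saved, rather than by speed-up factor alone, keeps the focus on kernels that both dominate the run time and map well to the new architecture.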
![Page 16: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/16.jpg)
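Selected codes and institution in charge (task 1)

| Code | Institution |
|---|---|
| RAMSES | ETH |
| PFARM | STFC |
| EAF-PAMR | UC-LCA |
| OASIS | CEA |
| I/O | ICHEC |
| ICON | ETH |
| NEMO | STFC |
| Fluidity/ICOM | STFC |
| ABINIT | CEA |
| QuantumESPRESSO | CINECA |
| YAMBO | UC-LCA |
| SIESTA | BSC |
| OCTOPUS | UC-LCA |
| EXCITING/ELK | ETH |
| PLQCD | CASTORC |
| ELMER | VSB-TUO |
| CODE_SATURN | STFC |
| ALYA | BSC |
| ZFS | HLRS |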
![Page 17: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/17.jpg)
Codes Refactoring (task 2)
• Still running (last few weeks)
• Specific code kernels are being re-designed and re-implemented according to the workplans defined in task 1
• Each group works independently
• Checkpoints at Face to Face workshops and All-Hands meetings
• A dedicated wiki web site was set up to report progress, to collect and exchange information and documents, and to manage and release the implemented code: http://prace2ip-wp8.hpcforge.org
![Page 18: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/18.jpg)
Codes validation and re-introduction (task 3)
• Collaborative work (daily basis) involving code developers and HPC experts
• Dedicated workshops
• Face to Face meetings
• Participation and contribution to conferences
This way, no special re-integration procedure was needed
![Page 19: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/19.jpg)
Case study: RAMSES
• The RAMSES code was developed to study the evolution of the large-scale structure of the universe and the process of galaxy formation.
• Adaptive mesh refinement (AMR) multi-species code: baryons (hydrodynamics) plus dark matter (N-body)
• Gravity couples the two components. It is solved with a multigrid approach.
• Other components are supported (e.g. MHD, radiative transfer), but they are not the subject of our analysis
![Page 20: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/20.jpg)
Performance analysis example
Parallel profiling, large test (512³ base grid, 9 refinement levels, 250 GB): strong scaling
For this test, communication becomes the most relevant part, and it is dominated by synchronizations, due to the difficulty of load balancing the AMR-multigrid algorithms.
Strong improvements can be obtained by tuning the load balance among computational elements (nodes?)
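To make the link between load imbalance and synchronization cost concrete, here is a minimal Python sketch. It assumes a common definition of imbalance (slowest rank over mean, minus one), which is not stated on the slide, and uses made-up per-rank work times.

```python
def load_imbalance(work_per_rank):
    """Return (imbalance, mean_wait) for one synchronization point.

    imbalance = max/mean - 1 (0 means perfectly balanced);
    mean_wait = average time a rank idles waiting for the slowest rank.
    """
    slowest = max(work_per_rank)
    mean = sum(work_per_rank) / len(work_per_rank)
    return slowest / mean - 1.0, slowest - mean


# Hypothetical per-rank compute times (seconds) before a barrier.
unbalanced = [1.0, 1.0, 1.0, 3.0]
balanced = [1.5, 1.5, 1.5, 1.5]

for label, work in (("unbalanced", unbalanced), ("balanced", balanced)):
    imbalance, wait = load_imbalance(work)
    print(f"{label}: imbalance {imbalance:.2f}, mean wait {wait:.2f} s")
```

Both cases do the same total work. Only the distribution changes, and with it the time spent waiting at the synchronization point.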
![Page 21: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/21.jpg)
Performance analysis: conclusions
The performance analysis identified the critical kernels of the code:
• Hydro: all the functions needed to solve the hydrodynamic problem. These include the functions that collect, from grids at different resolutions, the data necessary to update each single cell, those that calculate fluxes to solve the conservation equations, the Riemann solvers and the finite-volume solvers.
• Gravity: the functions needed to calculate the gravitational potential at different resolutions using a multigrid-relaxation approach.
• MPI: all the communication-related MPI calls (data transfer, synchronisation, management).
![Page 22: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/22.jpg)
HPC architectures model
![Page 23: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/23.jpg)
Performance improvements
Two main objectives
1. Hybrid OpenMP+MPI parallelization, to exploit systems with distributed nodes, each comprising cores that share memory
2. Exploitation of accelerators, in particular GPUs, adopting different paradigms (CUDA, OpenCL, directives)
From the analysis of the performance and of the characteristics of the kernels under investigation we can say that:
• The Hydro kernel is suitable for both approaches. Specific care must be paid to memory access issues.
• The Gravity kernel can benefit from the hybrid implementation. Due to the multigrid structure, however, an efficient GPU version can be particularly challenging, so it will be considered only if time and resources permit.
![Page 24: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/24.jpg)
Performance modeling
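• Hybrid version (trivial modeling):
$$T_{\mathrm{HYBRID}} = T_{\mathrm{MPI}} \, \frac{\varepsilon_{\mathrm{MPI},N_{\mathrm{TOT}}}}{\varepsilon_{\mathrm{OMP},N_{\mathrm{cores}}} \, \varepsilon_{\mathrm{MPI},N_{\mathrm{nodes}}}}$$
• GPU version
$$T_{\mathrm{TOT}} = T_{\mathrm{CPU}} + T_{\mathrm{CPU\text{-}GPU}} + T_{\mathrm{GPU\text{-}GPU}} + T_{\mathrm{GPU}}$$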
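The two models can be evaluated with a few lines of Python. This is a sketch under one explicit assumption: the symbols lost in front of the subscripts in the hybrid formula are read as parallel efficiencies (named `eff_*` below). With that reading, the formula follows from writing both the pure-MPI and the hybrid run time as a single-core time divided by (number of cores × efficiency). All numbers are hypothetical.

```python
def hybrid_time(t_mpi, eff_mpi_ntot, eff_omp_ncores, eff_mpi_nnodes):
    """Hybrid run time from the pure-MPI time and the three efficiencies (assumed reading)."""
    return t_mpi * eff_mpi_ntot / (eff_omp_ncores * eff_mpi_nnodes)


def gpu_total_time(t_cpu, t_cpu_gpu, t_gpu_gpu, t_gpu):
    """GPU model: CPU work + CPU-GPU transfers + GPU-GPU transfers + GPU work."""
    return t_cpu + t_cpu_gpu + t_gpu_gpu + t_gpu


# Hypothetical configuration: 16 nodes with 8 cores each.
t_single_core = 1000.0
n_cores, n_nodes = 8, 16
n_tot = n_cores * n_nodes
eff_mpi_ntot, eff_omp_ncores, eff_mpi_nnodes = 0.5, 0.9, 0.8

# Pure MPI on all cores, and the hybrid run computed from first principles.
t_mpi = t_single_core / (n_tot * eff_mpi_ntot)
t_direct = t_single_core / (n_cores * eff_omp_ncores * n_nodes * eff_mpi_nnodes)

t_model = hybrid_time(t_mpi, eff_mpi_ntot, eff_omp_ncores, eff_mpi_nnodes)
print(f"pure MPI {t_mpi:.3f} s, hybrid (model) {t_model:.3f} s, hybrid (direct) {t_direct:.3f} s")
print(f"GPU model example: {gpu_total_time(5.0, 2.0, 0.5, 3.0):.1f} s")
```

In this reading the hybrid version wins whenever MPI over fewer ranks combined with OpenMP inside a node is more efficient than MPI over all cores.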
![Page 25: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/25.jpg)
Performance model example
![Page 26: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/26.jpg)
Results: Hybrid code (OpenMP+MPI)
![Page 27: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/27.jpg)
GPU implementation – approach 1
![Page 28: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/28.jpg)
GPU implementation – approach 1
Step 2: solve Hydro equations for cell i,j,k
Step 3: compose results array
Step 4: copy results back to the CPU
[Diagram labels: New Hydro variables; Copy to the CPU]
![Page 29: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/29.jpg)
Results
Sedov blast wave test (hydro only, unigrid). Times in seconds.

| ACC | NVECTOR | Ttot | Tacc | Ttransf | Eff. Speed-up |
|---|---|---|---|---|---|
| OFF (1 PE) | 10 | 94.54 | 0 | 0 | |
| ON | 512 | 55.83 | 38.22 | 9.2 | 2.012820513 |
| ON | 1024 | 45.66 | 29.27 | 9.2 | 2.669969252 |
| ON | 2048 | 42.08 | 25.36 | 9.2 | 3.068611987 |
| ON | 4096 | 41.32 | 23.2 | 9.2 | 3.293965517 |
| ON | 8192 | 41.19 | 23.15 | 9.2 | 3.304535637 |

20 GB transferred in/out (constant overhead)
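The "Eff. Speed-up" figures are numerically consistent with comparing only the accelerated part of the run: the reference time minus the non-accelerated remainder (Ttot − Tacc), divided by Tacc. This relation is inferred from the numbers and is not stated on the slide. The short Python check below reproduces it, and also prints the plain end-to-end ratio for comparison.

```python
T_REF = 94.54  # Ttot of the reference run with ACC off

# (NVECTOR, Ttot, Tacc, tabulated effective speed-up), copied from the results above.
ROWS = [
    (512, 55.83, 38.22, 2.012820513),
    (1024, 45.66, 29.27, 2.669969252),
    (2048, 42.08, 25.36, 3.068611987),
    (4096, 41.32, 23.20, 3.293965517),
    (8192, 41.19, 23.15, 3.304535637),
]


def effective_speedup(t_ref, t_tot, t_acc):
    """Speed-up of the accelerated part only (relation inferred from the table)."""
    non_accelerated = t_tot - t_acc
    return (t_ref - non_accelerated) / t_acc


def overall_speedup(t_ref, t_tot):
    """Plain end-to-end speed-up, including transfers and CPU-side work."""
    return t_ref / t_tot


for nvector, t_tot, t_acc, tabulated in ROWS:
    eff = effective_speedup(T_REF, t_tot, t_acc)
    print(f"NVECTOR={nvector:5d}  effective {eff:.3f} (tabulated {tabulated:.3f})  "
          f"overall {overall_speedup(T_REF, t_tot):.3f}")
```

The end-to-end ratio is noticeably lower than the effective one, which is in line with the transfer overhead and the CPU-side remainder limiting the total gain.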
![Page 30: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/30.jpg)
Performance pitfalls
© CSCS 2013 - Claudio Gheller
• Amount of transferred data
– Overhead increasing linearly with data size
• Data structure, irregular data distribution
– PREVENTS any asynchronous operation: NO overlap of computation and data transfer.
– Ineffective memory access
– Prevents coalesced memory access
• Low flops per byte ratio
– this is intrinsic to the algorithm…
• Asynchronous operations not permitted
– See above…
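Two of these pitfalls lend themselves to a back-of-the-envelope estimate. The Python sketch below uses a simple linear transfer model and the standard roofline bound (attainable rate = min(peak, bandwidth × flops per byte)). The device peak, memory bandwidth and link bandwidth are hypothetical values chosen for illustration, not figures from the talk.

```python
def transfer_seconds(gigabytes, link_gb_per_s):
    """Transfer time grows linearly with the amount of data moved."""
    return gigabytes / link_gb_per_s


def attainable_gflops(peak_gflops, mem_gb_per_s, flops_per_byte):
    """Roofline bound: memory-bound at low intensity, compute-bound at high intensity."""
    return min(peak_gflops, mem_gb_per_s * flops_per_byte)


# Hypothetical accelerator: 1000 GFLOP/s peak, 200 GB/s memory, 5 GB/s host link.
PEAK, MEM_BW, LINK_BW = 1000.0, 200.0, 5.0

for size_gb in (5.0, 10.0, 20.0):
    print(f"{size_gb:5.1f} GB over the link: {transfer_seconds(size_gb, LINK_BW):.1f} s")

for intensity in (0.25, 0.5, 5.0, 10.0):
    rate = attainable_gflops(PEAK, MEM_BW, intensity)
    print(f"{intensity:5.2f} flop/byte -> at most {rate:.0f} GFLOP/s")
```

With a low flops-per-byte ratio the bound is set by memory bandwidth rather than by peak compute, so extra arithmetic units bring no benefit.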
![Page 31: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/31.jpg)
GPU implementation – approach 2
Step 1: compose data chunks on the CPU
Data chunks are the basic building blocks of the RAMSES AMR hierarchy: OCTs and their refinements
[Diagram labels: CPU memory, holding Hydro variables, Gravitational forces and Other quantities]
![Page 32: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/32.jpg)
• Data is moved to and from the GPU in chunks
• Data transfer and computation can be overlapped
Step 2: copy multiple data chunks to the GPU
Step 3: solve Hydro equations for chunks N, M…
Step 4: compose results array
[Diagram labels: New Hydro variables; Copy to the CPU]
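The benefit of overlapping transfer and computation can be estimated with a small timing model. The Python sketch below is an idealised two-stage pipeline and not the RAMSES implementation: it assumes equal-sized chunks, one copy engine and one compute engine, and it ignores the copy back to the CPU. Chunk counts and times are hypothetical.

```python
def serial_time(n_chunks, t_copy, t_compute):
    """No overlap: every chunk is copied, then computed, one after the other."""
    return n_chunks * (t_copy + t_compute)


def overlapped_time(n_chunks, t_copy, t_compute):
    """Two-stage pipeline: chunk i+1 is copied while chunk i is being computed."""
    copy_done = 0.0
    compute_done = 0.0
    for _ in range(n_chunks):
        copy_done += t_copy  # copies are issued back to back
        # Computation starts once its chunk has arrived and the previous one finished.
        compute_done = max(compute_done, copy_done) + t_compute
    return compute_done


# Hypothetical numbers: 4 chunks, 1 s to copy and 2 s to compute each.
print("serial:    ", serial_time(4, 1.0, 2.0), "s")
print("overlapped:", overlapped_time(4, 1.0, 2.0), "s")
```

When computation per chunk takes longer than its copy, only the first copy stays exposed. In the opposite case the run becomes transfer-bound and only the last computation is exposed.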
![Page 33: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/33.jpg)
Advantages over previous implementation
• Data is regularly distributed in each chunk and its access is efficient. Improved flop per byte ratio
• Effective usage of the GPU computing architecture
• Data re-organization is performed on the CPU and its overhead is hidden by asynchronous processes
• Data transfer overhead almost completely hidden
• AMR naturally supported
• DRAWBACKS: much more complex implementation
![Page 34: The PRACE project and the Application Development Programme (WP8-2IP) Claudio Gheller (ETH-CSCS)](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649cb15503460f9497616c/html5/thumbnails/34.jpg)
Conclusions
• PRACE is providing European scientists with top-level HPC services
• PRACE-2IP WP8 successfully introduced a methodology for code development relying on a close synergy between scientists, community code developers and HPC experts
• Many community codes were re-designed and re-implemented to exploit novel HPC architectures (see http://prace2ip-wp8.hpcforge.org/ for details)
• Most of the WP8 results are already available to the community
• WP8 is going to be extended for one more year (no similar activity in PRACE-3IP)