multicore/manycore with a @let@token scientific computing...
TRANSCRIPT
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Multicore/manycore with a scienti�c computing culture:experience feedback with special focus on performance modeling
and redesign of computational methods
Florian De Vuyst
CMLA, ENS Cachan Université Paris-Saclay et CNRS
37è Forum ORAP � 17 mars 2016, CNRS Michel-Ange, Paris
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 1/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Acknowledgements
The team :
I Marie Béchereau (PhD candidate, CMLA)
I Thibault Gasc (PhD candidate, MDLS, CEA, CMLA)
I Mathieu Peybernes (CEA Saclay, DEN)
I Raphael Poncet (CGG)
I Renaud Motte (CEA DAM DIF)
Acknowledgements :
I Thomas Guillet, Philippe Thierry (INTEL)
I NVIDIA (CUDA Research Center, 2013)
I MDLS for hardware support
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 2/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
1. GPU computing experience
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 3/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Beginning of the story ...Real-time interaction of 2D �ow dynamics on boards/screens (2012)
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 4/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Beginning of the story ...Real-time interaction of 2D �ow dynamics on boards/screens (2012)
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 5/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
GPU computing experience
I NVIDIA CUDA Language, TESLA boards
I Subtantial speedup observed
I Nice binding between computation and visualization
I Classical 2D FV CFD Cartesian code & LBM codesimplemented
I Performance issue : sometimes hard to choose the adequatedata structures
I Global performance : (most of the time) hard to understand
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 6/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
2. Performance modeling on (multicore) CPU
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 7/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Performance modeling : objectives
I Derive a model which is able to predict the performance of agiven solver on a given CPU-based machine.
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 8/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Elements of a performance model
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 9/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Reference node-based performance model : Roo�ine
[S. Williams, A. Waterman & D. Patterson, Roo�ine : an insightful visual performance model formulticore architectures, Communications ACM 55(6) : 121-130 (2012)]
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 10/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Execution Cache Memory (ECM) model[Hager & Treibig 2010]
Metrics : time [cycle / cache line update]
I Execution times at (ALU+L1 cache) level
I Transfer speeds between Lx-L(x+1)cache/memory
[G. Hager et al., Concurrency and Computation : Practiceand Experience, DOI : 10.1002/cpe.3180 (2013).]
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 11/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
ECM : from one-core to multi-core CPU
Registers
L1 Cache
L2 Cache
L3 Cache
Memory
256 bit/cy
256 bit/cy
256 bit/cy 128 bit/cy
*53 bit/cy
Execution units Computation
Data
T ransfe r
→
Registers
L1 Cache
L2 Cache
L3 Cache
Memory
256 bit/cy
256 bit/cy
*53 bit/cy
Registers
L1 Cache
L2 Cache
Privateresources
256 bit/cy
SharedRessources
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 12/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
3. Application : Lagrangian remap Hydrodynamicssolver
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 13/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Lagrangian remap algorithm
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 14/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Data�ow diagram
Figure: LEFT : Lagrangian step & RIGHT : remapping step data�owdiagrams (input data, kernels, output data)
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 15/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Results
I Roo�ine is able to predict performance of compute-boundkernels (O(5%) errors)
I ECM is necessary for good performance prediction ofmemory-bound kernels (O(5%) errors)
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 16/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Redesign of Lagrangian remapping
Why is it necessary to redesign ?
I Staggered velocity variables involve more communications(memory-bound)
I Alternating direction strategies multiply communications too
I The remapping process is not SIMD
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 17/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Focus : remapping process
Figure: Geometric remapping : bad for SIMD
Figure: Geometry-free reformulation by balance of advection �uxes :SIMD-ready ⇒ Lagrange-�ux schemes
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 18/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Comparison of results between legacy Lagrange-remap and Lagrange-�ux scheme
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 19/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Multimaterial extension of Lagrange-�ux schemes
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 20/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Data�ow diagram for the Lagrange-�ux scheme
Rem : easy parallelization & vecto by #pragma directives.
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 21/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Performance prediction vs measurements with ECM
Figure: Performance metrics : nb. of cycles (cy) per cache line (CL)
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 22/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Comparison of vecto + multicore scalability performance
Figure: Performance comparison in millions of cell updates (MCUPs),CPUs INTEL SandyBridge 2× 8 cores. Performed on a �ne mesh.
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 23/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Research perspective
I Extend the performancemodeling methodology toGPU/manycore-based
I (NB Some literature alreadyexists on this subject)
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 24/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
Concluding remarks
I Nice speedups with GPU but sometimes unclear performanceissues
I Performance modeling is a powerful tool to understand globalperformance
I Performance analysis is insightful for bottleneck analysis
I Some parts of the solver may be redesigned for achieving betterarithmetic intensity, leading to new computational approaches
I Future work : accurate performance models foraccelerators/manycore/GPU proc. ?
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 25/26
1. GPU computing experience2. Performance modeling on (multicore) CPU
3. Application : Lagrangian remap Hydrodynamics solver
References
1. T. Gasc, F. De Vuyst, M. Peybernes, R. Poncet, R. Motte, Building a moree�cient Lagrange-remap scheme thanks to performance modeling, ECCOMAS2016, Paper P12210, Proc. of the Conference ECCOMAS 2016, Minisymposium�MS 414 - New trends in numerical methods for multi-material compressible�uid �ows�, submitted (2016).
2. F. De Vuyst, T. Gasc, R. Motte, M. Peybernes, R. Poncet, Lagrange-FluxEulerian schemes for compressible multimaterial �ows, ECCOMAS 2016, PaperE8851, Proc. of the Conference ECCOMAS 2016, Minisymposium �MS 414 -New trends in numerical methods for multi-material compressible �uid �ows�,submitted (2016).
3. F. De Vuyst, M. Béchereau, T. Gasc, R. Motte, M. Peybernes, R. Poncet, Stableand accurate low-di�usive interface capturing advection schemes, submitted toIJNMF, Special issue for the MULTIMAT 2015 Conf., submitted (2016).
4. R. Poncet, M. Peybernes, T. Gasc and F. De Vuyst, Performance modeling of acompressible hydrodynamics solver on multicore CPUs, Proc. of the PARCO2015 Conference, Edinburgh, accepted (2016).
F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 26/26