multicore/manycore with a @let@token scientific computing...

26

Upload: others

Post on 27-Jun-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Multicore/manycore with a scienti�c computing culture:experience feedback with special focus on performance modeling

and redesign of computational methods

Florian De Vuyst

CMLA, ENS Cachan Université Paris-Saclay et CNRS

37è Forum ORAP � 17 mars 2016, CNRS Michel-Ange, Paris

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 1/26

Page 2: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Acknowledgements

The team :

I Marie Béchereau (PhD candidate, CMLA)

I Thibault Gasc (PhD candidate, MDLS, CEA, CMLA)

I Mathieu Peybernes (CEA Saclay, DEN)

I Raphael Poncet (CGG)

I Renaud Motte (CEA DAM DIF)

Acknowledgements :

I Thomas Guillet, Philippe Thierry (INTEL)

I NVIDIA (CUDA Research Center, 2013)

I MDLS for hardware support

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 2/26

Page 3: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

1. GPU computing experience

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 3/26

Page 4: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Beginning of the story ...Real-time interaction of 2D �ow dynamics on boards/screens (2012)

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 4/26

Page 5: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Beginning of the story ...Real-time interaction of 2D �ow dynamics on boards/screens (2012)

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 5/26

Page 6: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

GPU computing experience

I NVIDIA CUDA Language, TESLA boards

I Subtantial speedup observed

I Nice binding between computation and visualization

I Classical 2D FV CFD Cartesian code & LBM codesimplemented

I Performance issue : sometimes hard to choose the adequatedata structures

I Global performance : (most of the time) hard to understand

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 6/26

Page 7: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

2. Performance modeling on (multicore) CPU

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 7/26

Page 8: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Performance modeling : objectives

I Derive a model which is able to predict the performance of agiven solver on a given CPU-based machine.

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 8/26

Page 9: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Elements of a performance model

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 9/26

Page 10: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Reference node-based performance model : Roo�ine

[S. Williams, A. Waterman & D. Patterson, Roo�ine : an insightful visual performance model formulticore architectures, Communications ACM 55(6) : 121-130 (2012)]

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 10/26

Page 11: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Execution Cache Memory (ECM) model[Hager & Treibig 2010]

Metrics : time [cycle / cache line update]

I Execution times at (ALU+L1 cache) level

I Transfer speeds between Lx-L(x+1)cache/memory

[G. Hager et al., Concurrency and Computation : Practiceand Experience, DOI : 10.1002/cpe.3180 (2013).]

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 11/26

Page 12: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

ECM : from one-core to multi-core CPU

Registers

L1 Cache

L2 Cache

L3 Cache

Memory

256 bit/cy

256 bit/cy

256 bit/cy 128 bit/cy

*53 bit/cy

Execution units Computation

Data

T ransfe r

Registers

L1 Cache

L2 Cache

L3 Cache

Memory

256 bit/cy

256 bit/cy

*53 bit/cy

Registers

L1 Cache

L2 Cache

Privateresources

256 bit/cy

SharedRessources

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 12/26

Page 13: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

3. Application : Lagrangian remap Hydrodynamicssolver

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 13/26

Page 14: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Lagrangian remap algorithm

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 14/26

Page 15: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Data�ow diagram

Figure: LEFT : Lagrangian step & RIGHT : remapping step data�owdiagrams (input data, kernels, output data)

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 15/26

Page 16: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Results

I Roo�ine is able to predict performance of compute-boundkernels (O(5%) errors)

I ECM is necessary for good performance prediction ofmemory-bound kernels (O(5%) errors)

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 16/26

Page 17: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Redesign of Lagrangian remapping

Why is it necessary to redesign ?

I Staggered velocity variables involve more communications(memory-bound)

I Alternating direction strategies multiply communications too

I The remapping process is not SIMD

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 17/26

Page 18: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Focus : remapping process

Figure: Geometric remapping : bad for SIMD

Figure: Geometry-free reformulation by balance of advection �uxes :SIMD-ready ⇒ Lagrange-�ux schemes

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 18/26

Page 19: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Comparison of results between legacy Lagrange-remap and Lagrange-�ux scheme

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 19/26

Page 20: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Multimaterial extension of Lagrange-�ux schemes

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 20/26

Page 21: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Data�ow diagram for the Lagrange-�ux scheme

Rem : easy parallelization & vecto by #pragma directives.

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 21/26

Page 22: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Performance prediction vs measurements with ECM

Figure: Performance metrics : nb. of cycles (cy) per cache line (CL)

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 22/26

Page 23: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Comparison of vecto + multicore scalability performance

Figure: Performance comparison in millions of cell updates (MCUPs),CPUs INTEL SandyBridge 2× 8 cores. Performed on a �ne mesh.

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 23/26

Page 24: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Research perspective

I Extend the performancemodeling methodology toGPU/manycore-based

I (NB Some literature alreadyexists on this subject)

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 24/26

Page 25: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

Concluding remarks

I Nice speedups with GPU but sometimes unclear performanceissues

I Performance modeling is a powerful tool to understand globalperformance

I Performance analysis is insightful for bottleneck analysis

I Some parts of the solver may be redesigned for achieving betterarithmetic intensity, leading to new computational approaches

I Future work : accurate performance models foraccelerators/manycore/GPU proc. ?

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 25/26

Page 26: Multicore/manycore with a @let@token scientific computing ...orap.irisa.fr/wp-content/uploads/2015/11/FlorianDeVuyst-37Orap.pdf · Multicore/manycore with ascienti c computing culture:

1. GPU computing experience2. Performance modeling on (multicore) CPU

3. Application : Lagrangian remap Hydrodynamics solver

References

1. T. Gasc, F. De Vuyst, M. Peybernes, R. Poncet, R. Motte, Building a moree�cient Lagrange-remap scheme thanks to performance modeling, ECCOMAS2016, Paper P12210, Proc. of the Conference ECCOMAS 2016, Minisymposium�MS 414 - New trends in numerical methods for multi-material compressible�uid �ows�, submitted (2016).

2. F. De Vuyst, T. Gasc, R. Motte, M. Peybernes, R. Poncet, Lagrange-FluxEulerian schemes for compressible multimaterial �ows, ECCOMAS 2016, PaperE8851, Proc. of the Conference ECCOMAS 2016, Minisymposium �MS 414 -New trends in numerical methods for multi-material compressible �uid �ows�,submitted (2016).

3. F. De Vuyst, M. Béchereau, T. Gasc, R. Motte, M. Peybernes, R. Poncet, Stableand accurate low-di�usive interface capturing advection schemes, submitted toIJNMF, Special issue for the MULTIMAT 2015 Conf., submitted (2016).

4. R. Poncet, M. Peybernes, T. Gasc and F. De Vuyst, Performance modeling of acompressible hydrodynamics solver on multicore CPUs, Proc. of the PARCO2015 Conference, Edinburgh, accepted (2016).

F. De Vuyst (CMLA ENS Cachan) Multicore/manycore: experience feedback 26/26