Damien Genet IPL C2S@Exa - July 11, 2014
AeroSol: Solver for CFD problems for modern architectures
INTRODUCTION
Many CFD platforms exist around the world. Each platform is specialized:
- One element type supported: triangle, tetrahedron, etc.
- One dimension supported: 2D or 3D.
- Solution: low or high order.
- Language...
Each team develops its own solver, and only a few share it.
⇒ We want to achieve genericity, collaboration, and performance.
Context
Software Architecture
Distributed Memory level
Shared Memory level
Conclusion, Perspectives
Context
1. Context: The story begins...
Context
Bacchus
Bacchus, INRIA project:
- HPC tools: Scotch, MMG3D, PaMPA.
- CFD solvers: FluidBox, RealFluid, AeroSol.
Context
CAGIRE
Simulation and experimentation:
- Simulation: AeroSol platform.
- Experimentation: Maveric test bench.
Context
AeroSol
Main goals for the AeroSol platform:
- Collaborative: anyone can 'easily' contribute.
- Genericity: no restrictions!
- Maintainability: well-designed architecture.
- Performance: just performance.
Software Architecture
2. Software Architecture
Software Architecture
Outside AeroSol
Software Architecture
Inside AeroSol
Software Architecture
Import - Export
- Import: GMSH, in sequential or in parallel.
- Export: write the solution in VTK, XDMF (HDF5), or TecPlot.
Software Architecture
Construction of a rich graph. Each element is decomposed hierarchically into a set of entities:
- An entity for the element.
- An entity per face.
- An entity per edge.
- An entity per vertex.
Add some relations between entities:
- Ownership between element and face/edge.
- Rotation between element and face.
⇒ PaMPA
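The hierarchical decomposition described on this slide can be sketched in a few lines. This is only an illustration, not PaMPA's API: the function names are hypothetical, and identifying each entity by the sorted tuple of its vertex ids is an assumption made here so that two elements sharing a face see the same entity.

```python
from itertools import combinations

def decompose_tetrahedron(vertices):
    """Return the entities of one tetrahedron, grouped by kind.

    Assumption of this sketch: an entity is identified by the sorted tuple
    of its vertex ids, so a face shared by two elements has one single key.
    """
    v = tuple(sorted(vertices))
    return {
        "element": [v],
        "face": list(combinations(v, 3)),   # 4 triangular faces
        "edge": list(combinations(v, 2)),   # 6 edges
        "vertex": [(x,) for x in v],        # 4 vertices
    }

def build_entity_graph(elements):
    """Map every (kind, entity) pair to the list of elements containing it."""
    owners = {}
    for elem_id, vertices in enumerate(elements):
        for kind, entities in decompose_tetrahedron(vertices).items():
            for entity in entities:
                owners.setdefault((kind, entity), []).append(elem_id)
    return owners

# Two tetrahedra glued along the face with vertices 1, 2, 3.
graph = build_entity_graph([(0, 1, 2, 3), (1, 2, 3, 4)])
shared_faces = [entity for (kind, entity), elems in graph.items()
                if kind == "face" and len(elems) == 2]
print(shared_faces)  # [(1, 2, 3)]
```

The ownership relation of the slide corresponds here to the element lists stored per entity; the rotation relation is not modelled.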
Software Architecture
Time scheme
- handles multi-step and multi-stage schemes;
- performs nonlinear iterations (Newton);
- performs abstract operations on vectors;
- drives the spatial scheme for the computation.
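As an illustration of how a time scheme can drive a spatial scheme through abstract vector operations only, here is a minimal sketch of a two-stage explicit Runge-Kutta step (the midpoint rule). The names `axpy`, `rk2_step` and `decay` are hypothetical and are not AeroSol's interface; the spatial scheme is replaced by a plain function returning a residual.

```python
import math

def axpy(a, x, y):
    """Return a*x + y on plain lists (stand-in for an abstract vector operation)."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def rk2_step(residual, u, dt):
    """One step of the explicit midpoint rule, a two-stage Runge-Kutta scheme.

    `residual` plays the role of the spatial scheme: it maps a state vector
    to the time derivative du/dt.
    """
    k1 = residual(u)                 # first stage
    u_mid = axpy(0.5 * dt, k1, u)    # intermediate state at t + dt/2
    k2 = residual(u_mid)             # second stage
    return axpy(dt, k2, u)

def decay(u):
    """Toy 'spatial scheme': du/dt = -u, whose exact solution is exp(-t)."""
    return [-ui for ui in u]

u = [1.0]
dt = 0.01
for _ in range(100):                 # integrate up to t = 1
    u = rk2_step(decay, u, dt)
print(u[0], math.exp(-1.0))
```

The time integrator never looks inside the vector or the residual, which is what lets the same driver serve different spatial schemes.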
Software Architecture
Spatial scheme
- knows the mesh, and iterates over it;
- reconstructs a local cell
  → works for continuous / discontinuous schemes;
- uses Integrators to compute quantities;
  → generic for an element;
  → parametrized by the numerical flux in D.G.;
- assembles a matrix.
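To make the idea of a spatial scheme "parametrized by the numerical flux" concrete, here is a minimal sketch on a periodic 1D mesh with piecewise-constant unknowns (the lowest-order discontinuous case). The functions `upwind_flux` and `residual` are hypothetical names chosen for this example; AeroSol's actual classes are not shown in the slides.

```python
def upwind_flux(u_left, u_right, a=1.0):
    """Upwind numerical flux for the linear advection flux f(u) = a*u."""
    return a * u_left if a >= 0 else a * u_right

def residual(u, dx, flux):
    """Residual of a piecewise-constant scheme on a periodic 1D mesh.

    The loop iterates over the faces of the mesh; the numerical flux is a
    parameter, so the same loop works with any flux function.
    """
    n = len(u)
    res = [0.0] * n
    for face in range(n):                 # face between cells face-1 and face
        left, right = (face - 1) % n, face
        f = flux(u[left], u[right])
        res[left] -= f / dx               # the flux leaves the left cell
        res[right] += f / dx              # and enters the right cell
    return res

u = [0.0, 1.0, 0.0, 0.0]
r = residual(u, dx=0.5, flux=upwind_flux)
print(r)  # [0.0, -2.0, 2.0, 0.0]
```

Swapping `upwind_flux` for another flux function changes the scheme without touching the mesh loop.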
Software Architecture
Cell reconstruction
[Figure: two adjacent triangles with numbered degrees of freedom; labels -1 and +1]
Triangle 1: 0 1 2 3 4 5 6 7 8 9
Triangle 2: 1 10 2 11 12 13 14 6 5 15
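Reading the two rows of indices on this slide as the global numbers of each triangle's ten local degrees of freedom (an interpretation, not stated on the slide), cell reconstruction amounts to gathering global values into a local vector. The names `connectivity` and `reconstruct_cell` are hypothetical, and the global values are made up for the example.

```python
# Global index of each local degree of freedom, copied from the slide.
connectivity = {
    "triangle 1": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "triangle 2": [1, 10, 2, 11, 12, 13, 14, 6, 5, 15],
}

def reconstruct_cell(global_values, global_ids):
    """Gather the values of one cell, in local ordering, from the global vector."""
    return [global_values[g] for g in global_ids]

# Illustrative global vector: the value at index g is 10*g.
global_values = [10.0 * g for g in range(16)]
local_2 = reconstruct_cell(global_values, connectivity["triangle 2"])

# Degrees of freedom seen by both triangles (those on the common edge).
shared = sorted(set(connectivity["triangle 1"]) & set(connectivity["triangle 2"]))
print(shared)  # [1, 2, 5, 6]
```

Under this reading, indices 5 and 6 appear in opposite order in the two rows, which would be consistent with the two triangles traversing their common edge in opposite directions.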
Software Architecture
Integrators
[Figure: mapping T between the reference element and a mesh element, and its inverse T⁻¹]
- works on the reference element;
- uses finite elements;
- uses quadrature formulae;
- uses geometry;
- parametrized by a Model or a Numerical Flux;
- computes quantities over an element or a face.
→ works for both continuous and discontinuous schemes
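The ingredients listed above (reference element, finite element basis, quadrature formula, geometry) can be combined in a small sketch that computes the mass matrix of a linear triangle. This is an assumed, simplified setting (P1 Lagrange basis, three-point edge-midpoint quadrature, affine mapping); the function names are hypothetical and do not reflect AeroSol's Integrator classes.

```python
# Edge-midpoint quadrature on the reference triangle (0,0), (1,0), (0,1);
# it is exact for polynomials of degree 2, the weights sum to the area 1/2.
QUAD_POINTS = [(0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
QUAD_WEIGHTS = [1.0 / 6.0] * 3

def basis(x, y):
    """P1 Lagrange basis functions evaluated on the reference triangle."""
    return [1.0 - x - y, x, y]

def jacobian_det(p0, p1, p2):
    """Determinant of the affine map from the reference triangle to (p0, p1, p2)."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])

def mass_matrix(p0, p1, p2):
    """Integrate phi_i * phi_j over the physical triangle by quadrature."""
    det = abs(jacobian_det(p0, p1, p2))
    m = [[0.0] * 3 for _ in range(3)]
    for (x, y), w in zip(QUAD_POINTS, QUAD_WEIGHTS):
        phi = basis(x, y)
        for i in range(3):
            for j in range(3):
                m[i][j] += w * phi[i] * phi[j] * det
    return m

m = mass_matrix((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))   # triangle of area 2
print(m[0][0], m[0][1])  # area/6 on the diagonal, area/12 elsewhere
```

All the work happens on the reference element; only the Jacobian determinant carries the geometry of the physical triangle.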
Software Architecture
Matrix class
- Distributed matrix.
- Assembled or not.
- Interfaced with many linear solvers:
  - MUMPS, PETSc, UMFPACK
Distributed Memory level
3. Distributed memory level
Distributed Memory level
PaMPA
- handles the mesh and provides visitors;
- redistributes the entities;
- remeshes;
- computes the overlap and performs the communications.
Distributed Memory level
Domain Decomposition
[Figure: domain split between Pi and Pj]
→ entities are scattered
→ incomplete elements
Distributed Memory level
Overlap construction
[Figure: overlap between Pi and Pj]
→ the overlap completes each incomplete element.
→ the overlap gathers the entities needed for computations.
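The overlap idea can be sketched as follows: once entities are scattered, every process that owns part of an element needs ghost copies of the entities of that element owned elsewhere. This is a toy model, not PaMPA's algorithm or API; `build_overlap`, the element tuples and the `owner` map are all hypothetical.

```python
def build_overlap(elements, owner):
    """Compute, for each rank, the ghost entities that complete its elements.

    elements: list of tuples of entity ids (one tuple per element).
    owner:    dict mapping each entity id to the rank that owns it.
    """
    overlap = {rank: set() for rank in set(owner.values())}
    for entities in elements:
        touching = {owner[e] for e in entities}   # ranks owning part of the element
        for rank in touching:
            # Entities of this element owned by someone else are needed as ghosts.
            overlap[rank].update(e for e in entities if owner[e] != rank)
    return {rank: sorted(ghosts) for rank, ghosts in overlap.items()}

# Three elements; entities 0 to 2 live on rank 0, entities 3 and 4 on rank 1.
elements = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
overlap = build_overlap(elements, owner)
print(overlap)  # {0: [3, 4], 1: [1, 2]}
```

The first element is complete on rank 0 and contributes nothing; the other two are incomplete on both ranks and generate the ghost lists.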
Distributed Memory level
Results on Avakas cluster
Shared Memory level
4. Shared memory level
Shared Memory level
Macroelements
We built a set of macroelements.
[Figure: task graph over macroelements. Task labels: Mass Integrator (B, W); ∇U∇V (B, RW); Assembly (B, RA, RW). Legend: Low priority tasks, High priority tasks, Spawn.]
→ limits the memory consumption;
→ limits the number of tasks in the DAG.
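A minimal sketch of why macroelements limit the number of tasks: grouping elements into tiles means one task per tile instead of one per element. The slides do not say how macroelements are built, so the contiguous chunking below is an assumption, and `build_macroelements` is a hypothetical name; the tile size 20000 is one of the values that appears in the result plots.

```python
def build_macroelements(num_elements, tile_size):
    """Group element ids into contiguous tiles of at most `tile_size` elements."""
    return [range(start, min(start + tile_size, num_elements))
            for start in range(0, num_elements, tile_size)]

# 45000 elements is an arbitrary mesh size chosen for the example.
tiles = build_macroelements(num_elements=45000, tile_size=20000)
print([len(t) for t in tiles])  # [20000, 20000, 5000]
```

With this grouping the task graph holds 3 nodes per kernel instead of 45000, and the tile size becomes the tuning knob between parallelism and scheduling overhead.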
Shared Memory level
Assembly
We focus on the assembly operations.
Race conditions occur when two or more elements share unknowns.
[Figure: two elements with unknowns 1 2 3 4 and 3 4 5 6, and their contributions to a global matrix over unknowns 1 to 6]
→ many strategies to do the assembly.
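The conflict can be seen on the two-element example of this slide, written here with 0-based indices. The sketch assembles sequentially and only points out which global entries receive contributions from both elements; the all-ones local matrix is a placeholder, and `assemble` is a hypothetical helper.

```python
def assemble(n_unknowns, elements, local_matrix):
    """Add the same local matrix of every element into a dense global matrix."""
    a = [[0.0] * n_unknowns for _ in range(n_unknowns)]
    for unknowns in elements:
        for i, gi in enumerate(unknowns):
            for j, gj in enumerate(unknowns):
                a[gi][gj] += local_matrix[i][j]   # unsafe if two threads hit the same entry
    return a

# 0-based version of the slide: unknowns 1 2 3 4 and 3 4 5 6.
elements = [(0, 1, 2, 3), (2, 3, 4, 5)]
local = [[1.0] * 4 for _ in range(4)]             # placeholder local matrix
a = assemble(6, elements, local)

# Unknowns written by both elements: this is where a race can occur.
conflicts = sorted(set(elements[0]) & set(elements[1]))
print(conflicts, a[2][2], a[0][0])  # [2, 3] 2.0 1.0
```

If the two elements were processed by different threads, the updates of the entries coupling unknowns 2 and 3 would have to be protected or ordered.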
Shared Memory level
State of the art
OpenMP: parallelize the inner loops of the assembly, and treat the elements sequentially.
Colouring + OpenMP: two elements sharing unknowns have different colours.
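The colouring strategy can be sketched with a simple greedy algorithm: each element takes the smallest colour not already used by an earlier element that shares an unknown with it. Elements of one colour then touch disjoint sets of unknowns and can be assembled concurrently. This quadratic-time version is only an illustration with a hypothetical function name; production codes use an element adjacency structure instead of comparing all pairs.

```python
def colour_elements(elements):
    """Greedy colouring: elements sharing an unknown never get the same colour."""
    colours = []
    for i, unknowns in enumerate(elements):
        # Colours already taken by earlier elements that share an unknown.
        used = {colours[j] for j in range(i) if set(unknowns) & set(elements[j])}
        c = 0
        while c in used:
            c += 1
        colours.append(c)
    return colours

# A strip of three elements: neighbours share two unknowns.
elements = [(0, 1, 2, 3), (2, 3, 4, 5), (4, 5, 6, 7)]
colours = colour_elements(elements)
print(colours)  # [0, 1, 0]
```

In this example the first and third elements can be assembled in parallel, then the second one in a separate pass.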
Shared Memory level
Our strategies
[Figure: assembly of two elements with unknowns 1 2 3 4 and 3 4 5 6 into a global matrix over unknowns 1 to 6]
[Plots: time in seconds and efficiency versus number of threads (4, 8, 16, 24, 32), for tile sizes 10000, 20000 and 50000. Curves: StarPU and PaRSEC, each in flat, fixed, adapt and no-chain variants, plus OpenMP and Colour.]
Conclusion, Perspectives
5. Conclusion, perspectives...
Conclusion, Perspectives
Conclusion
- Generic platform with continuous testing.
- Time schemes: explicit Runge-Kutta, implicit BDF.
- Continuous: CG, Taylor-Galerkin, SUPG scheme.
- Discontinuous: Discontinuous Galerkin.
- Model: advection, advection-diffusion, Euler, Navier-Stokes (DG).
- FE: up to 4th order; Dubiner, Legendre, Lagrange.
- Runtimes: Laplacian solved on GPU + CPU with StarPU.
- Portability: Avakas and Plafrim clusters, Turing cluster. Compilers: gcc, icc, xlc, Clang.
Conclusion, Perspectives
Perspectives
- DG: Marsu ADT, multigrid methods.
- Scheme: platform continuously improved by ongoing research.
- Model: RANS, LES, combustion, incompressible free surface flows.
- HPC: efficiency of nonlinear iterations in long-time integration (interaction between nonlinear and linear solvers), tighter coupling with runtime systems.
Conclusion, Perspectives
Thank you!
INRIA
Bordeaux Sud-Ouest