Running M3D on Advanced Computing Architectures

Jin Chen

PPPL

M3D Summary

• Multilevel / 3D using potential stream function, MHD / 2-fluid / particle, tokamak / stellarator
• Semi-implicit time step applied
• 13-19 elliptic solver calls per time step
• Poisson equation with Neumann b.c.
• Matrices symmetrized / solvers optimized
• Higher order triangular elements
• Runs on
  NERSC: seaborg / jacquard
  NLCF: cheetah / ram / phoenix
  PPPL: fcc / mhd

M3D code structure

Initialization M3D fortran Elliptic solver & operator C

Postprocessing

Equilibrium

Checkpoint

Interpolation

Mesh generation

Coordinate setup

Triangle elements

M3D matrices

-rhoeqn: density

-weqn:

-chiaiap_3:

-vphieq:

-phieqn:

-aeqn:

-ceqn/:

-feqn:

poissc.c

hopoissc.c

hopoissc_diag.c

poissc_fsymm.c

poissc_fsymm_opt.c

dxdrc.c

checkpoint

hdf5 viz data

Making M3D matrices symmetric

[Equations: the star (*) and dagger (†) operators in (R, Z), rewritten in conserved form.]

J. Chen, et al., Symmetric Solution in M3D, Computer Physics Communications 164, 468 (2004).

compiler options

• Compile
  Level 0: make BOPT=O update
  Level 1: make -f Makefile.fsymm BOPT=O update
  Level 2: make -f Makefile.fsymm_opt BOPT=O update

runtime options

job.old: works for both regular M3D and symmetrized M3D

-poisson_pc_type asm \
-poisson_pc_asm_overlap 1 \
-poisson_sub_pc_type ilu \
-poisson_sub_pc_ilu_levels 3 \
-poisson_ksp_type gmres \

-poissonH1_pc_type asm \
-poissonH1_pc_asm_overlap 1 \
-poissonH1_sub_pc_type ilu \
-poissonH1_sub_pc_ilu_levels 3 \
-poissonH1_ksp_type gmres \

-poissonH2_pc_type asm \
-poissonH2_pc_asm_overlap 1 \
-poissonH2_sub_pc_type ilu \
-poissonH2_sub_pc_ilu_levels 3 \
-poissonH2_ksp_type gmres \

-pc_type asm \
-pc_asm_overlap 1 \
-sub_pc_type ilu \
-sub_pc_ilu_levels 3 \
-ksp_type gmres \
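The job.old groups repeat the same five settings under four PETSc option prefixes. A minimal Python sketch of generating that list follows; the helper name `prefixed_options` is hypothetical (it is not part of M3D or PETSc), and only option names that appear in job.old are used.

```python
# Settings shared by every solver in job.old (values copied from the list above).
JOB_OLD_SETTINGS = [
    ("pc_type", "asm"),
    ("pc_asm_overlap", "1"),
    ("sub_pc_type", "ilu"),
    ("sub_pc_ilu_levels", "3"),
    ("ksp_type", "gmres"),
]

# Option prefixes seen in job.old; "" is the un-prefixed default solver.
PREFIXES = ["poisson_", "poissonH1_", "poissonH2_", ""]


def prefixed_options(prefixes, settings):
    """Return one '-<prefix><name> <value>' string per prefix and setting."""
    return [f"-{prefix}{name} {value}"
            for prefix in prefixes
            for name, value in settings]


options = prefixed_options(PREFIXES, JOB_OLD_SETTINGS)
# Join with shell line continuations, as in the job script.
print(" \\\n".join(options))
```

Keeping the settings in one place makes it harder for the per-prefix groups to drift apart when a preconditioner is changed.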

job.opt: works only for symmetrized M3D

-poisson_pc_type hypre \
-poisson_pc_hypre_type boomeramg \
-poisson_ksp_type cg \

-poissonH1_pc_type jacobi \
-poissonH1_ksp_type cg \

-poissonH2_pc_type asm \
-poissonH2_pc_hypre_type boomeramg \
-poissonH2_ksp_type cg \

-pc_type jacobi \
-ksp_type cg \

Iteration counts comparing (7067->1311->593)

                                          gmres/ilu             cg/hypre/jacobi
restart                                   30     200    200
fill-in level                             0      0      3

• Diffpar-operator_4-ibc_1-ID_0  : its =  3      3      2       6
• poisD-operator_3               : its =  404    188    58      8
• Diffpar-operator_4-ibc_1-ID_0  : its =  5      5      5       5
• Diffpar-operator_4-ibc_0-ID_10 : its =  30     30     9       101
• Diffpar-operator_4-ibc_0-ID_10 : its =  30     30     9       101
• poisN-operator_1               : its =  3234   465    89      9
• Diffpar-operator_4-ibc_1-ID_0  : its =  5      5      5       5
• Diffpar-operator_4-ibc_0-ID_0  : its =  3      3      3       6
• poisD-operator_1               : its =  371    181    56      9
• Diffpar-operator_5-ibc_0-ID_0  : its =  2      2      2       4
• poisD-operator_2               : its =  358    182    56      8
• poisN-operator_1               : its =  2593   420    57      9

Improved M3D time components

Time (sec, 500 time steps, 1 node, seaborg)

                                                       Gmres/ilu   CG/jacobi/hypre
M3D total time                                         5788        2424
  (fortran code, solver, operator, data conversion)
elliptic solver time                                   3962        1335
KSPSolve time                                          3222        1189
M3D-Petsc data conversion                              974         118
  (par2m3d m3d2par Rpar2m3d m3d2Rpar)
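The per-component gain can be read off as a ratio of the two columns. A small Python sketch, assuming the four middle values pair up as elliptic solver 3962 / 1335 and KSPSolve 3222 / 1189 (the pairing is inferred from the slide layout); the dictionary and function names are illustrative only.

```python
# Times in seconds (500 time steps, 1 node), copied from the slide:
# component -> (gmres/ilu, cg/jacobi/hypre). The elliptic/KSPSolve pairing is assumed.
TIMES = {
    "M3D total": (5788.0, 2424.0),
    "elliptic solver": (3962.0, 1335.0),
    "KSPSolve": (3222.0, 1189.0),
    "M3D-Petsc data conversion": (974.0, 118.0),
}


def speedups(times):
    """Return component -> old_time / new_time."""
    return {name: old / new for name, (old, new) in times.items()}


result = speedups(TIMES)
for name, factor in result.items():
    print(f"{name:28s} {factor:5.2f}x")
```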

Strong Scaling in direction

                          m3d time   KSP time
• 1 node  (39481 eqs)     2242 sec   1237 sec
• 4 nodes ( 9871 eqs)     1010 sec    482 sec
• 8 nodes ( 4971 eqs)      734 sec    353 sec

• Note: seaborg. 1 node has 16 processors. The number of equations is counted on each processor. 100 time steps. 16/16/141/1/4/1-4-8. optimz/opt4_scaling_strong_16p/64p/128p/256p.wxo.
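For a fixed total problem size, speedup is T(1 node) / T(N nodes) and parallel efficiency is speedup / N. A short Python sketch using the strong-scaling numbers above; the function name `strong_scaling` is illustrative and not taken from M3D.

```python
# (nodes, m3d time [s], KSP time [s]) copied from the strong-scaling list above
STRONG = [(1, 2242.0, 1237.0), (4, 1010.0, 482.0), (8, 734.0, 353.0)]


def strong_scaling(rows, column):
    """Return {nodes: (speedup, efficiency)} relative to the first row."""
    base_nodes, base_time = rows[0][0], rows[0][column]
    table = {}
    for row in rows:
        speedup = base_time / row[column]
        table[row[0]] = (speedup, speedup / (row[0] / base_nodes))
    return table


m3d = strong_scaling(STRONG, 1)
ksp = strong_scaling(STRONG, 2)
for nodes in m3d:
    print(nodes, "nodes: m3d speedup %.2f (eff %.2f), KSP speedup %.2f (eff %.2f)"
          % (m3d[nodes] + ksp[nodes]))
```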

Weak Scaling in direction

                          m3d time   KSP time
• 1 node   (7321 eqs)     129 sec    22 sec
• 4 nodes  (7261 eqs)     259 sec    42 sec
• 8 nodes  (7261 eqs)     321 sec    58 sec
• 16 nodes (7261 eqs)     322 sec    68 sec

Note: seaborg. 10 time steps. Optimz/weaking_scaling_1.16_16_061/121/121/121_4/4/8/16_1/4/8/16_1node/4node/8node/16node.wxo
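With the work per processor held roughly constant, weak-scaling efficiency is T(1 node) / T(N nodes); ideal scaling gives 1. A brief Python sketch on the numbers above; `weak_efficiency` is an illustrative name, not an M3D routine.

```python
# (nodes, m3d time [s], KSP time [s]) copied from the weak-scaling list above
WEAK = [(1, 129.0, 22.0), (4, 259.0, 42.0), (8, 321.0, 58.0), (16, 322.0, 68.0)]


def weak_efficiency(rows, column):
    """Return {nodes: T(first row) / T(nodes)} for the chosen time column."""
    base_time = rows[0][column]
    return {row[0]: base_time / row[column] for row in rows}


m3d_eff = weak_efficiency(WEAK, 1)
ksp_eff = weak_efficiency(WEAK, 2)
for nodes in m3d_eff:
    print(f"{nodes:2d} nodes: m3d {m3d_eff[nodes]:.2f}, KSP {ksp_eff[nodes]:.2f}")
```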

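Most time saved … Neumann b.c.

Weakly diagonally dominant matrix

$\Delta u = f,\ x \in R; \qquad \partial u / \partial n = g,\ x \in \partial R \quad\Rightarrow\quad Ax = b$

Singularity: $Ae = 0$, nullspace $= \mathrm{span}\{e\}$. A is semi-definite; one eigenvalue is 0.

If $u$ is a solution, $u + c$ is also a solution.

1. Consistent system: $\int_R f\,dx = \oint_{\partial R} g\,dl$, i.e. $e^T b = 0$, enforced by $b \leftarrow b - \frac{(b, e)}{(e, e)}\, e$.

2. Unique solution: $\int_R u\,dx = 0$, i.e. $x^T e = 0$ (minimum length solution).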
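The two conditions can be tried on a small model problem. The sketch below is a hypothetical 1D stand-in, not M3D's solver: it uses a 1D Laplacian with Neumann ends (so that A e = 0), projects the constant vector out of the right-hand side, and runs unpreconditioned CG from a zero initial guess, which keeps the iterates orthogonal to e. All function names are illustrative.

```python
def apply_neumann_laplacian(x):
    """y = A x for a 1D Laplacian with Neumann ends (symmetric, A e = 0)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i] - x[i - 1] if i > 0 else 0.0
        right = x[i] - x[i + 1] if i < n - 1 else 0.0
        y[i] = left + right
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out_constant(v):
    """v - (v, e)/(e, e) e with e = (1, ..., 1), i.e. subtract the mean."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def conjugate_gradient(apply_a, b, tol=1e-12, max_iter=200):
    """Plain CG starting from x = 0; b must satisfy e^T b = 0."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


n = 8
b = project_out_constant([float(i) for i in range(n)])   # consistent right-hand side
x = conjugate_gradient(apply_neumann_laplacian, b)        # solution with x^T e = 0
residual = [bi - yi for bi, yi in zip(b, apply_neumann_laplacian(x))]
print("sum(x) =", sum(x), " max |residual| =", max(abs(v) for v in residual))
```

Without the projection of b, the system is in general inconsistent and CG has no solution to converge to.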

Higher order triangles

2nd order

3rd order

Regular higher order: J. Chen, et al., Solving Anisotropic Transport Equation on Misaligned Grids, LNCS 3516, pp. 1076-1079, 2005.

Lump higher order: G. Cohen, et al., Higher order triangular finite elements with mass lumping for the wave equation, SIAM J. Numer. Anal. 38, pp. 2047-2078, 2001.

2nd order meshes with p __from Linda

Run M3D with ho options

• Compiler options
  make BOPT=O update

• Runtime options
  – Regular 2nd order: -hoelement -horder 2 \
  – Regular 3rd order: -hoelement -horder 3 \
  – Lump 2nd order: -hoelement -lump -horder 2 \
  – Lump 3rd order: -hoelement -lump -horder 3 \

Benchmark ho code: m3d/code/m2.F (lines 346-371)

c.. determine mesh
      call dmesh

c.. cjtest 1-dec-04 for linda start --
      call cvolea( one, sum )
      write(0,*)"TEST: cvolea sum for one = ", sum
      call cvol( one, sum )
      write(0,*)"TEST: cvol sum for one = ", sum
c.. cjtest 1-dec-04 for linda end --

      call rnetc

c... model test problem
      call ellip
      call circle
c     return

cLS   if(impp.eq.1.and.ioldinp.ne.1) call wread_mpp

Benchmark options

Compiler options to turn on

-DELLIP in m3d/grid/Makefile

-DHELMHOLTZ in m3d/interface/Makefile

-mhd/driver/test.c: mh3d_test

Runtime options

Ellip.F controlled by elist

Circle.F controlled by clist

M3D elliptic solvers

Operators: pure, star (*) and dagger (†).

1-3. Poisson equations (pure, *, †): [operator] u = f, Dirichlet b.c.

4-6. Helmholtz equations (pure, *, †): (I … [operator]) u = f, Dirichlet b.c.

7. Pure Poisson equation: [operator] u = f, Neumann b.c.

M3D operators

• iselect = 11  dudx               1st order partial derivative
• iselect = 12  dudz               1st order partial derivative
• iselect = 13  dxdphi             toroidal derivative
• iselect = 14  cvol               total toroidal volume
• iselect = 15  cvolea             toroidal volume contained in each
• iselect = 16  d2udxdz - d2udzdx  2nd order derivative commute
• iselect = 17  gradsq             vector inner product
• iselect = 18  gcro               vector cross product
• iselect = 19  delsq              laplacian and bdy line integral
• iselect = 20  div                divergence

Numerical Accuracy (2nd order, RMS)

operators                       Linear      Regular HO   Lump HO

• pure poiss                    .3133E-04   .1824E-10    .2778E-10
• star poiss                    .7480E-04   .1741E-07    .9668E-11
• dagg poiss                    .7689E-05   .1368E-07    .1316E-11
• Helmholtz pure poiss          .8375E-04   .5808E-06    .5921E-11
• Helmholtz star poiss          .2019E-03   .1122E-04    .1187E-10
• Helmholtz dagg poiss          .3648E-04   .1582E-06    .1542E-10
• pure poiss Neumann  u_x       .3034E-02   .1049E-03    .1157E-03
•                     u_y       .2290E-02   .7860E-04    .8898E-04
• dxdr                          .2424E-03   .7718E-11    .4413E-13
• dxdz                          .9665E-03   .1709E-09    .2709E-13
• d2xdrdz - d2xdzdr             .9787E-03   .1705E-09    .5611E-11
• grad                          .4251E-03   .7760E-13    .4639E-13
• gcro                          .3830E-02   .6218E-09    .9326E-14
• delsq                         .6927E-03   .9690E-10    .1106E-09

Numerical Efficiency (2nd order)

operators                       Linear      Regular HO   Lump HO

• pure poiss                    11.505164   17.993580    15.881487
• star poiss                    11.936641   17.842965    15.577935
• dagg poiss                    11.487363   17.065694    15.590550
• Helmholtz pure poiss          11.593001   17.850698    15.764700
• Helmholtz star poiss          11.827986   17.617935    15.462633
• Helmholtz dagg poiss          11.127486   17.504207    15.329060
• pure poiss Neumann            11.800331   17.994744    15.368874

• dxdr                          0.325041    2.822974     0.443981
• dxdz                          0.467021    2.539099     0.419528
• d2xdrdz - d2xdzdr             0.560459    9.457784     2.098601
• grad                          0.680051    2.715444     0.961536
• gcro                          0.234130    2.418649     0.544330
• Delsq (Laplacian)             0.355726    6.733015     0.554883

poisson solver scales to # of eqs

Lump HO is used.

Gmres/ilu.

Application of ho code to anisotropic transport on misaligned grids

M3D on X1

• m3dp.x

m3dp_vec.x

• m3dp_fsymm.x

m3dp_fsymm_vec.x ??

• m3dp_fsymm_opt.x

m3dp_fsymm_opt_vec.x ??

Optimizing M3D on X1—cont’d

• Petsc, Matrix Vector Product

• MatMult flops (16MSP):
  Standard petsc:  6.81 MFlops
  Optimized petsc: 54.0 MFlops
