TRANSCRIPT
1
Mark F. Adams
SciDAC - 27 June 2005
Ax=b: The Link between Gyrokinetic Particle Simulations of Turbulent
Transport in Burning Plasmas and Micro-FE Analysis of Whole Vertebral Bodies in Orthopaedic Biomechanics
2
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
3
Multigrid: smoothing and coarse grid correction (projection)

[Figure: the multigrid V-cycle. Smoothing on the finest grid; restriction (R) to the first coarse grid (note: smaller grid); prolongation (P = R^T) back up.]
4
Multigrid V-cycle
Given smoother S and coarse grid space (P). The columns of the "prolongation" operator P are a discrete representation of the coarse grid space.

Function u = MG-V(A, f)
  if A is small
    u ← A⁻¹ f
  else
    u ← S(f, u)              -- ν steps of smoother (pre)
    r_H ← Pᵀ(f − Au)
    u_H ← MG-V(PᵀAP, r_H)    -- recursion (Galerkin)
    u ← u + P u_H
    u ← S(f, u)              -- ν steps of smoother (post)

Iteration matrix with R = Pᵀ: T = S (I − P(RAP)⁻¹RA) S  (multiplicative)
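The V-cycle above can be sketched in a few lines of NumPy. This is an illustrative dense toy (1D Poisson model problem, weighted Jacobi smoothing, linear-interpolation prolongation), not the Prometheus implementation; the smoother counts and damping factor are assumptions:

```python
import numpy as np

def poisson1d(n):
    """1D Poisson matrix (Dirichlet), n interior points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def linear_prolongation(n_coarse):
    """P: coarse (n_coarse nodes) -> fine (2*n_coarse+1 nodes), linear interpolation."""
    n_fine = 2 * n_coarse + 1
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1              # fine node coinciding with coarse node j
        P[i, j] = 1.0
        P[i - 1, j] += 0.5         # interpolate to the two neighbors
        P[i + 1, j] += 0.5
    return P

def mg_v(A, f, u, nu=2, omega=2.0 / 3.0):
    """One multigrid V-cycle, mirroring the pseudocode on this slide."""
    n = A.shape[0]
    if n <= 3:                     # "A is small": solve directly
        return np.linalg.solve(A, f)
    Dinv = 1.0 / np.diag(A)
    for _ in range(nu):            # pre-smoothing (weighted Jacobi)
        u = u + omega * Dinv * (f - A @ u)
    P = linear_prolongation((n - 1) // 2)
    rH = P.T @ (f - A @ u)         # restrict residual, R = P^T
    uH = mg_v(P.T @ A @ P, rH, np.zeros((n - 1) // 2))  # Galerkin recursion
    u = u + P @ uH                 # coarse grid correction
    for _ in range(nu):            # post-smoothing
        u = u + omega * Dinv * (f - A @ u)
    return u
```

Iterating the cycle (u = mg_v(A, f, u) repeatedly) drives the residual toward machine precision at a per-cycle rate independent of n, which is the point of multigrid.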
5
Smoothed Aggregation
Coarse grid space & smoother → MG method

Piecewise constant functions: "plain" aggregation (P0)
  - Start with kernel vectors B of the operator (e.g., the 6 rigid body modes in elasticity)
  - Nodal aggregation: B → P0
"Smoothed" aggregation: lower the energy of the functions
  - One Jacobi iteration: P ← (I − ω D⁻¹ A) P0
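A sketch of the tentative and smoothed prolongators for a scalar problem, where the kernel B is just the constant vector. The aggregate size and the damping choice ω = 4/(3 ρ(D⁻¹A)) are assumptions in the spirit of standard smoothed aggregation, not the talk's exact parameters:

```python
import numpy as np

def tentative_P0(n, agg_size=3):
    """Plain aggregation: node i belongs to aggregate i // agg_size.
    For a scalar PDE the kernel B is the constant vector, so each
    column of P0 is the characteristic function of one aggregate."""
    n_agg = (n + agg_size - 1) // agg_size
    P0 = np.zeros((n, n_agg))
    P0[np.arange(n), np.arange(n) // agg_size] = 1.0
    return P0

def smoothed_P(A, P0):
    """One damped Jacobi step lowers the energy of the coarse basis:
    P = (I - omega * D^{-1} A) P0 with omega = 4 / (3 rho(D^{-1} A))."""
    Dinv = 1.0 / np.diag(A)
    rho = np.max(np.abs(np.linalg.eigvals(Dinv[:, None] * A)))
    omega = 4.0 / (3.0 * rho)
    return P0 - omega * (Dinv[:, None] * (A @ P0))
```

The "lower energy" claim is checkable: for SPD A the total energy trace(PᵀAP) of the smoothed basis is below trace(P0ᵀAP0).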
6
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
7
Trabecular Bone

[Figure: 5-mm cube specimen; cortical bone and trabecular bone labeled.]
8
Methods: FE Modeling

Micro-computed tomography (µCT @ 22 µm resolution) → 3D image → FE mesh (2.5-mm cube, 44-µm elements)
Mechanical testing: E, σ_yield, σ_ult, etc.
9
The vertebral body shown is fairly healthy, from an 80-year-old female. It is a T-10, i.e., thoracic, so it is close to the mid-spine. Research is usually done from T-10 down to the lumbar vertebral bodies. There are 12 thoracic VBs and 5 lumbar; the numbers go up as you go down.
10
Motivation
• Calibrate material models for continuum elements
  – e.g., explicit computation of a yield surface
• Validation for low-order models
• Investigation of effects that are not accessible with lower-order models
  – role of the cortical shell in load carrying of the vertebra
  – effects of drug treatment on continuum properties
1 mm slice from vertebral body
11
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
12
Olympus Computational Architecture

• Athena: parallel FE code; partitions the FE mesh input file to SMPs using ParMetis
• ParMetis: parallel mesh partitioner (University of Minnesota)
• FEAP: serial general-purpose FE application (University of California); each instance reads a per-partition FE input file (in memory) plus a material card
• pFEAP: layer coupling FEAP to the solver
• Prometheus: multigrid solver, built on PETSc, ParMetis, and METIS
• PETSc: parallel numerical libraries (Argonne National Lab)
• Output: Silo databases, visualized with VisIt
13
Geometric & material nonlinear analysis at 2.25% strain; 8 processors on DataStar (IBM SP4 at UCSD).
14
ParMetis partitions
15
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
16
Vertebral Body With Shell

[Figures: mesh at 80 µm with shell; 1780 µm without shell.]

• Large deformation elasticity, 6 load steps (3% strain)
• Scaled speedup: ~131K dof/processor, 7 to 537 million dof, 4 to 292 nodes
• IBM SP Power3: 14 of 16 procs/node used, double/single Colony switch
• Inexact Newton; CG linear solver with variable tolerance
• Smoothed aggregation AMG preconditioner
• Nodal block diagonal smoothers: 2nd-order Chebyshev (additive) or Gauss-Seidel (multiplicative)
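The Chebyshev smoother named above can be sketched as the standard Chebyshev iteration applied to the Jacobi-preconditioned operator. The eigenvalue bounds lmin, lmax and the recurrence follow textbook Chebyshev acceleration; they are an illustration, not the Prometheus code (which uses nodal *block* diagonal preconditioning):

```python
import numpy as np

def chebyshev_smoother(A, b, x, lmin, lmax, degree=2):
    """Polynomial smoother: Chebyshev iteration on the Jacobi-preconditioned
    system D^{-1}A x = D^{-1}b, damping eigenmodes in [lmin, lmax]."""
    Dinv = 1.0 / np.diag(A)
    theta = 0.5 * (lmax + lmin)          # center of the target interval
    delta = 0.5 * (lmax - lmin)          # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = Dinv * (b - A @ x)               # preconditioned residual
    d = r / theta
    for _ in range(degree):
        x = x + d
        r = Dinv * (b - A @ x)
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x
```

Unlike Gauss-Seidel, this smoother is built from matrix-vector products only, so it needs no sequential sweep and is attractive in parallel (the "additive" label on the slide).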
Scalability
18
Computational phases
• Mesh setup (per mesh): coarse grid construction (aggregation), graph processing
• Matrix setup (per matrix): coarse grid operator construction via the sparse matrix triple product RAP (expensive for smoothed aggregation); subdomain factorizations
• Solve (per RHS): matrix-vector products (residuals, grid transfer), smoothers (matrix-vector products)
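The RAP step in the matrix-setup phase can be sketched with sparse matrices (SciPy here purely for illustration; the production path is PETSc, and the ω = 2/3 damping and aggregate size are assumptions):

```python
import numpy as np
import scipy.sparse as sp

# Fine-grid operator: 1D Poisson, stored sparse
n = 300
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Tentative prolongator P0: aggregates of 3 consecutive nodes
n_agg = n // 3
P0 = sp.csr_matrix((np.ones(n), (np.arange(n), np.arange(n) // 3)),
                   shape=(n, n_agg))

# Smoothed prolongator: one damped Jacobi step (omega = 2/3 assumed)
Dinv = sp.diags(1.0 / A.diagonal())
P = (sp.identity(n, format="csr") - (2.0 / 3.0) * (Dinv @ A)) @ P0

# Galerkin triple product R A P with R = P^T: the dominant matrix-setup cost.
# Smoothing widens the support of P relative to P0 (more nonzeros), which is
# why RAP is notably expensive for smoothed aggregation.
Ac = (P.T @ A @ P).tocsr()
```

The coarse operator Ac inherits symmetry from A, and each level's RAP must be redone whenever the matrix changes, which is why the slide bills it per matrix rather than per mesh.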
19
[Plot: flop rate per processor at 131K dof/processor; 0.47 Tflop/s aggregate on 4088 processors.]
20
Sources of scale inefficiency in the solve phase

                       7.5M dof   537M dof
#iterations               450        897
#nnz/row                   50         68
Flop rate                  76         74
#elems/proc             19.3K      33.0K
Model (rel. time)        1.00       2.78
Measured (rel. time)     1.00       2.61
21
Strong speedup with 7.5M dof problem (1 to 128 nodes)
22
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
23
24
Finite Element (FEM) Elliptic Solver Developed for GTC Global Field-Aligned Mesh
• FEM adapted for logically non-rectangular grids
• Needs adjustment of elements at different toroidal angles
• Linear sparse matrix solver: PETSc (ANL)
• Enabled implementing the split-weight (Manuilskiy & Lee, PoP 2000) and hybrid electron (Lin & Chen, PoP 2001) models
• Ongoing studies of kinetic electron effects on ITG and TEM turbulence
• Ongoing studies of electromagnetic turbulence
25
Performance
• Multigrid preconditioned Krylov solver: Prometheus (Columbia) & HYPRE (LLNL)
• Scaled speedup: ~38K dof per processor; 1 to 32 processors/plane; 8 planes, 20 time steps, 4 particles per cell
26
Thank You

Gordon Bell Prize winner 2004:
"Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom"
M.F. Adams, H.H. Bayraktar, T.M. Keaveny, P. Papadopoulos
ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing
27
Linear solver iterations per Newton step

             Small (7.5M dof)        Large (537M dof)
Newton:      1   2   3   4   5       1   2   3   4   5   6
Load 1       5  14  20  21  18       5  11  35  25  70   2
Load 2       5  14  20  20  20       5  11  36  26  70   2
Load 3       5  14  20  22  19       5  11  36  26  70   2
Load 4       5  14  20  22  19       5  11  36  26  70   2
Load 5       5  14  20  22  19       5  11  36  26  70   2
Load 6       5  14  20  22  19       5  11  36  26  70   2