TRANSCRIPT
1
Mark F. Adams
SciDAC - 27 June 2005
Ax=b: The Link between Gyrokinetic Particle Simulations of Turbulent
Transport in Burning Plasmas and Micro-FE Analysis of Whole Vertebral Bodies in Orthopaedic Biomechanics
2
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
3
Multigrid: smoothing and coarse grid correction (projection)

[Figure: the multigrid V-cycle. Smoothing on the finest grid; restriction (R) to the first coarse grid (note: smaller grid); prolongation (P = R^T) back up.]
4
Multigrid V-cycle
Given smoother S and coarse grid space (P). The columns of the "prolongation" operator P are a discrete representation of the coarse grid space.

Function u = MG-V(A, f)
  if A is small
    u ← A⁻¹ f
  else
    u ← S(f, u)              -- ν steps of smoother (pre)
    r_H ← Pᵀ(f − Au)
    u_H ← MG-V(PᵀAP, r_H)    -- recursion (Galerkin)
    u ← u + P u_H
    u ← S(f, u)              -- ν steps of smoother (post)

Iteration matrix with R = Pᵀ: T = S (I − P(RAP)⁻¹RA) S  (multiplicative)
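The V-cycle above can be sketched in a few lines of NumPy. This is an illustrative dense toy (1D Poisson model problem, weighted Jacobi smoothing, linear-interpolation prolongation), not the Prometheus implementation; the smoother counts and damping factor are assumptions:

```python
import numpy as np

def poisson1d(n):
    """1D Poisson matrix (Dirichlet), n interior points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def linear_prolongation(n_coarse):
    """P: coarse (n_coarse nodes) -> fine (2*n_coarse+1 nodes), linear interpolation."""
    n_fine = 2 * n_coarse + 1
    P = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1              # fine node coinciding with coarse node j
        P[i, j] = 1.0
        P[i - 1, j] += 0.5         # interpolate to the two neighbors
        P[i + 1, j] += 0.5
    return P

def mg_v(A, f, u, nu=2, omega=2.0 / 3.0):
    """One multigrid V-cycle, mirroring the pseudocode on this slide."""
    n = A.shape[0]
    if n <= 3:                     # "A is small": solve directly
        return np.linalg.solve(A, f)
    Dinv = 1.0 / np.diag(A)
    for _ in range(nu):            # pre-smoothing (weighted Jacobi)
        u = u + omega * Dinv * (f - A @ u)
    P = linear_prolongation((n - 1) // 2)
    rH = P.T @ (f - A @ u)         # restrict residual, R = P^T
    uH = mg_v(P.T @ A @ P, rH, np.zeros((n - 1) // 2))  # Galerkin recursion
    u = u + P @ uH                 # coarse grid correction
    for _ in range(nu):            # post-smoothing
        u = u + omega * Dinv * (f - A @ u)
    return u
```

Iterating the cycle (u = mg_v(A, f, u) repeatedly) drives the residual toward machine precision at a per-cycle rate independent of n, which is the point of multigrid.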
5
Smoothed Aggregation
Coarse grid space & smoother → MG method

Piecewise constant functions: "plain" aggregation (P0)
  - Start with kernel vectors B of the operator (e.g., the 6 rigid body modes in elasticity)
  - Nodal aggregation: B → P0
"Smoothed" aggregation: lower the energy of the functions
  - One Jacobi iteration: P ← (I − ω D⁻¹ A) P0
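A sketch of the tentative and smoothed prolongators for a scalar problem, where the kernel B is just the constant vector. The aggregate size and the damping choice ω = 4/(3 ρ(D⁻¹A)) are assumptions in the spirit of standard smoothed aggregation, not the talk's exact parameters:

```python
import numpy as np

def tentative_P0(n, agg_size=3):
    """Plain aggregation: node i belongs to aggregate i // agg_size.
    For a scalar PDE the kernel B is the constant vector, so each
    column of P0 is the characteristic function of one aggregate."""
    n_agg = (n + agg_size - 1) // agg_size
    P0 = np.zeros((n, n_agg))
    P0[np.arange(n), np.arange(n) // agg_size] = 1.0
    return P0

def smoothed_P(A, P0):
    """One damped Jacobi step lowers the energy of the coarse basis:
    P = (I - omega * D^{-1} A) P0 with omega = 4 / (3 rho(D^{-1} A))."""
    Dinv = 1.0 / np.diag(A)
    rho = np.max(np.abs(np.linalg.eigvals(Dinv[:, None] * A)))
    omega = 4.0 / (3.0 * rho)
    return P0 - omega * (Dinv[:, None] * (A @ P0))
```

The "lower energy" claim is checkable: for SPD A the total energy trace(PᵀAP) of the smoothed basis is below trace(P0ᵀAP0).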
6
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
7
Trabecular Bone

[Figure: 5-mm cube specimen; cortical bone and trabecular bone labeled.]
8
Methods: FE Modeling

Micro-computed tomography (µCT @ 22 µm resolution) → 3D image → FE mesh (2.5-mm cube, 44-µm elements)
Mechanical testing: E, σ_yield, σ_ult, etc.
9
The vertebral body shown is fairly healthy, from an 80-year-old female. It is a T-10, i.e., thoracic, so it is close to the mid-spine. Research is usually done from T-10 down to the lumbar vertebral bodies. There are 12 thoracic VBs and 5 lumbar; the numbers go up as you go down.
10
Motivation
• Calibrate material models for continuum elements
  – e.g., explicit computation of a yield surface
• Validation for low-order models
• Investigation of effects that are not accessible with lower-order models
  – role of the cortical shell in load carrying of the vertebra
  – effects of drug treatment on continuum properties
1 mm slice from vertebral body
11
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
12
Olympus Computational Architecture

• Athena: parallel FE code; partitions the FE mesh input file to SMPs using ParMetis
• ParMetis: parallel mesh partitioner (University of Minnesota)
• FEAP: serial general-purpose FE application (University of California); each instance reads a per-partition FE input file (in memory) plus a material card
• pFEAP: layer coupling FEAP to the solver
• Prometheus: multigrid solver, built on PETSc, ParMetis, and METIS
• PETSc: parallel numerical libraries (Argonne National Lab)
• Output: Silo databases, visualized with VisIt
13
Geometric & material nonlinear analysis at 2.25% strain; 8 processors on DataStar (IBM SP4 at UCSD).
14
ParMetis partitions
15
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
16
Vertebral Body With Shell

[Figures: mesh at 80 µm with shell; 1780 µm without shell.]

• Large deformation elasticity, 6 load steps (3% strain)
• Scaled speedup: ~131K dof/processor, 7 to 537 million dof, 4 to 292 nodes
• IBM SP Power3: 14 of 16 procs/node used, double/single Colony switch
• Inexact Newton; CG linear solver with variable tolerance
• Smoothed aggregation AMG preconditioner
• Nodal block diagonal smoothers: 2nd-order Chebyshev (additive) or Gauss-Seidel (multiplicative)
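The Chebyshev smoother named above can be sketched as the standard Chebyshev iteration applied to the Jacobi-preconditioned operator. The eigenvalue bounds lmin, lmax and the recurrence follow textbook Chebyshev acceleration; they are an illustration, not the Prometheus code (which uses nodal *block* diagonal preconditioning):

```python
import numpy as np

def chebyshev_smoother(A, b, x, lmin, lmax, degree=2):
    """Polynomial smoother: Chebyshev iteration on the Jacobi-preconditioned
    system D^{-1}A x = D^{-1}b, damping eigenmodes in [lmin, lmax]."""
    Dinv = 1.0 / np.diag(A)
    theta = 0.5 * (lmax + lmin)          # center of the target interval
    delta = 0.5 * (lmax - lmin)          # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = Dinv * (b - A @ x)               # preconditioned residual
    d = r / theta
    for _ in range(degree):
        x = x + d
        r = Dinv * (b - A @ x)
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x
```

Unlike Gauss-Seidel, this smoother is built from matrix-vector products only, so it needs no sequential sweep and is attractive in parallel (the "additive" label on the slide).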
Scalability
18
Computational phases
• Mesh setup (per mesh): coarse grid construction (aggregation), graph processing
• Matrix setup (per matrix): coarse grid operator construction via the sparse matrix triple product RAP (expensive for smoothed aggregation); subdomain factorizations
• Solve (per RHS): matrix-vector products (residuals, grid transfer), smoothers (matrix-vector products)
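The RAP step in the matrix-setup phase can be sketched with sparse matrices (SciPy here purely for illustration; the production path is PETSc, and the ω = 2/3 damping and aggregate size are assumptions):

```python
import numpy as np
import scipy.sparse as sp

# Fine-grid operator: 1D Poisson, stored sparse
n = 300
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Tentative prolongator P0: aggregates of 3 consecutive nodes
n_agg = n // 3
P0 = sp.csr_matrix((np.ones(n), (np.arange(n), np.arange(n) // 3)),
                   shape=(n, n_agg))

# Smoothed prolongator: one damped Jacobi step (omega = 2/3 assumed)
Dinv = sp.diags(1.0 / A.diagonal())
P = (sp.identity(n, format="csr") - (2.0 / 3.0) * (Dinv @ A)) @ P0

# Galerkin triple product R A P with R = P^T: the dominant matrix-setup cost.
# Smoothing widens the support of P relative to P0 (more nonzeros), which is
# why RAP is notably expensive for smoothed aggregation.
Ac = (P.T @ A @ P).tocsr()
```

The coarse operator Ac inherits symmetry from A, and each level's RAP must be redone whenever the matrix changes, which is why the slide bills it per matrix rather than per mesh.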
19
[Plot: flop rate per processor at 131K dof/processor; 0.47 Tflop/s aggregate on 4088 processors.]
20
Sources of scale inefficiency in the solve phase

                       7.5M dof   537M dof
#iterations               450        897
#nnz/row                   50         68
Flop rate                  76         74
#elems/proc             19.3K      33.0K
Model (rel. time)        1.00       2.78
Measured (rel. time)     1.00       2.61
21
Strong speedup with 7.5M dof problem (1 to 128 nodes)
22
Outline
Algebraic multigrid (AMG) introduction
Micro-FE bone modeling
Olympus parallel FE framework
Scalability study on IBM SPs
Gyrokinetic particle simulations of turbulent transport in burning plasmas
23
24
Finite Element (FEM) Elliptic Solver Developed for GTC Global Field-Aligned Mesh
• FEM adapted for logically non-rectangular grids
• Needs adjustment of elements at different toroidal angles
• Linear sparse matrix solver: PETSc (ANL)
• Enabled implementing the split-weight (Manuilskiy & Lee, PoP 2000) and hybrid electron (Lin & Chen, PoP 2001) models
• Ongoing studies of kinetic electron effects on ITG and TEM turbulence
• Ongoing studies of electromagnetic turbulence
25
Performance
• Multigrid preconditioned Krylov solver: Prometheus (Columbia) & HYPRE (LLNL)
• Scaled speedup: ~38K dof per processor; 1 to 32 processors/plane; 8 planes, 20 time steps, 4 particles per cell
26
Thank You

Gordon Bell Prize winner 2004:
"Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom"
M.F. Adams, H.H. Bayraktar, T.M. Keaveny, P. Papadopoulos
ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing
27
Linear solver iterations per Newton step

             Small (7.5M dof)        Large (537M dof)
Newton:      1   2   3   4   5       1   2   3   4   5   6
Load 1       5  14  20  21  18       5  11  35  25  70   2
Load 2       5  14  20  20  20       5  11  36  26  70   2
Load 3       5  14  20  22  19       5  11  36  26  70   2
Load 4       5  14  20  22  19       5  11  36  26  70   2
Load 5       5  14  20  22  19       5  11  36  26  70   2
Load 6       5  14  20  22  19       5  11  36  26  70   2