Efficient Local Resorting Techniques with Space Filling Curves

Applied to a Parallel Tsunami Simulation Model

Natalja Rakowsky and Annika Fuchs
AWI, Tsunami-Modelling-Group

The 10th International Workshop on Multiscale (Un-)structured Mesh Numerical Modelling for coastal, shelf and global ocean dynamics

Alfred Wegener Institute for Polar and Marine Research
Bremerhaven, 22 - 25 August 2011

N. Rakowsky, A. Fuchs SFC in TsunAWI IMUM 2011, Bremerhaven 1 / 35

Outline

introducing TsunAWI

motivation for resorting

construction of Hilbert space filling curve (SFC) ordering

comparison to other sortings

conclusions

The AWI Tsunami Model TsunAWI

TsunAWI in a nutshell

shallow water equations with inundation

unstructured P1 − P1^NC finite element grid

explicit time stepping scheme

OpenMP parallel Fortran90 code

Most important application:

German-Indonesian Tsunami Early Warning System

3470 scenarios for different prototypic ruptures

3 h model time (10,800 timesteps of 1 s)

TsunAWI: example for a computational domain

regional grid for the Sunda Arc

The computational grid discretizes the domain with

varying resolution: 50 m in areas of interest, 500 m in all other coastal areas, 15 km in the deep ocean

2,366,319 nodes

4,721,884 elements

TsunAWI: example for a computational domain

regional grid for the Sunda Arc, focus on Bali

TsunAWI: example for a computational domain

Original numbering of nodes as provided by the grid generator

adjacency matrix, original grid

Motivation for resorting

Data locality on the original grid is very poor.

E.g., each computation on all nodes of one element results in at least one cache miss.

Most time-consuming routines in every timestep:

compute velocity at nodes: v(node) = F(adjacent edges, elems)

compute velocity at edges: v(edge) = F(adjacent elems, nodes)

compute sea surface height (ssh): ssh(node) = F(adjacent elems, nodes)

compute gradient: grad_{x,y}(elem) = F(adjacent nodes)
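The effect of the node numbering on such gather-type loops can be illustrated with a small sketch. It is written in Python only for brevity (TsunAWI itself is Fortran 90), and the toy mesh and the names `structured_triangles`, `mean_index_span` and `renumber` are hypothetical, not part of TsunAWI. The distance between the smallest and largest node index of an element serves as a crude proxy for how far apart in memory the data of its nodes lie; the shuffled numbering merely stands in for a numbering without spatial coherence.

```python
import random

def structured_triangles(nx, ny):
    """Triangulate an nx-by-ny node grid; nodes are numbered row by row."""
    elems = []
    for j in range(ny - 1):
        for i in range(nx - 1):
            a = j * nx + i                      # lower-left node of the cell
            b, c, d = a + 1, a + nx, a + nx + 1
            elems.append((a, b, d))
            elems.append((a, d, c))
    return elems

def mean_index_span(elems):
    """Average of (largest node index - smallest node index) per element."""
    return sum(max(e) - min(e) for e in elems) / len(elems)

def renumber(elems, perm):
    """Apply a node renumbering perm[old] = new to the connectivity."""
    return [tuple(perm[n] for n in e) for e in elems]

nx = ny = 50
elems = structured_triangles(nx, ny)

# Assumption: a random permutation as a stand-in for a badly ordered grid
rng = random.Random(0)
perm = list(range(nx * ny))
rng.shuffle(perm)
scattered = renumber(elems, perm)

print("row-wise numbering :", mean_index_span(elems))
print("scattered numbering:", mean_index_span(scattered))
```

On this toy mesh every element has a span of 51 with row-wise numbering, while the scattered numbering spreads the three nodes of an element over a large part of the index range.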

Ideas for resorting

An SFC such as the Sierpinski curve used in adaptive grids (J. Behrens et al., KlimaCampus Uni Hamburg) could help.

But how to derive an SFC for a highly unstructured grid?

Construct the SFC like the 3D Hilbert curve in the particle code Gadget-2 (communication with T. Rung, TU Hamburg-Harburg)

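SFC construction

[Figure: node n inside a square that is subdivided recursively into quadrants, numbered 0 to 3 along the Hilbert curve on each level]

For all nodes n, calculate the index in the Hilbert curve as a quaternary (base-4) number:

SFC index(n) = 132...

e.g. for 8 levels:

SFC index(n) = 1·4^8 + 3·4^7 + 2·4^6 + ...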
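As a rough illustration of this construction, the following sketch computes a 2D Hilbert index for a node by descending through the quadrant levels and appending one base-4 digit per level. It is written in Python for brevity, not in the Fortran 90 of TsunAWI. The function names (`hilbert_index`, `node_sfc_index`, `quad_digits`), the bounding-square parameters and the quadrant labelling are assumptions made for this example, so the digits need not agree with the labels in the figure. The code weights the first digit with 4^(levels-1); a constant rescaling of the weights leaves the sort order unchanged.

```python
def hilbert_index(ix, iy, levels):
    """Hilbert index of integer cell (ix, iy) on a 2**levels x 2**levels grid."""
    n = 1 << levels
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if ix & s else 0
        ry = 1 if iy & s else 0
        digit = (3 * rx) ^ ry          # quadrant number 0..3 on this level
        d = 4 * d + digit              # append it as the next base-4 digit
        # rotate/flip so that the sub-quadrant is seen in standard orientation
        if ry == 0:
            if rx == 1:
                ix = n - 1 - ix
                iy = n - 1 - iy
            ix, iy = iy, ix
        s >>= 1
    return d

def node_sfc_index(x, y, xmin, ymin, size, levels=8):
    """Index of a node with coordinates (x, y) inside the bounding square
    [xmin, xmin+size] x [ymin, ymin+size] (hypothetical parameters)."""
    n = 1 << levels
    ix = min(int((x - xmin) / size * n), n - 1)
    iy = min(int((y - ymin) / size * n), n - 1)
    return hilbert_index(ix, iy, levels)

def quad_digits(d, levels):
    """Write an index as a string of base-4 digits, most significant first."""
    return "".join(str((d >> (2 * k)) & 3) for k in reversed(range(levels)))

idx = node_sfc_index(0.30, 0.80, 0.0, 0.0, 1.0, levels=8)
print(idx, quad_digits(idx, 8))
```

Sorting the nodes by this index gives an ordering in which nodes that are close on the curve are also close in space.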

SFC reordering

Reorder the nodes according to SFC index.

Reorder the elements either by a separate SFC, or numerically by node indices (more efficient for TsunAWI).

Edges are constructed in TsunAWI (sorted along the nodes)
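The reordering steps above can be sketched as follows. This is a minimal Python illustration, not the TsunAWI routine; the function name `reorder_mesh`, the data layout and the made-up SFC indices are assumptions. In particular, "numerically by node indices" is interpreted here as sorting the elements by their sorted triple of new node numbers.

```python
def reorder_mesh(coords, elements, sfc_index):
    """Renumber nodes along the SFC and sort elements by their node indices.

    coords    : list of (x, y), one entry per node
    elements  : list of node-index triples
    sfc_index : list with one SFC index per node
    Returns (new_coords, new_elements, new_id) with new_id[old] = new.
    """
    # order[new] = old: nodes sorted by ascending SFC index
    order = sorted(range(len(coords)), key=lambda n: sfc_index[n])
    new_id = [0] * len(coords)
    for new, old in enumerate(order):
        new_id[old] = new
    new_coords = [coords[old] for old in order]
    # translate the connectivity, keeping the node order inside each element
    renumbered = [tuple(new_id[n] for n in elem) for elem in elements]
    # assumption: sort elements by their sorted triple of new node numbers
    new_elements = sorted(renumbered, key=lambda elem: sorted(elem))
    return new_coords, new_elements, new_id

coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
elements = [(0, 1, 2), (0, 2, 3)]
sfc_index = [2, 0, 3, 1]            # made-up SFC indices for the four nodes
new_coords, new_elements, new_id = reorder_mesh(coords, elements, sfc_index)
print(new_id)
print(new_elements)
```

All nodal arrays would have to be permuted with the same `new_id` map so that they stay consistent with the renumbered connectivity.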

SFC ordering of the nodes

for TsunAWI regional Indonesian grid

adjacency matrix for SFC sorted grid

Comparison: RCM ordering

adjacency matrix

RCM (reverse Cuthill-McKee) ordering obtained via the adjacency matrix and Matlab symrcm for sparse matrices.

Comparison: RCM ordering

for TsunAWI regional Indonesian grid

Comparison: AMD ordering

adjacency matrix

AMD (approximate minimum degree) ordering obtained via the adjacency matrix and Matlab symamd for sparse matrices.

Comparison: AMD ordering

for TsunAWI regional Indonesian grid

SFC compared to unsorted, RCM, SymAMD

computation time: IBM Power6

Computational time [seconds] for timestep on a cluster node, 1× IBM Power6 (4 cores, 2× hyperthreading)

OMP_NUM_THREADS      1      2      4      8

orig.             9.77   4.08   2.91   1.57
RCM               2.78   1.77   0.97   0.69
AMD               2.76   1.42   0.95   0.66
SFC               2.69   1.58   0.92   0.60
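The gain from resorting can be read off these timings by dividing the original run time by the resorted one for each thread count. A minimal sketch of that arithmetic follows (Python, used only for illustration; the numbers are copied from the table above, while the names `times` and `speedup_over_original` are made up here):

```python
threads = [1, 2, 4, 8]
times = {                       # seconds, IBM Power6, from the table above
    "orig.": [9.77, 4.08, 2.91, 1.57],
    "RCM":   [2.78, 1.77, 0.97, 0.69],
    "AMD":   [2.76, 1.42, 0.95, 0.66],
    "SFC":   [2.69, 1.58, 0.92, 0.60],
}

def speedup_over_original(times):
    """Ratio of the original timing to each ordering's timing, per thread count."""
    base = times["orig."]
    return {name: [b / t for b, t in zip(base, row)]
            for name, row in times.items()}

gain = speedup_over_original(times)
print("threads", "  ".join(f"{t:5d}" for t in threads))
for name, row in gain.items():
    print(f"{name:7s}", "  ".join(f"{g:5.2f}" for g in row))
```

At one thread the SFC ordering comes out roughly 3.6 times faster than the original numbering, and at eight threads roughly 2.6 times.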

SFC compared to unsorted, RCM, SymAMD

Hardware counters: IBM Power6

IBM hardware counter hpmcount for 1000 timesteps on 1× IBM Power6 (4 cores, 2× hyperthreading, OMP_NUM_THREADS=8)

hpmcount event    L2 cache misses    Number of loads per load miss

orig.             274,478,564,540    17.8
RCM                57,244,100,260    64.0
AMD                54,709,662,295    65.6
SFC                49,980,798,689    88.5

SFC compared to unsorted, RCM, SymAMD

computation time: Intel Xeon Nehalem-EX

Computational time [seconds] for one timestep on one blade SGI Altix UV (HLRN, ZIB Berlin and RRZN Hannover), 2× Intel Xeon 5570 (8 cores, 2× hyperthreading)

OMP_NUM_THREADS     1     2     4     8    16    32    64   32 (no first touch)

orig.            3.84  2.16  1.48  0.89  0.52  0.40  1.63   0.51
RCM              1.64  1.12  0.59  0.35  0.20  0.19  0.37   0.32
AMD              1.47  0.77  0.50  0.30  0.18  0.16  0.32   0.19
SFC              1.47  0.90  0.51  0.31  0.17  0.14  0.30   0.18

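Remark on OpenMP

importance of first touch for data locality

    allocate(array(dim))

    ! Serial initialization: every page is first touched by the master thread
    array(:) = 0.

    ! Parallel initialization: each thread first touches the part it computes on later
    !$OMP PARALLEL DO
    do n=1,dim
       array(n) = 0.
    end do
    !$OMP END PARALLEL DO

    ! Computation with the same loop partitioning
    !$OMP PARALLEL DO
    do n=1,dim
       array(n) = ...
    end do
    !$OMP END PARALLEL DO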

properties of resorting by an SFC

SFC is a very valuable method, because

it is cheap to compute

it provides good data locality on all levels of the memory hierarchy

as domain decomposition, it keeps interfaces small (though not optimal)
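The domain decomposition remark can be made concrete with a toy example: if the nodes are numbered along the curve, contiguous index ranges serve as subdomains, and the interface size is the number of edges whose end nodes lie in different ranges. The sketch below (Python, for illustration only) compares a Hilbert numbering of a 4×4 node grid with a row-wise numbering. The names `count_interface_edges`, `grid_edges` and `HILBERT_4X4` are hypothetical, and the Hilbert order is hard-coded for this grid size.

```python
# Cells of a 4x4 grid in the order visited by a second-level Hilbert curve
HILBERT_4X4 = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2), (0, 3), (1, 3), (1, 2),
               (2, 2), (2, 3), (3, 3), (3, 2), (3, 1), (2, 1), (2, 0), (3, 0)]

def grid_edges(number):
    """Edges between horizontal/vertical neighbours of the 4x4 node grid,
    expressed in the node numbering number[(x, y)]."""
    edges = []
    for x in range(4):
        for y in range(4):
            if x < 3:
                edges.append((number[(x, y)], number[(x + 1, y)]))
            if y < 3:
                edges.append((number[(x, y)], number[(x, y + 1)]))
    return edges

def count_interface_edges(edges, n_nodes, n_parts):
    """Split nodes 0..n_nodes-1 into n_parts contiguous index ranges and
    count the edges whose end nodes fall into different ranges."""
    def part(n):
        return n * n_parts // n_nodes
    return sum(1 for a, b in edges if part(a) != part(b))

hilbert = {xy: d for d, xy in enumerate(HILBERT_4X4)}
rowwise = {(x, y): 4 * y + x for x in range(4) for y in range(4)}

cut_hilbert = count_interface_edges(grid_edges(hilbert), 16, 4)
cut_rowwise = count_interface_edges(grid_edges(rowwise), 16, 4)
print("Hilbert numbering :", cut_hilbert, "interface edges")
print("row-wise numbering:", cut_rowwise, "interface edges")
```

With four subdomains the Hilbert numbering cuts 8 of the 24 grid edges, the row-wise numbering 12, because the Hilbert ranges form compact 2×2 blocks while the row-wise ranges are thin strips.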

work to do

Influence of SFC ordering on

ILU based preconditioners: fill-in, computational load, convergence rate

sparse matrix computations in general

SFC compared to generic partitioning algorithms (MeTiS, scotch, ...)

TsunAWI

further optimize OpenMP parallelization

MPI parallelization
