Page 1: Performing acoustic, vibro-acoustic and aero-acoustic ...mumps.enseeiht.fr/doc/ud_2013/slides_rosseel.pdf · Pardiso MKL MUMPS UMFPACK IN-CORE 47.3 52.2 76.9 Pardiso MKL MUMPS Peak

Performing acoustic, vibro-acoustic and aero-acoustic computations using MUMPS

Presented By: Eveline Rosseel

29 May 2013

FFT Confidential 5/29/2013

Outline

• Introduction on Free Field Technologies
• MUMPS in Actran
• Benchmark of sparse direct solvers
• Conclusions

Free Field Technologies: leader in acoustic, vibro-acoustic and aero-acoustic CAE

• Free Field Technologies (FFT): software development since 1998
• Main activities
  – Development of the Actran software
  – Services: training, consulting, technology transfer, …
  – Research in acoustic CAE and related fields
• Our customers’ fields
  – Automotive
  – Aerospace
  – Electronics
  – Heavy equipment

Free Field Technologies around the world

Headquartered in Mont-Saint-Guibert, Belgium, FFT has offices in Toulouse, France; Tokyo, Japan; and Troy, MI, USA.

FFT is part of MSC Software Corporation, a leading international provider of Virtual Product Development technology.

Our software is distributed in each global region and used by more than 250 customers around the world.

MUMPS in Actran

MUMPS: default solver in Actran

Target applications

• Mostly complex, unsymmetric sparse systems with a symmetric structure
• Up to a few million DOFs, up to a few thousand RHS
• Out-of-core computations
• Shared and distributed memory computing
• Application-dependent sparsity patterns

MUMPS: highlighted experiences
Backtransformation phase

Out-of-core computations: congestion due to shared scratch disk

• Example: frequency parallelism, every proc runs its own MUMPS instance

[Diagram: Proc 1 … Proc n each solve their own frequency (freq 1 … freq n) in their own memory and share one scratch disk]

Time                                     Factorize   Solve
1 proc, sequential                       39 min      7 min
8 procs = 8 sequential MUMPS instances   44 min      Up to 4.5 h

Configuration: 600 KDOF, 253 RHS, Westmere-ex Intel 2.26 GHz, 4x8 cores, raid-0 sata scratch disk

• Solution: ICNTL(27) and introduction of additional synchronization amongst procs
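The additional synchronization can be pictured as a lock around the disk-bound solve step. The sketch below is only an illustration, not Actran's or MUMPS's implementation: Python threads stand in for the procs, `ScratchDisk` is a hypothetical class tied to one scratch disk, and `backtransform` is a placeholder that sleeps instead of reading factors. Separate MPI processes would need an inter-process mechanism (for example a file lock or a token passed between ranks) in place of `threading.Lock`.

```python
import threading
import time


class ScratchDisk:
    """Hypothetical stand-in for one scratch disk shared by several procs."""

    def __init__(self):
        self.lock = threading.Lock()    # serializes the disk-bound step
        self._guard = threading.Lock()  # protects the counters below
        self.active = 0                 # procs currently inside the step
        self.max_active = 0             # highest concurrency observed

    def backtransform(self, freq_index):
        """Placeholder for the out-of-core solve of one frequency."""
        with self.lock:                 # one proc at a time on this disk
            with self._guard:
                self.active += 1
                self.max_active = max(self.max_active, self.active)
            time.sleep(0.01)            # pretend to stream factors from disk
            with self._guard:
                self.active -= 1


def run_frequencies(n_procs):
    """Run n_procs placeholder solves and return the peak concurrency."""
    disk = ScratchDisk()
    workers = [
        threading.Thread(target=disk.backtransform, args=(i,))
        for i in range(n_procs)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return disk.max_active


if __name__ == "__main__":
    print(run_frequencies(8))
```

Because every placeholder solve holds the disk lock for its whole duration, the peak concurrency on the scratch disk stays at one regardless of how many procs are started; the factorization phase is left unsynchronized in this picture.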

MUMPS: highlighted experiences
Backtransformation phase

Out-of-core computations

• Reduced I/O congestion using ICNTL(27) and additional synchronization points
• Optimal value: ICNTL(27)=NRHS
• Additional synchronization points: the backtransformation step of processors sharing the same scratch disk is done in sequential mode
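One way to see why a large ICNTL(27) helps in out-of-core mode: ICNTL(27) is the blocking size for multiple right-hand sides, and if each block of right-hand sides triggers one sweep over the factors stored on disk, the number of sweeps is ceil(NRHS / block size). The one-sweep-per-block model is an assumption made for illustration, and `factor_sweeps` is a hypothetical helper; the 253 RHS and the block size 16 are values that appear elsewhere in the slides.

```python
def factor_sweeps(nrhs, block_size):
    """Number of RHS blocks, i.e. sweeps over the factors under the
    assumed one-sweep-per-block model."""
    if nrhs < 1 or block_size < 1:
        raise ValueError("nrhs and block_size must be positive")
    return -(-nrhs // block_size)  # integer ceiling division


if __name__ == "__main__":
    nrhs = 253
    for block_size in (1, 16, 64, nrhs):
        sweeps = factor_sweeps(nrhs, block_size)
        print(f"block size {block_size:>3}: {sweeps} sweep(s)")
```

Under this model, a block size equal to NRHS reduces the disk traffic of the solve to a single sweep, at the cost of holding all right-hand sides in memory at once.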

MUMPS: highlighted experiences
Analysis phase

• Quality of the reordered matrix (METIS, SCOTCH, …) influences the memory consumption of the factorization phase
• Distributed computations:
  • Memory consumption peak on proc 0 during the sequential analysis phase surpasses the memory consumption of the parallel factorization phase

[Figure: regions labelled "Analysis phase" and "Factorization"]

Configuration: 1.9 MDOF, 1 RHS, SCOTCH ordering, out-of-core run on Westmere-ex 2.4GHz processor with 4x10 cores and 256 GB RAM

MUMPS: highlighted experiences
Analysis phase

Scalability of MPI computations: need for parallel analysis

• Avoid the memory consumption peak of the sequential analysis phase by using a parallel analysis phase: PT-Scotch or ParMetis

[Figure: regions labelled "Analysis phase" and "Factorization"]

Configuration: 1.9 MDOF, 1 RHS, PT-SCOTCH ordering, out-of-core run on Westmere-ex 2.4GHz processor with 4x10 cores and 256 GB RAM

• Problem: time and memory consumption have increased compared to the run with Scotch -> scalability issue of the parallel analysis

MUMPS in Actran: future plans

• Increasing model sizes:
  • Need for 64-bit integer version
• Increasing number of nodes in distributed memory computing:
  • Need for robust parallel analysis
• More investigations on hybrid iterative/direct solver use

Benchmark of sparse direct solvers

Overview solvers

Motivation: assess performance of MUMPS with respect to other sparse direct solvers

                          MUMPS (4.10.0)             Pardiso 10.3 Intel MKL   UMFPACK (5.6.2)
out-of-core               ✓                          ✓                        X
multithreading            BLAS                       ✓                        BLAS
MPI support               ✓                          X                        X
ordering                  (PAR)Metis, (PT)Scotch,    MD or Metis              (COL)AMD, Metis or
                          PORD, (Q)AMD, AMF                                   NESDIS
multiple rhs              block approach             (no block size)          X
                          (block size)
iterative refinement      ✓                          ✓                        ✓
single/double precision   ✓                          ✓                        ✓

Solver benchmark settings

• Sequential and multithreaded tests (no MPI)
• Ordering METIS
• Pivot threshold 0.01
• Double precision
• Internal RHS block size 16
• No iterative refinement
• Memory relaxation 20%
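For MUMPS, these settings correspond roughly to the control parameters sketched below. The mapping is an assumption based on the MUMPS user guide, not something stated on the slide, and `mumps_settings` and `relaxed_workspace` are hypothetical names; Pardiso MKL and UMFPACK use their own parameter sets. Double precision is a choice of arithmetic rather than a control parameter, so it is not listed.

```python
# Assumed mapping of the benchmark settings onto MUMPS controls
# (illustrative, not taken from the slides).
mumps_settings = {
    "ICNTL(7)": 5,     # ordering: METIS
    "CNTL(1)": 0.01,   # relative pivoting threshold
    "ICNTL(27)": 16,   # blocking size for multiple right-hand sides
    "ICNTL(10)": 0,    # no iterative refinement steps
    "ICNTL(14)": 20,   # percentage increase of the estimated working space
}


def relaxed_workspace(estimate_mb, relaxation_percent):
    """Working space after applying a percentage relaxation to an estimate."""
    if estimate_mb < 0 or relaxation_percent < 0:
        raise ValueError("estimate and relaxation must be non-negative")
    return estimate_mb * (100 + relaxation_percent) / 100


if __name__ == "__main__":
    # 1000 MB is a made-up estimate used only to show the arithmetic.
    print(relaxed_workspace(1000, mumps_settings["ICNTL(14)"]))
```

A 20% relaxation means the solver reserves one fifth more than its own estimate, which leaves room for extra fill-in caused by numerical pivoting.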

Acoustic test cases

• Represent 20-30% of our customers (mainly automotive)
• All tests successful

ACOUSTIC RADIATION                 Symmetric?   NDOF    NRHS
IFEM-VS                            No           15.6K   1
PML-DC                             Yes          280K    1
IFEM-DC                            No           405K    1
RC_Indus_hpc_MUMPS case 3 (MB)     No           730K    1
RC_Indus_hpc_MUMPS case 2 (IFE)    No           872K    1
PML-DF                             Yes          1.05M   1
IFEM-DF                            No           1.38M   1
RC_Indus_hpc_MUMPS case 1 (IFE)    No           1.90M   3

Memory consumption: in-core

Pardiso MKL and MUMPS: comparable memory requirements

Pardiso MKL: lowest memory consumption

UMFPACK: highest memory consumption, large difference with other solvers

Memory consumption: out-of-core

Large difference between memory consumption of OOC Pardiso and OOC MUMPS on acoustic tests

Computational cost: sequential runs

MUMPS: lowest overall factorization time for the sequential runs

UMFPACK: very high factorization time on largest test case

Computational cost: multithreaded runs

[Figure panels: "Absolute timings" and "Parallel efficiency"]

Pardiso MKL: nearly optimal parallel efficiency

UMFPACK: very high computing times
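Parallel efficiency is conventionally the speedup divided by the number of threads, E(p) = T(1) / (p · T(p)), with 1.0 being optimal. The slide does not give the underlying numbers, so the timings in the sketch below are made up purely to show the calculation, and `parallel_efficiency` is a hypothetical helper name.

```python
def parallel_efficiency(t_serial, t_parallel, n_threads):
    """Speedup divided by thread count: T(1) / (p * T(p))."""
    if t_serial <= 0 or t_parallel <= 0 or n_threads < 1:
        raise ValueError("timings must be positive and n_threads >= 1")
    return t_serial / (n_threads * t_parallel)


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds, NOT benchmark results.
    timings = {1: 800.0, 2: 420.0, 4: 230.0, 8: 125.0}
    for threads, seconds in timings.items():
        eff = parallel_efficiency(timings[1], seconds, threads)
        print(f"{threads} thread(s): efficiency {eff:.2f}")
```

"Nearly optimal" in this sense means the efficiency stays close to 1.0 as the thread count grows.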

Vibro-acoustic test cases

• Represent 50-60% of our customers

Test case                                  MUMPS                         PARDISO MKL                      UMFPACK
RC_Indus_hpc_MUMPS 4 (276 KDOF, 2 RHS)     OK                            OK                               OK
Ship (410 KDOF, 3 RHS)                     OK                            OK                               OK
RC_Indus_hpc_MUMPS 5 (1.86 MDOF, 1 RHS)    OK                            OK                               OK
Pl case (1.75 MDOF, 20 RHS)                OK                            IC: zero pivot error, OOC: OK    Out of memory (> 250 GB)
Cockpit (3.09 MDOF, 50 RHS)                IC: memory allocation error   OK                               N.A. (symmetric matrix)

Vibro-acoustic test case RC_Indus_hpc_MUMPS_C5

Peak memory consumption (Gbyte)

               Pardiso MKL   MUMPS   UMFPACK
IN-CORE        47.3          52.2    76.9
OUT-OF-CORE    10.9          11.7    n/a

• IC: Pardiso lowest memory requirements, UMFPACK highest

Computation time

• Same trend as for pure acoustic problems: MUMPS has lowest factorization time
• UMFPACK: largest sequential computation time
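The peak-memory figures above are enough to quantify the out-of-core saving for this test case. The sketch below simply divides the in-core peak by the out-of-core peak for the two solvers that have both values; the dictionary layout and the name `reduction_factor` are illustrative choices.

```python
# Peak memory in Gbyte for test case RC_Indus_hpc_MUMPS_C5, as listed above.
peak_gb = {
    "Pardiso MKL": {"in_core": 47.3, "out_of_core": 10.9},
    "MUMPS": {"in_core": 52.2, "out_of_core": 11.7},
}


def reduction_factor(solver):
    """Ratio of in-core to out-of-core peak memory for one solver."""
    entry = peak_gb[solver]
    return entry["in_core"] / entry["out_of_core"]


if __name__ == "__main__":
    for solver in peak_gb:
        print(f"{solver}: {reduction_factor(solver):.2f}x less memory out-of-core")
```

Both solvers come out at a little more than a factor of four on this case, so the out-of-core mode changes the memory footprint far more than the choice between the two solvers does.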

TM test cases

• Represent 10-15% of our customers

Test case                            MUMPS                                   PARDISO MKL   UMFPACK
Inlet-Nacelle (542 KDOF, 44 RHS)     OK                                      OK            OK
Inlet-APU (818 KDOF, 7 RHS)          OK                                      OK            OK
By-pass DUCT (596 KDOF, 253 RHS)     OK                                      OK            OK
By-pass (1.54 MDOF, 521 RHS)         OK                                      OK            OK
Inlet-Nacelle (3.02 MDOF, 151 RHS)   IC: memory allocation error, OOC: OK    OK            Out of memory (> 250 GB)

TM test cases: memory

[Figure panels: "Inlet Nacelle, 600 KDOF" and "ByPass Duct, 600 KDOF"]

Real, symmetric test cases

• Low importance to our customers (5%)
• 3 pure vibro test cases (316 to 733 KDOF) and 1 pure acoustic test (419 KDOF)
• MUMPS: large memory requirements on pure acoustic test case w.r.t. Pardiso MKL
• Pardiso: large memory requirements on pure vibro test cases

Conclusions

Performing acoustic, vibro-acoustic and aero-acoustic simulations with MUMPS

MUMPS in Actran

• Default solver in Actran: very good results obtained, thanks to the MUMPS developers!
• Interested in more extensive use of parallel analysis tool

Solver benchmarks

• UMFPACK:
  • Many restrictions: in-core, only 1 RHS at a time, non-symmetric matrices
  • No improvement over MUMPS: excessive memory consumption and computation time
• Pardiso:
  • Low memory requirements: especially on OOC acoustic test cases
  • Good multithreaded behaviour: almost optimal scalability
• MUMPS:
  • Fast solver: overall the lowest factorization time
  • Low memory requirements for out-of-core version, especially on vibro(-acoustic) tests