parallel iterative solution of the hermite collocation equations on gpus

Post on 06-Jul-2015

66 Views

Category:

Technology

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Parallel Iterative solution of the

Hermite Collocation Equations

on GPUs

Emmanuel N. Mathioudakis

Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis

TECHNICAL UNIVERSITY OF CRETE

DEPARTMENT OF SCIENCES

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

73100 CHANIA - CRETE - GREECE

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Talk Overview

Hermite Collocation for elliptic BVPs &

Derivation of the Collocation linear system

Development of a parallel algorithm for the

Schur Complement method on Shared

Memory Parallel Architectures

Parallel implementation on multicore computers

with GPUs

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

( , ) ( , ) , ( , )

( , ) ( , ) , ( , )

u x y f x y x y

u x y g x y x y

L

BBV

P

( , ) ( , ) 0 , ( , ):

( , ) ( , ) 0 , (,

, )

y y yx x xi j i j i j

y y yx x xi j i j i j

ai j

u f

u g

L

B

Hermite Collocation Method

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Hermite Collocation Method…

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Hermite Collocation Method…

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Red – Black Collocation Linear system

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Red – Black Collocation Linear system

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Red – Black Collocation Linear system

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Red – Black Collocation Linear system

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Red – Black Collocation Linear system

with 0

( , ) ( , ) ( , ) , ( , )

( , ) ( , ) , ( , )

u x y u x y f x y x y

u x y g x y x y

2M

od

el

Pro

ble

m

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Helmholtz Collocation

Matrix

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Red – Black Collocation Linear system

The collocation matrix is large, sparse and enjoys no pleasant

properties (e.g. symmetric, definite)

Iterative

+

Parallel

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Iterative Solution

with

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Iterative Solution

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Eigenvalues of Collocation matrix

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Schur Complement Iterative Solution

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Parallel Iterative Solution of Collocation Linear system

on Shared Memory Architectures

Uniform Load Balancing between core threads

Minimal Idle Cycles of core threads

Minimal Communication

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

case of ns = 2p

( )1Rx ( )

2Rx ( )

3Rx ( )

4Rx

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

case of ns = 2p

( )1

Bp

x

( )2

Bp

x

( )3

Bp

x

( )4

Bp

x

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

case of ns = 2p

V1

( )1Rx ( )

2Rx ( )

3Rx ( )

4Rx( )

1B

px

( )

2B

px

( )

3B

px

( )4

Bp

x

V2 V3 V4 V5 V6 V7 V8

V1 V2 V3 V4 V5

Mapping into a fixed size Architecture of N Cores

case of k = 2p/N even 2k virtual threads

l=(j-1)k+1,…,jk

l=2p+(j-1)k+1,…,2p+jk

j=1,…,N

( )Bl

x

( )Rl

x

V2 . . .

Pj

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Parallel Schur Complement Iterative Solution

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Parallel BiCGSTAB

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Parallel BiCGSTAB

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

The Dirichlet Helmholtz Problem

2100( 0.1)2with

( , ) 10 ( ) ( ) , ( , ) [0,1] [0,1]

( ) ( ) x

u x y x y x y

x x x e

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

HP SL390s

6 core Xeon@2.8GHz

24GB memory

Oracle Linux 6.3 x64

PGI 13.5 Fortran

PCI-e gen2 x16

HP SL390s – Tesla M2070 GPUs

+ 2 x

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Iterations / Error measurements

ns BiCGSTAB

Iterations || b – Axn||2

256 294 6.06e-11

512 589 2.85e-11

1024 1161 1.39e-11

2048 3726 9.59e-12

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Time measurements

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Time measurements

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Time measurements

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Time measurements

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Time measurements

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Realization on HP SL390s Tesla GPU machine Time measurements

Conclusions

• A new parallel algorithm implementing the Schur complement with BiCGSTAB iterative method for Hermite Collocation equations has been developed.

• The algorithm is realized on Shared Memory multi-core machines with GPUs .

• A performance acceleration of up to 30% is observed.

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

Future work

• Design an efficient parallel Schur complement algorithm of the Hermite Collocation equations for Multiprocessor /Grid machines with GPUs.

TECHNICAL UNIVERSITY OF CRETE

APPLIED MATHEMATICS AND COMPUTERS LABORATORY

top related