Sparse Direct Solvers for Extreme-Scale Computing Iain Duff Joint work with Florent Lopez and Jonathan Hogg STFC Rutherford Appleton Laboratory SIAM Conference on Computational Science and Engineering (CSE17). Hilton Atlanta, Atlanta, Georgia, USA. 27 February - 3 March 2017.


Sparse Direct Solvers for Extreme-Scale Computing

Iain Duff
Joint work with Florent Lopez and Jonathan Hogg

STFC Rutherford Appleton Laboratory

SIAM Conference on Computational Science and Engineering (CSE17). Hilton Atlanta, Atlanta, Georgia, USA. 27 February - 3 March 2017.


- H2020 FET-HPC Project 671633
- Funding of around 4M Euros
- Partners are
  - Umeå University, Sweden .. Coordinator: Bo Kågström
  - University of Manchester, UK .. Jack Dongarra
  - INRIA, Paris, France .. Laura Grigori
  - STFC, UK .. Iain Duff
- Started 1 November 2015 (effectively Jan 2016)
- 36-month project terminating on 31 October 2018

2 / 37Iain Duff, STFC Rutherford Appleton Laboratory MS 101. SIAM Conference on CS & E (CSE17). Atlanta. 28th February 2017


NLAFET Work Packages

- WP1: Management and coordination
- WP2: Dense linear systems and eigenvalue problem solvers
- WP3: Direct solution of sparse linear systems
- WP4: Communication-optimal algorithms for iterative methods
- WP5: Challenging applications (a selection)
  Material science, power systems, study of energy solutions, and data analysis in astrophysics
- WP6: Cross-cutting issues
  Scheduling and runtime systems, auto-tuning, fault tolerance
- WP7: Dissemination and community outreach


NLAFET .. Workpackage 3

T3.1 Lower Bounds on Communication for Sparse Matrices

T3.2 Direct Methods for (Near-)Symmetric Sparse Systems

T3.3 Direct Methods for Highly Unsymmetric Sparse Systems

T3.4 Hybrid Direct-Iterative Methods


Solution of sparse linear systems

We wish to solve the sparse linear system

Ax = b

where the matrix A is sparse and of large dimension, typically $10^6$ or greater, and we want to solve the system on an extreme-scale computer.


Direct methods

We will consider the factorization:

$P_r A P_c \to LU$

L: Lower triangular (sparse)
U: Upper triangular (sparse)

Permutations $P_r$ and $P_c$ chosen to preserve sparsity and maintain stability

When A is symmetric, $U = D L^T$ and $P_c = P_r^T$
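The symmetric relation U = D L^T can be checked on a small dense example. The following is a minimal sketch in plain Python, assuming that no pivoting is needed (both permutations are the identity). The helper names `ldlt` and `reconstruct` and the 3x3 matrix are illustrative choices, not taken from the talk.

```python
def ldlt(a):
    """Unpivoted LDL^T of a symmetric matrix given as a list of rows.

    Returns (L, d): L is unit lower triangular, d holds the diagonal of D.
    Assumes every pivot is nonzero (no permutation is applied).
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def reconstruct(L, d):
    """Form L * D * L^T so the factorization can be compared with A."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


A = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 3.0],
     [0.0, 3.0, 6.0]]
L, d = ldlt(A)
B = reconstruct(L, d)   # should reproduce A up to rounding
```

Here U is never formed explicitly: it is D L^T, so only L and the diagonal of D need to be stored.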


Sparse direct methods

- Black boxes available
- Complexity can be low. Almost linear storage in 2D
- Routinely solving problems of order in the millions
- There can be issues with storage requirements
- Target is half the asymptotic speed of the machine


Task 3.2 Direct Methods for (Near-)Symmetric Systems

Sparse $LL^T$, $LDL^T$, $LU$

- Tree-based solvers
- Use DAGs with runtime scheduling systems (WP6)


Runtime systems

- Within the framework of NLAFET, we are primarily concerned with the runtime systems
  - StarPU, using an STF (sequential task flow) model
  - PaRSEC, using a PTG (parametrized task graph) model
  - OpenMP Version 4.0 or above, using task features
- In all cases, we are using a task-based approach and the structure involved is a directed acyclic graph (DAG)
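To illustrate how a sequential task flow gives rise to a DAG, here is a minimal sketch in plain Python. It is not the StarPU, PaRSEC or OpenMP API: the function `build_dag`, the task tuples and the block names are hypothetical. Tasks are submitted in sequential order with the data they read and write, and the dependencies are inferred from those accesses.

```python
def build_dag(tasks):
    """Infer DAG edges from tasks submitted in sequential order.

    `tasks` is a list of (name, reads, writes). An edge (p, q) means that
    task q must wait for task p.
    """
    last_writer = {}      # data item -> index of the last task that wrote it
    readers_since = {}    # data item -> tasks that read it since that write
    edges = set()
    for q, (_, reads, writes) in enumerate(tasks):
        for x in reads:
            if x in last_writer:                   # read-after-write
                edges.add((last_writer[x], q))
        for x in writes:
            if x in last_writer:                   # write-after-write
                edges.add((last_writer[x], q))
            for r in readers_since.get(x, []):     # write-after-read
                edges.add((r, q))
        for x in reads:
            readers_since.setdefault(x, []).append(q)
        for x in writes:
            last_writer[x] = q
            readers_since[x] = []
    return sorted(edges)


# One block column of a blocked Cholesky factorization (illustrative names).
tasks = [
    ("factor A11", ["A11"], ["A11"]),
    ("solve A21", ["A11", "A21"], ["A21"]),
    ("solve A31", ["A11", "A31"], ["A31"]),
    ("update A22", ["A21", "A22"], ["A22"]),
    ("update A32", ["A21", "A31", "A32"], ["A32"]),
    ("update A33", ["A31", "A33"], ["A33"]),
]
edges = build_dag(tasks)
```

In this example the two solves depend only on the factor task, so they can run concurrently, and each update waits only for the solves whose blocks it reads.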


Sparse factorization

The kernel of most sparse direct codes is a dense factorization.

We feel it is useful to show this via a mini-tutorial


× × × • • × •
× × × • • • ×
× × × • • • •
• • • × × × •
• • • × × • •
× • • × • × ×
• × • • • × ×

Assembly tree: node 1 (variables 1,2,3) and node 2 (variables 4,5) are the children of node 3 (variables 6,7).


At step 1

     1  2  3  6  7
 1   ×  ×  ×  ×
 2   ×  ×  ×  ×  ×
 3   ×  ×  ×  ×  ×
 6   ×  ×  ×  ×  ×
 7      ×  ×  ×  ×


At step 2

     4  5  6
 4   ×  ×  ×
 5   ×  ×  ×
 6   ×  ×  ×


At step 3

     6  7          6  7          6           6  7
 6   ×  ×   +   6  ×  ×   +   6  ×   →   6   ×  ×
 7   ×  ×       7  ×  ×                  7   ×  ×
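The index sets of the three frontal matrices in this mini-tutorial can be reproduced symbolically. The sketch below, in plain Python, assumes the 7x7 sparsity pattern and the three-node assembly tree of the example. The names `pattern`, `tree` and `front_indices` are hypothetical, and only the structure is computed, not the numerical values.

```python
# Nonzero positions of each row of the 7x7 example (1-based). The pattern is
# symmetric, so these are also the column patterns.
pattern = {
    1: {1, 2, 3, 6},
    2: {1, 2, 3, 7},
    3: {1, 2, 3},
    4: {4, 5, 6},
    5: {4, 5},
    6: {1, 4, 6, 7},
    7: {2, 6, 7},
}
# Assembly tree: node -> (pivot variables, children).
tree = {
    1: ((1, 2, 3), ()),
    2: ((4, 5), ()),
    3: ((6, 7), (1, 2)),
}


def front_indices(node):
    """Return (indices of the front, indices of the Schur complement)."""
    pivots, children = tree[node]
    idx = set(pivots)
    for j in pivots:
        # Original entries in the pivot columns, skipping eliminated rows.
        idx |= {i for i in pattern[j] if i >= min(pivots)}
    for c in children:
        # Contribution passed up from each child front.
        idx |= set(front_indices(c)[1])
    front = sorted(idx)
    schur = [i for i in front if i not in pivots]
    return front, schur


fronts = {node: front_indices(node) for node in tree}
```

The fronts come out as (1,2,3,6,7), (4,5,6) and (6,7), matching steps 1 to 3, and the Schur complements passed to node 3 are indexed by (6,7) and (6).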


Computation at node

The computation at a node involves dense factorization. Pivots are chosen from the top left block of the frontal matrix, but elimination operations are performed on the whole frontal matrix. Rows and columns of the factors can be stored, and the resulting Schur complement (in the bottom right) is passed up the tree for future assemblies.
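The computation at a node can be sketched as a partial dense factorization. The following plain-Python sketch assumes a symmetric frontal matrix whose first k pivots are taken in order and are nonzero, so no numerical pivoting is done. The function name `partial_factor` and the 3x3 front are illustrative only.

```python
def partial_factor(front, k):
    """Eliminate the first k pivots of a dense symmetric frontal matrix.

    Returns (factor_cols, schur): the k computed columns of L below the
    diagonal, and the (n-k) x (n-k) Schur complement for the parent node.
    """
    n = len(front)
    f = [row[:] for row in front]            # work on a copy
    for p in range(k):
        piv = f[p][p]
        for i in range(p + 1, n):
            f[i][p] /= piv                   # column p of L
        for i in range(p + 1, n):
            for j in range(p + 1, n):        # update the whole trailing front
                f[i][j] -= f[i][p] * piv * f[j][p]
    factor_cols = [[f[i][p] for i in range(p + 1, n)] for p in range(k)]
    schur = [row[k:] for row in f[k:]]
    return factor_cols, schur


front = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
factor_cols, schur = partial_factor(front, 1)
```

With one pivot eliminated, the 2x2 Schur complement is what would be assembled into the parent's frontal matrix.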


Sparse parallelism

- There are several levels of parallelism in sparse systems
  - Partitioning ... block diagonal or block triangular form
  - Tree level parallelism
  - Node parallelism (including multi-threaded BLAS)
  - Inter-node parallelism


Tree parallelism

Available nodes at start of factorization. Work corresponding to leaf nodes can proceed immediately and independently.


Situation part way through the elimination. When all children of a node have completed, work can commence at the parent node.
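The rule that leaves start immediately and a parent starts once all its children have finished can be sketched as follows. This is a simplified, wave-by-wave illustration in plain Python. The function `ready_waves` and the seven-node tree are hypothetical, and a real runtime system would start each parent as soon as its own children complete rather than waiting for a whole wave.

```python
def ready_waves(parent):
    """Group tree nodes into waves of nodes that could run concurrently.

    `parent` maps every node to its parent (None for the root).
    """
    pending = {v: 0 for v in parent}         # unfinished children per node
    for v, p in parent.items():
        if p is not None:
            pending[p] += 1
    wave = sorted(v for v, c in pending.items() if c == 0)   # the leaves
    waves = []
    while wave:
        waves.append(wave)
        nxt = []
        for v in wave:
            p = parent[v]
            if p is not None:
                pending[p] -= 1
                if pending[p] == 0:          # last child has completed
                    nxt.append(p)
        wave = sorted(nxt)
    return waves


parent = {1: 5, 2: 5, 3: 6, 4: 6, 5: 7, 6: 7, 7: None}
waves = ready_waves(parent)
```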


Node and tree parallelism

                       Tree      Leaf nodes           Top 3 levels
Matrix       Order     nodes     No.      Av. size    No.    Av. size   % ops
bratu3d      27 792    12 663    11 132      8        296       37       56
cont-300    180 895    90 429    74 673      6         10      846       41
cvxqp3       17 500     8 336     6 967      4         48      194       70
mario001     38 434    15 480     8 520      4         10      131       25
ncvxqp7      87 500    41 714    34 847      4         91      323       61
bmw3_2      227 362    14 095     5 758     50         11    1 919       44

Statistics on front sizes in assembly tree. From Duff, Erisman, Reid (2016).


Node and tree parallelism

- Near the root there is not much tree parallelism but the nodes are large, that is, the dense matrices at these nodes are of large dimension and so there is plenty of node parallelism.
- Conversely, near the leaves, there is little node parallelism but plenty of tree parallelism.


Inter-node parallelism

[Figure: two panels, "Part of tree" (with the dimension label nb) and "Directed acyclic graph" (with nodes labelled f, s, u and a).]

Sparse Cholesky using runtime systems

[Plot: Factorization GFlop/s - 28 cores. x-axis: Matrix # (26 to 38); y-axis: GFlop/s (100 to 600). Series: MA87, SpLLT-STF (OpenMP), SpLLT-STF (StarPU).]

The main problem with using the runtime systems (for example matrix 15) is that when the tasks are small, the overhead of setting them up in the runtime system predominates.

We can avoid some of this by grouping nodes near the leaves of the tree together into a single task. We call this tree pruning.
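One simple way to group nodes near the leaves is sketched below in plain Python. The talk does not state the criterion used, so the work threshold here is an assumption, and the function `prune`, the tree and the flop counts are hypothetical. A subtree becomes a single task when its total work is at most the threshold.

```python
def prune(children, flops, root, threshold):
    """Return the roots of the subtrees that are each run as a single task.

    The search starts at the root and descends only into subtrees whose
    total work exceeds `threshold`. Nodes not inside a returned subtree
    remain individual tasks.
    """
    def subtree_work(v):
        return flops[v] + sum(subtree_work(c) for c in children.get(v, ()))

    merged, stack = [], [root]
    while stack:
        v = stack.pop()
        if subtree_work(v) <= threshold:
            merged.append(v)                 # whole subtree at v is one task
        else:
            stack.extend(children.get(v, ()))
    return sorted(merged)


children = {7: [5, 6], 5: [1, 2], 6: [3, 4]}
flops = {1: 1, 2: 1, 3: 2, 4: 8, 5: 4, 6: 20, 7: 100}
merged = prune(children, flops, root=7, threshold=10)
```

In this example the subtree rooted at node 5 (nodes 1, 2 and 5) is submitted as one task, which removes two task submissions, while the larger nodes 6 and 7 stay as separate tasks.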


Tree pruning strategy


Effect of tree pruning on OpenMP version

[Plot: Factorization GFlop/s - 28 cores. x-axis: Matrix # (26 to 38); y-axis: GFlop/s (100 to 600). Series: MA87, SpLLT-STF (OpenMP), SpLLT-STF (OpenMP) w/ pruning.]

Runs on large three-dimensional problems

[Plot: Poisson 3D - Factor. GFlop/s - 28 cores. x-axis: Mesh size (20 to 180); y-axis: GFlop/s (0 to 500). Series: MA87, SpLLT-STF (OpenMP), SpLLT-STF w/ pruning (OpenMP), SpLLT-STF (StarPU), SpLLT-STF w/ pruning (StarPU).]

Scalability

[Plot: Poisson 3D - Factor. GFlop/s. x-axis: # Cores (0 to 30); y-axis: GFlop/s (0 to 500). Series: SpLLT-STF (OpenMP) and SpLLT-STF (StarPU), each for N=60, N=100 and N=160.]

Symmetric indefinite matrices

If the matrix is indefinite then numerical pivoting is needed.

A simple example is the matrix

$\begin{pmatrix} 0 & \times \\ \times & 0 \end{pmatrix}$


Numerical pivoting in indefinite case

The good news is that we can stably factorize an indefinite matrix using only 1×1 and 2×2 pivots (Bunch and Kaufman).

As is standard in sparse factorization, we use threshold rather than partial pivoting, so we want ...

‖Pivot‖ ≥ u × ‖Largest in column‖

where u is the threshold parameter (0 < u ≤ 1).
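The threshold test for a 1×1 pivot can be written in a few lines. This is a minimal sketch in plain Python. The function name `passes_threshold`, the default value u = 0.01 and the sample columns are illustrative assumptions, and the extra check for a zero pivot is added here so that an all-zero column is rejected.

```python
def passes_threshold(pivot, column, u=0.01):
    """1x1 pivot test: |pivot| >= u * (largest |entry| in the column)."""
    largest = max(abs(x) for x in column)
    return pivot != 0.0 and abs(pivot) >= u * largest


# First column of the 2x2 example [[0, x], [x, 0]] with x = 1: the diagonal
# entry is zero, so it fails the test for every u in (0, 1].
zero_diag = passes_threshold(0.0, [0.0, 1.0], u=0.01)

# A small but nonzero pivot: accepted with a loose threshold, rejected with
# a stricter one.
loose = passes_threshold(0.5, [0.5, 10.0], u=0.01)
strict = passes_threshold(0.5, [0.5, 10.0], u=0.1)
```

For the 2×2 example neither diagonal entry can be used as a 1×1 pivot, but the whole matrix has determinant -x^2, which is nonzero, so it can be taken as a single 2×2 pivot.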


Numerical pivoting in indefinite case

[Figure: frontal matrix with blocks A11 and A21.]

Pivots can only be chosen from A11.

We can restrict pivoting to A11, or only choose pivots from A11 but then fail if a large entry in A21 causes the pivot to fail the test.

This is a real problem when implementing the algorithm on a parallel machine.


Threshold Partial Pivoting

[Figure: entries labelled a11 and α.]

TPP Algorithm

- Work column by column
- Bring column up-to-date
- Find maximum element α in column of A21
- Pivot test $\alpha / a_{11} < u^{-1}$. Accept/reject pivot

Problems

- Very stop-start (one column at a time)
- Communication for every column


A Posteriori Pivoting

- Work by blocks of A21
- Every block uses factors of A11
- Every block checks $\max |l_{21}| < u^{-1}$
- Communication when all blocks are done
- Discard all columns to right of failed entry
- Scaling and ordering ⇒ failed pivots are rare
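The steps above can be sketched in plain Python under strong simplifying assumptions: only 1×1 pivots are used, A11 has already been factorized as L11 D L11^T without pivoting, and the blocks of A21 are processed one after another rather than in parallel. The function names and the data are hypothetical and this is not the SSIDS implementation.

```python
def l21_rows(L11, d, rows):
    """Rows of L21 = A21 * L11^{-T} * D^{-1} for one block of rows of A21."""
    out = []
    for r in rows:
        l = []
        for j in range(len(d)):
            s = sum(l[m] * d[m] * L11[j][m] for m in range(j))
            l.append((r[j] - s) / d[j])
        out.append(l)
    return out


def first_failed_column(l_rows, u):
    """First column holding an entry of magnitude >= 1/u, or k if none."""
    k = len(l_rows[0])
    for j in range(k):
        if any(abs(row[j]) >= 1.0 / u for row in l_rows):
            return j
    return k


def a_posteriori(L11, d, a21_blocks, u=0.01):
    """Each block is checked independently; one reduction at the end.

    Returns the number of accepted pivots. Columns from the first failed
    entry onwards are discarded.
    """
    fails = [first_failed_column(l21_rows(L11, d, blk), u)
             for blk in a21_blocks]
    return min(fails)


L11 = [[1.0, 0.0], [0.0, 1.0]]       # A11 = diag(1, 0.001), so L11 = I
d = [1.0, 0.001]
blocks = [[[0.5, 0.0001]],           # block 1 of A21: passes both columns
          [[2.0, 1.0]]]              # block 2 of A21: fails in column 2
accepted = a_posteriori(L11, d, blocks, u=0.01)
```

Only the final `min` needs information from every block, which corresponds to the single communication step once all blocks are done.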


Flop rate

Machine achieves 794 Gflop/s on LINPACK


Compare new code SSIDS with standard TPP code HSL_MA97 and with PARDISO as implemented in MKL.


Hard indefinite on 28-core machine

[Plot: x-axis: Problem (0 to 25); y-axis: Speedup vs HSL_MA97 (ticks at 0.1, 1, 10). Series: SSIDS, PARDISO.]

Numerical pivoting in indefinite case

- HSL_MA97: Threshold partial pivoting
- PARDISO: Only chooses pivots from A11
- SSIDS: Uses a posteriori pivoting
- Parodize these as
  - HSL_MA97: PAY
  - PARDISO: PRAY
  - SSIDS: PLAY


Hard indefinite on 28-core machine

Matrix            stokes128      cvxqp3         ncvxqp7
Order ×10^3       49.7           17.5           87.5
Entries ×10^6     0.30           0.07           0.31

Factor time
HSL_MA97          0.15           1.52           8.18
PARDISO           0.12           0.33           1.50
SSIDS V2          0.11           0.29           1.67

Backward error
HSL_MA97          1.6 × 10^-15   3.1 × 10^-11   4.4 × 10^-9
PARDISO           3.9 × 10^-3    1.1 × 10^-6    1.4 × 10^-7
SSIDS V2          1.4 × 10^-15   2.0 × 10^-11   7.3 × 10^-9


Summary

- There is lots of parallelism in sparse direct solvers
- Using a runtime system can compete with hand-coded versions
- Pivoting can be accommodated with little overhead in performance
- We meet our goal of running at half the peak performance
- Programming this is tough


THANK YOU FOR YOUR ATTENTION


Direct methods

Published by OUP in 1986


Direct methods

OXFORD SCIENCE PUBLICATIONS

NUMERICAL MATHEMATICS AND SCIENTIFIC COMPUTATION

Direct Methods for Sparse Matrices

Second Edition

I. S. DUFF, A. M. ERISMAN, AND J. K. REID

ISBN 978-0-19-850838-0

NUMERICAL MATHEMATICS AND SCIENTIFIC COMPUTATION is a series designed to provide texts and monographs for graduate students and researchers on a wide range of mathematical topics at the interface of computational science and numerical analysis.


This book is concerned with solving very large sets of linear equations, where each equation involves only a small number of variables. Many applications involve equations of this kind and they often need to be solved repeatedly as their entries change. Very special methods are needed to make these calculations feasible. The authors have been involved in designing special algorithms and writing codes to implement them for over 40 years. This book aims to describe in a clear and simple way those algorithms that have stood the test of time, as well as those that have been developed recently to enable the efficient solution of far larger systems and to take advantage of modern hardware.

Direct Methods for Sparse Matrices, second edition, is a complete rewrite of the first edition, which was published 30 years ago. Much has changed since that time. Problems have grown greatly in size and complexity, and computer architectures are now much more complex, requiring new ways of adapting algorithms to parallel environments with memory hierarchies. Because the area is such an important one to computational science and engineering, a huge amount of research has been done since the first edition. This new research is integrated into the text with a clear explanation of the underlying mathematics and algorithms.

New research described here includes new techniques for scaling and error control, new orderings, new combinatorial techniques for partitioning both symmetric and unsymmetric problems, and a detailed description of the multifrontal approach to solving systems that was pioneered by the research of the authors and colleagues. This includes a discussion of techniques for exploiting parallel architectures and new work for indefinite and unsymmetric systems.

Written to be as accessible as possible, this book will be a useful resource to a wide range of readers: from researchers and practitioners, to applications scientists, to graduate students, and their teachers.
