

Solving PDEs on Supercomputers I: modern supercomputer architecture

Patrick Farrell

MMSC: Python in Scientific Computing

May 17, 2015


Supercomputer architecture

Moore’s Law

The number of transistors per unit area on integrated circuits doubles every two years. (1965)


The consequence

Individual computers aren’t getting faster: we’re getting more of them.


A modern supercomputer

In this lecture we will give a brief overview of modern supercomputer architecture.

ARCHER is composed of 4920 nodes, each with 24 cores, for a total of 118,080 cores.


A node

Algorithmic consequence

Extreme pressure on memory and memory bandwidth.


A socket

Algorithmic consequence

Want to have multiple cores working on the same data.


A core

Algorithmic consequence

Vectorisation essential for maximum floating point performance.
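To make this concrete, here is a small NumPy comparison (my own illustration, not from the slides) of an element-by-element Python loop against the equivalent vectorised call, which lets the compiled library use the CPU's vector units; the array size is arbitrary.

import time
import numpy as np

n = 2_000_000
x = np.random.rand(n)
y = np.random.rand(n)

# Element-by-element loop: the interpreter handles one value at a time.
t0 = time.perf_counter()
s = 0.0
for i in range(n):
    s += x[i] * y[i]
t_loop = time.perf_counter() - t0

# Vectorised: NumPy calls compiled code that can use the CPU's SIMD units.
t0 = time.perf_counter()
s_vec = float(np.dot(x, y))
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.3f}s, vectorised: {t_vec:.5f}s, difference: {abs(s - s_vec):.2e}")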


Hardware properties

Some relative timings

On a 3.0 GHz Intel Core 2 Duo E8400:

I One clock cycle: ∼ 1/3 nanoseconds (∼ 10 light-cm!).

I Accessing L1 data cache (32 KB): 3 cycles

I Accessing L2 cache (6 MB): 14 cycles

I Accessing main memory: ∼ 250 cycles

I Accessing disk: ∼ 40 million cycles


Analogy

I Register: the data is on your working paper.

I L1 cache: the data is on your desk (3 seconds).

I L2 cache: the data is on your bookshelf (14 seconds).

I Main memory: the data is in the library (a 4 minute walk).

I Disk: go backpacking for 1.2 years.
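To see the memory hierarchy in action, the following NumPy experiment (my addition; sizes are arbitrary) sums the same data once in contiguous order and once in a random order that defeats the caches.

import time
import numpy as np

a = np.random.rand(20_000_000)           # ~160 MB, far larger than any cache
perm = np.random.permutation(a.size)     # a random visiting order

t0 = time.perf_counter(); s1 = a.sum(); t_seq = time.perf_counter() - t0
t0 = time.perf_counter(); s2 = a[perm].sum(); t_rand = time.perf_counter() - t0

# The gather a[perm] touches memory in an unpredictable order, so it is
# dominated by main-memory latency rather than by floating point work.
print(f"contiguous: {t_seq:.3f}s, random order: {t_rand:.3f}s")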


The interconnect


Some more timings

On the Cray Aries interconnect, to send a message:

I Within a socket: 800 cycles

I Within a node: 1600 cycles

I Across the machine: 8000 cycles

Algorithmic consequence

Interleave communication and computation.
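A minimal mpi4py sketch of this idea (my addition, assuming mpi4py is available and the job is run on two ranks): post nonblocking halo sends/receives, do the interior work that needs no halo data, then wait.

# run with e.g.: mpiexec -n 2 python overlap.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
other = 1 - rank if comm.Get_size() == 2 else rank   # toy choice of neighbour

halo_out = np.full(100, rank, dtype='d')
halo_in = np.empty(100, dtype='d')

# Start the halo exchange, but do not wait for it yet.
reqs = [comm.Isend(halo_out, dest=other),
        comm.Irecv(halo_in, source=other)]

# ... interior work that needs no halo data happens while messages are in flight ...
interior = np.random.rand(100_000).sum()

MPI.Request.Waitall(reqs)    # now the halo data is safe to use
print(rank, interior, halo_in[0])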


MPI and OpenMP

Domain decomposition

The coarsest level of parallelism used is domain decomposition over MPI.

from dolfin import *

mesh = UnitCubeMesh(32, 32, 32)

partitioning = CellFunction("size_t", mesh)

partitioning.set_all(MPI.rank(mpi_comm_world()))

File("output/partitioning.xdmf") << partitioning

$ mpiexec -n 4 python partition.py


MPI: basic model

MPI

Separate processes with separate memory spaces communicate via message passing.

MPI concepts:

I communicator

I collective

I rank

I blocking and nonblocking communication

I reductions

Each subdomain is assigned to one MPI rank.
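For concreteness, a tiny mpi4py illustration of a communicator, ranks and a reduction (my addition, not course code):

# run with: mpiexec -n 4 python mpi_basics.py
from mpi4py import MPI

comm = MPI.COMM_WORLD        # communicator: the set of processes that can talk
rank = comm.Get_rank()       # this process's id within the communicator
size = comm.Get_size()

local = (rank + 1) ** 2      # some per-rank (per-subdomain) quantity
total = comm.allreduce(local, op=MPI.SUM)   # collective reduction: every rank gets the sum

print(f"rank {rank} of {size}: local = {local}, global sum = {total}")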


Main communication patterns in finite elements

Assembly

Assembly requires exchanging halo data with your neighbours.

[Figure: each processor's degrees of freedom split into core, owned, exec and non-exec regions, with halos exchanged between processor 0 and processor 1.]


Krylov solvers

I Neighbour communications for sparse matrix-vector product.

I Global reductions (allreduce for dot products)

I Preconditioner application

I Multigrid: extremely complicated.
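To see where these patterns arise, below is a plain serial NumPy conjugate gradient iteration with comments marking which operations would need neighbour exchanges or global reductions in a distributed implementation (my own sketch, not course code).

import numpy as np

def cg(A, b, tol=1e-10, maxit=200):
    x = np.zeros_like(b)
    r = b - A @ x               # matvec: neighbour (halo) communication in parallel
    p = r.copy()
    rr = r @ r                  # dot product: global allreduce in parallel
    for _ in range(maxit):
        Ap = A @ p              # matvec: neighbour communication
        alpha = rr / (p @ Ap)   # dot product: allreduce
        x += alpha * p          # purely local update: no communication
        r -= alpha * Ap
        rr_new = r @ r          # dot product: allreduce
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# small SPD test problem: 1D Laplacian
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = cg(A, b)
print(np.linalg.norm(b - A @ x))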


OpenMP: basic model

OpenMP

Separate threads operate on the same memory space.

Advantages:

I Less overhead in parallel execution

I Multiple cores can act on the same data

I Less pressure on memory and memory bandwidth

I Easier load balancing

Drawbacks:

I Extremely difficult to program correctly

I Subtle race conditions possible

I Colouring and locks required to synchronise (see the sketch below)
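As a toy illustration of the synchronisation issue (my addition; Python's GIL hides most of the real difficulty faced by OpenMP code), two threads below accumulate into a shared counter, with a lock making each read-modify-write update atomic.

import threading

counter = 0
lock = threading.Lock()

def work(n):
    global counter
    for _ in range(n):
        with lock:        # without this, the read-modify-write of += can interleave
            counter += 1

threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)            # reliably 200000 with the lock held around each update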


DOLFIN can also run in OpenMP mode for assembly:

from dolfin import *

parameters["num_threads"] = 4

# ...

solve(F == 0, u) # must use a threaded solver (e.g. pastix)!

You can’t use MPI and OpenMP at the same time (yet).


Algorithmic consequences

General algorithmic consequences

I Need algorithms with high arithmetic intensity (see the rough estimate after this list).

I Caches greatly dislike unstructured memory accesses.

I Flops are (approximately) free.

I Large stencils induce extra communication.

I Must overlap communication and computation.

I Solver algorithms must be O(n) or O(n log n).

General algorithmic trends

I Domain-decomposed high-order FE on semi-structured meshes.

I Multigrid/multilevel solvers with Krylov accelerators.

I Hybrid parallelism strategies (MPI/OpenMP/AVX).
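A rough back-of-the-envelope estimate (mine, not from the slides) of why a sparse matrix-vector product has low arithmetic intensity and is therefore bandwidth limited; the 100 GB/s figure is only an assumed, representative stream bandwidth.

# Rough arithmetic intensity of a CSR sparse matrix-vector product.
flops_per_nnz = 2.0           # one multiply and one add per stored nonzero
bytes_per_nnz = 8.0 + 4.0     # 8-byte double value + 4-byte column index (vectors ignored)

intensity = flops_per_nnz / bytes_per_nnz
print(f"~{intensity:.2f} flops per byte")          # ~0.17

# A node streaming an assumed ~100 GB/s from memory could then sustain only about:
print(f"~{intensity * 100e9 / 1e9:.0f} GFlop/s, far below its floating point peak")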


Solving PDEs on Supercomputers II: practical matters of using supercomputers

Patrick Farrell

MMSC: Python in Scientific Computing

May 17, 2015


Logging on

Supercomputers are accessed by sshing to the login nodes.

$ ssh [email protected]

You configure your environment with modules:

$ module list

No Modulefiles Currently Loaded.

$ module avail

...

$ module use -a /data/math-farrellp/crichardson/modules

$ module load fenics/1.5.0

$ module list

Modules are generally awful, but nothing better exists yet.


Running jobs interactively

The simplest way to run a job is interactively. This is mainly used for debugging.

$ qsub -I -l nodes=1:ppn=16 -l walltime=0:10:00 -q develq

qsub: waiting for job 312485.headnode1.arcus.osc.local to start

# wait until PBS allocates us the resources we asked for ...

qsub: job 312485.headnode1.arcus.osc.local ready

$ cd $PBS_O_WORKDIR

$ module use -a /data/math-farrellp/crichardson/modules

$ module load fenics/1.5.0

$ mpirun $MPI_HOSTS python poisson.py


Running jobs in batch mode

ARCUS-A and ARCHER are managed using PBS, the Portable Batch System. Users submit jobs to the batch system which decides when and where they get executed.

The main PBS commands:

I qsub

I qdel

I qstat

The argument to qsub is a PBS script.


#!/bin/bash

# set the number of nodes and processes per node

#PBS -l nodes=1:ppn=16

# set max wallclock time

#PBS -l walltime=1:00:00

# set name of job

#PBS -N poisson

# mail alert at start, end and abortion of execution

#PBS -m bea

# send mail to this address

#PBS -M [email protected]

# start job from the directory it was submitted

cd $PBS_O_WORKDIR

module use -a /data/math-farrellp/crichardson/modules

module load fenics/1.5.0

. enable_arcus_mpi.sh

mpirun $MPI_HOSTS python poisson.py | tee poisson.log


HPC 02 Challenge!

Investigate the weak scaling of the 2D Poisson solver with parallel LU that you developed last week:

I Have the code refine the mesh once each time the number of cores quadruples. Hint:

size = MPI.size(mpi_comm_world())

...

for i in range(nrefine):

    mesh = refine(mesh, redistribute=False)

I Run the code on 1, 4 and 16 cores. What happens to the runtime as the problem is scaled weakly?

I . . .


Which components of the solver are taking the longest? Profile the code with

I DOLFIN timing system: list_timings()

I PETSc timing system:

import petsc4py

petsc4py.init("-log_summary summary.log".split())

from dolfin import *

I Now switch to HYPRE algebraic multigrid and compare the timings again. Hint: to get more details about the AMG solve, call

PETScOptions.set("pc_hypre_boomeramg_print_statistics", 1)


Solving PDEs on Supercomputers III: an introduction to PETSc

Patrick Farrell

MMSC: Python in Scientific Computing

May 17, 2015


PETSc

PETSc is a library of linear and nonlinear solvers for the sparse systems arising from PDEs.

It has won most awards going:

I SIAM/ACM Prize in Computational Science and Engineering, 2015

I R&D 100 Award, 2009

I Gordon Bell Prizes in 2009, 2004, 2003, 1999

I . . .

PETSc makes it easy to express complex, hierarchically composed solvers as compactly as possible.


Fundamental objects

[Vec, Mat, PC, KSP, SNES]

Vec

Vec represents a dense vector, decomposed in parallel.

Example

ierr = VecCreateMPI(PETSC_COMM_WORLD, local, global, &x);

ierr = VecDuplicate(x, &y);

ierr = VecDotBegin(x, y, &xTy);

/* other computations */

ierr = VecDotEnd(x, y, &xTy);


Mat

Mat represents a sparse matrix, decomposed in parallel.

Example

ierr = MatCreateAIJ(PETSC_COMM_WORLD, ..., &mat);

for (i = 0; i < local_rows; i++)

ierr = MatSetValues(mat, ...);

ierr = MatAssemblyBegin(mat, MAT_FINAL_ASSEMBLY);

ierr = MatAssemblyEnd(mat, MAT_FINAL_ASSEMBLY);

ierr = MatMult(mat, x, y);


PC

PC represents a linear preconditioner (Jacobi, Gauss-Seidel, ILU, ICC, AMG, additive Schwarz, ...)

Example

ierr = PCCreate(PETSC_COMM_WORLD, &pc);

ierr = PCSetOperators(pc, A, P);

ierr = PCSetType(pc, PCILU);

ierr = PCSetUp(pc);

ierr = PCApply(pc, x, y);


KSP

KSP represents a linear solver (CG, GMRES, TFQMR, BICGSTAB, MINRES, GCR, Richardson, Chebyshev, ...)

Example

ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);

ierr = KSPSetOperators(ksp, A, P);

ierr = KSPSetType(ksp, KSPCG);

ierr = KSPSetUp(ksp);

ierr = KSPSolve(ksp, b, x);
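The same objects are exposed to Python through petsc4py. A minimal sketch (my addition, assuming petsc4py is installed) that assembles a small 1D Laplacian and solves it with CG preconditioned by Jacobi:

from petsc4py import PETSc

n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3)   # sparse matrix, decomposed in parallel
A.setUp()
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):              # assemble a 1D Laplacian stencil
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemblyBegin()
A.assemblyEnd()

x = A.createVecRight()
b = A.createVecLeft()
b.set(1.0)

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType('cg')
ksp.getPC().setType('jacobi')
ksp.setFromOptions()                       # honour -ksp_type etc. from the command line
ksp.solve(b, x)
print(ksp.getIterationNumber(), ksp.getResidualNorm())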


SNES

SNES represents a nonlinear solver (Newton, reduced-space Newton, NGMRES, NCG, Anderson acceleration, FAS, ...)

Example

ierr = SNESCreate(PETSC_COMM_WORLD, &snes);

ierr = SNESSetFunction(snes, r, residual);

ierr = SNESSetJacobian(snes, J, P, jacobian);

ierr = SNESSetType(snes, SNESVINEWTONRSLS);

ierr = SNESSetVariableBounds(snes, xl, xu);

ierr = SNESSetUp(snes);

ierr = SNESSolve(snes, b, x);


Hierarchical composition

Principle

All objects are composable.

Principle

All objects are configurable.

(example from variational fracture mechanics)


Wiring PETSc and FEniCS

We’re going to need fine control to design our solvers.

A simple interface between FEniCS and PETSc:

$ git clone https://bitbucket.org/pefarrell/dolfin-snes-interface.git


Solving PDEs on Supercomputers IV: algebraic multigrid

Patrick Farrell

MMSC: Python in Scientific Computing

May 18, 2015


Multilevel solvers

At the core of most PDE solvers is the solution of a linear system

Linear system

Ax = b

The most powerful solvers for PDEs exploit the fact that there exists an infinite hierarchy of discretisations, all approximating the same problem:

Hierarchy of linear systems

· · ·

A_h x_h = b_h

A_{2h} x_{2h} = b_{2h}

A_{4h} x_{4h} = b_{4h}

· · ·


Geometric multigrid: review

Geometric multigrid algorithm

I Begin with an initial guess.

I Apply a relaxation method to smooth the error.

I Solve for the smooth error on a coarse grid.
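As a compact illustration of these three steps, here is a two-grid cycle for the 1D Poisson matrix in NumPy (my own sketch, not course code): damped Jacobi smoothing, a Galerkin coarse operator, a direct coarse solve for the smooth error, and interpolation of the correction.

import numpy as np

def laplacian(n):
    # 1D Poisson matrix on n interior points (the 1/h^2 scaling is omitted)
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi(A, x, b, nsweeps=3, omega=2.0/3.0):
    D = np.diag(A)
    for _ in range(nsweeps):
        x = x + omega * (b - A @ x) / D    # damped Jacobi: smooths the error
    return x

def two_grid(A, b, x):
    n = A.shape[0]
    nc = (n - 1) // 2                      # coarse grid size (n assumed odd)
    P = np.zeros((n, nc))                  # linear interpolation
    for j in range(nc):
        P[2*j, j] = 0.5
        P[2*j + 1, j] = 1.0
        P[2*j + 2, j] = 0.5
    R = 0.5 * P.T                          # full-weighting restriction

    x = jacobi(A, x, b)                    # relax to smooth the error
    r = b - A @ x
    Ac = R @ A @ P                         # Galerkin coarse-grid operator
    ec = np.linalg.solve(Ac, R @ r)        # solve for the smooth error on the coarse grid
    x = x + P @ ec                         # interpolate and correct
    return jacobi(A, x, b)                 # post-smooth

n = 63
A, b, x = laplacian(n), np.ones(n), np.zeros(n)
for k in range(10):
    x = two_grid(A, b, x)
    print(k, np.linalg.norm(b - A @ x))

Replacing the direct coarse solve with another two-grid cycle, recursively, gives the usual V-cycle.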


Why did geometric multigrid work?

Geometric multigrid worked on the Laplacian because:

I simple relaxation methods yielded geometrically smooth errors;

I those errors could be well-represented on coarse grids.

What about problems where the error isn’t smooth after relaxation?


Anisotropic Laplacian

−a u_xx − b u_yy = f in Ω = [0, 1]²

u = g on ∂Ω

a = b if x < 1/2

a ≫ b if x ≥ 1/2.


Two responses

GMG:

I design increasingly arcane relaxation methods that do smooth;

I semi-coarsening, multi-coarsening, etc.

AMG:

I fix a simple relaxation method;

I algebraically construct coarse grids and interpolation operators;

I demand that these can well represent the error after relaxation.

A nice side effect: AMG requires much less infrastructure:

I No need to supply coarse grids

I No need to supply interpolation operators

But:

I Only applies to linear problems

I Requires global linearisation (memory)

I Requires near-nullspace of operator


Anisotropic Laplacian again


Fundamental principles of AMG I: relaxation and error

Recall Richardson iteration with a preconditioner P:

Richardson iteration

x_{k+1} = x_k + P⁻¹ (b − A x_k).

A simple error analysis shows

Error analysis of Richardson iteration

e_{k+1} = (I − P⁻¹ A) e_k

Now if e_{k+1} ≈ e_k then

Near-nullspace of A

P⁻¹ A e_k ≈ 0 =⇒ A e_k ≈ 0.


Error after relaxation

The error after relaxation is related to the near-nullspace of the operator.


Fundamental principles of AMG II: interpolation

Recall that in one multigrid cycle we approximate the fine error as

Approximation of fine error

e_h ≈ P_H^h e_H

Thus, we want the near-nullspace to be in the range of P_H^h.
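In PETSc terms, this is what the near-nullspace attached to a matrix is for: smoothed aggregation builds its interpolation so that these vectors are (approximately) in its range. A hedged petsc4py sketch (my addition; the 1D Laplacian is only a stand-in operator, and for a scalar problem the near-nullspace is just the constant vector):

from petsc4py import PETSc

n = 100
A = PETSc.Mat().createAIJ([n, n], nnz=3)   # a 1D Laplacian as a stand-in operator
A.setUp()
rstart, rend = A.getOwnershipRange()
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

# For a scalar Laplacian-like operator the near-nullspace is just the constant
# vector; for elasticity it would be the rigid body translations and rotations.
nullsp = PETSc.NullSpace().create(constant=True)
A.setNearNullSpace(nullsp)                 # GAMG reads this when it builds interpolation

x, b = A.createVecRight(), A.createVecLeft()
b.set(1.0)
ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType('cg')
ksp.getPC().setType('gamg')                # smoothed aggregation AMG
ksp.setFromOptions()
ksp.solve(b, x)
print(ksp.getIterationNumber())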


Coarse grid generation: an example

Classical AMG: coarse-grid generation

1. Select C-point with maximal measure

2. Select neighbours as F-points

3. Update measures of neighbours


Smoothed-aggregation AMG: coarse-grid generation

Phase 1:

1. Pick a root point not adjacent to an aggregation

2. Aggregate root and neighbours

Phase 2: Move points into nearby aggregations


HPC 04 Challenge!

Consider the linear elasticity equation

−∇·σ(u) = f in Ω

u = 0 on ∂Ω_D

σ·n = 0 on ∂Ω_N

on the pulley mesh, where

ε(u) = (∇u + ∇uᵀ)/2,

σ(u) = 2µ ε(u) + λ tr(ε(u)) I,

f = (ρω²x, ρω²y, 0),

∂Ω_D = {(x, y, z) ∈ ∂Ω : x² + y² < (3.75 − 0.17z)²},

∂Ω_N = ∂Ω \ ∂Ω_D,

E = 10⁹, ν = 0.3, ρ = 10, ω = 300.
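Since σ is written with the Lamé parameters while the data are E and ν, the conversion and the corresponding UFL forms might look like the DOLFIN sketch below (my own illustration, not the model solution; the mesh filename pulley.xml is an assumption, and the boundary conditions are omitted):

from dolfin import *

# Lame parameters from Young's modulus E and Poisson ratio nu
E, nu = 1.0e9, 0.3
mu = E / (2.0 * (1.0 + nu))
lmbda = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
rho, omega = 10.0, 300.0

mesh = Mesh("pulley.xml")                  # assumed filename for the pulley mesh
V = VectorFunctionSpace(mesh, "CG", 1)
u, v = TrialFunction(V), TestFunction(V)

def eps(w):
    return sym(grad(w))

def sigma(w):
    return 2.0 * mu * eps(w) + lmbda * tr(eps(w)) * Identity(3)

xc = SpatialCoordinate(mesh)
f = rho * omega**2 * as_vector((xc[0], xc[1], 0.0))   # rotational body force

a = inner(sigma(u), eps(v)) * dx
L = inner(f, v) * dx
# boundary conditions on the region x^2 + y^2 < (3.75 - 0.17 z)^2 are omitted here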


Solve this problem using only smoothed aggregation algebraic multigrid (no Krylov accelerator: -ksp_type richardson -ksp_monitor_true_residual -pc_type gamg).

How many iterations does it take to converge to atol 10⁻¹²

(a) without the near-nullspace

(b) with the near-nullspace?

Here the near-nullspace is the rigid body translations and rotations.

Now investigate the configuration of the smoothed aggregation AMG solver and the Krylov accelerator. (Hint: -help, -snes_view.) By tuning the solver, can you achieve faster convergence?


Solving PDEs on Supercomputers V: algebraic multigrid on nonsymmetric problems

Patrick Farrell

MMSC: Python in Scientific Computing

May 19, 2015


HPC 05 Challenge! (1/3)

Implement a solver for the Yamabe equation

−8∇²u + (1/r³) u⁵ − (1/10) u = 0

on the doughnut mesh with boundary conditions u = 1.

Initialise Newton with the initial guess u = 1.


HPC 05 Challenge! (2/3)

Next, develop an efficient linear solver:

1. First use Newton + LU.

2. Next, try GMRES + GAMG. Does it work well?

3. Try increasing the maximum size of the coarse grid (pc_gamg_coarse_eq_limit).

4. Ah! Now we're getting somewhere. Does changing the smoother help (mg_levels_ksp_monitor_true_residual)?

5. Increase the quality of the smoothed aggregation basis (pc_gamg_agg_nsmooths).


HPC 05 Challenge! (3/3)

Profile the code. Where is it spending most of its time?

How can the preconditioner construction cost be reduced?

Once that is done, compare the memory usage of GMRES, FGMRES, GCR and CGS.


Solving PDEs on Supercomputers VI: fieldsplit preconditioners

Patrick Farrell

MMSC: Python in Scientific Computing

May 19, 2015


Block triangular factorisations

A block matrix with nonsingular A has a block triangular factorisation:

Block triangular factorisation

J = [A B; C D] = [I 0; CA⁻¹ I] [A 0; 0 S] [I A⁻¹B; 0 I],

where S = D − CA⁻¹B is the (dense!) Schur complement.

This gives us an expression for its inverse:

Block triangular inverse

[A B; C D]⁻¹ = [I −A⁻¹B; 0 I] [A⁻¹ 0; 0 S⁻¹] [I 0; −CA⁻¹ I].
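A quick NumPy sanity check of both identities on a random block matrix (my addition; block sizes are arbitrary and A is shifted to keep it well conditioned):

import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4
A = rng.standard_normal((n, n)) + n * np.eye(n)    # keep A comfortably nonsingular
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))

Ainv = np.linalg.inv(A)
S = D - C @ Ainv @ B                               # the Schur complement
Z_nm, Z_mn = np.zeros((n, m)), np.zeros((m, n))
I_n, I_m = np.eye(n), np.eye(m)

J = np.block([[A, B], [C, D]])
lower = np.block([[I_n, Z_nm], [C @ Ainv, I_m]])
diag = np.block([[A, Z_nm], [Z_mn, S]])
upper = np.block([[I_n, Ainv @ B], [Z_mn, I_m]])
print(np.allclose(lower @ diag @ upper, J))        # the factorisation holds

Jinv = (np.block([[I_n, -Ainv @ B], [Z_mn, I_m]])
        @ np.block([[Ainv, Z_nm], [Z_mn, np.linalg.inv(S)]])
        @ np.block([[I_n, Z_nm], [-C @ Ainv, I_m]]))
print(np.allclose(Jinv @ J, np.eye(n + m)))        # and so does the inverse formula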


Fieldsplit preconditioners

This gives rise to four related theorems.

Theorem (full)

The choice

P = [I 0; CA⁻¹ I] [A 0; 0 S] [I A⁻¹B; 0 I]

will induce Krylov convergence in 1 iteration.

How do you use this?

Cheaply approximate A⁻¹ and S⁻¹ (problem specific)!


Theorem (lower)

The choice

P = [I 0; CA⁻¹ I] [A 0; 0 S]

will induce Krylov convergence in 2 iterations.


Theorem (upper)

The choice

P = [A 0; 0 S] [I A⁻¹B; 0 I]

will induce Krylov convergence in 2 iterations.


Theorem (diag)

The choice

P = [A 0; 0 −S]

will induce Krylov convergence in 3 iterations, if D = 0.


Spectral equivalence

Definition (spectral equivalence)

A_h and B_h ∈ R^{n×n} are spectrally equivalent, A_h ∼ B_h, iff there exist constants c, C independent of h such that

c ≤ λ(B_h⁻¹ A_h) ≤ C.

Solving block-structured systems

Find an approximation S̃ ∼ S or S̃⁻¹ ∼ S⁻¹.


Stokes equations

The Stokes equations are

−ν∇²u + ∇p = 0,

∇·u = 0.


A stable discretisation yields

J = [A Bᵀ; B 0],

with S = −B A⁻¹ Bᵀ.


Spectral equivalence (e.g. Elman, Silvester and Wathen, 2005)

Let Q be the viscosity-weighted pressure mass matrix

Q_ij = ∫_Ω (1/ν) φ_i φ_j.

Then S ∼ Q.


Coding tools

Creating PETSc index sets to extract dofs:

u_dofs = SubSpace(Z, 0).dofmap().dofs()

u_is = PETSc.IS().createGeneral(u_dofs)

Configuring the dofs to split:

fields = [("0", u_is), ("1", p_is)]

snes.ksp.pc.setFieldSplitIS(*fields)

Setting the matrix for building a preconditioner for the Schur complement:

schur = (1.0/nu) * inner(p, q)*dx

schur_full = assemble(schur)

schur_fmat = as_backend_type(schur_full).mat()

schur_mat = schur_fmat.getSubMatrix(p_is, p_is)

snes.ksp.pc.setFieldSplitSchurPreType(PETSc.PC.SchurPreType.USER, schur_mat)


Configuring fieldsplit

--petsc.ksp_converged_reason
--petsc.ksp_type fgmres
--petsc.ksp_monitor_true_residual
--petsc.ksp_atol 1.0e-10
--petsc.ksp_rtol 0.0
--petsc.pc_type fieldsplit
--petsc.pc_fieldsplit_type schur
--petsc.pc_fieldsplit_schur_factorization_type full
--petsc.pc_fieldsplit_schur_precondition user
--petsc.fieldsplit_0_ksp_type richardson
--petsc.fieldsplit_0_ksp_max_it 1
--petsc.fieldsplit_0_pc_type lu
--petsc.fieldsplit_0_pc_factor_mat_solver_package mumps
--petsc.fieldsplit_1_ksp_type bcgs
--petsc.fieldsplit_1_ksp_rtol 1.0e-10
--petsc.fieldsplit_1_ksp_monitor_true_residual
--petsc.fieldsplit_1_pc_type lu
--petsc.fieldsplit_1_pc_factor_mat_solver_package mumps


HPC 06 Challenge!

Solve the Stokes equations with ν = 1/100 on the dolphin.xml mesh, with boundary conditions

u = (0, 0) on ∂Ω0

u = (−sin(πy), 0) on ∂Ω1

ν∇u · n = pn on ∂Ω2,

with colours taken from dolphin subdomains.xml.

0. Discretise the equation with a stable finite element pair. Integrate both terms in the momentum equation by parts. (A starting sketch follows this list.)

1. Solve the problem with LU (UMFPACK/MUMPS).

2. Implement the fieldsplit preconditioner with ideal inner solvers (LU).

3. Now replace the inner solvers with Krylov solvers (CG/ML/5 for A, BCGS/HYPRE/5 for S).

4. What configuration is fastest? full with strong inner solvers? diag with weak inner solvers?
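A minimal sketch of step 0, assuming a recent legacy DOLFIN and a Taylor–Hood (P2–P1) pair (older versions build the mixed space with MixedFunctionSpace); the Dirichlet conditions on the coloured boundaries and the fieldsplit index sets follow as on the Coding tools slide above:

from dolfin import *

mesh = Mesh("dolphin.xml")                      # the mesh named in the challenge
V = VectorElement("CG", mesh.ufl_cell(), 2)
Q = FiniteElement("CG", mesh.ufl_cell(), 1)
Z = FunctionSpace(mesh, V * Q)                  # Taylor-Hood mixed space

(u, p) = TrialFunctions(Z)
(v, q) = TestFunctions(Z)
nu = Constant(0.01)

# Integrating both momentum terms by parts leaves the boundary term
# (p n - nu grad(u) n) . v, which vanishes under the natural condition on the outflow.
a = (nu*inner(grad(u), grad(v)) - p*div(v) - q*div(u))*dx
L = inner(Constant((0.0, 0.0)), v)*dx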


Solving PDEs on Supercomputers VII: PDE-constrained optimisation

Patrick Farrell

MMSC: Python in Scientific Computing

May 17, 2015


The mother problem

Consider again the mother problem of PDE-constrained optimisation:

min_{y,u}  ½ ∫Ω (y − yd)² dx + (β/2) ∫Ω u² dx

subject to

−∆y = u in Ω,
y = 0 on ∂Ω.

We form the Lagrangian:

L(y, u, λ) = ½ ∫Ω (y − yd)² dx + (β/2) ∫Ω u² dx + ∫Ω ∇λ · ∇y − λu dx.


The optimality conditions

Taking the optimality conditions yields the system: find (y, u, λ) ∈ H¹₀ × L² × H¹₀ such that

∫Ω ỹ(y − yd) dx + ∫Ω ∇λ · ∇ỹ dx = 0 for all ỹ,
β ∫Ω uũ dx − ∫Ω λũ dx = 0 for all ũ,
∫Ω ∇λ̃ · ∇y dx − ∫Ω λ̃u dx = 0 for all λ̃.

On discretisation, this yields the system

[ M    0    K ] [y]   [z]
[ 0   βM   −M ] [u] = [0]
[ K   −M    0 ] [λ]   [0].
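For concreteness, a minimal legacy-DOLFIN sketch of the blocks above, assuming (as is standard for this problem) that M and K denote the mass and stiffness matrices on a scalar P1 space and z the discretised data term:

from dolfin import *

mesh = UnitSquareMesh(64, 64)
V = FunctionSpace(mesh, "CG", 1)
y, w = TrialFunction(V), TestFunction(V)
yd = Constant(1.0)                              # placeholder target state

M = assemble(y*w*dx)                            # mass matrix
K = assemble(inner(grad(y), grad(w))*dx)        # stiffness matrix
z = assemble(yd*w*dx)                           # data term on the right-hand side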


Ingredients of a fieldsplit

Remember, to fieldsplit you need two things:

1. A diagonal block you can cheaply invert

2. A Schur complement you can cheaply approximate

If we take A = [[M, 0], [0, βM]], the first is satisfied.

How about the Schur complement? Calculating, we find

S = KM⁻¹K + (1/β)M.

Bad news

Approximating the inverse of a sum is hard.


Two approaches

Approach one: ignore one of the terms (Rees, Dollar, Wathen 2010):

S = KM⁻¹K + (1/β)M ≈ KM⁻¹K,

with inverse S⁻¹ ≈ K⁻¹MK⁻¹.

Approach two: approximate the sum with a product (Pearson and Wathen, 2012):

S = (K + (1/√β)M) M⁻¹ (K + (1/√β)M) − (2/√β)K
  ≈ (K + (1/√β)M) M⁻¹ (K + (1/√β)M),

with inverse S⁻¹ ≈ (K + (1/√β)M)⁻¹ M (K + (1/√β)M)⁻¹.
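A quick numerical sanity check of the two approximations, a sketch using 1D finite-difference stand-ins for K (stiffness) and M (lumped mass) and printing the extreme eigenvalues of S̃⁻¹S for each choice of S̃:

import numpy as np

n, beta = 50, 1.0e-4
h = 1.0 / (n + 1)
K = (np.diag(2.0*np.ones(n)) - np.diag(np.ones(n-1), 1) - np.diag(np.ones(n-1), -1)) / h
M = h * np.eye(n)
Minv = np.linalg.inv(M)

S = K @ Minv @ K + M / beta                       # exact Schur complement
S1 = K @ Minv @ K                                 # approach one
Kb = K + M / np.sqrt(beta)
S2 = Kb @ Minv @ Kb                               # approach two

for name, St in [("drop the mass term", S1), ("Pearson-Wathen", S2)]:
    lam = np.linalg.eigvals(np.linalg.solve(St, S)).real
    print(name, lam.min(), lam.max())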


Coding tools

No need to pass index sets when the fields are scalar subspaces:

"""
--petsc.pc_fieldsplit_0_fields 0,1
--petsc.pc_fieldsplit_1_fields 2
"""

You do need index sets to extract submatrices:

trial = split(TrialFunction(Z))[0]                  # first component of the mixed space
test = split(TestFunction(Z))[0]
bc = DirichletBC(Z.sub(0), 0.0, "on_boundary")
mass_full = assemble(inner(trial, test)*dx)         # mass matrix assembled on the full mixed space
bc.apply(mass_full)                                 # respect the Dirichlet rows
...
mass_mat = mass_fmat.getSubMatrix(is_0, is_0)       # restrict the underlying PETSc Mat to field 0


Coding tools

Creating a KSP to handle the solve:

ksp_kbm = PETSc.KSP()
ksp_kbm.create()
ksp_kbm.setType("richardson")
ksp_kbm.pc.setType("lu")                       # one LU solve per application
ksp_kbm.setOperators(kbm)                      # kbm: the factor being inverted, e.g. K + (1/sqrt(beta))*M
ksp_kbm.setOptionsPrefix("fieldsplit_1_kbm_")  # configurable from the command line
ksp_kbm.setFromOptions()
ksp_kbm.setUp()


Coding tools

Using an approximate inverse action with PCMAT:

"""
--petsc.fieldsplit_1_pc_type mat
"""

Configuring a shell matrix:

class SchurInv(object):
    def mult(self, mat, x, y):
        # apply the approximate action of S^{-1}: kbm^{-1} M kbm^{-1}
        # (tmp1, tmp2 are work vectors created elsewhere)
        ksp_kbm.solve(x, tmp1)
        mass.mult(tmp1, tmp2)
        ksp_kbm.solve(tmp2, y)

schur = PETSc.Mat()
schur.createPython(mass.getSizes(), SchurInv())   # matrix-free Mat wrapping the Python class
schur.setUp()


HPC 07 Challenge!

Solve the mother problem on Ω = [0, 1]² with

yd(x, y) = 1 if (x, y) ∈ [0, 0.5]², and 0 otherwise,

and homogeneous Dirichlet boundary conditions.

0. Discretise the equation with [P1]³.

1. Solve the problem with LU.

2. Implement the two fieldsplit preconditioners with ideal inner solvers.

3. Which performs best as β → 0?

4. Now choose scalable inner solvers.

5. Which configuration is fastest on the machine?


Solving PDEs on Supercomputers VIII: advanced nonlinear solvers

Patrick Farrell

MMSC: Python in Scientific Computing

May 18, 2015


Globalisation of Newton’s method

Consider again the p-Laplace equation

−∇ · (γ(u)∇u) = f in Ω

u = g on ∂Ω

where

γ(u) = (ε² + ½|∇u|²)^((p−2)/2).

The configuration we considered (p = 5) took 121 iterations to converge. Why?


Newton steps near singular Jacobians

Recall that at our initial guess u = 0, our Jacobian is nearly singular.

If J = UΣVᵀ, then J⁻¹ = VΣ⁻¹Uᵀ, and if σmin → 0, then

‖δu‖ = ‖J⁻¹F‖ → ∞.

This explains

0 SNES Function norm 3.027343750000e-02

1 SNES Function norm 3.708799037955e+56

2 SNES Function norm 1.173487195603e+56
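A toy illustration of this blow-up, a sketch with a diagonal 2×2 stand-in for the Jacobian whose smallest singular value shrinks:

import numpy as np

F = np.array([1.0, 1.0])                    # a fixed residual vector
for sigma_min in [1.0, 1e-4, 1e-8, 1e-12]:
    J = np.diag([1.0, sigma_min])           # singular values 1 and sigma_min
    du = np.linalg.solve(J, F)              # the Newton step solves J du = F (up to sign)
    print(sigma_min, np.linalg.norm(du))    # the step norm grows like 1/sigma_min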


Responses

A few possible responses:

1. Start with a better initial guess (continuation)

2. Regularise further (undesirable)

3. Take a smaller step (damping with α ≠ 1)!


[Figures: Newton fractals for z³ − 1 = 0 with damping α = 1, 0.75, 0.5, 0.25 and 0.1.]
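These figures can be reproduced in a few lines; a sketch of the damped Newton iteration on a grid of complex starting points (plotting the returned label array, e.g. with matplotlib's imshow, gives the fractal):

import numpy as np

roots = np.exp(2j*np.pi*np.arange(3)/3)          # the three cube roots of unity

def basins(alpha, n=400, iters=50):
    x = np.linspace(-2.0, 2.0, n)
    z = x[None, :] + 1j*x[:, None]
    for _ in range(iters):
        z = z - alpha*(z**3 - 1)/(3*z**2)        # damped Newton step for z^3 - 1 = 0
    return np.argmin(np.abs(z[..., None] - roots), axis=-1)

labels = basins(0.5)                             # try alpha = 1, 0.75, 0.5, 0.25, 0.1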


Linesearch schemes in PETSc

Backtracking linesearch (bt)

- Finds the minimum of a polynomial fit to the l² norm of the residual over [0, 1].
- Demands monotonic and sufficient decrease.
- If the decrease is insufficient, the interval is reduced.

Good for: convex problems, occasional near-singular Jacobians.

Bad for: nonconvex problems where the residual must increase before convergence.


Linesearch schemes in PETSc

Critical point linesearch (cp)

- Many PDEs have an energy functional to be minimised.
- Suppose F(u) is the gradient of some (unknown) E(u).
- E(u + α du) can be minimised by looking for roots of duᵀF(u + α du) = 0 with a secant method.

Good for: problems with an energy functional.


Linesearch schemes in PETSc

Affine-covariant linesearch (nleqerr)

- Undamped Newton's method is affine covariant.
- This observation fundamentally changes convergence theorems for Newton (Deuflhard, 2011).
- Convergence criteria are expressed in terms of affine-covariant Lipschitz constants.
- This linesearch estimates these constants and uses them to decide step lengths.

Good for: problems where you can start within singular manifolds; the hardest nonlinear problems.
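All three schemes can be selected at runtime through the SNES linesearch options. A sketch using this course's --petsc. prefix convention; the type names bt, cp, nleqerr, l2 and basic are those of recent PETSc releases, so check them against your version:

--petsc.snes_monitor
--petsc.snes_linesearch_monitor
--petsc.snes_linesearch_type nleqerr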


Nonlinear preconditioning

For a linear problem Ax = b we apply an approximate solver P⁻¹ on the left:

P⁻¹Ax = P⁻¹b.

Write one step of a nonlinear solver for F(x) = b as

x_{i+1} = N(F, x_i, b).


Nonlinear preconditioning

In nonlinear left preconditioning, we define a new residual

R(x) = x − N(F, x, b)

and apply an outer nonlinear solver to R.

In the linear case this is equivalent, since

R(x) = x − N(F, x, b) = x − [x − P⁻¹(Ax − b)] = P⁻¹(Ax − b).

We can accelerate an inner solver with an outer solver!


Examples of nonlinear preconditioning

Hyperelasticity (Brune et al, 2013)

Inner solver: Newton. Outer solver: nonlinear conjugate gradients.

High-Reynolds number Navier–Stokes (Cai and Keyes, 2002)

Inner solver: nonlinear additive Schwarz. Outer solver: Newton–Krylov.

High-Prandtl number Navier–Stokes (Brune et al, 2013)

Inner solver: nonlinear multigrid. Outer solver: nonlinear GMRES.
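In PETSc, such compositions are configured by attaching a nonlinear preconditioner to the outer SNES; the inner solver takes the npc_ options prefix. A hedged sketch in this course's --petsc. convention (option names as in recent PETSc releases; the particular outer/inner pairing here is only an illustration):

--petsc.snes_type ngmres
--petsc.snes_monitor
--petsc.npc_snes_type newtonls
--petsc.npc_snes_max_it 1
--petsc.npc_snes_linesearch_type basic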


Nonlinear preconditioning: a remark

The design space for nonlinear solvers is vast.

At the moment we have very little theory to guide us.

There are very large potential gains, however.


Nonlinear multigrid

The main bottleneck for massive problems is the linear system.

What if we didn’t have to solve (large) linear systems?

FAS uses fine-grid residuals to correct coarse-grid equations.


Full Approximation Scheme (FAS)

Given:

- a problem (F^h, x^h, b^h),
- a smoother S and a coarse solver M,
- restriction, prolongation and injection operators R, P and R̂,

while not converged:

    x^h_s = S(F^h, x^h_i, b^h)
    x^H = R̂ x^h_s
    b^H = R[b^h − F^h(x^h_s)] + F^H(x^H)
    x^H_c = M(F^H, x^H, b^H)
    x^h_c = x^h_s + P[x^H_c − x^H]
    x^h_{i+1} = S(F^h, x^h_c, b^h)
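A minimal two-grid sketch of this cycle in plain Python/numpy, for the 1D model problem −u″ + u³ = b with homogeneous Dirichlet data; the smoother (damped nonlinear Jacobi), transfer operators and grid sizes are illustrative assumptions, not taken from the slides:

import numpy as np

def make_problem(n):
    # F(u): residual of -u'' + u^3 on n interior points of [0, 1], with u = 0 on the boundary
    h = 1.0 / (n + 1)
    def F(u):
        up = np.concatenate(([0.0], u, [0.0]))
        return (-up[:-2] + 2.0*up[1:-1] - up[2:]) / h**2 + u**3
    def jac_diag(u):
        return 2.0 / h**2 + 3.0*u**2              # diagonal of the Jacobian, for the smoother
    return F, jac_diag, h

def smooth(F, jac_diag, u, b, sweeps=3, omega=2.0/3.0):
    for _ in range(sweeps):
        u = u + omega * (b - F(u)) / jac_diag(u)  # damped nonlinear Jacobi
    return u

def restrict(r):                                  # full weighting
    return 0.25*r[:-2:2] + 0.5*r[1:-1:2] + 0.25*r[2::2]

def inject(u):                                    # pointwise injection
    return u[1::2]

def prolong(e):                                   # linear interpolation
    fine = np.zeros(2*len(e) + 1)
    fine[1::2] = e
    fine[2:-1:2] = 0.5*(e[:-1] + e[1:])
    fine[0], fine[-1] = 0.5*e[0], 0.5*e[-1]
    return fine

def coarse_newton(F, u, b, h, its=8):             # "Newton-LU" on the small coarse grid
    n = len(u)
    for _ in range(its):
        J = (np.diag(2.0/h**2 + 3.0*u**2)
             - np.diag(np.full(n - 1, 1.0/h**2), 1)
             - np.diag(np.full(n - 1, 1.0/h**2), -1))
        u = u - np.linalg.solve(J, F(u) - b)
    return u

# Two-grid FAS cycle, with b manufactured so that the solution is u(x) = sin(pi x)
nf = 63
Fh, dh, hf = make_problem(nf)
FH, dH, hH = make_problem((nf - 1) // 2)
x = np.linspace(0.0, 1.0, nf + 2)[1:-1]
bh = np.pi**2 * np.sin(np.pi*x) + np.sin(np.pi*x)**3
u = np.zeros(nf)
for it in range(8):
    us = smooth(Fh, dh, u, bh)                    # pre-smooth
    uH = inject(us)
    bH = restrict(bh - Fh(us)) + FH(uH)           # FAS coarse right-hand side
    uHc = coarse_newton(FH, uH, bH, hH)           # coarse solve
    u = us + prolong(uHc - uH)                    # coarse-grid correction
    u = smooth(Fh, dh, u, bh)                     # post-smooth
    print(it, np.linalg.norm(bh - Fh(u)))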


Nonlinear multigrid

You can use

- a high-flop smoother on the fine grids,
- and Newton-LU on the coarse grids!

(see the firedrake Yamabe demo)


HPC 08 Challenge!

Consider again the p-Laplace equation (FEniCS lecture III).

1. Investigate the performance of different linesearch schemes on the p-Laplace problem.

2. Using only the basic linesearch for the inner solver, accelerate the convergence of Newton's method with left preconditioning by ncg/cp.

3. Now use the optimal inner linesearch to beat the unaccelerated solver.

4. Choose sensible Krylov solvers and scale the code on ARCUS.


Solving PDEs on Supercomputers IV: a final challenge

Patrick Farrell

MMSC: Python in Scientific Computing

May 17, 2015


HPC 09 Challenge! (1/2)

Consider the Cahn–Hilliard equation

∂c/∂t − ∇ · (M∇(df/dc − λ∇²c)) = 0 in Ω,

M∇(df/dc − λ∇²c) · n = 0 on ∂Ω,

Mλ∇c · n = 0 on ∂Ω,

where c is the unknown field, f(c) = 100c²(c − 1)², n is the unit normal, and M is a scalar parameter.

To solve this with standard C0 elements, write it as two coupled second-order problems.
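A minimal sketch of that splitting, following the structure of the standard FEniCS Cahn–Hilliard demo and assuming a recent legacy DOLFIN (older versions build the mixed space with MixedFunctionSpace): introduce the chemical potential μ = df/dc − λ∇²c as a second unknown, so that both equations are second order:

from dolfin import *

mesh = UnitSquareMesh(96, 96)
P1 = FiniteElement("CG", mesh.ufl_cell(), 1)
ME = FunctionSpace(mesh, P1 * P1)            # unknowns (c, mu), mu = df/dc - lambda*laplace(c)

z = Function(ME)
z0 = Function(ME)                            # previous time step
c, mu = split(z)
c0, mu0 = split(z0)
q, v = TestFunctions(ME)

lmbda = Constant(1.0e-2)
Mob = Constant(1.0)                          # the mobility M
dt = Constant(5.0e-6)
theta = Constant(0.5)

c = variable(c)
f = 100*c**2*(c - 1)**2
dfdc = diff(f, c)
mu_mid = (1 - theta)*mu0 + theta*mu

F0 = (c - c0)*q*dx + dt*Mob*dot(grad(mu_mid), grad(q))*dx
F1 = mu*v*dx - dfdc*v*dx - lmbda*dot(grad(c), grad(v))*dx
F = F0 + F1                                  # solve F == 0 for z at each time step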


HPC 09 Challenge! (2/2)

Discretise and solve the equation on Ω = [0, 1]² for M = 1, λ = 10⁻², and initial condition

import random
from dolfin import *

class InitialConditions(Expression):
    def __init__(self):
        # seed differently on each MPI rank so the noise differs across processes
        random.seed(2 + MPI.rank(mpi_comm_world()))
    def eval(self, values, x):
        values[0] = 0.63 + 0.02*(0.5 - random.random())   # c: small random perturbation about 0.63
        values[1] = 0.0                                    # second component of the mixed unknown
    def value_shape(self):
        return (2,)

Make sure your scheme is at least second-order. Sensible values are ∆t = 5 × 10⁻⁶, θ = 0.5.

An excellent preconditioner is discussed in doi:10.1137/130921842.
