Frechet Derivatives of Matrix Functions and Applications
DESCRIPTION
I discuss some recent ideas using the Frechet derivative of matrix functions to analyze the mixed condition number, solve the nuclear activation sensitivity problem, and analyze the distribution of the algebraic error in the finite element method. Originally presented at the 4th IMA Conference on Numerical Linear Algebra and Optimization, Birmingham, UK, 4th September 2014. Joint work with Nicholas J. Higham, Wayne Arter, Zdenek Strakos, and Jan Papez.

TRANSCRIPT
Frechet Derivatives of Matrix Functions and
Applications
Samuel Relton ([email protected], @sdrelton)
samrelton.com, blog.samrelton.com
Joint work with Nicholas J. Higham ([email protected], @nhigham)
www.maths.man.ac.uk/~higham, nickhigham.wordpress.com
University of Manchester, UK
September 4, 2014
Sam Relton (UoM) Derivatives of matrix functions September 4, 2014 1 / 23
Outline
• Matrix Functions, their Derivatives, and the Condition Number
• Elementwise Sensitivity
• Physics: Nuclear Activation Sensitivity Problem
• Differential Equations: Predicting Algebraic Error in the FEM
Matrix Functions
We are interested in functions f : C^{n×n} → C^{n×n}, e.g.

Matrix Exponential: e^A = ∑_{k=0}^∞ A^k / k!

Matrix Cosine: cos(A) = ∑_{k=0}^∞ (−1)^k A^{2k} / (2k)!

• Define f(A) by its Taylor series when f is analytic
• If A = XDX^{−1} then f(A) = X f(D) X^{−1}
• Differential equations: du/dt = Au(t), u(t) = e^{tA} u(0)
• Use cos(A) and sin(A) for second order ODEs
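These series definitions can be checked numerically; a minimal sketch using SciPy (the matrix and the 50-term truncation are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

# Truncated Taylor series e^A = sum_k A^k / k!
S = np.eye(2)
term = np.eye(2)
for k in range(1, 50):
    term = term @ A / k
    S = S + term

print(np.allclose(S, expm(A)))  # True: the series matches scipy.linalg.expm
```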
Frechet Derivatives
Let f : C^{n×n} → C^{n×n} be a matrix function.

Definition (Frechet derivative)
The Frechet derivative of f at A is the unique linear function L_f(A, ·) : C^{n×n} → C^{n×n} such that, for all E,

f(A + E) − f(A) − L_f(A, E) = o(‖E‖).

• Applications include manifold optimization, Markov models, bladder cancer, image processing, and network analysis
• Higher order derivatives recently analyzed (Higham & R., 2014)
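For the exponential, this derivative is available directly in SciPy; a quick check of the defining expansion (random 4×4 matrices and the step sizes are arbitrary choices):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
E = rng.standard_normal((4, 4))

# fA = e^A and L = L_exp(A, E) in one call
fA, L = expm_frechet(A, E)

# Defining property: ||f(A + tE) - f(A) - t L(A, E)|| = o(t)
ratios = [np.linalg.norm(expm(A + t * E) - fA - t * L) / t
          for t in (1e-3, 1e-6)]
print(ratios)  # the ratio shrinks with t, as o(t)/t -> 0
```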
Sensitivity of Matrix Functions
[Diagram: f maps a small ball of perturbations S_A around A to a small set f(S_A), but maps a similar ball S_X around X to a large, distorted set f(S_X).]

The function f is well conditioned at A and ill conditioned at X.
The Norm-wise Condition Number
The two condition numbers for a matrix function are:
cond_abs(f, A) = max_{‖E‖=1} ‖L_f(A, E)‖,

cond_rel(f, A) = max_{‖E‖=1} ‖L_f(A, E)‖ ‖A‖ / ‖f(A)‖.
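For f = exp specifically, SciPy computes this relative condition number directly (in the Frobenius norm); a sketch comparing an illustrative normal and non-normal pair:

```python
import numpy as np
from scipy.linalg import expm_cond

# cond_rel(exp, A) in the Frobenius norm
A = np.diag([1.0, 2.0])                  # normal: well conditioned
B = np.array([[1.0, 1e4],
              [0.0, 2.0]])               # highly non-normal: far more sensitive
print(expm_cond(A), expm_cond(B))
```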
Elementwise Sensitivity
If we change just one element A_ij, how is f(A) affected?

Let E_ij = e_i e_j^T (a single 1 in position (i, j)); then the difference between f(A) and f(A + εE_ij) is

‖f(A) − f(A + εE_ij)‖ ≈ ε‖L_f(A, E_ij)‖.

• ‖L_f(A, E_ij)‖ gives the sensitivity in the (i, j) component
• Sometimes we want the t most sensitive elements, for t = 5:20 say
A simple algorithm
To compute the most sensitive t entries of A:
for i = 1:n
    for j = 1:n
        if A_ij ≠ 0
            compute and store ‖L_f(A, E_ij)‖
        end if
    end for
end for
Take the largest t values of ‖L_f(A, E_ij)‖

Cost: up to O(n^5) flops, since computing L_f(A, E) costs O(n^3) flops
• Trivially parallel but still very expensive when A is large
• Speed this up using block norm estimation (work in progress)
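A direct sketch of this loop for f = exp using SciPy's Fréchet derivative (the matrix A and t = 3 are illustrative placeholders):

```python
import numpy as np
from scipy.linalg import expm_frechet

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
n = A.shape[0]

# Sensitivity ||L_exp(A, E_ij)|| for every nonzero A_ij
sens = {}
for i in range(n):
    for j in range(n):
        if A[i, j] != 0:
            E = np.zeros((n, n))
            E[i, j] = 1.0
            _, L = expm_frechet(A, E)
            sens[(i, j)] = np.linalg.norm(L)

t = 3
top = sorted(sens, key=sens.get, reverse=True)[:t]
print(top)  # the t most sensitive (i, j) entries
```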
The Nuclear Activation Sensitivity Problem
• Chemical reactions: u′(t) = Au(t)
• u(t) = e^{tA} u(0) tells us the concentration of each element at time t
• q^T u(t) is the dosage at time t
• A_ij represents the reaction between elements i and j (so ignore A_ij = 0)
• A_ij is subject to measurement error: what happens to q^T u(t) when it changes?
Implications for safety in radiation exposure models etc.
Nuclear Activation Solution - 1
If Aij is perturbed, this introduces a relative error in qTu(t) of
|q^T (e^{tA + εE_ij} − e^{tA}) u(0)| / |q^T e^{tA} u(0)| ≈ ε |q^T L_exp(tA, E_ij) u(0)| / |q^T e^{tA} u(0)|
We note that:
• The denominator is the same for all perturbations
• This requires computing a derivative in all directions Aij 6= 0
• Can we improve upon this?
Nuclear activation solution - 2
Using vec(AXB) = (B^T ⊗ A) vec(X), we see the sensitivity in direction E_ij is

|q^T L_exp(tA, E_ij) u(0)| = |(u(0)^T ⊗ q^T) K_exp(tA) vec(E_ij)|.

Therefore the sensitivities in ALL n^2 directions at once are

|[(u(0)^T ⊗ q^T) K_exp(tA)]^T| = |vec(L_exp(tA^T, q u(0)^T))|,

since u(0) ⊗ q = vec(q u(0)^T) and K_exp(tA)^T = K_exp(tA^T).
• Only 1 derivative needed for all sensitivities
• Found 2 bugs in existing commercial software!
• Extend for time dependent coefficients A = A(t)
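The identity behind this single-derivative trick, q^T L_exp(tA, E_ij) u(0) = [L_exp(tA^T, q u(0)^T)]_ij, is easy to check numerically (random illustrative data, not the production activation code):

```python
import numpy as np
from scipy.linalg import expm_frechet

rng = np.random.default_rng(1)
n = 5
tA = 0.1 * rng.standard_normal((n, n))  # plays the role of tA
q = rng.standard_normal(n)
u0 = rng.standard_normal(n)

# One derivative in the "adjoint" direction gives all n^2 sensitivities at once
_, S = expm_frechet(tA.T, np.outer(q, u0))

# Compare entrywise against the one-direction-at-a-time formula
for i, j in [(0, 0), (2, 3), (4, 1)]:
    E = np.zeros((n, n))
    E[i, j] = 1.0
    _, L = expm_frechet(tA, E)
    assert np.isclose(q @ L @ u0, S[i, j])
print("all sampled sensitivities match")
```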
Predicting Algebraic Error in an ODE
Let’s solve the model ODE
−u″ = f(x), x ∈ (0, 1), u(0) = u(1) = 0

with the finite element method using piecewise linear basis functions φ_i.

• Exact solution u(x) = e^{−5(x−0.5)^2} − e^{−5/4} determines f(x)
• Generate a grid of n = 19 equally spaced points x_i
• Generate the system Ax = b, where A_ij = ∫_0^1 φ_i′ φ_j′ dx and b_i = f(x_i); A = tridiag(−1, 2, −1) in this case
• Solve with the CG iteration
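A minimal reconstruction of this setup (a sketch: the mesh-size scaling is made explicit here, whereas the slide absorbs it into A and b, and the load vector is lumped as b_i ≈ h f(x_i)):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 19
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)          # interior nodes

# f determined by the exact solution u(x) = exp(-5(x-0.5)^2) - exp(-5/4):
# u'' = (100(x-0.5)^2 - 10) exp(-5(x-0.5)^2), so f = -u''
f = (10 - 100 * (x - 0.5) ** 2) * np.exp(-5 * (x - 0.5) ** 2)

A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h  # stiffness matrix
b = h * f                                                    # lumped load

u_est, info = cg(A, b)
u_exact = np.exp(-5 * (x - 0.5) ** 2) - np.exp(-5 / 4)
print(info, np.max(np.abs(u_est - u_exact)))  # info 0; error at discretization level
```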
Algebraic and discretization errors
• Let V_h be our finite element space (dimension 19)
• Let u_h ∈ V_h be the best solution possible from V_h
• Let u_est^k be our numerical solution after k iterations of CG
• The discretization error is u − u_h
• The algebraic error is u_h − u_est^k
• The total error is u − u_est^k = alg. err. + disc. err.
• Sometimes the algebraic error dominates the total error; how do we detect this?
Discretization error
[Figure: discretization error u − u_h over (0, 1); values on the order of 10^−3.]
Algebraic Error - 8 CG iterations
[Figure: algebraic error and total error after k = 8 CG iterations; both on the order of 10^−2, with nodes 9–11 highlighted.]
Algebraic Error - 9 CG iterations
[Figure: algebraic error and total error after k = 9 CG iterations; now on the order of 10^−3, with nodes 9–11 highlighted.]
Elementwise sensitivity analysis
• Taking f(A) = A^{−1} we can calculate the sensitivity of each element
• L_f(A, E) = −A^{−1} E A^{−1}, so it is easily computed
• Ignore A_ij = 0, since the two basis elements don't overlap
• Results are plotted on the following heat map
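The heat map values are cheap to reproduce: L_f(A, E_ij) = −(A^{−1}e_i)(e_j^T A^{−1}) has rank 1, so its 1-norm factorizes as ‖A^{−1}e_i‖_1 · max_k |(A^{−1})_{jk}|. A sketch for the tridiagonal A above:

```python
import numpy as np

n = 19
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
Ainv = np.linalg.inv(A)

# ||L_f(A, E_ij)||_1 = ||A^{-1} e_i||_1 * max_k |(A^{-1})_{jk}| (rank-1 factorization)
col1 = np.abs(Ainv).sum(axis=0)    # ||A^{-1} e_i||_1 for each i
rowinf = np.abs(Ainv).max(axis=1)  # max_k |(A^{-1})_{jk}| for each j
S = np.outer(col1, rowinf)         # S[i, j] = ||L_f(A, E_ij)||_1
S[A == 0] = 0                      # ignore entries with no basis overlap

i, j = np.unravel_index(S.argmax(), S.shape)
print(i, j)  # the most sensitive entry sits in the middle rows/cols
```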
Elementwise sensitivity analysis
[Heat map: most sensitive elements of A when computing A^{−1} in the 1-norm; sensitivities range from 0 to about 0.6, concentrated around rows/columns 9–11 in the middle.]
2D Peak Problem
[Figure: the 2D peak problem on the unit square; solution values up to about 0.03.]
Algebraic Error Estimation
[Figure. Left: true algebraic error using 7 CG iterations (order 10^−4). Right: error in the estimated algebraic error using the 1st Frechet derivative (order 10^−7).]
Higher Order Derivatives to Estimate Alg. Err.
[Figure: componentwise error of the algebraic-error estimate using kth order derivatives, k = 1, 3, 5; errors range from about 10^−6 down to 10^−16.]
Possible extensions
• Can this be used to modify the discretization mesh to obtain better accuracy? (See Papez, Liesen, and Strakos, 2014)
• Currently too expensive: can we estimate the sensitivities?
• Can this be extended to f (A) = eA (exponential integrators)?
Conclusions
• Explained elementwise sensitivity of matrix functions
• New applications in nuclear physics and FEM analysis
• The former is essentially solved; the latter needs to be cheaper
Future work:
• Estimate sensitivities more efficiently (block norm estimation)
• Further comparison of the nuclear physics solution with the commercial alternative
• Further analysis of ODE problem
Higher Order Frechet Derivatives
Higher order derivatives can be defined recursively:
L_f^(k)(A + E_{k+1}, E_1, …, E_k) − L_f^(k)(A, E_1, …, E_k) = L_f^(k+1)(A, E_1, …, E_k, E_{k+1}) + o(‖E_{k+1}‖)
Also have a simple method to compute them. For example:
f( [ A   E_1  E_2  0
     0   A    0    E_2
     0   0    A    E_1
     0   0    0    A   ] )
  =
   [ f(A)  L_f(A, E_1)  L_f(A, E_2)  L_f^(2)(A, E_1, E_2)
     0     f(A)         0            L_f(A, E_2)
     0     0            f(A)         L_f(A, E_1)
     0     0            0            f(A)                ]
More info in Higham & Relton, SIMAX 35(4), 2014.
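For f = exp this block formula can be checked directly against SciPy's Fréchet derivative (a sketch with random illustrative 3×3 blocks):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
E1 = rng.standard_normal((n, n))
E2 = rng.standard_normal((n, n))
Z = np.zeros((n, n))

# Block argument from the slide
M = np.block([[A, E1, E2, Z],
              [Z, A,  Z,  E2],
              [Z, Z,  A,  E1],
              [Z, Z,  Z,  A]])
F = expm(M)

# Diagonal and first-derivative blocks match expm / expm_frechet
_, L1 = expm_frechet(A, E1)
_, L2 = expm_frechet(A, E2)
assert np.allclose(F[:n, :n], expm(A))
assert np.allclose(F[:n, n:2*n], L1)
assert np.allclose(F[:n, 2*n:3*n], L2)
print("(1,4) block = mixed 2nd derivative, norm:", np.linalg.norm(F[:n, 3*n:]))
```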