Frechet Derivatives of Matrix Functions and Applications
Samuel Relton, samuel.relton@maths.man.ac.uk, @sdrelton
samrelton.com, blog.samrelton.com
Joint work with Nicholas J. Higham, higham@maths.man.ac.uk, @nhigham
www.maths.man.ac.uk/~higham, nickhigham.wordpress.com
University of Manchester, UK
September 4, 2014
Outline
• Matrix Functions, their Derivatives, and the Condition Number
• Elementwise Sensitivity
• Physics: Nuclear Activation Sensitivity Problem
• Differential Equations: Predicting Algebraic Error in the FEM
Matrix Functions
We are interested in functions f : C^{n×n} → C^{n×n}, e.g.
Matrix Exponential: e^A = Σ_{k=0}^∞ A^k / k!
Matrix Cosine: cos(A) = Σ_{k=0}^∞ (−1)^k A^{2k} / (2k)!
• Define f(A) by its Taylor series when f is analytic
• If A = XDX^{−1} then f(A) = X f(D) X^{−1} (see the sketch below)
• Differential equations: du/dt = Au(t) has solution u(t) = e^{tA} u(0)
• Use cos(A) and sin(A) for second order ODEs
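A minimal Python sketch of the two characterisations above for the matrix exponential, assuming SciPy's expm (the matrix A below is an arbitrary example):

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))      # a random A is diagonalizable with probability 1

    # e^A via a library routine (scaling and squaring, not the raw Taylor series).
    F = expm(A)

    # e^A via the eigendecomposition A = X D X^{-1}, so f(A) = X f(D) X^{-1}.
    d, X = np.linalg.eig(A)
    F_eig = (X @ np.diag(np.exp(d)) @ np.linalg.inv(X)).real

    print(np.linalg.norm(F - F_eig) / np.linalg.norm(F))   # small unless X is ill conditioned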
Frechet Derivatives
Let f : C^{n×n} → C^{n×n} be a matrix function.
Definition (Frechet derivative)
The Frechet derivative of f at A is the unique linear function L_f(A, ·) : C^{n×n} → C^{n×n} such that for all E
f(A + E) − f(A) − L_f(A, E) = o(‖E‖).
• Applications include manifold optimization, Markov models, bladder cancer, image processing, and network analysis
• Higher order derivatives recently analyzed (Higham & R., 2014)
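A minimal Python sketch checking this definition numerically for f = exp, assuming SciPy's expm and expm_frechet (A and E below are arbitrary examples):

    import numpy as np
    from scipy.linalg import expm, expm_frechet

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    E = rng.standard_normal((5, 5))

    # Frechet derivative of the matrix exponential at A in direction E.
    expA, L = expm_frechet(A, E)

    # Defining property: f(A + tE) - f(A) - L_f(A, tE) = o(t) as t -> 0.
    for t in [1e-2, 1e-3, 1e-4]:
        resid = np.linalg.norm(expm(A + t * E) - expA - t * L)
        print(t, resid / t)   # this ratio should shrink roughly linearly with t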
Sensitivity of Matrix Functions
[Figure: f maps a small ball S_A of matrices around A to a small set f(S_A), but maps a similar ball S_X around X to a much larger set f(S_X).]
The function f is well conditioned at A and ill conditioned at X.
The Norm-wise Condition Number
The two condition numbers for a matrix function are:
cond_abs(f, A) = max_{‖E‖=1} ‖L_f(A, E)‖,
cond_rel(f, A) = max_{‖E‖=1} ‖L_f(A, E)‖ · ‖A‖ / ‖f(A)‖.
Elementwise Sensitivity
If we change just one element Aij , how is f (A) affected?
Let E_ij be the matrix with 1 in entry (i, j) and zeros elsewhere. Then the difference between f(A) and f(A + εE_ij) satisfies
‖f(A) − f(A + εE_ij)‖ ≈ ε‖L_f(A, E_ij)‖.
• ‖L_f(A, E_ij)‖ gives the sensitivity of f(A) to the (i, j) entry of A
• Sometimes we want the t most sensitive elements for, say, t = 5 to 20
A simple algorithm
To compute the most sensitive t entries of A:
for i = 1:n
  for j = 1:n
    if A_ij ≠ 0
      compute and store ‖L_f(A, E_ij)‖
    end if
  end for
end for
Take the largest t values of ‖L_f(A, E_ij)‖
Cost: up to O(n^5) flops, since computing L_f(A, E) costs O(n^3) flops.
• Trivially parallel but still very expensive when A is large
• Speed this up using block norm estimation (work in progress)
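A direct Python sketch of this algorithm for f = exp, assuming SciPy's expm_frechet (the choice of the 1-norm below is an assumption, not specified on the slide):

    import numpy as np
    from scipy.linalg import expm_frechet

    def most_sensitive_entries(A, t=10):
        # One Frechet derivative per nonzero entry: O(n^3) each, O(n^5) overall.
        n = A.shape[0]
        sens = {}
        for i in range(n):
            for j in range(n):
                if A[i, j] != 0:
                    E = np.zeros_like(A)
                    E[i, j] = 1.0
                    L = expm_frechet(A, E, compute_expm=False)
                    sens[(i, j)] = np.linalg.norm(L, 1)
        # The t largest sensitivities identify the most sensitive entries of A.
        return sorted(sens, key=sens.get, reverse=True)[:t]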
The Nuclear Activation Sensitivity Problem
• Chemical reactions: u′(t) = Au(t)
• u(t) = e^{At} u(0) tells us the concentration of each element at time t
• q^T u(t) is the dosage at time t
• A_ij represents the reaction between elements i and j (so ignore A_ij = 0)
• A_ij is subject to measurement error. What happens to q^T u(t) when it changes?
Implications for safety in radiation exposure models etc.
Nuclear Activation Solution - 1
If A_ij is perturbed, this introduces a relative error in q^T u(t) of
|q^T (e^{tA + εE_ij} − e^{tA}) u(0)| / |q^T e^{tA} u(0)| ≈ ε |q^T L_exp(tA, E_ij) u(0)| / |q^T e^{tA} u(0)|.
We note that:
• The denominator is the same for all perturbations
• This requires computing a derivative in all directions Aij 6= 0
• Can we improve upon this?
Nuclear Activation Solution - 2
Using vec(AXB) = (B^T ⊗ A) vec(X) we see the sensitivity in direction E_ij is
|q^T L_exp(tA, E_ij) u(0)| = |(u(0)^T ⊗ q^T) K_exp(tA) vec(E_ij)|.
Therefore the sensitivities in ALL n^2 directions are
|[(u(0)^T ⊗ q^T) K_exp(tA)]^T| = |vec(L_exp(tA, unvec(u(0) ⊗ q)^T)^T)|.
• Only 1 derivative needed for all sensitivities
• Found 2 bugs in existing commercial software!
• Extend for time dependent coefficients A = A(t)
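Reshaped, the matrix of all n^2 sensitivities is |L_exp(tA, u(0) q^T)^T|, up to the common denominator. A Python sketch of this one-derivative computation, assuming SciPy's expm_frechet (A, q and u(0) below are made-up data, not from the application):

    import numpy as np
    from scipy.linalg import expm, expm_frechet

    rng = np.random.default_rng(2)
    n = 6
    A = 0.1 * rng.standard_normal((n, n))   # made-up reaction matrix
    u0 = rng.random(n)                      # made-up initial concentrations
    q = rng.random(n)                       # made-up dosage weights
    t = 1.0

    # All n^2 sensitivities |q^T L_exp(tA, E_ij) u(0)| from ONE Frechet derivative.
    S = np.abs(expm_frechet(t * A, np.outer(u0, q), compute_expm=False).T)

    # Spot-check one entry against the one-derivative-per-entry formula.
    i, j = 2, 4
    E = np.zeros((n, n)); E[i, j] = 1.0
    direct = abs(q @ expm_frechet(t * A, E, compute_expm=False) @ u0)
    print(S[i, j], direct)                  # should agree to rounding error

    # Relative sensitivities share the common denominator |q^T e^{tA} u(0)|.
    S_rel = S / abs(q @ expm(t * A) @ u0)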
Predicting Algebraic Error in an ODE
Let’s solve the model ODE
−u′′ = f (x), x ∈ (0, 1), u(0) = u(1) = 0
with the finite element method using piecewise linear basis functions φi .
• Exact solution u(x) = e^{−5(x−0.5)^2} − e^{−5/4} determines f(x)
• Generate a grid of n = 19 equally spaced interior points x_i
• Generate the system Ax = b where A_ij = ∫_0^1 φ_i′ φ_j′ and b_i = f(x_i); A is tridiag(−1, 2, −1) in this case
• Solve with CG iteration (a sketch of this setup follows below)
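A rough Python sketch of this setup, assuming SciPy; the scalings of A and b below follow the standard piecewise linear FEM convention and may differ from the slide's by a factor of the mesh width h:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import cg

    n = 19                                   # interior grid points
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)

    # f = -u'' for the exact solution u(x) = exp(-5(x-0.5)^2) - exp(-5/4).
    f = lambda x: (10 - 100 * (x - 0.5) ** 2) * np.exp(-5 * (x - 0.5) ** 2)

    # Stiffness matrix for piecewise linear elements: (1/h) * tridiag(-1, 2, -1).
    A = (1.0 / h) * diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n))
    b = h * f(x)                             # simple (lumped) load vector

    u_k, info = cg(A, b, maxiter=8)          # algebraic solution after k = 8 CG iterations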
Algebraic and discretization errors
• Let Vh be our finite element space (dimension 19)
• Let uh ∈ Vh be the best solution possible from Vh
• Let u_est^k be our numerical solution after k iterations of CG
• The discretization error is u − u_h
• The algebraic error is u_h − u_est^k
• The total error is u − u_est^k = alg. err. + disc. err.
• Sometimes the algebraic error dominates the total error; how do we detect this?
Discretization error
[Figure: discretization error u − u_h on (0, 1); values range between about −1.5 × 10^{−3} and 3.5 × 10^{−3}.]
Algebraic Error - 8 CG iterations
[Figure: algebraic error and total error after k = 8 CG iterations; values up to about ±0.015; nodes 9–11 highlighted.]
Algebraic Error - 9 CG iterations
[Figure: algebraic error and total error after k = 9 CG iterations; values up to about ±5 × 10^{−3}; nodes 9–11 highlighted.]
Elementwise sensitivity analysis
• Taking f(A) = A^{−1} we can calculate the sensitivity of each element
• L_f(A, E) = −A^{−1} E A^{−1}, so it is easily computed (see the sketch below)
• Ignore A_ij = 0, since then the two basis functions do not overlap
• Results are plotted on the following heat map
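A minimal Python sketch of this elementwise sensitivity map for f(A) = A^{−1}, using the 1-norm (the norm used in the heat map below):

    import numpy as np

    def inverse_sensitivity_map(A):
        # S[i, j] = ||L_f(A, E_ij)||_1 for f(A) = A^{-1}, for every nonzero A_ij.
        n = A.shape[0]
        Ainv = np.linalg.inv(A)
        S = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if A[i, j] != 0:             # skip non-overlapping basis functions
                    # L_f(A, E_ij) = -A^{-1} E_ij A^{-1} = -outer(Ainv[:, i], Ainv[j, :])
                    S[i, j] = np.linalg.norm(np.outer(Ainv[:, i], Ainv[j, :]), 1)
        return S

Plotting S for the tridiagonal A above (e.g. with matplotlib's imshow) gives a heat map like the one described next.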
Elementwise sensitivity analysis
[Figure: heat map of the most sensitive elements of A when computing A^{−1} in the 1-norm; sensitivities range from 0 to about 0.6 and are concentrated on rows/columns 9–11, in the middle of the matrix.]
2D Peak Problem
[Figure: surface plot of the 2D peak problem solution on the unit square; values range up to about 0.03.]
Algebraic Error Estimation
[Figure: two surface plots over the unit square. Left: true algebraic error using 7 CG iterations (order 10^{−4}). Right: error in the estimated algebraic error using the 1st Frechet derivative (order 10^{−7}).]
Higher Order Derivatives to Estimate Alg. Err.
[Figure: componentwise error using kth order derivatives, k = 1, 3, 5, plotted on a log scale against component index; the errors range from about 10^{−6} down to 10^{−16}.]
Possible extensions
• Can this be used to modify the discretization mesh to obtain better accuracy? (See Papez, Liesen, and Strakos, 2014)
• Currently too expensive: can we estimate the sensitivities?
• Can this be extended to f (A) = eA (exponential integrators)?
Conclusions
• Explained elementwise sensitivity of matrix functions
• New applications in nuclear physics and FEM analysis
• The former is essentially solved; the latter needs to be made cheaper
Future work:
• Estimate sensitivities more efficiently (block norm estimation)
• Further comparison of the nuclear physics solution to the commercial alternative
• Further analysis of ODE problem
Higher Order Frechet Derivatives
Higher order derivatives can be defined recursively:
L^(k)_f(A + E_{k+1}, E_1, ..., E_k) − L^(k)_f(A, E_1, ..., E_k) = L^(k+1)_f(A, E_1, ..., E_k, E_{k+1}) + o(‖E_{k+1}‖)
Also have a simple method to compute them. For example:
f( [ A  E_1  E_2  0
     0  A    0    E_2
     0  0    A    E_1
     0  0    0    A ] )
  =
  [ f(A)  L_f(A, E_1)  L_f(A, E_2)  L^(2)_f(A, E_1, E_2)
    0     f(A)         0            L_f(A, E_2)
    0     0            f(A)         L_f(A, E_1)
    0     0            0            f(A) ]
More info in Higham & Relton, SIMAX 35(4), 2014.