Frechet Derivatives of Matrix Functions and Applications
DESCRIPTION
I discuss some recent ideas using the Frechet derivative of matrix functions to analyze the mixed condition number, solve the nuclear activation sensitivity problem, and analyze the distribution of the algebraic error in the finite element method. Originally presented at the 4th IMA Conference on Numerical Linear Algebra and Optimization, Birmingham, UK, 4th September 2014. Joint work with Nicholas J. Higham, Wayne Arter, Zdenek Strakos, and Jan Papez.

TRANSCRIPT
Frechet Derivatives of Matrix Functions and
Applications
Samuel Relton ([email protected], @sdrelton)
samrelton.com, blog.samrelton.com
Joint work with Nicholas J. Higham ([email protected], @nhigham)
www.maths.man.ac.uk/~higham, nickhigham.wordpress.com
University of Manchester, UK
September 4, 2014
Sam Relton (UoM) Derivatives of matrix functions September 4, 2014 1 / 23
Outline
• Matrix Functions, their Derivatives, and the Condition Number
• Elementwise Sensitivity
• Physics: Nuclear Activation Sensitivity Problem
• Differential Equations: Predicting Algebraic Error in the FEM
Matrix Functions
We are interested in functions f : C^{n×n} → C^{n×n}, e.g.

Matrix Exponential: e^A = ∑_{k=0}^∞ A^k / k!

Matrix Cosine: cos(A) = ∑_{k=0}^∞ (−1)^k A^{2k} / (2k)!

• Define f(A) by its Taylor series when f is analytic
• If A = XDX^{−1} then f(A) = X f(D) X^{−1}
• Differential equations: du/dt = Au(t), u(t) = e^{tA} u(0)
• Use cos(A) and sin(A) for second order ODEs
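These series definitions can be checked numerically; a minimal sketch using SciPy (the matrix and the 50-term truncation are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])

# Truncated Taylor series e^A = sum_k A^k / k!
S = np.eye(2)
term = np.eye(2)
for k in range(1, 50):
    term = term @ A / k
    S = S + term

print(np.allclose(S, expm(A)))  # True: the series matches scipy.linalg.expm
```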
Frechet Derivatives
Let f : C^{n×n} → C^{n×n} be a matrix function.

Definition (Frechet derivative)
The Frechet derivative of f at A is the unique linear function L_f(A, ·) : C^{n×n} → C^{n×n} such that, for all E,

f(A + E) − f(A) − L_f(A, E) = o(‖E‖).

• Applications include manifold optimization, Markov models, bladder cancer, image processing, and network analysis
• Higher order derivatives recently analyzed (Higham & R., 2014)
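For the exponential, this derivative is available directly in SciPy; a quick check of the defining expansion (random 4×4 matrices and the step sizes are arbitrary choices):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
E = rng.standard_normal((4, 4))

# fA = e^A and L = L_exp(A, E) in one call
fA, L = expm_frechet(A, E)

# Defining property: ||f(A + tE) - f(A) - t L(A, E)|| = o(t)
ratios = [np.linalg.norm(expm(A + t * E) - fA - t * L) / t
          for t in (1e-3, 1e-6)]
print(ratios)  # the ratio shrinks with t, as o(t)/t -> 0
```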
Sensitivity of Matrix Functions
[Diagram: f maps a small ball of perturbations S_A around A to a small set f(S_A), but maps a similar ball S_X around X to a large, distorted set f(S_X).]

The function f is well conditioned at A and ill conditioned at X.
The Norm-wise Condition Number
The two condition numbers for a matrix function are:
cond_abs(f, A) = max_{‖E‖=1} ‖L_f(A, E)‖,

cond_rel(f, A) = max_{‖E‖=1} ‖L_f(A, E)‖ ‖A‖ / ‖f(A)‖.
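For f = exp specifically, SciPy computes this relative condition number directly (in the Frobenius norm); a sketch comparing an illustrative normal and non-normal pair:

```python
import numpy as np
from scipy.linalg import expm_cond

# cond_rel(exp, A) in the Frobenius norm
A = np.diag([1.0, 2.0])                  # normal: well conditioned
B = np.array([[1.0, 1e4],
              [0.0, 2.0]])               # highly non-normal: far more sensitive
print(expm_cond(A), expm_cond(B))
```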
Elementwise Sensitivity
If we change just one element A_ij, how is f(A) affected?

Let E_ij = e_i e_j^T (a single 1 in position (i, j)); then the difference between f(A) and f(A + εE_ij) is

‖f(A) − f(A + εE_ij)‖ ≈ ε‖L_f(A, E_ij)‖.

• ‖L_f(A, E_ij)‖ gives the sensitivity in the (i, j) component
• Sometimes we want the t most sensitive elements, for t = 5:20 say
A simple algorithm
To compute the most sensitive t entries of A:
for i = 1:n
    for j = 1:n
        if A_ij ≠ 0
            compute and store ‖L_f(A, E_ij)‖
        end if
    end for
end for
Take the largest t values of ‖L_f(A, E_ij)‖

Cost: up to O(n^5) flops, since computing L_f(A, E) costs O(n^3) flops
• Trivially parallel but still very expensive when A is large
• Speed this up using block norm estimation (work in progress)
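A direct sketch of this loop for f = exp using SciPy's Fréchet derivative (the matrix A and t = 3 are illustrative placeholders):

```python
import numpy as np
from scipy.linalg import expm_frechet

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
n = A.shape[0]

# Sensitivity ||L_exp(A, E_ij)|| for every nonzero A_ij
sens = {}
for i in range(n):
    for j in range(n):
        if A[i, j] != 0:
            E = np.zeros((n, n))
            E[i, j] = 1.0
            _, L = expm_frechet(A, E)
            sens[(i, j)] = np.linalg.norm(L)

t = 3
top = sorted(sens, key=sens.get, reverse=True)[:t]
print(top)  # the t most sensitive (i, j) entries
```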
The Nuclear Activation Sensitivity Problem
• Chemical reactions: u′(t) = Au(t)
• u(t) = e^{tA} u(0) tells us the concentration of each element at time t
• q^T u(t) is the dosage at time t
• A_ij represents the reaction between elements i and j (so ignore A_ij = 0)
• A_ij is subject to measurement error: what happens to q^T u(t) when it changes?
Implications for safety in radiation exposure models etc.
Nuclear Activation Solution - 1
If Aij is perturbed, this introduces a relative error in qTu(t) of
|q^T (e^{tA + εE_ij} − e^{tA}) u(0)| / |q^T e^{tA} u(0)| ≈ ε |q^T L_exp(tA, E_ij) u(0)| / |q^T e^{tA} u(0)|
We note that:
• The denominator is the same for all perturbations
• This requires computing a derivative in all directions Aij 6= 0
• Can we improve upon this?
Nuclear activation solution - 2
Using vec(AXB) = (B^T ⊗ A) vec(X), we see the sensitivity in direction E_ij is

|q^T L_exp(tA, E_ij) u(0)| = |(u(0)^T ⊗ q^T) K_exp(tA) vec(E_ij)|.

Therefore the sensitivities in ALL n^2 directions at once are

|[(u(0)^T ⊗ q^T) K_exp(tA)]^T| = |vec(L_exp(tA^T, q u(0)^T))|,

since u(0) ⊗ q = vec(q u(0)^T) and K_exp(tA)^T = K_exp(tA^T).
• Only 1 derivative needed for all sensitivities
• Found 2 bugs in existing commercial software!
• Extend for time dependent coefficients A = A(t)
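The identity behind this single-derivative trick, q^T L_exp(tA, E_ij) u(0) = [L_exp(tA^T, q u(0)^T)]_ij, is easy to check numerically (random illustrative data, not the production activation code):

```python
import numpy as np
from scipy.linalg import expm_frechet

rng = np.random.default_rng(1)
n = 5
tA = 0.1 * rng.standard_normal((n, n))  # plays the role of tA
q = rng.standard_normal(n)
u0 = rng.standard_normal(n)

# One derivative in the "adjoint" direction gives all n^2 sensitivities at once
_, S = expm_frechet(tA.T, np.outer(q, u0))

# Compare entrywise against the one-direction-at-a-time formula
for i, j in [(0, 0), (2, 3), (4, 1)]:
    E = np.zeros((n, n))
    E[i, j] = 1.0
    _, L = expm_frechet(tA, E)
    assert np.isclose(q @ L @ u0, S[i, j])
print("all sampled sensitivities match")
```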
Predicting Algebraic Error in an ODE
Let’s solve the model ODE
−u″ = f(x), x ∈ (0, 1), u(0) = u(1) = 0

with the finite element method using piecewise linear basis functions φ_i.

• Exact solution u(x) = e^{−5(x−0.5)^2} − e^{−5/4} determines f(x)
• Generate a grid of n = 19 equally spaced points x_i
• Generate the system Ax = b, where A_ij = ∫_0^1 φ_i′ φ_j′ dx and b_i = f(x_i); A = tridiag(−1, 2, −1) in this case
• Solve with the CG iteration
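A minimal reconstruction of this setup (a sketch: the mesh-size scaling is made explicit here, whereas the slide absorbs it into A and b, and the load vector is lumped as b_i ≈ h f(x_i)):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 19
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)          # interior nodes

# f determined by the exact solution u(x) = exp(-5(x-0.5)^2) - exp(-5/4):
# u'' = (100(x-0.5)^2 - 10) exp(-5(x-0.5)^2), so f = -u''
f = (10 - 100 * (x - 0.5) ** 2) * np.exp(-5 * (x - 0.5) ** 2)

A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h  # stiffness matrix
b = h * f                                                    # lumped load

u_est, info = cg(A, b)
u_exact = np.exp(-5 * (x - 0.5) ** 2) - np.exp(-5 / 4)
print(info, np.max(np.abs(u_est - u_exact)))  # info 0; error at discretization level
```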
Algebraic and discretization errors
• Let V_h be our finite element space (dimension 19)
• Let u_h ∈ V_h be the best solution possible from V_h
• Let u_est^k be our numerical solution after k iterations of CG
• The discretization error is u − u_h
• The algebraic error is u_h − u_est^k
• The total error is u − u_est^k = alg. err. + disc. err.
• Sometimes the algebraic error dominates the total error; how do we detect this?
Discretization error
[Figure: discretization error u − u_h over (0, 1); values on the order of 10^−3.]
Algebraic Error - 8 CG iterations
[Figure: algebraic error and total error after k = 8 CG iterations; both on the order of 10^−2, with nodes 9–11 highlighted.]
Algebraic Error - 9 CG iterations
[Figure: algebraic error and total error after k = 9 CG iterations; now on the order of 10^−3, with nodes 9–11 highlighted.]
Elementwise sensitivity analysis
• Taking f(A) = A^{−1} we can calculate the sensitivity of each element
• L_f(A, E) = −A^{−1} E A^{−1}, so it is easily computed
• Ignore A_ij = 0, since the two basis elements don't overlap
• Results are plotted on the following heat map
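The heat map values are cheap to reproduce: L_f(A, E_ij) = −(A^{−1}e_i)(e_j^T A^{−1}) has rank 1, so its 1-norm factorizes as ‖A^{−1}e_i‖_1 · max_k |(A^{−1})_{jk}|. A sketch for the tridiagonal A above:

```python
import numpy as np

n = 19
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))
Ainv = np.linalg.inv(A)

# ||L_f(A, E_ij)||_1 = ||A^{-1} e_i||_1 * max_k |(A^{-1})_{jk}| (rank-1 factorization)
col1 = np.abs(Ainv).sum(axis=0)    # ||A^{-1} e_i||_1 for each i
rowinf = np.abs(Ainv).max(axis=1)  # max_k |(A^{-1})_{jk}| for each j
S = np.outer(col1, rowinf)         # S[i, j] = ||L_f(A, E_ij)||_1
S[A == 0] = 0                      # ignore entries with no basis overlap

i, j = np.unravel_index(S.argmax(), S.shape)
print(i, j)  # the most sensitive entry sits in the middle rows/cols
```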
Elementwise sensitivity analysis
[Heat map: most sensitive elements of A when computing A^{−1} in the 1-norm; sensitivities range from 0 to about 0.6, concentrated around rows/columns 9–11 in the middle.]
2D Peak Problem
[Figure: the 2D peak problem on the unit square; solution values up to about 0.03.]
Algebraic Error Estimation
[Figure. Left: true algebraic error using 7 CG iterations (order 10^−4). Right: error in the estimated algebraic error using the 1st Frechet derivative (order 10^−7).]
Higher Order Derivatives to Estimate Alg. Err.
[Figure: componentwise error of the algebraic-error estimate using kth order derivatives, k = 1, 3, 5; errors range from about 10^−6 down to 10^−16.]
Possible extensions
• Can this be used to modify the discretization mesh to obtain better accuracy? (See Papez, Liesen, and Strakos, 2014)
• Currently too expensive: can we estimate the sensitivities?
• Can this be extended to f (A) = eA (exponential integrators)?
Conclusions
• Explained elementwise sensitivity of matrix functions
• New applications in nuclear physics and FEM analysis
• The former is essentially solved; the latter needs to be cheaper
Future work:
• Estimate sensitivities more efficiently (block norm estimation)
• Further comparison of the nuclear physics solution with the commercial alternative
• Further analysis of ODE problem
Higher Order Frechet Derivatives
Higher order derivatives can be defined recursively:
L_f^(k)(A + E_{k+1}, E_1, …, E_k) − L_f^(k)(A, E_1, …, E_k) = L_f^(k+1)(A, E_1, …, E_k, E_{k+1}) + o(‖E_{k+1}‖)
Also have a simple method to compute them. For example:
f( [ A   E_1  E_2  0
     0   A    0    E_2
     0   0    A    E_1
     0   0    0    A   ] )
  =
   [ f(A)  L_f(A, E_1)  L_f(A, E_2)  L_f^(2)(A, E_1, E_2)
     0     f(A)         0            L_f(A, E_2)
     0     0            f(A)         L_f(A, E_1)
     0     0            0            f(A)                ]
More info in Higham & Relton, SIMAX 35(4), 2014.
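For f = exp this block formula can be checked directly against SciPy's Fréchet derivative (a sketch with random illustrative 3×3 blocks):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
E1 = rng.standard_normal((n, n))
E2 = rng.standard_normal((n, n))
Z = np.zeros((n, n))

# Block argument from the slide
M = np.block([[A, E1, E2, Z],
              [Z, A,  Z,  E2],
              [Z, Z,  A,  E1],
              [Z, Z,  Z,  A]])
F = expm(M)

# Diagonal and first-derivative blocks match expm / expm_frechet
_, L1 = expm_frechet(A, E1)
_, L2 = expm_frechet(A, E2)
assert np.allclose(F[:n, :n], expm(A))
assert np.allclose(F[:n, n:2*n], L1)
assert np.allclose(F[:n, 2*n:3*n], L2)
print("(1,4) block = mixed 2nd derivative, norm:", np.linalg.norm(F[:n, 3*n:]))
```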