
Bayesian Analysis of the Effects of Lower Precision Arithmetic in Inverse Problems

Daniela Calvetti

based on work with D. Devathi and E. Somersalo

Case Western Reserve University, Department of Mathematics, Applied Mathematics and Statistics

SIAM CSE 2019

Spokane, February 27, 2019


Introduction

Inverse Problems: Estimate a variable x ∈ R^n from noisy indirect observations,

b = f(x) + ε, where f : R^n → R^m is the forward map.

Bayesian Formulation: x, b and ε are realizations of random variables,

X ∼ π_x, E ∼ π_ε, B ∼ π_{b|x}(· | x).

Posterior distribution = solution of the inverse problem. Bayes’ formula:

π_{x|b}(x | b) ∝ π_x(x) π_{b|x}(b | x) = π_x(x) π_ε(b − f(x)).
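To make Bayes' formula concrete, here is a minimal sketch (my addition, not from the slides) of evaluating the unnormalized log-posterior; the Gaussian prior and noise models and the linear forward map below are illustrative assumptions.

import numpy as np

def log_posterior(x, b, f, log_prior, log_noise):
    # Unnormalized log-posterior: log pi_x(x) + log pi_eps(b - f(x))
    return log_prior(x) + log_noise(b - f(x))

# Illustrative choices: Gaussian prior/noise and a linear forward map
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((m, n))
f = lambda x: A @ x
log_prior = lambda x: -0.5 * x @ x              # N(0, I_n), up to an additive constant
log_noise = lambda e: -0.5 * e @ e / 0.01       # N(0, 0.1^2 I_m), up to an additive constant
b = f(rng.standard_normal(n)) + 0.1 * rng.standard_normal(m)
print(log_posterior(np.zeros(n), b, f, log_prior, log_noise))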


Introduction

Exploration of the posterior by sampling: Generate a representative sample

S = {x^(1), x^(2), . . . , x^(N)}, x^(j) ∼ π_{x|b}(· | b),

using, e.g., Markov Chain Monte Carlo (MCMC). A sample provides a means to estimate the posterior mean and covariance (UQ):

x̄ ≈ (1/N) Σ_{j=1}^{N} x^(j),    C ≈ (1/N) Σ_{j=1}^{N} (x^(j) − x̄)(x^(j) − x̄)^T.
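A minimal sketch (my addition) of these two estimators applied to a sample stored as the rows of an array X of shape (N, n); the synthetic Gaussian sample is only for illustration.

import numpy as np

def sample_mean_cov(X):
    # Rows of X are the posterior samples x^(1), ..., x^(N)
    xbar = X.mean(axis=0)
    D = X - xbar
    C = (D.T @ D) / X.shape[0]      # 1/N normalization, as on the slide
    return xbar, C

rng = np.random.default_rng(1)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.3], [0.3, 0.5]], size=5000)
xbar, C = sample_mean_cov(X)
print(xbar)
print(C)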

Observations:

Sample generation can be time consuming for a costly forward model.

Lower precision arithmetic (LPA) can reduce the time for sample generation.

It is not immediately clear how LPA affects the UQ estimates.

We could use UQ tools to explore the effect of LPA.


Sampling from the posterior

Model problem: Consider an inverse problem with a linear forward map,

b = Ax + ε, A ∈ R^{m×n},

Gaussian prior and Gaussian likelihood,

π_x ∼ N(0, Γ), π_ε ∼ N(0, Σ).

Then the posterior is Gaussian,

π_{x|b} ∼ N(x̄, C),

where

C = (A^T Σ^{-1} A + Γ^{-1})^{-1} = P^{-1}, x̄ = C A^T Σ^{-1} b,

with P = A^T Σ^{-1} A + Γ^{-1} the posterior precision matrix.
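A short sketch (my addition) of these closed-form expressions; the sizes and the prior and noise covariances are arbitrary illustrative choices.

import numpy as np

def gaussian_posterior(A, b, Gamma, Sigma):
    # P = A^T Sigma^{-1} A + Gamma^{-1},  C = P^{-1},  xbar = C A^T Sigma^{-1} b
    Sigma_inv = np.linalg.inv(Sigma)
    P = A.T @ Sigma_inv @ A + np.linalg.inv(Gamma)
    C = np.linalg.inv(P)
    xbar = C @ A.T @ Sigma_inv @ b
    return xbar, C, P

rng = np.random.default_rng(2)
m, n = 20, 10
A = rng.standard_normal((m, n))
Gamma = np.eye(n)                   # prior covariance
Sigma = 0.01 * np.eye(m)            # noise covariance
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)
xbar, C, P = gaussian_posterior(A, b, Gamma, Sigma)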


Sampling from the posterior

If a symmetric factorization of the posterior precision is available

P = K^T K,

one can use the following sampling scheme. Let

x^(j) = x̄ + y^(j),

where y^(j) solves K y^(j) = w^(j), w^(j) ∼ N(0, I_n).
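A dense-algebra sketch of this scheme (my addition), assuming P is small enough to factor; np.linalg.cholesky gives P = R R^T with R lower triangular, so K = R^T satisfies P = K^T K.

import numpy as np

def sample_via_factorization(xbar, P, N, rng):
    K = np.linalg.cholesky(P).T                 # upper triangular, P = K^T K
    W = rng.standard_normal((P.shape[0], N))    # w^(j) ~ N(0, I_n), one per column
    Y = np.linalg.solve(K, W)                   # K y^(j) = w^(j)
    return xbar[:, None] + Y                    # columns are samples x^(j)

rng = np.random.default_rng(3)
n = 10
B = rng.standard_normal((n, n))
P = B @ B.T + n * np.eye(n)                     # an SPD precision matrix for the demo
samples = sample_via_factorization(np.zeros(n), P, 2000, rng)
# The sample covariance should approximate P^{-1}
print(np.abs(np.cov(samples, bias=True) - np.linalg.inv(P)).max())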

The approach may be of little use, if

n is very large,

the matrix A is not known explicitly.


Whitening

Consider an alternative sampler for large and matrix-free problems,

requiring only evaluations of products with A or A^T: (A, x) ↦ Ax,

reducible to solving a few least squares problems.

Assume that Γ^{-1} = L^T L, Σ^{-1} = S^T S.

Whitening (Mahalanobis transformation):

A = S A L^{-1}, b = S b.

Whitened model: If z = Lx, we have

b = Az + ε,

where π_z ∼ N(0, I_n), ε ∼ N(0, I_m).
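A sketch of the whitening step (my addition), assuming dense prior and noise covariances so that the factors L and S can be obtained by Cholesky factorization of the precisions:

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def whiten(A, b, Gamma, Sigma):
    # Upper-triangular factors with Gamma^{-1} = L^T L and Sigma^{-1} = S^T S
    L = cholesky(np.linalg.inv(Gamma), lower=False)
    S = cholesky(np.linalg.inv(Sigma), lower=False)
    SA = S @ A
    # Whitened forward map S A L^{-1}, obtained by triangular solves with L^T
    A_w = solve_triangular(L, SA.T, trans='T', lower=False).T
    b_w = S @ b
    return A_w, b_w, L

rng = np.random.default_rng(4)
m, n = 20, 10
A = rng.standard_normal((m, n))
Gamma, Sigma = np.eye(n), 0.01 * np.eye(m)
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)
A_w, b_w, L = whiten(A, b, Gamma, Sigma)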


Effective sampler

Precision and covariance of the whitened problem:

P = A^T A + I_n, C = P^{-1}.

Lemma. Let z be a random variable solving the equation

P z = A^T b + w, (1)

where b ∈ R^m is an observed realization of the whitened data, and w is a Gaussian random variable,

w ∼ N(0, P).

Then z ∼ N(z̄, C), z̄ = C A^T b,

which is the posterior distribution of the whitened problem.


Effective sampler

Sampling:

1. Generate w_k ∼ N(0, P),

2. Solve P z_k = A^T b + w_k,

3. Solve L x_k = z_k.

Step 1: Generation of w_k is straightforward:

w_k = A^T ν_k + η_k, η_k ∼ N(0, I_n), ν_k ∼ N(0, I_m) independent.
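A dense-algebra sketch of the three steps (my addition); A, b, and L are the whitened quantities, e.g., as returned by a whitening routine like the one sketched earlier, and P z_k = A^T b + w_k is solved directly here. The matrix-free variant follows on the next slides.

import numpy as np

def sample_whitened(A, b, L, N, rng):
    m, n = A.shape
    P = A.T @ A + np.eye(n)
    X = np.empty((n, N))
    for k in range(N):
        nu = rng.standard_normal(m)
        eta = rng.standard_normal(n)
        w = A.T @ nu + eta                     # Step 1: w_k ~ N(0, P)
        z = np.linalg.solve(P, A.T @ b + w)    # Step 2: P z_k = A^T b + w_k
        X[:, k] = np.linalg.solve(L, z)        # Step 3: L x_k = z_k
    return X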


Effective sampler

Step 2: The computation of z_k reduces to solving a few least squares problems. In fact,

P z_k = A^T b + w_k = A^T b + A^T ν_k + η_k.

1. Write η_k = A^T δ_k + h_k ∈ R(A^T) ⊕ N(A), where δ_k satisfies

A A^T δ_k = A η_k. (2)

2. Express z_k = z_k^1 + h_k, where z_k^1 satisfies

(A^T A + I) z_k^1 = A^T(b + ν_k + δ_k). (3)


Effective sampler

Because (2) and (3) are the normal equations for the least squares problems

A^T δ_k = η_k, (4)

and

[ A ; I_n ] z_k^1 = [ b + ν_k + δ_k ; 0 ], (5)

δ_k and z_k^1 can be computed by Krylov subspace iterative solvers (e.g., CGLS), requiring only

(A, z) ↦ A z, (A^T, y) ↦ A^T y. (6)
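A matrix-free sketch of Step 2 (my addition), using SciPy's LSQR as a stand-in for CGLS; only the products in (6) are supplied, via a LinearOperator, and the damped solve handles the stacked system (5).

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def step2_matrix_free(matvec, rmatvec, b, nu, eta, m, n):
    # matvec: z -> A z,  rmatvec: y -> A^T y  (the only access to A, cf. (6))
    A_op = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec)
    At_op = LinearOperator((n, m), matvec=rmatvec, rmatvec=matvec)
    delta = lsqr(At_op, eta)[0]                    # least squares problem (4)
    h = eta - rmatvec(delta)                       # component of eta in N(A)
    z1 = lsqr(A_op, b + nu + delta, damp=1.0)[0]   # stacked least squares problem (5)
    return z1 + h

rng = np.random.default_rng(5)
m, n = 15, 40
A = rng.standard_normal((m, n))
b, nu, eta = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(n)
z = step2_matrix_free(lambda x: A @ x, lambda y: A.T @ y, b, nu, eta, m, n)
# Compare with a direct solve of P z = A^T b + w, w = A^T nu + eta
z_ref = np.linalg.solve(A.T @ A + np.eye(n), A.T @ (b + nu) + eta)
print(np.abs(z - z_ref).max())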


Fast Sampling via CGLS

An alternative to iteratively solving (5) to full convergence is to approximate its solution with

x_{k_0} = argmin{‖y − A x‖ | x ∈ K_k(A, y)},

where y = b + ν_k + δ_k and k_0 is chosen so that

G(x_{k_0}) = min{G(x_k) | k = 1, 2, . . .},

with G(x) = ‖y − A x‖² + ‖x‖².

Calvetti D, Devathi D and Somersalo E (2019) Posterior Sampling with Priorconditioned CGLS for Underdetermined Ill-posed Problems. Manuscript.
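A compact CGLS sketch with this stopping rule (my addition, not the authors' implementation): the CGLS iterates x_k for min ‖y − A x‖ are tracked and the one minimizing G over the first k_max iterations is returned.

import numpy as np

def cgls_early_stop(A, y, k_max=50):
    # CGLS for min ||y - A x||, stopped at the iterate minimizing
    # G(x) = ||y - A x||^2 + ||x||^2
    n = A.shape[1]
    x = np.zeros(n)
    r = y.copy()                  # residual y - A x
    s = A.T @ r
    p = s.copy()
    gamma = s @ s
    best_x, best_G = x.copy(), r @ r
    for _ in range(k_max):
        q = A @ p
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        G = r @ r + x @ x
        if G < best_G:
            best_x, best_G = x.copy(), G
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new == 0.0:      # residual orthogonal complement exhausted
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return best_x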


Sampling with Lower Precision Arithmetic

Consider two possible scenarios:

1. The matrix-vector products with A, cf. (6), are computed in LPA (see the sketch after this list);

2. Both the matrix-vector products with A and the Krylov least squares solvers are implemented in LPA.
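Scenario 1 can be emulated in software (a sketch, under the assumption that float32 stands in for the lower precision format) by casting only the matrix-vector products to single precision; these handles can then be passed to a matrix-free solver such as the LSQR/CGLS sketch above.

import numpy as np

def make_lpa_matvecs(A):
    # Matrix-vector products with A and A^T carried out in float32 (scenario 1);
    # the surrounding Krylov solver still runs in float64 (scenario 2 would cast it as well)
    A32 = A.astype(np.float32)
    matvec = lambda x: (A32 @ x.astype(np.float32)).astype(np.float64)
    rmatvec = lambda y: (A32.T @ y.astype(np.float32)).astype(np.float64)
    return matvec, rmatvec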


Example: Deconvolution problem

Consider a deconvolution problem with an Airy kernel and Gaussian noise.

[Figure: deconvolution test problem on [0, 1]; two panels, FWHM indicated.]


Sparse data

[Figure: sparse data reconstructions on [0, 1]; panels: Reference, Double precision, Single precision MatVec A, All single precision.]


Normality: QQ-plots

[Figure: QQ-plots of the samples; panels: Double precision, Single precision.]

Note: The sample corresponding to 'all single' is not Gaussian at all!


Eigenvalues of covariance matrices

[Figure: eigenvalues of the sample covariance matrices on a logarithmic scale; panels compare the reference spectrum with the double precision, single precision, and all single precision samples.]


Scatter analysis

Given two samples

S_1 = {x_1^1, . . . , x_1^N} and S_2 = {x_2^1, . . . , x_2^N},

find a few orthogonal directions v_1, . . . , v_k ∈ R^n along which the difference in the spread of the samples is maximized.


Scatter analysis

Sample covariance matrices:

C_k = (1/N) Σ_{j=1}^{N} (x_k^j − x̄_k)(x_k^j − x̄_k)^T, k = 1, 2,

where

x̄_k = (1/N) Σ_{j=1}^{N} x_k^j.

Problem: Find q ∈ R^n maximizing (or minimizing)

H(q) = (q^T C_1 q) / (q^T C_2 q).


Scatter analysis

Since H(αq) = H(q) for all α ≠ 0,

it is possible to scale q so that

q^T C_2 q = 1,

yielding the constrained optimization problem:

q = argmax{q^T C_1 q} subject to q^T C_2 q = 1.

Equivalently, we may solve the generalized eigenvalue problem:

C_1 q = λ C_2 q.
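A sketch (my addition) using SciPy's symmetric generalized eigensolver; C1 and C2 are the two sample covariance matrices, with C2 assumed positive definite, and the returned eigenvectors are normalized so that q^T C_2 q = 1, matching the constraint above.

import numpy as np
from scipy.linalg import eigh

def scatter_directions(C1, C2, k=2):
    # Solve C1 q = lambda C2 q for symmetric C1 and SPD C2
    lam, Q = eigh(C1, C2)
    order = np.argsort(lam)[::-1]
    # k largest generalized eigenvalues and the corresponding directions
    return lam[order[:k]], Q[:, order[:k]]

# Hypothetical usage: directions along which the single precision sample spreads most relative to the reference
# lam, V = scatter_directions(C_single, C_reference, k=2)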


Scatter analysis

We remark that:

1. All generalized eigenvalues are real.

2. The eigenvectors are not orthogonal.

3. The maximizer is the eigenvector associated with the largest generalized eigenvalue.

To analyze the samples, we project them onto the subspaces determined by the generalized eigenvectors corresponding to the k largest eigenvalues.


Scatter analysis: Reference vs Double

The reference sample is the one computed in a canonical way, via the Cholesky factorization of the posterior precision, i.e., C_2 = C_ref.

[Figure: scatter plot of the double precision sample vs. the reference sample, projected onto the leading generalized eigendirections.]


Scatter analysis: Reference vs Single

[Figure: scatter plot of the single precision sample vs. the reference sample, projected onto the leading generalized eigendirections.]


Scatter analysis: Reference vs All Single

[Figure: scatter plot of the all single precision sample vs. the reference sample, projected onto the leading generalized eigendirections.]
