
Bayesian Analysis of the Effects of Lower Precision Arithmetic in Inverse Problems

Daniela Calvetti

based on work with D. Devathi and E. Somersalo

Case Western Reserve University, Department of Mathematics, Applied Mathematics and Statistics

SIAM CSE 2019

Spokane, February 27, 2019


Introduction

Inverse Problems: Estimate a variable x ∈ R^n from noisy indirect observations,

b = f(x) + ε, where f : R^n → R^m is the forward map.

Bayesian Formulation: x, b and ε are realizations of random variables,

X ∼ π_x, E ∼ π_ε, B ∼ π_{b|x}(· | x).

Posterior distribution = solution of the inverse problem. Bayes’ formula:

π_{x|b}(x | b) ∝ π_x(x) π_{b|x}(b | x) = π_x(x) π_ε(b − f(x)).
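To make Bayes' formula concrete, here is a minimal sketch (my addition, not from the slides) of evaluating the unnormalized log-posterior; the Gaussian prior and noise models and the linear forward map below are illustrative assumptions.

import numpy as np

def log_posterior(x, b, f, log_prior, log_noise):
    # Unnormalized log-posterior: log pi_x(x) + log pi_eps(b - f(x))
    return log_prior(x) + log_noise(b - f(x))

# Illustrative choices: Gaussian prior/noise and a linear forward map
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((m, n))
f = lambda x: A @ x
log_prior = lambda x: -0.5 * x @ x              # N(0, I_n), up to an additive constant
log_noise = lambda e: -0.5 * e @ e / 0.01       # N(0, 0.1^2 I_m), up to an additive constant
b = f(rng.standard_normal(n)) + 0.1 * rng.standard_normal(m)
print(log_posterior(np.zeros(n), b, f, log_prior, log_noise))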


Introduction

Exploration of the posterior by sampling: Generate a representative sample

S = {x^(1), x^(2), . . . , x^(N)}, x^(j) ∼ π_{x|b}(· | b),

using, e.g., Markov Chain Monte Carlo (MCMC). A sample provides a means to estimate the posterior mean and covariance (UQ):

x̄ ≈ (1/N) Σ_{j=1}^{N} x^(j),    C ≈ (1/N) Σ_{j=1}^{N} (x^(j) − x̄)(x^(j) − x̄)^T.
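A minimal sketch (my addition) of these two estimators applied to a sample stored as the rows of an array X of shape (N, n); the synthetic Gaussian sample is only for illustration.

import numpy as np

def sample_mean_cov(X):
    # Rows of X are the posterior samples x^(1), ..., x^(N)
    xbar = X.mean(axis=0)
    D = X - xbar
    C = (D.T @ D) / X.shape[0]      # 1/N normalization, as on the slide
    return xbar, C

rng = np.random.default_rng(1)
X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.3], [0.3, 0.5]], size=5000)
xbar, C = sample_mean_cov(X)
print(xbar)
print(C)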

Observations:

Sample generation can be time consuming for a costly forward model.

Lower precision arithmetic (LPA) can reduce the time for sample generation.

It is not immediately clear how LPA affects the UQ estimates.

We could use UQ tools to explore the effect of LPA.


Sampling from the posterior

Model problem: Consider an inverse problem with a linear forward map,

b = Ax + ε, A ∈ R^{m×n},

Gaussian prior and Gaussian likelihood,

π_x ∼ N(0, Γ), π_ε ∼ N(0, Σ).

Then the posterior is Gaussian,

π_{x|b} ∼ N(x̄, C),

where

C = (A^T Σ^{-1} A + Γ^{-1})^{-1} = P^{-1}, x̄ = C A^T Σ^{-1} b,

with P = A^T Σ^{-1} A + Γ^{-1} the posterior precision matrix.
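A short sketch (my addition) of these closed-form expressions; the sizes and the prior and noise covariances are arbitrary illustrative choices.

import numpy as np

def gaussian_posterior(A, b, Gamma, Sigma):
    # P = A^T Sigma^{-1} A + Gamma^{-1},  C = P^{-1},  xbar = C A^T Sigma^{-1} b
    Sigma_inv = np.linalg.inv(Sigma)
    P = A.T @ Sigma_inv @ A + np.linalg.inv(Gamma)
    C = np.linalg.inv(P)
    xbar = C @ A.T @ Sigma_inv @ b
    return xbar, C, P

rng = np.random.default_rng(2)
m, n = 20, 10
A = rng.standard_normal((m, n))
Gamma = np.eye(n)                   # prior covariance
Sigma = 0.01 * np.eye(m)            # noise covariance
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)
xbar, C, P = gaussian_posterior(A, b, Gamma, Sigma)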


Sampling from the posterior

If a symmetric factorization of the posterior precision is available

P = K^T K,

one can use the following sampling scheme. Let

x^(j) = x̄ + y^(j),

where y^(j) solves K y^(j) = w^(j), w^(j) ∼ N(0, I_n).
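A dense-algebra sketch of this scheme (my addition), assuming P is small enough to factor; np.linalg.cholesky gives P = R R^T with R lower triangular, so K = R^T satisfies P = K^T K.

import numpy as np

def sample_via_factorization(xbar, P, N, rng):
    K = np.linalg.cholesky(P).T                 # upper triangular, P = K^T K
    W = rng.standard_normal((P.shape[0], N))    # w^(j) ~ N(0, I_n), one per column
    Y = np.linalg.solve(K, W)                   # K y^(j) = w^(j)
    return xbar[:, None] + Y                    # columns are samples x^(j)

rng = np.random.default_rng(3)
n = 10
B = rng.standard_normal((n, n))
P = B @ B.T + n * np.eye(n)                     # an SPD precision matrix for the demo
samples = sample_via_factorization(np.zeros(n), P, 2000, rng)
# The sample covariance should approximate P^{-1}
print(np.abs(np.cov(samples, bias=True) - np.linalg.inv(P)).max())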

The approach may be of little use, if

n is very large,

the matrix A is not known explicitly.


Whitening

Consider an alternative sampler for large and matrix-free problems,

requiring only evaluations of products with A or A^T: (A, x) ↦ Ax,

reducible to solving a few least squares problems.

Assume that Γ^{-1} = L^T L, Σ^{-1} = S^T S.

Whitening (Mahalanobis transformation):

A = S A L^{-1}, b = S b.

Whitened model: If z = Lx, we have

b = Az + ε,

where π_z ∼ N(0, I_n), ε ∼ N(0, I_m).
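A sketch of the whitening step (my addition), assuming dense prior and noise covariances so that the factors L and S can be obtained by Cholesky factorization of the precisions:

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def whiten(A, b, Gamma, Sigma):
    # Upper-triangular factors with Gamma^{-1} = L^T L and Sigma^{-1} = S^T S
    L = cholesky(np.linalg.inv(Gamma), lower=False)
    S = cholesky(np.linalg.inv(Sigma), lower=False)
    SA = S @ A
    # Whitened forward map S A L^{-1}, obtained by triangular solves with L^T
    A_w = solve_triangular(L, SA.T, trans='T', lower=False).T
    b_w = S @ b
    return A_w, b_w, L

rng = np.random.default_rng(4)
m, n = 20, 10
A = rng.standard_normal((m, n))
Gamma, Sigma = np.eye(n), 0.01 * np.eye(m)
b = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)
A_w, b_w, L = whiten(A, b, Gamma, Sigma)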


Effective sampler

Precision and covariance of the whitened problem:

P = A^T A + I_n, C = P^{-1}.

Lemma. Let z be a random variable solving the equation

P z = A^T b + w, (1)

where b ∈ R^m is an observed realization of the whitened data, and w is a Gaussian random variable,

w ∼ N(0, P).

Then z ∼ N(z̄, C), z̄ = C A^T b,

which is the posterior distribution of the whitened problem.


Effective sampler

Sampling:

1. Generate w_k ∼ N(0, P),

2. Solve P z_k = A^T b + w_k,

3. Solve L x_k = z_k.

Step 1: Generation of w_k is straightforward:

w_k = A^T ν_k + η_k, η_k ∼ N(0, I_n), ν_k ∼ N(0, I_m) independent.
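A dense-algebra sketch of the three steps (my addition); A, b, and L are the whitened quantities, e.g., as returned by a whitening routine like the one sketched earlier, and P z_k = A^T b + w_k is solved directly here. The matrix-free variant follows on the next slides.

import numpy as np

def sample_whitened(A, b, L, N, rng):
    m, n = A.shape
    P = A.T @ A + np.eye(n)
    X = np.empty((n, N))
    for k in range(N):
        nu = rng.standard_normal(m)
        eta = rng.standard_normal(n)
        w = A.T @ nu + eta                     # Step 1: w_k ~ N(0, P)
        z = np.linalg.solve(P, A.T @ b + w)    # Step 2: P z_k = A^T b + w_k
        X[:, k] = np.linalg.solve(L, z)        # Step 3: L x_k = z_k
    return X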


Effective sampler

Step 2: The computation of z_k reduces to solving a few least squares problems. In fact,

P z_k = A^T b + w_k = A^T b + A^T ν_k + η_k.

1. Write η_k = A^T δ_k + h_k ∈ R(A^T) ⊕ N(A), where δ_k satisfies

A A^T δ_k = A η_k. (2)

2. Express z_k = z_k^1 + h_k, where z_k^1 satisfies

(A^T A + I) z_k^1 = A^T(b + ν_k + δ_k). (3)


Effective sampler

Because (2) and (3) are the normal equations for the least squares problems

A^T δ_k = η_k, (4)

and

[ A ; I_n ] z_k^1 = [ b + ν_k + δ_k ; 0 ], (5)

δ_k and z_k^1 can be computed by Krylov subspace iterative solvers (e.g., CGLS), requiring only

(A, z) ↦ A z, (A^T, y) ↦ A^T y. (6)
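A matrix-free sketch of Step 2 (my addition), using SciPy's LSQR as a stand-in for CGLS; only the products in (6) are supplied, via a LinearOperator, and the damped solve handles the stacked system (5).

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def step2_matrix_free(matvec, rmatvec, b, nu, eta, m, n):
    # matvec: z -> A z,  rmatvec: y -> A^T y  (the only access to A, cf. (6))
    A_op = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec)
    At_op = LinearOperator((n, m), matvec=rmatvec, rmatvec=matvec)
    delta = lsqr(At_op, eta)[0]                    # least squares problem (4)
    h = eta - rmatvec(delta)                       # component of eta in N(A)
    z1 = lsqr(A_op, b + nu + delta, damp=1.0)[0]   # stacked least squares problem (5)
    return z1 + h

rng = np.random.default_rng(5)
m, n = 15, 40
A = rng.standard_normal((m, n))
b, nu, eta = rng.standard_normal(m), rng.standard_normal(m), rng.standard_normal(n)
z = step2_matrix_free(lambda x: A @ x, lambda y: A.T @ y, b, nu, eta, m, n)
# Compare with a direct solve of P z = A^T b + w, w = A^T nu + eta
z_ref = np.linalg.solve(A.T @ A + np.eye(n), A.T @ (b + nu) + eta)
print(np.abs(z - z_ref).max())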


Fast Sampling via CGLS

An alternative to iteratively solving (5) to full convergence is to approximate its solution with

x_{k_0} = argmin{‖y − A x‖ | x ∈ K_k(A, y)},

where y = b + ν_k + δ_k and k_0 is chosen so that

G(x_{k_0}) = min{G(x_k) | k = 1, 2, . . .},

with G(x) = ‖y − A x‖² + ‖x‖².

Calvetti D, Devathi D and Somersalo E (2019) Posterior Sampling with Priorconditioned CGLS for Underdetermined Ill-posed Problems. Manuscript.
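A compact CGLS sketch with this stopping rule (my addition, not the authors' implementation): the CGLS iterates x_k for min ‖y − A x‖ are tracked and the one minimizing G over the first k_max iterations is returned.

import numpy as np

def cgls_early_stop(A, y, k_max=50):
    # CGLS for min ||y - A x||, stopped at the iterate minimizing
    # G(x) = ||y - A x||^2 + ||x||^2
    n = A.shape[1]
    x = np.zeros(n)
    r = y.copy()                  # residual y - A x
    s = A.T @ r
    p = s.copy()
    gamma = s @ s
    best_x, best_G = x.copy(), r @ r
    for _ in range(k_max):
        q = A @ p
        alpha = gamma / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        G = r @ r + x @ x
        if G < best_G:
            best_x, best_G = x.copy(), G
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new == 0.0:      # residual orthogonal complement exhausted
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return best_x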


Sampling with Lower Precision Arithmetic

Consider two possible scenarios:

1. The matrix-vector products with A, cf. (6), are computed in LPA (see the sketch after this list);

2. Both the matrix-vector products with A and the Krylov least squares solvers are implemented in LPA.
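Scenario 1 can be emulated in software (a sketch, under the assumption that float32 stands in for the lower precision format) by casting only the matrix-vector products to single precision; these handles can then be passed to a matrix-free solver such as the LSQR/CGLS sketch above.

import numpy as np

def make_lpa_matvecs(A):
    # Matrix-vector products with A and A^T carried out in float32 (scenario 1);
    # the surrounding Krylov solver still runs in float64 (scenario 2 would cast it as well)
    A32 = A.astype(np.float32)
    matvec = lambda x: (A32 @ x.astype(np.float32)).astype(np.float64)
    rmatvec = lambda y: (A32.T @ y.astype(np.float32)).astype(np.float64)
    return matvec, rmatvec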


Example: Deconvolution problem

Consider a deconvolution problem with an Airy kernel and Gaussian noise.

[Figure: deconvolution test problem on [0, 1]; two panels, FWHM indicated.]


Sparse data

[Figure: sparse data reconstructions on [0, 1]; panels: Reference, Double precision, Single precision MatVec A, All single precision.]


Normality: QQ-plots

[Figure: QQ-plots of the samples; panels: Double precision, Single precision.]

Note: The sample corresponding to 'all single' is not Gaussian at all!


Eigenvalues of covariance matrices

[Figure: eigenvalues of the sample covariance matrices on a logarithmic scale; panels compare the reference spectrum with the double precision, single precision, and all single precision samples.]


Scatter analysis

Given two samples

S_1 = {x_1^1, . . . , x_1^N} and S_2 = {x_2^1, . . . , x_2^N},

find a few orthogonal directions v_1, . . . , v_k ∈ R^n along which the difference in the spread of the samples is maximized.


Scatter analysis

Sample covariance matrices:

C_k = (1/N) Σ_{j=1}^{N} (x_k^j − x̄_k)(x_k^j − x̄_k)^T, k = 1, 2,

where

x̄_k = (1/N) Σ_{j=1}^{N} x_k^j.

Problem: Find q ∈ R^n maximizing (or minimizing)

H(q) = (q^T C_1 q) / (q^T C_2 q).


Scatter analysis

Since H(αq) = H(q) for all α ≠ 0,

it is possible to scale q so that

q^T C_2 q = 1,

yielding the constrained optimization problem:

q = argmax{q^T C_1 q} subject to q^T C_2 q = 1.

Equivalently, we may solve the generalized eigenvalue problem:

C_1 q = λ C_2 q.
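A sketch (my addition) using SciPy's symmetric generalized eigensolver; C1 and C2 are the two sample covariance matrices, with C2 assumed positive definite, and the returned eigenvectors are normalized so that q^T C_2 q = 1, matching the constraint above.

import numpy as np
from scipy.linalg import eigh

def scatter_directions(C1, C2, k=2):
    # Solve C1 q = lambda C2 q for symmetric C1 and SPD C2
    lam, Q = eigh(C1, C2)
    order = np.argsort(lam)[::-1]
    # k largest generalized eigenvalues and the corresponding directions
    return lam[order[:k]], Q[:, order[:k]]

# Hypothetical usage: directions along which the single precision sample spreads most relative to the reference
# lam, V = scatter_directions(C_single, C_reference, k=2)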


Scatter analysis

We remark that:

1. All generalized eigenvalues are real.

2. The eigenvectors are not orthogonal.

3. The maximizer is the eigenvector associated with the largest generalized eigenvalue.

To analyze the samples, we project them onto the subspaces determined by the generalized eigenvectors corresponding to the k largest eigenvalues.


Scatter analysis: Reference vs Double

The reference sample is the one computed in a canonical way, via the Cholesky factorization of the posterior precision, i.e., C_2 = C_ref.

[Figure: scatter plot of the double precision sample vs. the reference sample, projected onto the leading generalized eigendirections.]


Scatter analysis: Reference vs Single

[Figure: scatter plot of the single precision sample vs. the reference sample, projected onto the leading generalized eigendirections.]


Scatter analysis: Reference vs All Single

[Figure: scatter plot of the all single precision sample vs. the reference sample, projected onto the leading generalized eigendirections.]
