Techniques for Solution of Large Scale Inverse Problems: Image Restoration and Reconstruction
Rosemary Renaut
Collaborations with former UGs Jakob Hansen and Michael Horst (ASU), and Dr. Saeed Vatankhah (Tehran)
School of Mathematical and Statistical Sciences, Arizona State University
Seminar: Georgia State University
Outline
Background: Ill-posed Inverse Problems
  Tikhonov Regularization for Ill-Posed Problems
  Standard Approaches to Estimate the Regularization Parameter
Integral Equations and the Singular Value Expansion
  Estimating the SVE via the SVD
  Numerical Rank: Truncation
  Impact on Regularization Parameter Estimation
  Numerical Examples
Iterative Solution of the LS Problem: Hybrid LSQR
  Finding the Regularization Parameter
  Identifying the Optimal Subspace
  Numerical Examples
Conclusions
Solving Practical Linear Systems of Equations
Forward Problem: given x and A, calculate b:
b = Ax
Inverse Problem: given b and A, find x, where A is invertible:
x = A^{-1}b
In practice this can be complicated and depends on many factors.
Illustration: Forward Problem, a blurred signal
Matrix A is obtained from the kernel h:
h(s) = (1/(σ√(2π))) exp(−s^2/(2σ^2)), σ > 0.
The measured data b is obtained from the integral relation
g(s) = ∫_{−π}^{π} h(s − t) f(t) dt,
with b = Ax, b_i ≈ g(s_i), x_i ≈ f(t_i).
Inverse Problem: given A and b, find x = A^{-1}b.
The solution depends on the conditioning of A and on the noise in the measurements b. The condition number of A is 1.8679e+05; the restoration shown uses noise level 0.0001.
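To make the illustration concrete, here is a minimal MATLAB sketch, assuming a midpoint-rule discretization of the Gaussian kernel on [−π, π]; the grid size n, width σ, and the boxcar test signal are illustrative choices, not taken from the slides.

    % Midpoint-rule discretization of the Gaussian blur kernel on [-pi, pi]
    n     = 200;  sigma = 0.1;                  % illustrative grid size and blur width
    dt    = 2*pi/n;
    t     = -pi + ((1:n)' - 0.5)*dt;            % midpoints t_i = s_i
    [T,S] = meshgrid(t, t);                     % S(i,j) = s_i, T(i,j) = t_j
    A     = dt*exp(-(S - T).^2/(2*sigma^2))/(sigma*sqrt(2*pi));
    x_true = double(abs(t) < 1);                % illustrative piecewise-constant signal
    b      = A*x_true + 1e-4*randn(n,1);        % data at the noise level quoted above
    disp(cond(A))                               % large: the discrete problem is ill-conditioned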
Example of restoration of 2D image with low noise: Inverse Crime
Figure: (a) Original; (b), (d) noisy and blurred, noise level 10^{-5}; (c), (e) noisy and blurred, noise level 10^{-3}.
Summary: Ill-posed problem
Goals: Given data b and matrix A, find x.
Features: We may also want to accurately recover features of x, for example the locations of peaks in a signal.
Difficulties: The solution is very sensitive to the data.
Ill-Posed (according to Hadamard): A problem is ill-posed if it does not satisfy the conditions for well-posedness, i.e. if
1. b ∉ range(A), or
2. the inverse is not unique because more than one image is mapped to the same data, or
3. an arbitrarily small change in b can cause an arbitrarily large change in x.
Spectral Decomposition of the Solution: The SVD
Consider the general overdetermined discrete problem
Ax = b, A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, m ≥ n.
The singular value decomposition (SVD) of A (full column rank)
A = UΣV^T = Σ_{i=1}^n σ_i u_i v_i^T, Σ = diag(σ_1, ..., σ_n),
gives an expansion for the solution
x = Σ_{i=1}^n (u_i^T b / σ_i) v_i.
Here u_i, v_i are the left and right singular vectors of A; the solution is a weighted linear combination of the basis vectors v_i.
Basis Vectors
Data in red and solution in black; solutions in blue using the truncated expansion
x = Σ_{i=1}^k (u_i^T b / σ_i) v_i.
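A minimal sketch of this expansion and its truncation, assuming A and b from the blur sketch above; the truncation level k is an illustrative choice.

    % SVD expansion of the solution, then truncation at k terms
    [U, Sig, V] = svd(A, 'econ');  s = diag(Sig);
    x_naive = V*((U'*b)./s);                       % full expansion: noise-dominated
    k       = 10;                                  % illustrative truncation level
    x_tsvd  = V(:,1:k)*((U(:,1:k)'*b)./s(1:k));    % truncated SVD solution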
The Filtered SVD: more general than truncation
The truncated SVD is a special case of spectral filtering,
x_filt = Σ_{i=1}^n q_i (u_i^T b / σ_i) v_i
with q_i = 0 for i > k; we have to find k. Spectral filtering damps the noisy components of the signal in the spectral basis.
Tikhonov regularization: q_i(λ) = σ_i^2/(σ_i^2 + λ^2), i = 1, ..., n, where λ is the regularization parameter, and the solution is
x(λ) = argmin_x {‖b − Ax‖^2 + λ^2 ‖x‖^2}.
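The filter factors drop directly into the same expansion; a sketch assuming U, s, V, b from the sketch above, with an illustrative λ.

    % Tikhonov-filtered SVD solution x(lambda)
    lambda = 1e-2;                       % illustrative regularization parameter
    q      = s.^2./(s.^2 + lambda^2);    % Tikhonov filter factors q_i(lambda)
    x_tik  = V*(q.*(U'*b)./s);           % damped weighted combination of the v_i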
1-D Interesting but Noisy Signal
Blur with a Gaussian and add noise: can we find the solution?
Solution for Increasing λ
Solutions x(λ): we have to find λ
General Formulation: Tikhonov Regularization for Ill-Posed Problems
Ill-posed equations in the presence of noise:
Ax ≈ b, A ∈ R^{m×n}, b = b_true + η, noise η ∼ N(0, C_η).
Tikhonov regularization:
x = argmin_{x∈R^n} {‖Ax − b‖^2_{W_η} + λ^2 ‖L(x − x_0)‖^2_2}
Mapping: L defines the basis for x.
Prior: x_0.
Weighting: W_η = C_η^{-1}, with ‖y‖^2_{W_η} = y^T W_η y; this whitens the noise in b.
Requires automatic estimation of λ.
Regularization Parameter Estimation using the SVD: Examples (m = n)
For L invertible, with SVD W_η^{1/2} A L^{-1} = UΣV^T, Σ = diag(σ_i), find λ_opt from one of the following.
Unbiased Predictive Risk (UPRE): minimize the functional
U(λ) = Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2))^2 (u_i^T b)^2 + 2 Σ_{i=1}^n σ_i^2/(σ_i^2 + λ^2).
Morozov Discrepancy Principle: given a parameter ν, solve
M(λ) = Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2))^2 (u_i^T b)^2 − νn = 0.
χ^2 principle: find the root of the functional
χ(λ) = Σ_{i=1}^n λ^2/(σ_i^2 + λ^2) (u_i^T b)^2 − n = 0.
GCV: minimize the rational function
G(λ) = (Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2))^2 (u_i^T b)^2)(Σ_{i=1}^n λ^2/(σ_i^2 + λ^2))^{−2}.
Not practical for large scale problems
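For a small problem all of these functionals are cheap to evaluate once the SVD is available; a sketch, assuming U, s, b as in the earlier sketches (m = n), with illustrative brackets for the optimizers.

    % Parameter-selection functionals evaluated via the SVD
    beta2 = (U'*b).^2;                            % (u_i^T b)^2
    phi   = @(lam) lam^2./(s.^2 + lam^2);         % 1 - q_i(lam), per index i
    Ufun  = @(lam) sum(phi(lam).^2.*beta2) + 2*sum(s.^2./(s.^2 + lam^2));
    Gfun  = @(lam) sum(phi(lam).^2.*beta2)/sum(phi(lam))^2;
    chif  = @(lam) sum(phi(lam).*beta2) - numel(s);    % chi^2 functional, for root finding
    lamU  = fminbnd(Ufun, 1e-8, 1e2);             % UPRE minimizer on the bracket
    lamG  = fminbnd(Gfun, 1e-8, 1e2);             % GCV minimizer on the bracket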
Large Scale Parameter Estimation
Expensive: the cost of forming the SVD (GSVD).
UPRE, GCV: minimize a nonlinear functional with no unique minimum.
MDP, χ^2: root finding; require a good estimate of the noise C_η.
L-curve: requires many solutions to trace out the L-curve.
Accuracy: for large scale problems the SVD and GSVD are contaminated by numerical noise.
Can we use a different approach for large scale problems? Two directions.
Integral Equations and Square Integrable Kernels
Integral equation:
g(s) = ∫_0^1 h(s,t) f(t) dt.
Square integrable kernel:
‖h‖_2^2 = ∫_0^1 ∫_0^1 h(s,t)^2 ds dt is finite.
Singular value expansion (SVE):
h(s,t) = Σ_{j=1}^∞ µ_j u_j(s) v_j(t).
Singular values: µ_1 ≥ µ_2 ≥ ⋯ ≥ 0, with Σ_{i=1}^∞ µ_i^2 bounded.
Singular functions: u_i(s), v_i(t), orthonormal with respect to the inner product ⟨φ, ψ⟩ = ∫_0^1 φ(t)ψ(t) dt.
Basis for L^2(0,1): for f, g ∈ L^2(0,1),
f(t) = Σ_i ⟨v_i, f⟩ v_i(t), g(s) = Σ_i ⟨u_i, g⟩ u_i(s).
Galerkin Method to Approximate the SVE by the SVD [Han88]
1. Choose orthonormal basis functions ψ_i^{(n)}(s) and φ_j^{(n)}(t).
2. Calculate the matrix A^{(n)} with entries
a_{ij}^{(n)} = ⟨h(s_i, t), φ_j^{(n)}(t)⟩, i, j = 1, ..., n.
3. Compute the SVD A^{(n)} = U^{(n)} Σ^{(n)} (V^{(n)})^T, with Σ^{(n)} = diag(σ_i^{(n)}), U^{(n)} = (u_{ij}^{(n)}), V^{(n)} = (v_{ij}^{(n)}).
4. Define, for j = 1 : n,
u_j^{(n)}(s) := Σ_{i=1}^n u_{ij}^{(n)} ψ_i(s), v_j^{(n)}(t) := Σ_{i=1}^n v_{ij}^{(n)} φ_i(t).
5. Then µ_j ≈ σ_j^{(n)}, u_j(s) ≈ u_j^{(n)}(s) and v_j(t) ≈ v_j^{(n)}(t).
The discrete SVD provides the SVE of a degenerate kernel
Theorem [Han88]: σ_j^{(n)}, u_j^{(n)}, v_j^{(n)} are the exact singular values and singular functions of the degenerate kernel
h^{(n)}(s,t) := Σ_{i=1}^n Σ_{j=1}^n a_{ij}^{(n)} ψ_i^{(n)}(s) φ_j^{(n)}(t).
Approximation Properties
Kernel: Δ^{(n)} = ‖h − h^{(n)}‖_2 = √(‖h‖_2^2 − ‖A^{(n)}‖_F^2).
Singular values: 0 ≤ µ_i − σ_i^{(n)} ≤ Δ^{(n)}, and σ_i^{(n)} ≤ σ_i^{(n+1)} ≤ µ_i.
Singular vectors: orthonormal, and for µ_i ≠ µ_{i+1},
max{‖u_i − u_i^{(n)}‖_2, ‖v_i − v_i^{(n)}‖_2} ≤ 2Δ^{(n)}/(µ_i − µ_{i+1}).
Coefficients: for g_i = ⟨g(s), u_i(s)⟩ and g_i^{(n)} = ⟨g(s), u_i^{(n)}⟩,
|g_i − g_i^{(n)}| ≤ √(2Δ^{(n)}/(µ_i − µ_{i+1})) ‖g‖.
Impact on the Solution
Continuous solution of the integral equation, for f(t) = Σ_i f_i v_i(t):
f(t) = Σ_{µ_i≠0} (⟨u_i(s), g(s)⟩/µ_i) v_i(t) = Σ_{µ_i≠0} (g_i/µ_i) v_i(t).
Picard condition: f is square integrable if
‖f‖_2^2 = ∫_0^1 f(t)^2 dt = Σ_{µ_i≠0} (⟨u_i(s), g(s)⟩/µ_i)^2 < ∞.
Coefficients: define g_i^{(n)} = ⟨g(s), ψ_i^{(n)}(s)⟩ to be the entries of the vector g^{(n)}.
Observation: investigate the Picard condition numerically via
g_i = ⟨u_i(s), g(s)⟩ ≈ ⟨u_i^{(n)}(s), g(s)⟩ = (u_i^{(n)})^T g^{(n)}.
The numerical solution inherits the ill-posedness of the continuous problem.
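A sketch of the numerical Picard check, assuming U, s from the SVD of the earlier sketches and the data vector b in place of g^{(n)}.

    % Discrete Picard condition: compare |u_i^T b| with sigma_i
    c = abs(U'*b);
    semilogy(1:numel(s), s, 'k-', 1:numel(s), c, 'b.', 1:numel(s), c./s, 'r+')
    legend('\sigma_i', '|u_i^T b|', '|u_i^T b|/\sigma_i')
    % the condition holds numerically while |u_i^T b| decays faster than sigma_i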
Discrete Solution
Samples of g(s) give g_i^{(n)} = ⟨g(s), ψ_i^{(n)}(s)⟩.
Integration of the kernel gives A_{ij}^{(n)} = a_{ij}^{(n)},
a_{ij}^{(n)} = ⟨h(s_i, t), φ_j^{(n)}(t)⟩, i, j = 1, ..., n.
The SVD of A^{(n)} yields the discrete solution for the coefficients f_j^{(n)},
f^{(n)} = Σ_{i=1}^n ((u_i^{(n)})^T g^{(n)} / σ_i^{(n)}) v_i^{(n)}.
Samples of f(t) are then obtained from
f(t) = Σ_j f_j^{(n)} φ_j(t).
Regularized Discrete Solution as a function of sample size n
Filtering uses the filter factors
q_i(λ^{(n)}) = (σ_i^{(n)})^2 / ((σ_i^{(n)})^2 + (λ^{(n)})^2).
Tikhonov regularization with regularization parameter λ^{(n)}:
f^{(n)}(λ^{(n)}) = Σ_{i=1}^n q_i(λ^{(n)}) ((u_i^{(n)})^T g^{(n)} / σ_i^{(n)}) v_i^{(n)}.
Use the SVD-SVE convergence to find λ^{(N)} for N ≫ n from λ^{(n)}.
Truncating the expansion
Theorem (Numerical Rank): For {p(n)} dependent on ε, such that µ_{J+1} ≤ µ_1 ε,
lim_{n→∞} p(n) = J := p*.
Hence
f^{(n)}(λ^{(n)}) ≈ Σ_{i=1}^{p(n)} q_i(λ^{(n)}) ((u_i^{(n)})^T g^{(n)} / σ_i^{(n)}) v_i^{(n)}.
The number of significant terms converges with n.
Practicality, to calculate A and g: Top Hat Functions, Δt = Δs := Δ
Basis functions: φ_i = ψ_i = ρ_i, where ρ_i is the top hat
ρ_i(t) = 1/√Δ for t ∈ [(i−1)Δ, iΔ], and 0 otherwise.
Integrate:
A_{ij}^{(n)} = (1/Δ) ∫_{(i−1)Δ}^{iΔ} ∫_{(j−1)Δ}^{jΔ} h(s,t) dt ds,
g_i^{(n)} = (1/√Δ) ∫_{(i−1)Δ}^{iΔ} g(s) ds.
Midpoint quadrature: A_{ij}^{(n)} = Δ h(s_i, t_j), g_i^{(n)} = √Δ g(s_i).
Solution samples: f(t_j) = f_j^{(n)}/√Δ.
Equivalently: given samples ĝ_i^{(n)} = g(s_i), estimate g_i^{(n)} and H_{ij}^{(n)}, find the coefficients f_j^{(n)}, and hence the samples f(t_j) = f_j^{(n)}/√Δ.
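A sketch of the midpoint construction, borrowing the gravity kernel and test profile from the later example; the size n and parameter d are illustrative.

    % Top-hat/midpoint discretization on [0,1]
    n      = 100;  Delta = 1/n;               % illustrative sample size
    sc     = ((1:n)' - 0.5)*Delta;            % midpoints s_i = t_i
    d      = 0.25;                            % gravity kernel parameter
    h      = @(S,T) d./(d^2 + (S - T).^2).^(3/2);
    [T,S]  = meshgrid(sc, sc);                % S(i,j) = s_i, T(i,j) = t_j
    A      = Delta*h(S, T);                   % A_ij = Delta h(s_i, t_j)
    f_true = sin(pi*sc) + 0.5*sin(2*pi*sc);   % profile from the gravity example
    g      = sqrt(Delta)*(A*f_true);          % coefficients g_i = sqrt(Delta) g(s_i)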
Sampling Dependent Regularization Parameter
Regularized solution:
f^{(n)}(λ^{(n)}) = Σ_{i=1}^n ((σ_i^{(n)})^2/((σ_i^{(n)})^2 + (λ^{(n)})^2)) ((u_i^{(n)})^T g^{(n)}/σ_i^{(n)}) v_i^{(n)}.
Tikhonov formulation:
f^{(n)}(λ^{(n)}) = argmin_{f∈R^n} {‖A^{(n)} f^{(n)} − g^{(n)}‖^2 + (λ^{(n)})^2 ‖f^{(n)}‖^2}.
Regularization on coefficients: ‖f^{(n)}‖^2 penalizes the coefficients.
Regularization on samples: in terms of the samples f̂^{(n)} = f^{(n)}/√Δ^{(n)}, (λ^{(n)})^2 ‖f^{(n)}‖^2 = Δ^{(n)} (λ^{(n)})^2 ‖f̂^{(n)}‖^2.
Noise in the reconstruction: we want the variance s_f^2 of the reconstructed samples f(t_j) to be independent of the sampling.
Sample independent regularization parameter:
1/s_f = λ_f = √(Δ^{(n)}) λ^{(n)}.
This relates parameters for N ≫ n: √(Δ^{(n)}) λ^{(n)} = √(Δ^{(N)}) λ^{(N)}.
Relating to Methods of Parameter Estimation: Example χ^2
Solve (dropping the dependence on n)
χ(λ) = F(λ) − n = Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2)) (u_i^T g)^2 − n = 0.
Numerical rank p(n): suppose p(n) = max{J} such that σ_J > σ_1 ε. Then
F(λ) = Σ_{i=1}^{p(n)} (λ^2/(σ_i^2 + λ^2)) (u_i^T g)^2 + Σ_{i=p(n)+1}^n (u_i^T g)^2.
Weighting is applied such that we may assume
E(Σ_{i=p(n)+1}^n (u_i^T g)^2) = n − p(n).
Solve the reduced problem:
χ^{(p(n))}(λ^{(n)}) = Σ_{i=1}^{p(n)} ((λ^{(n)})^2/((σ_i^{(n)})^2 + (λ^{(n)})^2)) ((u_i^{(n)})^T g^{(n)})^2 − p(n) = 0.
Comments on weighting and the number of terms
The numerical rank p(n) determines the number of significant terms, those uncontaminated by numerical noise.
Convergence of the SVD to the SVE gives convergence of the estimates from M(λ), U(λ) and χ(λ).
The convergence speed depends on the estimate Δ^{(n)}, which measures the distance between the continuous and discrete norms of the kernel.
Downsample to find λ^{(n)}, then find
λ^{(N)} = √(Δ^{(n)}/Δ^{(N)}) λ^{(n)}.
Example of convergence for Δ: Regularization Tools gravity
Kernel conditioning depends on d:
h(s,t) = d/(d^2 + (s − t)^2)^{3/2},
f(t) = sin(πt) + .5 sin(2πt).
Exactly,
‖h‖_2^2 = ∫_0^1 ∫_0^1 d^2/(d^2 + (s − t)^2)^3 ds dt = (3 arctan(1/d) + d/(d^2 + 1))/(4d^3).
For d = .25, ‖h‖_2^2 ≈ 67.404, and for d = .5, ‖h‖_2^2 ≈ 7.443.
Convergence of ∆(n)
Figure: (Δ^{(n)})^2 against n, for d = .25 and d = .5.
Simulations
1. Set up the large scale matrix of size N. Set the noise level η.
2. Set up the right-hand side sample data of size N. Add noise: g = g + η randn(size(g)) max(|g|).
3. Downsample at rate r: Δ^{(N)} = 1/N, Δ^{(n)} = rΔ^{(N)}.
4. A^{(n)} = A(1 : r : end, 1 : r : end), g^{(n)} = g(1 : r : end).
5. Scale to coefficients: g^{(n)} = √(Δ^{(n)}) g^{(n)}, A^{(n)} = r A^{(n)}.
6. Calculate the regularization parameter for the downsampled data of size n: λ^{(N)} = √(Δ^{(n)}/Δ^{(N)}) λ^{(n)}.
7. Calculate the Galerkin coefficients f^{(N)}.
8. Scale the Galerkin coefficients to samples: f^{(N)} = f^{(N)}/√(Δ^{(N)}).
Example for d = 0.5, N = 1500, noise level .01
Figure: Solutions and data, noise level .01.
Example for d = 0.5, N = 1500, noise level .1
Figure: Solutions and data, noise level .1.
Projection Tomography, Regularization Tools tomo, noise .01, UPRE
Figure: Noise level .01, UPRE.
Projection Tomography, Regularization Tools tomo, noise .01, GCV
Figure: Noise level .01, GCV.
Projection Tomography, Regularization Tools tomo, noise .01, χ^2
Figure: Noise level .01, χ^2.
Summary for SVE-SVD
The kernel must be square integrable.
The decay of the singular values must be fast enough that the numerical rank satisfies p* ≪ N for a problem of size N.
The regularization parameter is then found for p* ≤ n ≪ N.
A partial SVD is calculated for the large scale problem.
The solution of the problem of size N depends on finding p*, and λ^{(n)} from a problem of size n.
Cost: considerable computational savings.
Large Scale Problems: without square integrability
The Iterative Scheme, LSQR: let β_1 := ‖b‖_2 and let e_1^{(t+1)} be the first column of I_{t+1}. Generate lower bidiagonal B_t ∈ R^{(t+1)×t} and column orthonormal H_{t+1} ∈ R^{m×(t+1)}, G_t ∈ R^{n×t}, with
A G_t = H_{t+1} B_t, β_1 H_{t+1} e_1^{(t+1)} = b.
Projected problem on the projected space:
w_t(ζ) = argmin_{w∈R^t} {‖B_t w − β_1 e_1^{(t+1)}‖_2^2 + ζ^2 ‖w‖_2^2}.
The projected solution depends on ζ_opt:
x_t = G_t w_t(ζ_opt).
Generally ζ_opt ≠ λ_opt.
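A sketch of t steps of Golub-Kahan (LSQR) bidiagonalization with full reorthogonalization, assuming a consistent pair A, b from one of the earlier sketches; t is illustrative.

    % Golub-Kahan bidiagonalization: A*G = H*B, beta1*H(:,1) = b
    t = 20;  [m, n] = size(A);
    H = zeros(m, t+1);  G = zeros(n, t);  B = zeros(t+1, t);
    beta1 = norm(b);  H(:,1) = b/beta1;
    for j = 1:t
        v = A'*H(:,j);
        if j > 1, v = v - B(j,j-1)*G(:,j-1); end
        v = v - G(:,1:j-1)*(G(:,1:j-1)'*v);       % reorthogonalize (see later slides)
        B(j,j) = norm(v);    G(:,j)   = v/B(j,j);
        u = A*G(:,j) - B(j,j)*H(:,j);
        u = u - H(:,1:j)*(H(:,1:j)'*u);           % reorthogonalize
        B(j+1,j) = norm(u);  H(:,j+1) = u/B(j+1,j);
    end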
Regularization of the LSQR solution: Questions
(i) Determine the optimal t: the choice of the subspace impacts the regularizing properties of the iteration. For large t, noise due to numerical precision and error enters the projected space.
(ii) Determine the optimal ζ: how do regularization parameter techniques translate to the projected problem?
(iii) Relation between the optimal ζ and the optimal λ: given t, how well does the optimal ζ for the projected space yield the optimal λ for the full space, and when is this the case?
Needed Properties and Definitions
Interlacing: the singular values γ_i of B_t and σ_i of A interlace,
σ_1 ≥ γ_1 ≥ σ_2 ≥ ⋯ ≥ γ_t ≥ σ_{t+1} ≥ 0.
Residuals: full, r_full(x_t), and projected, r_proj(w_t),
r_full(x_t) = A x_t − b = A G_t w_t − β_1 H_{t+1} e_1^{(t+1)} = H_{t+1}(B_t w_t − β_1 e_1^{(t+1)}) = H_{t+1} r_proj(w_t).
Pseudoinverse: with A^†(λ) the pseudoinverse of [A; λI],
w_t(ζ) = β_1 (B_t^T B_t + ζ^2 I_t)^{−1} B_t^T e_1^{(t+1)} = β_1 B_t^†(ζ) e_1^{(t+1)} = (G_t^T A^T A G_t + ζ^2 I_t)^{−1} G_t^T A^T b = (A G_t)^†(ζ) b.
Influence: A(λ) = A A^†(λ) is the influence matrix; likewise (A G_t)(ζ) = A G_t (A G_t)^†(ζ).
Calculating the Unbiased Predictive Risk using w_t(λ) [RVA15]
Full problem:
λ_opt = argmin_λ {U_full(λ)} = argmin_λ {‖r_full(x(λ))‖_2^2 + 2 Tr(A(λ)) − m}.
Using the projected solution for parameter λ, and Tr((A G_t)(λ)) = Tr(B_t(λ)),
U_full(λ) = ‖((A G_t)(λ) − I_m) b‖_2^2 + 2 Tr((A G_t)(λ)) − m = ‖β_1 (B_t(λ) − I_{t+1}) e_1^{(t+1)}‖_2^2 + 2 Tr(B_t(λ)) − m.
λ_opt for U_full(λ) can be estimated given the projected SVD.
Deriving UPRE for the projected problem
Is λ_opt relevant to ζ_opt for the projected problem?
Noise in the right-hand side: for b = b_true + η, η ∼ N(0, I_m),
β_1 e_1^{(t+1)} = H_{t+1}^T b = H_{t+1}^T b_true + H_{t+1}^T η.
The noise in the projected right-hand side β_1 e_1^{(t+1)} satisfies H_{t+1}^T η ∼ N(0, I_{t+1}).
Immediately,
U_proj(ζ) = ‖β_1 (B_t(ζ) − I_{t+1}) e_1^{(t+1)}‖_2^2 + 2 Tr(B_t(ζ)) − (t+1) = U_full(ζ) + m − (t+1).
The minimizer of U_proj(ζ) is the minimizer of U_full(ζ).
However, ζ_opt calculated for the projected problem may not yield λ_opt on the full problem: ζ_opt depends on t, while λ_opt depends on m* := min(m, n).
Trace relations: by linearity and cycling,
Tr(A(λ)) = Tr(A(A^T A + λ^2 I_n)^{−1} A^T) = n − λ^2 Tr((A^T A + λ^2 I_n)^{−1}) = m* − λ^2 Σ_{i=1}^{m*} (σ_i^2 + λ^2)^{−1}.
Immediately, Tr(B_t(ζ)) = t − ζ^2 Σ_{i=1}^t (γ_i^2 + ζ^2)^{−1}.
Interlacing: for σ_i ≈ γ_i, 1 ≤ i ≤ t, and σ_i^2/(σ_i^2 + λ^2) ≈ 0 for i > t,
Tr(A(λ)) = t − λ^2 Σ_{i=1}^t (σ_i^2 + λ^2)^{−1} + (m* − t) − λ^2 Σ_{i=t+1}^{m*} (σ_i^2 + λ^2)^{−1} ≈ Tr(B_t(λ)) + (m* − t) − λ^2 Σ_{i=t+1}^{m*} (σ_i^2 + λ^2)^{−1} ≈ Tr(B_t(λ)).
If t approximates the numerical rank of A, then ζ_opt ≈ λ_opt.
Other Estimation Techniques for the Projected Problem
GCV: in [CNO08] the weighted GCV is introduced for ω > 0,
G_proj(ζ, ω) = ‖r_proj(w_t(ζ))‖_2^2 / (Tr(ω B_t(ζ) − I_{t+1}))^2, with G(λ) = G_proj(λ, 1).
Optimal weight: analysing as for UPRE gives ω = (t+1)/m < 1.
Discrepancy principle: seek λ such that ‖r_full(x(λ))‖_2^2 = δ ≈ m; to avoid oversmoothing, δ = υm, υ > 1.
Discrepancy for the projected problem: seek ζ such that
‖r_proj(w_t(ζ))‖_2^2 ≈ δ_proj = υ(t+1).
In these cases we do not obtain ζ_opt ≈ λ_opt.
Identifying the optimal subspace size t
Noise revealing function [HPS09]: suppose θ_j and β_j lie on the diagonal and subdiagonal of B_t, and set
ρ(t) = Π_{j=1}^t (θ_j/β_{j+1}).
The optimal t is given by (for a user-determined t_min)
t_opt−ρ = min{argmax_{t>t_min} ρ(t)} + step.
step = 2 assures that noise has entered the entries of ρ(t), and hence the basis.
t_min is chosen based on examination of ρ(t).
Only useful if the discrete Picard condition holds [HPS09].
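A sketch of ρ(t) from the entries of the bidiagonal B computed above; t_min is illustrative, step = 2 as on the slide.

    % Noise revealing function from the bidiagonal matrix B
    theta = diag(B);  betas = diag(B, -1);     % theta_j and beta_{j+1}
    rho   = cumprod(theta./betas);             % rho(t) = prod_{j=1}^t theta_j/beta_{j+1}
    tmin  = 5;  step = 2;
    [~, i] = max(rho(tmin+1:end));             % first maximizer over t > tmin
    topt_rho = tmin + i + step;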
Identifying the optimal subspace size t: GCV
Minimize the GCV for the truncated SVD of B_{t*} [CKO15], where the projected subspace size is defined to be t*:
G(t) = (t*/(t* − t)^2) Σ_{i=t+1}^{t*} |u_i^T b|^2.
The optimal t is given by
t_opt−G = argmin_t G(t).
This does not require the Picard condition, but t_opt−G depends on t*.
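A sketch of G(t), assuming the right-hand side in the projected space is β_1 e_1, so that |u_i^T b| = β_1 |U_B(1, i)|; B and β_1 are from the bidiagonalization sketch above.

    % GCV over the truncation level for the SVD of B_{t*}
    [UB, ~, ~] = svd(B);  tstar = size(B, 2);
    c  = (beta1*UB(1, 1:tstar)').^2;           % |u_i^T (beta1 e_1)|^2
    Gt = arrayfun(@(t) tstar/(tstar - t)^2*sum(c(t+1:tstar)), 1:tstar-1);
    [~, topt_G] = min(Gt);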
Numerical Illustrations: 1-D Underdetermined Cases
Regularization Tools problems: phillips (Picard condition not satisfied), shaw (severely ill-posed), and gravity (d = .75 severely ill-posed, d = .25 less severe).
Noise levels: SNR ≈ −10 log10(η√m) for noise level η.
Underdetermined: m = 152 and n = 304, i.e. 50% undersampling.
Figure: Illustrative test data at high noise, η = .005, for 5 sample right-hand sides; exact data are solid lines. (a) phillips; (b) gravity, d = .25; (c) shaw; (d) gravity, d = .75.
Significance of Reorthogonalization: Clustering of singular values
Figure: Singular values against index for B_t, t = 1, 11, 21, ..., 71, compared with those of A, for phillips, gravity (d = .25), shaw and gravity (d = .75), with reorthogonalization (R, panels (a)-(d)) and without (NR, panels (e)-(h)). Notice the clustering of spectral values without high accuracy reorthogonalization.
Significance of reorthogonalization: Estimating t
Figure: Noise revealing function ρ(t) for five sample right-hand sides, with reorthogonalization (R, panels (a)-(d)) and without (NR, panels (e)-(h)), for phillips, gravity (d = .25), shaw and gravity (d = .75).
Average Relative Error over 50 samples: Reorthogonalized
Figure: Average relative error against t, with reorthogonalization, comparing Minimum, MDP, UPRE, GCV, WGCV and PMDP for (a) phillips, (b) gravity d = .25, (c) shaw, (d) gravity d = .75.
UPRE and WGCV may outperform GCV for small t
Observations
I For the most severe case, gravity d = .75, UPRE and WGCV yield optimal results.
I PMDP gives good results for small t but blows up.
I For these examples the optimal solutions live on a small projected space.
I Noise levels are high compared to the noise revealing paper [HPS09].
Two dimensional image deblurring [NPP04]: problem size 256 × 256
(a) True (b) Data (c) PSF
(d) True (e) Data (f) PSF
Figure: Data for the grain and satellite images, blurred by the given point spread function, with noise level 10%.
Noise Revealing Function ρ(t): comparing topt−ρ, topt−G , topt−min
Figure: ρ(t) using t_min = 25 for (a) grain (marked t = 29, 21, 27) and (b) satellite (marked t = 27, 71, 69). Dash-dot t_opt−ρ, magenta t_opt−G, and black t_opt−min, the location of the minimum of ρ(t) plus step.
Evaluating Image Quality : Relative error
Figure: Relative error (RE) with increasing t for (a) grain and (b) satellite, comparing Min, MDP, UPRE, GCV, WGCV and PMDP. The solid line (Proj) in each case is the solution with projection and without regularization.
UPRE, WGCV and PMDP outperform GCV
Solutions for different t_opt (MIN, t_opt−min, t_opt−G, t_opt−ρ), noise level 10%
(a) 22 (b) 21 (c) 27 (d) 29
(e) 42 (f) 71 (g) 69 (h) 27
Figure: UPRE used to find ζ; the solutions obtained for t_opt−ρ, t_opt−min and t_opt−G compared with the solution with minimum error, MIN.
The solutions are inadequate.
Iteratively Reweighted Regularization [LK83]
Minimum support stabilizer: regularization operator L^{(k)} with
(L^{(k)})_{ii} = ((x_i^{(k−1)} − x_i^{(k−2)})^2 + β^2)^{−1/2}, β > 0.
Focusing parameter: β ensures L^{(k)} is invertible.
Initialization: L^{(0)} = I, x^{(0)} = x_0 (which might be 0).
Invertibility: use (L^{(k)})^{−1} as a right preconditioner for A,
(L^{(k)})^{−1}_{ii} = ((x_i^{(k−1)} − x_i^{(k−2)})^2 + β^2)^{1/2}, β > 0.
Reduced system: when β = 0 and x_i^{(k−1)} = x_i^{(k−2)}, remove column i from the matrix.
Update equation: solve Ay ≈ r = b − A x^{(k−1)} on the retained columns. With the correct indexing, set y_i from the reduced solve if column i was updated, else y_i = 0, and
x^{(k)} = x^{(k−1)} + y.
Effectively this treats the iteration as time integration to stability.
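A sketch of the IRR loop, with a direct Tikhonov solve standing in for the hybrid LSQR solve of the talk and without the column-removal step; β, λ and kmax are illustrative, and A, b are a consistent pair from one of the earlier sketches.

    % Iteratively reweighted regularization, (L^(k))^{-1} as right preconditioner
    betaf = 1e-4;  lam = 1e-2;  kmax = 3;  n = size(A, 2);
    x = zeros(n, 1);  xold = x;                  % x^(0) = x0 = 0
    for k = 1:kmax
        if k == 1
            Linv = ones(n, 1);                   % L^(0) = I
        else
            Linv = sqrt((x - xold).^2 + betaf^2);% diagonal of (L^(k))^{-1}
        end
        r    = b - A*x;                          % current residual
        Ap   = A.*Linv';                         % A (L^(k))^{-1}: scale columns
        yhat = (Ap'*Ap + lam^2*eye(n))\(Ap'*r);  % regularized solve (stand-in)
        xold = x;
        x    = x + Linv.*yhat;                   % y = (L^(k))^{-1} yhat; x^(k) = x^(k-1) + y
    end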
Solutions for t_opt after two steps of IRR (MIN, t_opt−min, t_opt−G, t_opt−ρ)
(a) 19, k = 3 (b) 21 (c) 27 (d) 29
(e) 35 (f) 71 (g) 69 (h) 27
Figure: IRR with k = 2; for grain the MIN solution is at t_opt−min, and k = 3 is shown.
The solutions are stabilized and less dependent on t.
Noise revealing function ρ(t) with k and relative error with k: 5% error
Figure: Determining t_opt with k for 5% noise using ρ(t): (a) grain RE, (b) grain ρ(t) (marked t = 21, 44, 27), (c) satellite RE, (d) satellite ρ(t) (marked t = 71, 42, 69). Dash-dot t_opt−ρ, magenta t_opt−G, black t_opt−min.
Stopping criteria:
I Errors decrease initially with k and then increase.
I Pick k, then look at ρ(t).
I Grain: noise enters at k = 4, so use k = 2.
I Satellite: noise enters at k = 3, so use k = 1.
Sparse tomographic reconstruction: walnut [HHK+15, HKK+13]
Projection problem: resolution of the data is 164 × 120. Downsampling 50%, 25%, e.g. m = 164 × 60, m = 164 × 30.
Resolution: the full problem is 164 × 164.
Figure: (a) ρ(t) for increasing sparsity, with 120, 60, 30 and 15 projections.
ρ(t) is quite consistent for small t and m.
Solutions at t_opt−ρ = 8
(b) True (c) UPRE, k = 0 (d) UPRE, k = 2
(e) Comparison (f) GCV, k = 0 (g) GCV, k = 2
Here t_min = 5, t_opt−ρ = 8, sampling at 12° intervals with 30 projections. The comparison is from [HKK+13], sparsity with a prior, at resolution 256 × 256.
Stabilized projection: no characteristic TV blocky structure.
Observations
UPRE/WGCV regularization parameter estimation is explained for the projected problem.
Underdetermined problems are also solved.
Iteratively reweighted regularization stabilizes the projected solution.
Sensitivity to the choice of t_opt is reduced by IRR.
t_opt can be estimated using ρ(t); use t_opt−min, as it is independent of other parameters.
Future: extend to more realistic large scale problems.
Conclusions: Large Scale Problems
Spectrum: estimate σ_i^{(n)} or γ_i efficiently.
Effective rank: t or p*.
Optimal λ^{(N)} for the large scale problem from the small scale: λ^{(n)} or ζ_opt.
Computational cost: applies to the smaller problem; the regularization parameter λ is determined from the smaller problem.
Rank: in both cases the solutions depend on a numerical rank.
Future: extend to more realistic large scale problems; extend the SVE to Kronecker product formulations; relate the GSVD to a GSVE and apply to more general operators.
[CKO15] Julianne M. Chung, Misha E. Kilmer, and Dianne P. O'Leary. A framework for regularization via operator approximation. SIAM Journal on Scientific Computing, 37(2):B332–B359, 2015.
[CNO08] Julianne Chung, James G. Nagy, and Dianne P. O'Leary. A weighted GCV method for Lanczos hybrid regularization. Electronic Transactions on Numerical Analysis, 28:149–167, 2008.
[Han88] P. C. Hansen. Computation of the singular value expansion. Computing, 40:185–199, 1988.
[HHK+15] Keijo Hämäläinen, Lauri Harhanen, Aki Kallonen, Antti Kujanpää, Esa Niemi, and Samuli Siltanen. Tomographic X-ray data of a walnut. arXiv preprint arXiv:1502.04064, 2015.
[HKK+13] Keijo Hämäläinen, Aki Kallonen, Ville Kolehmainen, Matti Lassas, Kati Niinimäki, and Samuli Siltanen. Sparse tomography. SIAM Journal on Scientific Computing, 35(3):B644–B665, 2013.
[HPS09] Iveta Hnětynková, Martin Plešinger, and Zdeněk Strakoš. The regularizing effect of the Golub-Kahan iterative bidiagonalization and revealing the noise level in the data. BIT Numerical Mathematics, 49(4):669–696, 2009.
[LK83] B. J. Last and K. Kubik. Compact gravity inversion. Geophysics, 48(6):713–721, 1983.
[NPP04] James G. Nagy, Katrina Palmer, and Lisa Perrone. Iterative methods for image deblurring: a Matlab object-oriented approach. Numerical Algorithms, 36(1):73–93, 2004.
[RVA15] R. A. Renaut, S. Vatankhah, and V. E. Ardestani. Hybrid and iteratively reweighted regularization by unbiased predictive risk and weighted GCV for projected systems, 2015. Submitted; available at http://arxiv.org/abs/1509.00096.