Techniques for Solution of Large Scale Inverse Problems: Image Restoration and Reconstruction
Rosemary Renaut
Collaborations with former UGs Jakob Hansen and Michael Horst (ASU), and Dr. Saeed Vatankhah (Tehran)
School of Mathematical and Statistical Sciences, Arizona State University
Seminar: Georgia State University
Outline
Background: Ill-posed Inverse Problems
  Tikhonov Regularization for Ill-Posed Problems
  Standard Approaches to Estimate the Regularization Parameter
Integral Equations and the Singular Value Expansion
  Estimating the SVE via the SVD
  Numerical Rank: Truncation
  Impact on Regularization Parameter Estimation
  Numerical Examples
Iterative Solution of the LS Problem: Hybrid LSQR
  Finding the Regularization Parameter
  Identifying the Optimal Subspace
  Numerical Examples
Conclusions
Solving Practical Linear Systems of Equations
Forward Problem: given x and A, calculate b:
b = Ax
Inverse Problem: given b and A, find x, where A is invertible:
x = A^{-1}b
In practice this can be complicated and depends on many factors.
Illustration: Forward Problem, a blurred signal
Matrix A is obtained from the kernel h:
h(s) = (1/(σ√(2π))) exp(−s^2/(2σ^2)), σ > 0.
The measured data b is obtained from the integral relation
g(s) = ∫_{−π}^{π} h(s − t) f(t) dt,
with b = Ax, b_i ≈ g(s_i), x_i ≈ f(t_i).
Inverse Problem: given A and b, find x = A^{-1}b.
The solution depends on the conditioning of A and on the noise in the measurements b. The condition number of A is 1.8679e+05; the restoration shown uses noise level 0.0001.
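To make the illustration concrete, here is a minimal MATLAB sketch, assuming a midpoint-rule discretization of the Gaussian kernel on [−π, π]; the grid size n, width σ, and the boxcar test signal are illustrative choices, not taken from the slides.

    % Midpoint-rule discretization of the Gaussian blur kernel on [-pi, pi]
    n     = 200;  sigma = 0.1;                  % illustrative grid size and blur width
    dt    = 2*pi/n;
    t     = -pi + ((1:n)' - 0.5)*dt;            % midpoints t_i = s_i
    [T,S] = meshgrid(t, t);                     % S(i,j) = s_i, T(i,j) = t_j
    A     = dt*exp(-(S - T).^2/(2*sigma^2))/(sigma*sqrt(2*pi));
    x_true = double(abs(t) < 1);                % illustrative piecewise-constant signal
    b      = A*x_true + 1e-4*randn(n,1);        % data at the noise level quoted above
    disp(cond(A))                               % large: the discrete problem is ill-conditioned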
Example of restoration of 2D image with low noise: Inverse Crime
Figure: (a) Original; (b), (d) noisy and blurred, noise level 10^{-5}; (c), (e) noisy and blurred, noise level 10^{-3}.
Summary: Ill-posed problem
Goals: Given data b and matrix A, find x.
Features: We may also want to accurately recover features of x, for example the locations of peaks in a signal.
Difficulties: The solution is very sensitive to the data.
Ill-Posed (according to Hadamard): A problem is ill-posed if it does not satisfy the conditions for well-posedness, i.e. if
1. b ∉ range(A), or
2. the inverse is not unique because more than one image is mapped to the same data, or
3. an arbitrarily small change in b can cause an arbitrarily large change in x.
Spectral Decomposition of the Solution: The SVD
Consider the general overdetermined discrete problem
Ax = b, A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, m ≥ n.
The singular value decomposition (SVD) of A (full column rank)
A = UΣV^T = Σ_{i=1}^n σ_i u_i v_i^T, Σ = diag(σ_1, ..., σ_n),
gives an expansion for the solution
x = Σ_{i=1}^n (u_i^T b / σ_i) v_i.
Here u_i, v_i are the left and right singular vectors of A; the solution is a weighted linear combination of the basis vectors v_i.
Basis Vectors
Data in red and solution in black; solutions in blue using the truncated expansion
x = Σ_{i=1}^k (u_i^T b / σ_i) v_i.
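A minimal sketch of this expansion and its truncation, assuming A and b from the blur sketch above; the truncation level k is an illustrative choice.

    % SVD expansion of the solution, then truncation at k terms
    [U, Sig, V] = svd(A, 'econ');  s = diag(Sig);
    x_naive = V*((U'*b)./s);                       % full expansion: noise-dominated
    k       = 10;                                  % illustrative truncation level
    x_tsvd  = V(:,1:k)*((U(:,1:k)'*b)./s(1:k));    % truncated SVD solution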
The Filtered SVD: more general than truncation
The truncated SVD is a special case of spectral filtering,
x_filt = Σ_{i=1}^n q_i (u_i^T b / σ_i) v_i
with q_i = 0 for i > k; we have to find k. Spectral filtering damps the noisy components of the signal in the spectral basis.
Tikhonov regularization: q_i(λ) = σ_i^2/(σ_i^2 + λ^2), i = 1, ..., n, where λ is the regularization parameter, and the solution is
x(λ) = argmin_x {‖b − Ax‖^2 + λ^2 ‖x‖^2}.
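The filter factors drop directly into the same expansion; a sketch assuming U, s, V, b from the sketch above, with an illustrative λ.

    % Tikhonov-filtered SVD solution x(lambda)
    lambda = 1e-2;                       % illustrative regularization parameter
    q      = s.^2./(s.^2 + lambda^2);    % Tikhonov filter factors q_i(lambda)
    x_tik  = V*(q.*(U'*b)./s);           % damped weighted combination of the v_i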
1-D Interesting but Noisy Signal
Blur with a Gaussian and add noise: can we find the solution?
Solution for Increasing λ
Solutions x(λ): we have to find λ
General Formulation: Tikhonov Regularization for Ill-Posed Problems
Ill-posed equations in the presence of noise:
Ax ≈ b, A ∈ R^{m×n}, b = b_true + η, noise η ∼ N(0, C_η).
Tikhonov regularization:
x = argmin_{x∈R^n} {‖Ax − b‖^2_{W_η} + λ^2 ‖L(x − x_0)‖^2_2}
Mapping: L defines the basis for x.
Prior: x_0.
Weighting: W_η = C_η^{-1}, with ‖y‖^2_{W_η} = y^T W_η y; this whitens the noise in b.
Requires automatic estimation of λ.
Regularization Parameter Estimation using the SVD: Examples (m = n)
For L invertible, with SVD W_η^{1/2} A L^{-1} = UΣV^T, Σ = diag(σ_i), find λ_opt from one of the following.
Unbiased Predictive Risk (UPRE): minimize the functional
U(λ) = Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2))^2 (u_i^T b)^2 + 2 Σ_{i=1}^n σ_i^2/(σ_i^2 + λ^2).
Morozov Discrepancy Principle: given a parameter ν, solve
M(λ) = Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2))^2 (u_i^T b)^2 − νn = 0.
χ^2 principle: find the root of the functional
χ(λ) = Σ_{i=1}^n λ^2/(σ_i^2 + λ^2) (u_i^T b)^2 − n = 0.
GCV: minimize the rational function
G(λ) = (Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2))^2 (u_i^T b)^2)(Σ_{i=1}^n λ^2/(σ_i^2 + λ^2))^{−2}.
Not practical for large scale problems
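For a small problem all of these functionals are cheap to evaluate once the SVD is available; a sketch, assuming U, s, b as in the earlier sketches (m = n), with illustrative brackets for the optimizers.

    % Parameter-selection functionals evaluated via the SVD
    beta2 = (U'*b).^2;                            % (u_i^T b)^2
    phi   = @(lam) lam^2./(s.^2 + lam^2);         % 1 - q_i(lam), per index i
    Ufun  = @(lam) sum(phi(lam).^2.*beta2) + 2*sum(s.^2./(s.^2 + lam^2));
    Gfun  = @(lam) sum(phi(lam).^2.*beta2)/sum(phi(lam))^2;
    chif  = @(lam) sum(phi(lam).*beta2) - numel(s);    % chi^2 functional, for root finding
    lamU  = fminbnd(Ufun, 1e-8, 1e2);             % UPRE minimizer on the bracket
    lamG  = fminbnd(Gfun, 1e-8, 1e2);             % GCV minimizer on the bracket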
Large Scale Parameter Estimation
Expensive: the cost of forming the SVD (GSVD).
UPRE, GCV: minimize a nonlinear functional with no unique minimum.
MDP, χ^2: root finding; require a good estimate of the noise C_η.
L-curve: requires many solutions to trace out the L-curve.
Accuracy: for large scale problems the SVD and GSVD are contaminated by numerical noise.
Can we use a different approach for large scale problems? Two directions.
Integral Equations and Square Integrable Kernels
Integral equation:
g(s) = ∫_0^1 h(s,t) f(t) dt.
Square integrable kernel:
‖h‖_2^2 = ∫_0^1 ∫_0^1 h(s,t)^2 ds dt is finite.
Singular value expansion (SVE):
h(s,t) = Σ_{j=1}^∞ µ_j u_j(s) v_j(t).
Singular values: µ_1 ≥ µ_2 ≥ ⋯ ≥ 0, with Σ_{i=1}^∞ µ_i^2 bounded.
Singular functions: u_i(s), v_i(t), orthonormal with respect to the inner product ⟨φ, ψ⟩ = ∫_0^1 φ(t)ψ(t) dt.
Basis for L^2(0,1): for f, g ∈ L^2(0,1),
f(t) = Σ_i ⟨v_i, f⟩ v_i(t), g(s) = Σ_i ⟨u_i, g⟩ u_i(s).
Galerkin Method to Approximate the SVE by the SVD [Han88]
1. Choose orthonormal basis functions ψ_i^{(n)}(s) and φ_j^{(n)}(t).
2. Calculate the matrix A^{(n)} with entries
a_{ij}^{(n)} = ⟨h(s_i, t), φ_j^{(n)}(t)⟩, i, j = 1, ..., n.
3. Compute the SVD A^{(n)} = U^{(n)} Σ^{(n)} (V^{(n)})^T, with Σ^{(n)} = diag(σ_i^{(n)}), U^{(n)} = (u_{ij}^{(n)}), V^{(n)} = (v_{ij}^{(n)}).
4. Define, for j = 1 : n,
u_j^{(n)}(s) := Σ_{i=1}^n u_{ij}^{(n)} ψ_i(s), v_j^{(n)}(t) := Σ_{i=1}^n v_{ij}^{(n)} φ_i(t).
5. Then µ_j ≈ σ_j^{(n)}, u_j(s) ≈ u_j^{(n)}(s) and v_j(t) ≈ v_j^{(n)}(t).
The discrete SVD provides the SVE of a degenerate kernel
Theorem [Han88]: σ_j^{(n)}, u_j^{(n)}, v_j^{(n)} are the exact singular values and singular functions of the degenerate kernel
h^{(n)}(s,t) := Σ_{i=1}^n Σ_{j=1}^n a_{ij}^{(n)} ψ_i^{(n)}(s) φ_j^{(n)}(t).
Approximation Properties
Kernel: Δ^{(n)} = ‖h − h^{(n)}‖_2 = √(‖h‖_2^2 − ‖A^{(n)}‖_F^2).
Singular values: 0 ≤ µ_i − σ_i^{(n)} ≤ Δ^{(n)}, and σ_i^{(n)} ≤ σ_i^{(n+1)} ≤ µ_i.
Singular vectors: orthonormal, and for µ_i ≠ µ_{i+1},
max{‖u_i − u_i^{(n)}‖_2, ‖v_i − v_i^{(n)}‖_2} ≤ 2Δ^{(n)}/(µ_i − µ_{i+1}).
Coefficients: for g_i = ⟨g(s), u_i(s)⟩ and g_i^{(n)} = ⟨g(s), u_i^{(n)}⟩,
|g_i − g_i^{(n)}| ≤ √(2Δ^{(n)}/(µ_i − µ_{i+1})) ‖g‖.
Impact on the Solution
Continuous solution of the integral equation, for f(t) = Σ_i f_i v_i(t):
f(t) = Σ_{µ_i≠0} (⟨u_i(s), g(s)⟩/µ_i) v_i(t) = Σ_{µ_i≠0} (g_i/µ_i) v_i(t).
Picard condition: f is square integrable if
‖f‖_2^2 = ∫_0^1 f(t)^2 dt = Σ_{µ_i≠0} (⟨u_i(s), g(s)⟩/µ_i)^2 < ∞.
Coefficients: define g_i^{(n)} = ⟨g(s), ψ_i^{(n)}(s)⟩ to be the entries of the vector g^{(n)}.
Observation: investigate the Picard condition numerically via
g_i = ⟨u_i(s), g(s)⟩ ≈ ⟨u_i^{(n)}(s), g(s)⟩ = (u_i^{(n)})^T g^{(n)}.
The numerical solution inherits the ill-posedness of the continuous problem.
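A sketch of the numerical Picard check, assuming U, s from the SVD of the earlier sketches and the data vector b in place of g^{(n)}.

    % Discrete Picard condition: compare |u_i^T b| with sigma_i
    c = abs(U'*b);
    semilogy(1:numel(s), s, 'k-', 1:numel(s), c, 'b.', 1:numel(s), c./s, 'r+')
    legend('\sigma_i', '|u_i^T b|', '|u_i^T b|/\sigma_i')
    % the condition holds numerically while |u_i^T b| decays faster than sigma_i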
Discrete Solution
Samples of g(s) give g_i^{(n)} = ⟨g(s), ψ_i^{(n)}(s)⟩.
Integration of the kernel gives A_{ij}^{(n)} = a_{ij}^{(n)},
a_{ij}^{(n)} = ⟨h(s_i, t), φ_j^{(n)}(t)⟩, i, j = 1, ..., n.
The SVD of A^{(n)} yields the discrete solution for the coefficients f_j^{(n)},
f^{(n)} = Σ_{i=1}^n ((u_i^{(n)})^T g^{(n)} / σ_i^{(n)}) v_i^{(n)}.
Samples of f(t) are then obtained from
f(t) = Σ_j f_j^{(n)} φ_j(t).
Regularized Discrete Solution as a function of sample size n
Filtering uses the filter factors
q_i(λ^{(n)}) = (σ_i^{(n)})^2 / ((σ_i^{(n)})^2 + (λ^{(n)})^2).
Tikhonov regularization with regularization parameter λ^{(n)}:
f^{(n)}(λ^{(n)}) = Σ_{i=1}^n q_i(λ^{(n)}) ((u_i^{(n)})^T g^{(n)} / σ_i^{(n)}) v_i^{(n)}.
Use the SVD-SVE convergence to find λ^{(N)} for N ≫ n from λ^{(n)}.
Truncating the expansion
Theorem (Numerical Rank): For {p(n)} dependent on ε, such that µ_{J+1} ≤ µ_1 ε,
lim_{n→∞} p(n) = J := p*.
Hence
f^{(n)}(λ^{(n)}) ≈ Σ_{i=1}^{p(n)} q_i(λ^{(n)}) ((u_i^{(n)})^T g^{(n)} / σ_i^{(n)}) v_i^{(n)}.
The number of significant terms converges with n.
Practicality, to calculate A and g: Top Hat Functions, Δt = Δs := Δ
Basis functions: φ_i = ψ_i = ρ_i, where ρ_i is the top hat
ρ_i(t) = 1/√Δ for t ∈ [(i−1)Δ, iΔ], and 0 otherwise.
Integrate:
A_{ij}^{(n)} = (1/Δ) ∫_{(i−1)Δ}^{iΔ} ∫_{(j−1)Δ}^{jΔ} h(s,t) dt ds,
g_i^{(n)} = (1/√Δ) ∫_{(i−1)Δ}^{iΔ} g(s) ds.
Midpoint quadrature: A_{ij}^{(n)} = Δ h(s_i, t_j), g_i^{(n)} = √Δ g(s_i).
Solution samples: f(t_j) = f_j^{(n)}/√Δ.
Equivalently: given samples ĝ_i^{(n)} = g(s_i), estimate g_i^{(n)} and H_{ij}^{(n)}, find the coefficients f_j^{(n)}, and hence the samples f(t_j) = f_j^{(n)}/√Δ.
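A sketch of the midpoint construction, borrowing the gravity kernel and test profile from the later example; the size n and parameter d are illustrative.

    % Top-hat/midpoint discretization on [0,1]
    n      = 100;  Delta = 1/n;               % illustrative sample size
    sc     = ((1:n)' - 0.5)*Delta;            % midpoints s_i = t_i
    d      = 0.25;                            % gravity kernel parameter
    h      = @(S,T) d./(d^2 + (S - T).^2).^(3/2);
    [T,S]  = meshgrid(sc, sc);                % S(i,j) = s_i, T(i,j) = t_j
    A      = Delta*h(S, T);                   % A_ij = Delta h(s_i, t_j)
    f_true = sin(pi*sc) + 0.5*sin(2*pi*sc);   % profile from the gravity example
    g      = sqrt(Delta)*(A*f_true);          % coefficients g_i = sqrt(Delta) g(s_i)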
Sampling Dependent Regularization Parameter
Regularized solution:
f^{(n)}(λ^{(n)}) = Σ_{i=1}^n ((σ_i^{(n)})^2/((σ_i^{(n)})^2 + (λ^{(n)})^2)) ((u_i^{(n)})^T g^{(n)}/σ_i^{(n)}) v_i^{(n)}.
Tikhonov formulation:
f^{(n)}(λ^{(n)}) = argmin_{f∈R^n} {‖A^{(n)} f^{(n)} − g^{(n)}‖^2 + (λ^{(n)})^2 ‖f^{(n)}‖^2}.
Regularization on coefficients: ‖f^{(n)}‖^2 penalizes the coefficients.
Regularization on samples: in terms of the samples f̂^{(n)} = f^{(n)}/√Δ^{(n)}, (λ^{(n)})^2 ‖f^{(n)}‖^2 = Δ^{(n)} (λ^{(n)})^2 ‖f̂^{(n)}‖^2.
Noise in the reconstruction: we want the variance s_f^2 of the reconstructed samples f(t_j) to be independent of the sampling.
Sample independent regularization parameter:
1/s_f = λ_f = √(Δ^{(n)}) λ^{(n)}.
This relates parameters for N ≫ n: √(Δ^{(n)}) λ^{(n)} = √(Δ^{(N)}) λ^{(N)}.
Relating to Methods of Parameter Estimation: Example χ^2
Solve (dropping the dependence on n)
χ(λ) = F(λ) − n = Σ_{i=1}^n (λ^2/(σ_i^2 + λ^2)) (u_i^T g)^2 − n = 0.
Numerical rank p(n): suppose p(n) = max{J} such that σ_J > σ_1 ε. Then
F(λ) = Σ_{i=1}^{p(n)} (λ^2/(σ_i^2 + λ^2)) (u_i^T g)^2 + Σ_{i=p(n)+1}^n (u_i^T g)^2.
Weighting is applied such that we may assume
E(Σ_{i=p(n)+1}^n (u_i^T g)^2) = n − p(n).
Solve the reduced problem:
χ^{(p(n))}(λ^{(n)}) = Σ_{i=1}^{p(n)} ((λ^{(n)})^2/((σ_i^{(n)})^2 + (λ^{(n)})^2)) ((u_i^{(n)})^T g^{(n)})^2 − p(n) = 0.
Comments on weighting and the number of terms
The numerical rank p(n) determines the number of significant terms, those uncontaminated by numerical noise.
Convergence of the SVD to the SVE gives convergence of the estimates from M(λ), U(λ) and χ(λ).
The convergence speed depends on the estimate Δ^{(n)}, which measures the distance between the continuous and discrete norms of the kernel.
Downsample to find λ^{(n)}, then find
λ^{(N)} = √(Δ^{(n)}/Δ^{(N)}) λ^{(n)}.
Example of convergence for Δ: Regularization Tools gravity
Kernel conditioning depends on d:
h(s,t) = d/(d^2 + (s − t)^2)^{3/2},
f(t) = sin(πt) + .5 sin(2πt).
Exactly,
‖h‖_2^2 = ∫_0^1 ∫_0^1 d^2/(d^2 + (s − t)^2)^3 ds dt = (3 arctan(1/d) + d/(d^2 + 1))/(4d^3).
For d = .25, ‖h‖_2^2 ≈ 67.404, and for d = .5, ‖h‖_2^2 ≈ 7.443.
Convergence of ∆(n)
Figure: (Δ^{(n)})^2 against n, for d = .25 and d = .5.
Simulations
1. Set up the large scale matrix of size N. Set the noise level η.
2. Set up the right-hand side sample data of size N. Add noise: g = g + η randn(size(g)) max(|g|).
3. Downsample at rate r: Δ^{(N)} = 1/N, Δ^{(n)} = rΔ^{(N)}.
4. A^{(n)} = A(1 : r : end, 1 : r : end), g^{(n)} = g(1 : r : end).
5. Scale to coefficients: g^{(n)} = √(Δ^{(n)}) g^{(n)}, A^{(n)} = r A^{(n)}.
6. Calculate the regularization parameter for the downsampled data of size n: λ^{(N)} = √(Δ^{(n)}/Δ^{(N)}) λ^{(n)}.
7. Calculate the Galerkin coefficients f^{(N)}.
8. Scale the Galerkin coefficients to samples: f^{(N)} = f^{(N)}/√(Δ^{(N)}).
Example for d = 0.5, N = 1500, noise level .01
Figure: Solutions and data, noise level .01.
Example for d = 0.5, N = 1500, noise level .1
Figure: Solutions and data, noise level .1.
Projection Tomography, Regularization Tools tomo, noise .01, UPRE
Figure: Noise level .01, UPRE.
Projection Tomography, Regularization Tools tomo, noise .01, GCV
Figure: Noise level .01, GCV.
Projection Tomography, Regularization Tools tomo, noise .01, χ^2
Figure: Noise level .01, χ^2.
Summary for SVE-SVD
The kernel must be square integrable.
The decay of the singular values must be fast enough that the numerical rank satisfies p* ≪ N for a problem of size N.
The regularization parameter is then found for p* ≤ n ≪ N.
A partial SVD is calculated for the large scale problem.
The solution of the problem of size N depends on finding p*, and λ^{(n)} from a problem of size n.
Cost: considerable computational savings.
Large Scale Problems: without square integrability
The Iterative Scheme, LSQR: let β_1 := ‖b‖_2 and let e_1^{(t+1)} be the first column of I_{t+1}. Generate lower bidiagonal B_t ∈ R^{(t+1)×t} and column orthonormal H_{t+1} ∈ R^{m×(t+1)}, G_t ∈ R^{n×t}, with
A G_t = H_{t+1} B_t, β_1 H_{t+1} e_1^{(t+1)} = b.
Projected problem on the projected space:
w_t(ζ) = argmin_{w∈R^t} {‖B_t w − β_1 e_1^{(t+1)}‖_2^2 + ζ^2 ‖w‖_2^2}.
The projected solution depends on ζ_opt:
x_t = G_t w_t(ζ_opt).
Generally ζ_opt ≠ λ_opt.
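A sketch of t steps of Golub-Kahan (LSQR) bidiagonalization with full reorthogonalization, assuming a consistent pair A, b from one of the earlier sketches; t is illustrative.

    % Golub-Kahan bidiagonalization: A*G = H*B, beta1*H(:,1) = b
    t = 20;  [m, n] = size(A);
    H = zeros(m, t+1);  G = zeros(n, t);  B = zeros(t+1, t);
    beta1 = norm(b);  H(:,1) = b/beta1;
    for j = 1:t
        v = A'*H(:,j);
        if j > 1, v = v - B(j,j-1)*G(:,j-1); end
        v = v - G(:,1:j-1)*(G(:,1:j-1)'*v);       % reorthogonalize (see later slides)
        B(j,j) = norm(v);    G(:,j)   = v/B(j,j);
        u = A*G(:,j) - B(j,j)*H(:,j);
        u = u - H(:,1:j)*(H(:,1:j)'*u);           % reorthogonalize
        B(j+1,j) = norm(u);  H(:,j+1) = u/B(j+1,j);
    end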
Regularization of the LSQR solution: Questions
(i) Determine the optimal t: the choice of the subspace impacts the regularizing properties of the iteration. For large t, noise due to numerical precision and error enters the projected space.
(ii) Determine the optimal ζ: how do regularization parameter techniques translate to the projected problem?
(iii) Relation between the optimal ζ and the optimal λ: given t, how well does the optimal ζ for the projected space yield the optimal λ for the full space, and when is this the case?
Needed Properties and Definitions
Interlacing: the singular values γ_i of B_t and σ_i of A interlace,
σ_1 ≥ γ_1 ≥ σ_2 ≥ ⋯ ≥ γ_t ≥ σ_{t+1} ≥ 0.
Residuals: full, r_full(x_t), and projected, r_proj(w_t),
r_full(x_t) = A x_t − b = A G_t w_t − β_1 H_{t+1} e_1^{(t+1)} = H_{t+1}(B_t w_t − β_1 e_1^{(t+1)}) = H_{t+1} r_proj(w_t).
Pseudoinverse: with A^†(λ) the pseudoinverse of [A; λI],
w_t(ζ) = β_1 (B_t^T B_t + ζ^2 I_t)^{−1} B_t^T e_1^{(t+1)} = β_1 B_t^†(ζ) e_1^{(t+1)} = (G_t^T A^T A G_t + ζ^2 I_t)^{−1} G_t^T A^T b = (A G_t)^†(ζ) b.
Influence: A(λ) = A A^†(λ) is the influence matrix; likewise (A G_t)(ζ) = A G_t (A G_t)^†(ζ).
Calculating the Unbiased Predictive Risk using w_t(λ) [RVA15]
Full problem:
λ_opt = argmin_λ {U_full(λ)} = argmin_λ {‖r_full(x(λ))‖_2^2 + 2 Tr(A(λ)) − m}.
Using the projected solution for parameter λ, and Tr((A G_t)(λ)) = Tr(B_t(λ)),
U_full(λ) = ‖((A G_t)(λ) − I_m) b‖_2^2 + 2 Tr((A G_t)(λ)) − m = ‖β_1 (B_t(λ) − I_{t+1}) e_1^{(t+1)}‖_2^2 + 2 Tr(B_t(λ)) − m.
λ_opt for U_full(λ) can be estimated given the projected SVD.
Deriving UPRE for the projected problem
Is λ_opt relevant to ζ_opt for the projected problem?
Noise in the right-hand side: for b = b_true + η, η ∼ N(0, I_m),
β_1 e_1^{(t+1)} = H_{t+1}^T b = H_{t+1}^T b_true + H_{t+1}^T η.
The noise in the projected right-hand side β_1 e_1^{(t+1)} satisfies H_{t+1}^T η ∼ N(0, I_{t+1}).
Immediately,
U_proj(ζ) = ‖β_1 (B_t(ζ) − I_{t+1}) e_1^{(t+1)}‖_2^2 + 2 Tr(B_t(ζ)) − (t+1) = U_full(ζ) + m − (t+1).
The minimizer of U_proj(ζ) is the minimizer of U_full(ζ).
However, ζ_opt calculated for the projected problem may not yield λ_opt on the full problem: ζ_opt depends on t, while λ_opt depends on m* := min(m, n).
Trace relations: by linearity and cycling,
Tr(A(λ)) = Tr(A(A^T A + λ^2 I_n)^{−1} A^T) = n − λ^2 Tr((A^T A + λ^2 I_n)^{−1}) = m* − λ^2 Σ_{i=1}^{m*} (σ_i^2 + λ^2)^{−1}.
Immediately, Tr(B_t(ζ)) = t − ζ^2 Σ_{i=1}^t (γ_i^2 + ζ^2)^{−1}.
Interlacing: for σ_i ≈ γ_i, 1 ≤ i ≤ t, and σ_i^2/(σ_i^2 + λ^2) ≈ 0 for i > t,
Tr(A(λ)) = t − λ^2 Σ_{i=1}^t (σ_i^2 + λ^2)^{−1} + (m* − t) − λ^2 Σ_{i=t+1}^{m*} (σ_i^2 + λ^2)^{−1} ≈ Tr(B_t(λ)) + (m* − t) − λ^2 Σ_{i=t+1}^{m*} (σ_i^2 + λ^2)^{−1} ≈ Tr(B_t(λ)).
If t approximates the numerical rank of A, then ζ_opt ≈ λ_opt.
Other Estimation Techniques for the Projected Problem
GCV: in [CNO08] the weighted GCV is introduced for ω > 0,
G_proj(ζ, ω) = ‖r_proj(w_t(ζ))‖_2^2 / (Tr(ω B_t(ζ) − I_{t+1}))^2, with G(λ) = G_proj(λ, 1).
Optimal weight: analysing as for UPRE gives ω = (t+1)/m < 1.
Discrepancy principle: seek λ such that ‖r_full(x(λ))‖_2^2 = δ ≈ m; to avoid oversmoothing, δ = υm, υ > 1.
Discrepancy for the projected problem: seek ζ such that
‖r_proj(w_t(ζ))‖_2^2 ≈ δ_proj = υ(t+1).
In these cases we do not obtain ζ_opt ≈ λ_opt.
Identifying the optimal subspace size t
Noise revealing function [HPS09]: suppose θ_j and β_j lie on the diagonal and subdiagonal of B_t, and set
ρ(t) = Π_{j=1}^t (θ_j/β_{j+1}).
The optimal t is given by (for a user-determined t_min)
t_opt−ρ = min{argmax_{t>t_min} ρ(t)} + step.
step = 2 assures that noise has entered the entries of ρ(t), and hence the basis.
t_min is chosen based on examination of ρ(t).
Only useful if the discrete Picard condition holds [HPS09].
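A sketch of ρ(t) from the entries of the bidiagonal B computed above; t_min is illustrative, step = 2 as on the slide.

    % Noise revealing function from the bidiagonal matrix B
    theta = diag(B);  betas = diag(B, -1);     % theta_j and beta_{j+1}
    rho   = cumprod(theta./betas);             % rho(t) = prod_{j=1}^t theta_j/beta_{j+1}
    tmin  = 5;  step = 2;
    [~, i] = max(rho(tmin+1:end));             % first maximizer over t > tmin
    topt_rho = tmin + i + step;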
Identifying the optimal subspace size t: GCV
Minimize the GCV for the truncated SVD of B_{t*} [CKO15], where the projected subspace size is defined to be t*:
G(t) = (t*/(t* − t)^2) Σ_{i=t+1}^{t*} |u_i^T b|^2.
The optimal t is given by
t_opt−G = argmin_t G(t).
This does not require the Picard condition, but t_opt−G depends on t*.
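A sketch of G(t), assuming the right-hand side in the projected space is β_1 e_1, so that |u_i^T b| = β_1 |U_B(1, i)|; B and β_1 are from the bidiagonalization sketch above.

    % GCV over the truncation level for the SVD of B_{t*}
    [UB, ~, ~] = svd(B);  tstar = size(B, 2);
    c  = (beta1*UB(1, 1:tstar)').^2;           % |u_i^T (beta1 e_1)|^2
    Gt = arrayfun(@(t) tstar/(tstar - t)^2*sum(c(t+1:tstar)), 1:tstar-1);
    [~, topt_G] = min(Gt);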
Numerical Illustrations: 1-D Underdetermined Cases
Regularization Tools problems: phillips (Picard condition not satisfied), shaw (severely ill-posed), and gravity (d = .75 severely ill-posed, d = .25 less severe).
Noise levels: SNR ≈ −10 log10(η√m) for noise level η.
Underdetermined: m = 152 and n = 304, i.e. 50% undersampling.
Figure: Illustrative test data at high noise, η = .005, for 5 sample right-hand sides; exact data are solid lines. (a) phillips; (b) gravity, d = .25; (c) shaw; (d) gravity, d = .75.
Significance of Reorthogonalization: Clustering of singular values
Figure: Singular values against index for B_t, t = 1, 11, 21, ..., 71, compared with those of A, for phillips, gravity (d = .25), shaw and gravity (d = .75), with reorthogonalization (R, panels (a)-(d)) and without (NR, panels (e)-(h)). Notice the clustering of spectral values without high accuracy reorthogonalization.
Significance of reorthogonalization: Estimating t
Figure: Noise revealing function ρ(t) for five sample right-hand sides, with reorthogonalization (R, panels (a)-(d)) and without (NR, panels (e)-(h)), for phillips, gravity (d = .25), shaw and gravity (d = .75).
Average Relative Error over 50 samples: Reorthogonalized
Figure: Average relative error against t, with reorthogonalization, comparing Minimum, MDP, UPRE, GCV, WGCV and PMDP for (a) phillips, (b) gravity d = .25, (c) shaw, (d) gravity d = .75.
UPRE and WGCV may outperform GCV for small t
Observations
I For the most severe case, gravity d = .75, UPRE and WGCV yield optimal results.
I PMDP gives good results for small t but blows up.
I For these examples the optimal solutions live on a small projected space.
I Noise levels are high compared to the noise revealing paper [HPS09].
Two dimensional image deblurring [NPP04]: problem size 256 × 256
(a) True (b) Data (c) PSF
(d) True (e) Data (f) PSF
Figure: Data for the grain and satellite images, blurred by the given point spread function, with noise level 10%.
Noise Revealing Function ρ(t): comparing topt−ρ, topt−G , topt−min
Figure: ρ(t) using t_min = 25 for (a) grain (marked t = 29, 21, 27) and (b) satellite (marked t = 27, 71, 69). Dash-dot t_opt−ρ, magenta t_opt−G, and black t_opt−min, the location of the minimum of ρ(t) plus step.
Evaluating Image Quality : Relative error
Figure: Relative error (RE) with increasing t for (a) grain and (b) satellite, comparing Min, MDP, UPRE, GCV, WGCV and PMDP. The solid line (Proj) in each case is the solution with projection and without regularization.
UPRE, WGCV and PMDP outperform GCV
Solutions for different t_opt (MIN, t_opt−min, t_opt−G, t_opt−ρ), noise level 10%
(a) 22 (b) 21 (c) 27 (d) 29
(e) 42 (f) 71 (g) 69 (h) 27
Figure: UPRE used to find ζ; the solutions obtained for t_opt−ρ, t_opt−min and t_opt−G compared with the solution with minimum error, MIN.
The solutions are inadequate.
Iteratively Reweighted Regularization [LK83]
Minimum support stabilizer: regularization operator L^{(k)} with
(L^{(k)})_{ii} = ((x_i^{(k−1)} − x_i^{(k−2)})^2 + β^2)^{−1/2}, β > 0.
Focusing parameter: β ensures L^{(k)} is invertible.
Initialization: L^{(0)} = I, x^{(0)} = x_0 (which might be 0).
Invertibility: use (L^{(k)})^{−1} as a right preconditioner for A,
(L^{(k)})^{−1}_{ii} = ((x_i^{(k−1)} − x_i^{(k−2)})^2 + β^2)^{1/2}, β > 0.
Reduced system: when β = 0 and x_i^{(k−1)} = x_i^{(k−2)}, remove column i from the matrix.
Update equation: solve Ay ≈ r = b − A x^{(k−1)} on the retained columns. With the correct indexing, set y_i from the reduced solve if column i was updated, else y_i = 0, and
x^{(k)} = x^{(k−1)} + y.
Effectively this treats the iteration as time integration to stability.
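A sketch of the IRR loop, with a direct Tikhonov solve standing in for the hybrid LSQR solve of the talk and without the column-removal step; β, λ and kmax are illustrative, and A, b are a consistent pair from one of the earlier sketches.

    % Iteratively reweighted regularization, (L^(k))^{-1} as right preconditioner
    betaf = 1e-4;  lam = 1e-2;  kmax = 3;  n = size(A, 2);
    x = zeros(n, 1);  xold = x;                  % x^(0) = x0 = 0
    for k = 1:kmax
        if k == 1
            Linv = ones(n, 1);                   % L^(0) = I
        else
            Linv = sqrt((x - xold).^2 + betaf^2);% diagonal of (L^(k))^{-1}
        end
        r    = b - A*x;                          % current residual
        Ap   = A.*Linv';                         % A (L^(k))^{-1}: scale columns
        yhat = (Ap'*Ap + lam^2*eye(n))\(Ap'*r);  % regularized solve (stand-in)
        xold = x;
        x    = x + Linv.*yhat;                   % y = (L^(k))^{-1} yhat; x^(k) = x^(k-1) + y
    end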
Solutions for t_opt after two steps of IRR (MIN, t_opt−min, t_opt−G, t_opt−ρ)
(a) 19, k = 3 (b) 21 (c) 27 (d) 29
(e) 35 (f) 71 (g) 69 (h) 27
Figure: IRR with k = 2; for grain the MIN solution is at t_opt−min, and k = 3 is shown.
The solutions are stabilized and less dependent on t.
Noise revealing function ρ(t) with k and relative error with k: 5% error
Figure: Determining t_opt with k for 5% noise using ρ(t): (a) grain RE, (b) grain ρ(t) (marked t = 21, 44, 27), (c) satellite RE, (d) satellite ρ(t) (marked t = 71, 42, 69). Dash-dot t_opt−ρ, magenta t_opt−G, black t_opt−min.
Stopping criteria:
I Errors decrease initially with k and then increase.
I Pick k, then look at ρ(t).
I Grain: noise enters at k = 4, so use k = 2.
I Satellite: noise enters at k = 3, so use k = 1.
Sparse tomographic reconstruction: walnut [HHK+15, HKK+13]
Projection problem: resolution of the data is 164 × 120. Downsampling 50%, 25%, e.g. m = 164 × 60, m = 164 × 30.
Resolution: the full problem is 164 × 164.
Figure: (a) ρ(t) for increasing sparsity, with 120, 60, 30 and 15 projections.
ρ(t) is quite consistent for small t and m.
Solutions at t_opt−ρ = 8
(b) True (c) UPRE, k = 0 (d) UPRE, k = 2
(e) Comparison (f) GCV, k = 0 (g) GCV, k = 2
Here t_min = 5, t_opt−ρ = 8, sampling at 12° intervals with 30 projections. The comparison is from [HKK+13], sparsity with a prior, at resolution 256 × 256.
Stabilized projection: no characteristic TV blocky structure.
Observations
UPRE/WGCV regularization parameter estimation is explained for the projected problem.
Underdetermined problems are also solved.
Iteratively reweighted regularization stabilizes the projected solution.
Sensitivity to the choice of t_opt is reduced by IRR.
t_opt can be estimated using ρ(t); use t_opt−min, as it is independent of other parameters.
Future: extend to more realistic large scale problems.
Conclusions: Large Scale Problems
Spectrum: estimate σ_i^{(n)} or γ_i efficiently.
Effective rank: t or p*.
Optimal λ^{(N)} for the large scale problem from the small scale: λ^{(n)} or ζ_opt.
Computational cost: applies to the smaller problem; the regularization parameter λ is determined from the smaller problem.
Rank: in both cases the solutions depend on a numerical rank.
Future: extend to more realistic large scale problems; extend the SVE to Kronecker product formulations; relate the GSVD to a GSVE and apply to more general operators.
[CKO15] Julianne M. Chung, Misha E. Kilmer, and Dianne P. O'Leary. A framework for regularization via operator approximation. SIAM Journal on Scientific Computing, 37(2):B332–B359, 2015.
[CNO08] Julianne Chung, James G. Nagy, and Dianne P. O'Leary. A weighted GCV method for Lanczos hybrid regularization. Electronic Transactions on Numerical Analysis, 28:149–167, 2008.
[Han88] P. C. Hansen. Computation of the singular value expansion. Computing, 40:185–199, 1988.
[HHK+15] Keijo Hämäläinen, Lauri Harhanen, Aki Kallonen, Antti Kujanpää, Esa Niemi, and Samuli Siltanen. Tomographic X-ray data of a walnut. arXiv preprint arXiv:1502.04064, 2015.
[HKK+13] Keijo Hämäläinen, Aki Kallonen, Ville Kolehmainen, Matti Lassas, Kati Niinimäki, and Samuli Siltanen. Sparse tomography. SIAM Journal on Scientific Computing, 35(3):B644–B665, 2013.
[HPS09] Iveta Hnětynková, Martin Plešinger, and Zdeněk Strakoš. The regularizing effect of the Golub-Kahan iterative bidiagonalization and revealing the noise level in the data. BIT Numerical Mathematics, 49(4):669–696, 2009.
[LK83] B. J. Last and K. Kubik. Compact gravity inversion. Geophysics, 48(6):713–721, 1983.
[NPP04] James G. Nagy, Katrina Palmer, and Lisa Perrone. Iterative methods for image deblurring: a Matlab object-oriented approach. Numerical Algorithms, 36(1):73–93, 2004.
[RVA15] R. A. Renaut, S. Vatankhah, and V. E. Ardestani. Hybrid and iteratively reweighted regularization by unbiased predictive risk and weighted GCV for projected systems, 2015. Submitted; available at http://arxiv.org/abs/1509.00096.