TRANSCRIPT
SIGNAL AND IMAGE RESTORATION: SOLVING ILL-POSED INVERSE PROBLEMS - NUMERICAL ANALYSIS
Rosemary Renaut, http://math.asu.edu/~rosie
MBI WORKSHOP
FEBRUARY 19, 2013
Outline
- Background: Simple 1D Example; Signal Restoration; Feature Extraction - Total Variation
- The Least Squares Solution: Standard Analysis by the SVD; Picard Condition for Ill-Posed Problems; Importance of the Basis and Noise
- Generalized Regularization: GSVD for Examining the Solution; Revealing the Noise in the GSVD Basis; Stabilizing the GSVD Solution
- Applying to TV and the SB Algorithm: Parameter Estimation for the TV
- Conclusions and Future Work
Simple Example: Blurred Signal Restoration
b(t) = ∫_{−π}^{π} h(t, s) x(s) ds,   h(s) = 1/(σ√(2π)) exp(−s²/(2σ²)),   σ > 0.
Solve Ax ≈ b, where A describes the blur; here h(s, t) = h(s − t), but the blur is not always invariant.
Sensitivity to noise in the data: here the variance is 0.0001.
The solution depends on the conditioning of A, here 1.8679e+05.
Inverse crime: x = A⁻¹(b + n).
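A small Python/NumPy sketch (not the talk's code, which uses Matlab): discretise the Gaussian blur kernel on [−π, π], blur a simple signal, and see how the naive inverse behaves with and without noise. The grid size n = 200, σ = 0.3, and the box-shaped test signal are assumed values for illustration only.

```python
# Illustrative sketch: Gaussian blur matrix, inverse crime, and noise sensitivity.
import numpy as np

n, sigma = 200, 0.3
s = np.linspace(-np.pi, np.pi, n)
ds = s[1] - s[0]

# A[i, j] ~ h(t_i - s_j) * ds, a quadrature discretisation of the convolution integral
A = ds / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(s[:, None] - s[None, :])**2 / (2 * sigma**2))

x = (np.abs(s) < 1.0).astype(float)      # a simple piecewise-constant signal
b = A @ x                                # exact blurred data

print("cond(A) =", np.linalg.cond(A))
x_clean = np.linalg.solve(A, b)          # "inverse crime": noise-free data, x recovered well
b_noisy = b + np.sqrt(1e-4) * np.random.randn(n)   # noise with variance 0.0001
x_noisy = np.linalg.solve(A, b_noisy)    # noise amplified by the ill-conditioning of A
print("relative error, noisy inverse:", np.linalg.norm(x_noisy - x) / np.linalg.norm(x))
```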
Ill-conditioned Least Squares: Tikhonov Regularization
Solve ill-conditioned Ax ≈ b
Standard Tikhonov, L approximates a derivative operator
x(λ) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx‖₂² }

x(λ) solves the normal equations, provided null(L) ∩ null(A) = {0}:

(AᵀA + λ²LᵀL) x(λ) = Aᵀb
This is not good for preserving edges in solutions, nor for extracting features in images.
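A minimal sketch of the Tikhonov step via the normal equations above. The helper names tikhonov and first_derivative are hypothetical; L is a simple forward-difference approximation of a first derivative, and λ is supplied by the caller.

```python
# Sketch: solve (A^T A + lam^2 L^T L) x = A^T b, i.e. min 0.5||Ax-b||^2 + 0.5*lam^2||Lx||^2.
import numpy as np

def first_derivative(n):
    """(n-1) x n forward-difference operator approximating d/ds."""
    L = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    L[idx, idx], L[idx, idx + 1] = -1.0, 1.0
    return L

def tikhonov(A, b, L, lam):
    """Tikhonov-regularized least squares solution via the normal equations."""
    lhs = A.T @ A + lam**2 * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ b)
```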
Tikhonov Regularized Solutions x(λ) and derivative Lx for changing λ
Figure: We cannot capture Lx (red) from the solution (green). Notice that ‖Lx‖ decreases as λ increases.
Alternative Regularization: Total Variation
Consider general regularization R(x) suited to properties of x:
x(λ) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2) R(x) }
Suppose R is the total variation of x (more general options are possible), for example p-norm regularization with p < 2:

x(λ) = arg min_x { ‖Ax − b‖₂² + λ²‖Lx‖_p }

p = 1 approximates the total variation in the solution. The resulting optimization problem is difficult to solve at large scale.
Basic Formulation: Split Bregman (Goldstein and Osher, 2009)
Introduce d ≈ Lx and let R(x) = (λ²/2)‖d − Lx‖₂² + µ‖d‖₁:

(x, d)(λ, µ) = arg min_{x,d} { (1/2)‖Ax − b‖₂² + (λ²/2)‖d − Lx‖₂² + µ‖d‖₁ }
Alternating minimization separates steps for d from x
Various versions of the iteration can be defined. Fundamentally:
S1 : x^(k+1) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − (d^(k) − g^(k))‖₂² }

S2 : d^(k+1) = arg min_d { (λ²/2)‖d − (Lx^(k+1) + g^(k))‖₂² + µ‖d‖₁ }

S3 : g^(k+1) = g^(k) + Lx^(k+1) − d^(k+1).
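A compact sketch of the alternating iteration S1-S3, one possible version of it rather than the talk's implementation. It reuses the first_derivative helper sketched earlier; the number of iterations and the parameters λ and µ are illustrative.

```python
# Sketch of a split Bregman style iteration S1-S3 for the 1D problem.
import numpy as np

def split_bregman(A, b, lam, mu, n_iter=50):
    n = A.shape[1]
    L = first_derivative(n)
    d = np.zeros(n - 1)
    g = np.zeros(n - 1)
    for _ in range(n_iter):
        # S1: Tikhonov step with the shifted right-hand side h = d - g
        h = d - g
        lhs = A.T @ A + lam**2 * (L.T @ L)
        rhs = A.T @ b + lam**2 * (L.T @ h)
        x = np.linalg.solve(lhs, rhs)
        # S2: componentwise soft thresholding of Lx + g with threshold mu / lam^2
        c = L @ x + g
        d = np.sign(c) * np.maximum(np.abs(c) - mu / lam**2, 0.0)
        # S3: Lagrange multiplier (Bregman) update
        g = g + L @ x - d
    return x
```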
Advantages of the formulation
Update for g: updates the Lagrange multiplier g

S3 : g^(k+1) = g^(k) + Lx^(k+1) − d^(k+1).

This is just a vector update.

Update for d:
S2 : d = arg min_d { µ‖d‖₁ + (λ²/2)‖d − c‖₂² },   c = Lx + g
       = arg min_d { ‖d‖₁ + (γ/2)‖d − c‖₂² },   γ = λ²/µ.
This is achieved using soft thresholding.
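For reference, a one-function sketch of that soft thresholding (shrinkage) step: the minimizer of ‖d‖₁ + (γ/2)‖d − c‖₂² is obtained componentwise with threshold 1/γ = µ/λ². The function name is hypothetical.

```python
# Sketch: componentwise soft thresholding, d_i = sign(c_i) * max(|c_i| - 1/gamma, 0).
import numpy as np

def soft_threshold(c, gamma):
    return np.sign(c) * np.maximum(np.abs(c) - 1.0 / gamma, 0.0)
```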
Focus here: The Tikhonov Step of the Algorithm
S1 : x^(k+1) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − (d^(k) − g^(k))‖₂² }
Update for x:
x^(k+1) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − h^(k)‖₂² },   h^(k) = d^(k) − g^(k).
This is a standard least squares update using a Tikhonov regularizer. It depends on a changing right hand side and on the parameter λ.
The Tikhonov Update
Disadvantages of the formulation:
- Update for x: a Tikhonov LS update at each step
- Changing right hand side
- Regularization parameter λ - dependent on k?
- Threshold parameter µ - dependent on k?

Advantages of the formulation:
- Extensive literature on Tikhonov LS problems
- To determine stability, analyze the Tikhonov LS step of the algorithm:
  - understand the impact of the basis on the solution
  - the Picard condition
  - use the Generalized Singular Value Decomposition for analysis
Background: LS Solutions
- SVD for the LS solution
- Basis for the LS solution
- The Picard condition
- GSVD for the regularized solution
- Generalized Picard condition
Spectral Decomposition of the Solution: The SVD
Consider general overdetermined discrete problem
Ax = b,   A ∈ ℝ^{m×n}, b ∈ ℝ^m, x ∈ ℝ^n, m ≥ n.
Singular value decomposition (SVD) of A (full column rank)
A = UΣVᵀ = Σ_{i=1}^n u_i σ_i v_iᵀ,   Σ = diag(σ₁, . . . , σ_n),

gives the expansion for the solution

x = Σ_{i=1}^n (u_iᵀ b / σ_i) v_i
u_i, v_i are the left and right singular vectors of A. The solution is a weighted linear combination of the basis vectors v_i.
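A short sketch of that expansion in NumPy, exposing the coefficients u_iᵀb/σ_i that the rest of the analysis examines. The function name is hypothetical.

```python
# Sketch: least squares solution from its SVD expansion x = sum_i (u_i^T b / sigma_i) v_i.
import numpy as np

def svd_solution(A, b):
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ b) / svals        # the weights u_i^T b / sigma_i
    return Vt.T @ coeffs, coeffs
```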
Illustration: Example of the basis vectors
Problem wing, n = 84: cond(A) = 2.2056e+19, from the kernel h(s, t) in Hansen's Regularization Toolbox:

h(s, t) = s e^{−t s²},   x(s) = 1 for 1/3 < s < 2/3,   b(t) = (e^{−t/9} − e^{−4t/9})/(2t)
Left Singular Vectors and Basis Depend on the kernel (matrix A)
Figure: The first few left singular vectors ui and basis vectors vi.
Are the basis vectors accurate?
Use Matlab High Precision to examine the SVD
- Matlab digits allows high precision; the standard setting is 32.
- The Symbolic Toolbox allows operations on high precision variables with vpa.
- The SVD for vpa variables calculates the singular values symbolically, but not the singular vectors.
- Higher accuracy for the singular values generates higher accuracy singular vectors.
- Solutions with high precision can take advantage of Matlab's Symbolic Toolbox.
Left Singular Vectors and Basis Calculated in High Precision
Figure: The first few left singular vectors u_i and basis vectors v_i. Higher precision preserves the frequency content of the basis.

How many can we use in the solution for x? How does an inaccurate basis impact regularized solutions?
Technique to detect the true basis: the power spectrum for detecting white noise, a time series analysis technique
Suppose a given vector y is a time series indexed by position, i.e. by the index i.

Diagnostic 1: Does the histogram of the entries of y look consistent with y ∼ N(0, 1) (i.e. independent normally distributed entries with mean 0 and variance 1)? It is not practical to automatically inspect a histogram and make an assessment.

Diagnostic 2: Test the expectation that the y_i are selected from a white noise time series. Take the Fourier transform of y and form the cumulative periodogram z from the power spectrum c:
c_j = |(dft(y))_j|²,   z_j = Σ_{i=1}^j c_i / Σ_{i=1}^q c_i,   j = 1, . . . , q.

Automatic test: is the line (z_j, j/q) close to a straight line with slope 1 and length √5/2?
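A sketch of Diagnostic 2 in NumPy: compute the cumulative periodogram and its maximum deviation from the diagonal. Dropping the zero-frequency term is one common convention, and the Kolmogorov-Smirnov-type 5% bound quoted in the comment is an approximate rule of thumb, not the talk's exact test.

```python
# Sketch: cumulative periodogram white-noise diagnostic for a vector y.
import numpy as np

def cumulative_periodogram(y):
    y = np.asarray(y, dtype=float)
    c = np.abs(np.fft.rfft(y))**2          # power spectrum c_j = |dft(y)_j|^2
    c = c[1:]                               # drop the zero-frequency term (one convention)
    z = np.cumsum(c) / np.sum(c)            # z_j = sum_{i<=j} c_i / sum_i c_i
    q = len(z)
    deviation = np.max(np.abs(z - np.arange(1, q + 1) / q))
    return z, deviation                     # compare deviation with roughly 1.36/sqrt(q)
```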
Cumulative Periodogram for the left singular vectors
Figure: Standard precision on the left and high precision on the right. On the left more vectors are close to white than on the right. Vectors are white if the CP is close to the diagonal of the plot: low frequency vectors lie above the diagonal and high frequency vectors below it; white noise follows the diagonal.
Cumulative Periodogram for the basis vectors
Figure: Standard precision on the left and high precision on the right. On the left more vectors are close to white than on the right; if you count, there are 9 vectors with true frequency content on the left.
Measure Deviation from Straight Line: Basis Vectors
Figure: Testing for white noise for the standard precision vectors: calculate the cumulative periodogram and measure the deviation from the "white noise" line, or assess the proportion of the vector outside the Kolmogorov-Smirnov 5% confidence bounds for white noise. In this case the test suggests that about 9 vectors are noise free.

We cannot expect to use more than 9 vectors in the expansion for x. Additional terms are contaminated by noise, independent of the noise in b.
Discrete Picard condition: examine the weights of the expansion
Recall
x = Σ_{i=1}^n (u_iᵀ b / σ_i) v_i

Here |u_iᵀ b| / σ_i = O(1).
The ratios are not large, but are the values correct? Considering only the discrete Picard condition does not tell us whether the expansion for the solution is correct.
Discrete Picard condition: examine the weights of the expansion
- The high precision calculation of the σ_i shows that they decay exponentially to zero (down to machine precision).
- The Picard condition considers the ratios |u_iᵀ b| / σ_i for the data b; these also decay exponentially (down to machine precision).
- Note that the ratios in this case are O(1), hence noise-contaminated basis vectors are not ignored; i.e. to approximate a solution with a discontinuity we need all the basis vectors.
- We may obtain solutions by truncating the SVD:

x = Σ_{i=1}^k (u_iᵀ b / σ_i) v_i

- The parameter k is now a regularization parameter.
- For the given example we know k < 10, independent of the measured data b. We cannot see this from the Picard condition.
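A minimal sketch of the truncated SVD solution; k is supplied by the caller (e.g. k < 10 for the wing example, from the basis-noise test above). The function name is hypothetical.

```python
# Sketch: truncated SVD solution using the first k terms of the expansion.
import numpy as np

def tsvd_solution(A, b, k):
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :k].T @ b) / svals[:k]
    return Vt[:k].T @ coeffs
```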
The Truncated Solutions (Noise free data b) - inverse crime
Figure: Truncated SVD solutions, standard precision |u_iᵀ b|. Error in the basis contaminates the solution.
The Truncated Solutions (Noise free data b) - inverse crime
Figure: Truncated SVD solutions, VPA calculation of |u_iᵀ b|. The number of terms is not sufficient to represent the solution discontinuity.
Observations
- Even when committing the inverse crime we will not achieve the solution if we cannot approximate the basis correctly.
- We need all the basis vectors which contain the high frequency terms in order to approximate a solution with high frequency components - e.g. edges.
- Reminder: this is independent of the data.
- But it is an indication of an ill-posed problem: in this case the ill-posedness shows up in the decomposition of the matrix A.
- We now look at a problem with a smoother solution - what are the issues? Problem shaw from the Regularization Toolbox, defined on the interval [−π/2, π/2]:
h(s, t) = (cos(s) + cos(t))² (sin(u)/u)²,   u = π(sin(s) + sin(t))

x(t) = 2 exp(−6(t − 0.8)²) + exp(−2(t + 0.5)²)
Basis vectors for problem shaw
Figure: The first few basis vectors v_i (left) and their white noise content (right).
The Solutions with truncated SVD- problem shaw
Figure: Truncated SVD solutions; the data enters through the coefficients |u_iᵀ b|. On the left there is no noise in b; on the right, noise of level 10⁻⁴.

In this case the low frequency vectors can represent the solution, but we need to know the regularization parameter k.
Observations from the SVD analysis in presence of noise
- The number of terms k in the TSVD depends on the v_i.
- In practice the measured data are also contaminated by noise e:

x = Σ_{i=1}^n (u_iᵀ(b_exact + e) / σ_i) v_i = x_exact + Σ_{i=1}^n (u_iᵀ e / σ_i) v_i

- Note ‖Uᵀe‖ = ‖e‖, and ‖Uᵀb‖ = ‖b‖ is dominated by low frequency terms when A smooths x to give b.
- If e is uniform, we expect the |u_iᵀ e| to be of similar magnitude for all i.
- When σ_i << |u_iᵀ e|, the contribution of the high frequency error is magnified, and with it that of the basis vector v_i.
- The truncated SVD is a special case of spectral filtering:

x_filt = Σ_{i=1}^n φ_i (u_iᵀ b / σ_i) v_i

- Spectral filtering filters the components in the spectral basis so that noise in the signal is damped.
Regularization by Spectral Filtering: This is Tikhonov Regularization

x_Tik = Σ_{i=1}^n φ_i (u_iᵀ b / σ_i) v_i

- For Tikhonov regularization φ_i = σ_i² / (σ_i² + λ²), i = 1, . . . , n, where λ is the regularization parameter, and the solution is (a filtered-SVD sketch follows below)

x_Tik(λ) = arg min_x { ‖b − Ax‖² + λ²‖x‖² }
- The choice of λ² impacts the solution.
- It is exactly Tikhonov regularization that is used in the SB algorithm, but for a change of basis determined by L:

x_Tik(λ) = arg min_x { ‖b − Ax‖² + λ²‖Lx − h‖² }
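A sketch of the filtered-SVD form of Tikhonov regularization just described (the L = I case); the function name is hypothetical.

```python
# Sketch: Tikhonov regularization via SVD filter factors phi_i = sigma_i^2 / (sigma_i^2 + lam^2).
import numpy as np

def tikhonov_filtered(A, b, lam):
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    phi = svals**2 / (svals**2 + lam**2)
    return Vt.T @ (phi * (U.T @ b) / svals)
```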
The Generalized Singular Value Decomposition
Introduce a generalization of the SVD to obtain an expansion for x(λ) = arg min_x { ‖Ax − b‖² + λ²‖L(x − x₀)‖² }.

Lemma (GSVD)
Assume invertibility and m ≥ n ≥ p. There exist unitary matrices U ∈ ℝ^{m×m}, V ∈ ℝ^{p×p}, and a nonsingular matrix Z ∈ ℝ^{n×n} such that

A = U [ Υ ; 0_{(m−n)×n} ] Zᵀ,   L = V [ M, 0_{p×(n−p)} ] Zᵀ,

Υ = diag(ν₁, . . . , ν_p, 1, . . . , 1) ∈ ℝ^{n×n},   M = diag(µ₁, . . . , µ_p) ∈ ℝ^{p×p},

with

0 ≤ ν₁ ≤ · · · ≤ ν_p ≤ 1,   1 ≥ µ₁ ≥ · · · ≥ µ_p > 0,   ν_i² + µ_i² = 1, i = 1, . . . , p.

Use Υ̃ and M̃ to denote the rectangular matrices containing Υ and M, and note the generalized singular values ρ_i = ν_i/µ_i.
Solution of the Generalized Problem using the GSVD (h = 0)

x(λ) = Σ_{i=1}^p ( ν_i / (ν_i² + λ²µ_i²) ) (u_iᵀ b) z̃_i + Σ_{i=p+1}^n (u_iᵀ b) z̃_i

z̃_i is the i-th column of (Zᵀ)⁻¹. Equivalently, with filter factor φ_i,

x(λ) = Σ_{i=1}^p φ_i (u_iᵀ b / ν_i) z̃_i + Σ_{i=p+1}^n (u_iᵀ b) z̃_i,

Lx(λ) = Σ_{i=1}^p φ_i (u_iᵀ b / ρ_i) v_i,   φ_i = ρ_i² / (ρ_i² + λ²).

Notice the similarity with the filtered SVD solution

x(λ) = Σ_{i=1}^n φ_i (u_iᵀ b / σ_i) v_i,   φ_i = σ_i² / (σ_i² + λ²).
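A sketch that assembles the filtered GSVD solution from given factors. Computing the GSVD itself is assumed to be done elsewhere (e.g. Matlab's gsvd or Hansen's Regularization Tools); U, Z, nu, mu follow the lemma's notation, and the function name is hypothetical.

```python
# Sketch: x(lam) = sum_{i<=p} phi_i (u_i^T b / nu_i) ztilde_i + sum_{i>p} (u_i^T b) ztilde_i.
import numpy as np

def gsvd_filtered_solution(U, Z, nu, mu, b, lam):
    n, p = Z.shape[0], len(nu)
    Zt_inv = np.linalg.inv(Z.T)          # columns are the basis vectors ztilde_i
    utb = U[:, :n].T @ b                 # coefficients u_i^T b, i = 1..n
    rho = nu / mu                        # generalized singular values
    phi = rho**2 / (rho**2 + lam**2)     # filter factors
    coeffs = np.concatenate([phi * utb[:p] / nu, utb[p:]])
    return Zt_inv @ coeffs
```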
Does noise enter the GSVD basis?
Form the truncated matrix Ak using only k basis vectors vi
Figure: Contrast the GSVD basis vectors u (left) and z (right): a trade-off between U and Z.
Does noise enter the GSVD basis from using matrix A?
Figure: This is confirmed by examining the KS test for white noise
Example Solution: Problem phillips
Figure: Solution is robust to truncating the GSVD
Summary For the Tikhonov Least Squares Solution - intrinsic to TV
- Basis vectors are subject to noise and contaminate the solution, independent of the data.
- Solutions require finding k.
- We can determine k by examining noise in the basis.
- Using the GSVD we see that Tikhonov regularization yields a basis with smoothed basis vectors.
- But we can apply the same techniques.
- Stabilizing both the TSVD and TGSVD is robust to parameter estimation.
- Parameter estimation for truncated expansions is cheaper.
- Reminder: the truncation k is independent of the data b.
The TIK update of the TV algorithm
Update for x, using the GSVD basis Z:

x^(k+1) = Σ_{i=1}^p ( (ν_i u_iᵀ b) / (ν_i² + λ²µ_i²) + (λ²µ_i v_iᵀ h^(k)) / (ν_i² + λ²µ_i²) ) z_i + Σ_{i=p+1}^n (u_iᵀ b) z_i

A weighted combination of basis vectors z_i, with weights

(i)  ( ν_i² / (ν_i² + λ²µ_i²) ) (u_iᵀ b / ν_i)      (ii)  ( λ²µ_i² / (ν_i² + λ²µ_i²) ) (v_iᵀ h^(k) / µ_i)

Notice that (i) is fixed by b, but (ii) depends on the updates h^(k); i.e. (i) is iteration independent.
If (i) dominates (ii), the solution will converge slowly or not at all.
λ impacts the solution and must not over-damp (ii).
Update of the mapped data
x^(k+1) = Σ_{i=1}^p ( φ_i (u_iᵀ b / ν_i) + (1 − φ_i)(v_iᵀ h^(k) / µ_i) ) z_i + Σ_{i=p+1}^n (u_iᵀ b) z_i

Lx^(k+1) = Σ_{i=1}^p ( φ_i (u_iᵀ b / ρ_i) + (1 − φ_i) v_iᵀ h^(k) ) v_i.

Now the stability depends on the coefficients φ_i/ρ_i, 1 − φ_i, φ_i/ν_i and (1 − φ_i)/µ_i with respect to the inner products u_iᵀ b and v_iᵀ h^(k).
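A sketch of this SB Tikhonov step assembled in the GSVD basis, mixing the fixed data term φ_i u_iᵀb/ν_i with the iteration-dependent term (1 − φ_i) v_iᵀh^(k)/µ_i. As above, the GSVD factors are assumed given, the basis vectors are taken as the columns of (Zᵀ)⁻¹ as in the earlier solution formula, and the function name is hypothetical.

```python
# Sketch: x^(k+1) in the GSVD basis with the shifted right-hand side h = d^(k) - g^(k).
import numpy as np

def sb_tikhonov_update(U, V, Z, nu, mu, b, h, lam):
    n, p = Z.shape[0], len(nu)
    Zt_inv = np.linalg.inv(Z.T)
    utb = U[:, :n].T @ b
    rho = nu / mu
    phi = rho**2 / (rho**2 + lam**2)
    head = phi * utb[:p] / nu + (1.0 - phi) * (V.T @ h) / mu   # first p coefficients
    coeffs = np.concatenate([head, utb[p:]])
    return Zt_inv @ coeffs
```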
Examining the weights with λ increasing
Coefficients |u_iᵀ b| and |v_iᵀ h|, for λ = 0.001, 0.1 and 10, at k = 1, 10 and the final iteration.
Examining the weights with λ increasing
Coefficients of x^(k) and Lx^(k) for λ = 0.001, 0.1 and 10, at k = 1, 10 and the final iteration.
Examining the solutions with λ increasing
Solutions for λ = 0.001, 0.1 and 10.
Heuristic Observations
- It is important to analyze the components of the solution.
- The basis is important.
- Regularization parameters should be adjusted dynamically.
- Theoretical results concerning parameter estimation apply - L-curve, etc.
Unbiased Predictive Risk for Estimating λ in Tikhonov Regularization

Regularization matrix:

x(λ) = R(λ) b, where R(λ) = (AᵀA + λ²I)⁻¹Aᵀ, with influence matrix A(λ) = A R(λ).

Predictive error - cannot be calculated:

p(λ) = A R(λ) b − b̂ = A(λ)(b̂ + η) − b̂ = (A(λ) − I_m) b̂ + A(λ) η
                                         (deterministic)    (stochastic)

Predictive risk - can be calculated:

r(λ) = (A(λ) − I_m) b = (A(λ) − I_m) b̂ + (A(λ) − I_m) η
                          (deterministic)    (stochastic)

Inverse covariance weighting yields the UPRE functional

U(λ) := ‖r(λ)‖² + 2 trace(A(λ)) − m
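A sketch of evaluating U(λ) via the SVD and minimizing it over a grid; the data are assumed whitened (unit noise variance), as on the slide, and the grid of λ values is illustrative.

```python
# Sketch: UPRE functional U(lam) = ||r(lam)||^2 + 2 trace(A(lam)) - m, evaluated via the SVD.
import numpy as np

def upre(A, b, lambdas):
    m = A.shape[0]
    U, svals, _ = np.linalg.svd(A, full_matrices=False)
    utb = U.T @ b
    b_perp = b - U @ utb                      # component of b outside the column space of A
    values = []
    for lam in lambdas:
        phi = svals**2 / (svals**2 + lam**2)  # filter factors: A(lam) = U diag(phi) U^T
        resid = np.sum(((phi - 1.0) * utb)**2) + np.linalg.norm(b_perp)**2
        values.append(resid + 2.0 * np.sum(phi) - m)
    return np.array(values)

# Usage sketch: lam_grid = np.logspace(-4, 2, 200); lam_opt = lam_grid[np.argmin(upre(A, b, lam_grid))]
```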
Theoretical results: using Unbiased Predictive Risk for the SB Tik
Lemma
Suppose the noise in h^(k) is stochastic, and both the data fit Ax ≈ b and the derivative data fit Lx ≈ h are weighted by their inverse covariance matrices for normally distributed noise in b and h; then the optimal choice for λ at all steps is λ = 1.

Remark
Can we expect h^(k) to be stochastic?

Lemma
Suppose h^(k+1) is regarded as deterministic; then UPRE applied to find the optimal choice for λ at each step leads to a different optimum at each step, namely one that depends on h.

Remark
Because h changes at each step, the optimal choice for λ using UPRE will change with each iteration; λ varies over all steps.
Example Solution: 1D: Increasing Noise Levels
Observations
1. The results demonstrate that regularization parameter estimation is needed.
2. A choice based on the LS solution is useful but not sufficient.
3. Curiously, they demonstrate that statistical analysis is relevant.
4. For non piecewise-constant data the parameter choice λ = 1 is not appropriate.
Example Solution: 2D - various examples (SNR in title)
Further Observations and Future Work
- The results demonstrate that basic analysis of the problem is worthwhile.
- Extend standard parameter estimation from LS.
- The overhead of finding the optimal λ for the first step is reasonable.
- Stochastic interpretation: use the fixed value λ = 1 after the first iteration.
- Practically, use LSQR algorithms rather than the GSVD - this changes the basis to the Krylov basis. Or use the randomized SVD to generate a randomized GSVD.
- Noise in the basis is relevant for LSQR / CGLS iterative algorithms.
- Convergence testing is based on h.

Initial work - much to do.