TRANSCRIPT
SIGNAL AND IMAGE RESTORATION: SOLVING ILL-POSED INVERSE PROBLEMS - NUMERICAL ANALYSIS
Rosemary Renaut, http://math.asu.edu/~rosie
MBI WORKSHOP
FEBRUARY 19, 2013
Outline
- Background: Simple 1D Example; Signal Restoration; Feature Extraction - Total Variation
- The Least Squares Solution: Standard Analysis by the SVD; Picard Condition for Ill-Posed Problems; Importance of the Basis and Noise
- Generalized Regularization: GSVD for Examining the Solution; Revealing the Noise in the GSVD Basis; Stabilizing the GSVD Solution
- Applying to TV and the SB Algorithm: Parameter Estimation for the TV
- Conclusions and Future Work
Simple Example: Blurred Signal Restoration
b(t) = ∫_{−π}^{π} h(t, s) x(s) ds,   h(s) = 1/(σ√(2π)) exp(−s²/(2σ²)),   σ > 0.
Solve Ax ≈ b, where A describes the blur; here h(s, t) = h(s − t), but the blur is not always invariant.
Sensitivity to noise in the data: here the variance is 0.0001.
The solution depends on the conditioning of A, here 1.8679e+05.
Inverse crime: x = A⁻¹(b + n).
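A small Python/NumPy sketch (not the talk's code, which uses Matlab): discretise the Gaussian blur kernel on [−π, π], blur a simple signal, and see how the naive inverse behaves with and without noise. The grid size n = 200, σ = 0.3, and the box-shaped test signal are assumed values for illustration only.

```python
# Illustrative sketch: Gaussian blur matrix, inverse crime, and noise sensitivity.
import numpy as np

n, sigma = 200, 0.3
s = np.linspace(-np.pi, np.pi, n)
ds = s[1] - s[0]

# A[i, j] ~ h(t_i - s_j) * ds, a quadrature discretisation of the convolution integral
A = ds / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(s[:, None] - s[None, :])**2 / (2 * sigma**2))

x = (np.abs(s) < 1.0).astype(float)      # a simple piecewise-constant signal
b = A @ x                                # exact blurred data

print("cond(A) =", np.linalg.cond(A))
x_clean = np.linalg.solve(A, b)          # "inverse crime": noise-free data, x recovered well
b_noisy = b + np.sqrt(1e-4) * np.random.randn(n)   # noise with variance 0.0001
x_noisy = np.linalg.solve(A, b_noisy)    # noise amplified by the ill-conditioning of A
print("relative error, noisy inverse:", np.linalg.norm(x_noisy - x) / np.linalg.norm(x))
```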
Ill-conditioned Least Squares: Tikhonov Regularization
Solve ill-conditioned Ax ≈ b
Standard Tikhonov, L approximates a derivative operator
x(λ) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx‖₂² }

x(λ) solves the normal equations, provided null(L) ∩ null(A) = {0}:

(AᵀA + λ²LᵀL) x(λ) = Aᵀb
This is not good for preserving edges in solutions, nor for extracting features in images.
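A minimal sketch of the Tikhonov step via the normal equations above. The helper names tikhonov and first_derivative are hypothetical; L is a simple forward-difference approximation of a first derivative, and λ is supplied by the caller.

```python
# Sketch: solve (A^T A + lam^2 L^T L) x = A^T b, i.e. min 0.5||Ax-b||^2 + 0.5*lam^2||Lx||^2.
import numpy as np

def first_derivative(n):
    """(n-1) x n forward-difference operator approximating d/ds."""
    L = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    L[idx, idx], L[idx, idx + 1] = -1.0, 1.0
    return L

def tikhonov(A, b, L, lam):
    """Tikhonov-regularized least squares solution via the normal equations."""
    lhs = A.T @ A + lam**2 * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ b)
```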
Tikhonov Regularized Solutions x(λ) and derivative Lx for changing λ
Figure: We cannot capture Lx (red) from the solution (green). Notice that ‖Lx‖ decreases as λ increases.
Alternative Regularization: Total Variation
Consider general regularization R(x) suited to properties of x:
x(λ) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2) R(x) }
Suppose R is the total variation of x (more general options are possible), for example p-norm regularization with p < 2:

x(λ) = arg min_x { ‖Ax − b‖₂² + λ²‖Lx‖_p }

p = 1 approximates the total variation in the solution. The resulting optimization problem is difficult to solve at large scale.
Basic Formulation: Split Bregman (Goldstein and Osher, 2009)
Introduce d ≈ Lx and let R(x) = (λ²/2)‖d − Lx‖₂² + µ‖d‖₁:

(x, d)(λ, µ) = arg min_{x,d} { (1/2)‖Ax − b‖₂² + (λ²/2)‖d − Lx‖₂² + µ‖d‖₁ }
Alternating minimization separates steps for d from x
Various versions of the iteration can be defined. Fundamentally:
S1 : x^(k+1) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − (d^(k) − g^(k))‖₂² }

S2 : d^(k+1) = arg min_d { (λ²/2)‖d − (Lx^(k+1) + g^(k))‖₂² + µ‖d‖₁ }

S3 : g^(k+1) = g^(k) + Lx^(k+1) − d^(k+1).
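A compact sketch of the alternating iteration S1-S3, one possible version of it rather than the talk's implementation. It reuses the first_derivative helper sketched earlier; the number of iterations and the parameters λ and µ are illustrative.

```python
# Sketch of a split Bregman style iteration S1-S3 for the 1D problem.
import numpy as np

def split_bregman(A, b, lam, mu, n_iter=50):
    n = A.shape[1]
    L = first_derivative(n)
    d = np.zeros(n - 1)
    g = np.zeros(n - 1)
    for _ in range(n_iter):
        # S1: Tikhonov step with the shifted right-hand side h = d - g
        h = d - g
        lhs = A.T @ A + lam**2 * (L.T @ L)
        rhs = A.T @ b + lam**2 * (L.T @ h)
        x = np.linalg.solve(lhs, rhs)
        # S2: componentwise soft thresholding of Lx + g with threshold mu / lam^2
        c = L @ x + g
        d = np.sign(c) * np.maximum(np.abs(c) - mu / lam**2, 0.0)
        # S3: Lagrange multiplier (Bregman) update
        g = g + L @ x - d
    return x
```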
Advantages of the formulation
Update for g: updates the Lagrange multiplier g

S3 : g^(k+1) = g^(k) + Lx^(k+1) − d^(k+1).

This is just a vector update.

Update for d:
S2 : d = arg min_d { µ‖d‖₁ + (λ²/2)‖d − c‖₂² },   c = Lx + g
       = arg min_d { ‖d‖₁ + (γ/2)‖d − c‖₂² },   γ = λ²/µ.
This is achieved using soft thresholding.
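For reference, a one-function sketch of that soft thresholding (shrinkage) step: the minimizer of ‖d‖₁ + (γ/2)‖d − c‖₂² is obtained componentwise with threshold 1/γ = µ/λ². The function name is hypothetical.

```python
# Sketch: componentwise soft thresholding, d_i = sign(c_i) * max(|c_i| - 1/gamma, 0).
import numpy as np

def soft_threshold(c, gamma):
    return np.sign(c) * np.maximum(np.abs(c) - 1.0 / gamma, 0.0)
```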
Focus here: The Tikhonov Step of the Algorithm
S1 : x^(k+1) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − (d^(k) − g^(k))‖₂² }
Update for x:
x^(k+1) = arg min_x { (1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − h^(k)‖₂² },   h^(k) = d^(k) − g^(k).
This is a standard least squares update using a Tikhonov regularizer. It depends on a changing right hand side and on the parameter λ.
The Tikhonov Update
Disadvantages of the formulation:
- Update for x: a Tikhonov LS update at each step
- Changing right hand side
- Regularization parameter λ - dependent on k?
- Threshold parameter µ - dependent on k?

Advantages of the formulation:
- Extensive literature on Tikhonov LS problems
- To determine stability, analyze the Tikhonov LS step of the algorithm:
  - understand the impact of the basis on the solution
  - the Picard condition
  - use the Generalized Singular Value Decomposition for analysis
Background: LS Solutions
- SVD for the LS solution
- Basis for the LS solution
- The Picard condition
- GSVD for the regularized solution
- Generalized Picard condition
Spectral Decomposition of the Solution: The SVD
Consider general overdetermined discrete problem
Ax = b,   A ∈ ℝ^{m×n}, b ∈ ℝ^m, x ∈ ℝ^n, m ≥ n.
Singular value decomposition (SVD) of A (full column rank)
A = UΣVᵀ = Σ_{i=1}^n u_i σ_i v_iᵀ,   Σ = diag(σ₁, . . . , σ_n),

gives the expansion for the solution

x = Σ_{i=1}^n (u_iᵀ b / σ_i) v_i
u_i, v_i are the left and right singular vectors of A. The solution is a weighted linear combination of the basis vectors v_i.
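A short sketch of that expansion in NumPy, exposing the coefficients u_iᵀb/σ_i that the rest of the analysis examines. The function name is hypothetical.

```python
# Sketch: least squares solution from its SVD expansion x = sum_i (u_i^T b / sigma_i) v_i.
import numpy as np

def svd_solution(A, b):
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ b) / svals        # the weights u_i^T b / sigma_i
    return Vt.T @ coeffs, coeffs
```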
Illustration: Example of the basis vectors
Problem wing, n = 84: cond(A) = 2.2056e+19, from the kernel h(s, t) in Hansen's Regularization Toolbox:

h(s, t) = s e^{−t s²},   x(s) = 1 for 1/3 < s < 2/3,   b(t) = (e^{−t/9} − e^{−4t/9})/(2t)
Left Singular Vectors and Basis Depend on the kernel (matrix A)
Figure: The first few left singular vectors ui and basis vectors vi.
Are the basis vectors accurate?
Use Matlab High Precision to examine the SVD
- Matlab digits allows high precision; the standard setting is 32.
- The Symbolic Toolbox allows operations on high precision variables with vpa.
- The SVD for vpa variables calculates the singular values symbolically, but not the singular vectors.
- Higher accuracy for the singular values generates higher accuracy singular vectors.
- Solutions with high precision can take advantage of Matlab's Symbolic Toolbox.
Left Singular Vectors and Basis Calculated in High Precision
Figure: The first few left singular vectors u_i and basis vectors v_i. Higher precision preserves the frequency content of the basis.

How many can we use in the solution for x? How does an inaccurate basis impact regularized solutions?
Technique to detect the true basis: the power spectrum for detecting white noise, a time series analysis technique
Suppose a given vector y is a time series indexed by position, i.e. by the index i.

Diagnostic 1: Does the histogram of the entries of y look consistent with y ∼ N(0, 1) (i.e. independent normally distributed entries with mean 0 and variance 1)? It is not practical to automatically inspect a histogram and make an assessment.

Diagnostic 2: Test the expectation that the y_i are selected from a white noise time series. Take the Fourier transform of y and form the cumulative periodogram z from the power spectrum c:
c_j = |(dft(y))_j|²,   z_j = Σ_{i=1}^j c_i / Σ_{i=1}^q c_i,   j = 1, . . . , q.

Automatic test: is the line (z_j, j/q) close to a straight line with slope 1 and length √5/2?
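A sketch of Diagnostic 2 in NumPy: compute the cumulative periodogram and its maximum deviation from the diagonal. Dropping the zero-frequency term is one common convention, and the Kolmogorov-Smirnov-type 5% bound quoted in the comment is an approximate rule of thumb, not the talk's exact test.

```python
# Sketch: cumulative periodogram white-noise diagnostic for a vector y.
import numpy as np

def cumulative_periodogram(y):
    y = np.asarray(y, dtype=float)
    c = np.abs(np.fft.rfft(y))**2          # power spectrum c_j = |dft(y)_j|^2
    c = c[1:]                               # drop the zero-frequency term (one convention)
    z = np.cumsum(c) / np.sum(c)            # z_j = sum_{i<=j} c_i / sum_i c_i
    q = len(z)
    deviation = np.max(np.abs(z - np.arange(1, q + 1) / q))
    return z, deviation                     # compare deviation with roughly 1.36/sqrt(q)
```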
Cumulative Periodogram for the left singular vectors
Figure: Standard precision on the left and high precision on the right. On the left more vectors are close to white than on the right. Vectors are white if the CP is close to the diagonal of the plot: low frequency vectors lie above the diagonal and high frequency vectors below it; white noise follows the diagonal.
Cumulative Periodogram for the basis vectors
Figure: Standard precision on the left and high precision on the right. On the left more vectors are close to white than on the right; if you count, there are 9 vectors with true frequency content on the left.
Measure Deviation from Straight Line: Basis Vectors
Figure: Testing for white noise for the standard precision vectors: calculate the cumulative periodogram and measure the deviation from the "white noise" line, or assess the proportion of the vector outside the Kolmogorov-Smirnov 5% confidence bounds for white noise. In this case the test suggests that about 9 vectors are noise free.

We cannot expect to use more than 9 vectors in the expansion for x. Additional terms are contaminated by noise, independent of the noise in b.
Discrete Picard condition: examine the weights of the expansion
Recall
x = Σ_{i=1}^n (u_iᵀ b / σ_i) v_i

Here |u_iᵀ b| / σ_i = O(1).
The ratios are not large, but are the values correct? Considering only the discrete Picard condition does not tell us whether the expansion for the solution is correct.
Discrete Picard condition: examine the weights of the expansion
- The high precision calculation of the σ_i shows that they decay exponentially to zero (down to machine precision).
- The Picard condition considers the ratios |u_iᵀ b| / σ_i for the data b; these also decay exponentially (down to machine precision).
- Note that the ratios in this case are O(1), hence noise-contaminated basis vectors are not ignored; i.e. to approximate a solution with a discontinuity we need all the basis vectors.
- We may obtain solutions by truncating the SVD:

x = Σ_{i=1}^k (u_iᵀ b / σ_i) v_i

- The parameter k is now a regularization parameter.
- For the given example we know k < 10, independent of the measured data b. We cannot see this from the Picard condition.
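A minimal sketch of the truncated SVD solution; k is supplied by the caller (e.g. k < 10 for the wing example, from the basis-noise test above). The function name is hypothetical.

```python
# Sketch: truncated SVD solution using the first k terms of the expansion.
import numpy as np

def tsvd_solution(A, b, k):
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :k].T @ b) / svals[:k]
    return Vt[:k].T @ coeffs
```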
The Truncated Solutions (Noise free data b) - inverse crime
Figure: Truncated SVD solutions, standard precision |u_iᵀ b|. Error in the basis contaminates the solution.
The Truncated Solutions (Noise free data b) - inverse crime
Figure: Truncated SVD solutions, VPA calculation of |u_iᵀ b|. The number of terms is not sufficient to represent the solution discontinuity.
Observations
- Even when committing the inverse crime we will not achieve the solution if we cannot approximate the basis correctly.
- We need all the basis vectors which contain the high frequency terms in order to approximate a solution with high frequency components - e.g. edges.
- Reminder: this is independent of the data.
- But it is an indication of an ill-posed problem: in this case the ill-posedness shows up in the decomposition of the matrix A.
- We now look at a problem with a smoother solution - what are the issues? Problem shaw from the Regularization Toolbox, defined on the interval [−π/2, π/2]:
h(s, t) = (cos(s) + cos(t))² (sin(u)/u)²,   u = π(sin(s) + sin(t))

x(t) = 2 exp(−6(t − 0.8)²) + exp(−2(t + 0.5)²)
Basis vectors for problem shaw
Figure: The first few basis vectors v_i (left) and their white noise content (right).
The Solutions with truncated SVD- problem shaw
Figure: Truncated SVD solutions; the data enters through the coefficients |u_iᵀ b|. On the left there is no noise in b; on the right, noise of level 10⁻⁴.

In this case the low frequency vectors can represent the solution, but we need to know the regularization parameter k.
Observations from the SVD analysis in presence of noise
- The number of terms k in the TSVD depends on the v_i.
- In practice the measured data are also contaminated by noise e:

x = Σ_{i=1}^n (u_iᵀ(b_exact + e) / σ_i) v_i = x_exact + Σ_{i=1}^n (u_iᵀ e / σ_i) v_i

- Note ‖Uᵀe‖ = ‖e‖, and ‖Uᵀb‖ = ‖b‖ is dominated by low frequency terms when A smooths x to give b.
- If e is uniform, we expect the |u_iᵀ e| to be of similar magnitude for all i.
- When σ_i << |u_iᵀ e|, the contribution of the high frequency error is magnified, and with it that of the basis vector v_i.
- The truncated SVD is a special case of spectral filtering:

x_filt = Σ_{i=1}^n φ_i (u_iᵀ b / σ_i) v_i

- Spectral filtering filters the components in the spectral basis so that noise in the signal is damped.
Regularization by Spectral Filtering: This is Tikhonov Regularization

x_Tik = Σ_{i=1}^n φ_i (u_iᵀ b / σ_i) v_i

- For Tikhonov regularization φ_i = σ_i² / (σ_i² + λ²), i = 1, . . . , n, where λ is the regularization parameter, and the solution is (a filtered-SVD sketch follows below)

x_Tik(λ) = arg min_x { ‖b − Ax‖² + λ²‖x‖² }
- The choice of λ² impacts the solution.
- It is exactly Tikhonov regularization that is used in the SB algorithm, but for a change of basis determined by L:

x_Tik(λ) = arg min_x { ‖b − Ax‖² + λ²‖Lx − h‖² }
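A sketch of the filtered-SVD form of Tikhonov regularization just described (the L = I case); the function name is hypothetical.

```python
# Sketch: Tikhonov regularization via SVD filter factors phi_i = sigma_i^2 / (sigma_i^2 + lam^2).
import numpy as np

def tikhonov_filtered(A, b, lam):
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    phi = svals**2 / (svals**2 + lam**2)
    return Vt.T @ (phi * (U.T @ b) / svals)
```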
The Generalized Singular Value Decomposition
Introduce a generalization of the SVD to obtain an expansion for x(λ) = arg min_x { ‖Ax − b‖² + λ²‖L(x − x₀)‖² }.

Lemma (GSVD)
Assume invertibility and m ≥ n ≥ p. There exist unitary matrices U ∈ ℝ^{m×m}, V ∈ ℝ^{p×p}, and a nonsingular matrix Z ∈ ℝ^{n×n} such that

A = U [ Υ ; 0_{(m−n)×n} ] Zᵀ,   L = V [ M, 0_{p×(n−p)} ] Zᵀ,

Υ = diag(ν₁, . . . , ν_p, 1, . . . , 1) ∈ ℝ^{n×n},   M = diag(µ₁, . . . , µ_p) ∈ ℝ^{p×p},

with

0 ≤ ν₁ ≤ · · · ≤ ν_p ≤ 1,   1 ≥ µ₁ ≥ · · · ≥ µ_p > 0,   ν_i² + µ_i² = 1, i = 1, . . . , p.

Use Υ̃ and M̃ to denote the rectangular matrices containing Υ and M, and note the generalized singular values ρ_i = ν_i/µ_i.
Solution of the Generalized Problem using the GSVD (h = 0)

x(λ) = Σ_{i=1}^p ( ν_i / (ν_i² + λ²µ_i²) ) (u_iᵀ b) z̃_i + Σ_{i=p+1}^n (u_iᵀ b) z̃_i

z̃_i is the i-th column of (Zᵀ)⁻¹. Equivalently, with filter factor φ_i,

x(λ) = Σ_{i=1}^p φ_i (u_iᵀ b / ν_i) z̃_i + Σ_{i=p+1}^n (u_iᵀ b) z̃_i,

Lx(λ) = Σ_{i=1}^p φ_i (u_iᵀ b / ρ_i) v_i,   φ_i = ρ_i² / (ρ_i² + λ²).

Notice the similarity with the filtered SVD solution

x(λ) = Σ_{i=1}^n φ_i (u_iᵀ b / σ_i) v_i,   φ_i = σ_i² / (σ_i² + λ²).
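A sketch that assembles the filtered GSVD solution from given factors. Computing the GSVD itself is assumed to be done elsewhere (e.g. Matlab's gsvd or Hansen's Regularization Tools); U, Z, nu, mu follow the lemma's notation, and the function name is hypothetical.

```python
# Sketch: x(lam) = sum_{i<=p} phi_i (u_i^T b / nu_i) ztilde_i + sum_{i>p} (u_i^T b) ztilde_i.
import numpy as np

def gsvd_filtered_solution(U, Z, nu, mu, b, lam):
    n, p = Z.shape[0], len(nu)
    Zt_inv = np.linalg.inv(Z.T)          # columns are the basis vectors ztilde_i
    utb = U[:, :n].T @ b                 # coefficients u_i^T b, i = 1..n
    rho = nu / mu                        # generalized singular values
    phi = rho**2 / (rho**2 + lam**2)     # filter factors
    coeffs = np.concatenate([phi * utb[:p] / nu, utb[p:]])
    return Zt_inv @ coeffs
```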
Does noise enter the GSVD basis?
Form the truncated matrix Ak using only k basis vectors vi
Figure: Contrast the GSVD basis vectors u (left) and z (right): a trade-off between U and Z.
Does noise enter the GSVD basis from using matrix A?
Figure: This is confirmed by examining the KS test for white noise
Example Solution: Problem phillips
Figure: Solution is robust to truncating the GSVD
Summary For the Tikhonov Least Squares Solution - intrinsic to TV
- Basis vectors are subject to noise and contaminate the solution, independent of the data.
- Solutions require finding k.
- We can determine k by examining noise in the basis.
- Using the GSVD we see that Tikhonov regularization yields a basis with smoothed basis vectors.
- But we can apply the same techniques.
- Stabilizing both the TSVD and TGSVD is robust to parameter estimation.
- Parameter estimation for truncated expansions is cheaper.
- Reminder: the truncation k is independent of the data b.
The TIK update of the TV algorithm
Update for x, using the GSVD basis Z:

x^(k+1) = Σ_{i=1}^p ( (ν_i u_iᵀ b) / (ν_i² + λ²µ_i²) + (λ²µ_i v_iᵀ h^(k)) / (ν_i² + λ²µ_i²) ) z_i + Σ_{i=p+1}^n (u_iᵀ b) z_i

A weighted combination of basis vectors z_i, with weights

(i)  ( ν_i² / (ν_i² + λ²µ_i²) ) (u_iᵀ b / ν_i)      (ii)  ( λ²µ_i² / (ν_i² + λ²µ_i²) ) (v_iᵀ h^(k) / µ_i)

Notice that (i) is fixed by b, but (ii) depends on the updates h^(k); i.e. (i) is iteration independent.
If (i) dominates (ii), the solution will converge slowly or not at all.
λ impacts the solution and must not over-damp (ii).
Update of the mapped data
x^(k+1) = Σ_{i=1}^p ( φ_i (u_iᵀ b / ν_i) + (1 − φ_i)(v_iᵀ h^(k) / µ_i) ) z_i + Σ_{i=p+1}^n (u_iᵀ b) z_i

Lx^(k+1) = Σ_{i=1}^p ( φ_i (u_iᵀ b / ρ_i) + (1 − φ_i) v_iᵀ h^(k) ) v_i.

Now the stability depends on the coefficients φ_i/ρ_i, 1 − φ_i, φ_i/ν_i and (1 − φ_i)/µ_i with respect to the inner products u_iᵀ b and v_iᵀ h^(k).
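A sketch of this SB Tikhonov step assembled in the GSVD basis, mixing the fixed data term φ_i u_iᵀb/ν_i with the iteration-dependent term (1 − φ_i) v_iᵀh^(k)/µ_i. As above, the GSVD factors are assumed given, the basis vectors are taken as the columns of (Zᵀ)⁻¹ as in the earlier solution formula, and the function name is hypothetical.

```python
# Sketch: x^(k+1) in the GSVD basis with the shifted right-hand side h = d^(k) - g^(k).
import numpy as np

def sb_tikhonov_update(U, V, Z, nu, mu, b, h, lam):
    n, p = Z.shape[0], len(nu)
    Zt_inv = np.linalg.inv(Z.T)
    utb = U[:, :n].T @ b
    rho = nu / mu
    phi = rho**2 / (rho**2 + lam**2)
    head = phi * utb[:p] / nu + (1.0 - phi) * (V.T @ h) / mu   # first p coefficients
    coeffs = np.concatenate([head, utb[p:]])
    return Zt_inv @ coeffs
```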
Examining the weights with λ increasing
Coefficients |u_iᵀ b| and |v_iᵀ h|, for λ = 0.001, 0.1 and 10, at k = 1, 10 and the final iteration.
Examining the weights with λ increasing
Coefficients of x^(k) and Lx^(k) for λ = 0.001, 0.1 and 10, at k = 1, 10 and the final iteration.
Examining the solutions with λ increasing
Solutions for λ = 0.001, 0.1 and 10.
Heuristic Observations
- It is important to analyze the components of the solution.
- The basis is important.
- Regularization parameters should be adjusted dynamically.
- Theoretical results concerning parameter estimation apply - L-curve, etc.
Unbiased Predictive Risk for Estimating λ in Tikhonov Regularization

Regularization matrix:

x(λ) = R(λ) b, where R(λ) = (AᵀA + λ²I)⁻¹Aᵀ, with influence matrix A(λ) = A R(λ).

Predictive error - cannot be calculated:

p(λ) = A R(λ) b − b̂ = A(λ)(b̂ + η) − b̂ = (A(λ) − I_m) b̂ + A(λ) η
                                         (deterministic)    (stochastic)

Predictive risk - can be calculated:

r(λ) = (A(λ) − I_m) b = (A(λ) − I_m) b̂ + (A(λ) − I_m) η
                          (deterministic)    (stochastic)

Inverse covariance weighting yields the UPRE functional

U(λ) := ‖r(λ)‖² + 2 trace(A(λ)) − m
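A sketch of evaluating U(λ) via the SVD and minimizing it over a grid; the data are assumed whitened (unit noise variance), as on the slide, and the grid of λ values is illustrative.

```python
# Sketch: UPRE functional U(lam) = ||r(lam)||^2 + 2 trace(A(lam)) - m, evaluated via the SVD.
import numpy as np

def upre(A, b, lambdas):
    m = A.shape[0]
    U, svals, _ = np.linalg.svd(A, full_matrices=False)
    utb = U.T @ b
    b_perp = b - U @ utb                      # component of b outside the column space of A
    values = []
    for lam in lambdas:
        phi = svals**2 / (svals**2 + lam**2)  # filter factors: A(lam) = U diag(phi) U^T
        resid = np.sum(((phi - 1.0) * utb)**2) + np.linalg.norm(b_perp)**2
        values.append(resid + 2.0 * np.sum(phi) - m)
    return np.array(values)

# Usage sketch: lam_grid = np.logspace(-4, 2, 200); lam_opt = lam_grid[np.argmin(upre(A, b, lam_grid))]
```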
Theoretical results: using Unbiased Predictive Risk for the SB Tik
Lemma
Suppose the noise in h^(k) is stochastic, and both the data fit Ax ≈ b and the derivative data fit Lx ≈ h are weighted by their inverse covariance matrices for normally distributed noise in b and h; then the optimal choice for λ at all steps is λ = 1.

Remark
Can we expect h^(k) to be stochastic?

Lemma
Suppose h^(k+1) is regarded as deterministic; then UPRE applied to find the optimal choice for λ at each step leads to a different optimum at each step, namely one that depends on h.

Remark
Because h changes at each step, the optimal choice for λ using UPRE will change with each iteration; λ varies over all steps.
Example Solution: 1D: Increasing Noise Levels
Observations
1. The results demonstrate that regularization parameter estimation is needed.
2. A choice based on the LS solution is useful but not sufficient.
3. Curiously, they demonstrate that statistical analysis is relevant.
4. For non piecewise-constant data the parameter choice λ = 1 is not appropriate.
Example Solution: 2D - various examples (SNR in title)
Further Observations and Future Work
- The results demonstrate that basic analysis of the problem is worthwhile.
- Extend standard parameter estimation from LS.
- The overhead of finding the optimal λ for the first step is reasonable.
- Stochastic interpretation: use the fixed value λ = 1 after the first iteration.
- Practically, use LSQR algorithms rather than the GSVD - this changes the basis to the Krylov basis. Or use the randomized SVD to generate a randomized GSVD.
- Noise in the basis is relevant for LSQR / CGLS iterative algorithms.
- Convergence testing is based on h.

Initial work - much to do.