Motivation Statistical Results for Least Squares Newton algorithm Large Scale Problems Stopping the EM Algorithm Statistically Edge Detection for PSF Estimation
Towards Solution of Large Scale Image Restoration and Reconstruction Problems
Rosemary Renaut
Joint work with Anne Gelb, Aditya Viswanathan, Hongbin Guo, Doug Cochran, Youzuo Lin, Arizona State University
Jodi Mead, Boise State University
November 4, 2009
National Science Foundation: Division of Computational Mathematics 1 / 49
Outline
1 Motivation: Quick Review
2 Statistical Results for Least Squares: Summary of LS Statistical Results; Implications of Statistical Results for Regularized Least Squares
3 Newton Algorithm: Algorithm with LSQR (Paige and Saunders); Results
4 Large Scale Problems: Application in Image Reconstruction and Restoration
5 Stopping the EM Algorithm Statistically
6 Edge Detection for PSF Estimation
Signal/Image Restoration:
Integral model of signal degradation: b(t) = ∫ K(t, s) x(s) ds
K(t, s) describes the blur of the signal.
Convolutional model: for an invariant kernel, K(t, s) = K(t − s) is the Point Spread Function (PSF).
Typically sampling includes noise e(t); the model is
b(t) = ∫ K(t − s) x(s) ds + e(t)
Discrete model: given discrete samples b, find samples x of x(s). Let A discretize K, assumed known; the model is
b = Ax + e.
Naïvely invert the system to find x!
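A minimal 1-D sketch of why naïve inversion fails (the blur width, noise level, and grid here are invented for the demo): a Gaussian blur matrix is only mildly smoothing visually, yet numerically ill-conditioned, so A⁻¹b amplifies even 0.1% noise into a useless solution.

```python
import numpy as np

np.random.seed(0)
n = 64
t = np.linspace(-1, 1, n)

# Discretized Gaussian PSF: entry (i, j) holds K(t_i - s_j), rows normalized
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.05**2))
A /= A.sum(axis=1, keepdims=True)

x = (np.abs(t) < 0.5).astype(float)        # true signal: a box
b = A @ x + 1e-3 * np.random.randn(n)      # b = Ax + e, 0.1% noise

x_naive = np.linalg.solve(A, b)            # naive inversion of b = Ax

rel_err = np.linalg.norm(x_naive - x) / np.linalg.norm(x)
print(f"cond(A) = {np.linalg.cond(A):.1e}, relative error = {rel_err:.1e}")
```

Despite the tiny noise, the relative error of the naïve solution is orders of magnitude above the noise level, which is exactly why the regularized solutions on the next slides are needed.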
Example 1-D Original and Blurred Noisy Signal
Original signal x; blurred and noisy signal b (Gaussian PSF).
The Solution: Regularization is needed
Naïve solution vs. a regularized solution.
Least Squares for Ax = b: A Quick Review
Background
Consider discrete systems: A ∈ ℝ^{m×n}, b ∈ ℝ^m, x ∈ ℝ^n, with
b = Ax + e.
Classical Approach: Linear Least Squares (A full rank)
xLS = argmin_x ‖Ax − b‖₂²
Difficulty: xLS is sensitive to changes in the right-hand side b when A is ill-conditioned.
For convolutional models the system is ill-posed.
Introduce Regularization to Find Acceptable Solution
Weighted Fidelity with Regularization
• Regularize:
xRLS(λ) = argmin_x { ‖b − Ax‖²_{Wb} + λ² R(x) },
with weighting matrix Wb.
• R(x) is a regularization term.
• λ is a regularization parameter, which is unknown.
The solution xRLS(λ) depends on λ, on the regularization operator R, and on the weighting matrix Wb.
The Weighting Matrix:
Some Assumptions for Multiple Data Measurements
Given multiple measurements of data b:
Usually there is error in b: e is an m-vector of random measurement errors with mean 0 and positive definite covariance matrix Cb = E(eeᵀ).
For uncorrelated heteroskedastic measurements, Cb is the diagonal matrix of the error variances (colored noise).
For white noise Cb = σ²I.
Weighting by Wb = Cb⁻¹ in the data-fit term, theoretically ẽ = Wb^{1/2} e is uncorrelated.
Difficulty if Wb increases the ill-conditioning of A!
For images, find Wb from the image data.
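A small numerical check of the whitening claim (the standard deviations are invented for the demo): weighting draws of e by Wb^{1/2} = Cb^{-1/2} yields errors with identity covariance.

```python
import numpy as np

np.random.seed(1)
m, nsamp = 5, 200000
stds = np.array([0.5, 1.0, 2.0, 4.0, 8.0])    # heteroskedastic, uncorrelated errors
Cb = np.diag(stds**2)                          # diagonal covariance: entries are variances
Wb_half = np.diag(1.0 / stds)                  # Wb^{1/2} = Cb^{-1/2}

E = stds[:, None] * np.random.randn(m, nsamp)  # many draws of the error vector e
Ew = Wb_half @ E                               # whitened errors Wb^{1/2} e

C_emp = Ew @ Ew.T / nsamp                      # empirical covariance of whitened errors
print(np.round(C_emp, 2))                      # close to the identity matrix
```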
Formulation: Generalized Tikhonov Regularization With Weighting
Use R(x) = ‖D(x − x0)‖²:
x̂ = argmin J(x) = argmin { ‖Ax − b‖²_{Wb} + λ² ‖D(x − x0)‖² }. (1)
D is a suitable operator, often a derivative approximation.
Assume N(A) ∩ N(D) = {0}. x0 is a reference solution, often x0 = 0; it may need to be an average solution.
Having found λ, the posterior inverse covariance matrix is
Wx = AᵀWbA + λ²I
Posterior information can give some confidence on the parameter estimates.
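One way to compute x̂ in (1) is to stack the fidelity and regularization blocks into a single least squares problem; a sketch with invented sizes, Wb = I, and a first-difference D:

```python
import numpy as np

np.random.seed(2)
m, n = 40, 30
A = np.random.randn(m, n)
D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)     # first-difference operator, p x n
x_true = np.cumsum(np.random.randn(n)) * 0.1     # a smooth-ish signal
b = A @ x_true + 0.01 * np.random.randn(m)
x0 = np.zeros(n)                                 # reference solution
lam = 1.0                                        # Wb = I (white noise) for simplicity

# Stack the blocks: min_y || [A; lam D] y - [b - A x0; 0] ||^2, then x = x0 + y
K = np.vstack([A, lam * D])
rhs = np.concatenate([b - A @ x0, np.zeros(n - 1)])
x_rls = x0 + np.linalg.lstsq(K, rhs, rcond=None)[0]

# Check the regularized normal equations A^T(Ax - b) + lam^2 D^T D (x - x0) = 0
res = A.T @ (A @ x_rls - b) + lam**2 * D.T @ D @ (x_rls - x0)
print(np.linalg.norm(res))                       # ~ 0 (machine precision)
```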
Choice of λ is crucial: an example with D = I .
Choice of λ is crucial: different algorithms give different solutions.
Discrepancy Principle
Suppose the noise is white: Cb = σ_b² I.
Find λ such that the regularized residual satisfies
σ_b² = (1/m) ‖b − A x(λ)‖₂².
Can be implemented by a Newton root finding algorithm.
But discrepancy principle typically oversmooths.
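A sketch of the discrepancy principle on a synthetic SVD-based problem (spectrum, noise level, and bracket are invented): for brevity the root of (1/m)‖b − Ax(λ)‖² − σ_b² = 0 is found by bisection rather than Newton, using the fact that the residual increases monotonically with λ.

```python
import numpy as np

np.random.seed(3)
m = n = 50
U, _ = np.linalg.qr(np.random.randn(m, m))
V, _ = np.linalg.qr(np.random.randn(n, n))
s = 2.0 ** -np.arange(n, dtype=float)        # rapidly decaying singular values
A = (U * s) @ V.T
sigma_b = 1e-2
b = A @ (V @ np.random.randn(n)) + sigma_b * np.random.randn(m)
beta = U.T @ b

def residual2(lam):
    # Tikhonov residual (D = I) via SVD filter factors f_i = s_i^2/(s_i^2 + lam^2)
    f = s**2 / (s**2 + lam**2)
    return np.sum(((1 - f) * beta) ** 2)

# Bisect on log(lam): residual2 is monotone increasing in lam
lo, hi = 1e-12, 1e3
for _ in range(200):
    mid = np.sqrt(lo * hi)
    if residual2(mid) / m < sigma_b**2:
        lo = mid
    else:
        hi = mid
lam_dp = np.sqrt(lo * hi)
print(lam_dp, residual2(lam_dp) / m, sigma_b**2)
```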
Other methods [Vog02]:
L-curve
Generalized Cross Validation (GCV)
Unbiased Predictive Risk (UPRE)
Some standard approaches I: L-curve - Find the corner
Let r(λ) = (A(λ) − A) b, with influence matrix A(λ) = A(AᵀWbA + λ²DᵀD)⁻¹Aᵀ.
Plot
(log ‖Dx‖, log ‖r(λ)‖).
Trade off contributions.
Expensive - requires range of λ.
GSVD makes calculations efficient.
Not statistically based
Find corner
No corner
Generalized Cross-Validation (GCV)
Let A(λ) = A(AᵀWbA + λ²DᵀD)⁻¹Aᵀ.
Can pick Wb = I .
Minimize the GCV function
GCV(λ) = ‖b − A x(λ)‖²_{Wb} / [trace(I_m − A(λ))]²,
which estimates the predictive risk.
Expensive - requires range of λ.
GSVD makes calculations efficient.
Requires minimum
Multiple minima
Sometimes flat
Unbiased Predictive Risk Estimation (UPRE)
Minimize the expected value of the predictive risk: minimize the UPRE function
‖b − A x(λ)‖²_{Wb} + 2 trace(A(λ)) − m.
Expensive - requires range of λ.
GSVD makes calculations efficient.
Needs an estimate of the trace. A minimum is needed.
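Both GCV and UPRE are cheap once an SVD (or GSVD) is available, since trace(A(λ)) = Σ f_i for the Tikhonov filter factors f_i. A sketch on an invented SVD problem with D = I, Wb = I, and known noise level σ (the UPRE below carries the σ² scaling explicitly, ‖b − Ax(λ)‖² + 2σ² trace(A(λ)) − mσ², which reduces to the slide's expression for whitened data):

```python
import numpy as np

np.random.seed(4)
m = n = 60
U, _ = np.linalg.qr(np.random.randn(m, m))
V, _ = np.linalg.qr(np.random.randn(n, n))
s = 10.0 ** np.linspace(0, -8, n)             # ill-conditioned spectrum
A = (U * s) @ V.T
x_true = V @ np.random.randn(n)
sigma = 1e-2                                  # known noise level
b = A @ x_true + sigma * np.random.randn(m)
beta = U.T @ b

def filt(lam):
    return s**2 / (s**2 + lam**2)             # Tikhonov filter factors

def gcv(lam):
    f = filt(lam)
    return np.sum(((1 - f) * beta) ** 2) / (m - f.sum()) ** 2

def upre(lam):
    f = filt(lam)
    return np.sum(((1 - f) * beta) ** 2) + 2 * sigma**2 * f.sum() - m * sigma**2

lams = 10.0 ** np.linspace(-6, 1, 200)        # both need a range of lambda
lam_gcv = lams[np.argmin([gcv(l) for l in lams])]
lam_upre = lams[np.argmin([upre(l) for l in lams])]
print(lam_gcv, lam_upre)
```

Either selected λ gives a far better reconstruction than the essentially unregularized left end of the λ grid.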
Background: Statistics of the Least Squares Problem
Theorem (Rao73)
Let r be the rank of A and b ∼ N(Ax, σ_b² I) (errors in measurements are normally distributed with mean 0 and covariance σ_b² I); then
J = min_x ‖Ax − b‖² ∼ σ_b² χ²(m − r).
J follows a χ² distribution with m − r degrees of freedom: basically the Discrepancy Principle.
Corollary (Weighted Least Squares)
For b ∼ N(Ax, Cb) and Wb = Cb⁻¹,
J = min_x ‖Ax − b‖²_{Wb} ∼ χ²(m − r).
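The theorem is easy to check by Monte Carlo (dimensions and noise level invented): the minimum residual sum of squares has mean σ_b²(m − r), with r = n for a full-rank A.

```python
import numpy as np

np.random.seed(5)
m, n, trials = 30, 10, 4000
A = np.random.randn(m, n)            # full rank with probability 1, so r = n
x = np.random.randn(n)
sigma_b = 0.5

J = np.empty(trials)
for k in range(trials):
    b = A @ x + sigma_b * np.random.randn(m)
    xls = np.linalg.lstsq(A, b, rcond=None)[0]
    J[k] = np.sum((A @ xls - b) ** 2)

# Theorem: J ~ sigma_b^2 chi^2(m - r), so E(J) = sigma_b^2 (m - n) = 5 here
print(J.mean(), sigma_b**2 * (m - n))
```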
Extension: Statistics of the Regularized Least Squares Problem
Thm: χ² distribution of the regularized functional (Renaut/Mead 2008). (Note: weighting matrix on the regularization term.)
x̂ = argmin JD(x) = argmin { ‖Ax − b‖²_{Wb} + ‖x − x0‖²_{WD} }, WD = Dᵀ Wx D. (2)
Assume:
Wb and Wx are symmetric positive definite.
The problem is uniquely solvable: N(A) ∩ N(D) = {0}. The Moore-Penrose generalized inverse of WD is CD.
Statistics: errors in the right-hand side e ∼ N(0, Cb), and x0 is known so that x − x0 = f ∼ N(0, CD); x0 is the mean vector of the model parameters.
Then JD(x(WD)) ∼ χ²(m + p − n).
Significance of the χ2 result
JD ∼ χ²(m + p − n)
For sufficiently large m̃ = m + p − n,
E(J(x(WD))) = m̃ = m + p − n, Var(J(x(WD))) = 2(m + p − n).
Moreover
m̃ − √(2m̃) z_{α/2} < J(x(WD)) < m̃ + √(2m̃) z_{α/2}. (3)
z_{α/2} is the relevant z-value for a χ²-distribution with m̃ = m + p − n degrees of freedom.
General Result [MR09b], [RHM09], [MR09a]
The Cost Functional follows a χ2 Statistical Distribution
If x0 is not the mean value, then we introduce a non-central χ² distribution with centrality parameter c.
If the problem is rank deficient, the degrees of freedom are reduced.
Suppose degrees of freedom m̃ and centrality parameter c; then
E(JD) = m̃ + c, Var(JD) = 2m̃ + 4c.
Suggests: try to find WD so that E(J) = m̃ + c.
First find λ only:
Wx = λ²I.
What do we need to apply the Theory?
Requirements
Covariance Cb on the data parameters b (or on the model parameters x!).
A priori information x0, the mean x̄.
But x̄ (and hence x0) are not known.
If not known, use repeated data measurements to calculate Cb and the mean b̄.
Hence estimate the centrality parameter: E(b) = A E(x) implies b̄ = A x̄. Hence
c = ‖c‖₂² = ‖Q Uᵀ Wb^{1/2} (b̄ − A x0)‖₂²,
E(JD) = E(‖Q Uᵀ Wb^{1/2} (b − A x0)‖₂²) = m + p − n + ‖c‖₂².
Given the GSVD, estimate the degrees of freedom m̃.
Then we can use E(J) to find λ.
Assume x0 is the mean (experimentalists know something about model parameters)
Designing the Algorithm I
Recall: if Cb and Cx are good estimates of the covariances,
|JD(x̂) − (m + p − n)|
should be small. Thus, let m̃ = m + p − n; then we want
m̃ − √(2m̃) z_{α/2} < J(x(WD)) < m̃ + √(2m̃) z_{α/2}.
z_{α/2} is the relevant z-value for a χ²-distribution with m̃ degrees of freedom.
GOAL
Find Wx to make (3) tight. Single-variable case: find λ such that
JD(x(λ)) ≈ m̃.
A Newton line-search algorithm to find λ = 1/σ (basic algebra)
Newton to solve F(σ) = JD(σ) − m̃ = 0
We use σ = 1/λ, and y(σ⁽ᵏ⁾) is the current solution, for which
x(σ⁽ᵏ⁾) = y(σ⁽ᵏ⁾) + x0.
Then
∂J/∂σ = −(2/σ³) ‖D y(σ)‖² < 0.
Hence we have a basic Newton iteration
σ⁽ᵏ⁺¹⁾ = σ⁽ᵏ⁾ ( 1 + (1/2) (σ⁽ᵏ⁾/‖Dy‖)² (JD(σ⁽ᵏ⁾) − m̃) ).
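A sketch of this Newton iteration on an invented SVD test problem (D = I, x0 = 0, data whitened by σ_b so Wb = I, hence m̃ = m + p − n = m); a simple bracketing safeguard, not in the slides, keeps σ positive if a Newton step overshoots.

```python
import numpy as np

np.random.seed(6)
m = n = 50
U, _ = np.linalg.qr(np.random.randn(m, m))
V, _ = np.linalg.qr(np.random.randn(n, n))
s = 10.0 ** np.linspace(0, -5, n)
A = (U * s) @ V.T
sigma_b = 1e-2
b = A @ (V @ np.random.randn(n)) + sigma_b * np.random.randn(m)

sb, bb = s / sigma_b, (U.T @ b) / sigma_b      # whitened singular values and data
m_tilde = float(m)                              # degrees of freedom

def J_of(sigma):                                # J_D(sigma), lam = 1/sigma, via the SVD
    lam = 1.0 / sigma
    return float(np.sum(bb**2 * lam**2 / (sb**2 + lam**2)))

def y_norm2(sigma):                             # ||y(sigma)||^2 (D = I)
    lam = 1.0 / sigma
    return float(np.sum((sb * bb / (sb**2 + lam**2)) ** 2))

lo, hi = 1e-8, 1e8                              # J(lo) > m_tilde > J(hi): J decreases in sigma
sigma = 1.0
for k in range(200):
    F = J_of(sigma) - m_tilde
    if abs(F) < 1e-10 * m_tilde:
        break
    if F > 0:
        lo = sigma
    else:
        hi = sigma
    # Newton step sigma(1 + (1/2)(sigma^2/||y||^2) F); bisect if it leaves the bracket
    step = sigma * (1.0 + 0.5 * sigma**2 / y_norm2(sigma) * F)
    sigma = step if lo < step < hi else np.sqrt(lo * hi)

print(sigma, J_of(sigma), m_tilde)
```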
Practical Details of Algorithm: Large Scale problems
Algorithm
Initialization
Convert the generalized Tikhonov problem to standard form. (If L is not invertible you just need to know how to find Ax and Aᵀx, and the null space of L.)
Use the LSQR algorithm (Paige and Saunders) to find the bidiagonal matrix for the projected problem.
Obtain a solution of the bidiagonal problem for a given initial σ.
Subsequent Steps
Increase the dimension of the space if needed, reusing the existing bidiagonalization. May also use a smaller system if appropriate.
Each σ calculation of the algorithm reuses saved information from the Lanczos bidiagonalization.
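The Lanczos (Golub-Kahan) bidiagonalization that LSQR builds can be sketched directly (random test matrix; no reorthogonalization, which plain LSQR also omits): k steps give A V_k = U_{k+1} B_k with B_k lower bidiagonal, and every σ evaluation can then reuse the same small projected matrix B_k.

```python
import numpy as np

np.random.seed(7)
m, n, k = 100, 80, 10
A = np.random.randn(m, n)
b = np.random.randn(m)

U = np.zeros((m, k + 1)); V = np.zeros((n, k)); B = np.zeros((k + 1, k))
norm_b = np.linalg.norm(b)
U[:, 0] = b / norm_b
for j in range(k):
    # alpha_j v_j = A^T u_j - beta_j v_{j-1}
    v = A.T @ U[:, j] - (B[j, j - 1] * V[:, j - 1] if j > 0 else 0.0)
    alpha = np.linalg.norm(v); V[:, j] = v / alpha; B[j, j] = alpha
    # beta_{j+1} u_{j+1} = A v_j - alpha_j u_j
    u = A @ V[:, j] - alpha * U[:, j]
    beta = np.linalg.norm(u); U[:, j + 1] = u / beta; B[j + 1, j] = beta

# Projected problem: min || B y - ||b|| e_1 ||, giving the iterate x_k = V y
y = np.linalg.lstsq(B, norm_b * np.eye(k + 1)[:, 0], rcond=None)[0]
x_k = V @ y
print(np.linalg.norm(A @ V - U @ B))       # recurrence A V_k = U_{k+1} B_k holds, ~ 0
```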
Illustrating the Results for Problem Size 512: Two Standard Test Problems
Comparison for noise level 10%. On the left D = I; on the right D is a first-derivative operator.
Notice L-curve and χ2-LSQR perform well.
UPRE does not perform well.
Real Data: Seismic Signal Restoration
The Data Set and Goal
Real data set of 48 signals of length 3000.
The point spread function is derived from the signals.
Calculate the signal variance pointwise over all 48 signals.
Goal: restore the signal x from Ax = b, where A is the PSF matrix and b is a given blurred signal.
Method of comparison (no exact solution known): use convergence with respect to downsampling.
Comparison High Resolution White noise
Greater contrast with χ². UPRE is insufficiently regularized. The L-curve severely undersmooths (not shown). Parameters are not consistent across resolutions.
The UPRE Solution: x0 = 0, White Noise
Regularization parameters are consistent: σ = 0.01005 at all resolutions.
The LSQR Hybrid Solution: White Noise
Regularization is quite consistent, resolutions 2 to 10: σ = 0.0000029, 0.0000029, 0.0000029, 0.0000057, 0.0000057.
Illustrating the Deblurring Result: Problem Size 65536
Example taken from RestoreTools (Nagy et al., 2007-8): 15% noise.
Computational cost is minimal: projected problem size is 15, λ = 0.58.
Problem Grain, 15% noise added: validating against increasing subproblem size.
(a) Signal-to-noise ratio 10 log₁₀(1/e) for relative error e. (b) Regularization parameter against problem size.
Illustrating the progress of the Newton algorithm post LSQR
Illustrating the progress of the Newton algorithm with LSQR
Problem Grain, 15% noise added, for increasing subproblem size.
Figure: signal-to-noise ratio 10 log₁₀(1/e) for relative error e.
An Alternative Direction for Large Scale Problems: Domain Decomposition [Ren98]
Domain decomposition of x into several domains:
x = (x₁ᵀ, x₂ᵀ, …, x_pᵀ)ᵀ.
Corresponding to the splitting of the image x, the kernel operator A is split:
A = (A₁, A₂, …, A_p).
e.g., different splitting schemes.
Formulation - Regularized Least Squares [LRG09]
The linear system Ax ≈ b is replaced with the split systems
A_i y_i ≈ b_i(x), b_i(x) = b − Σ_{j≠i} A_j x_j = b − Ax + A_i x_i.
Locally solve Ax ≈ b:
min_{y_i ∈ ℝ^{n_i}} ‖A_i y_i − b_i(x)‖², 1 ≤ i ≤ p.
If the problem is ill-posed we have the regularized problem
min_x { ‖Ax − b‖₂² + λ²‖Dx‖₂² }.
Similarly, we have a splitting of the operator, assuming local regularization:
[A; ΛD] = ( [A₁; λ₁D₁], [A₂; λ₂D₂], …, [A_p; λ_pD_p] ).
Solve iteratively using novel updates for changing right-hand sides [Saa87], [CW97].
Update Scheme: the global solution is updated from the local solutions at step k to step k + 1:
x⁽ᵏ⁺¹⁾ = Σ_{i=1}^p τ_i⁽ᵏ⁺¹⁾ (x_local)_i⁽ᵏ⁺¹⁾,
where (x_local)_i⁽ᵏ⁺¹⁾ = ((x₁⁽ᵏ⁾)ᵀ, …, (x_{i−1}⁽ᵏ⁾)ᵀ, (y_i⁽ᵏ⁺¹⁾)ᵀ, (x_{i+1}⁽ᵏ⁾)ᵀ, …, (x_p⁽ᵏ⁾)ᵀ)ᵀ.
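A sketch of this multisplitting update with equal weights τ_i = 1/p and local regularization λ_i = λ, D_i = I (sizes and λ invented for the demo). A useful check: the fixed point of the sweep satisfies the global regularized normal equations, so plugging in the direct Tikhonov solution should reproduce it exactly.

```python
import numpy as np

np.random.seed(8)
m, n, p = 60, 40, 4
A = np.random.randn(m, n)
b = np.random.randn(m)
lam = 5.0
blocks = np.array_split(np.arange(n), p)           # split x into p column domains

def sweep(x):
    """One multisplitting step: p local regularized solves, combined with tau_i = 1/p."""
    x_new = x.copy()
    r = b - A @ x
    for blk in blocks:
        Ai = A[:, blk]
        # local solve: min_y ||Ai y - b_i(x)||^2 + lam^2 ||y||^2, b_i(x) = b - Ax + Ai x_i
        yi = np.linalg.solve(Ai.T @ Ai + lam**2 * np.eye(len(blk)),
                             Ai.T @ (r + Ai @ x[blk]))
        x_new[blk] = ((p - 1) * x[blk] + yi) / p    # convex combination of local solutions
    return x_new

x_star = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)  # global Tikhonov solution
x = np.zeros(n)
for _ in range(400):
    x = sweep(x)
print(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))        # small: converged
```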
Feasibility of the Approach: 1-D Phillips, size 1024, noise level 6%, regularization 0.25.
(a) No splitting, relative error 0.0525. (b) 4 subproblems, relative error 0.0499.
2-D PET Reconstruction, size 64 × 64, noise level 1.5%, regularization 0.2.
(c) No splitting, SNR 11.73 dB. (d) 4 subproblems, SNR 12.24 dB.
A New Stopping Rule for the EM Method [GR09]
Quick Rationale (PET)
It is well known that the ML-EM method converges to overly noisy solutions; iterative methods have to stop before convergence [SV82], [HHL92], [HL94].
Detected counts in tube i are Poisson with mean b_i = Σ_j a_ij x_j. Hence the basic relationship is b ≈ Ax, where A is the projection matrix, b are the counts, and x is the density to be reconstructed.
EM iteration: x⁽ᵏ⁺¹⁾ = (Aᵀ(b ./ (Ax⁽ᵏ⁾))) .* x⁽ᵏ⁾.
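The EM update in array form (toy random projection matrix with unit column sums, an assumption that makes the update above exactly the standard ML-EM step; all sizes invented). Two well-known ML-EM properties serve as checks: iterates stay nonnegative, the total Σx matches the total counts Σb after one step, and the Poisson log-likelihood never decreases.

```python
import numpy as np

np.random.seed(9)
m, n = 100, 50
A = np.random.rand(m, n)
A /= A.sum(axis=0, keepdims=True)        # unit column sums: sum_i a_ij = 1
x_true = 10.0 * np.random.rand(n)
b = np.random.poisson(A @ x_true).astype(float)

def loglik(x):                            # Poisson log-likelihood (up to a constant)
    Ax = A @ x
    return float(np.sum(b * np.log(Ax) - Ax))

x = np.ones(n)                            # positive starting point
for k in range(200):
    x = x * (A.T @ (b / (A @ x)))        # x^{k+1} = (A^T (b ./ (A x^k))) .* x^k

x_next = x * (A.T @ (b / (A @ x)))       # one more step for the monotonicity check
print(loglik(x), loglik(x_next))
```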
(e) True (f) k = 95 (g) k = 500
A New Estimate of the Stopping Level
Algorithm: for k until converged, for m tubes:
Calculate a step of EM, x⁽ᵏ⁾.
Update the tube means b⁽ᵏ⁾ = Ax⁽ᵏ⁾. Bin all tubes to have b⁽ᵏ⁾ > 20.
Calculate y = (b − b⁽ᵏ⁾) ./ √(b⁽ᵏ⁾); then y ∼ N(0, 1).
Calculate the mean ȳ and sample standard deviation s for y_i, i = 1 : m.
Calculate α = √(m − 1) ȳ/s and p_t(α), α ∼ t(m − 1) (Student-t density with m − 1 degrees of freedom).
Calculate β = (m − 1)s² and p_N(β), β ∼ N(m − 1, √(2(m − 1))) (Gaussian density with mean m − 1).
Calculate the likelihood of sampling α and β from the two distributions: l(k) = p_t(α) p_N(β).
When l(k) passes its maximum, l(k) < l(k−1): STOP. The solution is x⁽ᵏ⁻¹⁾.
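A sketch of the statistic l(k) = p_t(α) p_N(β) in log form (scipy-free; the Student-t density is coded from its Gamma-function formula). The synthetic "tube" data are invented: when the current means b⁽ᵏ⁾ match the Poisson data, the statistic sits near its maximum, while mismatched means are heavily penalized.

```python
import numpy as np
from math import lgamma, log, pi, sqrt

def log_t_pdf(a, nu):        # log of the Student-t density with nu degrees of freedom
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi) - (nu + 1) / 2 * log(1 + a * a / nu))

def log_gauss_pdf(v, mu, sd):
    return -0.5 * log(2 * pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)

def log_l(b, bk):
    """log of l = p_t(alpha) p_N(beta) for residuals y = (b - b^(k)) ./ sqrt(b^(k))."""
    y = (b - bk) / np.sqrt(bk)
    mt = len(y)
    ybar, s2 = y.mean(), y.var(ddof=1)
    alpha = sqrt(mt - 1) * ybar / sqrt(s2)
    beta = (mt - 1) * s2
    return (log_t_pdf(alpha, mt - 1)
            + log_gauss_pdf(beta, mt - 1, sqrt(2 * (mt - 1))))

np.random.seed(10)
means = np.full(500, 50.0)                      # true tube means, all > 20
b = np.random.poisson(means).astype(float)

ll_match = log_l(b, means)                      # b^(k) equal to the true means
ll_under = log_l(b, 0.5 * means)                # badly mismatched tube means
print(ll_match, ll_under)                       # matched means score far higher
```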
Simulations: Validation
Table: the best and the predicted stopping step for 11 simulations
Best: 95 90 89 90 90 95 90 92 89 94 91
Pred: 96 88 94 89 89 95 100 100 95 93 94
Figure: l(k) = pt(α)pN (β) for 500 steps. Maximized at k = 96.
Extension for Mammogram Denoising: Early Ideas
The Model
Assume blurring of the mammogram by a PSF kernel K; the measured image is b.
Deconvolve in the Fourier domain and invert to give a noisy estimate of the optical density d.
Each entry of d is a linear combination of x-ray energy with Poisson noise, and √d is close to normally distributed [ANS48].
To find the true optical density x, denoise the deconvolved d. Use total variation:
min_x Σ_{i=1}^m (√d_i − √x_i)² + λ²‖x‖_TV, s.t. x ≥ 0.
Given knowledge of the variance of the noise in the x-ray, automatically select λ using the statistical estimation approach.
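The square-root step is a variance-stabilizing transform [ANS48]: for Poisson data the raw variance equals the mean, while Var(√d) ≈ 1/4 independent of the mean, which is what makes the (√d_i − √x_i)² fidelity term statistically sensible. A quick check with invented count levels:

```python
import numpy as np

np.random.seed(11)
raw_vars, sqrt_vars = [], []
for mean in [10.0, 40.0, 160.0]:
    d = np.random.poisson(mean, 200000).astype(float)
    raw_vars.append(d.var())                 # Var(d) = mean: grows with the signal
    sqrt_vars.append(np.sqrt(d).var())       # Var(sqrt(d)) ~ 1/4, roughly constant

print(np.round(raw_vars, 2))
print(np.round(sqrt_vars, 3))                # all close to 0.25
```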
Trial Experiment: use the data set from the University of Florida (DDSM), cancer case 0001, left breast, CC scanning angle.
(a) Original Image (b) Restored Image
Figure: total yellow (calcification) reduced by deblurring. The rectangle at bottom right indicates deblurring.
PSF Estimation in Blurring Problems using Edge Detection (Cochran, Gelb, Viswanathan, Renaut, Stefan)
Given the blurring model (PSF convolution operator K) and x ∈ L²(−π, π) piecewise smooth, we estimate the PSF starting with 2N + 1 blurred Fourier coefficients b̂(j), j = −N, …, N:
b = K ∗ x + e
Principle: apply a linear edge detector, denoted by T. We shall assume that the edge detector can be written as a convolution with an appropriate kernel:
T ∗ (K ∗ x + e) = (K ∗ x + e) ∗ T = x ∗ K ∗ T + e ∗ T = (x ∗ T) ∗ K + e ∗ T ≈ [x] ∗ K + ẽ
Here [x](s) is a jump function. For a jump discontinuity in a function, the jump function at any point s depends only on the values of x at s⁺ and s⁻:
[x](s) := x(s⁺) − x(s⁻)
Hence, we observe shifted and scaled replicates of the PSF.
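A discrete sketch of the principle (grid, jump location, and blur width are invented, and the edge detector T here is plain differentiation, one admissible linear choice): differentiating the blurred signal yields [x](0) · K(t − 0), a scaled replicate of the PSF centered at the jump.

```python
import numpy as np

n = 512
t = np.linspace(-np.pi, np.pi, n, endpoint=False)
dt = t[1] - t[0]

x = np.where(t < 0, -1.0, 1.0)               # one jump at s0 = 0 with [x](0) = 2
# (as a periodic function x also has a second, negative jump at +/- pi)

# Area-normalized Gaussian PSF and periodic blur b = K * x via the FFT
K = np.exp(-t**2 / (2 * 0.1**2)); K /= K.sum() * dt
b = np.real(np.fft.ifft(np.fft.fft(np.fft.ifftshift(K)) * np.fft.fft(x))) * dt

# Edge detector T = d/dt: T*(K*x) = (T*x)*K ~ [x](0) K(t - 0) near the jump
db = np.gradient(b, dt)

mask = np.abs(t) < 1.0                       # window around the detected edge
peak = t[np.argmax(db)]                      # PSF replicate is centered at the jump
scale = dt * db[mask].sum()                  # its integral recovers the jump height 2
corr = np.corrcoef(db[mask], K[mask])[0, 1]  # and its shape matches the PSF
print(peak, scale, corr)
```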
Example (No Noise)
[Figure: (a) true function; (b) motion blur PSF; (c) blurred function; (d) after applying edge detection, comparing the blur after edge detection with the true blur (normalized). Function subjected to motion blur, N = 128.]
[Figure: (a) true function; (b) Gaussian blur PSF; (c) blurred function; (d) after applying edge detection, comparing the blur after edge detection with the true blur (normalized). Function subjected to Gaussian blur, N = 128.]
Representative Examples : Gaussian PSF
[Figure: (a) noisy blur estimation; (b) after low-pass filtering; each panel shows the function, the blurred noisy function, the blur after edge detection, and the true blur (normalized). Function subjected to Gaussian blur, N = 128.]
Complex noise distribution on the Fourier coefficients: e ∼ N(0, 1.5/(2N + 1)²).
The second picture is subjected to low-pass (Gaussian) filtering.
It is conceivable that parameter estimation for a Gaussian PSF can take into account the effect of Gaussian filtering.
Representative Examples: Motion Blur
[Figure: (a) noisy blur estimation; (b) after TV denoising; each panel shows the function, the blurred noisy function, the blur after edge detection, and the true blur (normalized). Function subjected to motion blur, N = 128.]
Cannot perform conventional low-pass filtering since the blur is piecewise smooth.
We compute the noisy blur estimate from the Fourier expansion of the blurred jump: S_N[b] ≈ [x] ∗ K + e.
Denoising problem formulation:
min_x ‖x − S_N[b]‖₂² + λ²‖Dx‖₁.
Future Work Combining Approaches
Extend the parameter selection methods to domain decomposition problems at large scale.
Use efficient schemes for large scale problems, e.g. right-hand-side updates.
Extend to edge detection approaches.
Use tensor products of the PSF for extension to 2-D: is it feasible?
Use parameter estimation techniques for the 2-D problem.
Further development of statistical techniques for estimating acceptable solutions.
Bibliography I
F. J. Anscombe. The transformation of Poisson, binomial and negative-binomial data. Biometrika, 35:246–254, 1948.
T. F. Chan and W. L. Wan. Analysis of projection methods for solving linear systems with multiple right-hand sides. 1997.
H. Guo and R. A. Renaut. Revisiting stopping rules for iterative methods used in emission tomography: analysis and developments. Physics in Medicine and Biology, submitted, 2009.
H. M. Hudson, B. F. Hutton, and R. Larkin. Accelerated EM reconstruction using ordered subsets. J. Nucl. Med., 33:960, 1992.
H. M. Hudson and R. Larkin. Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Imag., 13(4):601–609, 1994.
Bibliography II
Y. Lin, R. A. Renaut, and H. Guo. Multisplitting for regularized least squares. In preparation, 2009.
J. Mead and R. A. Renaut. Least squares problems with inequality constraints as quadratic constraints. Linear Algebra and its Applications, 2009.
J. Mead and R. A. Renaut. A Newton root-finding algorithm for estimating the regularization parameter for solving ill-conditioned least squares problems. Inverse Problems, 25, 2009.
R. A. Renaut. A parallel multisplitting solution of the least squares problem. BIT, 1998.
R. A. Renaut, I. Hnetynkova, and J. Mead. Regularization parameter estimation for large scale Tikhonov regularization using a priori information. Computational Statistics and Data Analysis, 54(1), 2009.
Bibliography III
Y. Saad. On the Lanczos method for solving symmetric linear systems with several right-hand sides. 1987.
L. A. Shepp and Y. Vardi. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imag., MI-1(2):113-122, Oct. 1982.
Curtis R. Vogel. Computational Methods for Inverse Problems. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2002.
Future Work
Other Results and Future Work
Software Package!
Diagonal weighting schemes
Edge-preserving regularization - Total Variation
Better handling of colored noise
Residual periodogram for large scale problems
Algorithm Using the GSVD
GSVD
Use the GSVD of $[W_b^{1/2}A;\ D]$.
For $\gamma_i$ the generalized singular values, and $s = U^T W_b^{1/2} r$, let
$$\tilde m = m - n + p, \qquad \tilde s_i = s_i/(\gamma_i^2\sigma_x^2 + 1), \ i = 1, \dots, p, \qquad t_i = \tilde s_i \gamma_i.$$
Find the root of
$$F(\sigma_x) = \sum_{i=1}^{p} \frac{s_i^2}{\gamma_i^2\sigma_x^2 + 1} + \sum_{i=n+1}^{m} s_i^2 - \tilde m = 0$$
Equivalently: solve $F = 0$, where
$$F(\sigma_x) = \tilde s^T s - \tilde m \quad \text{and} \quad F'(\sigma_x) = -2\sigma_x\|t\|_2^2.$$
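As a numerical sketch (not the authors' implementation; the function and array names are illustrative), the Newton iteration on $F(\sigma_x)$ can be written directly in terms of the GSVD quantities $\gamma_i$ and $s = U^T W_b^{1/2} r$:

```python
import numpy as np

def chi2_newton(gamma, s, m, n, p, sigma=1.0, tol=1e-10, maxit=100):
    """Newton root-finding for F(sigma) = 0 using GSVD quantities.
    gamma : generalized singular values (length p)
    s     : U^T W_b^{1/2} r (length m)
    """
    m_tilde = m - n + p                  # degrees of freedom
    tail = np.sum(s[n:]**2)              # components n+1..m are unfiltered
    for _ in range(maxit):
        d = gamma**2 * sigma**2 + 1.0
        F = np.sum(s[:p]**2 / d) + tail - m_tilde
        t = gamma * s[:p] / d            # t_i = gamma_i * s~_i
        dF = -2.0 * sigma * np.sum(t**2)
        if abs(F) < tol:
            break
        step = F / dF
        while sigma - step <= 0:         # keep sigma positive
            step *= 0.5
        sigma -= step
    return sigma
```

The step is halved whenever the Newton update would leave the positive axis, since $\sigma_x$ must remain positive; the practical safeguards of the next slide (bracketing and line search) address the same issue more carefully.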
Practical Details of Algorithm: Find the parameter
Step 1: Bracket the root by logarithmic search on $\sigma$ to handle the asymptotes: yields $\sigma_{\max}$ and $\sigma_{\min}$.
Step 2: Calculate the step, with steepness controlled by $\mathrm{tol}_D$. Let $t = Dy/\sigma^{(k)}$, where $y$ is the current update; then
$$\mathrm{step} = \frac{1}{2}\left(\frac{1}{\max\{\|t\|,\ \mathrm{tol}_D\}}\right)^2 \left(J_D(\sigma^{(k)}) - \tilde m\right)$$
Step 3: Introduce a line search $\alpha^{(k)}$ in Newton:
$$\sigma_{\mathrm{new}} = \sigma^{(k)}(1 + \alpha^{(k)}\,\mathrm{step})$$
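The logarithmic bracketing of Step 1 can be sketched as follows (a minimal version, assuming F is a monotone decreasing root function as above; names and the growth factor are illustrative):

```python
def bracket_root(F, sigma0=1.0, factor=10.0, maxit=50):
    """Bracket the root of a monotone decreasing F by logarithmic search:
    grow or shrink sigma by a constant factor until F changes sign."""
    lo = hi = sigma0
    if F(sigma0) > 0:                 # root lies to the right
        for _ in range(maxit):
            hi *= factor
            if F(hi) < 0:
                return lo, hi
            lo = hi
    else:                             # root lies to the left
        for _ in range(maxit):
            lo /= factor
            if F(lo) > 0:
                return lo, hi
            hi = lo
    raise RuntimeError("no sign change found")
```

The returned interval $[\sigma_{\min}, \sigma_{\max}]$ then safeguards the Newton iteration against the asymptotes of $F$.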
Key Aspects of the Proof I: The Functional J
Algebraic Simplifications: Rewrite the functional as a quadratic form.
The regularized solution is given in terms of the regularization matrix $R(W_D)$:
$$x = x_0 + (A^TW_bA + D^TW_xD)^{-1}A^TW_b r, \quad r = b - Ax_0 \qquad (4)$$
$$\phantom{x} = x_0 + R(W_D)W_b^{1/2}r = x_0 + y(W_D) \qquad (5)$$
$$R(W_D) = (A^TW_bA + D^TW_xD)^{-1}A^TW_b^{1/2} \qquad (6)$$
The functional is given in terms of the influence matrix $A(W_D)$:
$$A(W_D) = W_b^{1/2}AR(W_D) \qquad (7)$$
$$J_D(x) = r^TW_b^{1/2}(I_m - A(W_D))W_b^{1/2}r, \quad \text{let } \tilde r = W_b^{1/2}r \qquad (8)$$
$$\phantom{J_D(x)} = \tilde r^T(I_m - A(W_D))\tilde r. \quad \text{A Quadratic Form} \qquad (9)$$
Key Aspects of the Proof II: Properties of a Quadratic Form
$\chi^2$ distribution of quadratic forms $x^TPx$ for normal variables (Fisher-Cochran Theorem):
Components $x_i$ are independent normal variables, $x_i \sim N(0,1)$, $i = 1:n$.
A necessary and sufficient condition that $x^TPx$ has a central $\chi^2$ distribution is that $P$ is idempotent, $P^2 = P$, in which case the degrees of freedom of the $\chi^2$ distribution are $\mathrm{rank}(P) = \mathrm{trace}(P)$.
When the means of the $x_i$ are $\mu_i \neq 0$, $x^TPx$ has a non-central $\chi^2$ distribution with non-centrality parameter $c = \mu^TP\mu$.
A $\chi^2$ random variable with $n$ degrees of freedom and non-centrality parameter $c$ has mean $n + c$ and variance $2(n + 2c)$.
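The theorem is easy to check numerically. A quick Monte Carlo sketch (sizes and the choice of projector are illustrative) using an orthogonal projector $P$, for which $x^TPx$ should behave as $\chi^2$ with $\mathrm{rank}(P) = \mathrm{trace}(P)$ degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 3
X = rng.standard_normal((n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T       # orthogonal projector, rank k
assert np.allclose(P @ P, P)               # idempotent: P^2 = P

samples = rng.standard_normal((20000, n))  # x_i ~ N(0, 1), independent
q = np.einsum('ij,jk,ik->i', samples, P, samples)  # x^T P x per sample

# chi^2(k): mean = k = trace(P), variance = 2k
print(q.mean(), np.trace(P))
```

The sample mean of $q$ matches $\mathrm{trace}(P)$ to within Monte Carlo error, as the theorem predicts.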
Key Aspects of the Proof III: Requires the GSVD
Lemma
Assume invertibility and $m \ge n \ge p$. There exist unitary matrices $U \in R^{m\times m}$, $V \in R^{p\times p}$, and a nonsingular matrix $X \in R^{n\times n}$ such that
$$A = U\begin{bmatrix}\Upsilon\\ 0_{(m-n)\times n}\end{bmatrix}X^T, \qquad D = V[M,\ 0_{p\times(n-p)}]X^T, \qquad (10)$$
$$\Upsilon = \mathrm{diag}(\upsilon_1,\dots,\upsilon_p,1,\dots,1) \in R^{n\times n}, \qquad M = \mathrm{diag}(\mu_1,\dots,\mu_p) \in R^{p\times p},$$
$$0 \le \upsilon_1 \le \dots \le \upsilon_p \le 1, \quad 1 \ge \mu_1 \ge \dots \ge \mu_p > 0, \quad \upsilon_i^2 + \mu_i^2 = 1, \ i = 1,\dots,p. \qquad (11)$$
The Functional with the GSVD: Let $Q = \mathrm{diag}(\mu_1,\dots,\mu_p, 0_{n-p}, I_{m-n})$;
then $J = \tilde r^T(I_m - A(W_D))\tilde r = \|QU^T\tilde r\|_2^2 = \|k\|_2^2$.
Proof IV: Statistical Distribution of the Weighted Residual
Covariance Structure
Errors in $b$ are $e \sim N(0, C_b)$. Since $b$ depends on $x$ through $b = Ax + e$, we can show $b \sim N(Ax_0, C_b + AC_DA^T)$ ($x_0$ is the mean of $x$).
Residual $r = b - Ax_0 \sim N(0, C_b + AC_DA^T)$.
$\tilde r = W_b^{1/2}r \sim N(0, I + \tilde A C_D \tilde A^T)$, $\tilde A = W_b^{1/2}A$.
Use the GSVD:
$$I + \tilde A C_D \tilde A^T = UQ^{-2}U^T, \qquad Q = \mathrm{diag}(\mu_1,\dots,\mu_p, I_{n-p}, I_{m-n})$$
Now let $k = QU^T\tilde r$; then $k \sim N(0, QU^T(UQ^{-2}U^T)UQ) \sim N(0, I_m)$.
But $J = \|QU^T\tilde r\|_2^2 = \|\tilde k\|_2^2$, where $\tilde k$ is the vector $k$ excluding components $p+1:n$. Thus
$$J_D \sim \chi^2(m + p - n).$$
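This distributional claim can also be verified by simulation. A minimal Monte Carlo sketch for the special case $D = I$ (so $p = n$ and the degrees of freedom are $m$), with unit noise covariance $W_b = I$ and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma_x = 50, 10, 2.0
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)

J = []
for _ in range(3000):
    x_true = x0 + sigma_x * rng.standard_normal(n)   # x ~ N(x0, sigma_x^2 I)
    b = A @ x_true + rng.standard_normal(m)          # e ~ N(0, I), so W_b = I
    # Tikhonov solution with D = I, W_x = I / sigma_x^2
    xhat = np.linalg.solve(A.T @ A + np.eye(n) / sigma_x**2,
                           A.T @ b + x0 / sigma_x**2)
    # functional at the optimum
    J.append(np.sum((b - A @ xhat)**2) + np.sum((xhat - x0)**2) / sigma_x**2)

# J_D ~ chi^2(m + p - n) = chi^2(m): mean m, variance 2m
print(np.mean(J))   # close to m = 50
```

When the weighting matrices are the true inverse covariances, the sample mean of $J_D$ sits at the predicted $m + p - n$ degrees of freedom; this is exactly the property the Newton root-finding algorithm exploits to select the regularization parameter.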
When the mean of the parameters is not known, or $x_0 = 0$ is not the mean
Corollary: non-central $\chi^2$ distribution of the regularized functional. Recall
$$\hat x = \mathrm{argmin}\, J_D(x) = \mathrm{argmin}\{\|Ax - b\|^2_{W_b} + \|x - x_0\|^2_{W_D}\}, \qquad W_D = D^TW_xD.$$
Assume all assumptions as before, but $\bar x \neq x_0$ is the mean vector of the model parameters. Let
$$c = \|\mathbf{c}\|_2^2 = \|QU^TW_b^{1/2}A(\bar x - x_0)\|_2^2.$$
Then
$$J_D \sim \chi^2(m + p - n, c).$$
The functional at the optimum follows a non-central $\chi^2$ distribution.
A further result when A is not of full column rank
The Rank Deficient Solution: Suppose $A$ is not of full column rank. Then the filtered solution can be written in terms of the GSVD:
$$x_{\mathrm{FILT}}(\lambda) = \sum_{i=p+1-r}^{p} \frac{\gamma_i^2}{\upsilon_i(\gamma_i^2 + \lambda^2)}\, s_i x_i + \sum_{i=p+1}^{n} s_i x_i = \sum_{i=1}^{p} \frac{f_i}{\upsilon_i}\, s_i x_i + \sum_{i=p+1}^{n} s_i x_i.$$
Here $f_i = 0$, $i = 1:p-r$, and $f_i = \gamma_i^2/(\gamma_i^2 + \lambda^2)$, $i = p-r+1:p$. This yields
$$J(x_{\mathrm{FILT}}(\lambda)) \sim \chi^2(m - n + r, c);$$
notice the degrees of freedom are reduced.
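A small sketch of forming the filter factors $f_i$ for the rank-deficient case (illustrative names; $\gamma$, $\lambda$ and the numerical rank $r$ are assumed given):

```python
import numpy as np

def filter_factors(gamma, lam, r):
    """f_i = 0 for i = 1..p-r (truncated directions),
    f_i = gamma_i^2 / (gamma_i^2 + lambda^2) otherwise."""
    p = gamma.size
    f = np.zeros(p)
    f[p - r:] = gamma[p - r:]**2 / (gamma[p - r:]**2 + lam**2)
    return f
```

Only the $r$ retained directions contribute, which is why the degrees of freedom in the $\chi^2$ result drop from $m - n + p$ to $m - n + r$.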