multiresolution parameter choice method for total...

37
Copyright © by SIAM. Unauthorized reproduction of this article is prohibited. SIAM J. IMAGING SCIENCES c 2016 Society for Industrial and Applied Mathematics Vol. 9, No. 3, pp. 938–974 Multiresolution Parameter Choice Method for Total Variation Regularized Tomography * Kati Niinim¨ aki , Matti Lassas , Keijo H¨ am¨ al¨ ainen § , Aki Kallonen § , Ville Kolehmainen , Esa Niemi , and Samuli Siltanen Abstract. A computational method is introduced for choosing the regularization parameter for total variation (TV) regularization. A partial understanding of the properties of the method is provided by rigor- ously proving that the TV norms of the reconstructions converge with any choice of regularization parameter. The computational approach is based on computing reconstructions at a few different resolutions and various values of regularization parameter. The chosen parameter is the smallest one resulting in approximately discretization-invariant TV norms of the reconstructions. The method is tested with simulated and experimental X-ray tomography data and compared to the S-curve method. The results are comparable to those of the S-curve method. However, the S-curve method needs quantitative a priori information about the expected sparsity (TV norm) of the unknown, while the proposed method does not require such input parameters. Key words. total variation regularization, parameter choice, tomography, inverse problem, regularization AMS subject classifications. 15A29, 94A08, 92C55 DOI. 10.1137/15M1034076 1. Introduction. In X-ray tomography one collects projection images of an unknown two- or three-dimensional body from different directions. For each direction, an X-ray source is placed on one side of the body, and an X-ray detector is placed on the opposite side. After a calibration step involving a logarithm of the measured projection images, the X-ray tomography experiment is usually modeled by (1.1) e g = Au + e, where vector e g R M is the data and u R + is the X-ray attenuation function defined in a bounded domain Ω R n . In this work, we take Ω to be the unit square [0, 1] 2 . The operator A represents integrals of u over the areas or volumes connecting the X-ray source * Received by the editors August 5, 2015; accepted for publication (in revised form) April 18, 2016; published electronically July 12, 2016. http://www.siam.org/journals/siims/9-3/M103407.html Department of Mathematics, KTH Royal Institute of Technology, Stockholm SE-100 44, Sweden ([email protected]). The research of this author was supported by the Academy of Finland (Finnish Centre of Excellence in Inverse Problems Research 2012–2017), 250215. Department of Mathematics and Statistics, University of Helsinki, Helsinki 00014, Finland (Matti.Lassas@ helsinki.fi, esa.niemi@helsinki.fi, Samuli.Siltanen@helsinki.fi). The research of these authors was supported by the Academy of Finland (Finnish Centre of Excellence in Inverse Problems Research 2012–2017), 250215. § Department of Physics, University of Helsinki, Helsinki 00014, Finland (keijo.hamalainen@helsinki.fi, aki.kallonen@helsinki.fi). The research of the fourth author was supported by Suomen Akatemia (Academy of Finland), 1259526. Department of Applied Physics, University of Eastern Finland, Kuopio 70211, Finland (ville.kolehmainen@uef.fi). 938 Downloaded 09/20/16 to 130.216.48.126. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Upload: others

Post on 01-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

SIAM J. IMAGING SCIENCES c© 2016 Society for Industrial and Applied MathematicsVol. 9, No. 3, pp. 938–974

Multiresolution Parameter Choice Methodfor Total Variation Regularized Tomography∗

Kati Niinimaki† , Matti Lassas‡ , Keijo Hamalainen§ , Aki Kallonen§ , Ville Kolehmainen¶,Esa Niemi‡ , and Samuli Siltanen‡

Abstract. A computational method is introduced for choosing the regularization parameter for total variation(TV) regularization. A partial understanding of the properties of the method is provided by rigor-ously proving that the TV norms of the reconstructions converge with any choice of regularizationparameter. The computational approach is based on computing reconstructions at a few differentresolutions and various values of regularization parameter. The chosen parameter is the smallest oneresulting in approximately discretization-invariant TV norms of the reconstructions. The methodis tested with simulated and experimental X-ray tomography data and compared to the S-curvemethod. The results are comparable to those of the S-curve method. However, the S-curve methodneeds quantitative a priori information about the expected sparsity (TV norm) of the unknown,while the proposed method does not require such input parameters.

Key words. total variation regularization, parameter choice, tomography, inverse problem, regularization

AMS subject classifications. 15A29, 94A08, 92C55

DOI. 10.1137/15M1034076

1. Introduction. In X-ray tomography one collects projection images of an unknown two-or three-dimensional body from different directions. For each direction, an X-ray sourceis placed on one side of the body, and an X-ray detector is placed on the opposite side.After a calibration step involving a logarithm of the measured projection images, the X-raytomography experiment is usually modeled by

(1.1) g = Au+ e,

where vector g ∈ RM is the data and u : Ω → R+ is the X-ray attenuation function definedin a bounded domain Ω ⊂ Rn. In this work, we take Ω to be the unit square [0, 1]2. Theoperator A represents integrals of u over the areas or volumes connecting the X-ray source

∗Received by the editors August 5, 2015; accepted for publication (in revised form) April 18, 2016; publishedelectronically July 12, 2016.

http://www.siam.org/journals/siims/9-3/M103407.html†Department of Mathematics, KTH Royal Institute of Technology, Stockholm SE-100 44, Sweden ([email protected]).

The research of this author was supported by the Academy of Finland (Finnish Centre of Excellence in InverseProblems Research 2012–2017), 250215.‡Department of Mathematics and Statistics, University of Helsinki, Helsinki 00014, Finland (Matti.Lassas@

helsinki.fi, [email protected], [email protected]). The research of these authors was supported by theAcademy of Finland (Finnish Centre of Excellence in Inverse Problems Research 2012–2017), 250215.§Department of Physics, University of Helsinki, Helsinki 00014, Finland ([email protected],

[email protected]). The research of the fourth author was supported by Suomen Akatemia (Academy ofFinland), 1259526.¶Department of Applied Physics, University of Eastern Finland, Kuopio 70211, Finland ([email protected]).

938

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 2: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 939

(a)

u(x)

u:Ω→R+

(b)

f ∈ R16

(c)

f ∈ R64

(d)

f ∈ R256

Figure 1. Schematic illustration about the relationship between continuous and discrete tomographic models.(a) Continuous model for two-dimensional X-ray tomography. The function u(x) of (1.1) gives the pointwiseX-ray attenuation coefficient inside the domain Ω shown here as a square. The dots show the five locations ofthe X-ray source. The detector has 10 pixels, so there is a total of 50 data points. (b)–(d) Discrete modelswith various resolutions. The attenuation coefficient is modeled as constant inside each pixel. The point of thisfigure is to demonstrate how the resolution of our computational model can be freely chosen, while the numberof data points is fixed.

and a pixel in the detector; see Figure 1(a). The model in (1.1) is approximate in the sensethat it neglects scattering phenomena [76, 2] and effects of nonmonochromatic radiation,such as beam hardening [83]. Furthermore, e ∈ RM is a Gaussian random vector modelingmeasurement noise.

The inverse problem is given the measurement geometry and noisy data g, recover a dis-crete approximation f ∈ RN to u. Depending on the number and geometry of projectiondirections, this inverse problem is either mildly or strongly ill-posed [66, 65]. Ill-posednessmeans high sensitivity to inaccuracies in modeling and to errors in measurement. Overcomingthis sensitivity requires regularization.

In this work, we study regularizing two-dimensional (2D) tomographic problems using totalvariation (TV) regularization, which was introduced in [74]. We need a finite approximationto the continuous measurement model (1.1). Consider the discrete model

(1.2) g = Af ,

where f ∈ RN represents the discretized 2D body under imaging. We divide the square Ω intoN = n× n square pixels with side length 1/n. The function u is approximated by a functionwhich is constant inside each pixel; those pixel values are listed as elements of the vector f .Further, g ∈ RM denotes the measurement data consisting of pixel values in digital X-rayprojection images, and A ∈ RM×N is a matrix approximation to A (see [65, section 2.3.4]).

The discrete TV-regularized solution is defined by

(1.3) fα,n := arg minf∈RN+

‖Af − g‖22 +

α

n

∑κ∼κ′|fκ − fκ′ |

,

where α > 0 is a so-called regularization parameter. We use the anisotropic definition of TV:in (1.3) the relation κ ∼ κ′ is true whenever elements fκ and fκ′ of the vector f correspondto values at either horizontally or vertically neighboring pixels.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 3: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

940 NIINIMAKI ET AL.

We remark that the ordinary least-squares formulation in (1.3) is a simplification that doesnot take into account, e.g., the nonuniform noise variance present in CT; see, for example,[69, 94] for details.

How to choose the regularization parameter α in (1.3)? Many methods have been suggestedin the literature. Classical approaches include cross-validation (see [42] for an early examplewith Tikhonov-regularized tomography), the L-curve method [48], and Stein’s unbiased riskestimates (SURE) [72, 73, 33]. See also the interesting manuscript [75]. One possibility forparameter choice is to use the Bayesian framework by interpreting TV as a MAP estimator[54, 12], provided that the signal has suitable statistical properties. Clason, Jin, and Kunischproposed a method based on balancing `1 and TV norms [25] and tested it on image deblurringproblems. Multiresolution-based parameter choices are discussed by Dong, Hintermuller, andRincon-Camacho [31] and Frick, Marnitz, and Munk [40]. Wen and Chan [93] suggested adiscrepancy principle designed for TV and tested it with image restoration. The so-calledS-curve method was introduced in [56] and implemented for wavelet-based tomography in[47]. TV parameter choices based on the quasi-optimality principle and Hanke–Raus rulesare described by Kindermann, Mutimbu, and Resmerita in [55]. Furthermore, Chen, LoliPiccolomini, and Zama [24] suggest determining the TV parameter in image deblurring byincluding it in a Karush–Kuhn–Tucker (KKT) system. Toma, Sixou, and Peyrin introduce aniterative choice in [86].

There is so far no parameter choice method that would be the best for all applications.Rather, it seems that for any given application there is a quest for a method that givesconsistent, robust, and useful results. For this reason we think that there needs to be a largecollection of methods for parameter choice, based on different principles, for scientists andengineers to have a better chance of finding a good option for each practical situation.

We propose a new parameter choice method, based on solving the inverse problem atmultiple resolutions. Namely, while the number M of measurement points is fixed (it issimply the number of detector elements times the number of projection images), the numberN = n×n of pixels in the reconstruction can be freely chosen. See Figure 1 for an illustration.Our method is different from all previous methods.

Given a tomographic dataset, we can use definition (1.3) to compute reconstructions inthe same square domain but at varying resolutions n. We prove in section 3 that if we keepα > 0 in (1.3) fixed and let n → ∞, then fα,n converges to a limit image. The technique ofproof is analogous to the one-dimensional result [61].

We propose the following practical method for choosing the parameter α > 0.Parameter choice method. Given a noisy dataset g, compute fα,n defined by (1.3) for a set

of α values and at two or more resolutions n, including the resolution intended for showing thefinal result. Calculate the discrete TV norms of the reconstructions, and define the optimalα to be the smallest value that leads to TV norms that do not depend significantly on n.

At this point we do not claim that the above algorithm is a parameter choice rule inthe strict sense as defined in [34]. However, the numerical evidence presented in section 5suggests that the method works well and, as expected, leads to larger α when the noise levelis increased.

In light of our new theorem, the TV norms of the reconstructions converge for any choiceof α as n → ∞. How exactly can the above approach work at all? It seems that the ill-

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 4: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 941

posedness of the tomography problem shows up as significant n-dependence of the TV normsof reconstructions whenever α is too small to regularize away the instability caused by theterm ‖Af − g‖22. Since the final intended resolution is included in the test resolutions, ourchoice method adapts to the relevant resolution and the noise level.

The reader may wonder why α is divided by n in (1.3). The TV regularization formulationfor smooth functions takes the form

(1.4) arg minu≥0

‖Au− g‖22 + α

∫Ω|∇u|dx

to the discrete minimization problem (1.3). The division by n comes from approximationswith a finite difference quotient and a midpoint rule integration quadrature:

(1.5)

∫Ω

∣∣∣∣ ∂u∂x1

∣∣∣∣ dx ≈ ∫Ω

∣∣∣∣∣u(x1 + 1n , x2)− u(x1, x2)

1n

∣∣∣∣∣ dx ≈(

1

n

)2 ∑κ∼κ′

∣∣∣∣∣fκ − fκ′1n

∣∣∣∣∣ .In (1.5) the relation κ ∼ κ′ is true whenever elements fκ and fκ′ of f correspond to values athorizontally neighboring pixels. A similar computation using vertical differences is needed aswell for the analysis regarding (1.4).

Applying TV regularization to tomographic problems dates back to 1998 [29] in the caseof simulated data and to 2003 [57] in the case of measured data. Since then, there have beenmany further studies [58, 63, 80, 50, 84, 32, 8, 52, 85, 53, 78, 22, 23].

There are many computational approaches to minimizing the TV-regularized least-squaresfunctional (1.3). These include quadratic programming [61, 51, 43, 56, 47], lagged diffusivitymethods [30], domain decomposition methods [39, 38], Bregman distance methods [70, 95, 44,14, 96], primal-dual methods [20, 19, 41, 35, 67], finite element methods [37, 5], discontinuousGalerkin methods [46], and other methods [62, 91, 92, 45, 17]. See also [90, 71, 21, 77, 49].In this paper we use a primal-dual interior-point (PD-IP) quadratic programming methodbased on dividing the unknown into nonnegative and nonpositive parts in the spirit of [65,section 6.2]. This is not the most efficient TV regularization algorithm, but it works well forus. One advantage of this approach is natural and effective enforcement of nonnegativity ofthe attenuation coefficient.

Scientific novelties of this paper include• proving the convergence of TV reconstructions as the resolution is increased;• testing the S-curve method in the context of TV-regularized tomography; and• introducing a novel multiresolution parameter choice method and testing it with mea-

sured X-ray data.While we discuss here the new multiresolution parameter choice method for 2D tomographicproblems only, the computational approach generalizes directly to three-dimensional settingsand other inverse problems as well. A one-dimensional analogue of our convergence result isavailable in [61].

Let us comment on related results in the literature. The interplay between resolution andTikhonov regularization was studied by Vainikko in [88] in 1992. Chambolle’s study [15] ofMumford–Shah type image segmentation uses geometric arguments very similar to the proofof Lemma 5 below. Estimated TV norm constraint is also considered in [26] in connection

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 5: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

942 NIINIMAKI ET AL.

with an adaptive level set algorithm. The papers [16, 89, 7, 87] study Γ-convergence of TVfunctionals as the discretization is refined, but without a measurement operator crucial fortomography. We note that the convergence results of the type we prove are quite delicate, asshown by [5, Example 4.1].

Most closely connected to our work is [13], where Cai et al. use wavelet frames to show Γ-convergence of discrete regularization functionals to the infinite-dimensional variational func-tional. Penalizing `1 norms of wavelet coefficients in their approach leads to a regularizationresembling TV (similarly to [60] in the context of Besov B1

11 priors in Bayesian inversion).However, they have the luxury of representing functions using frames, which is not availablein our case of actual anisotropic TV regularization.

2. The space BV (D): Definitions and approximations. Let D be the square [0, 1]2 ⊂R2. We use a special `′1-based anisotropic norm for the space BV (D) of functions of boundedvariation defined on D.

Definition 1. The space BV (D) consists of functions u ∈ L1(D) whose distributionalderivative ∇u belongs to M(D,R2) and for which

supϕ

〈∇u, ϕ〉

∣∣∣ϕ ∈ Cc(D,R2), ‖ϕ`‖∞ ≤ 1<∞.

Above M(D,R2) denotes R2-valued Borel measures on D.

Definition 2. Define an anisotropic seminorm for BV (D) in weak form by

(2.1) V (u) := V (u;D) =2∑`=1

supϕ`

⟨∂u(x)

∂x`, ϕ`(x)

⟩ ∣∣∣ϕj ∈ Cc(D,R), ‖ϕ`‖∞ ≤ 1

.

Note that if u ∈ BV (D) is smooth, then we have

(2.2) V (u) =

∫D

(∣∣∣∣∂u(x)

∂x1

∣∣∣∣+

∣∣∣∣∂u(x)

∂x2

∣∣∣∣)dx.

The anisotropic ABV norm is defined by

(2.3) ‖u‖ABV (D) := ‖u‖L1(D) + V (u;D).

We will use two different concepts of weak convergence in the space BV (D).

Definition 3. A sequence uj ∈ BV (D) converges weakly in BV (D) to limit u as j → ∞,that is, uj −→BVw u, if uj and their distributional derivatives ∇uj satisfy

(2.4)

limj→∞

‖uj − u‖L1(D) = 0 and

∇uj → ∇u weakly in M(D,R2) as j →∞.

Note that M(D,R2) is the dual space of C(D;R2). Recall now from [3, Definition 10.1.3]the definition of another useful type of weak convergence.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 6: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 943

Definition 4. Let uj be a sequence in BV (D) and u ∈ BV (D). Then uj converges to u asj →∞ in the sense of anisotropic intermediate convergence if and only if

limj→∞

‖uj − u‖L1(D) = 0 and(2.5)

limj→∞

V (uj) = V (u).(2.6)

Also, below we use the following distance function leading to a notion of convergence equivalentto (2.5) and (2.6):

(2.7) ρ1,1(u, v) = ‖u− v‖L1(D) + |V (u)− V (v)|.

According to [3, Proposition 10.1.1], anisotropic intermediate convergence of Definition 4implies weak convergence of Definition 3.

We need a result on approximating elements of BV (D) by functions that are piecewiseconstant on square pixels of fine enough resolution. For each j > 0, consider the closed dyadicsquares

(2.8) Sj,~k

= Sj,k1,k2 = [k12−j , (k1 + 1)2−j ]× [k22−j , (k2 + 1)2−j ] ⊂ D,

where 0 ≤ k1 ≤ 2j − 1 and 0 ≤ k2 ≤ 2j − 1. See Figure 2(b) for an illustration of a dyadicgrid formed by such squares. Let Tj : L2(D)→ L2(D) be the orthogonal projections definedby averaging over squares:

(2.9) (Tju)∣∣int(S

j,~k)

=1

|Sj,~k|

∫Sj,~k

u(x) dx.

For any j > 0, the set Yj := Range(Tj) consists of L2(D) functions that are piecewise constantin the open squares belonging to the rectangular 2j×2j grid (2.8).

Lemma 5. For all u ∈ BV (D) and ε > 0 there is j1 > 0 such that there is a v2 that ispiecewise constant in the dyadic 2j1×2j1 grid (2.8) and satisfies

ρ1,1(u, v2) < ε,

where the ρ1,1-distance is defined by (2.7).

Proof. Let u ∈ BV and ε > 0. We know from [6] that there is a function v0 which ispiecewise constant in a finite triangularization of D and satisfies

(2.10) ρ1,1(u, v0) < ε/3.

Set M = ‖v0‖L∞(D). Let us choose some notation for the concepts related to the fixedtriangularization corresponding to v0.

1. Triangles are denoted by Tν ⊂ D with ν = 1, 2, . . . , T .2. Edges of the triangles are line segments γ` ⊂ D indexed by ` = 1, 2, . . . , L.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 7: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

944 NIINIMAKI ET AL.

(a) (b)

Figure 2. Examples of different discretizations of the square domain D = [0, 1]2 used below overlaid in thesame square. (a) Finite triangulation. (b) Square grid corresponding to j = 4 and n = 16.

(a) (b) (c) (d)

Br

B′r

Figure 3. Possible positions of a triangle vertex (blue dot) in the dyadic square grid. The triangle edgesare shown in blue. The sets B′r are shown as yellow and the clusters Br as light gray. (a) Vertex at the cornerof squares. (b) Vertex at a horizontal edge between squares. (c) Vertex at a vertical edge between squares. (d)Vertex in the interior of a square.

3. Vertices are denoted by pr ∈ D with r = 1, 2, . . . , R.4. The minimum distance between two separate vertices is denoted by δ0 > 0.5. The minimum angle between any two edges sharing an endpoint is called α0 > 0.6. The length of the longest edge is denoted by G.

See Figure 2(a) for an illustration of a finite triangularization.Next we define clusters. Given j > 1, let B′r = B′r,j be the union of all the closed squares

Sj,~k

containing the vertex pr. See the orange sets in Figure 3. Furthermore, define clustersBr = Br,j by

(2.11) Br =⋃S

j,~k|S

j,~k∩B′r 6= ∅.

Then pr ∈ B′r ⊂ Br. The clusters are shown in gray in Figures 3, 4, and 7.In the following, consider integers j0 large enough for the following to hold:

(2.12) 4√

2 · 2−j0 < δ0.

Now (2.12) implies that the sets Br are disjoint and contain at most one vertex each.Let v1 be the function obtained from v0 by replacing its value in each cluster Br by 0.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 8: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 945

Then we have

‖v0 − v1‖L1(D) ≤R∑r=1

∫Br

|v0 − v1|dm ≤ 16RM(2−j0)2

since each cluster consists of at most 16 squares of area (2−j0)2. Furthermore, the V expression(2.2) can be very simply computed for functions that are piecewise constant with respect toeither a finite triangulation or a square grid. Namely, (2.2) is the sum of (Manhattan distance)lengths of jump curve segments multiplied by the jump in function value over those curvesegments. Therefore, we can estimate

|V (v0)− V (v1)| ≤ 8LMR2−j0 + 32MR2−j0 ,

where the first term on the right-hand side comes from the triangularization. We used thefacts that the Manhattan length of any edge restricted inside a cluster is at most 4 · 2−j0 andthat v0 can jump at most 2M at an edge. The second term on the right-hand side comes fromthe jump of v1, bounded by 2M , at the cluster boundary whose maximal length is 16 · 2−j0 .

Now we can fix j0 so large that (2.12) holds and we have

(2.13) ρ1,1(v0, v1) < ε/3.

Next we construct tubular neighborhoods. Denote the union of all clusters by

B =

R⋃r=1

Br.

Note that B ⊂ R2 is a closed set. Further, consider the parts of triangle edges that are outsidethe clusters:

(2.14) γ` := γ` \ B.

For all % > 0, define

(2.15) Γ`,% := x ∈ R2 |dist(x, γ`) < % \ B,

where “dist” refers to Euclidean distance. The sets Γ`,% are open tubular neighborhoods of γ`in D \ B. See Figures 4(a) and 5.

We need to determine an upper bound for % > 0 ensuring that the sets Γ`,% are disjoint.It is clear from the construction of the sets Br and B′r that (Euclidean) dist(pr, D \B) > 2−j0 .As illustrated in Figure 4(a), the minimal angle α0 between edges determines a lower boundfor the distance between the intersection points of two edges with the boundary ∂Br. Atrigonometric consideration shown in Figure 4(b) implies that the tubular neighborhoods Γ`,%are disjoint whenever % satisfies 0 < % < h0 with

(2.16) h0 := 2−j0 tanα0

2.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 9: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

946 NIINIMAKI ET AL.

(a)

prα0

Br

B′r

γ1

Γ1,%

γ2

Γ2,%

(b)

2−j0

α0

2

h0

Figure 4. (a) Illustration of two disjoint sets Γ1,% and Γ2,% (pink) as defined in (2.15). The two edges γ1and γ2, shown in blue, have the minimal angle α0 between them. The construction of Br and B′r is done in thesquare grid of size 2−j0×2−j0 . (b) The worst-case limit situation when pr is at the boundary of B′r is wherethe definition (2.16) comes from.

See Figure 5 for an illustration of the tubular neighborhoods in a triangulation.Next we consider refining the square grid outside the clusters as shown in Figure 6. The

smaller squares of size 2−j1×2−j1 are defined by the formula (2.8) with a large enough j1 > j0.The idea is to take j1 so large that we can fit a dyadic polygonal chain inside the tubularneighborhoods, as shown in Figure 7(a).

Consider two vertices p1 and p2 connected by an edge γ1. (There is no loss of generalitysince we can always renumber the vertices and edges.) Take j1 so large that

(2.17) 2−j1 <%

4.

Then we can connect the clusters B1 and B2 using a piecewise linear curve

c1 : [0, 1]→ Γ1,%

with the following properties:• The edge γ1 and the dyadic polygonal chain c1 are disjoint: γ1 ∩ c1 = ∅.• The curve starts from the boundary of B1 and ends at the boundary of B2; in other

words, c1(0) ∈ ∂B1 and c1(1) ∈ ∂B2.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 10: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 947

Figure 5. This is an illustration of the tubular neighborhoods Γ`,% (pink), defined in (2.15), which aredisjoint whenever 0 < % < h0 with h0 given by (2.16). The vertices pr are shown as blue dots, and the “triangleedges outside clusters,” or line segments γ` defined in (2.14), are shown as blue lines.

• The curve c1 is close to the edge γ1:

(2.18) minx∈γ1‖c1(t)− x‖2 < 3 · 2−j1 for all t ∈ [0, 1].

• The range c1([0, 1]) of the curve consists of boundary segments of squares belongingto the 2−j1×2−j1 grid.• The curve is injective: if t1 6= t2, then c1(t1) 6= c1(t2).• The dyadic polygonal approximation c1 is optimal in the following sense: if c′1 :

[0, 1] → Γ1,% is another curve satisfying all of the above conditions for c1, thenlength(c′1([0, 1])) ≥ length(c1([0, 1])).

See Figure 6 for an illustration. Construct c` corresponding to each γ` with ` = 1, . . . , L.We move on to approximating functions using a square grid. Consider the function v1,

which is piecewise constant in the triangulation of D, restricted to a tubular neighborhoodΓ`,%. The line segment γ` divides Γ`,% into two components, and v1 has a constant value ineach component. See Figure 7(b).

Now define another function v2, which is piecewise constant on the square grid of size2−j1×2−j1 . Outside the tubular neighborhoods Γ`,% we set v1 ≡ v2. Inside Γ`,% we define thevalues of v2 uniquely as follows. Consider the dyadic polygonal chains c` that are located closeto the edges γ`. Now v2 is allowed to have discontinuities only along the curves c` or at the

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 11: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

948 NIINIMAKI ET AL.

B1 γ1

B2

Figure 6. Refining the grid outside clusters. The bigger squares have size 2−j0×2−j0 , and the smallersquares have size 2−j1×2−j1 . A dyadic polygonal chain inside the tubular neighborhood Γ1,% connects theclusters B1 and B2.

cluster boundaries ∂Br. See Figure 7(c).By the construction of the dyadic polygonal chain the area between an edge γ` and the

polygonal chain is bounded by 3G2−j1 . Therefore, we can estimate

(2.19) ‖v1 − v2‖L1(D) ≤ 6MG2−j1 ,

since we have |v1(x)− v2(x)| ≤ 2M for almost all x ∈ D.Now both v1 and v2 are piecewise constant, and the domains of constant value are polygons.

The expressions V (v1) and V (v2) can thus be calculated by summing the following quantitiesover the linear boundary segments: the jump in function values over the segment multipliedby the Manhattan length of the segment. Note that in |V (v1)− V (v2)| all of the terms canceleach other except those coming from boundary segments located in the tubular neighborhoodsΓ`,%. Due to the construction of c`, the Manhattan and Euclidean lengths of c` are the same.Further, the Manhattan length of γ` equals the length of c`, apart from the difference betweenthe two red intervals I` and I ′` shown in Figure 7(a). By (2.18) we have∣∣|I`| − |I ′`|∣∣ ≤ |I`|+ |I ′`| ≤ 6 · 2−j1 .

This together with the maximal jump size 2M allows us to estimate

(2.20) |V (v1)− V (v2)| ≤ 12ML2−j1 .

Using (2.19) and (2.20) we see that for large enough j1 we have ρ1,1(v1, v2) ≤ ε/3, and theproof is complete.

3. Multiresolution convergence analysis. Define a functional S∞ : BV (D)→ R by

(3.1) S∞(u) = ‖Au−m‖2L2(Ω) + α1‖u‖L1(D) + αV (u)

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 12: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 949

(a)

I`

I ′`

(b)

v1 = a

v1 = b

γ`

(c)

v2 = a

v2 = b

c`

Figure 7. Square-grid based approximation of functions that are piecewise constant on a triangulation. (a)Detail from a situation similar to that shown in Figure 6. The straight blue line is the edge γ`, and the blackjagged line is the dyadic polygonal chain c`. The red intervals I` and I ′` are used for estimating the differencebetween the BV norms of v1 and v2. (b) A function v1 which is piecewise constant on a triangulation. (c)Approximation of v1 by another function v2 which is piecewise constant on a square grid. The difference of theBV norms of v1 and v2 comes entirely from jumps over the two red vertical intervals shown in (a).

with positive regularization parameters α1 > 0 and α > 0, and V given by (2.2). Thefunctional S∞ defines an infinite-dimensional minimization problem which we will approximateby discrete minimization problems. For this we define functionals Sj : BV (D)→ R∪ ∞ by

(3.2) Sj(u) =

S∞(u) for u ∈ Range(Tj),∞ for u 6∈ Range(Tj).

Denote

(3.3) s∞ := infu∈BV

S∞(u), sj := infu∈BV

Sj(u).

We study the existence and properties of minimizers uj , u∞ ∈ BV (D) satisfying Sj(uj) = sjand S∞(u∞) = s∞, respectively. Such minimizers may not be unique. Therefore, we use thenotation

arg min(Sj) = u ∈ BV (D) : Sj(u) = sj,arg min(S∞) = u ∈ BV (D) : S∞(u) = s∞.

We are now ready to state our main theorem.

Theorem 6. Assume either (A) or (B) about the linear operator A:(A) A : L2(D)→ L2(Ω) is compact and A : L1(D)→ D′(Ω) is continuous with some open

and bounded set Ω ⊂ R2.(B) A : L1(D)→ RM is bounded.

Let the functionals Sj and S∞ be given by (3.2) and (3.1), respectively, and define sj and s∞as in (3.3). Then the following hold:

(i) limj→∞ sj = s∞.(ii) There exists a minimizer uj ∈ arg min(Sj) for all j = 1, 2, 3, . . . .(iii) There exists a minimizer u∞ ∈ arg min(S∞).

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 13: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

950 NIINIMAKI ET AL.

(iv) Any sequence uj ∈ arg min(Sj) of minimizers has a subsequence uj(`) that convergesweakly in BV (D) to some limit w ∈ BV (D), or, more briefly, uj(`) −→BVw w as `→∞.Furthermore, lim`→∞ V (uj(`)) = V (w); that is, the subsequence uj(`) converges to win the anisotropic intermediate topology of BV (D).

(v) The limit w =: u∞ in (iv) is a minimizer: u∞ ∈ arg min(S∞).

The set Ω in (3.1) can be an open rectangle with L2(Ω), modeling sinogram space, or Ω =1, 2, . . . ,M, equipped with the counting measure, leading to L2(Ω) = RM . These twooptions correspond to assumptions (A) and (B) in Theorem 6.

The connection between Theorem 6 and the computational explanation in the introductionis this. In the computational part of this paper we take α1 = 0, which roughly correspondsto having a really small α1 > 0 in the theory. The minimizing vector fα,n ∈ Rn×n defined in

(1.3) can be seen as an n× n matrix with n = 2j . The matrix fα,n is a 2D array of the valuesof a piecewise constant minimizer function uj ∈ arg min(Sj) in the pixels (2.8). Thus we see

that for any choice of regularization parameter α in (1.3), the TV norms of fα,n with n = 2j

converge to a limit as j →∞. The measurement operator discussed in the introduction is asin assumption (B).

Remark. Our convergence result is quite delicate. Even a slight change of norms willdestroy the argumentation. See Appendix A for more details about this.

Before proving the main theorem we prove a couple of lemmas.

Lemma 7. Let the assumptions of Theorem 6 hold. Assume that uj → u as j →∞ in theweak topology of BV (D) given in Definition 3. Then

(3.4) limj→∞

Auj = Au

with convergence in L2(Ω) or in RM in the cases of assumption (A) or (B) in Theorem 6,respectively. Moreover, it follows that

‖Au−m‖2L2(Ω) + α1‖u‖L1(D) + αV (u)

≤ lim infj→∞

(‖Auj −m‖2L2(Ω) + α1‖uj‖L1(D) + αV (uj)

).(3.5)

Proof. Assume (A) of Theorem 6. By [3, Theorem 10.1.3] we have

‖u‖L2(D) ≤ C1‖u‖BV (D).

Moreover, there is c1 such that ‖uj‖BV (D) ≤ c1 for all n. Hence, ‖uj‖L2(D) ≤ C for all j. Toshow that Auj converges in L2(Ω) to Au as j →∞, assume the opposite. Then there is ε > 0and a subsequence j(`)→∞ as `→∞ such that

‖Auj(`) −Au‖L2(Ω) ≥ ε.(3.6)

Now ‖uj(`)‖L2(D) ≤ C, and as A : L2(D) → L2(Ω) is compact, there is a subsequence uj(`p)

and w ∈ L2(Ω) such that

limp→∞

‖Auj(`p) − w‖L2(Ω) = 0.(3.7)

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 14: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 951

On the other hand, as uj −→BVw u, we have that uj → u in the strong topology of L1(D) bydefinition. This implies that

limp→∞

Auj(`p) = Au in D′(Ω).(3.8)

The above two limits (3.7) in L2(Ω) and (3.8) in D′(Ω) imply that Au = w. This, (3.6), and(3.7) are in contradiction. Hence we have (3.4) in the case of assumption (A) in Theorem 6.

Inequality (3.5) follows from the above and [3, Proposition 2.4.12(c)].Assumption (B) can be handled by adding a projection operator similar to (2.9).

Lemma 8. Let the assumptions of Theorem 6 hold. Consider the functionals Sj : (BV (D), T ′′)→ [0,+∞] defined in (3.2) and S∞ : (BV (D), T ′′) → [0,+∞] defined in (3.1). Here T ′′ de-notes the anisotropic intermediate topology of Definition 4. Then the sequence Sj of functionalsΓ-converges to S∞ as j →∞.

Proof. Note that one can use the distance function (2.7) to show that T ′′ is metrizable.Because of the metrizability, showing Γ-convergence can be done by proving two things [28,Proposition 8.1]. The lower-bound inequality

(3.9) S∞(u) ≤ lim infj→∞

Sj(uj)

should hold whenever there is anisotropic intermediate convergence of uj ∈ BV (D) to thelimit u ∈ BV (D). Further, we must show that for every u ∈ BV (D) there is a sequenceuj ∈ BV (D) that converges to u in the anisotropic intermediate topology and satisfies theupper-bound inequality

(3.10) S∞(u) ≥ lim supj→∞

Sj(uj).

This proves the lower-bound inequality (3.9). Assume that uj converges to u in the anisotropicintermediate topology of BV as j → ∞. Then by definition we have limj→∞ ‖uj‖L1(D) =‖u‖L1(D). Furthermore, since anisotropic intermediate convergence implies weak conver-gence by [3, Proposition 10.1.1], Lemma 7 shows that limj→∞Auj = Au in L2(Ω), andso limj→∞ ‖Auj−m‖L2(Ω) = ‖Au−m‖L2(Ω). Moreover, anisotropic intermediate convergenceimplies V (u) = limj→∞ V (uj). Now

S∞(u) = ‖Au−m‖2L2(Ω) + α1‖u‖L1(D) + αV (u)

= limj→∞

(‖Auj −m‖2L2(Ω) + α1‖uj‖L1(D) + αV (uj)

)(3.11)

= lim infj→∞

S∞(uj)

≤ lim infj→∞

Sj(uj).(3.12)

Proving the upper-bound inequality (3.10). Take any u ∈ BV (D). Recall the orthogonalprojection operators Tj : L2(D) → L2(D) defined in (2.9), and the spaces Yj = Range(Tj)of piecewise constant functions. As Yj ⊂ Yj+1, by Lemma 5 there are uj ∈ Yj such thatlimj→∞ ρ1,1(vj , u) = 0, so there is anisotropic intermediate convergence of uj to u as j →∞.Now uj ∈ Yj and formula (3.2) together imply Sj(uj) = S∞(uj). The upper-bound inequality(3.10) now follows from (3.11).

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 15: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

952 NIINIMAKI ET AL.

Proof of Theorem 6. Let us restrict Sj and S∞ to the set

BR = u ∈ BV (D) : (α1‖u‖L1(D) + αV (u)) ≤ R

and call the resulting functionals S(R)j and S

(R)∞ , respectively. Denote by T ′ the topology

in BR ⊂ BV (D) induced by the weak topology of BV (D). Since C(D) is separable, thespace (BR, T ′) is metrizable and, in particular, satisfies the first axiom of countability as atopological space.

Next we show that the restricted functionals S(R)j Γ-converge to the restricted limit func-

tional S(R)∞ . When uj ∈ BR converges in the weak topology to the limit u ∈ BR, the lower-

bound inequality (3.9) follows from Lemma 7. Furthermore, Lemma 8 yields that for allu ∈ BR there is some sequence uj ∈ BV (D) converging to u in the anisotropic intermediatetopology (and therefore in the weak topology) of BV for which the the upper-bound inequality(3.10) holds. Then, we see that the sequence

(3.13) vj(x) :=V (u) + ‖u‖L1(D)

V (uj) + ‖uj‖L1(D) + 1j

uj(x)

satisfies vj ∈ BR, vj converges to u in the anisotropic intermediate topology of BV , and theupper-bound inequality (3.10) is valid also for the sequence vj and u. As the anisotropicintermediate convergence in BV implies the weak convergence in BV , these facts show that

S(R)j Γ-converge to S

(R)∞ , in (BR, T ′), as j →∞ for any fixed R > 0; see, e.g., [4, 3].

Let us prove the existence of minimizers for S(R)∞ and S

(R)j . Note first that

• by Lemma 7 the functionals S∞ and Sj are lower semicontinuous with respect to theweak topology of BV (D), and• by the Banach–Alaoglu theorem and [3, Theorem 10.1.4] the space (BR, T ′) is compact.

Therefore, the functionals S∞ and Sj have minimizers for any R > 0.

Set R = s∞+1 with s∞ defined in (3.3). Then there exists a minimizer u∞ ∈ arg min(S(R)∞ )

belonging to the open interior of BR.Using (3.13), we see that there are uj ∈ Yj ∩ BR for which uj → u∞ in the anisotropic

intermediate topology and (3.12) holds. For all large enough j we have

S∞(uj) ≤ S∞(u∞) + 1 = s∞ + 1.

As Sj(uj) = S∞(uj) and Sm(u) ≤ Sj(u) for m ≥ j, we see that for all large enough j the

minimizers of the functionals S(R)j coincide with the minimizers of Sj . Therefore, we do not

lose any generality in our minimization analysis by restricting to BR.Let us now study the convergence properties of minimizers. As we saw above, for large

enough j all minimizers uj ∈ argmin(Sj) = argmin(S(R)j ) and u∞ ∈ argmin(S∞) = argmin(S

(R)∞ )

are in the ball BR. The compactness of (BR, T ′) implies that any sequence (uj)∞j=1 of mini-

mizers has a subsequence (uj(`))∞`=1 converging to some w ∈ BR in the weak topology of BV .

By [28, Corollary 7.20], the limit w is then a minimizer of S(R)∞ , and we have

(3.14) lim`→∞

Sj(`)(uj(`)) = lim`→∞

S(R)j(`)(uj(`)) = S(R)

∞ (w) = S∞(w) = s∞.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 16: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 953

Moreover, by [28, Corollary 7.20], (3.14) holds for an arbitrary sequence uj(`) ∈ argmin(Sj(`)),` = 1, 2, . . . , that converges in the weak topology of BV to some limit w, and w is a minimizerof S∞.

Let us next consider a subsequence uj(`) ∈ argmin(Sj(`)) converging to u∞ ∈ argmin(S∞)in the weak topology of BV . Then we have

V (u∞) ≤ lim inf`→∞

V (uj(`)).(3.15)

Assume that we would have here a strict inequality.By definition we have for any subsequence uj(`) converging to u∞ in the weak topology of

BV

‖u∞‖L1(D) = lim`→∞

‖uj(`)‖L1(D).

By Lemma 7, we have

‖Au∞ −m‖2L2(Ω) = lim`→∞

‖Auj(`) −m‖2L2(Ω).(3.16)

These and a strict inequality in (3.15) would imply that lim inf`→∞ Sj(`)(uj(`)) < s∞, but thatis not possible in view of (3.14). Therefore,

V (u∞) = lim inf`→∞

V (uj(`)).(3.17)

Similar arguments with lim inf replaced by lim sup yield the desired equality.

Remark 9. In the numerical computations below we include a nonnegativity constraintin the minimization problem. This models the physical fact that X-rays can only attenuateinside matter—not intensify. The nonnegativity constraint can be added also to the aboveconvergence proofs. Let us explain.

The set L1+(D) = u ∈ L1(D); u(x) ≥ 0 a.e. x ∈ D is closed in L1(D), and thus the set

BV+(D) = u ∈ BV (D); u(x) ≥ 0 a.e. x ∈ D is closed both in the weak topology of BV (D)and in the anisotropic intermediate topology of BV (D). Also, let u+(x) = max(u(x), 0)denote the positive part of u, let χ+(u) be an indicator function that is zero for u ∈ BV+(D)and equal to the infinity for u ∈ BV (D) \BV+(D), and let

S(R,+)j (u) = S

(R)j (u) + χ+(u), S(R,+)

∞ (u) = S(R)∞ (u) + χ+(u).

Moreover, let B+R = BR ∩BV+(D).

By [36, section 5.5, Theorem 1], the map v 7→ v+ satisfies ‖v‖BV ≤ ‖v+‖BV . Using thisand [3, Proposition 10.1.1], we observe that if vn ∈ Yn ∩ BR converge in the weak topology ofBV to u ∈ BV (D) as n→∞, then also the positive parts of vn satisfy v+

n ∈ Yn ∩BR and thefunctions v+

n converge in the weak topology of BV to u+ ∈ BV+(D) as n→∞.

Using these facts, we can see, similarly to the proof of Theorem 6, that S(R,+)j Γ-converges,

with respect to the weak topology of BV, to S(R,+)∞ as n→∞. This shows that the result of

Theorem 6 holds also with the positivity constraint.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 17: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

954 NIINIMAKI ET AL.

X-ray source

Lj

Detector

Figure 8. Left: Illustration of the pencil-beam attenuation model of X-ray tomography. Right: Domain Ωdiscretized into a lattice of pixels/voxels Ωi.

4. Computational models and algorithms.

4.1. X-ray attenuation model. In X-ray imaging the radiation source is placed on oneside of an object, and the detector is placed at the opposite side. The rays pass throughthe object, and the attenuated signal is detected by a digital sensor (an array of almostpointlike detectors); see Figure 8 for an illustration. We model the (2D slice of the) objectby a rectangle Ω ⊂ R2 and by a nonnegative attenuation coefficient u : Ω → [0,∞). Theattenuation coefficient u gives the relative intensity loss of an X-ray traveling a small distanceds at s ∈ Ω. This leads to the following linear model:

(4.1) gj = − log

(IjI0

)=

∫Lj

u(s)ds,

where Lj is the jth line of the X-ray, gj is the value of the projection measurement of the jthsource to detector line Lj , Ij is the measured X-ray intensity, and I0 is the initial intensity ofthe X-ray beam before entering Ω.

We discretize this model by dividing the domain Ω into a lattice of pixels Ωi and bycomputing the length of the path Lj inside each pixel Ωi; see Figure 8. Assuming that theattenuation function u(s) is constant inside each pixel Ωi, the projection measurements gj canbe approximated by

(4.2) gj =

∫Lj

u(s)ds ≈∑i

fi|Ωi ∩ Lj |,

where fi is the constant value of the function u in the ith pixel and |Ωi ∩ Lj | denotes thelength of the ray Lj through pixel Ωi. Arranging the set of M measurements into a vectorg = (g1, g2, . . . , gM ) ∈ RM , we obtain the model

g = Af ,

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 18: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 955

where f ∈ RN = Rn×n is the vector containing the attenuation values in the pixels andA ∈ RM×N is the matrix that implements the transform from the pixel values to the projectiondata; see (4.2).

4.2. Discrete anisotropic TV. We consider the anisotropic TV

‖f‖TV = ‖DHf‖1 + ‖DV f‖1,

where the N ×N matrices DH and DV implement horizontal and vertical differences in then×n pixel image represented by vector f . We use periodic boundary conditions. For example,in the case of 3× 3 images

f = [f11, f21, f31, f12, f22, f32, f13, f23, f33]T ∈ R9,

the matrix DH takes the form

DH =1

3

−1 0 0 1 0 0 0 0 00 0 0 −1 0 0 1 0 01 0 0 0 0 0 −1 0 00 −1 0 0 1 0 0 0 00 0 0 0 −1 0 0 1 00 1 0 0 0 0 0 −1 00 0 −1 0 0 1 0 0 00 0 0 0 0 −1 0 0 10 0 1 0 0 0 0 0 −1

.

The TV-regularized X-ray imaging problem (1.3) can now be rewritten as follows:

(4.3) arg minf∈RN+

1

2‖Af − g‖22 + α (‖DHf‖1 + ‖DV f‖1)

.

4.3. Quadratic programming and primal-dual interior-point (PD-IP) method. The min-imization problem of (4.3) can be reformulated as a quadratic programming problem. Denot-ing DHf = h+ − h− and DV f = v+ − v− and following [68], we get

minz

1

2zTQz + cT z + d

s.t. Bz = b,(4.4)

z ≥ 0,

where Q is a symmetric and semi–positive definite matrix and

B =

[DH −I I O ODV O O −I I

],

where I denotes an identity matrix and O denotes a matrix of all zeros.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 19: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

956 NIINIMAKI ET AL.

We seek to solve (4.4) by a PD-IP method similar to that proposed in [68]. Thus byintroducing the Lagrangian function

(4.5) L(z,y;µ) =1

2zTQz + cT z + d− µ

∑log(z)− yT (Bz− b),

with y being the Lagrangian multiplier and µ > 0, we can derive the first order necessaryoptimality conditions, i.e., the Karush–Kuhn–Tucker (KKT) conditions, of the problem in thefollowing reduced form:

(4.6)

[−(Q+ Z−1X) BT

B 0

] [∆z∆y

]=

[p1 − Z−1Xp3

p2

],

where Z = diag(z1, z2, . . . , znz), X = diag(x1, x2, . . . , xnz), p1 = c+Qz−BTy−x, p2 = b−Bz,p3 = µ/x− z−∆x∆z/x, and ∆x = Z−1X(p3 −∆z). For further details see [68].

The PD-IP algorithm that we use to solve our quadratic programming problem is aniterative method based on Mehrotra’s predictor-corrector approach [64]. The algorithm isexplained in detail in section 4 of [68].

4.4. Projected Barzilai–Borwein method. As another method for solving (4.3) approx-imately, we apply a projected Barzilai–Borwein (PBB) minimization algorithm, which is agradient-based method suitable for large-scale problems due to its relatively cheap matrix-free computations. Because of the differentiability assumption of the algorithm, we use thewell-known differentiable approximation

|||f |||β :=N∑i=1

√f2i + β, f ∈ RN ,

for the 1-norm. Here β > 0 is a small parameter; we use β = 10−2 in computations. Theresulting functional to be minimized is given by

Lβ(f) :=1

2‖Af − g‖22 + α (|||DHf |||β + |||DV f |||β) ,

whose nonnegativity constrained iterative PBB minimization is given by

fk+1 = P(fk − λk∇Lβ(fk)

), k = 0, . . . , Imax − 1,

where the step size is

λk =(fk − fk−1)T (fk − fk−1)

(fk − fk−1)T (∇Lβ(fk)−∇Lβ(fk−1))

and P : RN → RN is a projection operator to the nonnegative quadrant of RN , i.e.,

(P (f))i =

fi if fi ≥ 0,0 if fi < 0,

i = 1, . . . , N.

We choose the first step size to be λ0 = 10−4 and the number of iterations to be Imax = 200.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 20: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 957

5. Results. This section is organized as follows. Section 5.1 presents the datasets we usedto test our method. Section 5.2 is devoted to testing the proposed method, whereas section5.3 adapts the S-curve method to TV-regularized tomography and presents the associatedresults. In section 5.4 we use the L-curve results as a reference and present a comparison ofall three parameter selection methods.

5.1. X-ray measurement data. The proposed parameter selection method was testedusing three different data sets: simulated projection data from the Shepp–Logan phantom,experimental sparse angle projection data from a walnut specimen, and experimental projec-tion data from sugar cubes.

5.1.1. Simulated data. The simulated measurement data was computed using the modi-fied Shepp–Logan phantom. Gaussian measurement noise N (0, σ2I) with standard deviationof 5% of the maximum of the computed noise-free data was added to the projection data.A more dense grid was used in the generation of the noise-free data than was used in anyof the reconstructions, thus avoiding inverse crime. The simulated dataset consists of 180projections with uniform angular sampling from angular spanning of 180.

5.1.2. Experimental data. The projection data of the walnut1 and of the sugar cubes wereacquired with a custom-built µCT device nanotom 180 supplied by Phoenix Xray Systems +Services GmbH (Wunstorf, Germany). The measurement set-up is presented in Figure 9. TheX-ray detector is a 12-bit CMOS flat panel detector with 2304 × 2284 pixels of 50 µm size(Hamamatsu Photonics, Japan).

From the walnut dataset, a set of 90 projection images was acquired over a 180 degreerotation with uniform angular step of two degrees between projections. Each projection imagewas composed of an average of six 750 ms exposures. The X-ray tube acceleration voltagewas 80 kV, the tube current was 200 µA, and the full polychromatic beam was used forimage acquisition. For this work we chose only the projections corresponding to the middlecross-section of the walnut, thus resulting in a 2D reconstruction task.

For the sugar cubes, a full-cycle 120-projection set with three degree angular step wasmeasured. Each projection was composed of eight 500 ms exposures. The X-ray tube voltagewas 80 kV, and the tube current was 225 µA. Similarly to the walnut data, only the sinogramcorresponding to the middle 2D cross-section of the sugar cubes is considered in this work,and the measured sinogram was down-sampled to resolution 512× 120 for computations.

We consider the following five test cases:(i) Reconstruction using simulated noisy projection data from Shepp–Logan phantom

with discretization levels n = 64, 128, 192, 256.(ii) Reconstruction using the measured walnut data with n = 128, 192, 256.(iii) Reconstruction using the measured walnut data with 5% additional additive random

noise, i.e., using measurement g + e, where e is the realization from multivariateGaussian with standard deviation equivalent to 5% of the maximum value of theactual measurement g. The discretization levels were set to n = 128, 192, 256.

(iv) Reconstruction using the measured sugar cube data with discretization levels n =128, 256, 512.

1The data is openly available at http://www.fips.fi/dataset.php.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 21: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

958 NIINIMAKI ET AL.

Figure 9. Left: Experimental setup for collecting tomographic X-ray data of a walnut. The detector plane ison the left and the X-ray source is on the right in the picture. The walnut is attached to a computer-controlledrotator platform. Right: Two examples of the resulting projection images.

(v) Reconstruction using the measured sugar cube data with 5% additional additive ran-dom noise (similar to (iii) above) with n = 128, 256, 512.

5.2. Selection of α using the parameter choice method. In this work we consider thefollowing levels of discretization: n = 64, 128, 192, 256, 512 within the five different test cases,each resulting in a 2D reconstruction on an n × n grid. TV-regularized reconstructions fα,nwere computed with several values of regularization parameter α. The results of cases (i),(ii), and (iii) were computed by solving the minimization problem of (4.3) using the PD-IPalgorithm of [68]. In cases (iv) and (v), the PBB algorithm described in section 4.4 was usedfor reconstruction.

For f ∈ Rn×n, consider the discrete anisotropic TV norm

‖f‖TV =1

n

∑κ∼κ′|fκ − fκ′ |,

where κ and κ′ are as in (1.3). The norms ‖fα,n‖TV of the reconstructed images were computedas a function of α at each discretization level n. Results from the simulated data (case (i))are shown in Table 1. Table 1 also shows the relative reconstruction errors

δfα,n

=‖ftrue − fα,n‖‖ftrue‖

of fα,n for all discretization levels as a function of α. A few samples of the correspondingreconstructions in case (i) are shown in Figure 10.

The TV norms of the reconstructed images computed from the experimental walnut data(case (ii)) and from the walnut data corrupted with 5% of random additional noise (case (iii))are shown in Table 2. A few samples with the walnut data are shown in Figures 11 and 12for test cases (ii) and (iii), respectively.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 22: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 959

Table 1TV norms of 2D reconstructions fα,n and relative reconstruction errors δfα,n =

‖ftrue−fα,n‖‖ftrue‖ of fα,ns.

The reconstructions are computed with the PD-IP algorithm from the simulated X-ray data on four differentdiscretization levels with several values of α ranging in the interval [10−4, 106]. The discretization levels wereset to n = 64, 128, 192, and 256. Roughly stable TV norm regimes are indicated by yellow. See Figure 10 forreconstructions.

‖fα,n‖TV δfα,n

α 642 1282 1922 2562 642 1282 1922 2562

10−4 7.62 17.35 35.34 59.19 0.19 0.09 0.09 0.0910−3 7.62 17.35 35.33 59.19 0.19 0.09 0.09 0.0910−2 7.62 17.34 35.32 59.20 0.19 0.09 0.09 0.0910−1 7.61 17.33 35.32 59.18 0.19 0.09 0.09 0.09100 7.61 17.33 35.22 58.98 0.19 0.09 0.09 0.09101 7.58 17.02 34.22 57.03 0.19 0.09 0.09 0.09101.33 7.54 16.67 33.01 54.66 0.19 0.09 0.09 0.09101.66 7.45 15.94 30.61 49.84 0.19 0.09 0.09 0.09102 7.28 14.51 26.04 40.52 0.19 0.09 0.09 0.08102.33 6.96 12.14 19.09 27.16 0.19 0.09 0.09 0.07102.66 6.45 9.10 11.26 13.55 0.19 0.09 0.08 0.07103 5.82 6.67 6.83 7.00 0.19 0.09 0.08 0.07103.33 5.35 5.78 5.81 5.84 0.19 0.10 0.09 0.07103.66 4.91 5.17 5.25 5.28 0.19 0.11 0.09 0.08104 4.16 4.40 4.46 4.48 0.19 0.14 0.13 0.13104.33 3.23 3.37 3.41 3.42 0.24 0.22 0.22 0.22104.66 2.22 2.27 2.28 2.28 0.31 0.31 0.30 0.30105 1.30 1.31 1.32 1.32 0.37 0.36 0.36 0.36106 0.19 0.19 0.19 0.19 0.52 0.52 0.52 0.52

For the experimental sugar cube data (cases (iv) and (v)), the TV norms of the recon-structions are presented in Table 3, and the reconstructions computed with the parameters αchosen by the proposed parameter selection strategy are shown in Figure 13.

The reconstructions in Figures 10, 11, and 12 are selected as follows: On the top row thevalue of the regularization parameter is too small, producing a “noisy” reconstruction, andon the bottom row the regularization parameter is too big, producing an “overregularized”reconstruction. In the middle row the value of the regularization parameter is selected as thesmallest, keeping the TV norms of reconstructions approximately independent of resolution.See Tables 1 and 2.

5.3. Selection of α using the S-curve method. The S-curve method was originally in-troduced in [56, 47] within a Bayesian inversion framework. In this work we apply the S-curvemethod for TV-regularized tomography. The S-curve results are presented here only for testcases (ii) and (iii). For further studies of the S-curve method applied to simulated and mea-sured datasets, see [56, 47].

The S-curve method was applied as follows. For test case (ii), we computed TV-regularized

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 23: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

960 NIINIMAKI ET AL.

Table 2TV norms of 2D reconstructions fα,n computed from the walnut data on three different discretization levels

with several values of α ranging in the interval [10−6, 106]. The discretization levels were set to n = 128, 192,and 256. The results are computed from projection data with 90 projection angles. Data is corrupted withadditional additive random noise (5%). The choice of α is the smallest possible leading to stable TV norms:α = 1 in the case of low noise, and α = 10 in the case of 5% added noise. Roughly stable TV norm regimesare indicated by yellow. See Figures 11 and 12.

Low noise 5% noise

α 1282 1922 2562 1282 1922 2562

10−4 1.51 2.29 3.64 2.42 5.05 8.7110−3 1.51 2.29 3.46 2.43 5.05 8.5910−2 1.50 2.23 2.97 2.42 5.01 8.5910−1 1.43 1.85 1.93 2.37 4.83 8.16100 1.08 1.11 1.11 1.99 3.50 5.12101 0.78 0.78 0.77 0.86 0.86 0.88102 0.48 0.48 0.48 0.48 0.48 0.48103 0.12 0.12 0.12 0.12 0.12 0.12104 0.04 0.04 0.04 0.04 0.04 0.04105 0 0 0 0 0 0106 0 0 0 0 0 0

Table 3TV norms of 2D reconstructions fα,n computed with projected Barzilai–Borwein from the sugar cube X-ray

data on three different discretization levels with several values of α ranging in the interval [10−4, 103]. Thediscretization levels were set to n = 128, 256, and 512. The results were computed from projection data with120 equiangular fan-beam projections with and without additional Gaussian noise (5%). The choice of α is thesmallest possible leading to stable TV norms: α = 10 in the case of low noise, and α = 100 in the case of 5%added noise. Roughly stable TV norm regimes are indicated by yellow. See Figure 13.

Low noise 5% noise

α 1282 2562 5122 1282 2562 5122

10−4 1.36 5.41 18.3 2.96 12.0 27.810−3 1.38 5.96 17.7 2.95 10.7 24.310−2 1.36 4.84 13.5 2.92 10.9 25.410−1 1.26 2.48 3.95 2.64 7.26 9.98100 0.95 1.11 1.28 1.66 2.41 2.90101 0.63 0.64 0.65 0.82 0.91 0.98102 0.45 0.45 0.45 0.48 0.49 0.50103 0.34 0.34 0.36 0.34 0.34 0.34

reconstructions fα,n with fixed n for 14 different values of α ranging in the interval [10−6, 106]and computed the TV norms

S(α) := ‖fα,n‖TV

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 24: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 961

n = 64 n = 128 n = 192 n = 256

α = 10−4

α = 103.66

α = 106

Figure 10. TV-regularized reconstructions fα,n from the simulated X-ray projection data with differentvalues of regularization parameter α. The additive noise in the data is 5% of the maximum of the noiselessdata. The values of α are indicated on the right. Discretization levels n = 64, 128, 192, 256 are indicated on thetop. See Table 1.

for the reconstructions. Then the data α, S(α) were interpolated to get the S-curve. TheS-curve was computed using the same three different discretization levels (n = 128, 192, 256)that were used in section 5.2 with the walnut data.

In the test case (iii) we computed reconstructions with 14 different values of α rangingin the interval [10−4, 108] and applied the S-curve interpolation procedure similarly as in case(ii).

As in [47], the a priori value for the sparsity level S was estimated using three digitalphotographs of split walnuts. The photographs of the three walnuts are shown in Figure 14.The walnut used to measure the X-ray data is not included in the photos. In reality, as isthe case also in this work, the a priori information, which we use to estimate the value ofS, comes from a different modality (e.g., anatomical atlases) than the one we are considering

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 25: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

962 NIINIMAKI ET AL.

n = 128 n = 192 n = 256

α = 10−3

α = 100

α = 103

Figure 11. TV-regularized reconstructions fα,n from the projection data of the walnut computed withdifferent values of regularization parameter α. The values of α are indicated on the right. Discretization levelsn = 128, 192, 256 are indicated on the top. See Table 2.

(here digital photographs versus X-ray attenuation function). Therefore, in order to computethe S for the TV-regularized X-ray tomography, each of the digital photographs fp was scaledsuch that the norm of the computed X-ray projection data of the photograph is the same asthe norm of the measured projection data g. This was obtained by

(5.1) fp =‖g‖‖Afp‖

fp.

This scaling of the photographs is essential since the TVs of the photographs are not directlycomparable to the TV of the X-ray attenuation function.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 26: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 963

n = 128 n = 192 n = 256

α = 10−3

α = 101

α = 104

Figure 12. TV-regularized reconstructions fα,n from the measured projection data corrupted with 5% addi-tional noise computed with different values of regularization parameter α. The values of α are denoted on theright. Discretization levels n = 128, 192, 256 are indicated on the top. See Table 2.

The S-curve plots and the resulting reconstructions for each discretization levels are pre-sented in Figure 15 for the walnut data. Corresponding results with 5% additional noise arepresented in Figure 16.

5.4. Comparison of the results. As a reference, results with the L-curve method werecomputed with both of the walnut data test cases (ii) and (iii). The L-curve plots andcorresponding reconstructions are shown here only for the discretization level n = 128, butthe results for other discretization levels were similar. Figures 17 and 18 present the L-curveplots and the corresponding reconstructions for the test cases (ii) and (iii), respectively.

A comparison of the results obtained with all of the three methods, i.e., TV parameter

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 27: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

964 NIINIMAKI ET AL.

n = 128 n = 256 n = 512

α = 10

α = 100

FBP reconstruction:

Figure 13. TV-regularized reconstructions computed from the measured projection data of 10 sugar cubeswith projected Barzilai–Borwein and smoothed TV. Top: Filtered backprojection reconstruction (for reference).Middle: TV reconstructions computed from the X-ray data with no additional noise and with three different dis-cretization levels, n = 128, 256, and 512. The regularization parameter is α = 10. Bottom: TV reconstructionscomputed from X-ray data with 5% additional Gaussian noise. Here α = 100. See Table 3.

choice, S-curve method, and L-curve method, are presented in Figures 19 and 20 for test cases(ii) and (iii), respectively.

6. Discussion and conclusions. The results listed in Tables 1, 2, and 3 provide numer-ical evidence supporting the proposed parameter choice method. It seems that the newmultiresolution-based parameter choice method is useful and robust. The fact that we alsouse measured X-ray projection data lends credibility to these initial findings. From a practicalpoint of view, a very nice feature of the approach is that the parameter selection does not

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 28: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 965

Figure 14. Photographs of three walnuts split in half. These images were used to provide a priori in-formation about the expected TV of the target. Of course, these are optical photographs and thus physicallyvery different objects from the reconstructions (tomographic slices representing X-ray attenuation coefficient).However, we assume that the TVs of the photographs, when scaled by (5.1), are comparable to the TV of theX-ray attenuation coefficient.

require (quantitative) prior information about the measurement noise (e.g., expected norm ofthe noise) or the unknowns (e.g., expected value of the TV norm) as input variables for theparameter selection.

The results of Table 1 show that including a coarser discretization level in the computationsof α, i.e., level n = 64, leads to the choice α = 103.66. While this choice of α produces thesmallest reconstruction error for estimate fα,n, n = 64, the corresponding errors for n =128, 192, 256 are not the smallest. A smaller error for n = 128, 192, 256 is obtained withα = 103. However, reconstructions with α = 103 and α = 103.66 both yield reasonableestimates for fα,n for all resolutions n considered. Thus it seems that the proposed methodworks acceptably also at low resolutions.

The precise mechanism behind the proposed method is still unclear. We presented amathematical proof that the reconstructions converge for any choice of α as the resolution isrefined. Therefore, it remains an open question whether the unstable norms in Tables 1, 2, and3 would converge with higher resolutions, or whether those numbers are inaccurate, reflectingthe numerical instability arising from the ill-posedness of the underlying tomographic problem.There may be a connection to [82], where the scale of image features is defined by measuringtheir tendency to change under TV regularization. In any case, in these examples at least,the proposed method automatically adapts to the relevant resolution and actual noise level.

The S-curve method, introduced in [56, 47], was found to work robustly and reliably for thepresent datasets. Curiously, the S-curve method produces parameter values of the same orderof magnitude as the new resolution-based method. However, a priori information about theTV of the unknown function is needed for applying the S-curve method, while the proposedmultiresolution-based method does not need anything other than the data.

The proposed method has the drawback that many reconstructions need to be computedwith various choices of resolution n and regularization parameter α. This can be computa-tionally demanding. In this work we solve (1.3) using a PD-IP method and a PBB method.The Barzilai–Borwein method is a gradient-based method which is relatively cheap due tomatrix-free computations. For the Barzilai–Borwein method, the absolute values of the 1-

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 29: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

966 NIINIMAKI ET AL.

S-curve Reconstruction

n = 128

α = 3

n = 192

α = 3

n = 256

α = 3

S = 0.74

S = 0.75

S = 0.76

10−6 106

1.6

2.5

4

100

Figure 15. S-curve method applied to the measured projection data with no additional noise (case (i)). Leftcolumn: Plots of the S-curves used to determine the values of α. Right column: Reconstruction computed withthe selected α.

norms are approximated due to the differentiability requirement of the method. By dividingthe unknown into nonnegative and nonpositive parts, we can solve (1.3) without approxima-tions. However, this division has the drawback of increasing the number of unknowns in theproblem. The new, bigger problem is, however, sparse and can be solved efficiently usingPD-IP methods as demonstrated in [41], for example. With the PD-IP method that we use,the most demanding part is the computation of (4.6). However, (4.6) can be solved itera-

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 30: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 967

S-curve Reconstruction

n = 128

α = 14

n = 192

α = 28

n = 256

α = 37

S = 0.74

S = 0.75

S = 0.76

10−4 108

2.5

6

9

102

Figure 16. S-curve method applied to the measured projection data corrupted with 5% additional noise(case (ii)). Left column: Plots of the S-curves which were used to determine the values of α. Right column:Reconstruction computed using the selected α.

tively using a suitable preconditioned iterative scheme, such as the preconditioned conjugategradient method, a generalized minimal residual method, or a preconditioned quasi-minimalresidual method, for example. An efficient implementation of a suitable preconditioner willenable the acceleration of the convergence of a selected iterative solver. All computations canalso be made matrix-free, making the method suitable also for large-scale problems. One ofthe advantages of PD-IP methods is the natural and effective enforcement of nonnegativity.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 31: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

968 NIINIMAKI ET AL.

L-curve Reconstructionα = 1358

TV (f)

10−5 106‖Af − g‖

Figure 17. L-curve method applied to walnut data; no additional noise. Left: L-curve plot. Right: Recon-struction computed using the selected α = 1358. The discretization level is n = 128.

L-curve Reconstruction withα = 10.4

TV (f)

10−4 107‖Af − g‖

Figure 18. L-curve method applied to walnut data with 5% additional noise. Left: L-curve plot. Right:Reconstruction computed using the selected α = 10.4. The discretization level is n = 128.

Figure 19. The final results from the walnut data with low noise. Reconstruction from left to right: TVparameter choice αTV = 1, S-curve method αScurve = 3.00, and L-curve method αLcurve = 1358, respectively.The discretization level is n = 128.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 32: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 969

Figure 20. The final results from the walnut data with 5% additional noise. Reconstruction from left toright: TV parameter choice αTV = 10, S-curve method αScurve = 14.31, and L-curve method αLcurve = 10.4,respectively. The discretization level is n = 128.

Also, with primal-dual methods the evaluation of the duality gap provides a nonheuristiccheck of convergence. In this work we used a preconditioned quasi-minimal residual methodfor solving (4.6). All the PD-IP computations of this work were performed using MATLABversion R2014.

We note that, in addition to the methods discussed in the introduction, (1.3) could also besolved using proximal splitting methods [27], for example. However, these methods introduceadditional variables into the optimization problem, and without acceleration the number ofiterations is considerable. With proper preconditioning and/or acceleration schemes, proximalsplitting methods provide feasible algorithms for solving convex, nonsmooth imaging problems;see [18, 1, 79, 11, 10, 59], for example.

We stress that this work is just an initial feasibility study for the new parameter choicemethod. More comprehensive testing and comparisons to other methods are still needed toassess the properties of the new approach. Also, it would be interesting to see how the methodworks with generalizations of TV regularization, such as [9, 81].

Appendix A. A note on BV norms. What would happen if one used the BV norm

‖u‖L1(D) + Vp(u) := ‖u‖L1(D) +

∫D

( ∣∣∣∣∂u(x)

∂x1

∣∣∣∣p +

∣∣∣∣∂u(x)

∂x2

∣∣∣∣p)1/p

dm(x)

with p > 1 in the analysis of section 3? The norm (2.2) used there corresponds to the choicep = 1. First, taking p > 1 would only give an inequality in (2.6),

limj→∞

Vp(vj) ≥ Vp(u),

and the proof would not work.Let us point out a specific example using p = 2. Consider the characteristic function

χ = χ|x|<1 of the unit disc, whose BV norm is 2π: the length of the boundary (which is 2π)times the jump (which is 1). Now approximate χ using functions equal to one inside pixel-based set approximations to the unit disc; see Figure 21. The total length of the boundary ofeach such set is 8. Thus the BV norms of the approximations converge to 8 instead of 2π.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 33: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

970 NIINIMAKI ET AL.

Figure 21. Pixel-based piecewise linear approximations to the characteristic function of the unit disc. Herewhite denotes zero value, and gray denotes value one. The TV norm of all of these functions is exactly 8 (equalto the length of the boundary).

Now using the BV norm with p = 1 leads to the (Manhattan) length of the unit circle tobe 8, ensuring the appropriate convergence.

REFERENCES

[1] S. Anthoine, J.-F. Aujol, Y. Boursier, and C. Melot, Some proximal methods for Poisson intensityCBCT and PET, Inverse Probl. Imaging, 6 (2012), pp. 565–598.

[2] V. S. Antyufeev and A. N. Bondarenko, X-ray tomography in scattering media, SIAM J. Appl.Math., 56 (1996), pp. 573–587, doi:10.1137/S0036139994266396.

[3] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces: Appli-cations to PDEs and Optimization, 2nd ed., MOS–SIAM Ser. Optim. 17, SIAM, Philadelphia, 2014,doi:10.1137/1.9781611973488.

[4] H. Attouch and R. Wets, Epigraphical analysis, Ann. Inst. H. Poincare, 6 (1989), suppl., pp. 73–100.[5] S. Bartels, Total variation minimization with finite elements: Convergence and iterative solution, SIAM

J. Numer. Anal., 50 (2012), pp. 1162–1180, doi:10.1137/11083277X.[6] P. Belık and M. Luskin, Approximation by piecewise constant functions in a BV metric, Math. Models

Methods Appl. Sci., 13 (2003), pp. 373–393.[7] G. Bellettini, A. Chambolle, and M. Goldman, The Γ-limit for singularly perturbed functionals of

Perona–Malik type in arbitrary dimension, Math. Models Methods Appl. Sci., 24 (2014), pp. 1091–1113.

[8] J. Bian, X. Han, E. Y. Sidky, G. Cao, J. Lu, O. Zhou, and X. Pan, Investigation of sparse datamouse imaging using micro-CT with a carbon-nanotube-based X-ray source, Tsinghua Sci. Technol.,15 (2010), pp. 74–78.

[9] K. Bredies, K. Kunisch, and T. Pock, Total generalized variation, SIAM J. Imaging Sci., 3 (2010),pp. 492–526, doi:10.1137/090769521.

[10] K. Bredies and H. Sun, Preconditioned Douglas-Rachford algorithms for TV and TGV regularizedvariational imaging problems, J. Math. Imaging Vision, 52 (2015), pp. 317–344.

[11] K. Bredies and H. Sun, Preconditioned Douglas–Rachford splitting methods for convex-concave saddle-point problems, SIAM J. Numer. Anal., 53 (2015), pp. 421–444, doi:10.1137/140965028.

[12] M. Burger and S. Osher, A guide to the TV zoo, in Level Set and PDE Based Reconstruction Methodsin Imaging, M. Burger and S. Osher, eds., Springer, New York, 2013, pp. 1–70.

[13] J.-F. Cai, B. Dong, S. Osher, and Z. Shen, Image restoration: Total variation, wavelet frames, andbeyond, J. Amer. Math. Soc., 25 (2012), pp. 1033–1089.

[14] J.-F. Cai, S. Osher, and Z. Shen, Linearized Bregman iterations for frame-based image deblurring,SIAM J. Imaging Sci., 2 (2009), pp. 226–252, doi:10.1137/080733371.

[15] A. Chambolle, Image segmentation by variational methods: Mumford and Shah functional and the dis-crete approximations, SIAM J. Appl. Math., 55 (1995), pp. 827–863, doi:10.1137/S0036139993257132.

[16] A. Chambolle, A. Giacomini, and L. Lussardi, Continuous limits of discrete perimeters, M2ANMath. Model. Numer. Anal., 44 (2010), pp. 207–230.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 34: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 971

[17] A. Chambolle, S. E. Levine, and B. J. Lucier, An upwind finite-difference method for total variation–based image smoothing, SIAM J. Imaging Sci., 4 (2011), pp. 277–299, doi:10.1137/090752754.

[18] A. Chambolle and P. Thomas, A first-order primal-dual algorithm for convex problems with applica-tions to imaging, J. Math. Imaging Vision, 40 (2011), pp. 120–145.

[19] T. F. Chan and K. Chen, On a nonlinear multigrid algorithm with primal relaxation for the image totalvariation minimisation, Numer. Algorithms, 41 (2006), pp. 387–411, doi:10.1007/s11075-006-9020-z.

[20] T. F. Chan, G. H. Golub, and P. Mulet, A nonlinear primal-dual method for total variation-basedimage restoration, SIAM J. Sci. Comput., 20 (1999), pp. 1964–1977, doi:10.1137/S1064827596299767.

[21] T. F. Chan and J. Shen, Image Processing and Analysis: Variational, PDE, Wavelet, and StochasticMethods, SIAM, Philadelphia, 2005, doi:10.1137/1.9780898717877.

[22] R. Chartrand, E. Y. Sidky, and X. Pan, Nonconvex compressive sensing for X-ray CT: An algorithmcomparison, in Proceedings of the Asilomar Conference on Signals, Systems and Computers, PacificGrove, CA, 2013, pp. 665–669, doi:10.1109/ACSSC.2013.6810365.

[23] B. Chen, M. Yang, Z. Zhang, X. Han, J. Bian, E. Sidky, and X. Pan, Constrained TV-minimization reconstruction from exterior CT data, in Proceedings of the Nuclear Science Sym-posium and Medical Imaging Conference (NSS/MIC), IEEE, Washington, DC, 2013, pp. 1–3,doi:10.1109/NSSMIC.2013.6829356.

[24] K. Chen, E. Loli Piccolomini, and F. Zama, An automatic regularization parameter selection algo-rithm in the total variation model for image deblurring, Numer. Algorithms, 67 (2014), pp. 73–92.

[25] C. Clason, B. Jin, and K. Kunisch, A duality-based splitting method for `1-TV image restorationwith automatic regularization parameter choice, SIAM J. Sci. Comput., 32 (2010), pp. 1484–1505,doi:10.1137/090768217.

[26] P. L. Combettes and J. Luo, An adaptive level set method for nondifferentiable constrained imagerecovery, IEEE Trans. Image Process., 11 (2002), pp. 1295–1304.

[27] P. L. Combettes and J.-C. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Al-gorithms for Inverse Problems in Science and Engineering, Springer-Verlag, New York, 2011, pp. 185–212.

[28] G. Dal Maso, An Introduction to Γ-Convergence, Birkhauser Boston, Boston, 1993.[29] A. H. Delaney and Y. Bresler, Globally convergent edge-preserving regularized reconstruction: An

application to limited-angle tomography, IEEE Trans. Image Process., 7 (1998), pp. 204–221.[30] D. C. Dobson and C. R. Vogel, Convergence of an iterative method for total variation denoising,

SIAM J. Numer. Anal., 34 (1997), pp. 1779–1791, doi:10.1137/S003614299528701X.[31] Y. Dong, M. Hintermuller, and M. M. Rincon-Camacho, Automated regularization parameter

selection in multi-scale total variation models for image restoration, J. Math. Imaging Vision, 40(2011), pp. 82–104.

[32] X. Duan, L. Zhang, Y. Xing, Z. Chen, and J. Cheng, Few-view projection reconstruction withan iterative reconstruction-reprojection algorithm and TV constraint, IEEE Trans. Nuclear Sci., 56(2009), pp. 1377 –1382, doi:10.1109/TNS.2008.2009990.

[33] V. Duval, J.-F. Aujol, and Y. Gousseau, A bias-variance approach for the nonlocal means, SIAM J.Imaging Sci., 4 (2011), pp. 760–788, doi:10.1137/100790902.

[34] H. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic Pub-lishers, Dordrecht, The Netherlands, 1996.

[35] E. Esser, X. Zhang, and T. F. Chan, A general framework for a class of first order primal-dualalgorithms for convex optimization in imaging science, SIAM J. Imaging Sci., 3 (2010), pp. 1015–1046, doi:10.1137/09076934X.

[36] L. C. Evans and R. F. Gariepy, Measure Theory and Fine Properties of Functions, revised ed., Text-books in Math., CRC Press, Boca Raton, FL, 2015.

[37] X. Feng and A. Prohl, Analysis of total variation flow and its finite element approximations, ESAIMMath. Model. Numer. Anal., 37 (2003), pp. 533–556, doi:10.1051/m2an:2003041.

[38] M. Fornasier, A. Langer, and C.-B. Schonlieb, A convergent overlapping domain decompositionmethod for total variation minimization, Numer. Math., 116 (2010), pp. 645–685, doi:10.1007/s00211-010-0314-7.

[39] M. Fornasier and C.-B. Schonlieb, Subspace correction methods for total variation and l1-minimization, SIAM J. Numer. Anal., 47 (2009), pp. 3397–3428, doi:10.1137/070710779.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 35: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

972 NIINIMAKI ET AL.

[40] K. Frick, P. Marnitz, and A. Munk, Statistical multiresolution Dantzig estimation in imaging:Fundamental concepts and algorithmic framework, Electron. J. Statist., 6 (2012), pp. 231–268,doi:10.1214/12-EJS671.

[41] H. Fu, M. K. Ng, M. Nikolova, and J. L. Barlow, Efficient minimization methods of mixedl2-l1 and l1-l1 norms for image restoration, SIAM J. Sci. Comput., 27 (2006), pp. 1881–1902,doi:10.1137/040615079.

[42] D. A. Girard, Optimal regularized reconstruction in computerized tomography, SIAM J. Sci. Statist.Comput., 8 (1987), pp. 934–950, doi:10.1137/0908076.

[43] D. Goldfarb and W. Yin, Second-order cone programming methods for total variation-based imagerestoration, SIAM J. Sci. Comput., 27 (2005), pp. 622–645, doi:10.1137/040608982.

[44] T. Goldstein and S. Osher, The split Bregman method for L1-regularized problems, SIAM J. ImagingSci., 2 (2009), pp. 323–343, doi:10.1137/080725891.

[45] E. T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation applied to compressed sensing: Imple-mentation and numerical experiments, J. Comput. Math., 28 (2010), pp. 170–194.

[46] K. Hamalainen, L. Harhanen, A. Hauptmann, A. Kallonen, E. Niemi, and S. Siltanen, Totalvariation regularization for large-scale X-ray tomography, Int. J. Tomogr. Simul., 25 (2014), pp. 1–25.

[47] K. Hamalainen, A. Kallonen, V. Kolehmainen, M. Lassas, K. Niinimaki, and S. Siltanen, Sparsetomography, SIAM J. Sci. Comput., 35 (2013), pp. B644–B665, doi:10.1137/120876277.

[48] P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion,SIAM, Philadelphia, 1998.

[49] P. C. Hansen, Discrete Inverse Problems: Insight and Algorithms, Fundam. Algorithms 7, SIAM,Philadelphia, 2010, doi:10.1137/1.9780898718836.

[50] G. T. Herman and R. Davidi, Image reconstruction from a small number of projections, Inverse Prob-lems, 24 (2008), 045011, http://stacks.iop.org/0266-5611/24/i=4/a=045011.

[51] M. Hintermuller and K. Kunisch, Total bounded variation regularization as a bilater-ally constrained optimization problem, SIAM J. Appl. Math., 64 (2004), pp. 1311–1333,doi:10.1137/S0036139903422784.

[52] T. L. Jensen, J. H. Jørgensen, P. C. Hansen, and S. H. Jensen, Implementation of an optimalfirst-order method for strongly convex total variation regularization, BIT, 52 (2012), pp. 329–356,doi:10.1007/s10543-011-0359-8.

[53] J. H. Jørgensen, T. L. Jensen, P. C. Hansen, S. H. Jensen, E. Y. Sidky, and X. Pan, Acceler-ated Gradient Methods for Total-Variation-Based CT Image Reconstruction, preprint, arxiv:1105.4002[math.NA], 2011.

[54] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Appl. Math. Sci. 160,Springer-Verlag, New York, 2004.

[55] S. Kindermann, L. D. Mutimbu, and E. Resmerita, A numerical study of heuristic parameter choicerules for total variation regularization, J. Inverse Ill-Posed Problems, 22 (2013), pp. 63–94.

[56] V. Kolehmainen, M. Lassas, K. Niinimaki, and S. Siltanen, Sparsity-promoting Bayesian inversion,Inverse Problems, 28 (2012), 025005, doi:10.1088/0266-5611/28/2/025005.

[57] V. Kolehmainen, S. Siltanen, S. Jarvenpaa, J. Kaipio, P. Koistinen, M. Lassas, J. Pirttila,and E. Somersalo, Statistical inversion for medical X-ray tomography with few radiographs: II. Ap-plication to dental radiology, Phys. Med. Biol., 48 (2003), pp. 1465–1490.

[58] V. Kolehmainen, A. Vanne, S. Siltanen, S. Jarvenpaa, J. Kaipio, M. Lassas, and M. Kalke,Parallelized Bayesian inversion for three-dimensional dental X-ray imaging, IEEE Trans. MedicalImaging, 25 (2006), pp. 218–228.

[59] N. Komodakis and J.-C. Pesquet, Playing with duality: An overview of recent primal-dual approachesfor solving large-scale optimization problems, IEEE Signal Process. Mag., 32 (2015), pp. 31–54.

[60] M. Lassas, E. Saksman, and S. Siltanen, Discretization-invariant Bayesian inversion and Besov spacepriors, Inverse Probl. Imaging, 3 (2009), pp. 87–122.

[61] M. Lassas and S. Siltanen, Can one use total variation prior for edge-preserving Bayesian inversion?,Inverse Problems, 20 (2004), pp. 1537–1564.

[62] Y. Li and F. Santosa, A computational algorithm for minimizing total variation in image restoration,IEEE Trans. Image Process., 5 (1996), pp. 987–995, doi:10.1109/83.503914.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 36: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

TV PARAMETER CHOICE 973

[63] H. Liao and G. Sapiro, Sparse representations for limited data tomography, in Biomedical Imaging:From Nano to Macro, Proceedings of ISBI 2008, the 5th IEEE International Symposium on BiomedicalImaging, IEEE, Washington, DC, 2008, pp. 1375–1378, doi:10.1109/ISBI.2008.4541261.

[64] S. Mehrotra, On the implementation of a primal-dual interior point method, SIAM J. Optim., 2 (1992),pp. 575–601, doi:10.1137/0802028.

[65] J. L. Mueller and S. Siltanen, Linear and Nonlinear Inverse Problems with Practical Applications,Comput. Sci. Eng. 10, SIAM, Philadelphia, 2012, doi:10.1137/1.9781611972344.

[66] F. Natterer, The Mathematics of Computerized Tomography, reprint of the 1986 original (Tuebner,Stuttgart, John Wiley & Sons, Chichester), Classics Appl. Math. 32, SIAM, Philadelphia, 2001,doi:10.1137/1.9780898719284.

[67] Y. Nesterov, Barrier subgradient method, Math. Program., 127 (2011), pp. 31–56, doi:10.1007/s10107-010-0421-3.

[68] K. Niinimaki, Computational Optimization Methods for Large-scale Inverse Problems, Ph.D. thesis, Uni-versity of Eastern Finland, Kuopio, Finland, 2013.

[69] J. Nuyts, B. De Man, J. Fessler, W. Zbijewski, and F. Beekman, Modelling the physics in theiterative reconstruction for transmission computed tomography, Phys. Med. Biol., 58 (2013), pp. R63–R96.

[70] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, An iterative regularization methodfor total variation-based image restoration, Multiscale Model. Simul., 4 (2005), pp. 460–489,doi:10.1137/040605412.

[71] S. Osher and R. Fedkiw, Level Set Methods and Dynamic Implicit Surfaces, Appl. Math. Sci. 153,Springer-Verlag, New York, 2003.

[72] S. Ramani, T. Blu, and M. Unser, Monte-Carlo sure: A black-box optimization of regularizationparameters for general denoising algorithms, IEEE Trans. Image Process., 17 (2008), pp. 1540–1554.

[73] S. Ramani, Z. Liu, J. Rosen, J. Nielsen, and J. A. Fessler, Regularization parameter selection fornonlinear iterative image restoration and MRI reconstruction using GCV and sure-based methods,IEEE Trans. Image Process., 21 (2012), pp. 3659–3672.

[74] L. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Phys.D, 60 (1992), pp. 259–268.

[75] H. Rullgard, A New Principle for Choosing Regularization Parameter in Certain Inverse Problems,preprint, arxiv:0803.3713v2 [math:NA], 2008.

[76] K. Sauer, S. J. James, and K. Klifa, Bayesian estimation of 3-D objects from few radiographs, IEEETrans. Nuclear Sci., 41 (1994), pp. 1780–1790.

[77] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods inImaging, Appl. Math. Sci. 167, Springer-Verlag, New York, 2009.

[78] E. Y. Sidky, R. Chartrand, and X. Pan, Constrained non-convex TpV minimization for extremelysparse projection view sampling in CT, in Nuclear Science Symposium and Medical Imaging Confer-ence (NSS/MIC), IEEE, Washington, DC, 2013, pp. 1–3, doi:10.1109/NSSMIC.2013.6829176.

[79] E. Y. Sidky, J. H. Jorgensen, and X. Pan, Convex optimization problem prototyping for imagereconstruction in computed tomography with the Chambolle-Pock algorithm, Phys. Med. Biol., 57(2012), pp. 3065–3091.

[80] E. Y. Sidky and X. Pan, Image reconstruction in circular cone-beam computed tomography by con-strained, total-variation minimization, Phys. Med. Biol., 53 (2008), pp. 4777–4807, http://stacks.iop.org/0031-9155/53/i=17/a=021.

[81] W. Stefan, R. A. Renaut, and A. Gelb, Improved total variation-type regularization using higherorder edge detectors, SIAM J. Imaging Sci., 3 (2010), pp. 232–251, doi:10.1137/080730251.

[82] D. M. Strong, J.-F. Aujol, and T. F. Chan, Scale recognition, regularization parameter selection, andMeyer’s G norm in total variation regularization, Multiscale Model. Simul., 5 (2006), pp. 273–303,doi:10.1137/040621624.

[83] P. Sukovic and N. H. Clinthorne, Penalized weighted least squares image reconstruction for dualenergy X-ray transmission tomography, IEEE Trans. Medical Imaging, 19 (2000), pp. 1075–1081.

[84] J. Tang, B. E. Nett, and G.-H. Chen, Performance comparison between total variation (TV)-basedcompressed sensing and statistical iterative reconstruction algorithms, Phys. Med. Biol., 54 (2009),pp. 5781–5804, http://stacks.iop.org/0031-9155/54/i=19/a=008.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p

Page 37: Multiresolution Parameter Choice Method for Total ...siltanen-research.net/publ/NiinimakiLassasHamalain... · Kati Niinim aki while the proposed method does not require such input

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.

974 NIINIMAKI ET AL.

[85] Z. Tian, X. Jia, K. Yuan, T. Pan, and S. B. Jiang, Low-dose CT reconstruction via edge-preservingtotal variation regularization, Phys. Med. Biol., 56 (2011), pp. 5949–5967, http://stacks.iop.org/0031-9155/56/i=18/a=011.

[86] A. Toma, B. Sixou, and F. Peyrin, Iterative choice of the optimal regularization parameter in TVimage restoration, Inverse Probl. Imaging, 9 (2015), pp. 1171–1191, doi:10.3934/ipi.2015.9.1171.

[87] N. G. Trillos and D. Slepcev, Γ-Convergence of Perimeter on Random Geometric Graphs, Technicalreport, PIRE, Carnegie Mellon Univeristy, Pittsburgh, PA, https://www.math.cmu.edu/PIRE/pub/pire66/13-CNA-024.pdf, 2013.

[88] G. Vainikko, On the discretization and regularization of ill-posed problems with noncompact operators,Numer. Funct. Anal. Optim., 13 (1992), pp. 381–396.

[89] Y. van Gennip and A. L. Bertozzi, γ-convergence of graph Ginzburg-Landau functionals, Adv. Differ-ential Equations, 17 (2012), pp. 1115–1180.

[90] C. R. Vogel, Computational Methods for Inverse Problems, Frontiers Appl. Math. 23, SIAM, Philadel-phia, 2002, doi:10.1137/1.9780898717570.

[91] C. Vogel and M. Oman, Fast, robust total variation-based reconstruction of noisy, blurred images, IEEETrans. Image Process., 7 (1998), pp. 813–824.

[92] Y. Wang, J. Yang, W. Yin, and Y. Zhang, A new alternating minimization algorithm for totalvariation image reconstruction, SIAM J. Imaging Sci., 1 (2008), pp. 248–272, doi:10.1137/080724265.

[93] Y.-W. Wen and R. Chan, Parameter selection for total-variation-based image restoration using discrep-ancy principle, IEEE Trans. Image Process., 21 (2012), pp. 1770–1781, doi:10.1109/TIP.2011.2181401.

[94] J. Xu and M. Tsui, Electronic noise modeling in statistical iterative reconstruction, IEEE Trans. ImageProcess., 18 (2009), pp. 1228–1238.

[95] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman iterative algorithms for `1-minimization with applications to compressed sensing, SIAM J. Imaging Sci., 1 (2008), pp. 143–168,doi:10.1137/070703983.

[96] X. Zhang, M. Burger, and S. Osher, A unified primal-dual algorithm framework based on Bregmaniteration, J. Sci. Comput., 46 (2011), pp. 20–46, doi:10.1007/s10915-010-9408-8.

Dow

nloa

ded

09/2

0/16

to 1

30.2

16.4

8.12

6. R

edis

trib

utio

n su

bjec

t to

SIA

M li

cens

e or

cop

yrig

ht; s

ee h

ttp://

ww

w.s

iam

.org

/jour

nals

/ojs

a.ph

p