
arXiv:0911.2888v1 [stat.ME] 15 Nov 2009

A Hierarchical Bayesian Model for Frame Representation

L. Chaari, Student Member, IEEE, J.-C. Pesquet, Senior Member, IEEE, J.-Y. Tourneret, Senior Member, IEEE, Ph. Ciuciu, Member, IEEE, and A. Benazza-Benyahia, Member, IEEE

Abstract

In many signal processing problems, it may be fruitful to represent the signal under study in a frame. If a probabilistic approach is adopted, it then becomes necessary to estimate the hyper-parameters characterizing the probability distribution of the frame coefficients. This problem is difficult since in general the frame synthesis operator is not bijective. Consequently, the frame coefficients are not directly observable. This paper introduces a hierarchical Bayesian model for frame representation. The posterior distribution of the frame coefficients and model hyper-parameters is derived. Hybrid Markov Chain Monte Carlo algorithms are subsequently proposed to sample from this posterior distribution. The generated samples are then exploited to estimate the hyper-parameters and the frame coefficients of the target signal. Validation experiments show that the proposed algorithms provide an accurate estimation of the frame coefficients and hyper-parameters. Application to practical problems of image denoising shows the impact of the resulting Bayesian estimation on the recovered signal quality.

Index Terms

Frame representations, Bayesian estimation, MCMC, Gibbs sampler, Metropolis-Hastings, hyper-parameter estimation, generalized Gaussian, sparsity, compressed sensing, wavelets.

L. Chaari and J.-C. Pesquet are with LIGM, UMR-CNRS 8049, Université Paris-Est, Champs-sur-Marne, 77454 Marne-la-Vallée, France. E-mail: {lotfi.chaari,jean-christophe.pesquet}@univ-paris-est.fr. J.-Y. Tourneret is with the University of Toulouse, IRIT/ENSEEIHT/TSA, 31071 Toulouse, France. E-mail: [email protected]. Ph. Ciuciu is with CEA/DSV/I2BM/Neurospin, CEA Saclay, Bât. 145, Point Courrier 156, 91191 Gif-sur-Yvette cedex, France. E-mail: [email protected]. A. Benazza-Benyahia is with the Ecole Supérieure des Communications de Tunis (SUP'COM-Tunis), Unité de Recherche en Imagerie Satellitaire et ses Applications (URISA), Cité Technologique des Communications, 2083, Tunisia. E-mail: [email protected].

This work was supported by grants from Région Ile de France and the Agence Nationale de la Recherche under grant ANR-05-MMSA-0014-01.


I. INTRODUCTION

Data representation is a crucial operation in many signal and image processing applications. These applications include signal and image reconstruction [1, 2], restoration [3, 4] and compression [5, 6]. In this respect, many linear transforms have been proposed in order to obtain suitable signal representations in other domains than the original spatial or temporal ones. The traditional Fourier and discrete cosine transforms provide a good frequency localization, but at the expense of a poor spatial or temporal localization. To improve localization both in the spatial/temporal and frequency domains, the wavelet transform (WT) was introduced as a powerful tool in the 1980s [7]. Many wavelet-like basis decompositions have subsequently been proposed, offering different features. For instance, we can mention the wavelet packets [8] or the grouplet bases [9]. To further improve signal representations, redundant linear decomposition families called frames have become the focus of many works during the last decade. For the sake of clarity, it must be pointed out that the term frame [10] is understood in the sense of Hilbert space theory and not in the sense of some recent works like [11].

The main advantage of frames lies in their flexibility to capture local features of the signal. Hence, they may result in sparser representations, as shown in the literature on curvelets [10], bandelets [12] or dual-trees [13] in image processing. However, a major difficulty when using frame representations in a statistical framework is to estimate the parameters of the frame coefficient probability distribution. Actually, since frame synthesis operators are generally not injective, even if the signal is perfectly known, the determination of its frame coefficients is an underdetermined problem.

This paper studies a hierarchical Bayesian approach to estimate the frame coefficients and their hyper-parameters. Although this approach is conceptually able to deal with any desirable distribution for the frame coefficients, we focus in this paper on generalized Gaussian (GG) priors. Note however that we do not restrict our attention to log-concave GG prior probability density functions (pdf), which may be limited for providing accurate models of sparse signals [14]. In addition, the proposed method can be applied to noisy data when only imprecise measurements of the signal are available. Our work takes advantage of the current developments in Markov Chain Monte Carlo (MCMC) algorithms [15–17] that have already been investigated for instance in image separation [18], image restoration [19] and brain activity detection in functional MRI [20, 21]. These algorithms have also been investigated for signal/image processing problems with sparsity constraints. These constraints may be imposed in the original space, as in [22], where a sparse image reconstruction problem is assessed in the image domain. They may also be imposed on some redundant representation of the signal, as in [23], where a time-series sparse coding problem is addressed.

Hybrid MCMC algorithms [24, 25] are designed combining Metropolis-Hastings (MH) [26] and Gibbs [27] moves to sample according to the posterior distribution of interest. MCMC algorithms and WT have been jointly investigated in some works dealing with signal denoising in a Bayesian framework [18, 28–30]. However, in contrast with the present work where overcomplete frame representations are considered, these works are limited to wavelet bases, for which the hyper-parameter estimation problem is much easier to handle.

This paper is organized as follows. Section II presents a brief overview of the concepts of frame and frame representation. The hierarchical Bayesian model proposed for frame representation is introduced in Section III. Two algorithms for sampling the posterior distribution are proposed in Section IV. To illustrate the effectiveness of these algorithms, experiments on both synthetic and real-world data are presented in Section V. In this section, applications to image recovery problems are also considered. Finally, some conclusions are drawn in Section VI.

II. PROBLEM FORMULATION

A. The frame concept

In the following, we will consider real-valued digital signals of length $L$ as elements of the Euclidean space $\mathbb{R}^L$ endowed with the usual scalar product and norm denoted as $\langle \cdot \mid \cdot \rangle$ and $\|\cdot\|$, respectively. Let $K$ be an integer greater than or equal to $L$. A family of vectors $(e_k)_{1\le k\le K}$ in the finite-dimensional space $\mathbb{R}^L$ is a frame when there exists a constant $\mu$ in $]0,+\infty[$ such that¹
$$\forall y \in \mathbb{R}^L,\qquad \mu\|y\|^2 \le \sum_{k=1}^{K} |\langle y \mid e_k\rangle|^2. \tag{1}$$

¹The classical upper bound condition is always satisfied in finite dimension. In this case, the frame condition is also equivalent to saying that the frame operator has full rank $L$.

If the inequality (1) becomes an equality, $(e_k)_{1\le k\le K}$ is called a tight frame. The bounded linear frame analysis operator $F$ and the adjoint synthesis frame operator $F^*$ are defined as
$$F:\ \mathbb{R}^L \to \mathbb{R}^K,\quad y \mapsto (\langle y \mid e_k\rangle)_{1\le k\le K}, \tag{2}$$
$$F^*:\ \mathbb{R}^K \to \mathbb{R}^L,\quad (\xi_k)_{1\le k\le K} \mapsto \sum_{k=1}^{K}\xi_k e_k. \tag{3}$$
Note that $F$ is injective whereas $F^*$ is surjective. When $F^{-1} = F^*$, $(e_k)_{1\le k\le K}$ is an orthonormal basis. A simple example of a redundant frame is the union of $M > 1$ orthonormal bases. In this case, the frame is tight with $\mu = M$ and thus we have $F^*F = M\mathrm{I}$, where $\mathrm{I}$ is the identity operator.
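To make the tight-frame identity concrete, here is a minimal numerical sketch (illustrative, not part of the paper) that builds the union of two orthonormal bases of $\mathbb{R}^L$, namely the canonical basis and an orthonormal DCT basis chosen purely for this example, and checks that $F^*F = 2\mathrm{I}$:

```python
import numpy as np

# Toy tight frame: union of M = 2 orthonormal bases of R^L
# (canonical basis and an orthonormal DCT-II basis, both hypothetical choices).
L = 8
I_basis = np.eye(L)
n = np.arange(L)
dct = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / L)  # columns ~ DCT atoms
dct[:, 0] *= 1.0 / np.sqrt(2.0)
dct *= np.sqrt(2.0 / L)                                     # orthonormalization

E = np.hstack([I_basis, dct])        # e_1, ..., e_K as columns, K = 2L

def analysis(y):
    """F y = (<y | e_k>)_{1<=k<=K}."""
    return E.T @ y

def synthesis(x):
    """F* x = sum_k x_k e_k."""
    return E @ x

y = np.random.randn(L)
x = analysis(y)
# Tight-frame identity F*F = M I, hence F*(F y) = 2 y for this union of 2 bases.
assert np.allclose(synthesis(x), 2.0 * y)
```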

B. Frame representation

An observed signal $y \in \mathbb{R}^L$ can be written according to its frame representation (FR) involving coefficients $x \in \mathbb{R}^K$ as follows:
$$y = F^*x + n \tag{4}$$
where $n$ is the error between the observed signal $y$ and its FR $F^*x$. This error is modeled by imposing that $x$ belongs to the closed convex set
$$C_\delta = \{x \in \mathbb{R}^K \mid N(y - F^*x) \le \delta\} \tag{5}$$
where $\delta \in [0,\infty[$ is some error bound and $N(\cdot)$ can be any norm on $\mathbb{R}^L$.

In signal/image recovery problems, $n$ is nothing but an additive noise that corrupts the measured data. By adopting a probabilistic approach, $y$ and $x$ are assumed to be realizations of random vectors $Y$ and $X$. In this context, our goal is to characterize the probability distribution of $X \mid Y$ by considering some parametric probabilistic model and by estimating the associated hyper-parameters.

A useful example where this characterization may be of great interest is frame-based signal/image denoising in a Bayesian framework. Actually, denoising in the wavelet domain using wavelet frame decompositions has been investigated since the seminal work [31], as this kind of representation provides sparse descriptions of regular signals. The related hyper-parameters then have to be estimated. When $F$ is bijective and $\delta = 0$, this estimation can be performed by inverting the transform so as to deduce $x$ from $y$ and by resorting to standard estimation techniques on $x$. However, as mentioned in Section II-A, for redundant frames, $F^*$ is not bijective, which makes the hyper-parameter estimation problem more difficult since $x$ can no longer be uniquely deduced from $y$. This paper presents hierarchical Bayesian algorithms to address this issue.

III. HIERARCHICAL BAYESIAN MODEL

In a Bayesian framework, we first need to define prior distributions for the frame coefficients. For instance, this prior may be chosen so as to promote the sparsity of the representation. In the following, $f(x\mid\theta)$ denotes the pdf of the frame coefficients that depends on an unknown hyper-parameter vector $\theta$, and $f(\theta)$ is the a priori pdf of the hyper-parameter vector $\theta$. In compliance with the observation model (4), $f(y\mid x)$ is a uniform distribution on the closed convex set $D_\delta$ defined as
$$D_\delta = \{y \in \mathbb{R}^L \mid N(y - F^*x) \le \delta\} \tag{6}$$
where $\delta > 0$. Denoting by $\Theta$ the random variable associated with the hyper-parameter vector $\theta$ and using the hierarchical structure between $Y$, $X$ and $\Theta$, the conditional distribution of $(X,\Theta)$ given $Y$ can be written as
$$f(x,\theta\mid y) \propto f(y\mid x)\,f(x\mid\theta)\,f(\theta) \tag{7}$$
where $\propto$ means proportional to.

In this work, we assume that the frame coefficients are a priori independent with marginal GG distributions. This assumption has been successfully used in many studies [32–37] and leads to the following frame coefficient prior
$$f(x_k\mid\alpha_k,\beta_k) = \frac{\beta_k}{2\alpha_k\Gamma(1/\beta_k)}\exp\left(-\frac{|x_k|^{\beta_k}}{\alpha_k^{\beta_k}}\right) \tag{8}$$
where $\alpha_k > 0$ and $\beta_k > 0$ (with $k \in \{1,\ldots,K\}$) are the scale and shape parameters associated with $x_k$, the $k$th component of the frame coefficient vector $x$, and $\Gamma(\cdot)$ is the Gamma function. Note that small values of the shape parameters are appropriate for modelling sparse signals. When $\beta_k = 1$ for all $k \in \{1,\ldots,K\}$, a Laplace prior is obtained, which was shown to play a central role in sparse signal recovery [38] and compressed sensing [39].

By introducing $\gamma_k = \alpha_k^{\beta_k}$ for all $k \in \{1,\ldots,K\}$, the frame prior can be rewritten as²
$$f(x_k\mid\gamma_k,\beta_k) = \frac{\beta_k}{2\gamma_k^{1/\beta_k}\Gamma(1/\beta_k)}\exp\left(-\frac{|x_k|^{\beta_k}}{\gamma_k}\right). \tag{9}$$

²The interest of this new parameterization will be clarified in Section IV.
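As an aside (not part of the original text), the parameterization in (9) also makes the GG prior straightforward to evaluate and to sample exactly: if $G$ follows a Gamma$(1/\beta,1)$ distribution, then $\pm(\gamma G)^{1/\beta}$ with a random sign is distributed according to (9). A minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

def gg_logpdf(x, gamma_g, beta_g):
    """Log of the GG prior (9): beta / (2 gamma^(1/beta) Gamma(1/beta)) exp(-|x|^beta / gamma)."""
    log_norm = (np.log(beta_g) - np.log(gamma_g) / beta_g
                - np.log(2.0) - gammaln(1.0 / beta_g))
    return log_norm - np.abs(x) ** beta_g / gamma_g

def gg_sample(size, gamma_g, beta_g):
    """Exact GG(gamma, beta) draws: sign * (gamma * G)^(1/beta) with G ~ Gamma(1/beta, 1)."""
    g = rng.gamma(shape=1.0 / beta_g, scale=1.0, size=size)
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * (gamma_g * g) ** (1.0 / beta_g)

x = gg_sample(100000, gamma_g=2.0, beta_g=0.8)
# Empirical check of E|X|^beta = gamma / beta (moment of the underlying Gamma draw).
print(np.mean(np.abs(x) ** 0.8), 2.0 / 0.8)
```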

The distribution of a frame coefficient generally differs from one coefficient to another. However, some frame coefficients can have very similar distributions (that can be defined by the same hyper-parameters $\beta_k$ and $\gamma_k$). As a consequence, we propose to split the frame coefficients into $G$ different groups. The $g$th group is parameterized by a unique hyper-parameter vector denoted as $\theta_g = (\beta_g,\gamma_g)$ (after the reparameterization mentioned above). In this case, the frame prior can be expressed as
$$f(x\mid\theta) = \prod_{g=1}^{G}\left(\frac{\beta_g}{2\gamma_g^{1/\beta_g}\Gamma(1/\beta_g)}\right)^{n_g}\exp\left(-\frac{1}{\gamma_g}\sum_{k\in S_g}|x_k|^{\beta_g}\right) \tag{10}$$
where the summation covers the index set $S_g$ of the elements of the $g$th group containing $n_g$ elements and $\theta = (\theta_1,\ldots,\theta_G)$. Note that in our simulations, each group $g$ will correspond to a given wavelet subband. A coarser classification may be made when using multiscale frame representations by considering that all the frame coefficients at a given resolution level belong to the same group.

The hierarchical Bayesian model for the frame decomposition is completed by the following improper hyperprior
$$f(\theta) = \prod_{g=1}^{G} f(\theta_g) = \prod_{g=1}^{G}\big[f(\gamma_g)f(\beta_g)\big] \propto \prod_{g=1}^{G}\left[\frac{1}{\gamma_g}1_{\mathbb{R}^+}(\gamma_g)\,1_{[0,3]}(\beta_g)\right] \tag{11}$$
where, for a set $A \subset \mathbb{R}$,
$$1_A(\xi) = \begin{cases} 1 & \text{if } \xi \in A \\ 0 & \text{otherwise.} \end{cases} \tag{12}$$
The motivations for using this kind of prior are summarized below:
• The interval $[0,3]$ covers all possible values of $\beta_g$ encountered in practical applications. Moreover, there is no additional information about the parameter $\beta_g$.
• The prior for the parameter $\gamma_g$ is a Jeffreys distribution that reflects the absence of knowledge about this parameter [40]. This kind of prior is often used for scale parameters.

The resulting posterior distribution is therefore given by
$$f(x,\theta\mid y) \propto 1_{C_\delta}(x)\prod_{g=1}^{G}\left[\left(\frac{\beta_g}{2\gamma_g^{1/\beta_g}\Gamma(1/\beta_g)}\right)^{n_g}\exp\left(-\frac{1}{\gamma_g}\sum_{k\in S_g}|x_k|^{\beta_g}\right)\frac{1}{\gamma_g}1_{\mathbb{R}^+}(\gamma_g)\,1_{[0,3]}(\beta_g)\right]. \tag{13}$$
The Bayesian estimators (e.g., the maximum a posteriori (MAP) or minimum mean square error (MMSE) estimators) associated with the posterior distribution (13) have no simple closed-form expression. The next section studies different sampling strategies for generating samples distributed according to the posterior distribution (13). The generated samples will be used to estimate the unknown model parameter and hyper-parameter vectors $x$ and $\theta$.
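Since the samplers described next only require the posterior up to a multiplicative constant, a helper evaluating the logarithm of (13) is convenient for the MH acceptance ratios. The sketch below is illustrative; it assumes a user-supplied `synthesis` callable implementing $F^*$ and a list `groups` of index sets $S_g$:

```python
import numpy as np
from scipy.special import gammaln

def log_posterior(x, gamma, beta, y, synthesis, groups, delta, norm=np.linalg.norm):
    """Unnormalized log of (13).

    x         : frame coefficients (length K)
    gamma,beta: per-group hyper-parameters (length G)
    y         : observed signal (length L)
    synthesis : callable implementing F* (hypothetical helper)
    groups    : list of index arrays S_g
    delta     : error bound of the constraint set C_delta
    """
    # Indicator 1_{C_delta}(x): reject configurations violating N(y - F*x) <= delta.
    if norm(y - synthesis(x)) > delta:
        return -np.inf
    logp = 0.0
    for g, S_g in enumerate(groups):
        n_g = len(S_g)
        if not (0.0 < beta[g] <= 3.0) or gamma[g] <= 0.0:
            return -np.inf
        # (beta_g / (2 gamma_g^{1/beta_g} Gamma(1/beta_g)))^{n_g}
        logp += n_g * (np.log(beta[g]) - np.log(2.0)
                       - np.log(gamma[g]) / beta[g] - gammaln(1.0 / beta[g]))
        # exp(-(1/gamma_g) sum_{k in S_g} |x_k|^{beta_g})
        logp -= np.sum(np.abs(x[S_g]) ** beta[g]) / gamma[g]
        # Jeffreys hyperprior 1/gamma_g
        logp -= np.log(gamma[g])
    return logp
```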

IV. SAMPLING STRATEGIES

This section proposes different MCMC methods to generate samples distributed according to the posterior $f(x,\theta\mid y)$ defined in (13).

A. Hybrid Gibbs Sampler

A very standard strategy to sample according to (7) is provided by the Gibbs sampler, which iteratively generates samples distributed according to conditional distributions associated with the target distribution. More precisely, the basic Gibbs sampler iteratively generates samples distributed according to $f(x\mid\theta,y)$ and $f(\theta\mid x,y)$.

1) Sampling the frame coefficients:

Straightforward calculations yield the following conditional distribution
$$f(x\mid\theta,y) \propto 1_{C_\delta}(x)\prod_{g=1}^{G}\exp\left(-\frac{1}{\gamma_g}\sum_{k\in S_g}|x_k|^{\beta_g}\right) \tag{14}$$
where $C_\delta$ is defined in (5). This conditional distribution is a product of GG distributions truncated on $C_\delta$. Actually, sampling according to this truncated distribution is not always easy to perform since the adjoint frame operator $F^*$ is usually of large dimension. However, two alternative sampling strategies are detailed in what follows.

a) Naive sampling:

This sampling method proceeds by sampling according to the independent GG distributions
$$\prod_{g=1}^{G}\exp\left(-\frac{1}{\gamma_g}\sum_{k\in S_g}|x_k|^{\beta_g}\right) \tag{15}$$
and then accepting the proposed candidate $x$ only if $N(y - F^*x) \le \delta$. This method can be used for any frame decomposition and any norm. However, it can be quite inefficient because of a very low acceptance ratio, especially when $\delta$ takes small values.
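A literal sketch of this rejection scheme (illustrative only; it reuses the `gg_sample` helper above and the hypothetical `synthesis` callable for $F^*$) makes the inefficiency explicit, since every rejected candidate costs a full synthesis:

```python
import numpy as np

def naive_sample_x(gamma, beta, groups, y, synthesis, delta, K,
                   max_tries=10000, norm=np.linalg.norm):
    """Naive sampler for (14): propose from the untruncated GG prior (15)
    and accept only if the candidate falls inside C_delta."""
    for _ in range(max_tries):
        x = np.empty(K)
        for g, S_g in enumerate(groups):
            x[S_g] = gg_sample(len(S_g), gamma_g=gamma[g], beta_g=beta[g])
        if norm(y - synthesis(x)) <= delta:
            return x
    raise RuntimeError("no candidate accepted: the acceptance ratio collapses for small delta")
```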

b) Gibbs sampler:

This sampling method is designed to sample more efficiently from the conditional distribution in (14) when the considered frame is the union of $M$ orthonormal bases and $N(\cdot)$ is the Euclidean norm. In this case, the analysis frame operator and the corresponding adjoint can be written as $F = [F_1^\top \cdots F_M^\top]^\top$ and $F^* = [F_1^* \ldots F_M^*]$, respectively, where, for every $m \in \{1,\ldots,M\}$, $F_m$ is the decomposition operator onto the $m$th orthonormal basis, so that $F_m^*F_m = F_mF_m^* = \mathrm{I}$. Every $x \in \mathbb{R}^K$ with $K = ML$ can be decomposed as $x = [x_1^\top,\ldots,x_M^\top]^\top$, where $x_m \in \mathbb{R}^L$ for every $m \in \{1,\ldots,M\}$.

The Gibbs sampler for the generation of frame coefficients draws vectors according to the conditional distribution $f(x_n\mid x_{-n},y,\theta)$ under the constraint $N(y - F^*x) \le \delta$, where $x_{-n}$ is the reduced-size vector of dimension $K - L$ built from $x$ by removing the $n$th vector $x_n$. If $N(\cdot)$ is the Euclidean norm, we have, for every $n \in \{1,\ldots,M\}$,
$$N\Big(y - \sum_{m=1}^{M}F_m^*x_m\Big) \le \delta
\;\Leftrightarrow\; \Big\|F_n^*\Big(F_ny - \sum_{m=1}^{M}F_nF_m^*x_m\Big)\Big\| \le \delta
\;\Leftrightarrow\; \Big\|F_ny - \sum_{m\neq n}F_nF_m^*x_m - x_n\Big\| \le \delta \quad(\text{since } \|F_n^*z\| = \|z\| \text{ for every } z \in \mathbb{R}^L)
\;\Leftrightarrow\; N(x_n - c_n) \le \delta, \tag{16}$$
where
$$c_n = F_n\Big(y - \sum_{m\neq n}F_m^*x_m\Big).$$
Having $x_{-n} = (x_m)_{m\neq n}$, it is thus easy to compute the vector $c_n$. To sample each $x_n$, we propose to use an MH step whose proposal distribution is supported on the ball $B_{c_n,\delta}$ defined by
$$B_{c_n,\delta} = \{a \in \mathbb{R}^L \mid N(a - c_n) \le \delta\}. \tag{17}$$
Random generation from a pdf $q_\delta$ with a simple expression defined on $B_{0,\delta}$ is described in Appendix A. Having a closed-form expression of this pdf is important to be able to calculate the acceptance ratio of the MH move. To take into account the value of $x_n^{(i-1)}$ obtained at the previous iteration $(i-1)$, it may however be preferable to choose a proposal distribution supported on a restricted ball of radius $\eta \in\, ]0,\delta[$ containing $x_n^{(i-1)}$. This strategy, similar to the random walk MH algorithm [15, p. 287], results in a better exploration of regions associated with large values of the conditional distribution $f(x\mid\theta,y)$.


More precisely, we propose to choose a proposal distribution defined on $B_{\bar{x}_n^{(i-1)},\eta}$, where $\bar{x}_n^{(i-1)} = P\big(x_n^{(i-1)} - c_n\big) + c_n$ and $P$ is the projection onto the ball $B_{0,\delta-\eta}$ defined as
$$\forall a \in \mathbb{R}^L,\qquad P(a) = \begin{cases} a & \text{if } N(a) \le \delta - \eta \\ \dfrac{\delta-\eta}{N(a)}\,a & \text{otherwise.}\end{cases} \tag{18}$$
This choice of the center of the ball guarantees that $B_{\bar{x}_n^{(i-1)},\eta} \subset B_{c_n,\delta}$. Moreover, any point of $B_{c_n,\delta}$ can be reached after consecutive draws in $B_{\bar{x}_n^{(i-1)},\eta}$. Note that the radius $\eta$ has to be adjusted to ensure a good exploration of $B_{c_n,\delta}$. In practice, it may also be interesting to fix a small enough value of $\eta$ so as to improve the acceptance ratio.
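For the Euclidean norm, the projection (18), a uniform proposal on an $\eta$-ball and the computation of $c_n$ each take a few lines. The sketch below is illustrative; `analyses[m]` and `syntheses[m]` denote hypothetical callables implementing $F_m$ and $F_m^*$:

```python
import numpy as np

rng = np.random.default_rng(1)

def project_ball(a, radius):
    """Projection P of (18) onto the centered Euclidean ball of given radius."""
    n = np.linalg.norm(a)
    return a if n <= radius else (radius / n) * a

def sample_l2_ball(center, radius, rng=rng):
    """Uniform draw in an L2 ball (one possible choice of q_eta): Gaussian
    direction combined with a radius proportional to U^(1/L)."""
    L = center.size
    direction = rng.standard_normal(L)
    direction /= np.linalg.norm(direction)
    r = radius * rng.uniform() ** (1.0 / L)
    return center + r * direction

def compute_cn(n, x_blocks, y, analyses, syntheses):
    """c_n = F_n(y - sum_{m != n} F_m^* x_m) for a union of orthonormal bases."""
    residual = y - sum(syntheses[m](x_blocks[m])
                       for m in range(len(x_blocks)) if m != n)
    return analyses[n](residual)

# One restricted random-walk proposal for x_n (uniform q_eta, re-centered as in
# Algorithm 1 below):
#   x_bar  = project_ball(x_blocks[n] - c_n, delta - eta) + c_n
#   x_cand = sample_l2_ball(x_bar, eta)
```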

Remark:

Alternatively, a Gibbs sampler can be used to draw successively the $L$ elements $(x_{n,l})_{1\le l\le L}$ of $x_n$ under the following constraint:
$$\|x_n - c_n\| \le \delta \;\Leftrightarrow\; -\sqrt{\delta^2 - \sum_{k\neq l}(x_{n,k}-c_{n,k})^2} \le x_{n,l} - c_{n,l} \le \sqrt{\delta^2 - \sum_{k\neq l}(x_{n,k}-c_{n,k})^2},\quad \forall l \in \{1,\ldots,L\} \tag{19}$$
where $c_{n,k}$ is the $k$th element of the vector $c_n$ (see [41, p. 133] for related strategies). However, this method is very time-consuming since it proceeds sequentially for each component of the high-dimensional vector $x$.

2) Sampling the hyper-parameter vector:

Instead of sampling $\theta$ according to $f(\theta\mid x,y)$, we propose to iteratively sample according to $f(\gamma_g\mid\beta_g,x,y)$ and $f(\beta_g\mid\gamma_g,x,y)$. Straightforward calculations lead to the following results:
$$f(\gamma_g\mid\beta_g,x,y) \propto \gamma_g^{-\frac{n_g}{\beta_g}-1}\exp\left(-\frac{1}{\gamma_g}\sum_{k\in S_g}|x_k|^{\beta_g}\right)1_{\mathbb{R}^+}(\gamma_g), \tag{20}$$
$$f(\beta_g\mid\gamma_g,x,y) \propto \frac{\beta_g^{n_g}}{\gamma_g^{n_g/\beta_g}\left[\Gamma(1/\beta_g)\right]^{n_g}}\exp\left(-\frac{1}{\gamma_g}\sum_{k\in S_g}|x_k|^{\beta_g}\right)1_{[0,3]}(\beta_g). \tag{21}$$
Consequently, due to the new parameterization introduced in (9), $f(\gamma_g\mid\beta_g,x,y)$ is the pdf of the inverse gamma distribution $\mathcal{IG}\big(\frac{n_g}{\beta_g},\sum_{k\in S_g}|x_k|^{\beta_g}\big)$, which is easy to sample. Conversely, it is more difficult to sample according to the truncated pdf $f(\beta_g\mid\gamma_g,x,y)$. This is achieved by using an MH move whose proposal $q(\beta_g\mid\beta_g^{(i-1)})$ is a Gaussian distribution truncated on the interval $[0,3]$ with standard deviation $\sigma_{\beta_g} = 0.05$ [42]. Note that the mode of this distribution is the value of the parameter $\beta_g^{(i-1)}$ at the previous iteration $(i-1)$.
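A compact sketch of these two hyper-parameter updates (illustrative; it uses SciPy's truncated normal and variable names of our own) exploits the exact inverse-gamma draw for $\gamma_g$ and an MH correction accounting for the asymmetric truncated-Gaussian proposal on $\beta_g$:

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.special import gammaln

rng = np.random.default_rng(2)

def log_beta_conditional(beta_g, gamma_g, abs_x_g):
    """Log of (21) up to a constant, with abs_x_g = |x_k| for k in S_g."""
    if not (0.0 < beta_g <= 3.0):
        return -np.inf
    n_g = abs_x_g.size
    return (n_g * np.log(beta_g)
            - (n_g / beta_g) * np.log(gamma_g)
            - n_g * gammaln(1.0 / beta_g)
            - np.sum(abs_x_g ** beta_g) / gamma_g)

def sample_gamma_g(beta_g, abs_x_g, rng=rng):
    """Exact draw from (20): inverse gamma IG(n_g / beta_g, sum |x_k|^beta_g)."""
    shape = abs_x_g.size / beta_g
    scale = np.sum(abs_x_g ** beta_g)
    return scale / rng.gamma(shape)          # 1 / Gamma(shape, rate = scale)

def mh_update_beta_g(beta_prev, gamma_g, abs_x_g, sigma=0.05, rng=rng):
    """One MH move for beta_g with a Gaussian proposal truncated on [0, 3]."""
    a, b = (0.0 - beta_prev) / sigma, (3.0 - beta_prev) / sigma
    cand = truncnorm.rvs(a, b, loc=beta_prev, scale=sigma, random_state=rng)
    # Asymmetric proposal: include the q(old | new) / q(new | old) correction.
    a_c, b_c = (0.0 - cand) / sigma, (3.0 - cand) / sigma
    log_r = (log_beta_conditional(cand, gamma_g, abs_x_g)
             - log_beta_conditional(beta_prev, gamma_g, abs_x_g)
             + truncnorm.logpdf(beta_prev, a_c, b_c, loc=cand, scale=sigma)
             - truncnorm.logpdf(cand, a, b, loc=beta_prev, scale=sigma))
    return cand if np.log(rng.uniform()) < log_r else beta_prev
```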

The resulting method is the hybrid Gibbs sampler summarized in Algorithm 1.

➀ Initialize with some $\theta^{(0)} = (\theta_g^{(0)})_{1\le g\le G} = (\gamma_g^{(0)},\beta_g^{(0)})_{1\le g\le G}$ and $x^{(0)} \in C_\delta$, and set $i = 1$.
➁ Sampling $x$
  For $n = 1$ to $M$
  – Compute $c_n^{(i)} = F_n\big(y - \sum_{m<n}F_m^*x_m^{(i)} - \sum_{m>n}F_m^*x_m^{(i-1)}\big)$ and $\bar{x}_n^{(i-1)} = P\big(x_n^{(i-1)} - c_n^{(i)}\big) + c_n^{(i)}$.
  – Simulate $x_n^{(i)}$ as follows:
    ∗ Generate $\tilde{x}_n^{(i)} \sim q_\eta\big(x_n - \bar{x}_n^{(i-1)}\big)$, where $q_\eta$ is defined on $B_{0,\eta}$ (see Appendix A).
    ∗ Compute the ratio
      $$r\big(\tilde{x}_n^{(i)},x_n^{(i-1)}\big) = \frac{f\big(\tilde{x}_n^{(i)}\mid\theta^{(i-1)},(x_m^{(i)})_{m<n},(x_m^{(i-1)})_{m>n},y\big)\; q_\eta\big(x_n^{(i-1)} - P(\tilde{x}_n^{(i)} - c_n^{(i)}) - c_n^{(i)}\big)}{f\big(x_n^{(i-1)}\mid\theta^{(i-1)},(x_m^{(i)})_{m<n},(x_m^{(i-1)})_{m>n},y\big)\; q_\eta\big(\tilde{x}_n^{(i)} - \bar{x}_n^{(i-1)}\big)}$$
      and accept the proposed candidate with probability $\min\{1,r(\tilde{x}_n^{(i)},x_n^{(i-1)})\}$.
➂ Sampling $\theta$
  For $g = 1$ to $G$
  – Generate $\gamma_g^{(i)} \sim \mathcal{IG}\big(\frac{n_g}{\beta_g^{(i-1)}},\sum_{k\in S_g}|x_k^{(i)}|^{\beta_g^{(i-1)}}\big)$.
  – Simulate $\beta_g^{(i)}$ as follows:
    ∗ Generate $\beta_g^{(i)} \sim q\big(\beta_g\mid\beta_g^{(i-1)}\big)$.
    ∗ Compute the ratio
      $$r\big(\beta_g^{(i)},\beta_g^{(i-1)}\big) = \frac{f\big(\beta_g^{(i)}\mid\gamma_g^{(i)},x^{(i)},y\big)\,q\big(\beta_g^{(i-1)}\mid\beta_g^{(i)}\big)}{f\big(\beta_g^{(i-1)}\mid\gamma_g^{(i)},x^{(i)},y\big)\,q\big(\beta_g^{(i)}\mid\beta_g^{(i-1)}\big)}$$
      and accept the proposed candidate with probability $\min\{1,r(\beta_g^{(i)},\beta_g^{(i-1)})\}$.
➃ Set $i \leftarrow i + 1$ and go to ➁ until convergence.

Algorithm 1: Proposed hybrid Gibbs sampler to simulate according to $f(x,\theta\mid y)$ (the superscript $\cdot^{(i)}$ indicates values computed at iteration $i$).

Although this algorithm is intuitive and simple to implement, it must be pointed out that it was derived under the restrictive assumption that the considered frame is the union of $M$ orthonormal bases. When this assumption does not hold, another algorithm, proposed in the next section, allows us to sample the frame coefficients and the related hyper-parameters by exploiting algebraic properties of frames.

B. Hybrid MH sampler using algebraic properties of frame representations

As a direct generation of samples according to $f(x\mid\theta,y)$ is generally impossible, we propose here an alternative that replaces the Gibbs move by an MH move. This MH move aims at globally sampling a candidate $x$ according to a proposal distribution. This candidate is accepted or rejected with the standard MH acceptance ratio. The efficiency of the MH move strongly depends on the choice of the proposal distribution for $x$. We denote by $x^{(i)}$ the $i$th accepted sample of the algorithm and by $q(x\mid x^{(i-1)})$ the proposal that is used to generate a candidate at iteration $i$. The main difficulty in choosing $q(x\mid x^{(i-1)})$ stems from the fact that it must guarantee that $x \in C_\delta$ (as mentioned in Section II-B) while yielding a tractable expression of $q(x^{(i-1)}\mid x)/q(x\mid x^{(i-1)})$.

For this reason, we propose to exploit the algebraic properties of frame representations. More precisely, any frame coefficient vector can be decomposed as $x = x_H + x_{H^\perp}$, where $x_H$ and $x_{H^\perp}$ are realizations of random vectors taking their values in $H = \mathrm{Ran}(F)$ and $H^\perp = [\mathrm{Ran}(F)]^\perp = \mathrm{Null}(F^*)$, respectively.³ The proposal distribution used in this paper allows us to generate samples $x_H \in H$ and $x_{H^\perp} \in H^\perp$. More precisely, the following separable form of the proposal pdf will be considered:
$$q(x\mid x^{(i-1)}) = q\big(x_H\mid x_H^{(i-1)}\big)\,q\big(x_{H^\perp}\mid x_{H^\perp}^{(i-1)}\big) \tag{22}$$
where $x_H^{(i-1)} \in H$, $x_{H^\perp}^{(i-1)} \in H^\perp$ and $x^{(i-1)} = x_H^{(i-1)} + x_{H^\perp}^{(i-1)}$. In other words, independent sampling of $x_H$ and $x_{H^\perp}$ will be performed.

³We recall that the range of $F$ is $\mathrm{Ran}(F) = \{x \in \mathbb{R}^K \mid \exists y \in \mathbb{R}^L,\ Fy = x\}$ and the null space of $F^*$ is $\mathrm{Null}(F^*) = \{x \in \mathbb{R}^K \mid F^*x = 0\}$.

If we consider the decomposition $x = x_H + x_{H^\perp}$, sampling $x$ in $C_\delta$ is equivalent to sampling $\lambda \in \widetilde{C}_\delta$, where $\widetilde{C}_\delta = \{\lambda \in \mathbb{R}^L \mid N(y - F^*F\lambda) \le \delta\}$ is the inverse image of $C_\delta$ under $F$. Indeed, we can write $x_H = F\lambda$ with $\lambda \in \mathbb{R}^L$ and, since $x_{H^\perp} \in \mathrm{Null}(F^*)$, $F^*x = F^*F\lambda$. Sampling $\lambda$ in $\widetilde{C}_\delta$ can be easily achieved, e.g., by generating $u$ from a distribution supported on the ball $B_{y,\delta}$ and by taking $\lambda = (F^*F)^{-1}u$.

To make the sampling of $x_H$ at iteration $i$ more efficient, it may be interesting to take into account the value sampled at the previous iteration, $x_H^{(i-1)} = F\lambda^{(i-1)} = F(F^*F)^{-1}u^{(i-1)}$. Similarly to Section IV-A1b) and to random walk generation techniques, we proceed by randomly generating $u$ in $B_{\bar{u}^{(i-1)},\eta}$, where $\eta \in\, ]0,\delta[$ and $\bar{u}^{(i-1)} = P\big(u^{(i-1)} - y\big) + y$. This allows us to draw a vector $u$ such that $x_H = F(F^*F)^{-1}u \in C_\delta$ and $N(u - u^{(i-1)}) \le 2\eta$. The generation of $u$ can then be performed as explained in Appendix A, provided that $N(\cdot)$ is an $\ell_p$ norm with $p \in [1,+\infty]$.

Once we have simulated $x_H = F\lambda \in H \cap C_\delta$ (which ensures that $x$ is in $C_\delta$), $x_{H^\perp}$ has to be sampled as an element of $H^\perp$. Since $y = F^*x + n = F^*x_H + n$, there is no information in $y$ about $x_{H^\perp}$. As a consequence, and for simplicity reasons, we propose to sample $x_{H^\perp}$ by drawing $z$ according to the Gaussian distribution $\mathcal{N}(x^{(i-1)},\sigma_x^2\mathrm{I})$ and by projecting $z$ onto $H^\perp$, i.e.,
$$x_{H^\perp} = \Pi_{H^\perp}z \tag{23}$$
where $\Pi_{H^\perp} = \mathrm{I} - F(F^*F)^{-1}F^*$ is the orthogonal projection operator onto $H^\perp$. Note that using a tight frame makes the computation of both $x_H$ and $x_{H^\perp}$ much easier due to the relation $F^*F = \mu\mathrm{I}$.
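For a tight frame, the two projections needed by this sampler reduce to a single analysis/synthesis pair since $(F^*F)^{-1} = \mathrm{I}/\mu$. A small illustrative sketch, reusing the `analysis`/`synthesis` helpers of the earlier frame example (where $\mu = 2$):

```python
import numpy as np

def split_tight_frame(x, analysis, synthesis, mu):
    """Decompose x = x_H + x_Hperp for a tight frame (F*F = mu I):
    the projector onto H = Ran(F) reduces to F F* / mu."""
    x_H = analysis(synthesis(x)) / mu          # F (F*F)^{-1} F* x = F(F* x) / mu
    return x_H, x - x_H

def proposal_xH_from_u(u, analysis, mu):
    """x_H = F (F*F)^{-1} u, which is simply F u / mu for a tight frame."""
    return analysis(u) / mu

# Sanity check with the union-of-two-orthonormal-bases sketch above (mu = 2):
# x = np.random.randn(2 * L)
# x_H, x_perp = split_tight_frame(x, analysis, synthesis, mu=2.0)
# assert np.allclose(synthesis(x_perp), 0.0)   # x_Hperp lies in Null(F*)
```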

Let us now derive the expression of the proposal pdf. It can be noticed that, if $K > L$, there exists a linear operator $F_\perp$ from $\mathbb{R}^{K-L}$ to $\mathbb{R}^K$ which is semi-orthogonal (i.e., $F_\perp^*F_\perp = \mathrm{I}$) and orthogonal to $F$ (i.e., $F_\perp^*F = 0$), such that
$$x = \underbrace{F\lambda}_{x_H} + \underbrace{F_\perp\lambda_\perp}_{x_{H^\perp}} \tag{24}$$
and $\lambda_\perp = F_\perp^*x \in \mathbb{R}^{K-L}$. Standard rules on bijective linear transforms of random vectors lead to
$$q(x\mid x^{(i-1)}) = \big|\det\big([F\ F_\perp]\big)\big|^{-1}\,q(\lambda\mid x^{(i-1)})\,q(\lambda_\perp\mid x^{(i-1)}) \tag{25}$$
where, due to the bijective linear mapping between $\lambda$ and $u = F^*F\lambda$,
$$q(\lambda\mid x^{(i-1)}) = \det(FF^*)\,q_\eta\big(u - \bar{u}^{(i-1)}\big) \tag{26}$$
and $q(\lambda_\perp\mid x^{(i-1)})$ is the pdf of the Gaussian distribution $\mathcal{N}(\lambda_\perp^{(i-1)},\sigma_x^2\mathrm{I})$ with mean $\lambda_\perp^{(i-1)} = F_\perp^*x^{(i-1)}$. Recall that $q_\eta$ denotes a distribution on the ball $B_{0,\eta}$ as expressed in Appendix A. Due to the symmetry of the Gaussian distribution, it can be deduced that
$$\frac{q(x^{(i-1)}\mid x)}{q(x\mid x^{(i-1)})} = \frac{q_\eta\big(u^{(i-1)} - P(u - y) - y\big)}{q_\eta\big(u - \bar{u}^{(i-1)}\big)}. \tag{27}$$
This expression remains valid in the degenerate case when $K = L$ (yielding $x_{H^\perp} = 0$). Finally, it is important to note that, if $q_\eta$ is chosen as a uniform distribution on the ball $B_{0,\eta}$, the above ratio reduces to $1$, which simplifies the computation of the MH acceptance ratio.

The final algorithm is summarized in Algorithm 2. Note that the sampling of the hyper-parameter vector is performed as for the hybrid Gibbs sampler in Section IV-A2.


➀ Initialize with some $\theta^{(0)} = (\theta_g^{(0)})_{1\le g\le G} = (\gamma_g^{(0)},\beta_g^{(0)})_{1\le g\le G}$ and $u^{(0)} \in B_{y,\delta}$. Set $x^{(0)} = F(F^*F)^{-1}u^{(0)}$ and $i = 1$.
➁ Sampling $x$
  – Compute $\bar{u}^{(i-1)} = P\big(u^{(i-1)} - y\big) + y$.
  – Generate $u^{(i)} \sim q_\eta\big(u - \bar{u}^{(i-1)}\big)$, where $q_\eta$ is defined on $B_{0,\eta}$ (see Appendix A).
  – Compute $x_H^{(i)} = F(F^*F)^{-1}u^{(i)}$.
  – Generate $z^{(i)} \sim \mathcal{N}(x^{(i-1)},\sigma_x^2\mathrm{I})$.
  – Compute $x_{H^\perp}^{(i)} = \Pi_{H^\perp}z^{(i)}$ and $x^{(i)} = x_H^{(i)} + x_{H^\perp}^{(i)}$.
  – Compute the ratio
    $$r\big(x^{(i)},x^{(i-1)}\big) = \frac{f\big(x^{(i)}\mid\theta^{(i-1)},y\big)\,q_\eta\big(u^{(i-1)} - P(u^{(i)} - y) - y\big)}{f\big(x^{(i-1)}\mid\theta^{(i-1)},y\big)\,q_\eta\big(u^{(i)} - \bar{u}^{(i-1)}\big)}$$
    and accept the proposed candidates $u^{(i)}$ and $x^{(i)}$ with probability $\min\{1,r(x^{(i)},x^{(i-1)})\}$.
➂ Sampling $\theta$
  For $g = 1$ to $G$
  – Generate $\gamma_g^{(i)} \sim \mathcal{IG}\big(\frac{n_g}{\beta_g^{(i-1)}},\sum_{k\in S_g}|x_k^{(i)}|^{\beta_g^{(i-1)}}\big)$.
  – Simulate $\beta_g^{(i)}$ as follows:
    ∗ Generate $\beta_g^{(i)} \sim q\big(\beta_g\mid\beta_g^{(i-1)}\big)$.
    ∗ Compute the ratio
      $$r\big(\beta_g^{(i)},\beta_g^{(i-1)}\big) = \frac{f\big(\beta_g^{(i)}\mid\gamma_g^{(i)},x^{(i)},y\big)\,q\big(\beta_g^{(i-1)}\mid\beta_g^{(i)}\big)}{f\big(\beta_g^{(i-1)}\mid\gamma_g^{(i)},x^{(i)},y\big)\,q\big(\beta_g^{(i)}\mid\beta_g^{(i-1)}\big)}$$
      and accept the proposed candidate with probability $\min\{1,r(\beta_g^{(i)},\beta_g^{(i-1)})\}$.
➃ Set $i \leftarrow i + 1$ and go to ➁ until convergence.

Algorithm 2: Proposed hybrid MH sampler using algebraic properties of frame representations to simulate according to $f(x,\theta\mid y)$.
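Combining the pieces above, one $x$-update of Algorithm 2 fits in a few lines when the frame is tight and $q_\eta$ is uniform, so that the ratio (27) equals 1. The sketch below is illustrative only and reuses the hypothetical helpers introduced earlier (`project_ball`, `sample_l2_ball`, `log_posterior`, `analysis`, `synthesis`):

```python
import numpy as np

def sampler2_x_step(x_prev, u_prev, y, theta_prev, analysis, synthesis, mu,
                    groups, delta, eta, sigma_x, rng):
    """One x-update of Algorithm 2 for a tight frame (F*F = mu I) with a
    uniform q_eta, so the proposal ratio in (27) equals 1."""
    gamma_prev, beta_prev = theta_prev
    # Random-walk move for u inside B_{y, delta}.
    u_bar = project_ball(u_prev - y, delta - eta) + y
    u_cand = sample_l2_ball(u_bar, eta, rng=rng)
    # x_H = F (F*F)^{-1} u and x_Hperp = Pi_Hperp z with z ~ N(x_prev, sigma_x^2 I).
    x_H = analysis(u_cand) / mu
    z = x_prev + sigma_x * rng.standard_normal(x_prev.size)
    x_perp = z - analysis(synthesis(z)) / mu
    x_cand = x_H + x_perp
    # Uniform q_eta: the MH ratio reduces to the posterior ratio built from (13).
    log_r = (log_posterior(x_cand, gamma_prev, beta_prev, y, synthesis, groups, delta)
             - log_posterior(x_prev, gamma_prev, beta_prev, y, synthesis, groups, delta))
    if np.log(rng.uniform()) < log_r:
        return x_cand, u_cand
    return x_prev, u_prev
```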

Experimental estimation results obtained with the proposed stochastic sampling techniques, together with applications to image denoising problems, are provided in the next section.


V. SIMULATION RESULTS

A. Validation experiments

1) Example 1:

To show the effectiveness of our algorithm, a first set of experiments was carried out on synthetic images. As a frame representation, we used the union of two 2D separable wavelet bases $B_1$ and $B_2$ using Daubechies and shifted Daubechies filters of length 8 and 4, respectively. The $\ell_2$ norm was used for $N(\cdot)$ in (4) with $\delta = 10^{-4}$. To generate a synthetic image, we synthesized wavelet frame coefficients $x$ from known prior distributions.

Let $x_1 = (a_1,(h_{1,j},v_{1,j},d_{1,j})_{1\le j\le 2})$ and $x_2 = (a_2,(h_{2,j},v_{2,j},d_{2,j})_{1\le j\le 2})$ be the sequences of wavelet basis coefficients generated in $B_1$ and $B_2$, where $a$, $h$, $v$, $d$ stand for approximation, horizontal, vertical and diagonal coefficients and the index $j$ designates the resolution level. Wavelet frame coefficients have been generated from a GG distribution in accordance with the chosen priors. The coefficients in each subband have been modeled with the same values of the hyper-parameters $\alpha_g$ and $\beta_g$, which means that each subband forms a group of index $g$. The number of groups (i.e., the number of subbands) $G$ is therefore equal to 14. A uniform prior distribution over $[0,3]$ has been chosen for the parameter $\beta_g$, whereas a Jeffreys prior has been assigned to each parameter $\gamma_g$.

After generating the hyper-parameters from their prior distributions, a set of frame coefficients is randomly generated to synthesize the observed data. The hyper-parameters are then supposed unknown, sampled using the proposed algorithm, and estimated by computing the mean of the generated samples according to the MMSE principle. Since reference values are available, the normalized mean square errors (NMSEs) related to the estimation of each hyper-parameter belonging to a given group (here a given subband) have been computed from 30 Monte Carlo runs. The NMSEs computed for the estimators associated with the two samplers of Sections IV-A and IV-B are reported in Table I.
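For reference, the MMSE estimate and the NMSE reported below simply amount to a sample mean over the post-burn-in draws and a normalized squared error against the known generating value; a minimal helper (our notation):

```python
import numpy as np

def mmse_and_nmse(samples, reference, burn_in):
    """MMSE estimate = mean of the post-burn-in samples; NMSE against the
    known reference value used to generate the data (illustrative helper)."""
    samples = np.asarray(samples)
    estimate = samples[burn_in:].mean(axis=0)
    nmse = np.sum((estimate - reference) ** 2) / np.sum(reference ** 2)
    return estimate, nmse
```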


TABLE I
NMSEs FOR THE ESTIMATED HYPER-PARAMETERS (30 RUNS).

            Sampler 1           Sampler 2
Subband     β        α          β        α
h1,1        0.015    0.006      0.012    0.030
v1,1        0.022    0.021      0.022    0.026
d1,1        0.06     0.016      0.011    0.044
h1,2        0.04     0.003      0.021    0.026
v1,2        0.020    0.027      0.020    0.019
d1,2        0.013    0.016      0.023    0.041
a1          0.039    0.08       0.039    0.023
h2,1        0.015    0.030      0.015    0.025
v2,1        0.051    0.07       0.025    0.031
d2,1        0.027    0.039      0.029    0.023
h2,2        0.040    0.024      0.016    0.034
v2,2        0.08     0.019      0.013    0.022
d2,2        0.05     0.015      0.011    0.040
a2          0.010    0.064      0.010    0.028

Table I shows that the proposed algorithms (using Sampler 1 of Section IV-A and Sampler 2 of Section IV-B) provide accurate estimates of the hyper-parameters. The two samplers perform similarly for this experiment. However, one advantage of Sampler 2 is that it can be applied to different kinds of redundant frames, unlike Sampler 1. Indeed, as reported in Section IV-A, the conditional distribution (14) is generally difficult to sample when the frame representation is not the union of orthonormal bases.

Two examples of empirical histograms of known reference wavelet frame coefficients (corresponding to $B_1$) and pdfs with estimated hyper-parameters are plotted in Fig. 1 to illustrate the good performance of the estimator.

[Fig. 1. Examples of empirical approximation (left) and detail (right) histograms and pdfs of frame coefficients corresponding to a synthetic image. Left panel: $a_1$, $\beta = 1.7$, $\gamma = 104$; right panel: $h_{1,2}$, $\beta = 1.98$, $\gamma = 143.88$.]

2) Example 2:

In this experiment, another frame representation is considered, namely a tight frame version of the translation-invariant wavelet transform [43] with Daubechies filters of length 8. The $\ell_2$ norm was also used for $N(\cdot)$ in (4) with $\delta = 10^{-4}$. Let $x = (a,(h_j,v_j,d_j)_{1\le j\le 2})$ denote the frame coefficient vector. We used the same process to generate frame coefficients as for Example 1. The coefficients in each subband (i.e., each group) have been modeled with the same values of the hyper-parameters $\gamma_g$ and $\beta_g$, the number of groups being equal to 7. The same priors for the hyper-parameters $\gamma_g$ and $\beta_g$ as for Example 1 have been used.

After generating the hyper-parameters and frame coefficients, the hyper-parameters are then supposed unknown, sampled using the proposed algorithm, and estimated using the MMSE estimator. Table II shows NMSEs based on the reference values of each hyper-parameter. Note that Sampler 1 is difficult to implement in this case because of the properties of the frame used. Consequently, only NMSE values for Sampler 2 are reported in Table II.

B. Convergence results

To be able to automatically stop the simulated chain and ensure that the last simulated samples are appropriately distributed according to the posterior distribution of interest, a convergence monitoring technique based on the potential scale reduction factor (PSRF) has been used by simulating several chains in parallel (see [44] for more details). Using the union of two orthonormal bases as a frame representation, Figs. 2 and 3 show examples of convergence profiles corresponding to the hyper-parameters $\beta$ and $\gamma$ when two chains are sampled in parallel using Sampler 2.
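The PSRF of [44] compares within-chain and between-chain variances of a scalar trace; values close to 1 indicate that the parallel chains have mixed. A minimal sketch for the two-chain setting used here (our helper):

```python
import numpy as np

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for a scalar parameter.
    `chains` has shape (n_chains, n_iterations)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()          # within-chain variance W
    b_over_n = chain_means.var(ddof=1)             # B / n
    var_hat = (n - 1) / n * w + b_over_n           # pooled variance estimate
    return np.sqrt(var_hat / w)

# Example: psrf(np.vstack([beta_chain_1, beta_chain_2])) for the beta trace of Fig. 2.
```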

Based on these values of the PSRF, the algorithm was stopped after about 150,000 iterations (burn-in period of 100,000 iterations), which corresponds to about 4 hours of computational time using Matlab 7.7 on an Intel Core 4 (3 GHz) architecture. When comparing the two proposed samplers in terms of convergence speed, it turns out from our simulations that Sampler 1 shows faster convergence than Sampler 2. Indeed, Sampler 1 needs about 110,000 iterations to converge, which reduces the global computational time to about 3 hours.

TABLE II
NMSEs FOR THE ESTIMATED HYPER-PARAMETERS USING SAMPLER 2 (30 RUNS).

Subband     β        α
h1          0.05     0.027
v1          0.024    0.007
d1          0.05     0.014
h2          0.037    0.028
v2          0.051    0.044
d2          0.04     0.012
a           0.04     0.05

[Fig. 2. Ground truth values and sample paths for the hyper-parameters $\beta$ and $\gamma$ related to $v_{1,1}$ in $B_1$ (two chains). Top: $\beta = 2.3$, PSRF = 1.02; bottom: $\gamma = 185$, PSRF = 0.96.]

[Fig. 3. Ground truth values and sample paths for the hyper-parameters $\beta$ and $\gamma$ related to $v_{2,2}$ in $B_2$ (two chains). Top: $\beta = 1.81$, PSRF = 0.98; bottom: $\gamma = 31.5$, PSRF = 1.03.]

The posterior distributions of the hyper-parameters $\beta$ and $\gamma$ related to the subbands $h_{1,2}$ and $h_{2,2}$ in $B_1$ and $B_2$ introduced in Section V-A1 are shown in Fig. 4, together with the known original values. It is clear that the mode of the posterior distributions is around the ground truth value, which confirms the good estimation performance of the proposed approach.

Note that when the resolution level increases, the number of subbands also increases, which leads to a higher number of hyper-parameters to be estimated and a potential increase of the computational time required to reach convergence. For example, when using the union of two orthonormal wavelet bases with two resolution levels, the number of hyper-parameters to estimate is 28.

[Fig. 4. Ground truth values (dashed line) and posterior distributions (solid line) of the sampled hyper-parameters $\gamma$ and $\beta$ for the subbands $h_{1,2}$ and $h_{2,2}$ in $B_1$ and $B_2$, respectively. $B_1$: $\gamma = 85.5$, $\beta = 1.87$; $B_2$: $\gamma = 24.07$, $\beta = 1.35$.]

C. Application to image denoising

1) Example 1:

In this experiment, we are interested in recovering an image (the Boat image of size $256\times 256$) from its noisy observation affected by a noise $n$ uniformly distributed over the ball $[-\delta,\delta]^{256\times 256}$ with $\delta = 30$. We recall that the observation model for this image denoising problem is given by (4). The noisy image in Fig. 5 (b) was simulated using the available reference image $y_{\mathrm{ref}}$ in Fig. 5 (a) and the noise properties described above.

The union of two 2D separable wavelet bases $B_1$ and $B_2$ using Daubechies and shifted Daubechies filters of length 8 and 4 (as for the validation experiments in Section V-A) was used as a tight frame representation. Denoising was performed using the MMSE estimate, denoted as $\hat{x}$, computed from the sampled wavelet frame coefficients. The adjoint frame operator is then applied to recover the denoised image from its estimated wavelet frame coefficients ($\hat{y} = F^*\hat{x}$). The obtained denoised image is depicted in Fig. 5 (d). For comparison purposes, the denoised image obtained with a variational approach [45, 46] based on a MAP criterion, using the hyper-parameter values estimated with our approach, is shown in Fig. 5 (c). This comparison shows that, for denoising purposes, the proposed method gives better visual quality than the other reported methods.

Signal-to-noise ratio ($\mathrm{SNR} = 20\log_{10}\big(\|y_{\mathrm{ref}}\|/\|y_{\mathrm{ref}} - \hat{y}\|\big)$) and structural similarity (SSIM) [47] values are also given in Table III to quantitatively evaluate the denoising performance. Note that SSIM values lie in $[0,1]$, high values indicating good image quality.
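For completeness, the noise model and the SNR figure of merit used in this experiment take a couple of lines each (illustrative sketch; SSIM is computed as in [47]):

```python
import numpy as np

rng = np.random.default_rng(3)

def snr_db(y_ref, y_hat):
    """SNR = 20 log10(||y_ref|| / ||y_ref - y_hat||), the figure of merit of Table III."""
    return 20.0 * np.log10(np.linalg.norm(y_ref) / np.linalg.norm(y_ref - y_hat))

def add_uniform_noise(y_ref, delta=30.0, rng=rng):
    """Noise uniformly distributed over [-delta, delta]^(H x W), as in this experiment."""
    return y_ref + rng.uniform(-delta, delta, size=y_ref.shape)
```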

An additional comparison with respect to Wiener filtering is given in this table. The SNR and SSIM values are given for three additional test images with different textures and contents to better illustrate the good performance of the proposed approach. The corresponding original, noisy and denoised images are displayed in Figs. 6, 7 and 8.

TABLE III
SNR AND SSIM VALUES FOR THE NOISY AND DENOISED IMAGES.

                        Noisy    Wiener   Variational   MCMC
Boat       SNR (dB)     16.67    18.02    18.41         19.20
           SSIM         0.521    0.553    0.570         0.614
Marseille  SNR (dB)     18.53    19.27    20.55         20.77
           SSIM         0.797    0.802    0.824         0.866
Lenna      SNR (dB)     17.69    19.63    21.79         22.13
           SSIM         0.496    0.583    0.671         0.695
Peppers    SNR (dB)     21.23    21.64    22.40         22.67
           SSIM         0.754    0.781    0.807         0.811

It is worth noticing that the visual quality and the quantitative results show that the denoised image based on the MMSE estimate of the wavelet frame coefficients is better than the one obtained with Wiener filtering or the variational approach. For the latter approach, it must be emphasized that the choice of the hyper-parameters always constitutes a delicate problem, for which our algorithm brings a numerical solution.

2) Example 2:

In this experiment, we are interested in recovering an image (the Straw image of size $128\times 128$) from its noisy observation affected by a noise $n$ uniformly distributed over the centered $\ell_p$ ball of radius $\eta$ with $p \in \{1,2,3\}$. The translation-invariant wavelet transform was used as a frame decomposition with a Symmlet filter of length 8 over 3 resolution levels. The $\ell_p$ norm ($p \in \{1,2,3\}$) was used for $N(\cdot)$ in (4). Figs. 9 (a) and 9 (b) show the original and noisy images using a uniform noise over the $\ell_2$ ball of radius 1600. Figs. 9 (c) and 9 (d) illustrate the denoising strategies based on the variational approach and on the MMSE estimator using frame coefficients sampled with our algorithm.

Table IV reports the SNR and SSIM values for the noisy and denoised images using the proposed MMSE estimator with uniformly distributed noise for different values of $p$ and $\eta$.

TABLE IV
SNR AND SSIM VALUES FOR THE NOISY AND DENOISED IMAGES.

                             Noisy    Wiener   Variational   MCMC
η = 300000, p = 1  SNR (dB)  15.56    16.42    16.67         18.11
                   SSIM      0.719    0.705    0.730         0.755
η = 3000,   p = 2  SNR (dB)  16.46    17.03    17.84         19.02
                   SSIM      0.749    0.720    0.758         0.796
η = 700,    p = 3  SNR (dB)  16.14    17.05    17.65         19.29
                   SSIM      0.734    0.720    0.671         0.771


This second set of image denoising experiments shows that the proposed approach performs well when using different kinds of frame representations and various noise properties.

[Fig. 5. Original Boat image (a), noisy image (b), denoised images using a variational approach (c) and the proposed MMSE estimator (d).]

[Fig. 6. Original Marseille image (a), noisy image (b), denoised images using a variational approach (c) and the proposed MMSE estimator (d).]

[Fig. 7. Original Lenna image (a), noisy image (b), denoised images using a variational approach (c) and the proposed MMSE estimator (d).]

[Fig. 8. Original Peppers image (a), noisy image (b), denoised images using a variational approach (c) and the proposed MMSE estimator (d).]

[Fig. 9. Original image (a), noisy image (b) and denoised images using the variational approach (c) and the proposed MMSE estimator (d).]

VI. CONCLUSION

This paper proposed a hierarchical Bayesian algorithm for frame coefficient estimation from a noisy observation of a signal or image of interest. The signal perturbation was modelled by introducing a bound on a distance between the signal and its observation. A hierarchical model based on this maximum distance property was then defined. This model assumed GG priors for the frame coefficients. Vague priors were assigned to the hyper-parameters associated with the frame coefficient priors. Different sampling strategies were proposed to generate samples distributed according to the joint distribution of the parameters and hyper-parameters of the resulting Bayesian model. The generated samples were finally used for estimation purposes. Our validation experiments showed that the proposed algorithms provide an accurate estimation of the frame coefficients and hyper-parameters. The good quality of the estimates was confirmed on image denoising problems with multivariate noise uniformly distributed on some given ball. Despite its interest for dealing with bounded errors, this model has been little investigated in the wavelet denoising literature.

The hierarchical model studied in this paper assumed GG priors for the frame coefficients. However, the proposed algorithm might be generalized to other classes of prior models. Another direction of research for future work would be to extend the proposed framework to situations where the observed signal is degraded by a linear operator (e.g., a blur operator).

APPENDIX A

SAMPLING ON THE UNIT ℓp BALL

This appendix explains how to sample vectors in the unit $\ell_p$ ball ($p \in\, ]0,+\infty]$) of $\mathbb{R}^L$. First, it is interesting to note that sampling in the unit ball can be easily performed in the particular case $p = +\infty$ by sampling independently along each space coordinate according to a distribution on the interval $[-1,1]$. Thus, this appendix focuses on the more difficult problem associated with a finite value of $p$. In the following, $\|\cdot\|_p$ denotes the $\ell_p$ norm. We recall the following theorem:

Theorem A.1 [48]
Let $A = [A_1,\ldots,A_{L'}]^\top$ be a random vector of i.i.d. components having the following $\mathcal{GG}(p^{1/p},p)$ pdf:
$$f(a_i) = \frac{p^{1-1/p}}{2\Gamma(1/p)}\exp\left(-\frac{|a_i|^p}{p}\right),\qquad a_i \in \mathbb{R}. \tag{28}$$
Let $U = [U_1,\ldots,U_{L'}]^\top = A/\|A\|_p$. Then, the random vector $U$ is uniformly distributed on the surface of the $\ell_p$ unit sphere of $\mathbb{R}^{L'}$ and the joint pdf of $U_1,\ldots,U_{L'-1}$ is
$$f(u_1,\ldots,u_{L'-1}) = \frac{p^{L'-1}\Gamma(L'/p)}{2^{L'-1}(\Gamma(1/p))^{L'}}\left(1 - \sum_{k=1}^{L'-1}|u_k|^p\right)^{(1-p)/p}1_{D_{p,L'}}(u_1,\ldots,u_{L'-1}) \tag{29}$$
where $D_{p,L'} = \{(u_1,\ldots,u_{L'-1}) \in \mathbb{R}^{L'-1} \mid \sum_{k=1}^{L'-1}|u_k|^p < 1\}$.

The uniform distribution on the unit $\ell_p$ sphere of $\mathbb{R}^{L'}$ will be denoted by $\mathcal{U}(L',p)$. The construction of a random vector distributed within the $\ell_p$ ball of $\mathbb{R}^L$ with $L < L'$ can be derived from Theorem A.1 as expressed below:


Theorem A.2 [48]
Let $U = [U_1,\ldots,U_{L'}]^\top \sim \mathcal{U}(L',p)$. For every $L \in \{1,\ldots,L'-1\}$, the pdf of $V = [U_1,\ldots,U_L]^\top$ is given by
$$q_1(u_1,\ldots,u_L) = \frac{p^L\Gamma(L'/p)}{2^L(\Gamma(1/p))^L\Gamma((L'-L)/p)}\left(1 - \sum_{k=1}^{L}|u_k|^p\right)^{(L'-L)/p - 1}1_{D_{p,L+1}}(u_1,\ldots,u_L). \tag{30}$$
In particular, if $p \in \mathbb{N}^*$ and $L' = L + p$, we obtain the uniform distribution on the unit $\ell_p$ ball of $\mathbb{R}^L$. Sampling on an $\ell_p$ ball of radius $\eta > 0$ is straightforwardly deduced by scaling $V$.
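These two theorems translate directly into a few lines of NumPy. The following sketch (our implementation of this recipe, for an integer $p \ge 1$) can serve as the $q_\eta$ proposal required by the samplers of Section IV:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_lp_ball(L, p, radius=1.0, rng=rng):
    """Uniform draw in the lp ball of R^L for an integer p >= 1, following
    Theorems A.1-A.2: normalize L' = L + p i.i.d. GG(p^(1/p), p) variables by
    their lp norm, keep the first L coordinates, then scale by the radius."""
    L_prime = L + p
    # GG(p^(1/p), p) draws: |A_i|^p / p ~ Gamma(1/p, 1), with a random sign.
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=L_prime)
    a = rng.choice([-1.0, 1.0], size=L_prime) * (p * g) ** (1.0 / p)
    u = a / np.linalg.norm(a, ord=p)   # uniform on the unit lp sphere of R^(L')
    return radius * u[:L]              # first L coordinates: uniform in the lp ball

v = sample_lp_ball(L=64, p=1, radius=0.5)
assert np.sum(np.abs(v)) <= 0.5
```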

REFERENCES

[1] H. Kawahara, "Signal reconstruction from modified auditory wavelet transform," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3549–3554, Dec. 1993.
[2] L. Chaari, J.-C. Pesquet, A. Benazza-Benyahia, and Ph. Ciuciu, "Autocalibrated parallel MRI reconstruction in the wavelet domain," in Proc. IEEE Int. Symposium on Biomedical Imaging (ISBI), Paris, France, May 14-17 2008, pp. 756–759.
[3] E. L. Miller, "Efficient computational methods for wavelet domain signal restoration problems," IEEE Trans. Signal Process., vol. 47, no. 2, pp. 1184–1188, Apr. 1999.
[4] C. Chaux, P. L. Combettes, J.-C. Pesquet, and V. Wajs, "A forward-backward algorithm for image restoration with sparse representations," in Proc. Signal Processing with Adaptative Sparse Structured Representations (SPARS), Rennes, France, Nov. 16-18 2005, pp. 49–52.
[5] M. B. Martin and A. E. Bell, "New image compression techniques using multiwavelets and multiwavelet packets," IEEE Trans. Image Process., vol. 10, no. 4, pp. 500–510, Apr. 2001.
[6] M. De Meuleneire, "Algebraic quantization of transform coefficients for embedded audio coding," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP), Las Vegas, USA, Mar. 30 - Apr. 4 2008, pp. 4789–4792.
[7] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, New York, 1999.
[8] R. Coifman, Y. Meyer, and V. Wickerhauser, "Wavelet analysis and signal processing," in Wavelets and their Applications, pp. 153–178, 1992.
[9] S. Mallat, "Geometrical grouplets," Appl. Comput. Harm. Anal., vol. 26, no. 2, pp. 161–180, Mar. 2009.
[10] E. J. Candès and D. L. Donoho, "Recovering edges in ill-posed inverse problems: optimality of curvelet frames," Ann. Stat., vol. 30, no. 3, pp. 784–842, 2002.
[11] F. Destrempes, J.-F. Angers, and M. Mignotte, "Fusion of hidden Markov random field models and its Bayesian estimation," IEEE Trans. Image Process., vol. 15, no. 10, pp. 2920–2935, Oct. 2006.
[12] E. Le Pennec and S. Mallat, "Sparse geometric image representations with bandelets," IEEE Trans. Image Process., vol. 14, pp. 423–438, Apr. 2005.
[13] C. Chaux, L. Duval, and J.-C. Pesquet, "Image analysis using a dual-tree M-band wavelet transform," IEEE Trans. Image Process., vol. 15, no. 8, pp. 2397–2412, Aug. 2006.
[14] M. Seeger, S. Gerwinn, and M. Bethge, "Bayesian inference for sparse generalized linear models," Machine Learning, vol. 4701, pp. 298–309, Sep. 2007.
[15] C. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, New York, 2004.
[16] O. Cappé, "A Bayesian approach for simultaneous segmentation and classification of count data," IEEE Trans. Signal Process., vol. 50, no. 2, pp. 400–410, Feb. 2002.
[17] C. Andrieu, P. M. Djuric, and A. Doucet, "Model selection by MCMC computation," Signal Processing, vol. 81, pp. 19–37, Jan. 2001.
[18] M. Ichir and A. Mohammad-Djafari, "Wavelet domain blind image separation," in Proc. SPIE Technical Conference on Wavelet Applications in Signal and Image Processing X, San Diego, USA, Aug. 4-8 2003.
[19] A. Jalobeanu, L. Blanc-Féraud, and J. Zerubia, "Hyperparameter estimation for satellite image restoration using a MCMC maximum likelihood method," Pattern Recognition, vol. 35, no. 2, pp. 341–352, Nov. 2002.
[20] S. Makni, P. Ciuciu, J. Idier, and J.-B. Poline, "Joint detection-estimation of brain activity in functional MRI: a multichannel deconvolution solution," IEEE Trans. Signal Process., vol. 53, no. 9, pp. 3488–3502, Sep. 2005.
[21] S. Makni, J. Idier, T. Vincent, B. Thirion, G. Dehaene-Lambertz, and Ph. Ciuciu, "A fully Bayesian approach to the parcel-based detection-estimation of brain activity in fMRI," Neuroimage, vol. 41, no. 3, pp. 941–969, Jul. 2008.
[22] N. Dobigeon, A. O. Hero, and J.-Y. Tourneret, "Hierarchical Bayesian sparse image reconstruction with application to MRFM," IEEE Trans. Image Process., vol. 19, no. 9, pp. 2059–2070, Sept. 2009.
[23] T. Blumensath and M. E. Davies, "Monte Carlo methods for adaptive sparse approximations of time-series," IEEE Trans. Signal Process., vol. 55, no. 9, pp. 4474–4486, Sept. 2007.
[24] S. Zeger and R. Karim, "Generalized linear models with random effects: a Gibbs sampling approach," J. American Statist. Assoc., vol. 86, no. 413, pp. 79–86, Mar. 1991.
[25] L. Tierney, "Markov chains for exploring posterior distributions," Ann. Stat., vol. 22, no. 4, pp. 1701–1762, Dec. 1994.
[26] W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, no. 1, pp. 97–109, Apr. 1970.
[27] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 6, pp. 721–741, Nov. 1984.
[28] D. Leporini and J.-C. Pesquet, "Bayesian wavelet denoising: Besov priors and non-Gaussian noises," Signal Process., vol. 81, no. 1, pp. 55–67, Jan. 2001.
[29] P. Mueller and B. Vidakovic, "MCMC methods in wavelet shrinkage: non-equally spaced regression, density and spectral density estimation," Bayesian Inference in Wavelet-Based Models, pp. 187–202, 1999.
[30] G. Huerta, "Multivariate Bayes wavelet shrinkage and applications," J. Applied Stat., vol. 32, no. 5, pp. 529–542, Jul. 2005.
[31] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 613–627, May 1995.
[32] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, no. 7, pp. 674–693, Jul. 1989.
[33] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, "Image coding using wavelet transform," IEEE Trans. Image Process., vol. 1, no. 2, pp. 205–220, Apr. 1992.
[34] R. L. Joshi, H. Jafarkhani, J. H. Kasner, T. R. Fischer, N. Farvardin, M. W. Marcellin, and R. H. Bamberger, "Comparison of different methods of classification in subband coding of images," IEEE Trans. Image Process., vol. 6, no. 11, pp. 1473–1486, Nov. 1997.
[35] E. P. Simoncelli and E. H. Adelson, "Noise removal via Bayesian wavelet coring," in Proc. IEEE Int. Conf. Image Process. (ICIP), Lausanne, Switzerland, Sep. 16-19 1996, pp. 379–382.
[36] P. Moulin and J. Liu, "Analysis of multiresolution image denoising schemes using generalized-Gaussian priors," IEEE Trans. Inf. Theory, vol. 45, pp. 909–919, Apr. 1998.
[37] M. N. Do and M. Vetterli, "Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance," IEEE Trans. Image Process., vol. 11, no. 2, pp. 146–158, Feb. 2002.
[38] M. W. Seeger and H. Nickisch, "Compressed sensing and Bayesian experimental design," in Proc. International Conference on Machine Learning (ICML), Florence, Italy, Sep. 4-8 2008, pp. 912–919.
[39] S. D. Babacan, R. Molina, and A. K. Katsaggelos, "Fast Bayesian compressive sensing using Laplace priors," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Process. (ICASSP), Taipei, Taiwan, Apr. 19-24 2009, pp. 2873–2876.
[40] H. Jeffreys, "An invariant form for the prior probability in estimation problems," Proc. Royal Society of London, vol. 186, no. 1007, pp. 453–461, Sep. 1946.
[41] J. S. Liu, Monte Carlo Strategies in Scientific Computing, Springer, New York, 2001.
[42] N. Dobigeon and J.-Y. Tourneret, "Truncated multivariate Gaussian distribution on a simplex," Tech. Rep., University of Toulouse, France, Jan. 2007, http://www.enseeiht.fr/~dobigeon.
[43] R. Coifman and D. Donoho, "Translation-invariant de-noising," Tech. Rep., Yale University and Stanford University, 1995, http://citeseer.ist.psu.edu/coifman95translationinvariant.html.
[44] A. Gelman and D. B. Rubin, "Inference from iterative simulation using multiple sequences," Statistical Science, vol. 7, no. 4, pp. 457–472, Nov. 1992.
[45] C. Chaux, P. Combettes, J.-C. Pesquet, and V. R. Wajs, "A variational formulation for frame-based inverse problems," Inv. Problems, vol. 23, no. 4, pp. 1495–1518, Aug. 2007.
[46] P. L. Combettes and J.-C. Pesquet, "A proximal decomposition method for solving convex variational inverse problems," Inv. Problems, vol. 24, no. 4, Nov. 2008, 27 p.
[47] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[48] A. K. Gupta and D. Song, "Lp-norm uniform distribution," J. Statist. Plann. Inference, vol. 125, no. 2, pp. 595–601, Feb. 1997.