fundamental limits in multi-image alignment - arxiv · 2016-02-05 · fundamental limits in...

1

Fundamental Limits in Multi-image AlignmentCecilia Aguerrebere, Mauricio Delbracio, Alberto Bartesaghi and Guillermo Sapiro

Abstract—The performance of multi-image alignment, bring-ing different images into one coordinate system, is criticalin many applications with varied signal-to-noise ratio (SNR)conditions. A great amount of effort is being invested intodeveloping methods to solve this problem. Several importantquestions thus arise, including: Which are the fundamental limitsin multi-image alignment performance? Does having access tomore images improve the alignment? Theoretical bounds providea fundamental benchmark to compare methods and can helpestablish whether improvements can be made. In this work, wetackle the problem of finding the performance limits in imageregistration when multiple shifted and noisy observations areavailable. We derive and analyze the Cramer-Rao and Ziv-Zakailower bounds under different statistical models for the underlyingimage. The accuracy of the derived bounds is experimentallyassessed through a comparison to the maximum likelihoodestimator. We show the existence of different behavior zonesdepending on the difficulty level of the problem, given by theSNR conditions of the input images. We find that increasing thenumber of images is only useful below a certain SNR threshold,above which the pairwise MLE estimation proves to be optimal.The analysis we present here brings further insight into thefundamental limitations of the multi-image alignment problem.

Index Terms—Multi-image alignment, performance bounds,Cramer-Rao bound, Ziv-Zakai bound, Bayesian Cramer-Rao,maximum likelihood estimator.

I. INTRODUCTION

Multi-image alignment consists in registering a group ofimages to a common reference. 1 The multi-image alignmentproblem is ubiquitous in many fundamental image process-ing applications such as high dynamic range imaging [1],[2], super resolution [3], [4], [5], burst denoising [6] andburst deblurring [7], [8]. Indeed, this problem is of greatimportance for very different domains, such as biomedicalimaging, astronomy and remote sensing, where due to physicalor biological constraints the photographing system captures aseries of unregistered and often noisy images.

Various methods have been proposed for multi-image align-ment [9], [10], [11], [12], [13], [4] and a great amount ofeffort is being invested to further improve their performance,mostly in applications that deal with very low signal-to-noise ratio (SNR) conditions [14], [15], [16], [17]. Hence, animportant question arises: Which are the fundamental limits onmulti-image alignment performance? Theoretical performancebounds provide a fundamental benchmark to compare different

C. Aguerrebere, M. Delbracio and G. Sapiro are with the Department ofElectrical and Computer Engineering at Duke University. A. Bartesaghi iswith the Laboratory of Cell Biology, Center for Cancer Research, NationalCancer Institute, National Institutes of Health.e-mail: {cecilia.aguerrebere, mauricio.delbracio, guillermo.sapiro}@duke.edu,[email protected]

1We use hereafter the terms image alignment and image registrationinter-changeably.

methods and can help establish whether improvements canbe made. In this work, we tackle the problem of finding theperformance limits in image registration when multiple shiftedand noisy observations are available.

Theoretical statistical performance bounds are of great inter-est and have been used in a wide variety of signal processingproblems. One of the most widely used approaches, probablybecause of its simplicity, is the Cramer-Rao bound (CRB) [18],which establishes a lower bound on the variance of any unbi-ased estimator of the parameter of interest. For instance, CRBshave been previously used to establish performance limitsin pairwise image alignment [19], [20], super-resolution [3],high dynamic range imaging [21] and image denoising [22],among others. Another example is the bound proposed byZiv and Zakai [23] and its extensions [24], [25], [26]. Theyproposed to relate the mean squared error of the estimatorto the probability of error in a binary detection problem,leading in general to tighter bounds than the CRB. Examplesof the application of the Ziv-Zakai bound (ZZB) to practicalproblems can be found in pairwise image alignment [27] andtime delay estimation [28], [29], among others. Both the CRBand the ZZB will be computed here for the problem of multi-image alignment.

Image registration can easily become very complex withthe kind of scene motions we face in real world scenarios. Inthis work, we focus on global translation which, despite beingthe most basic motion model, is of great interest because itis present in almost all applications. The considered motionmodel is thus given by

z(x) = u(x− τττ) + n(x), (1)

where z(x) is the observed image at pixel position x, u is theunderlying image, τττ is the 2D translation vector and n(x) isadditive white Gaussian noise independent of u.

A fundamental aspect that has to be considered when com-puting a performance bound for the image alignment problemunder Model (1), is how to characterize the underlying imageu. Even if the parameter of interest is the shift vector τττ ,assumptions have to be made about u and each assumptionwill lead to different performance bounds. For instance, ucould be considered as deterministic, known or unknown, oras a realization of a known random process.

Various performance bounds have been derived for thepairwise image alignment problem (i.e., registration betweentwo images) assuming a deterministic known underlying im-age. Examples of this are the CRB for translation estimationderived by Robinson and Milanfar [19], the CRB for generalparametric registration introduced by Pham et al. [20] , and theZZB derived by Xu et al. [27] for rigid pairwise registrationincluding translation and rotation.

arX

iv:1

602.

0154

1v1

[cs

.CV

] 4

Feb

201

6

2

Regarding multi-image alignment, a specific case was an-alyzed by Rais et al. [30], who computed the CRB for theregistration of a group of Earth satellite images that wereuniformly translated, i.e., all shifts are multiples of a singleunknown value that needs to be estimated. In [3], Robinsonand Milanfar presented a thorough statistical performanceanalysis on super-resolution, of which multi-image registrationis typically a major component. They studied translationestimation and image reconstruction jointly, thus assuming anunknown underlying image. This work shed light on imagesuper-resolution, giving important insight into which are themain bottlenecks for improving performance. They derivedbounds for the combined problem under two different as-sumptions for u: the CRB assuming an unknown deterministicimage and a Bayesian CRB assuming a Gaussian prior for u.In both cases, and assuming the considered images are aliasingfree, the computed CRB for the multiple shifts estimation wasindependent of the number of available images.

It is interesting to remark that the problem of imagetranslation estimation is closely related to the problem of timedelay estimation of a signal observed at two or more spatiallyseparated receivers [28], [29]. Indeed, our analysis follows andextends the results from [29] to the case where multiple noisyversions of the same flat spectrum signal are observed, eachwith a different shift.

In this work, we derive and analyze various performancebounds for the multi-image alignment problem under twodifferent models for the underlying image u. First, we consideru to be deterministic and unknown. Under this image model,we compute the CRB and a Bayesian CRB assuming ageneralized Gaussian prior for the shifts. Second, assuminga stochastic Gaussian model for the underlying image u, wederive the CRB and the extended Ziv-Zakai bounds (EZZB).

A thorough analysis is conducted, which unveils the sim-ilarities between these seemingly different approaches. Wefind a per-region behavior depending on the difficulty level ofthe problem, given by the SNR conditions. For certain SNRvalues, performance depends on the number of images. Also, itdegrades dramatically below a given threshold, until reachinga region where the SNR is too low to enable alignment.

In order to assess the tightness of the computed bounds,we compare them to the alignment accuracy obtained bythe maximum likelihood estimator (MLE). The MLE, besidesbeing a widely used estimator, is known to be asymptoticallyefficient and also efficient for any number of observations invarious problems [21]. A per-region behavior depending onthe SNR level, similar to the one predicted by the EZZB, isobserved for the MLE as well. We find that all the computedbounds are very tight in very high SNR conditions, wherethe MLE achieves them and is thus efficient. For such highSNR, we find that the alignment performance only dependson the ratio between the energy of the image gradient and thenoise level, and does not depend on the number of availableimages. Hence, for very high SNR, multi-image alignment canbe performed in a pairwise fashion without losing information.

However, this is not the case for low SNR where the per-formance shows a dependence on the number of images, untilreaching a steady state error for extremely low SNR where

zi, u Images defined in continuous domain x = [x, y] ∈ R2

zi,u Digital images sampled on discrete grid mr ×mcux,uy Derivatives of u in direction x and yK Number of unknown translationsτττ i 2D translation vector τττ i = [τix , τiy ]

T of image iτττ Concatenation of K 2D translationszi, u Fourier transform of images zi, uz Concatenation of (K + 1) Fourier transforms ziωωω 2D Fourier spatial frequency ωωω = [ωx, ωy ]T

S(ωωω) Power spectral density of 2D random process uJD,JS Fisher information matricesMSE Mean square errorEMSE Expected mean square errorSNR Signal-to-noise ratio as defined by Eq. (43)CRBD Cramer-Rao bound under deterministic image model Eq. (16)BCRB Bayesian Cramer-Rao bound (with shift prior) Eq. (28)CRBS Cramer-Rao bound under stochastic image model Eq. (40)EZZBw Extended Ziv-Zakai bound (flat spectrum) Eq. (63)

Table I: Summary of notation used in this article.

alignment is not possible. The SNR values delimiting theseregions, which are of particular importance in practice, arealso derived and found to depend on the number of availableimages. Therefore, increasing the number of images is usefulsince, not only it improves the achievable performance, but italso shifts the SNR thresholds making alignment possible fora larger noise level range.

This article is organized as follows. Section II presents thestatistical framework used to state multi-image alignment as aparameter estimation problem. Sections III and IV are devotedto the computation and analysis of the different performancebounds, under the deterministic and stochastic image modelsrespectively. Section V presents an analysis and comparisonof all the computed bounds. The bounds accuracy is assessedin Section VI. Section VII summarizes the conclusions.

II. MULTI-IMAGE REGISTRATION: AN ESTIMATIONPROBLEM

In what follows, we present the image model used through-out the article for the derivation of the different performancebounds. Also, we introduce the performance indicators usedto evaluate the translation estimators. Table I summarizes thenotation used in the article.

A. Image model

Let us consider the image acquisition model:

zi(x) = u(x− τττ i) + ni(x), i = 0, . . . ,K, (2)

where zi(x) is the observed i-th image at pixel positionx = [x, y]T ∈ R2, u(x) is the underlying continuous imagegenerating the noisy shifted observations, τττ i = [τix , τiy ]T ∈R2 is the 2D translation vector of frame i with respect to theunderlying image u (frame zero, τττ0 = 0), and ni(x) is additiveGaussian noise assumed to be independent of u.

In practice, we do not have access to the continuous imagesbut to a finite discretization of them. We will assume thatall the images are band-limited and sampled according to theNyquist sampling theorem. Regarding the finite observationsupport, we will additionally assume that the energy of the

3

signal outside the observed sampling grid is negligible. Thesetwo assumptions guarantee an almost perfect interpolation ofthe continuous images from the digital ones. Thus, under thisideal framework, we are able to compute image derivativesor image shifts (or any other linear operator) directly fromthe discrete samples. Although we will omit the details forsimplicity, all the considered operators could be computed viaFourier interpolation (e.g., using the DFT). Let us assume thatthe digital images are indexed into vectors of size Np = mr×mc pixels, where mr and mc are the number of rows andcolumns respectively.

Let τττ = [τττT1 , . . . , τττTK ]T ∈ R2K be the concatenation of all

2D unknown translations, and z =[zT0 , . . . , zTK ]T ∈ R(K+1)Np

be the concatenation of the (K+1) observed images. The goalin multi-image alignment is then to estimate τττ from z.

B. Performance evaluation

Let us call θθθ the vector of parameters to be estimated, e.g.θθθ = τττ . Given any estimate θθθ(z) of θθθ, its performance can bemeasured through the error correlation matrix,

Rε = Ez|θθθ[εεεεεεT ], (3)

where εεε = θθθ−θθθ is the error with respect to the real parametervalue and Ez|θθθ[·] is the expected value over the observed datadistribution given θθθ. The fundamental limits on the estimationof θθθ can be stated through the family of performance boundswhich consider the parameter as an unknown deterministicquantity and provide a limit on Rε. Examples of this familyare the Cramer-Rao [18], Bhattacharyya [31], Barankin [32],and Abel [33] bounds, among others.

In some cases, prior information is known about θθθ. Thismotivates the development of the Bayesian bounds, whichmodel the parameter as a random variable with a knownprior distribution, and give a limit on the expected errorcorrelation matrix under the joint distribution of the data andthe parameter

Rε = Ez,θθθ[εεεεεεT ]. (4)

Examples of Bayesian bounds are the Bayesian Cramer-Rao [34], the Ziv-Zakai [23], and the Weiss-Weinstein [35]bounds.

A more practical performance indicator is the mean squarederror of the estimated parameters, which corresponds to thetrace of the error correlation matrix. We refer hereafter asmean squared error (MSE) to the trace of Rε and expectedmean squared error (EMSE) to the trace of Rε.

In the following sections, we compute and analyze variantsof two performance bounds for the multi-image alignmentproblem, the Cramer-Rao [18] and the Extended Ziv-Zakailower bounds [26]. The performance analysis is conductedunder two different hypothesis for the unknown underlyingimage: u is a deterministic unknown image (Section III), andu is a realization of a zero mean Gaussian random process withknown covariance matrix (Section IV). Although the Gaussianmodel is over-simplistic [36], it is nonetheless interesting, notonly because of its practicality, but also because it has provento be very powerful for locally modeling natural images inseveral applications [37], [38], [39], [40].

III. PERFORMANCE BOUNDS: DETERMINISTIC IMAGEMODEL

In this section, we assume that u is an unknown deter-ministic digital image. We also assume that the noise in thedigital observations n has a diagonal covariance matrix σ2I.Notice that, even if the goal of multi-image registration isto estimate τττ and not u, the latter is unknown and needsto be accounted for in the analysis. This kind of parameters,whose estimation is not of direct interest but because they arerelated to the analysis have to be accounted for, are commonlyreferred to as nuisance parameters [41]. Hence, the parametervector becomes θθθ = [uT , τττT ]T , where we are only interestedin estimating τττ from the (K + 1) noisy observed images z.

A. Cramer-Rao lower bound: deterministic image model

The performance of any unbiased estimator θθθ(z) of θθθ isbounded by the CRB [18]

Rε ≥ J−1, (5)

where J is the Fisher information matrix (FIM) given by

{J}i,j = −Ez|θθθ

[∂2`(z;θθθ)

∂θi∂θj

], (6)

and `(z;θθθ) = log(p(z;θθθ)) is the logarithm of the likelihoodfunction. The FIM in this case can be expressed as

JD =

[Juu JTuτττJuτττ Jττττττ

], (7)

where the term Juu captures the information provided by theimage only, the term Jττττττ captures the available information ofthe translations and Juτττ represents the information providedby the intercorrelation between u and τττ . Using the blockmatrix inversion principle [42], the inverse of J can beexpressed as

J−1D =

[S−1u J−1

uuJuτττS−1τττ

S−1τ JTuτττJ

−1uu S−1

τττ

], (8)

where Sτττ and Su are the Schur complements of the submatrixregarding τττ and u respectively, namely,

Sτττ = Jττττττ − JTuτττJ−1uuJuτττ , (9)

Su = Juu − JuτττJ−1ττττττ J

Tuτττ . (10)

It can be shown that for multi-image registration, S−1τττ is given

by (see Appendix A)

S−1τττ = σ2[IK + 11T ]⊗Q−1, (11)

where IK is the identity matrix of size K×K, 1 is a vector ofones of size K, ⊗ is the Kronecker product between matrices,

Q =

[uTxux uTxuyuTxuy uTy uy

], (12)

and ux, uy are the derivatives of the latent image u in thehorizontal and vertical directions respectively.

Equation (5) gives a bound on the covariance matrix ofany unbiased estimator of θθθ. Therefore, from (5) and (8), the

4

MSE of the estimated translations is bounded by the trace ofS−1τττ [43],

MSE = 12K

K∑j=1

E[(τjx− τjx)2+ (τjy− τjy )2] (13)

≥ 12K tr(S−1

τ ) (14)

= σ2(uTxux + uTy uy)

(uTxux)(uTy uy)− (uTxuy)2. (15)

Hence, we define the CRB under a deterministic image model(CRBD) as,

CRBDdef= σ2

(uTxux + uTy uy)

(uTxux)(uTy uy)− (uTxuy)2. (16)

According to the CRBD, the registration error is proportionalto the noise level and inversely proportional to the energy ofthe gradient. A similar result is presented by Robinson andMilanfar [3], who derived the CRB for the super-resolutionproblem. Multi-image registration can be seen as a partic-ular case of the super-resolution problem, where the under-sampling operator is equal to the identity matrix.

Performance independence of K. An unexpected result isthat the bound (16) does not depend on the number of imagesK. This means that this fundamental limit of multi-imageregistration performance is the same for a set of 2 or anynumber K of images. Nevertheless, unlike stated in [3, Ap.III], this does not imply that registration can be done pairwisewithout loss of information. The CRB gives a lower bound onperformance but it does not ensure the existence of an efficientestimator that reaches this bound. In practice, depending onthe problem, the CRB may or may not be tight. Hence, theindependence of the CRB of the number of images K, doesnot imply that registration can be done pairwise without lossof information.

As it will be shown experimentally in Section VI, for themulti-image registration problem, the CRBD is tight in highSNR conditions, where we observe indeed that registration canbe done pairwise without loss of accuracy. However, it is notnecessarily tight in low SNR conditions. In that case, there areother bounds, which are dependent on K, that are closer tothe actual performance estimators can achieve.

Case with known underlying image. The bound on (16)corresponds to the translations estimate error when the realimage u is unknown, which is usually the case in practice.In previous works [19], [20], however, the CRB has beencomputed for the pairwise image registration problem whenthe only unknown parameters are the shift values.

In that case, the FIM for the multi-image registrationproblem (7) simplifies to

JDkn = Jττττττ =1

σ2IK ⊗Q, (17)

and the CRB for the case where u is known becomes,

CRBDkndef= 1

2K tr(J−1ττττττ ) (18)

=σ2

2

(uTxux + uTy uy)

((uTxux)(uTy uy)− (uTxuy)2)=

CRBD

2. (19)

Therefore, the MSE bound, assuming the underlying imageu is known, is half that of the case when u is unknown.When u is known, and the first image in the set is assumedto be aligned (i.e., τττ0 = 0), all the other images can bepairwise aligned to the known reference. Indeed, in that case,the different observed images are conditionally independentgiven the known underlying image. Hence, there is no gainin using the rest of the images for estimating the translationof one image. Therefore, the limiting factor in the pairwisealignment is the noise in one image.

On the other hand, when u is unknown, the bound doubles.This may represent a best case scenario where the limiting fac-tor is twice the noise, corresponding to the pairwise alignmentof two noisy images.

B. Bayesian Cramer-Rao with prior on shiftsA natural question that arises after finding that the funda-

mental performance limit given by the CRBD (16) does notdepend on the number of images, is whether this limit can beimproved if some prior information about the shifts is known.Intuitively, having more images could improve the alignmentperformance in the case that u is unknown. Let us imagine thecase of an algorithm that uses an estimation of the latent imageu to estimate the shifts. One could expect that the estimationof u could be improved by having more images, for exampleby reducing the noise, and thus leading to a better estimate ofthe shifts.

Hence, the question is what happens if the motion esti-mation can be improved by including prior knowledge onthe shift vectors. A typical assumption is that the shifts areindependent and drawn from a uniform distribution within alimited range. Nevertheless, in some particular applications(e.g., in microscopy or in burst photography), each shift vectordepends on the previous ones so modeling the motion as arandom walk might be more accurate. In this work, we restrictthe analysis to the case where the shifts are independent.

A Bayesian version of the CRB bound can be computedto include prior information on the unknown parameters. TheBayesian Cramer-Rao bound (BCRB) gives a lower bound onthe expected error correlation matrix under the joint data andparameter distribution

Rε ≥ J−1B , (20)

where JB is the Bayesian Fisher information matrix given by

{JB}i,j = −Ez,θθθ

[∂2`(z, θθθ)

∂θi∂θj

], (21)

and `(z, θθθ) = log(p(z, θθθ)) is the logarithm of the jointlikelihood function.Generalized Gaussian prior on shifts. Let us considera centered generalized Gaussian prior distribution for eachcomponent of τττ . This family of densities, indexed by theparameters c > 0 and δ > 0, is given by

p(τ ; c, δ) =c η(δ, c)

Γ(1/c) exp (−ηc(c, δ)|τ |c), (22)

where

η(δ, c) =1

δ

(Γ(3/c)

Γ(1/c)

)1/2

, (23)

5

SNR (dB)-50 -40 -30 -20 -10 0 10 20 30

RM

SE

10-6

10-4

10-2

100

102

CRBD

CRBDkn

BCRB, K = 10BCRB, K = 1000BCRB, K = 100000

Figure 1: Comparison of the CRBD (both for u known and unknown)and the BCRB for different number of images K with a Gaussianprior (c = 2) with δ = 1 on the shifts.

and Γ denotes the gamma function. The case of c = 2corresponds to the Gaussian density, while the distributionapproaches the uniform density with variance δ2 as c→∞.

Given (K + 1) independent samples following model (2),and assuming that u is an unknown deterministic image andτττ is a random variable following the generalized Gaussianprior (22), the Bayesian FIM is given by (see Appendix B)

JB = JD + Jp, (24)

withJp =

[0 00 1

λ2 I2K

], (25)

where λ2 = δ2Γ2(1/c)c2Γ(3/c)Γ(2−1/c) and JD is the FIM given in (7).

Therefore, including a prior on the translations adds the termJp to the classical FIM given in (7).

Then, the Schur complement of the submatrix regarding τττ ,becomes (see Appendix B)

Sτττ = 1σ2

(I− (K + 1)11T

)⊗Q + 1

λ2 I, (26)

and

S−1τττ = I⊗

(1σ2Q + 1

λ2 I)−1

(27)

+ 11T ⊗ λ2(

(K+2)I + λ2

σ2Q + (K+1)σ2

λ2Q−1)−1

.

The EMSE of the translations under the given prior is thenlower bounded by

EMSE ≥ BCRBdef= 1

2K tr(S−1τττ ). (28)

As a first observation, let us point out that adding priorinformation on the translations makes the bound dependenton the number of images K. Figure 1 shows a comparison ofthe CRB (u known and unknown) and the BCRB for differentnumber of images K with a Gaussian prior (c = 2) with δ = 1.Note that both bounds are very similar for high SNR values.This means that if the SNR is high enough, there is no gainin having prior knowledge about the shifts. However, for lowenough SNR and large enough K, the prior bounds the errorson the shift estimates and the BCRB is below the CRBD andapproaches the bound for a known image CRBDkn until itreaches a steady-state value equal to the variance of the prior.

Notice that the example shown in Figure 1 corresponds toa pretty tight prior (δ = 1), meaning that an accurate interval

for the shifts is known a priori. Remarkably, even under thisseemingly favorable condition, the reduction of the BCRB isobserved only for a very low SNR range and for a very largenumber of images. This suggests a very limited impact of thisshift prior in practice, being useful only for very low SNRconditions, a tight prior of the shifts interval, and a very largenumber of images.

The generalized Gaussian prior approaches the uniformdistribution when c → ∞. Thus, for any fixed δ, λ → 0 asthe shift prior approaches a uniform distribution. Hence, theprior information becomes irrelevant and the performance isbounded by the CRBD, which is independent of K. Of course,this does not mean that having more images does not help forestimating the shifts. As previously mentioned, if the CRBDis overoptimistic and cannot be attained, a tighter bound maystill exist, that does depend on the number of images.

IV. PERFORMANCE BOUNDS: STOCHASTIC IMAGE MODEL

In this section, we consider a zero-mean Gaussian stochasticmodel for the underlying unknown image u. As stated before,our goal is to estimate the K shifts between every pair ofK + 1 images given by (2), or equivalently in the Fourierdomain, from

zi(ωωω) = u(ωωω)e−iωωω·τττ i + ni(ωωω), i = 0, . . . ,K, (29)

where ~ denotes 2D image Fourier transforms, ωωω = [ωx, ωy]T

represents the 2D Fourier spatial frequency and · denotes theinner product operation.

We now assume that the signal samples u are drawnfrom a stationary zero-mean Gaussian process with spectraldensity S(ωωω). The additive noise is modeled by the zero-meanGaussian process ni with spectral density N(ωωω), assumed tobe independent of the underlying signal u.

The observed digital images zi can be converted into theFourier domain zi by applying the 2D DFT. In practice, sincethe input images are real, the complex Fourier coefficientshave Hermitian symmetry, where two of the four quadrantsfully determine zi. Here, we arbitrarily choose to work withthe positive values of ωy and the complete range for ωx (i.e.,first and second quadrants of the 2D DFT). Hence, we willonly consider the complex Fourier coefficients correspondingto frequencies ωωωlx,ly = [ωlx , ωly ]T with ωlx = 2πlx

mc, lx =

−mc2 , . . . ,mc2 and ωly =

2πlymr

, ly = 0, . . . , mr2 . In addition, wewill assume that the Fourier coefficients of u are uncorrelatedat the considered spatial frequencies.

Let l(lx, ly) = 1, . . . ,M , with M = mc + mr2 + 2, index

all the considered 2D frequencies ωωωl. The Fourier transformof the (K + 1) observed images can be arranged into a vector

z = [z0(ωωω1), z1(ωωω1), . . . , zK(ωωω1), . . . ,

z0(ωωωM ), z1(ωωωM ), . . . zK(ωωωM )]T .(30)

Under Gaussian assumptions for the noise and the under-lying image, z follows a complex Gaussian distribution withzero mean and covariance matrix

ΣΣΣ = E[zzH ] =

ΣΣΣτττ (ωωω1) 0 . . . 0

0 ΣΣΣτττ (ωωω2) . . . 0...

.... . .

...0 0 . . . ΣΣΣτττ (ωωωM )

, (31)

6

where each matrix ΣΣΣτττ (ωωω) has size (K + 1)× (K + 1) and iscomposed by

ΣΣΣτττ (ωωω)=

S(ωωω)+N(ωωω) S(ωωω)e−iτττ1·ωωω . . . S(ωωω)e−iτττK ·ωωω

S(ωωω)eiτττ1·ωωω S(ωωω)+N(ωωω) . . .S(ωωω)ei(τττ1−τττK)·ωωω

......

. . ....

S(ωωω)eiτττK ·ωωω S(ωωω)e−i(τττ1−τττK)·ωωω. . . S(ωωω)+N(ωωω)

.(32)

A. Cramer-Rao lower bound: stochastic image model

In order to compute the CRB for the shifts estimation inthe multi-image alignment problem under model (2), we firstcompute the corresponding FIM matrix. For the consideredcomplex Gaussian process z, it is given by [18, Ap. 15C]

{JS}ih,jq = tr(

ΣΣΣ−1 ∂ΣΣΣ

∂τihΣΣΣ−1 ∂ΣΣΣ

∂τjq

), (33)

where i, j = 1, . . . ,K, and h, q ∈ {x, y} index the twocomponents of each 2D shift vector τττ i = [τix , τiy ]T . Carryingout the indicated operations (see Appendix C) we get

JS =[(K + 1)IK − 11T

]⊗B, (34)

withB =

[ρx,x −ρx,y−ρx,y ρy,y

], (35)

and

ρh,q =

M∑l=1

2S2(ωωωl)ωlhωlqN2(ωωωl) + (K + 1)S(ωωωl)N(ωωωl)

. (36)

Hence, we have

J−1S = 1

(K+1) [IK + 11T ]⊗B−1. (37)

The error covariance matrix of any unbiased estimate of theshifts is thus bounded by

Ez|τττ [(τττ − τττ)(τττ − τττ)T ] ≥ J−1S , (38)

and the MSE is lower bounded by the trace of J−1S ,

MSE ≥ 1

2Ktr(J−1

S ) =1

(K + 1)

(ρ2x,x + ρ2

y,y

ρx,xρy,y − ρ2x,y

). (39)

If S(ωωω) and N(ωωω) are rotationally symmetric (i.e., rotationinvariant), it can be shown that ρx,y = 0 and ρx,x = ρy,y . Inthis case, we define the CRB under the Gaussian stochasticimage model (CRBS) as

CRBSdef=

2

(K + 1)ρxx=

2

(K + 1)ρyy. (40)

Notice that, unlike the CRBD (16), the CRBS (40) dependson the number of images K.High SNR. Under high signal-to-noise conditions, the CRBSfor a rotation invariant process (40) simplifies to

CRBSHSNR def=

2σ2(2π)2

Np∫S(ωωω)ω2

x dωωω. (41)

This bound is indeed independent of the number of imagesK and agrees with the deterministic CRB given by (16) (seeAppendix C).

To help further understand the behavior of the computedbound we analyze its behavior both for natural and flatspectrum images.Natural images. A typical natural image presents complexstructure that is difficult to model accurately. One classicalassumption, it that the power spectrum of natural images fallsquadratically with the Fourier frequency. Although simplistic,this is in fact reasonable if we consider that natural imageshave a relative contrast energy that is scale invariant [44]. Letus assume that the considered underlying image follows thislaw, that is,

S(ωωω) =

{Sn‖ωωω‖−2 if max(|ωx|, |ωy|) ≤W/2,0 otherwise,

(42)

where Sn is a known parameter, and W ∈ (0, 2π] modelsthe signal bandwidth. Also, we will assume that the additivenoise spectrum has a constant value N in the frequency band[−W2 ,

W2 ]2 and is zero otherwise.

Let us define the signal-to-noise ratio as the ratio betweenthe energy of the derivative and the noise power

SNRdef=

1

NW 2

∫S(ωωω)‖ωωω‖2 dωωω. (43)

For the case of a natural image (42), the SNR is then

SNRn = Sn/N. (44)

and the CRBS bound for natural images becomes (see Ap-pendix C)

CRBSndef=

8π

Np(K + 1)SNR2n acoth

(1+ 2π(K+1)SNRn

W 2

) . (45)

When SNRn →∞, we have that

CRBSn →16π2

NpW 2SNRn, (46)

which does not depend on K. The breaking point from theasymptotic (very high SNR point) occurs approximately when2π(K+1)SNRn ≈W 2, which happens at,

SNRKn1 =W 2

2π(K + 1). (47)

This implies that, if SNRn � SNRK=1n1 , having access to more

than two images will not improve the bound. This absolutebreaking point happens at approximately SNRn1

def= 1

5W 2

4π .

Flat spectrum images. Another helpful case is to study thebehavior of the CRBS when the underlying signal u has a flatpower spectral density, that is,

S(ωωω) =

{Sw if max(|ωx|, |ωy|) ≤W/2,0 otherwise.

(48)

Similarly, we assume that the additive noise spectrum N(ωωω)has a constant value of N in the frequency band [−W2 ,

W2 ]2

and is zero otherwise.The signal-to-noise ratio (as defined in (43)) for white

signals becomes

SNRw =SwW

2

6N. (49)

7

SNR (dB)-20 -10 0 10 20

10-3

10-2

10-1

100

101

SNR1

K=1K=3K=10K=100K=500K=Inf

(a) CRBSn (Eq. (45))

SNR (dB)-20 -10 0 10 20

10-3

10-2

10-1

100

101

SNR1

K=1K=3K=10K=100K=500K=Inf

(b) CRBSw (Eq. (50)

Figure 2: Comparison of CRBS for different number of input imagesK and varying SNR conditions. (a) natural image model, (b) flatspectrum image. Both image models have very similar behavior.

In this case, the CRB bound for flat spectrum images is (seeAppendix C)

CRBSwdef=

8π2(W 2 + 6SNRw(K + 1)

)3Np(K + 1)SNR2

wW2

=8π2

3Np(K + 1)SNR2w

+16π2

NpW 2SNRw. (50)

In particular, when SNRw →∞ we have

CRBSw →16π2

NpW 2SNRw, (51)

which does not depend on the number of images K. Thebreaking point is when both terms in (50) are approximatelyequal, which happens at

SNRKw1 =W 2

6(K + 1). (52)

Thus, if SNRw � SNRK=1w1 , having access to more than two

images will not improve the bound. This absolute breakingpoint happens at approximately SNRw1

def= 1

5W 2

12 .Because this threshold is very similar to the one obtained for

natural images (see Eq. (47)), for simplicity, we refer hereafterto both SNRn1 and SNRw1 as SNR1.

Figure 2 shows the computed CRBS bounds for both imagemodels, with different number of input images K and varyingSNR levels. Both image models have very similar behavior.There is a very high SNR zone where the bounds dependlinearly with the SNR level (SNR > SNR1). Within this SNRregion, having access to more images does not have an impacton the bound. In moderate to low SNRs (SNR < SNR1) bothCramer-Rao stochastic bounds depend super-linearly with theSNR (i.e., performance degrades faster at low SNR valuesthan in the very high SNR region). Increasing the number ofimages pushes back SNR1, increasing the SNR range whereperformance is linear with SNR. The performance is linearwith image size Np in both cases on the whole SNR domain.

An alternative approach to include an image model is tocompute a Hybrid Cramer-Rao bound (HCRB).2 Similarly towhat was done in Section III-B, one can compute a HCRBincluding the desired model as an image prior and thencompute the expected FIM under this prior. Robinson andMilanfar [3] computed such bound for the super-resolution

2Hybrid in the sense that there are random and deterministic parameters.

problem assuming a Gaussian model for the image similar tothe one presented here. Although related, these two boundsare different. The HCRB gives a bound on the expectedMSE under the given image prior. This can be seen asan average bound for the different likelihoods obtained foreach given possible value of the image. On the other hand,the CRB on Equation (40) gives the bound based on theexpected likelihood under the given image model. Under someregularity conditions, it is possible to show that the CRB isalways tighter than the HCRB [45, Thm. 1]. Nevertheless,in many applications the computation of the HCRB is muchsimpler than the CRB, leading to a reasonable alternative.

B. Extended Ziv-Zakai lower bound

In general, the CRB is known to be tight in high SNRbut overoptimistic in low SNR conditions. Various Bayesianbounds have been derived to obtain tighter and more accuratepredictions of the MSE behavior in the entire SNR range. Oneexample of this is the bound proposed by Ziv and Zakai [23],which relates the expected MSE (EMSE) of the estimator overa given prior, to the probability of error in a binary detectionproblem.

Consider the estimation of a 2K-dimensional random vectorθθθ with a prior distribution pθθθ, based upon an observation vectorz. The extended Ziv-Zakai lower bound (EZZB) on the EMSEof any estimate θθθ of θθθ over pθθθ is given by [26]

aTRεa ≥∫ ∞

0

V{

maxδδδ:aTδδδ=h

[ ∫RK

min(pθθθ(ϕϕϕ), pθθθ(ϕϕϕ+ δδδ))

·Pmin(ϕϕϕ,ϕϕϕ+ δδδ) dϕϕϕ

]}hdh, (53)

where a is any 2K-dimensional vector, V{·} is the valley-filling function,3 and Pmin(δδδ), δδδ ∈ R2K , is the probability oferror in the binary detection problem

H0 : δδδ = ϕϕϕ; z ∼ p(z |θθθ = ϕϕϕ), (54)

H1 : δδδ = ϕϕϕ+ δδδ; z ∼ p(z |θθθ = ϕϕϕ+ δδδ), (55)

with equally likely hypotheses. The vector δδδ = [δδδ1, . . . , δδδK ],with δδδi = [δix , δiy ]T represents a possible 2D shift betweenthe i-th and the first image (indexed in the same way as τττ ).

The Ziv-Zakai bound is based on the probability of correctlychoosing the parameter to be estimated between two possiblevalues: ϕϕϕ or ϕϕϕ + δδδ. The bound is found by integrating theminimum error along all possible estimated values (in generalruled by both δδδ and ϕϕϕ), weighted by their prior probabilityof occurrence, and by bounding the minimum probability oferror in this binary detection problem.

If the probability of error is only a function of the offsetbetween the hypothesis, i.e., Pmin(ϕϕϕ,ϕϕϕ+ δδδ) = Pmin(δδδ), whichis precisely the case in our translation estimation problem, thebound simplifies to

aTRεa ≥∫ ∞

0

V{

maxδδδ:aTδδδ=h

A(δδδ)Pmin(δδδ)

}hdh, (56)

3The valley-filling of a function f(h) is obtained by filling-in any val-leys [26], and is given by V{f}(h) = maxt≥0 f(h+ t).

8

where

A(δδδ) =

∫R2K

min (pθθθ(ϕϕϕ), pθθθ(ϕϕϕ+ δδδ)) dϕϕϕ. (57)

Thus, to compute the EZZB of the shift estimation problemwe need to compute A(δδδ) and the probability of error Pmin(δδδ).

If we assume the shifts θθθ to be uniformly distributed θθθ ∼U [0, D]2K , A(δδδ) takes the simplified form

A(δδδ) =

2K∏i=1

(1− δi

D

). (58)

The probability of error Pmin(δδδ) for the case of multi-imageregistration is given by (see Appendix D)

Pmin(δδδ) ≈ 12 exp {a(δδδ) + b(δδδ)}Φ

(√2b(δδδ)

), (59)

where

a(δδδ)=−M∑l=1

log[1+γ(δδδ,ωωωl)

], b(δδδ)=

M∑l=1

γ(δδδ,ωωωl)

1+γ(δδδ,ωωωl), (60)

γ(ω,δδδ) =S(ωωω)2

((K + 1)2 − T (δδδ,ωωω)

)4 (N(ωωω)2 + (K + 1)N(ωωω)S(ωωω))

, (61)

T (δδδ,ωωω) =∣∣∣1 +

K∑j=1

e−iδδδj ·ωωω∣∣∣2 and Φ(t) = 1√

2π

∫ ∞t

e−t2

2 dt.

(62)Flat spectrum signals. As done for the CRBS case, let usconsider the particular case of flat spectrum signals definedpreviously by Equation (48). The analysis presented here-after closely follows and extends the work by Weinstein andWeiss [29] to the case of multiple signals.

For simplicity, the following analysis is restricted to one-dimensional signals. The extension to two-dimensional signalsis straightforward in the case where the image is assumed tobe drawn from a white random process (full bandwidth flatspectrum, i.e., W = 2π) . In this case, knowing the translationin one direction does not give any additional information tothe estimation of the other one. As a consequence, the 2Dimage can be rearranged into a one-dimensional vector byconcatenating its rows without loss of information regardingthe estimation of the translation along the columns. Followingthis remark, in this section, we will consider one-dimensionalsignals having length Np = mr ×mc and W = 2π.

The EZZB corresponding to the estimation of one singlecomponent is given by (see Appendix E)

EMSE1 ≥ EZZBwdef= 1

c2

∫ √2b

0

h exp{− 9h4

20Np

}Φ(h) dh

+ D2

6 ea+bΦ(

√2b), (63)

where

a =−Np log(√

κ2+1+12

), b =

Np2

√κ2+1−1√κ2+1

, c2 =Npπ

2κ1

12

κ1 =9SNR2

w(K+1)8π4+12π2SNRw(K+1) , κ2 =

9SNR2wK

4π4+6π2SNRw(K+1) . (64)

Analysis of the EZZB: different SNR regions. The EZZBbehaves differently in low and high SNR regimes, as dictatedby the two terms in Eq. (63) and as illustrated in Figure 3(a).

i) High SNR. For SNRw � 1, the error term (63) is mainlydriven by the first term since a+ b→ −∞ and b→ Np

2 .Assuming that Np � 1, we obtain [28]

EZZBwSNRw→∞−−−−−−→ 1

c2

∫ √2b

0

h · exp{− 9h4

20Np

}Φ(h) dh (65)

≈ 1c2

∫ ∞0

h · exp{− 9h4

20Np

}Φ(h) dh =

1

4c2(66)

=8π2 + 12SNRw(K + 1)

3NpSNR2w(K + 1)

= CRBSw. (67)

Indeed, in the high SNR regime, the EZZB approaches theCRBS under the stochastic image model given by (50).ii) Low SNR. On the other hand, in a very low SNR scenario,SNRw � 1, a→ 0, b→ 0 and c→ 0. Thus,

EZZBwSNRw→0−−−−−→ D2

6 ea+bΦ(

√2b) ≈ D2

12 , (68)

which is the variance of the shifts prior.iii) Transition zone. The transition from the low-SNR to thehigh-SNR region starts when the two terms have a similarcontribution to the bound, that is,

1

4c2=D2

6ea+bΦ(

√2b). (69)

We could (arbitrarily) say that the transition is completed whenthe bound reaches half the asymptotic value, i.e.,

ea+bΦ(√

2b) = 14 . (70)

Equations (69) and (70) characterize the limit SNR levelsof the transition zone. Let SNR2 be the SNR level thatsatisfies (69) and SNR3 the SNR level satisfying (70). Withinthis transition region the bound is essentially dominated bythe behavior of Φ(

√2b).

Figure 3 shows how this region changes when varyingthe number of images K, the prior D and the image sizeNp. The threshold SNR2, below which the EMSE decreasessignificantly and worsens exponentially with the SNR, dependson the number of available images K. This is probably themost important consequence of having access to more images.Figure 3(b) shows how this threshold can be pushed backseveral dBs by increasing K, until reaching a limit.

Note that this critical SNR level SNR2 also depends on theimage size Np. As a consequence, increasing the image sizereduces the performance bound and pushes this SNR limit asshown in Figure 3(c).

On the other hand, as illustrated in Figure 3(d), the thresholdSNR2 is not significantly affected by the shift prior parameterD. This means that, for practical D values (e.g., D ≥ 1), hav-ing a tighter shift prior does not push back SNR2 significantly.Nevertheless, as expected, the steady state EMSE predicted byEZZB does decrease with D.

V. COMPARISON OF PERFORMANCE BOUNDS

In this section, we analyze and compare the behavior of thepreviously computed CRB and EZZB bounds. To this effect,it is important to make the distintion that the CRB is a boundon the MSE while the EZZB and the BCRB are bounds onthe EMSE over a given prior for the shifts.

9

SNR (dB)-30 -20 -10 0 10 20

10-3

10-2

10-1

100

101

102

SNR2SNR3

EZZBTerm1Term2

(a)SNR (dB)

-30 -20 -10 0 10 20

10-2

10-1

100

101 K=1

K=3K=10K=100K=500K=Inf

(b)

SNR (dB)-30 -20 -10 0 10 20

10-2

100

102

Np=102

Np=202

Np=502

Np=1002

Np=2002

(c)

SNR (dB)-30 -20 -10 0 10 20

10-2

100

102

D=1D=3D=10D=50D=200

(d)

Figure 3: (a) The breaking point of the EZZB bound and itsdecomposition as the sum of two terms for varying SNR conditionsand K = 10. Term 1 corresponds to Equation (65) and term 2corresponds to Equation (68). (b-d) Comparison of EZZB at varyingSNR conditions for different number of input images K (b), differentimage size Np (c) and different shift prior intervals D (d).

To simplify the discussion, let us consider the case of whitesignals. Figure 4 shows a comparison of the CRB bounds(both for deterministic (CRBD) and stochastic (CRBS) imagemodels), the BCRB (with Gaussian shift prior of varianceλ2 = 1) and the EZZB (with uniform [0, D] shift prior,D = 20) assuming an image of size 50 × 50 pixels. For theCRBD and the BCRB cases, that depend on a deterministicsignal, we used a realization from the white random processused in CRBS and EZZB. Based on the SNR values, thebehavior of the bounds can be characterized into four differentregions i-iv.

i) Very high SNR (SNR ≥ SNR1). In this region, all boundsagree. Hence, the same fundamental limit is predicted for boththe MSE and the EMSE. This limit does not depend on theshift value nor on the width of the prior, within practicallimits for λ and D (D,λ ≥ 1). The performance bound onlydepends on the total image gradient energy and the noise level,and it is linear with the SNR and image size Np. Hence, avery important remark is that, in this SNR region, all boundspredict that having access to more than two images (K > 1)or having a more accurate shift priors (a smaller λ or Dwithin practical limits) will not lead to better performance. Thethreshold defining this region, SNR1, depends on the numberof images K (see Eq. (52)). It does not depend, however, onthe variance of the prior (D,λ) nor the image size (Np).

ii) High SNR (SNR1 ≤ SNR ≤ SNR2). In this region, theCRBS and the EZZB agree, while the CRBD and the BCRBare overoptimistic. The main differences with respect to thevery high SNR region is that the CRBS and the EZZB improvewith increasing number of images K, and their dependenceon the SNR is super-linear. This means that the performancedecreases faster when reducing the SNR than in the very highSNR region. In the limit, when K →∞, the EZZB and CRBS

SNR (dB)-25 -20 -15 -10 -5 0 5 10 15

10-2

10-1

100

101

SNR1SNR2SNR3

EZZBCRBSCRBDBCRB

Figure 4: Comparison of the EZZB, CRBS, CRBD and BCRBbounds for K = 10 and varying SNR conditions.

approach the CRBD (same behavior as in very high SNR). Inthis region, performance is linear with image size Np.

iii) Transition (SNR2 ≤ SNR ≤ SNR3). The EZZB predicts athreshold SNR2 below which the EMSE decreases significantly,and worsens exponentially with the SNR. This critical SNRlevel can be improved by increasing the number of availableimages K (up to some limit, see Figure 3(b)) or the image sizeNp. Nevertheless, the threshold does not depend considerablyon the shift prior D (see Figure 3(d)).

iv) Saturation (SNR ≤ SNR3). The EZZB predicts a criticalSNR below which no alignment is possible, and thus the erroris dominated by the shifts prior (the EMSE is essentially givenby the variance of the prior).

VI. PERFORMANCE BOUNDS TIGHTNESS ASSESSMENT

The bounds derived in sections III and IV set an upperlimit on the best possible performance of any estimator, butthere is no guarantee about the existence of an estimatorreaching that performance. Therefore, by only looking at thebounds, it is hard to draw practical conclusions about the actualachievable alignment performance in practice. Indeed, therecould always exist a tighter bound, with a different behaviorthan the computed ones, that gets closer to best achievableperformance. Hence, assessing the tightness of the derivedbounds to the actual alignment performance becomes criticalto close this gap.

In what follows, we compare the empirical performanceof the maximum likelihood estimator (MLE) to the boundspreviously computed. MLE is perhaps the most widely usedestimator in statistical parameter estimation problems. It isasymptotically efficient [18], and it is also known to beefficient for any number of samples in various problems [21].

A. Maximum Likelihood Estimation

Given (K + 1) independent samples following Model (2),and assuming u is an unknown deterministic image, the MLEof θθθ = [u, τττ ]T is the value that maximizes the log-likelihood,

[u, τττ ]MLE = arg maxu,τττ

− 1

2σ2

K∑i=0

||zi(x)− u(x− τττ i)||2, (71)

where we discarded the terms independent of [u, τττ ].

10

The functional in (71) is an example of a separable non-linear least-square problem. Indeed, given the vector τττ con-taining all the shifts, the unknown underlying image u wouldbe given by the least squares solution

u(x) =1

(K + 1)

K∑i=0

zi(x + τττ i). (72)

That is, given the shift values, the MLE of the unknownimage is the average of the aligned frames. Inserting (72) backinto (71), the functional to be optimized depends on the shiftsonly, that is,

τττMLE = arg minτττ

K∑i=1

||zi(x)− u(x− τττ i)||2, (73)

where u(x) is given by (72). Functional (73) is non-convexand different approaches can be followed to find a localminimum [4]. One such approach consists in alternating twosteps: first compute the average of the frames aligned with thecurrent estimate of the shifts (given by (72)); second align eachimage against the current average by choosing the shift thatmaximizes the Euclidean distance against the average. That is,

u(t+1)(x) =1

(K + 1)

K∑i=0

zi(x + τττ(t)i ), (74)

τττ(t+1)i = arg min

τττ i

||zi − u(t+1)(x− τττ i)||2. (75)

This algorithm requires an initialization either for τττ or u. Onepossibility is to align each input image to a reference image inthe set and take those estimated translations as initial values.The algorithm stops when the shifts reach a steady value.

In our implementation, we used image correlation [46]which can be seen as an approximation of the L2 distancethat should be minimized. We initialize the iterative algorithmby aligning each image to the first one in the set, and takethose translations as initial values. We refer hereafter to thisapproximation of the MLE as MLEavg.

B. Experimental analysis

An experimental analysis is conducted in order to comparethe performance of the MLE to the previously introducedbounds. For this purpose, synthetic data is generated accordingto model (2). Two cases are considered for the underlyingimage u: a natural image (Figure 5) and a flat spectrum image(a realization of a uniformly distributed random variable).The shifts τττ i are uniformly sampled in [−5, 5]2. Differentnoise levels σ2 and number of images K are evaluated. Theperformance was computed by averaging the estimation errorsover 100 tests for each particular configuration (K,σ2).

The squared bias of the MLEavg experiments was on average,in all the conducted experiments, four orders of magnitudesmaller than the estimator variance. Hence, we report the rootmean squared error (RMSE) of the MLEavg, which is dominatedby its variance given that the method is almost unbiased.

Figure 6 shows the results obtained by the MLEavg using asunderlying image the ones shown in Figure 5, for differentnumber of input images (K = 1, 10, 50) and SNR levels.

Figure 5: Natural images used for the experimental analysis inSection VI. From left to right: building (by M. Colom / CC BY),paris, napoli, bolivia. All images are 256× 256 pixels.

SNR (dB)-30 -25 -20 -15 -10 -5 0 5 10

RM

SE

10-3

10-2

10-1

100

101

102

MLE, K = 1MLE, K = 10MLE, K = 50CRBS, K = 1CRBS, K = 50CRBD

SNR (dB)-50 -40 -30 -20 -10 0 10

RM

SE

10-3

10-2

10-1

100

101

102


SNR (dB)-40 -30 -20 -10 0 10

RM

SE

10-3

10-2

10-1

100

101

102


SNR (dB)-40 -30 -20 -10 0 10

RM

SE

10-3

10-2

10-1

100

101

102


Figure 6: Natural images. Comparison of the MLE performance fordifferent number of images K to the CRB under the deterministicimage model (CRBD) and the natural image stochastic model (CRBS)for the four examples shown in Fig. 5: building (top-left), paris(top-right), napoli (bottom-left) and bolivia (bottom-right).

The results are compared to the CRBD (Eq. (16)) and to theCRBS (Eq. (45)) for K = 10. Figure 7 shows the resultsof the MLEavg for K = 1, 10 on a flat spectrum underlyingimage, compared to the CRBS and EZZB. The case K = 1corresponds to pairwise alignment.

For both natural and flat spectrum images, similarly towhat was predicted by the EZZB, we identify four differentregions of behavior of the MLEavg depending on the SNR value(compare figures 6 and 7 to Figure 4).

For very high SNR, all bounds agree and the MLEavg attainsthe limiting performance, which is independent of the numberof images K. Thus, under very high SNR, the alignment canbe performed pairwise without loss of accuracy and MLE isan optimal estimator.

For moderate to high SNR, a different behavior is observedfor flat spectrum and natural images. For flat spectrum images,the MLEavg still attains the limiting performance given by theCRBS and is thus optimal. For natural images, on the contrary,the MLEavg performance is close to the CRBS but it is nottight. A possible reason explaining this behavior is the non-optimality of the MLEavg, for which a critical drawback is thatit does not use image prior information. Moreover, for thisSNR region, the MLEavg performance clearly improves withincreasing number of images. Because the CRBS bound forK = 1 is outperformed when using more images, we canconclude that pairwise registration is not optimal when moreimages are available under moderate to high SNR levels (seefigures 6 and 7).

11

SNR (dB)-10 -5 0 5 10

RM

SE

10-2

10-1

100

101

MLE K = 1MLE K = 10CRBS K = 1EZZB K = 1CRBS K = 10EZZB K = 10

Figure 7: Flat spectrum. Comparison of the MLE performance tothe Cramer-Rao bound (CRBS) and the Ziv-Zakai bound (EZZB) forthe pairwise and K = 10 cases.

Similarly, as predicted by the EZZB for white signals,we observe a transition zone where performance degradesdramatically to finally converge to a flat zone. The flat regioncorresponds to SNR levels that are too low to enable align-ment at all. For flat spectrum images and pairwise alignment(K = 1), the EZZB accurately predicts the SNR threshold thatdefines the beginning of this transition region. However, forthe multi-image case (K = 10) the MLEavg algorithm performsworse than the prediction given by the EZZB (see Figure 7).One reason for this might be that the initialization of the non-convex optimization in MLEavg is performed using the pairwiseregistration to one of the input images, which is certainly notoptimal. Nevertheless, as predicted by EZZB, the breakingpoint of the MLEavg is pushed back several dBs when usingmore images in the registration (see figures 6 and 7).

VII. CONCLUSIONS

In this work, we analyzed the fundamental performancelimits in image registration when multiple shifted and noisyobservations are available. We derived and analyzed Cramer-Rao and extended Ziv-Zakai bounds under different statisticalmodels for both the underlying image and the shift vectors.

The first clear finding is that there is a per-region behaviordepending on the difficulty level of the problem, given by theSNR conditions (see for example figures 4 and 7). At veryhigh SNR, the performance is linear with both, the SNR andthe image size, and it is independent of the prior informationon the shifts and the number of available images. Indeed, allcomputed bounds agree, and the MLE achieves the bounds.Hence, doing pairwise alignment using the MLE gives theoptimal performance.

Assuming a stochastic image model, in high to moderateSNR scenarios, the performance is super-linear with the SNRand linear with the image size. Increasing the number ofimages widens the region where performance is linear withthe SNR (very high SNR), so it improves registration. This istrue for both considered stochastic image models: flat powerspectral density or with quadratic decay. Also, this agrees withthe empirical MLE performance in this SNR range.

According to the computed extended Ziv-Zakai bound, thereexists a critical SNR below which performance degradesdramatically with SNR. Having access to more images or

increasing the image size help to push the SNR levels at whichthis transition zone starts. In very low SNR, the performancesaturates to a value essentially given by the prior variance onthe shifts. In this SNR region no alignment is possible.

In general, having access to more images improves theperformance up to a certain limit. The exception is within thevery high SNR region, where pairwise alignment is optimal.Increasing the image size always improves performance, lin-early reducing the performance bounds and pushing the criticalthresholds delimiting the transition and saturation zones. Thestudied shifts priors only had an impact at low SNR levels.

As future work, we would like to analyze the impactof having more complex shift priors, for instance modelingcorrelation between the acquired frames (e.g., modeled bya random walk). In addition, targeting a particular class ofimages, could help to develop better image priors. This willhave an impact on the moderate to low SNR levels, since theperformance in high SNR is found to be independent of theimage prior. Indeed, MLEavg has proven to be optimal whenregistering white noise signals (for the considered image size),but suboptimal for natural images in moderate SNR conditions.Prior information could help to close the gap between thefundamental limit and the MLE performance.

ACKNOWLEDGMENTS

Work partially supported by the Department of Defense andNSF. The authors would like to thank Jean-Michel Morel forfruitful comments and discussions. This work is in honor ofProf. Moshe Zakai, he will always be remembered as one ofthe greatest.

APPENDIX ACRAMER-RAO BOUND: DETERMINISTIC IMAGE MODEL

Let z be (K + 1) independent samples following (2), andassuming u is an unknown deterministic image, the log-likelihood function of z with θθθ = [uT , τττT ]T is given by

`(z;θθθ) = − 1

2σ2

K∑i=0

||zi(x)− u(x− τττ i)||2, (76)

where we discarded the terms independent of θθθ. To computethe CRB we first compute the FIM

JD = −Ez|θθθ

[∂2`(z;θθθ)

∂θθθ2

]=

[Juu JTuτττJuτττ Jττττττ

]. (77)

Hence,

Juu = −Ez|θθθ

[∂2`(z;θθθ)

∂u2

]=

1

σ2(K + 1)INp , (78)

Jττττττ = −Ez|θθθ

[∂2`(z;θθθ)

∂τττ2

]=

1

σ2IK ⊗Q, (79)

Juτττ = −Ez|θθθ

[∂2`(z;θθθ)

∂u∂τττ

]=

1

σ21⊗ [uTx ,u

Ty ]T , (80)

where INp is the identity matrix of size Np × Np (idem forIK), 1 is a vector of ones of size K, ⊗ is the Kronecker

12

product, ux, uy are the derivatives of the latent image u inthe horizontal and vertical directions respectively, and

Q =

[uTxux uTxuyuTxuy uTy uy

]. (81)

Using the block matrix inversion principle [47], the inverseof JD can be expressed as

J−1D =

[S−1u J−1


S−1τττ JTuτττJ

−1uu S−1

τττ

], (82)

where

Su = Juu − JuτττJ−1ττττττ J

Tuτττ , Sτττ = Jττττττ − JTuτττJ

−1uuJuτττ . (83)

From (78)–(80) and (83) we have,

S−1τττ = σ2

(IK + 11T

)⊗Q−1. (84)

The Cramer-Rao bound in (16) follows from (84).

APPENDIX BBAYESIAN CRAMER-RAO BOUND WITH SHIFTS PRIOR

Let z be (K+1) independent samples following model (2),and assuming that u is an unknown deterministic image andτττ is a random variable following a generalized Gaussian priorp(τττ) given by (22), the joint log-likelihood `(z, θθθ), with θθθ =[uT , τττT ]T is given by

`(z, τττ ;u) = log p(z|τττ ;u) + log p(τττ). (85)

Hence, from (21) the Bayesian FIM becomes

JB=−Ez,τττ |u

[∂2 log p(z|τττ ;u)

∂θθθ2

]− Eτττ

[∂2 log p(τττ)

∂θθθ2

]. (86)

Next, from (77),

Ez,τττ |u

[∂2log p(z|τττ ;u)

∂θθθ2

]= Eτττ

[Ez|u,τττ

[∂2log p(z|τττ ;u)

∂θθθ2

]]= −Eτττ [JD] = −JD. (87)

For the generalized prior (22), it can be shown that [48]

−Eτττ[∂2 log p(τττ)

∂τττ2

]=

1

λ2I2K , (88)

where λ2 def= δ2Γ2(1/c)

c2Γ(3/c)Γ(2−1/c) . Hence, from (86)–(88),

JB = JD + Jp, with Jp =

[0 00 1

λ2 I2K

]. (89)

Using the block matrix inversion principle [42], the inverse ofJB can be expressed as

J−1B =

[S−1u J−1


S−1τττ JTuτττJ

−1uu S−1

τττ

], (90)

with

Su = Juu − Juτττ

(Jττττττ + 1

λ2 I2K

)−1JTuτττ (91)

Sτττ = Jττττττ + 1λ2 I2K − JTuτττJ

−1uuJuτττ . (92)

where Juu, Jττττττ and Juτττ are given by (78)–(80). Hence,

Sτττ = 1σ2

(I− (K + 1)11T

)⊗Q + 1

λ2 I, (93)

and

S−1τττ = I⊗

(1σ2Q + 1

λ2 I)−1

+ 11T⊗λ2(

(K+2)I + λ2

σ2Q + (K+1)σ2

λ2Q−1)−1

.

(94)

The Bayesian Cramer Rao bound in (28) follows from (94).

APPENDIX CCRAMER-RAO BOUND: STOCHASTIC IMAGE MODEL

Let z be given by (30). This random variable follows acomplex Gaussian distribution with zero mean and covariancematrix ΣΣΣ given by (31). The FIM corresponding to thecomplex Gaussian process z is given by [18, Ap. 15C]

{JS}ih,jq =

M∑l=1

tr(

ΣΣΣ−1τττ (ωωωl)

∂ΣΣΣτττ (ωωωl)

∂τihΣΣΣ−1τττ (ωωωl)

∂ΣΣΣτττ (ωωωl)

∂τjq

),

(95)where i, j = 1, . . . ,K, and h, q ∈ {x, y} index the twocomponents of each 2D shift vector τττ i. The spatial frequencyωωωl, l(lx, ly) = 1, . . . ,M with M = mc + mr

2 + 2 indexesthe 2D frequencies ωωωl = [ωlx , ωly ]T with ωlx = 2πlx

mc, lx =

−mc2 , . . . ,mc2 and ωly =

2πlymr

, ly = 0, . . . , mr2 . To simplifynotation, we avoid in the following the subindex l on ωωω. Thematrix ΣΣΣτττ (ωωω) can be decomposed as

ΣΣΣτττ (ωωω) = S(ωωω)Pτττ (ωωω)Pτττ (ωωω)H +N(ωωω)IK+1, (96)

withPτττ (ωωω) = [1, eiωωω·τττ1 , eiωωω·τττ2 , . . . , eiωωω·τττK ]T . (97)

Using the Sherman-Morrison formula [42],

ΣΣΣ−1τττ (ωωω) = N−1(ωωω)

(IK+1 + α(ωωω)Pτττ (ωωω)Pτττ (ωωω)H

), (98)

where

α(ωωω) = − S(ωωω)

N(ωωω) + (K + 1)S(ωωω). (99)

To simplify notation we avoid in the following the dependenceon ωωω. Hence we have

ΣΣΣ−1τττ

∂ΣΣΣτττ∂τih

= N−1

(∂ΣΣΣτττ∂τih

+ αPτττPHτττ


), (100)

and

tr(

ΣΣΣ−1τττ


ΣΣΣ−1τττ

∂ΣΣΣτττ∂τjq

)=

N−1

[tr(∂ΣΣΣτττ∂τih


)+ 2αtr

(∂ΣΣΣτττ∂τih

PτττPHτττ


)+ α2tr

(PτττP

Hτττ


PτττPHτττ


)]. (101)

Substituting (101) in (95) and computing the derivatives,

JS =[(K + 1)IK − 11T

]⊗B, with B =

[ρx,x −ρx,y−ρx,y ρy,y

],

(102)

ρh,q =

M∑l=1

2S2(ωωωl)ωlhωlqN2(ωωωl) + (K + 1)S(ωωωl)N(ωωωl)

. (103)

13

Hence, we have

J−1S = 1

(K+1)

(IK + 11T

)⊗B−1. (104)

Then, the error on the shifts estimates is bounded by

MSE ≥ 1

2Ktr(J−1

S ) =1

(K + 1)

(ρ2x,x + ρ2

y,y

ρx,xρy,y − ρ2x,y

). (105)

Notice that if S(ωωω) and N(ωωω) are rotational symmetric (ro-tation invariant), we have ρx,y = 0 and ρx,x = ρy,y. In thatcase, the CRB bound on the MSE (105) becomes

CRBSdef=

2

(K + 1)ρxx=

2

(K + 1)ρyy. (106)

High SNR performance. When the signal-to-noise ratio isvery high, i.e., S(ωωω)/N(ωωω)� 1, we have that

ρx,xS/N→∞−−−−−−→ ρHSNR

x,x =

M∑l=1

2S(ωωωl)ω2x

(K + 1)N(ωωω). (107)

Let us assume the noise is white with N(ωωω) = σ2. Since,M∑l=1

S(ωωωl)ω2lx ≈

Np(2π)2

∫ π

0

∫ π

−πS(ωωω)ω2

x dωx dωy, (108)

=1

2

Np(2π)2

∫ π

−π

∫ π

−πS(ωωω)ω2

x dωx dωy. (109)

Then, the CRBS for a rotation invariant process (Eq. (106))under high SNR simplifies to

CRBSHSNR def=

2σ2(2π)2

Np∫S(ωωω)ω2

x dωωω. (110)

This bound is independent of the number of images K. As weshow at follows, it agrees with the deterministic CRB.

Let u be a deterministic image with Np � 1 pixels, that weassume rotation invariant for simplicity. We can approximatethe power spectral density by its empirical power spectrumSd(ωωω). Then,

1

NpuTxux =

1

(2π)2

∫Sd(ωωω)ω2

x dωωω. (111)

For a rotation invariant image u, we have that

uTxuy =1

(2π)2

∫u(ωx, ωy)ωxωy dωx dωx = 0.

Next, we have that the deterministic CRB in (16), for arotation invariant signal, can be rewritten as

CRBD =2σ2

uTxux=

2σ2(2π)2

Np∫Sd(ωωω)ω2

x dωωω. (112)

That is, the stochastic CRB bound in high SNR agrees withthe deterministic CRB.

Flat Spectrum signals. Let us consider the particular case offlat spectrum signals, that is,

S(ωωω) =

{Sw if max(|ωx|, |ωy|) ≤W/2,0 otherwise.

(113)

We will also assume that the additive noise spectrum is flatin the same frequency band [−W2 ,

W2 ]2 and zero otherwise.

In this case, if we assume M � 1, we can consider the sumin (103) for ρxx, as a Riemann approximation, that is,

ρxx ≈2Np

(2π)2

∫ W2

0

∫ W2

0

2S2ω2lx

N2+(K+1)SNdωωω (114)

=S2W 4Np

3π224(N2 + (K + 1)SN). (115)

Thus, rewriting (115) in terms of the SNR as defined in (49),we obtain that the CRB (Eq. (106)) for white images is:

CRBSwdef=

8π2(W 2 + 6SNRw(K + 1)

)Np(K + 1)SNR2

w3W 2. (116)

Natural images. One classical assumption when modelingnatural images is that the power spectrum falls quadraticallywith the Fourier frequency. Let us assume that the consideredunderlying image follows this law, that is,

S(ωωω) =

{Sn‖ωωω‖−2 if max(|ωx|, |ωy|) ≤W/2,0 otherwise.

(117)

Similarly as for the white signals, let us assume that theadditive noise spectrum is flat in the same frequency band[−W2 ,

W2 ]2 taking value N and zero otherwise. We can ap-

proximate the sum in ρxx (Eq. (103)) by,

ρxx ≈2Np

(2π)2

∫ W2

0

∫ W2

0

2(SnN )2ω2x

(ω2x+ω2

y)2+ SnN (K+1)(ω2

x+ω2y)

dωωω.

(118)

Due to symmetry,∫ W2

0

∫ W2

0

2S2nω

2x

N2(ω2x+ω2

y)2+(K+1)SnN(ω2x+ω2

y)dωωω (119)

=

∫ W2

0

∫ W2

0

S2n(ω2

x + ω2y)

N2(ω2x+ω2

y)2+(K+1)SnN(ω2x+ω2

y)dωωω (120)

≈ π

2

∫ W√π

0

S2nr

N2r2 + (K + 1)SnNdr (121)

= π2S

2n acoth

(1 + 2π(K+1)Sn

W 2N

). (122)

The approximation in (120) is done by changing the area ofintegration from [0, W2 ]2 to the quarter of circle [0, π2 ] of radiusW√π

. This is the maximum overlapping circular region thatcovers the same area as the original one. Thus, under theconsidered natural image model, the CRB in (106) can beapproximated by

CRBSndef=

8π

Np(K + 1)SNR2n acoth

(1+ 2π(K+1)SNRn

W 2

) , (123)

where SNRn is defined in (44).

APPENDIX DEXTENDED ZIV-ZAKAI BOUND: PROBABILITY OF ERROR

The computation of the Extended Ziv-Zakai bound requirescomputing the probability of error P el

min(ϕϕϕ,ϕϕϕ + δδδ), for the

14

equally likely hypothesis case. This probability can be tightlyapproximated [41, Eq. (2.243)] by

P elmin(ϕϕϕ,ϕϕϕ+ δδδ) ≈ (124)12

exp{µ(sm) +

s2m2µ′′(sm)

}Φ(sm√µ′′(sm)

)+

12

exp{µ(sm) + (sm−1)2

2µ′′(sm)

}Φ(

(1−sm)√µ′′(sm)

)where

µ(s) = log

∫[p(z |ϕϕϕ)]

s[p(z |ϕϕϕ+ δδδ)]

1−sdz, (125)

sm verifies µ′(sm) = 0 and Φ(t) = 1√2π

∫∞te−

t2

2 dt.Assuming that the Fourier coefficients at different frequen-

cies are statistically uncorrelated this becomes

µ(s) = −M∑l=1

{s log

∣∣ΣΣΣτττ+δδδ(ωωωl)∣∣+ (1− s) log

∣∣ΣΣΣτττ (ωωωl)∣∣

+ log(∣∣sΣΣΣ−1

τττ+δδδ(ωωωl) + (1− s)ΣΣΣ−1τττ (ωωωl)

∣∣) }, (126)

where ΣΣΣτττ (ωωω) is defined in (96). By doing some algebramanipulations one can see that∣∣ΣΣΣτττ (ωωω)

∣∣= ∣∣ΣΣΣτττ+δδδ(ωωω)∣∣= N(ωωω)K (N(ωωω) + (K+1)S(ωωω)) . (127)

Next, from (98) we can rewrite the determinant in the secondterm of (126) as,

|sΣΣΣ−1τττ+δδδ(ωωωl) + (1− s)ΣΣΣ−1

τττ (ωωωl)∣∣

= N(ωωω)−(K+1)(

1 + sα(ωωω)(K + 1))

·(

1 + (1− s)α(ωωω)((K + 1)− β(ωωω)T (δδδ,ωωω)

)), (128)

where

β(ωωω) =sα(ωωω)

1+(K+1)sα(ωωω), T (δδδ,ωωω) =

∣∣∣1 +

K∑j=1

e−iδδδj ·ωωω

∣∣∣2. (129)

Thus substituting (127) and (128) in (126) one obtains,

µ(s) = −M∑l=1

log[1 + 4s(1− s)γ(δδδ,ωωωl)

], (130)

where

γ(δδδ,ωωωl) =S(ωωωl)

2(

(K + 1)2 − T (δδδ,ωωωl))

4N(ωωωl)2 + (K + 1)N(ωωωl)S(ωωωl). (131)

Thus,

µ′(s) =

M∑l=1

4γ(δδδ,ωωωl)(2s− 1)

1 + 4s(1− s)γ(δδδ,ωωωl)(132)

and the point such that µ′(sm) = 0 is sm = 1/2. Then

µ( 12) =−

M∑l=1

log (1+γ(δδδ,ωωωl)) , µ′′( 1

2) =

M∑l=1

8γ(δδδ,ωωωl)

1 + γ(δδδ,ωωωl).

(133)

Finally, the probability of error can be approximated by

P elmin(τττ , τττ + δδδ) ≈ 1

2 exp {a(δδδ) + b(δδδ)}Φ(√

2b(δδδ)), (134)

where

a(δδδ)=−M∑l=1

log (1+γ(δδδ,ωωωl)) , b(δδδ)=

M∑l=1

γ(δδδ,ωωωl)

1+γ(δδδ,ωωωl), (135)

0

2

4

δ2

6

8

1002

h

46

810

-1000

-2000

-3000

0

-4000

A(δ

)Pe(δ

)

h

0 2 4 6 8 10

A(δ

)Pe(δ

)

10-150

10-100

10-50

100

δ2 = 0

δ2 = hδ2 = h/2

Figure A.1: Visualization of an example function g(h, δ2) givenby (141) (left), and different cuts of g(h, δ2) at δ2 = 0, h, h/2(right). For low h values, the maximum of g happens at (h, h/2),while for large values it happens at (h, 0).

APPENDIX EEXTENDED ZIV-ZAKAI BOUND FOR WHITE SIGNALS

To simplify the analysis we consider one-dimensional ran-dom signals with a constant power spectral density S(ω) in[−W2 ,

W2 ] and zero otherwise, and zero-mean Gaussian noise

with flat spectrum N(ω) = N in the same frequency band.We closely follow the deduction in [29].

First, assuming Np � 1, let us approximate a(δδδ) and b(δδδ)in (135) for the particular case of white signals,

a(δδδ) = −Np2π

∫ W/2

0

log[1 + γ(δδδ, ω)

]dω, (136)

b(δδδ) =Np2π

∫ W/2

0

γ(δδδ, ω)

1 + γ(δδδ, ω)dω, (137)

and

γ(δδδ, ω) =S2((K + 1)2 − T (δδδ, ω)

)4 (N2 + (K + 1)NS)

. (138)

To evaluate the Extended Ziv-Zakai bound in estimatingone single component of τττ = [τ1, τ2, . . . , τM ], we can choosewithout loss of generality a = [1, 0, . . . , 0] in (56). Thus,evaluation of the EZZB requires computing

ε21 ≥∫ ∞

0

PA(h) dh, (139)

where we have defined

PA(h) = V{

maxδδδ:δ1=h

A(δδδ)P elmin(δδδ)

}. (140)

Note that to solve (139) one needs to maximize

g(δδδ) = A(δδδ)P elmin(δδδ) (141)

with respect to [δ2, . . . , δK ] for each value of δ1 = h. Alossy (in general) lower bound can be obtained by setting theunspecified components of δδδ to zero. Due to the symmetry ofthe problem it is clear that the maximum should be attainedat δδδ such that δ2 = δ3 = . . . = δK . Thus, to simplify theexposition, let us do an abuse of notation and refer to δδδ as thecouple [δ1, δ2], and omit δ3, . . . , δK assuming that they are allequal to δ2.

An example of the function g(δ1, δ2) is shown in Figure A.1.From Figure A.1 it is clear that there are roughly two differentbehaviors of g: one in the vicinity of h ≈ 0 where themaximum of g(h, δ2) is obtained at δ2 = h/2, while whenh � 0 the maximum is obtained at δ2 = 0. In what follows,

15

we approximate the probability of error in these two differentscenarios in order to reach a simplified version of the EZZB.Small values of h. Let us note that

a(δδδ) + b(δδδ) ≥ −1

2

Np2π

∫ W/2

0

γ(ω,δδδ)2 dω, (142)

which is a direct consequence of log(1 + x)− x1+x ≤

x2

2 , forx ≥ 0. Also, since x/(1 + x) ≤ x, for x ≥ 0, we have,

c(δδδ)def= 2b(δδδ) ≤ Np

π

∫ W/2

0

γ(ω,δδδ) dω. (143)

Thus, for the particular case δδδ = [h, h/2], we have

(K+1)2−T (h, h2 ) =8(K−1) sin2(ωh4 )+ 4 sin2(ωh2 ) (144)

≤ (K+1)2 h2ω2. (145)

The last inequality is due the fact that sin2(x) ≤ x2. Thisleads to an upper bound on γ, Equation (138),

γ(h, h2 , ω) ≥ κ1ω2h2

4 , with κ1 = S2(K+1)2N2+2(K+1)NS . (146)

Thus, the probability of error can be approximately lowerbounded by

P elmin(h,

h2

) ≈ ea(h,h2)+b(h,

h2)Φ(c(h, h

2)) ≥ e−d

4h4

Φ(c h) (147)

whered4 =

W 5Npκ21

π·5·211 and c2 =W 3Npκ1

π·3·25 .

Note that this lower bound on the probability of error is validfor the whole domain of h and thus can be used to obtain alower bound on the performance of any estimator of τττ . Forthe case δδδ = [h, h2 ] the function A(δδδ) given by (58) takes theform,

A(h, h2 ) = (1− hD )(1− h

2D )K−1. (148)

Next, we have the following lower bound

PA(h) ≥ A(h, h2 )P elmin(h, h2 ) ≥ f1(h), (149)

where

f1(h) = (1− hD )(1− h

2D )K−1e−d4h4

Φ(ch). (150)

When K is very large, the factor (1 − h2D )K−1 in A(h, h2 )

significantly attenuates the probability of error. In this par-ticular case, δδδ = [h, 0] may lead to a tighter lower bound.Nevertheless, we will not consider this scenario.Large values of h. For large values of h, we will boundthe performance with its value at δδδ = [h, 0]. According toFigure A.1, this is the tightest path to maximize g in the regionh� 0. In this case, the function

g(h, 0) = (1− h/D)P elmin(h, 0), (151)

oscillates outside the vicinity of 0 as illustrated in Figure A.1.If h0, h1, . . . , hL are the local maxima of P el

min(h, 0), then thefunction V {g(h, 0)}, is tightly lower bounded by

V {g(h, 0)} ≤ (1− hjD

)P elmin(hj , 0) for hj−1 ≤ h ≤ hj . (152)

Moreover, since the function V {g(h, 0)} is non-increasing(by definition), this is true for any given set of {hn}. Forsimplicity, let us chose hj = 2π

W j, for j = 0, 1, . . . .

Thus, doing similar algebraic operations as before but atδδδ = [h, 0] we obtain

(K + 1)2 − T (h, 0) = 4K sin2(ωh/2) (153)

and substituting (153) in (138), we obtain

γ(h, 0, ω)= κ2 sin2(ωh/2), withκ2 = S2KN2+(K+1)NS . (154)

In this case, the probability of error P elmin(hj , 0) can be

tightly approximated by (134),

P elmin(hj , 0) ≈ ea(hj ,0)+b(hj ,0)Φ(c(hj , 0)), (155)

where a(hj , 0) and b(hj , 0) are obtained by evaluating (136),(137) and (154) at h = hj , respectively, obtaining

a(hj , 0) = −WNp2π log

(√κ2+1+1

2

)def= a, (156)

and similarly

b(hj , 0) =WNp

4π

√κ2+1−1√κ2+1

def= b. (157)

Note that both equations become independent of j. Next itfollows that P el

min(hj , 0) ≈ ea+bΦ(√

2b). Thus,

V {g(h, 0)} ≥ (1− hjD

)ea+bΦ(√

2b) for hj− 2πW≤ h ≤ hj . (158)

Since 1− hjD ≥ 1− 2π

DW for hj−2πW ≤ h ≤ hj , it follows that

PA(h) ≥ V {g(h, 0)} ≥ f2(h), (159)

where

f2(h) = max(1− 2πDW −

hD , 0)ea+bφ(

√2b). (160)

Final Bound. To get the final bound we merge the twoprevious lower-bounds, (149) and (159), into a single lowerbound,

PA(h) ≥ max (f1(h), f2(h)) . (161)

To further simplify the lower bound we can split the domainof integration of h, that is [0, D], and make each one valid in aregion. Let h∗ =

√2b/c. This point is close to the intersection

of f1(h) and f2(h). Thus,

PA(h) ≥

{f1(h) if 0 ≤ h < h∗,

f2(h) if h∗ ≤ h.(162)

Substituting (162) in (139) one obtains a lower bound onthe mean square error,

ε21 ≥∫ h∗

0

hf1(h) dh+

∫ D

h∗hf2(h) dh. (163)

The first term in (163) can be lower bounded by∫ h∗

0

hf1(h) dh ≥ (1− h∗

D)(1− h∗

2D)K−1

∫ h∗

0

he−d4h4

Φ(c h) dh.

(164)

When K is not very large, (1− h∗

D )(1− h∗

2D )K−1 ≈ 1. Then,∫ h∗

0

he−d4h4

Φ(c h) dh =1

c2

∫ √2b

0

he− 9πh4

10WNp Φ(h) dh. (165)

16

The second term in (163) can be (approx.) lower bounded by∫ D

h∗hf2(h) dh =

∫ D

h∗max(1− 2π

WD− hD, 0)ea+bφ(

√2b)h dh

≥ ea+bφ(√

2b)

∫ D−2π/W

4√3/W

max(1− 2πWD− h

D, 0)hdh

≈ D2

6ea+bφ(

√2b). (166)

From (163), (165) and (166), we get the EZZB bound in (63).

REFERENCES

[1] P. E. Debevec and J. Malik, “Recovering high dynamic range radiancemaps from photographs,” in Proc. An. Conf. Comp. Grap. Inter. Tech.(SIGGRAPH), 1997, pp. 369–378.

[2] C. Aguerrebere, J. Delon, Y. Gousseau, and P. Muse, “SimultaneousHDR image reconstruction and denoising for dynamic scenes,” in Proc.IEEE Int. Conf. Comput. Photogr. (ICCP), 2013, pp. 1–11.

[3] D. Robinson and P. Milanfar, “Statistical performance analysis of super-resolution,” IEEE Trans. Image Process., vol. 15, no. 6, pp. 1413–1428,2006.

[4] D. Robinson, S. Farsiu, and P. Milanfar, “Optimal registration of aliasedimages using variable projection with applications to super-resolution,”Comput. J., vol. 52, no. 1, pp. 31–42, 2009.

[5] C. Liu and D. Sun, “On Bayesian adaptive video super resolution,” IEEETrans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 346–360, 2014.

[6] T. Buades, Y. Lou, J.-M. Morel, and Z. Tang, “A note on multi-imagedenoising,” in In Proc. of Local and Non-Local Approx. in Image Proc.(LNLA), 2009.

[7] H. Zhang, D. Wipf, and Y. Zhang, “Multi-image blind deblurring using acoupled adaptive sparse prior,” in Proc. IEEE Conf. Comput. Vis. PatternRecog. (CVPR), 2013, pp. 1051–1058.

[8] M. Delbracio and G. Sapiro, “Burst Deblurring: Removing camera shakethrough Fourier burst accumulation,” in Proc. IEEE Conf. Comput. Vis.Pattern Recog. (CVPR), 2015, pp. 2385–2393.

[9] R. C. Hardie, K. J. Barnard, and E. E. Armstrong, “Joint map registrationand high-resolution image estimation using a sequence of undersampledimages,” IEEE Trans. Image Process., vol. 6, no. 12, pp. 1621–1633,1997.

[10] H. S. Sawhney, S. Hsu, and R. Kumar, “Robust video mosaicing throughtopology inference and local to global alignment,” in Proc. Europ. Conf.Comput. Vis. (ECCV), 1998, pp. 103–119.

[11] V. M. Govindu, “Lie-algebraic averaging for globally consistent motionestimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR),2004, pp. 684–691.

[12] S. Farsiu, M. Elad, and P. Milanfar, “Constrained, globally optimal,multi-frame motion estimation,” in Proc. Workshop Stat. Signal Process.IEEE, 2005, pp. 1396–1401.

[13] N. A. Woods, N. P. Galatsanos, and A. K. Katsaggelos, “Stochasticmethods for joint registration, restoration, and interpolation of multipleundersampled images,” IEEE Trans. Image Process., vol. 15, no. 1, pp.201–213, 2006.

[14] X. Li, P. Mooney, S. Zheng, C. R. Booth, M. B. Braunfeld, S. Gubbens,D. A. Agard, and Y. Cheng, “Electron counting and beam-inducedmotion correction enable near-atomic-resolution single-particle cryo-EM,” Nat. Methods, vol. 10, no. 6, pp. 584–590, 2013.

[15] A. Bartesaghi, D. Matthies, S. Banerjee, A. Merk, and S. Subramaniam,“Structure of β-galactosidase at 3.2-A resolution obtained by cryo-electron microscopy,” Proc. Natl. Acad. Sci. U.S.A., vol. 111, no. 32,pp. 11 709–11 714, 2014.

[16] T. Grant and N. Grigorieff, “Measuring the optimal exposure for singleparticle cryo-em using a 2.6A reconstruction of rotavirus VP6,” eLife,2015.

[17] J. L. Rubinstein and M. A. Brubaker, “Alignment of cryo-EM moviesof individual particles by optimization of image translations,” J. Struct.Biol., vol. 192, no. 2, pp. 188–195, 2015.

[18] S. M. Kay, Fundamentals of statistical signal processing. [Volume I]. ,Estimation theory. Prentice Hall, 1993.

[19] D. Robinson and P. Milanfar, “Fundamental performance limits in imageregistration,” IEEE Trans. Image Process., vol. 13, no. 9, pp. 1185–1199,2004.

[20] T. Q. Pham, M. Bezuijen, L. J. van Vliet, K. Schutte, and C. L.Luengo Hendriks, “Performance of optimal registration estimators,” inProc. SPIE Vis. Inf. Proces., vol. 5817, 2005, pp. 133–144.

[21] C. Aguerrebere, J. Delon, Y. Gousseau, and P. Muse, “Best algorithmsfor HDR image generation. a study of performance bounds,” SIAM J.Imag. Sci., vol. 7, no. 1, pp. 1–34, 2014.

[22] P. Chatterjee and P. Milanfar, “Is denoising dead?” IEEE Trans. ImageProcess., vol. 19, no. 4, pp. 895–911, 2010.

[23] J. Ziv and M. Zakai, “Some lower bounds on signal parameter estima-tion,” IEEE Trans. Inf. Theory, vol. 15, no. 3, pp. 386–391, 1969.

[24] L. P. Seidman, “Performance limitations and error calculations forparameter estimation,” Proc. IEEE, vol. 58, no. 5, pp. 644–652, 1970.

[25] D. Chazan, M. Zakai, and J. Ziv, “Improved lower bounds on signalparameter estimation,” IEEE Trans. Inf. Theory, vol. 21, no. 1, pp. 90–93, 1975.

[26] K. Bell, Y. Steinberg, Y. Ephraim, and H. Van Trees, “Extended Ziv-Zakai lower bound for vector parameter estimation,” IEEE Trans. Inf.Theory, vol. 43, no. 2, pp. 624–637, 1997.

[27] M. Xu, H. Chen, and P. Varshney, “Ziv-Zakai bounds on image regis-tration,” IEEE Trans. Signal Process., vol. 57, no. 5, pp. 1745–1755,2009.

[28] A. J. Weiss and E. Weinstein, “Fundamental limitations in passive timedelay estimation–part I: Narrow-band systems,” IEEE Trans. Acoust.,Speech, Signal Process., vol. 31, no. 2, pp. 472–486, 1983.

[29] E. Weinstein and A. J. Weiss, “Fundamental limitations in passive time-delay estimation–part II: Wide-band systems,” IEEE Trans. Acoust.,Speech, Signal Process., vol. 32, no. 5, pp. 1064–1078, 1984.

[30] M. Rais, C. Thiebaut, J.-M. Delvit, and J.-M. Morel, “A tight multiframeregistration problem with application to earth observation satellite de-sign,” in Proc. IEEE Int. Conf. Imag. Sys. and Tech. (IST), Oct 2014,pp. 6–10.

[31] A. Bhattacharyya, “On some analogues of the amount of informationand their use in statistical estimation,” Sankhya: Indian J. Stat., pp. 1–14,1946.

[32] E. Barankin, “Locally best unbiased estimates,” Ann. Math. Stat., pp.477–501, 1949.

[33] J. S. Abel, “A bound on mean-square-estimate error,” IEEE Trans. Inf.Theory, vol. 39, no. 5, pp. 1675–1680, 1993.

[34] H. Trees, Detection, estimation and modulation theory, vol. 1. WileyNew York, 1968.

[35] A. J. Weiss, Fundamental bounds in parameter estimation. Tel-AvivUniversity, 1985.

[36] S. Roth and M. J. Black, “Fields of experts: A framework for learningimage priors,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR),vol. 2, 2005, pp. 860–867.

[37] M. Elad and Y. Hel-Or, “A fast super-resolution reconstruction algorithmfor pure translational motion and common space-invariant blur,” IEEETrans. Image Process., vol. 10, no. 8, pp. 1187–1193, 2001.

[38] R. Fransens, C. Strecha, and L. Van Gool, “Optical flow based super-resolution: A probabilistic approach,” Comput. Vis. Image Und., vol.106, no. 1, pp. 106–115, 2007.

[39] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understandingand evaluating blind deconvolution algorithms,” in Proc. IEEE Conf.Comput. Vis. Pattern Recog. (CVPR), 2009, pp. 1964–1971.

[40] N. Efrat, D. Glasner, A. Apartsin, B. Nadler, and A. Levin, “Accurateblur models vs. image priors in single image super-resolution,” in Proc.IEEE Int. Conf. Comput. Vis. (ICCV), 2013, pp. 2832–2839.

[41] H. L. Van Trees, K. L. Bell, and Z. Tian, Detection Estimation andModulation Theory. John Wiley & Sons, 2013.

[42] K. B. Petersen and M. S. Pedersen, “The matrix cookbook,” TechnicalUniversity of Denmark, 2012.

[43] A. Nehorai and M. Hawkes, “Performance bounds for estimating vectorsystems,” IEEE Trans. Signal Process., vol. 48, no. 6, pp. 1737–1749,2000.

[44] D. J. Field, “Relations between the statistics of natural images and theresponse properties of cortical cells,” J. Opt. Soc. Am. A, vol. 4, no. 12,pp. 2379–2394, 1987.

[45] Y. Noam and H. Messer, “Notes on the tightness of the hybrid Cramer-Rao lower bound,” IEEE Trans. Signal Process., vol. 57, no. 6, pp.2074–2084, 2009.

[46] M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, “Efficient subpixelimage registration algorithms,” Opt. Lett., vol. 33, no. 2, pp. 156–158,2008.

[47] F. Graybill, Matrices with Applications in Statistics, ser. Duxbury ClassicSeries. Cengage Learning, 2001.

[48] H. Nguyen and H. L. Van Trees, “Comparison of performance boundsfor DOA estimation,” in IEEE Seventh SP Work. on Stat. Sig ArrayProc., 1994, pp. 313–316.

fundamental limits in multi-image alignment - arxiv · 2016-02-05 · fundamental limits in...

Documents