
Support Recovery With Sparsely Sampled Free Random Matrices

Antonia M. Tulino, Fellow, IEEE, Giuseppe Caire, Fellow, IEEE, Sergio Verdú, Fellow, IEEE, and Shlomo Shamai (Shitz), Fellow, IEEE

Abstract: Consider a Bernoulli-Gaussian complex n-vector whose components are products of iid Gaussian and iid binary random variables, mutually independent and iid across components. This random sparse vector is multiplied by a square random matrix, and a randomly chosen subset, of average size proportional to n, of the resulting vector components is then observed in additive Gaussian noise. We extend the scope of conventional noisy compressive sampling models, where the matrix typically has iid components, to allow matrices satisfying a certain freeness condition. This class of matrices encompasses Haar matrices and other unitarily invariant matrices. We use the replica method and the decoupling principle of Guo and Verdú, as well as a number of information-theoretic bounds, to study the input-output mutual information and the support recovery error rate in the limit of n → ∞. We also extend the scope of the large deviation approach of Rangan et al. and characterize the performance of a class of estimators encompassing thresholded linear MMSE and ℓ1 relaxation.

Index Terms: Compressed sensing, free probability, random matrices, rate-distortion theory, sparse models, support recovery.

    I. INTRODUCTION

    A. Model Setup

CONSIDER the n-dimensional complex-valued observation model:

(1)

(2)

where:

1) y is the observed n-vector, and s is an iid complex Gaussian n-vector with components s_i ∼ CN(0, 1);

2) b is an iid n-vector with components b_i Bernoulli-γ, i.e., P(b_i = 1) = γ = 1 − P(b_i = 0);

Manuscript received August 26, 2012; accepted January 31, 2013. Date of publication March 07, 2013; date of current version June 12, 2013. G. Caire, S. Shamai (Shitz), and S. Verdú were supported by the Binational Science Foundation under Grant 2008269. G. Caire was supported in part by the National Science Foundation (NSF) under Grant CCF-0729162. S. Verdú was supported in part by the Center for Science of Information, an NSF Science and Technology Center, under Grant CCF-0939370. This paper was presented in part at the 2011 IEEE International Symposium on Information Theory.

A. M. Tulino is with Bell Laboratories, Alcatel-Lucent, Holmdel, NJ 07733 USA (e-mail: [email protected]).

G. Caire is with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: [email protected]).

S. Verdú is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08540 USA (e-mail: [email protected]).

S. Shamai (Shitz) is with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel ([email protected]).

Communicated by V. Saligrama, Associate Editor for Signal Processing. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIT.2013.2250578

3) x is a Bernoulli-Gaussian vector, with components x_i = b_i s_i;

4) B is an n × n diagonal matrix with iid diagonal elements Bernoulli-ρ, i.e., P(B_ii = 1) = ρ = 1 − P(B_ii = 0);

5) A is an n × n random matrix such that¹

H = A† B A (3)

is free from any deterministic Hermitian matrix (see [1] and references therein);

6) z is an iid complex Gaussian n-vector with components z_i ∼ CN(0, 1);

7) s, b, B, A, and z are mutually independent;

8) the signal-to-noise ratio (SNR) of the observation model (1) is defined as

(4)

The nonzero elements of b define the support of the Bernoulli-Gaussian vector x, whose sparsity (average fraction of nonzero elements) is equal to γ. The nonzero diagonal elements of B define the components of the product Ax for which a noisy measurement is acquired. In the literature, the number of nonzero diagonal elements of B is commonly referred to as the number of measurements. The sampling rate (average fraction of observed components) of the observation model (1) is equal to ρ. The sensing matrix is known to the detector, whose goal is to recover the support of x, i.e., to find the positions of the nonzero components of x.

In this paper, we are interested in the optimal performance of the recovery of the sparse signal support. Denoting the recovered support by b̂, with components b̂_i ∈ {0, 1}, the objective is to minimize the support recovery error rate:

(5)

where the expectation is with respect to s, b, B, A, and z. In particular, this paper focuses on the large-n regime

(6)

under the optimal maximum a posteriori symbol-by-symbol (MAP-SBS) estimator, as well as under some popular suboptimal but practically implementable estimation algorithms.

¹Superscript † indicates Hermitian transpose.


    B. Existing Results

Recovery of the sparsity pattern with vanishing error probability is studied in a number of recent works such as [2]-[7]. When k, the number of nonzero coefficients in x, is known beforehand² and the magnitudes of those coefficients are bounded away from zero, exact support recovery requires a number of measurements growing as k log n [4], [7]. If the support recovery error rate is allowed to be nonvanishing, fewer measurements are necessary. Under various assumptions, in [2], [3], and [8], the authors show that a number of measurements growing proportionally to the signal dimension suffices. A more refined analysis is given by Reeves and Gastpar in [8]-[11], assuming that the entries of the measurement matrix are iid but without requiring the signal vector to be Gaussian. They find tight bounds on the behavior of the proportionality constant as a function of SNR and the target support recovery error rate. In particular, Reeves and Gastpar [10] upper bound the required difference in measurement rate when using a maximum-likelihood (ML) estimator of the support. The comparison given in [10] and [11] of computationally efficient algorithms, such as linear MMSE (LMMSE) estimation and approximate message passing (AMP), to information-theoretic bounds reveals that the suboptimality of those algorithms increases with SNR. In contrast to (5), Reeves and Gastpar [11] consider a distortion measure which is the maximum of the false-alarm and missed-detection probabilities.

The recent work [12] gives results for iid Gaussian measurement matrices, based on the analysis of a message passing algorithm rather than the replica method. A full rigorization of the decoupling principle introduced in [13] has been recently announced in [14] for compressive sensing applications with iid measurement matrices. Another rigorous justification of previous replica-based results is given in [15], which shows that iid Gaussian sensing matrices incur no penalty on the phase transition threshold with respect to an optimal nonlinear encoding.

It is of considerable interest to explore the degree of improvement afforded by dropping the assumption that the measurement matrix has iid coefficients. Randomly sampled discrete Fourier transform (DFT) matrices (where rows/columns are deleted independently, e.g., [16]) are one example of such matrices. The model considered in Section I-A allows a relevant generalization of the iid measurement model, which is analytically tractable.

    C. Organization

Section II gives expressions for the input-output mutual information rate and shows how to use it in order to lower bound the support recovery error rate. We write the mutual information of interest as the difference of two mutual information rates. The first term is obtained using the heuristic replica method, previously applied in various problems involving iid matrices, e.g., [13], [17]-[19]. The second term is given rigorously, using free probability and large random matrix theory.

Upper and lower bounds on the input-output mutual information corroborating the replica analysis are developed in Section III. We also give a converse result that shows that (6) is bounded away from zero if ρ < γ. Numerical examples illustrate the tightness of the bounds.

Section IV extends the decoupling principle [13] to the model in (1) and provides the analysis of three support estimators: optimal MAP-SBS, thresholded LMMSE, and ℓ1 relaxation (Lasso).

Proofs and other technical details are given in the Appendixes.

²Note that in our model, the number of nonzero coefficients is not known a priori but is binomially distributed.

II. MUTUAL INFORMATION RATE

In this section, we are concerned with the mutual information rate

(7)

where

(8)

(9)

and the right-most equality in (7) follows from

(10)

(11)

A. Error Rate Lower Bound Via Mutual Information

We can bound the minimal support recovery error rate defined in (5) in terms of the mutual information rate using the following simple result.

Theorem 1: Given a joint distribution on a pair of random variables, a reconstruction alphabet, and a distortion measure, let the rate-distortion function be

(12)

Then

(13)

where the infimum in (12) is over all conditional probability assignments such that the expected distortion does not exceed the distortion argument.

Proof: See Appendix A.

Since the rate-distortion function is monotonically decreasing, (13) gives an information-theoretic lower bound on the non-information-theoretic quantity (5). In our case, using the rate-distortion function of a Bernoulli-γ source with Hamming distortion, given by R(D) = h(γ) − h(D), Theorem 1 results in

(14)

where h(x) = −x log x − (1 − x) log(1 − x) denotes the binary entropy function, and where we assume the distortion does not exceed γ [notice that the mutual information rate cannot exceed h(γ), by definition (7)].
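For concreteness, the bound (14) can be evaluated numerically by inverting the binary entropy function. The sketch below assumes the rate-distortion form R(D) = h(γ) − h(D) stated above, with entropies in bits; the function names and the use of scipy's brentq root finder are illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.optimize import brentq

def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def error_rate_lower_bound(gamma, mi_rate):
    """Bound (14): error rate >= h2^{-1}(h2(gamma) - I), where I is the
    mutual information rate in bits per component."""
    target = h2(gamma) - mi_rate          # residual entropy to be covered
    if target <= 0:                       # I >= h2(gamma): bound is vacuous
        return 0.0
    hi = min(gamma, 0.5)                  # h2 is increasing on [0, 1/2]
    if h2(hi) <= target:                  # essentially no information
        return hi
    return brentq(lambda d: h2(d) - target, 0.0, hi)

print(error_rate_lower_bound(gamma=0.1, mi_rate=0.2))
```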


B. Mutual Information Rate Via Replica Method

For any snr ≥ 0, we denote the minimum mean-square error for estimating X from the noisy observation √snr X + N as mmse(snr), given by

(15)

With this definition, we have the following claim, dependent on the validity of the replica method.

Claim 1: Let N, S, and B be independent random variables, with B Bernoulli-γ, S ∼ CN(0, 1), and N ∼ CN(0, 1), and define X = SB. Let R(·) denote the R-transform [1] of the random matrix H defined in (3). Then

(16)

where η and ξ are the nonnegative solutions of the system of equations:

(17a)

(17b)

If the solution of (17a) and (17b) is not unique, then we select the solution that minimizes the expression in (16), which corresponds (up to an irrelevant additive constant) to the free energy of a physical system with quenched disorder parameters (y, B, A), state x, and unnormalized Boltzmann distribution given by the product of the channel density and the prior, where

(18)

is the conditional transition probability density of the observation model (1), given (B, A).

Proof: See Appendix B.

    Proof: See Appendix B.

The efficient calculation of mmse(snr) and of the associated scalar mutual information is addressed in Appendix H.
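Since mmse(snr) enters all the fixed-point equations below, a quick numerical handle on it is useful. The following is a Monte Carlo stand-in (not the closed form of Appendix H), assuming the scalar model of Claim 1: X = SB with S ∼ CN(0,1), B ∼ Bernoulli(γ), observed as Y = √snr X + N.

```python
import numpy as np

rng = np.random.default_rng(0)

def mmse_bg(gamma, snr, m=200_000):
    """Monte Carlo estimate of mmse(snr) for the Bernoulli-Gaussian scalar."""
    s = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
    b = rng.random(m) < gamma
    n = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
    x = s * b
    y = np.sqrt(snr) * x + n
    # Posterior P(B=1|y): Gaussian mixture weights. Under B=1, Y ~ CN(0,1+snr);
    # under B=0, Y ~ CN(0,1). The common 1/pi factor cancels in the ratio.
    f1 = np.exp(-np.abs(y) ** 2 / (1 + snr)) / (1 + snr)
    f0 = np.exp(-np.abs(y) ** 2)
    p1 = gamma * f1 / (gamma * f1 + (1 - gamma) * f0)
    # Conditional mean: Wiener estimate of S, gated by the posterior on B.
    xhat = p1 * (np.sqrt(snr) / (1 + snr)) * y
    return np.mean(np.abs(x - xhat) ** 2)

print(mmse_bg(0.1, 10.0))
```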

C. Mutual Information Rate Via Freeness

Theorem 2: Let V(·) and η(·) denote the Shannon transform and η-transform (see [1] and definitions in Appendix C) of H defined in (3). Then

(19)

where the two auxiliary parameters are the unique nonnegative solutions of the system of equations

(20)

Proof: See Appendix C.

D. Special Cases

1) A is an iid random matrix: Assuming A has iid entries with mean zero and variance 1/n, according to [1, Th. 2.39], the η-transform of H satisfies the relation

(21)

with unit aspect ratio, since A is square. Using the fact that B is diagonal with Bernoulli-ρ iid diagonal elements,

(22)
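The body of (22) was lost in extraction; under the notation assumed in this rewrite (B diagonal with Bernoulli(ρ) entries), it follows from a one-line computation, sketched here as an assumption:

```latex
% Assumed reconstruction of (22): B has 0/1 diagonal entries, so its
% limiting spectrum is Bernoulli(\rho), and its eta-transform is
\[
  \eta_{B}(x) \;=\; \mathbb{E}\!\left[\frac{1}{1 + x\,\Lambda_B}\right]
  \;=\; (1-\rho) + \frac{\rho}{1+x}
  \;=\; 1 - \rho\,\frac{x}{1+x}.
\]
```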

Using this in (21), we have that the η-transform is the positive solution of the quadratic equation

(23)

which corresponds to the η-transform of a random matrix of the form H̃† H̃, with H̃ of dimension ρn × n and iid elements with zero mean and variance 1/n. The R-transform of such a matrix is well known (see [1, Example 2.27]) and takes on the form

(24)

Hence, the fixed-point equations (17a) and (17b) reduce to

(25)

and (16) takes on the form

(26)

This is obtained from (16) using (24) for the R-transform and the identity relating η and ξ from (17a). We notice that when ρ = 1, (26) coincides with the result in [13]. The formula provided by Claim 1 does not coincide with the results in [13] and [18] for general ρ, since in the model considered in [13] and [18] the channel matrix is normalized such that the columns (rather than the nonzero rows, as in our setting) have unit average squared norm conditioned on B. Instead, our formulas are consistent with those in [10], which uses the same row-energy normalization as in this paper.

In order to calculate the second mutual information rate, we use (20) and obtain

(27)

Using the definition of the S-transform (see Definition 3 in Appendix C), we have that

(28)

from which, identifying terms, we obtain

(29)

where the right-most equality follows from the well-known explicit expression of the S-transform, valid when A is an iid matrix. Replacing (29) in the equality in (20), we obtain

(30)

Introducing a change of variable, we can rewrite (30) as

(31)


Hence, the η-transform is seen to satisfy a well-known fixed-point equation, namely that of H̃ H̃† with H̃ an iid matrix (see [1, Eq. (2.120)]). Using [1, Eq. (2.121)], it can be obtained in closed form as

(32)

where

(33)

and the corresponding Shannon transform yields the desired mutual information rate, in the form

(34)
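Although (32)-(34) are elided above, the closed form for the iid case is classical. The sketch below is my transcription of the well-known Shannon transform of the Marchenko-Pastur law from [1] (the aspect ratio playing the role of the sampling rate here, an assumption of this rewrite), checked against a finite random matrix.

```python
import numpy as np

def F(x, z):
    """Auxiliary function of the Marchenko-Pastur closed form (Tulino-Verdu)."""
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

def shannon_transform_mp(snr, beta):
    """Shannon transform (bits) of H H^H for H (n x beta*n) iid, variance 1/n."""
    f = F(snr, beta) / 4
    return (beta * np.log2(1 + snr - f) + np.log2(1 + snr * beta - f)
            - f / snr * np.log2(np.e))

# Sanity check against a single finite-dimensional random matrix.
n, beta, snr = 400, 0.25, 10.0
H = (np.random.randn(n, int(beta * n)) +
     1j * np.random.randn(n, int(beta * n))) / np.sqrt(2 * n)
emp = np.log2(np.linalg.eigvalsh(np.eye(n) + snr * H @ H.conj().T)).sum() / n
print(shannon_transform_mp(snr, beta), emp)
```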

In passing, we remark that the large-SNR behavior of (34) is

(35)

for ρ < 1 and

(36)

for ρ = 1, showing that the pre-log of (34) is the asymptotic almost-sure normalized rank of the sampled sensing matrix, as expected.

2) A is Haar-distributed: If A is Haar-distributed, i.e., uniformly distributed on the manifold of n × n unitary matrices, the eigenvalue distribution of H = A† B A coincides with that of B, i.e., with the Bernoulli-ρ distribution. Using (22) and the relation between the η-transform and the R-transform in [1, Eq. 2.74], we obtain

(37)

This allows for the calculation of (16) with the corresponding fixed-point equations (17a) and (17b). As far as the second mutual information rate is concerned, we use

(38)

in (20) and solve using the first equality, obtaining

(39)

Replacing in the second equality in (20), we obtain the explicit expression

(40)

It can be checked that the expression is well defined for any sparsity and sampling rate. Using (19), (39), and (40), we obtain

(41)

where

d(p‖q) = p log(p/q) + (1 − p) log((1 − p)/(1 − q)) (42)

is the binary relative entropy. Expression (41) coincides with the result given in [16] for the limit of the mutual information rate

(43)

of a vector Gaussian channel with iid Gaussian input and randomly sampled unitary channel matrix.

3) A unitary, B = I: In this case, ρ = 1 and H = A† A = I. Hence, (17a) and (17b) become

(44)

(45)

Since B = I implies ρ = 1, (40) yields the degenerate solution and, recalling (41), we have

(46) (47) (48) (49)

where (48) follows because the components of x are iid and the whitened channel decouples conditioned on A. In fact, in this case, the single-letter expression holds for all n, not only in the limit of n → ∞.
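The chain (46)-(49) is elided in this copy; the following short derivation, written in the hypothetical notation assumed in this rewrite (y = Ax + z with A unitary and known), shows why the single-letter expression is exact at every finite n when B = I.

```latex
% Sketch of the single-letter reduction for B = I and unitary A.
% Multiplying by A^\dagger is information-lossless since A is invertible:
\begin{align*}
I(\mathbf{x};\mathbf{y}\mid A)
  &= I(\mathbf{x};\,A^{\dagger}\mathbf{y}\mid A)
   = I(\mathbf{x};\,\mathbf{x}+A^{\dagger}\mathbf{z}\mid A)
   = \sum_{i=1}^{n} I(x_i;\,x_i+\tilde z_i),
\end{align*}
% where \tilde{\mathbf z} = A^\dagger \mathbf z \sim \mathcal{CN}(0,I) has the
% same law as z and is independent of x, so the channel splits into n iid
% scalar Bernoulli-Gaussian channels for every finite n.
```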

III. BOUNDS ON THE MUTUAL INFORMATION RATE

A. Upper Bounds

We start with the following result, which follows immediately from first principles.

Theorem 3: If A is unitary, then (7) satisfies

(50)

where η and ξ are as defined in Claim 1. Equation (50) holds with equality for ρ = 1.

Proof: It is sufficient to notice that the output y in (1) is obtained by sampling the noisy vector Ax + z at the positions of the 1 elements of the diagonal of B. From the data processing inequality, and noticing that the unsampled mutual information is given by (46), the result follows.

In the general case, we have the following upper bounds.

Theorem 4:

(51)

(52)

where η and ξ are as defined in Claim 1, and where Λ is a random variable distributed as the limiting spectrum of H.


Proof: See Appendix D.

B. Lower Bounds

In order to corroborate the exact result of Claim 1 obtained through the heuristic replica method, we also consider a lower bound to the mutual information. Since the second term of (7) is known exactly, it is sufficient to have a lower bound for the first. This is provided by the following result.

Theorem 5: The mutual information rate in (8) is lower bounded by

(53)

where η and ξ are as defined in Claim 1 and where the multiuser efficiency appearing in (53) is defined by

(54)

Proof: See Appendix D.

It is interesting to notice that the quantity defined in (54) can be interpreted as the asymptotic (in n) multiuser efficiency of a CDMA system with input x, output given by the observation model (1), and spreading codes given by the columns of the sampled sensing matrix, where the receiver uses LMMSE detection with successive decoding, and where the already decoded input symbols have been subtracted from the received signal (see [1] and [20]). Hence, the integral in (53) can be regarded as the mutual information between the input and the output of a mismatched successive interference cancellation receiver that treats the symbols of x as if they were iid Gaussian, instead of Bernoulli-Gaussian.

Explicit expressions for the multiuser efficiency can be provided in several cases of interest. For example, when A has iid entries, using [1, Th. 2.52], we obtain the solution of the fixed-point equation

(55)

namely

(56)

In the case of Haar-distributed A, using [1, Eq. (3.112)], we obtain the solution of the fixed-point equation

(57)

namely

(58)

Using the mean-value theorem in (53), there exists some intermediate SNR value such that

(59)

which is in the same form as the upper bound (52), save for a different SNR between the Bernoulli-Gaussian input and the Gaussian noise.

It is also immediate to notice that the upper and lower bounds on the mutual information rate hold for any fixed deterministic A, provided that the limits exist. For example, in the case of a deterministic unitary DFT matrix, Tulino et al. [16] show that the multiuser efficiency takes on the same form (58), and the exact expression of Theorem 2 still holds. Hence, it follows that while currently we can develop the replica analysis only for random A satisfying the aforementioned freeness requirement, the mutual information for a deterministic DFT matrix satisfies the same bounds. In fact, we have numerical evidence (see Section IV-F) that leads us to conjecture that the replica result of Claim 1 applies also to a DFT sensing matrix, although the proofs of this paper do not extend to this case.

C. High-SNR Regime

Theorem 6: For the observation model (1) and any support estimator, the support recovery error rate is bounded away from zero for ρ < γ, even in the noiseless case.

Proof: From (14), it is evident that the error rate is bounded away from zero if the mutual information rate falls short of h(γ). From the definition of the mutual information rate [see (7)], it is immediate that it is finite for any finite snr. However, in the limit of high SNR, it may or may not converge to h(γ), depending on the system parameters γ and ρ. In the remainder of the proof, we show that

(60)

provided ρ < γ; the degenerate boundary case is trivial.

Recall from Theorem 2 that

(61)

(62)

(63)

where we have made explicit the dependence on snr. For the purposes of the proof, it is important to elucidate the behavior of the solutions of (61)-(63) as snr → ∞, where the two parameters of Theorem 2 depend on snr through (62). Since each of the two parameters may vanish, converge to a positive constant, or diverge, there are in principle nine possibilities, enumerated 1)-9) below (the first parameter's behavior varying slowest).

The asymptotic behavior of (63) is

(64)

since the limiting spectrum is nonnegative with probability 1.


In view of (61), the first parameter cannot diverge while the second converges, since

(65)

where the lower bound is the limit of the η-transform at one extreme of its argument, while the upper bound is the limit at the other extreme.

1) Impossible because it would contradict (64).
2) Impossible because it would contradict (62).
3) Impossible because it would contradict (62), given (64).
4) Impossible because it would contradict (64).
5) Impossible because then (62) would be contradicted.
6) Impossible if ρ < γ, since the solution would be outside the range established in (65). Otherwise, the lower limit in (65) would be achieved at a finite argument of the η-transform, which is impossible due to the strictly monotonic nature of that function.
7) Impossible because it would contradict (62).
8) Impossible except in a boundary case, because it would contradict (62). The boundary case is treated below.
9) Impossible except in a boundary case, because it would contradict (62). The boundary case is treated below.

We proceed to consider case 8) in its boundary case. The solution of the fixed-point equations (61) and (62) yields

(66)

(67)

(68)

(69)

We can proceed to upper bound the mutual information rate using Theorem 4 and (66)-(69):

(70)

(71)

(72)

(73)

(74)

We now proceed to consider case 9) in its boundary case. Here, the solution of the fixed-point equations (61) and (62) yields

(75)

(76)

Fig. 1. Mutual information rate versus the sampling rate, at fixed sparsity and SNR. Upper and lower bounds are also shown for comparison.

As before, we can now proceed to upper bound the mutual information rate using Theorem 4:

(77)

(78)

(79)

where (79) follows from (75), (76), and the fact that the first two terms on the left side vanish as snr → ∞.

Note that an achievability counterpart to Theorem 6 in the noiseless case (under a more general signal model) is given in [21], showing that ρ = γ is the critical sampling rate threshold for exact reconstruction.

D. Examples

We provide a few numerical examples illustrating the results developed above. Figs. 1-3 show the mutual information rate as a function of the sampling rate, for a Haar-distributed sensing matrix and a Bernoulli-Gaussian source signal with fixed sparsity and SNR equal to 0, 20, and 50 dB, respectively. Each figure also shows the corresponding lower and upper bounds provided by Theorems 3-5. We notice that the lower bound of Theorem 5 is close to the exact value for low SNR (in fact, it is tight as the SNR vanishes). In contrast, for high SNR, the mutual information is very closely approximated by the minimum of the two upper bounds provided by Theorem 3 and (51) in Theorem 4.

It is also interesting to observe that the asymptotic regime of vanishing error rate for any ρ > γ is approached very slowly, i.e., an impractically high SNR is required. For example, we notice that at 50 dB the mutual information in Fig. 3


Fig. 2. Mutual information rate versus the sampling rate, at fixed sparsity and SNR. Upper and lower bounds are also shown for comparison.

Fig. 3. Mutual information rate versus the sampling rate, at fixed sparsity and SNR. Upper and lower bounds are also shown for comparison.

Fig. 4. Mutual information upper bound [right-hand side of (70)] versus SNR (dB), at fixed sparsity and sampling rate.

achieves the upper bound of Theorem 3 (very close to h(γ)) only at a sampling rate quite far from the threshold ρ = γ. Fig. 4 shows the mutual information upper bound in the right-hand side of (70) versus SNR in dB. In order to reach a value close to the target entropy, we need an SNR of about 340 dB. This gives an idea of how high the high-SNR regime must be in order to work closely to the noiseless reconstruction threshold.

Fig. 5. (a) Mapping function for the fixed-point equations (17a) and (17b) at fixed sparsity, SNR, and sampling rate. (b) Detail evidencing the unstable fixed point and the left-most fixed point. (c) Corresponding free energy. (d) Detail of the free energy for small 1/η exhibiting the minimum corresponding to the left-most fixed point.

Next, we take a closer look at the behavior of the solutions of the fixed-point equations (17a) and (17b). Even in the iid case [in which the equations reduce to (25)] solved in [13] and [18], the question of how to choose among the multiple solutions has not been thoroughly addressed in the literature. Figs. 5-7 show the fixed-point mapping function obtained by eliminating ξ from (17a) and (17b), given by

(80)

for fixed sparsity and SNR. The intersections of this function with the main diagonal are the solutions of the fixed-point equation. We explore values of the sampling rate ρ in the vicinity of the phase transition, for which the mutual information reaches a value very close to h(γ). For the smallest value of ρ considered (see Fig. 5), we have three solutions. Two are stable fixed points and one is an unstable fixed point. The solution corresponding to the absolute minimum of the free energy is the right-most fixed point [see Fig. 5(c)], corresponding to a large value of 1/η, which in turn translates into a large support recovery error rate, as we will see in Section IV-F. For a slightly larger ρ (see Fig. 6), we also have three solutions, of which two are stable fixed points. However, now the solution corresponding to the absolute minimum of the free energy is the left-most fixed point [see Fig. 6(c)], corresponding to a small value of 1/η, i.e., to a very small support recovery


Fig. 6. (a) Mapping function for the fixed-point equations (17a) and (17b) at fixed sparsity, SNR, and sampling rate. (b) Detail evidencing the unstable fixed point and the left-most fixed point. (c) Corresponding free energy. (d) Detail of the free energy for small 1/η exhibiting the minimum corresponding to the left-most fixed point.

Fig. 7. (a) Mapping function for the fixed-point equations (17a) and (17b) at fixed sparsity, SNR, and sampling rate. (b) Detail evidencing the unstable fixed point and the left-most fixed point. (c) Corresponding free energy. (d) Detail of the free energy for small 1/η exhibiting the minimum corresponding to the left-most fixed point.

error rate. This jump from the right-most to the left-most stable fixed point corresponds to a phase transition of the underlying statistical physics system. Notice that the phase transition may occur at finite SNR, as in this case, and the phase transition threshold is, in general, strictly larger than the noiseless perfect reconstruction threshold ρ = γ. Finally, for values of ρ significantly larger than the phase transition threshold (see the example given in Fig. 7), only one solution exists. In this case, the free energy has only one extremum point, which is its absolute minimum [see Fig. 7(c)]. For the Gaussian iid sensing matrix case, it is known (see [10] and references therein) that the iterative algorithm known as AMP-MMSE achieves the right-most fixed point of (17a) and (17b). This coincides with the optimal MAP-SBS performance when this is the valid fixed point, i.e., the one corresponding to the minimum of the free energy. Instead, when there are multiple fixed points and the left-most fixed point is the valid one, the MAP-SBS estimator is strictly better than AMP-MMSE. Our results lead us to believe that the same behavior holds for the more general class of sensing matrices studied in this paper. From the examples above, we notice that the right-most fixed point is the valid one for ρ below the phase transition threshold. Above that threshold, either there is only one fixed point, for sufficiently large ρ, or one has to choose the solution that minimizes the free energy.
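The selection rule just described can be mimicked numerically. The following is a generic sketch, not the paper's algorithm: T stands in for the scalar mapping (80) and F for the free energy (16); both are assumed to be supplied by the user.

```python
def solve_fixed_point(T, x0, iters=5000, tol=1e-12):
    """Plain fixed-point iteration x <- T(x) from initialization x0."""
    x = x0
    for _ in range(iters):
        x_new = T(x)
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

def select_solution(T, F, x_lo, x_hi):
    """Reach the left-most / right-most stable fixed points from the two
    extreme initializations, then keep the free-energy-minimizing one."""
    candidates = {solve_fixed_point(T, x_lo), solve_fixed_point(T, x_hi)}
    return min(candidates, key=F)
```

Initializing at the two ends of the admissible interval reproduces the distinction drawn above between the AMP-like (right-most) solution and the replica-optimal one.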

IV. ANALYSIS OF ESTIMATORS USING THE DECOUPLING PRINCIPLE

A. Decoupling Principle

The decoupling principle introduced by Guo and Verdú [13] states that the marginal joint distribution of each input coordinate and the corresponding estimator coordinate of a class of, possibly mismatched, posterior-mean estimators (PMEs) converges, as the dimension grows, to a fixed input-output joint distribution that corresponds to a decoupled (i.e., scalar) Gaussian observation model. In the observation model treated by Guo and Verdú in [13], the goal is to estimate the input from the output while knowing the sensing matrix, where the input is an iid vector with a given marginal distribution, the noise is an iid Gaussian vector, the sensing matrix is a random matrix with iid elements with mean zero and suitably normalized variance, and a diagonal matrix of gains has an empirical distribution converging weakly to a given well-behaved distribution. Comparing the model of [13] with (1), we notice that, as far as the estimation of the Bernoulli-Gaussian iid vector x is concerned, the two models are similar, with the key difference that we allow a more general class of matrices satisfying the freeness condition given at the beginning of Section I-A. In contrast, as far as the estimation of the support b is concerned, our model differs from [13] in that, in our case, the diagonal iid Gaussian amplitude matrix is not known to the estimator.

In this section, we apply the decoupling principle to the estimation of b for the observation model (1). This allows us to derive the minimum possible support recovery error rate for any estimator, achieved by the MAP-SBS estimator. The main results are summarized in the remainder of this section, with the detailed derivations relegated to Appendix E. We also consider LMMSE and Lasso [22], two popular estimators in the compressed sensing literature. These estimators first produce an estimate of x and then recover an estimate of the support by componentwise thresholding. In order to analyze the suboptimal estimators, we resort to the decoupling principle for the estimation


of x, which can be derived along the same lines as Appendix E or, equivalently, by extending the analysis of [13] to the class of sensing matrices considered in this paper. In [10], LMMSE and Lasso estimators are studied for the case of iid sensing matrices as special cases of the AMP algorithm [23], the performance of which is rigorously characterized for A with iid Gaussian entries in the large-dimensional limit through the solution of a state evolution equation [12]. The current AMP rigorous analysis does not go through for the more general class of matrices considered here. Therefore, we resort to the replica large-deviation approach of Rangan et al. [18] in order to obtain the decoupled model corresponding to these estimators. Interestingly, when particularizing our results to the iid case, we recover the same AMP state evolution equations as given in [10].

For the sake of notational simplicity, we assume that all random variables and vectors appearing in the following formulas have a density (possibly including Dirac distributions), indicated by p with the appropriate subscripts and arguments. In order to limit the proliferation of symbols, we use the same symbols to indicate random variables (or vectors) and the corresponding dummy arguments in the probability distributions.

The class of estimators for which the decoupling principle holds are mismatched PMEs, where the mismatch is reflected in an assumed channel transition probability and symbol a priori probabilities that may not correspond to the actual ones. We reserve the letter q, with the appropriate subscripts and arguments, to indicate these assumed distributions. The true conditional channel transition probability of (1) is given by (18). The corresponding assumed channel transition probability is given by

(81)

where the assumed noise variance replaces the true unit variance. We also let q denote an assumed a priori distribution for the input, not necessarily Bernoulli-Gaussian. The mismatched estimator of x is given by

(82)

where

(83)

and where the last factor is the n-variate iid complex Gaussian density with the assumed noise variance.

In the matched case, i.e., for the true noise variance and Bernoulli-Gaussian prior, (82) coincides with the MMSE estimator.³ By considering general assumed priors and noise variances, we can study a whole family of mismatched PMEs through the same unified framework [13], [17].

For the purpose of analysis, it is convenient to define a virtual multivariate observation model involving the random vectors x and b (Bernoulli-γ), the corresponding observation channel output y as in (1), and an intermediate vector, not corresponding to any physical quantity

present in the original model, such that the conditional joint distribution of the virtual model is given by

(84)

with

(85)

Then, (82) can be seen as the matched PME of the intermediate vector with respect to the joint probability distribution (84). Notice also that (84) satisfies a conditional Markov chain for given (B, A).

³This is the PME for the matched statistics, which effectively minimizes the MSE.

The decoupling principle obtained in this paper and proved

in Appendix E can be stated as follows. Let the ith components of the random vectors of the virtual model obey the joint conditional distribution (84), with the estimator given in (82). Then, in the limit of n → ∞, under the assumption that the replica-symmetric analysis holds (see Appendix E), the joint distribution of these components converges to the joint distribution of the triple induced by

(86)

and by the decoupled channel

(87)

with the input distributed as the Bernoulli-γ Gaussian scalar X = SB of Claim 1, with Gaussian noise independent of the input, and with a companion variable identically distributed as the marginal of the assumed prior distribution. We let q denote the common density of the input and its companion, and define the following probability densities:

(88)

(89)

(90)

(91)

(92)

where the parameters ξ and η are obtained by solving the system of fixed-point equations⁴

(93a) (93b) (93c)

(93d)

⁴A prime denotes the first derivative of a single-variate function.


The expectations in (93a)-(93d) are defined with respect to the joint distribution given by

(94)

where the first factor is the Bernoulli-Gaussian distribution, the second is given in (88),

(95)

with the conditional density given in (90), and

(96)

In passing, notice also that (86) and (94) satisfy the corresponding Markov chains. If the solution to (93a)-(93d) is not unique, then we have to select the solution that minimizes the system free energy (expressed in nats):

(97)

As expected, by letting the assumed noise variance and prior be matched, (93a)-(93d) reduce to (17a) and (17b). It is also immediate to see that in this case the free energy coincides, up to a constant, with the expression given in (16).

By particularizing our analysis to the case of A with iid elements, using (24), we obtain the simpler fixed-point equations

(98a)

(98b)

which recover the results of [13], [18], and [19], up to a different normalization, as discussed in the first example of Section II-D.

B. Symbol-by-Symbol MAP Estimator

As an application of the decoupling principle, we can determine the minimum achievable support recovery error rate by particularizing the above formulas for the MAP-SBS estimator of b given the observations, operating according to the optimal decision rule

(99)

It is well known that the MAP-SBS estimator minimizes the support recovery error rate over all possible estimators. A byproduct of the decoupling principle is that, in the matched case, (86) immediately yields that the limiting posterior marginal for a randomly chosen ith component is given by the posterior distribution of the decoupled channel (87), suitably marginalized. In the matched case, (93a)-(93d) reduce to (17a) and (17b) in Claim 1, and the error rate is easily obtained by noticing that the decoupled channel output, given b_i, is conditionally distributed as

(100)

i.e., as pure noise for b_i = 0 and as signal plus noise for b_i = 1. Then

(101)

where the decoupled channel SNR is obtained from (17a) and (17b) and where we define

(102)

The resulting MAP-SBS estimator is

(103)

with decision b̂_i = 1 if

(104)

(with randomization on the boundary). Taking the logarithm of both sides, we find the energy detector (analogous to noncoherent ON-OFF modulation with fading) given by the threshold rule

(105)

deciding 1 above the threshold and 0 elsewhere, with threshold

(106)

We have b̂_i = 0, regardless of the value of the observation, if the threshold condition can never be met, in which case the error rate equals the sparsity γ. Otherwise

(107)

obtained from (105) by observing that the decoupled channel output energy, conditioned on b_i, is central chi-square with two degrees of freedom, with one mean for b_i = 0 and a larger mean for b_i = 1.

For A with iid elements, we can recover known results. In this case, (17a) and (17b) reduce to (25), which corresponds to the replica analysis of the MMSE estimator obtained in [13] and summarized in [10] in the context of support recovery in compressed sensing. When the iterative solution of the fixed-point equation (25) is suitably initialized, the iteration converges to the solution of the so-called AMP-MMSE state equation given in [10, Th. 6]; in brief, with this initialization, the iterative solution always converges to the right-most fixed point of the mapping function (see Figs. 5-7 and the related discussion). Instead, if the valid fixed point is chosen, i.e., the solution which minimizes the free energy, then we obtain the so-called replica MMSE solution of [10, Th. 8].
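Under the decoupled channel, the error rate (107) of the threshold rule (105) has a closed form, since a central chi-square with two degrees of freedom is an exponential. The sketch below assumes the decoupled model takes the form Y = √(η·snr)·X + N suggested by (87), writes s = η·snr for the decoupled SNR, and derives the MAP threshold from the stated exponential conditionals; it is my derivation, not a transcription of (106).

```python
import numpy as np

def map_threshold(gamma, s):
    """MAP threshold on the energy |y|^2: under b=0 it is Exp(mean 1),
    under b=1 Exp(mean 1+s), with prior P(b=1) = gamma."""
    return (1 + s) / s * np.log((1 - gamma) * (1 + s) / gamma)

def support_error_rate(gamma, s):
    """Average of false-alarm and missed-detection under the MAP rule."""
    tau = map_threshold(gamma, s)
    if tau <= 0:                        # degenerate case: always decide b=1
        return 1 - gamma
    p_fa = np.exp(-tau)                 # P(|y|^2 > tau | b = 0)
    p_md = 1 - np.exp(-tau / (1 + s))   # P(|y|^2 < tau | b = 1)
    return (1 - gamma) * p_fa + gamma * p_md

print(support_error_rate(0.1, 100.0))
```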


Next, we discuss the threshold for perfect support reconstruction in the noiseless case, i.e., in the limit of snr → ∞ at fixed γ and ρ. From Theorem 6, we already know that a vanishing error rate cannot be achieved for any ρ < γ. We now show that the error rate vanishes for large snr for all ρ > γ. This has previously been shown for both optimal nonlinear measurement schemes and for Gaussian iid sensing matrices in [15]. Therefore, the conclusion about the asymptotic optimality of Gaussian iid sensing matrices found in [15] extends to sparsely sampled free random matrices. We start by recalling the following general result from [24]:

Theorem 7: Let the input law be a discrete-continuous mixed distribution, i.e., such that it can be represented as

(108)

where the first component is a discrete distribution, the second is an absolutely continuous distribution, and the mixture weights sum to one. Then, for snr → ∞, we have

(109)
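The bodies of (108) and (109) are elided; the following is a hedged restatement of what the MMSE-dimension result of [24] gives in this setting, with placeholder mixture weights.

```latex
% Assumed content of (108)-(109): if the input law is the mixture
%   P_X = \alpha P_d + (1-\alpha) P_c,   0 \le \alpha \le 1,
% with P_d discrete and P_c absolutely continuous, then
\[
  \lim_{\mathsf{snr}\to\infty} \mathsf{snr}\cdot\mathsf{mmse}(\mathsf{snr})
  \;=\; 1-\alpha .
\]
% For the Bernoulli-Gaussian input (atom of mass 1-\gamma at zero plus a
% Gaussian component of weight \gamma), this gives snr * mmse(snr) -> \gamma.
```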

We are interested in the behavior of the SNR of the decoupled channel (87) resulting from the MAP-SBS estimator as snr → ∞. In particular, for given sparsity γ, we are interested in determining the range of sampling rates ρ for which the decoupled SNR grows without bound, implying that the error rate vanishes. Let η and ξ be as defined in Claim 1. Then, using Theorem 7, we can write

(110)

(111)

where, for the time being, we assume that the decoupled SNR grows unbounded as snr → ∞. Using (111) in (17a) and (17b), for sufficiently large snr, we have

(112)

(113)

For the case of A with iid elements, using (24), we obtain

(114)

and solving (113) with respect to η, we obtain

(115)

In the case of Haar-distributed A, using (37), we obtain

(116)

(117)

For ρ > γ, in those two cases, the solutions are strictly positive and, consequently, the support recovery error rate vanishes as SNR grows without bound. In fact, as we show next, this conclusion holds for the general class of sparsely sampled free random matrices.

The goal is to show that η > 0 for ρ > γ, without

relying on a closed-form expression for the R-transform. This implies that the error rate vanishes for large snr for all ρ > γ. Assuming that (113) holds, using the definition of the R-transform as a function of the η-transform given in [1, Eq. (2.75), Sec. 2.2.5] and the definition of the η-transform as given in [1, Sec. 2.2.2], we can rewrite the asymptotic equality as

(118)

where the auxiliary variable satisfies

(119)

and Λ denotes a random variable distributed as the limiting spectrum of H.

By eliminating the auxiliary variable and solving for η in (118) and (119), we obtain

(120)

It is immediate to see that (120) is strictly positive for any finite argument (ranging from the mean to the harmonic mean of Λ). In view of Property (240) of the η-transform,

(121)

we conclude that (118) admits a unique positive and finite solution if and only if ρ > γ. Hence, (120) yields η > 0 for ρ > γ, as we wanted to show.

We conclude this section by providing expressions for the

MMSE in the estimation of the Bernoulli-Gaussian signal at high SNR. For iid A, we have

(122)

(123)

while for Haar-distributed A, we have

(124)

(125)

Notice that (123) coincides with the result derived in [15] and that the high-SNR MMSE expression diverges as ρ approaches γ. Since deleting samples cannot improve the performance of the optimal MMSE estimator, it diverges for all ρ ≤ γ.


C. Replica Analysis of a Class of Estimators Via the Large-Deviation Limit

The classical noisy compressed sensing problem seeks the estimation of the sparse vector x from y in (1) for a known sensing matrix. Then, b can be estimated by componentwise thresholding the estimate of x.

A number of suboptimal low-complexity estimators in the compressed sensing literature take on the form

(126)

for some weighting parameter and cost function.

The replica decoupling principle can be used to study the large-dimensional limit performance of this class of estimators by following the large-deviation recipe given in [18]. Briefly, the approach of [18] considers a sequence of mismatched PMEs indexed by a parameter u, where the assumed a priori density takes on the form

(127)

(assuming that the integral converges for sufficiently large u), and where the assumed transition density is given by

(128)

Under a number of mild technical assumptions (see [18] for details), the estimator in (126) can be obtained as the limit of the PME

(129)

for u → ∞. Furthermore, for u → ∞ and assuming the validity of the replica analysis, a decoupled scalar channel model in the limit of n → ∞ can be established such that the per-component joint distribution converges to that of a decoupled triple, where the form of the joint distribution is again given by (94) and where the decoupled parameters are functions of u. The form of the fixed-point equations yielding ξ and η depends on the specific estimator considered, i.e., on the value of the weighting parameter and on the cost function in (126). In particular, following in the footsteps of [18], with a few minor variations in order to adapt to our case,⁵ it is not difficult to show that the decoupled estimator is characterized by

(130)

⁵Details are omitted since they can be easily worked out from [18].

    5Details are omitted since they can be easily worked out from [18].

and that the fixed-point equations yielding ξ and η in the limit of u → ∞ are given by

(131a) (131b) (131c)

(131d)

where

(132)

When A has iid elements, from (98a) and (98b), we find

(133a)

(133b)

which coincide with [18, Eqs. (30a) and (30b)], up to a different normalization and the fact that we consider complex circularly symmetric instead of real random variables as in [18].

D. Thresholded LMMSE Estimator

A simple suboptimal estimator for x is the LMMSE estimator, given by

(134)

with H defined in (3). It is immediate to verify that (134) can be expressed in the form (126) by letting the cost function be quadratic.

Although the asymptotic performance and the decoupled channel model of LMMSE estimation can be obtained directly from classical results in large random matrix theory, both for iid and for Haar-distributed A (see [1] and references therein), it is instructive to apply the replica large-deviation approach outlined before. In this way, we can recover known results obtained rigorously by other means, thus lending support to the validity of the replica-based large-deviation approach.

Particularizing (130) and (132) to the quadratic-cost case, we obtain

(135)

and

(136)

yielding

(137)

(138)


where we used the closed-form minimization of the quadratic cost. Replacing (136) and (138) into (131a)-(131d), we obtain the fixed-point equations for the LMMSE estimator. In the iid case, using (133a) and (133b), we obtain

(139)

which coincides with the well-known expression of the multiuser efficiency of the LMMSE detector for an iid matrix with the appropriate aspect ratio and elements with mean 0 (see [1] and expression (56) evaluated accordingly).

In the Haar-distributed case, using (37), we can solve explicitly for η by eliminating ξ in (131a) and (131c). After somewhat more complicated algebra than in the iid case, we arrive at the solution

(140)

We also find that, as in the iid case, the companion parameter takes the same form. Hence, η is given in closed form as

(141)

which coincides with the well-known form of the multiuser efficiency of the LMMSE detector for a CDMA system with a Haar-distributed spreading matrix, given by the solution of (57) (or, equivalently, by the corresponding limit of (58)).

In order to calculate the performance of the thresholded LMMSE estimator, notice that the estimator output converges in distribution to the output of the decoupled channel model. Thresholding the estimate or the underlying decoupled observation is clearly equivalent. Hence, the support recovery error rate in this case takes on the same form already derived for the MAP-SBS [see (105)-(107)], for a different value of η calculated via (131a)-(131d).
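Since the body of (134) is elided, the sketch below shows the standard Wiener/LMMSE form as an assumption, in the hypothetical model notation used in this rewrite (y = Ax + z, with a boolean sampling mask standing in for the diagonal of B and unit-variance noise).

```python
import numpy as np

def lmmse_estimate(y, A, obs, gamma, sigma2=1.0):
    """Hypothetical sketch of (134): Wiener estimate of x with prior
    covariance gamma*I (E|x_i|^2 = gamma for the Bernoulli-Gaussian input)
    from the sampled rows of y = A x + z, z ~ CN(0, sigma2*I).
    obs is the boolean diagonal of the sampling matrix B."""
    As = A[obs, :]                                   # rows actually measured
    G = gamma * (As @ As.conj().T) + sigma2 * np.eye(As.shape[0])
    return gamma * As.conj().T @ np.linalg.solve(G, y[obs])
```

The support estimate is then obtained by applying the energy threshold of (105) to the components of this estimate.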

E. Thresholded Lasso Estimator

We now follow an approach similar to that in Section IV-D in order to analyze the Lasso estimator, which so far has only been analyzed for iid sensing matrices.

The Lasso estimator, widely studied in the compressed sensing literature [25], [26], comes directly in the form (126) with the ℓ1 cost. In this case, the weighting parameter must be optimized depending on the target performance. For example, in the classical noisy compressed sensing problem, we are interested in the value that minimizes the estimation MSE.

Particularizing (130) and (132) to the ℓ1-cost case, we obtain

(142)

where (·)₊ takes the positive part of its argument, and

(143)

where the indicator function of the event inside the brackets appears. Notice that (142) and (143) generalize the expressions found in [18] to the complex case. In this case, we have

(144)

where Q(·) is defined in (102). The derivation of (144) is not completely straightforward, and it is provided in Appendix H.

From (143), we have

(145)

Replacing (144) and (145) into (131a)-(131d), we obtain the fixed-point equations for calculating the decoupled channel parameters for the analysis of the Lasso estimator with a given weighting parameter. In the iid case, using (133a) and (133b), we obtain the same system of equations given in [18], up to a different normalization and the fact that here we consider complex signals. Furthermore, it is immediate to recognize that (133a) corresponds to the state evolution of the AMP with soft thresholding (AMP-ST) as described in [10], where the scalar soft-thresholding function is given by (142) for an arbitrary thresholding parameter. The large-dimensional analysis leading to the state evolution (133a) is rigorously proved in [12] for the case where A is iid Gaussian. Based on this fact, it is tempting to conjecture that the analysis is valid for the general iid case (subject to the usual mild conditions on the matrix element distribution) and that the replica analysis yields correct results also for the more general class of matrices considered in this paper.

In order to obtain an estimate of b (the support of x), a natural approach consists of selecting the nonzero components of the Lasso solution. However, this method yields rather poor results in the Bernoulli-Gaussian case and in other cases where the magnitudes of the nonzero components of x are not bounded away from zero. Instead, in an iterative implementation of the Lasso solver (e.g., using the method in [27], or the AMP-ST), it is possible to generate a noisy version of the Lasso estimate before the soft-thresholding step (see Section IV-F and [10]). This noisy Lasso estimate corresponds to the decoupled channel model before shrinkage, with η given by the fixed-point equations in the Lasso case. Hence, the support recovery error rate takes on the same form already derived for the MAP-SBS [see (105)-(107)], for a different value of η, calculated via (131a)-(131d) for the Lasso case as explained above.
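The body of (142) is elided above; for a complex signal, the standard complex soft-thresholding operator, which is what (142) generalizes to per the surrounding text, is sketched below as an assumption.

```python
import numpy as np

def soft_threshold(y, t):
    """Complex soft-thresholding, the assumed form of (142): shrink the
    magnitude by t, keep the phase, and zero out small entries."""
    mag = np.maximum(np.abs(y) - t, 0.0)
    return mag * np.exp(1j * np.angle(y))
```

As discussed above, the support estimate thresholds the noisy (pre-shrinkage) estimate rather than the output of this operator.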


Fig. 8. Support recovery error rate versus the sampling rate for different estimators: asymptotic results and finite-dimensional simulations. Solid thick line: MAP-SBS, asymptotic. Dotted line: information-theoretic lower bound. Dot-dashed line: thresholded Lasso, asymptotic. Dashed line: thresholded LMMSE, asymptotic. Thin solid line: conjectured AMP-MMSE, corresponding to the right-most fixed point of (17a) and (17b). Finite-dimensional simulations are shown for the thresholded LMMSE estimator (asterisk: Haar sensing matrix; triangle: DFT sensing matrix) and for the thresholded Lasso (lozenge: Haar sensing matrix; star: DFT sensing matrix).

F. Support Recovery Error Rate Examples

In order to illustrate the above results and compare the behavior of different support estimators, we show some numerical examples and compare the theoretical asymptotic results with finite-dimensional simulations. Figs. 8 and 9 show the support recovery error rate versus the sampling rate for a Haar-distributed sensing matrix and a Gaussian-Bernoulli source signal with SNR equal to 20 and 50 dB, respectively.

A few remarks are in order:

1) The MAP-SBS asymptotic distortion is obtained by choosing the fixed-point solution of (17a) and (17b) that minimizes the free energy, as discussed in Section III-D. Instead, if we choose only the right-most fixed point, we obtain the solution of the conjectured state evolution equation corresponding to the AMP-MMSE applied to Haar-distributed sensing matrices. As previously remarked, it is known that such a state evolution equation is exact in the case of iid sensing matrices.

2) The information-theoretic lower bound is obtained by taking the minimum of all the upper bounds on the mutual information rate developed in Theorems 4 and 5, and using it in (14).

3) We show the results of finite-dimensional simulations for the thresholded LMMSE and thresholded Lasso estimators. We considered both random unitary (Haar-distributed) A and the case of a fixed deterministic DFT matrix, i.e., the n-dimensional unitary DFT matrix with elements of magnitude 1/√n. Interestingly, the simulations show that random unitary

Fig. 9. Support recovery error rate versus the sampling rate for different estimators: asymptotic results and finite-dimensional simulations. Solid thick line: MAP-SBS, asymptotic. Dotted line: information-theoretic lower bound. Dot-dashed line: thresholded Lasso, asymptotic. Dashed line: thresholded LMMSE, asymptotic. Thin solid line: conjectured AMP-MMSE, corresponding to the right-most fixed point of (17a) and (17b). Finite-dimensional simulations are shown for the thresholded LMMSE estimator (asterisk: Haar sensing matrix; triangle: DFT sensing matrix) and for the thresholded Lasso (lozenge: Haar sensing matrix; star: DFT sensing matrix).

A and deterministic DFT yield essentially the same performance (up to Monte Carlo simulation fluctuations). This corroborates our conjecture that the asymptotic analysis for Haar-distributed A carries over to the case of a DFT matrix. The case of DFT matrices is particularly relevant for applications, since in many communication and signal processing problems, signals are sparse in the time (respectively, frequency) domain and are randomly sampled in the dual domain, so that a random selection of the rows of a DFT matrix arises as a sensing matrix naturally matched to the problem.

4) As already noticed in several works, the gap between the optimal MAP-SBS estimator and the suboptimal low-complexity estimators grows with SNR (compare Figs. 8 and 9). In contrast, the thresholded LMMSE estimator yields poor performance for all sampling rates, and this is quite insensitive to SNR.

5) In order to solve the complex Lasso, we used the iterative method of [27]. This scheme has slightly lower complexity than AMP-ST and provably converges to the Lasso solution. By comparing the componentwise thresholding step in [27] and the symbol-by-symbol estimator for the decoupled channel model given in (142), it is natural to identify the noisy Lasso solution with the vector

(146)

where the first ingredient is the solution of the iterative algorithm of [27] after convergence, the matrix involved is obtained by taking the nonzero rows of the sampled sensing matrix, and where the scaling uses the


norm of the corresponding column of that matrix. The support recovery error rate shown in Figs. 8 and 9 for the finite-dimensional simulation of the thresholded Lasso is obtained by applying the threshold detector given in (105), with η calculated via the asymptotic fixed-point equations (131a)-(131d), to the components of the vector given in (146). The asymptotic analysis and the finite-dimensional simulation were computed for the same value of the weighting parameter, which must be chosen for each combination of system parameters. Several heuristic methods for this choice are proposed in the literature; we followed [28]. (The optimization of the weighting parameter for the asymptotic case is an interesting topic for further investigation.)

V. CONCLUSION

In the standard compressed sensing model, the sensing matrix is such that B is diagonal with independent components and A has iid coefficients. In addition to this model, we allow the square matrix A to be Haar-distributed (uniformly distributed among all unitary matrices) or, more generally, such that (3) is free from any Hermitian deterministic matrix.

Motivated by applications, in this paper, we have carried out a large-size analysis of:

1) the mutual information between the noisy observations and the Bernoulli-Gaussian input (conditioned on the sensing matrix);

2) the mutual information between the noisy observations and the Gaussian input prior to being subject to random hole-punching.

We have obtained asymptotic formulas using fundamentally different approaches for both mutual informations: the first following a replica-method analysis whose scope we enlarge to encompass the desired class of random matrices, while the second invokes results from freeness and the asymptotic spectral distribution of random matrices.

Depending on the case, the mutual informations are expressed either through the mutual information between a scalar Bernoulli-Gaussian random variable and its Gaussian-contaminated version, or explicitly, through the solution of coupled nonlinear equations. We have also studied how to choose among the solutions of those equations.

Our upper and lower bounds on the mutual informations do not rely on the replica method. Yet, they turn out to give excellent agreement with the replica analysis. Through the analysis of the bounds, we also provide a simple converse which shows that the asymptotic distortion is bounded away from zero regardless of SNR for ρ < γ. For ρ > γ, Wu and Verdú [15] showed that Gaussian iid sensing matrices are asymptotically as effective for compressed sensing as the best nonlinear measurement (or encoder). Here, we have been able to extend that conclusion to the class of sparsely sampled free random matrices.

We have analyzed several decision rules, such as the optimum symbol-by-symbol rule, the Lasso, and the LMMSE estimator, followed by thresholding for support recovery. Those analyses follow the decoupling principle, originally introduced in [13] for iid matrices. Specializing these new results, we recover the iid formulas found in [10], [13], and [18], with the exception of the ML detector analyzed in [10], which is tailored to the case when the number of nonzero coefficients is known at the estimator, while in our analysis that number is binomially distributed.

The important case where A is a deterministic DFT matrix remains open. However, we have provided intuition and simulation evidence to buttress our conjecture that its solution in fact coincides with the case where A is Haar-distributed.

APPENDIX A
PROOF OF THEOREM 1

For any conditional probability assignment meeting the distortion constraint, and any corresponding estimator, (12) and the data processing inequality yield

(147)

(148)

Supremizing, and in view of the fact that the rate-distortion function is monotonically nonincreasing, the result follows.

It is worth emphasizing the totally elementary nature of the proof of Theorem 1, and in particular the fact that it does not involve any type of operational characterization of information-theoretic fundamental coding limits. A different approach, based on those limits and Fano's inequality, is taken in [10] to show Lemma 5 therein.

APPENDIX B
PROOF OF CLAIM 1

We use the notation of Section IV-A: x is the Bernoulli-Gaussian vector built from the iid Gaussian vector s and the iid Bernoulli-γ vector b, and the observation is as in (1). Consider the assumed conditional probability density

(149)

for some assumed noise variance. We also consider an assumed iid prior density on x, denoted by q for simplicity of notation, and let p denote the Bernoulli-Gaussian density of x. Removing the conditioning with respect to x, we obtain

(150)

We wish to calculate the mutual information rate defined in (8), which can be expressed as

(151)

(152)

(153)

  • 4258 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 59, NO. 7, JULY 2013

    where we define , and rec-ognize that (150) can be interpreted as the partition function(from which the notation ) of a statistical mechan-ical system with quenched disorder parameters ,state and unnormalized Boltzmann distribution

.6

The conditions and correspond to the case where the assumed prior and noise variance in the observation model are matched, i.e., they coincide with the true prior and noise variance. However, it is useful to consider the derivation for general and , since this same derivation will apply to the general class of mismatched PMEs defined in Section IV-A. The quantity

    (154)

is the per-component free energy of the underlying physical system. In the following, we compute using the Replica Method of statistical physics, under the so-called Replica Symmetry (RS) assumption [13], [17], [29]–[31]. Summarizing, the method comprises the following steps: since computing the expectation of the log in (154) is usually complicated, we use the identity

    (155)

    for . Then, exchanging limits, we can write

    (156)
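The identity in (155) is presumably the standard replica identity: with Z denoting the partition function,

    \mathbb{E}\log Z \;=\; \lim_{n\to 0}\frac{1}{n}\log\mathbb{E}\!\left[Z^{n}\right] \;=\; \lim_{n\to 0}\frac{\partial}{\partial n}\log\mathbb{E}\!\left[Z^{n}\right],

evaluated by computing \mathbb{E}[Z^{n}] for positive integer n and continuing the result analytically to n \to 0; this is exactly the program carried out in the remainder of this appendix.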

Finally, we evaluate the inner limit in (156) for a positive integer , such that can be seen as the partition function of a -fold Cartesian product system (i.e., parallel replicas of the original system), with state vectors , and the same quenched parameters . In particular, we can write

    (157)

    The next step consists of calculating

(158)

6 In this case, plays the role of the system's Hamiltonian, and is the inverse temperature [13].

Standard Gaussian integration (by completing the squares) yields

    (159)

where , as defined in (3), and where is a rank- matrix defined as follows: let for , and let . Then

    (160)

where denotes an all-ones column vector of appropriate dimension. Next, we need to average with respect to , i.e., with respect to . For this purpose, we apply the generalized Harish–Chandra–Itzykson–Zuber integral [32], [33] and, following the approach of Guionnet and Maïda [34], strengthened by Tanaka [35], we can write

    (161)

    (162)

    (163)

where denotes the R-transform of and denotes the th eigenvalue of .

Our goal now is to evaluate the limit in (163). In order to proceed, we make the common RS assumption and define the empirical correlations

    (164)

of the vectors for . Noticing that the limit (163) is given as the limit of a normalized log-sum, Varadhan's lemma yields that this limit is given by the dominant configuration of the vectors , defined in terms of their empirical correlation matrix . The RS assumption postulates that this dominant configuration satisfies the following symmetric form:

    (165)


In Appendix F, we show that, for in the form (165), the eigenvalues of are given by

    (166)

(167)

(168)

    Therefore, we define

    (169)

    (170)

The argument of the logarithm in (163) can be interpreted as an expectation with respect to , with joint pdf . By the law of large numbers, this measure satisfies a concentration property with respect to the empirical correlations (164). Hence, we can invoke Cramér's large deviation theorem [36] as follows. Since is a function of , the conditional pdf of given is just a multidimensional delta function (i.e., a product of delta functions); hence, we can write

    (171)

    (172)

    (173)

    (174)

where (174) holds in the sense that, when we consider the quantity (173) inside the logarithm in the limit (163), it can be replaced by (174).

The rate function of the measure , defined as

    (175)

    (176)

is given by the Legendre–Fenchel transform of the log-Moment Generating Function (log-MGF) of the random vector , where and , are iid Gaussian RVs, and are independent variables with and . The MGF of is given by

    (177)

    and the rate function is given by

    (178)
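In generic notation, Cramér's theorem gives this rate function as the Legendre–Fenchel conjugate of the log-MGF: for a random vector \mathbf{Q} with log-MGF \Lambda(\boldsymbol{\lambda}) = \log\mathbb{E}\big[e^{\langle\boldsymbol{\lambda},\mathbf{Q}\rangle}\big],

    I(\mathbf{q}) \;=\; \sup_{\boldsymbol{\lambda}}\Big\{\langle\boldsymbol{\lambda},\mathbf{q}\rangle - \Lambda(\boldsymbol{\lambda})\Big\},

which is the structure of (177) and (178).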

Eventually, using this in (174), substituting the resulting expression in the limit (163), and applying Varadhan's lemma, we arrive at the saddle-point condition

    (179)

Now we focus on the calculation of the MGF. Under the RS assumption, the supremum in (179) is achieved for in the form

    (180)

where are parameters. Using the RS form for , we obtain

    (181)

We use the complex circularly symmetric version of the scalar Hubbard–Stratonovich transform [37], [38]:

    (182)

    for and . Choosing , we obtain

    (183)
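In its usual complex circularly symmetric form, the Hubbard–Stratonovich transform linearizes a squared modulus in the exponent at the cost of an auxiliary Gaussian integration:

    e^{|v|^{2}} \;=\; \frac{1}{\pi}\int_{\mathbb{C}} e^{-|z|^{2}+z^{*}v+zv^{*}}\,dz, \qquad v \in \mathbb{C},

which is the mechanism that decouples the replicas here.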

Substituting (183) into (181), after some straightforward algebra, we find

    (184)


Notice that has a circularly symmetric distribution; therefore, and are identically distributed. Hence, we can write

    (185)

Since (185) depends only on , without loss of generality, we redefine the parameter to be in . Also, notice from (185) that .

Following the replica derivation steps outlined at the beginning of this section, we have to determine the saddle-point and achieving the extremal condition in (179) for general , then replace the result in (161), differentiate with respect to , and let . Since the function in (179) is differentiable and admits a minimum and a maximum, by the result of Appendix G, determining the saddle-point , replacing it in (161), differentiating the resulting expression with respect to , and letting yields the same result as replacing in (179) the saddle-point for , denoted by and , differentiating the result with respect to , and letting , where now and are constants independent of .

Differentiating (179) with respect to , we obtain

    (186)

Since we evaluate the saddle-point conditions at , and since the denominator in (186) is , which is equal to 1 at , we can disregard the denominator and focus on the numerator in what follows. Using the expression (170) for , with eigenvalues and given by (166), and noticing that the RS conditions (165) and (180) yield

    (187)

we have that the whole exponent depends only on the real part of . Therefore, we redefine to be a real parameter, differentiate with respect to , , , and , and set the partial derivatives equal to zero. We find the conditions

    (188)

    (189)

    (190)

    (191)

Evaluating these conditions for and noticing that, as vanishes, , we find

    (192)

    (193)

    (194)

(195)

(196)

(197)

where and denotes the first derivative of .

The conditions for and in terms of , and are obtained from (186), recalling that, by definition, , and . In order to obtain more useful expressions for these parameters, we use (197) and (196) in (185) and write

    (198)

    (199)

    (200)

where we define and .

Focusing on the numerator in (186) and following steps similar to the derivation of (198), we obtain the following expressions for the correlation coefficients :

1) For , we have

    (201)


2) For , we introduce a random variable (with the same distribution as any of the s) and independent of . Then, we can write

    (202)

    3) For , we have

    (203)

    4) For , we have

    (204)

Finally, we define a single-letter joint probability distribution and restate the expectations appearing in (202)–(204) in terms of this new single-letter model. Let denote the Bernoulli–Gaussian density of , induced by and by , and let

    (205)

denote the transition probability density of the complex (scalar) circularly symmetric additive white Gaussian noise (AWGN) channel

    (206)

with . Also, define the conditional complex circularly symmetric Gaussian pdf

    (207)

and, using Bayes' rule, consider the a posteriori probability distribution

    (208)

    (209)

The joint single-letter probability distribution of interest for the variables , , and is given by

    (210)

This explains the decoupled-channel single-letter probability model (94).

Now, we can define the conditional mean of given as

    (211)

    (212)

    The corresponding conditional second moment is given by

    (213)

    (214)
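As a concrete numerical illustration of this conditional-mean computation, the sketch below evaluates the scalar PME for a Bernoulli–Gaussian input observed in complex AWGN and checks the resulting mean-square error by Monte Carlo. The parametrization Y = sqrt(s) X + N and the value of the effective gain s are illustrative assumptions only, not the paper's exact normalization of the decoupled channel.

    import numpy as np

    rng = np.random.default_rng(0)
    gamma, s = 0.1, 20.0  # sparsity and assumed effective SNR of the scalar channel

    def pme(y, gamma, s):
        # Densities of Y under S = 0 (pure noise) and S = 1 (Gaussian signal plus noise).
        f0 = np.exp(-np.abs(y) ** 2) / np.pi
        f1 = np.exp(-np.abs(y) ** 2 / (1.0 + s)) / (np.pi * (1.0 + s))
        post1 = gamma * f1 / (gamma * f1 + (1.0 - gamma) * f0)  # P(S = 1 | y)
        # Conditional mean: P(S = 1 | y) times the Gaussian estimate of G given S = 1.
        return post1 * (np.sqrt(s) / (1.0 + s)) * y

    # Simulate Y = sqrt(s) X + N with X = S * G, S ~ Bernoulli(gamma), G, N ~ CN(0, 1).
    n = 200_000
    S = rng.random(n) < gamma
    G = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    N = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    X = S * G
    Y = np.sqrt(s) * X + N
    print("empirical MSE of the PME:", np.mean(np.abs(X - pme(Y, gamma, s)) ** 2))

The empirical error printed by the sketch plays the role of the scalar MMSE term that enters the fixed-point equations below.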

At this point, it is easy to identify the terms and write the expressions (202)–(204) in terms of expectations with respect to the single-letter joint probability measure defined in (210). We have

(215)

(216)

(217)

    (218)


In order to obtain the desired fixed-point equations for the saddle-point that defines the result in (179), we notice that

    (219)

    and that

    (220)

    (221)

Using (188), (189), the equality , and recalling that and , we arrive at the system of fixed-point equations (93a)–(93d). In the matched case, where and , we immediately obtain that , and therefore ; the fixed-point equations then reduce to (17a) and (17b) in Claim 1.
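Numerically, coupled equations of this kind are typically solved by damped fixed-point iteration. The sketch below shows only the generic pattern: since (17a) and (17b) are not reproduced here, the update map F is a hypothetical stand-in (a coupled pair of decreasing maps), not the paper's actual equations.

    import numpy as np

    def solve_fixed_point(F, x0, damping=0.5, tol=1e-12, max_iter=100_000):
        # Damped iteration x <- (1 - d) * x + d * F(x); damping mitigates the
        # oscillations that plain iteration can exhibit at high SNR.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            x_new = (1.0 - damping) * x + damping * np.asarray(F(x))
            if np.max(np.abs(x_new - x)) < tol:
                return x_new
            x = x_new
        raise RuntimeError("no convergence; reduce the damping factor")

    # Hypothetical two-variable system with the qualitative shape of (17a)-(17b).
    c = 5.0
    F = lambda x: np.array([1.0 / (1.0 + c * x[1]), 0.1 / (1.0 + 3.0 * x[0])])
    print(solve_fixed_point(F, [1.0, 0.1]))

When several fixed points coexist, each should be substituted back into the free energy and the minimizing solution retained, as prescribed later in this appendix.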

Using the values that solve (93a)–(93d) in (179), using the trace expression (187), and finally putting everything together into (161) and taking the derivative w.r.t. evaluated at , we eventually obtain the free energy in (154) as given by

    (222)

    (223)

    (224)

    (225)

    Examining each term separately, we have

    (226)

    where we have used the definition of and in (93a) and (93b),respectively, and the relations (219) and (220). For the traceterm, recalling that , we have

    (227)

Finally, for the log-MGF term, we use (198); performing the expectation with respect to (independent and identically distributed as ) first, we obtain

    (228)

    Hence, after some algebra, we have

    (229)

    where we use (207) and define

    (230)

    (231)

It is understood that if (93a)–(93d) have multiple solutions, then the solution that minimizes the free energy should be chosen.

We conclude by showing that can be written in the form (97). Putting together (227) and the last term of (229), and recalling that and that , we have

(232)

Adding and subtracting , and using (219) and the definition of [see (93b)], we have

    (233)

    Recalling that , we obtain

    (234)

    Recalling that , we get

    (235)

    Finally, using and , we arrive at

    (236)

Next, we use (236), the remaining terms of (226), (229), and (222), together with the saddle-point equations (93a)–(93d), to obtain in the form (97).

For the case and , noticing that and , with and given by (17a) and (17b), the free energy takes on the form

(237)

where , and where we used the fact that when and , then , so that

    (238)

Using (237) in the mutual information expression (153), we obtain (16) in Claim 1.


APPENDIX C
PROOF OF THEOREM 2

We start by recalling some transforms in random matrix theory and some related results from [1].

Definition 1: The -transform of a nonnegative random variable is

    (239)

with .

Note that

    (240)

with the lower bound asymptotically tight as .

Definition 2: The Shannon transform of a nonnegative random variable is defined as

    (241)

with .

Assuming that the logarithm in (241) is natural, the -transform and the Shannon transform are related through

    (242)
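For the reader's convenience, in the notation of [1] these objects read explicitly as

    \eta_X(\gamma) \;=\; \mathbb{E}\!\left[\frac{1}{1+\gamma X}\right], \qquad \mathcal{V}_X(\gamma) \;=\; \mathbb{E}\!\left[\log(1+\gamma X)\right], \qquad \gamma \ge 0,

with \eta_X(\gamma) \ge \mathbb{P}[X=0] (asymptotically tight as \gamma \to \infty), and the relation in (242) takes the form

    \gamma\,\frac{d}{d\gamma}\,\mathcal{V}_X(\gamma) \;=\; 1-\eta_X(\gamma),

while the S-transform recalled in Definition 3 below is \Sigma_X(x) = -\frac{1+x}{x}\,\eta_X^{-1}(1+x).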

Also, it is useful to recall here the definition of the S-transform of free probability [1], which is used in some of the proofs that follow.

Definition 3: The S-transform of a nonnegative random variable is defined as

    (243)

where denotes the inverse function of the -transform.

It is common to denote the -transform, the Shannon transform, and the S-transform of the spectral distribution of a sequence of nonnegative-definite random matrices , for , by , , and , respectively. In this case, the lower bound in (240) corresponds to the limiting fraction of zero eigenvalues of .

Theorem 8: Let and be nonnegative asymptotically free random matrices; then, for

    (244)

    In addition, the following implicit relation is also useful:

    (245)

The next two results are instrumental to the proof of Theorem 2.

Theorem 9: Let and be nonnegative asymptotically free random matrices. For , let be the solution of the system of equations:

(246)

(247)

    (248)

    Then, the -transform of is given by

    (249)

Proof: Letting for simplicity of notation and using (245), we have

    (250)

    where

    (251)

    which is equivalent, using Definition 3, to

    (252)

    Letting

    from (250) and (252), Theorem 9 follows immediately.

As a consequence of Theorem 9, we have the following.

Theorem 10: Let and be nonnegative asymptotically free random matrices. The Shannon transform of is given by

    (253)

where and are the solutions of the system (246)–(248), which depend on .

Proof: The proof follows an idea originating in [20] to write the Shannon transform when the -transform is given as the solution of a fixed-point equation: for any differentiable function , the definition of the Shannon transform of an arbitrary nonnegative random variable leads to

    (254)


Since both sides of (253) are equal to zero at , it is sufficient to show that the derivatives with respect to of both sides of (253) coincide. Letting and denote random variables distributed according to the spectral distributions of and , respectively, differentiating w.r.t. the difference between the right and left sides of (253) yields

    (255)

    (256)

    (257)

    (258)

where we used (242) to write the left side of (255); the right side of (255) follows from the definition of the -transform; (257) follows from Theorem 9 for solutions of (246)–(248); and (258) follows again from the equality in (248).

Theorem 2 now follows as an application of Theorem 10 by identifying the terms. We write

    (259)

    (260)

    (261)

    (262)

(263)

(264)

where are solutions of (246)–(248) after replacing by , by , and by . The final expressions (19) and (20) follow by noticing that the spectral distribution of has only two mass points, at zero and at one, with probabilities and , respectively.

APPENDIX D
PROOF OF THEOREMS 4 AND 5

Notations are as in Section I-A, following the observation model (1). In particular, we let and , and .

    Proof of Bound (51):

    We have

    (265)

where the inequality follows by the fact that, conditionally on , the differential entropy of for assigned covariance

    (266)

is maximized by a Gaussian complex circularly symmetric distribution . Recalling the definition of in (3), we have

    (267)

from the definition of the Shannon transform. Hence, (51) follows.
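The maximum-entropy step above is the standard fact that, among complex random vectors with a given covariance \boldsymbol{\Sigma}, differential entropy is maximized by the circularly symmetric Gaussian law:

    h(\mathbf{v}) \;\le\; \log\det(\pi e\,\boldsymbol{\Sigma}),

with equality if and only if \mathbf{v} \sim \mathcal{CN}(\mathbf{0},\boldsymbol{\Sigma}).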

    Proof of Bound (52):

This bound can be regarded as a matched-filter bound on the vector channel with input and output . We can write

    (268)

    (269)

    (270)

    (271)

    (272)

    (273)

where (271) follows from the fact that is iid, in (272) we define to be the th column of , and in (273) we define , conditionally on .

Dividing both sides by , letting , defining the iid variables , and taking the limit, we obtain

    (274)

    (275)

    (276)

where , where by definition , and where in (275) we used Jensen's inequality and the fact that the mutual information , for any distribution of with bounded second moment, is concave in [39].


Proof of Bound (53):

Let , where denotes the th column of . Then, we have

    (277)

    (278)

    (279)

    (280)

where (278) follows by the chain rule and by subtracting the conditioning term, which preserves the mutual information, (279) holds for any linear projection defined by the vector , a function of , and (280) follows by noticing that the arguments of the mutual information no longer depend on .

Next, we choose to be the LMMSE receiver for user in the formally equivalent CDMA system

(281)

    In particular, using , we obtain

    (282)

    We indicate by

(283)

the corresponding multiuser efficiency of the LMMSE detector for user . In the limit of , the residual noise plus interference at the output of the LMMSE detector is marginally Gaussian (we omit the explicit proof of this well-known fact, which holds under the assumptions of our model) [1]. Letting , and denoting by the limiting multiuser efficiency for , we arrive at (53) from (280) by dividing by and taking the limit.
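For reference, writing \mathbf{a}_j for the effective signature of user j (a generic notation, not tied to the paper's exact normalization), with unit-power symbols and noise variance N_0 the LMMSE receiver and its output signal-to-interference ratio take the standard form

    \mathbf{w}_k \;\propto\; \Big(N_0\mathbf{I}+\textstyle\sum_{j\neq k}\mathbf{a}_j\mathbf{a}_j^{H}\Big)^{-1}\mathbf{a}_k, \qquad \mathrm{SIR}_k \;=\; \mathbf{a}_k^{H}\Big(N_0\mathbf{I}+\textstyle\sum_{j\neq k}\mathbf{a}_j\mathbf{a}_j^{H}\Big)^{-1}\mathbf{a}_k,

and the multiuser efficiency is \mathrm{SIR}_k normalized by the interference-free matched-filter SNR \|\mathbf{a}_k\|^{2}/N_0.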

APPENDIX E
PROOF OF THE DECOUPLING PRINCIPLE

Notations and definitions are as in Section IV and Appendix B. We let denote the th components of the random vectors , obeying the joint -variate conditional distribution (84) for given , with given by (82). We are interested in showing that the asymptotic joint marginal distribution of , for some generic index , converges to the joint distribution of the triple given by (86) in Section IV, independent of .

To this purpose, we follow in the footsteps of [13] and consider the calculation of the joint moments for arbitrary integers . Since the moments are uniformly bounded, the th joint marginal distribution is thus uniquely determined, due to Carleman's theorem [40, p. 227]. The desired result will follow upon showing that the moments converge to limits independent of . Furthermore, as we will see, the form of the asymptotic moments yields explicitly the joint distribution of given in (86).

In order to proceed, we define the replicated model, given by the distribution of for given , as

(284)

All expectations in the following derivations are with respect to the joint measure (284). For a function , we define

    (285)

By [13, Lemma 1], if is and does not depend on , then

    (286)

In our case, we let for given , and some replica index . By the symmetry with respect to the replica index and the indices of the vector components, for any , we can write

    (287)

    (288)

    Using the procedure outlined before, we need to calculate

    (289)

As usual in replica derivations, we switch limits and calculate first

    (290)

In passing, we notice that , so that the calculation of (289) is closely related to the calculation of the free energy by the replica method in Appendix B, i.e., to the evaluation of the limit

    (291)


Operating along the same steps leading to (163) in the derivation of the free energy (see Appendix B), we arrive at

    (292)

    (293)

    (294)

    (295)

We notice that the second exponential term in (295) is identical to what appears in the computation of (291) and, following the steps in Appendix B, yields an exponential term given in (170), a function of the empirical correlations of the vectors as defined in (164), collected in the empirical correlation matrix whose form, under the RS assumption, is given in (165).

Invoking the large deviation theorem, we can write

    (296)

    (297)

    (298)

    (299)

where the approximation step holds in the sense that when gets large we can replace the argument of the logarithm in (294) with the quantity in (299).

Using Cramér's theorem, we have that the rate function for the measure

    (300)

    (301)

is given by the Legendre–Fenchel transform

    (302)

    where the relevant MGF for the measure (301) is

    (303)

where we define and , with (Bernoulli- ), (marginal of the assumed prior distribution ) for , and for all .

Plugging (303) into (299), substituting the resulting expression in the limit of appearing in (294), and applying Varadhan's lemma, we arrive at the saddle-point condition

    (304)

Following the replica derivation steps outlined at the beginning of this appendix, we have to determine the saddle-point and achieving the extremal condition in (304) for general , then replace the result in (290), differentiate with respect to , and evaluate the result for . Using again the result of Appendix G, since the function in (290) is differentiable and admits a minimum and a maximum, we can replace the saddle-point of (304) for , denoted by and , and then differentiate the result with respect to and let . Noticing that for the saddle-point condition (304) coincides with the saddle-point condition (179), we have that and coincide with what was derived in Appendix B for the free energy (291). In particular, under the RS assumption, these parameters are given by the fixed-point equations (93a)–(93d). Furthermore, since (304), and therefore the whole limit (290), depends on only through the log-MGF term, using (286) and (287), we arrive at

    (305)

The denominator of (305) is identical to the MGF defined in (177) and, as shown in Appendix B, we have that


. As for the numerator, we follow steps similar to the derivation of (198) and obtain

    (306)

    (307)

    (308)

    (309)

where we define for , for , and where the parameters and are given by the fixed-point equations (93a)–(93d).

Finally, we define a single-letter joint probability distribution and restate the expectations appearing in (309) in terms of this new single-letter model. We let

    (310)

denote the transition probability density of the complex circularly symmetric AWGN channel with Gaussian circularly symmetric fading not known at the receiver

    (311)

with and . Also, we define the conditional pdf

    (312)

and, using Bayes' rule, consider the a posteriori probability distribution

    (313)

    (314)

The joint single-letter probability distribution of interest for the variables , , and is given by

    (315)

With these definitions, it is immediate to identify the moment expression (309) as the joint moment of the single-letter probability distribution (315), by writing

    (316)

    (317)

    (318)

    (319)

    (320)

where the expectation in (320) is with respect to the probability distribution (315).

Summarizing, as far as the joint probability distribution of each component of in (1) and the corresponding component of the PME (matched or mismatched) is concerned, the system decouples asymptotically into a bank of parallel AWGN channels of the form (311), with symbol-by-symbol PME given by

    (321)

for distributed as in (315), where the parameters and are given by (93a)–(93d).

APPENDIX F
EIGENVALUES OF THE MATRIX

The eigenvalues of are readily computed from (160). Notice that the matrix has eigenvalues

    (322)

corresponding to the (normalized) eigenvector , and , corresponding to eigenvectors forming an orthonormal basis of the orthogonal complement of in . It follows that

    (323)

    where

    (324)
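The underlying linear-algebra fact is elementary: a matrix of the form \alpha\,\mathbf{1}\mathbf{1}^{\mathsf{T}}+\beta\,\mathbf{I}_n has eigenvalue \beta+n\alpha on the eigenvector \mathbf{1}/\sqrt{n}, and eigenvalue \beta, with multiplicity n-1, on the orthogonal complement of \mathbf{1}.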


The nonzero eigenvalues of are the same as those of the flipped matrix with

Under the RS assumption, the empirical correlation matrix of the vectors takes on the form

    (325)

Using the orthonormality properties of the columns of , we have

    (326)

Finally, we have that, under the RS assumption and in the limit of large , the eigenvalues of are given by

    (327)

    (328)

    Using the fact that , we have that

    (329)

Therefore, the eigenvalues (327) can be expressed in terms of the correlations and in the form (166).

APPENDIX G
PROPERTY OF STATIONARY POINTS OF MULTIVARIATE FUNCTIONS

Let be a differentiable multivariate function with , and . Let with , with denote the th and th components of and , respectively. We are interested in evaluating

(330)

Let

    (331)

    Then

    (332)

    (333)

    (334)

    with

    (335)

    (336)

    (337)

Under the assumption that the supremum and the infimum are achieved by , by Fermat's theorem, every local extremum of a differentiable function is a stationary point; hence, by their definition, and are such that, for all ,

(338)

(339)

    Hence, (334) becomes

    (340)

    Consequently

    (341)

from which it follows that we are allowed to compute the saddle-point (and hence the fixed-point equation) for , then replace the result in the multivariate function, differentiate the result w.r.t. , and then let .
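In compact form, this is an envelope-theorem statement: under the stated differentiability and attainment assumptions,

    \lim_{n\to 0}\frac{\partial}{\partial n}\Big[\sup_{\mathbf{x}}\inf_{\mathbf{y}}f(\mathbf{x},\mathbf{y},n)\Big] \;=\; \lim_{n\to 0}\frac{\partial}{\partial n}f(\mathbf{x}^{\ast},\mathbf{y}^{\ast},n),

where the saddle-point (\mathbf{x}^{\ast},\mathbf{y}^{\ast}) is computed in the limit n \to 0 and held fixed during the differentiation.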

APPENDIX H
USEFUL FORMULAS

This appendix is devoted to providing methods and explicit formulas to evaluate the quantities appearing in the main results. It is worthwhile to notice that the numerical evaluation of the fixed-point equations and the corresponding free energy is not trivial from a numerical stability viewpoint, especially for large SNR and small sparsity and sampling rate . Therefore, some care must be exercised in order to minimize brute-force numerical integration.

We start by considering the calculation of , for Bernoulli–Gaussian, and , which is instrumental in evaluating (16) and the bounds (52) and (53), for suitable choices of the parameter . We can write

    (342)


where . The expectation in (342) can be calculated by integration in polar coordinates and, after some algebra, takes on the form

    (343)

Finally, both of the above integrals can be efficiently and accurately evaluated by using Gauss–Laguerre quadratures.
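The quadrature step can be illustrated as follows; since the integrands of (342)–(343) are not reproduced here, the example evaluates E[log(1 + snr |G|^2)] for G ~ CN(0,1), which has the required form because |G|^2 is exponentially distributed with unit mean.

    import numpy as np

    # Gauss-Laguerre nodes/weights for integrals of the form
    # \int_0^\infty e^{-t} g(t) dt ~= sum_i w_i g(t_i).
    t, w = np.polynomial.laguerre.laggauss(64)

    snr = 100.0
    quad = np.sum(w * np.log1p(snr * t))

    # Monte Carlo sanity check of the same expectation.
    rng = np.random.default_rng(1)
    mc = np.mean(np.log1p(snr * rng.exponential(size=1_000_000)))
    print(quad, mc)  # the two values agree to several decimal places

A moderate number of nodes (here 64) already gives high accuracy, which is what makes the quadrature preferable to brute-force numerical integration at large SNR.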

Similarly, the MMSE term appearing in (17b) can be calculated as follows. Letting , we have

    where

and where . Notice that for , the observation model becomes jointly Gaussian, and we obtain the usual Gaussian MMSE estimator . The resulting MMSE in the general Bernoulli–Gaussian case is given by

    (344)

    (345)

Performing the integration in polar coordinates, after some algebra we obtain

    (346)

where is known as the Hurwitz–Lerch zeta function [41], defined as

    \Phi(z,s,a) \;=\; \sum_{k=0}^{\infty}\frac{z^{k}}{(k+a)^{s}},

which can also be efficiently evaluated by Gauss–Laguerre quadratures. It is immediate to check that for (jointly Gaussian case), we have

as expected.

In order to evaluate in (16), it is useful to have the integral of the R-transform in closed form. For the case of with iid elements, using (24), we find, trivially:

    (347)

    For the case of Haar-distributed , using (37), we find

    (348)

where .

We conclude by providing the derivation of the closed-form expression of for the Lasso estimator, given in (144). We have

    (349)

    Recalling the expression of in (142), we have

    (350)

    In order to solve the integrals in (350), we use

    (351)

    (352)

    (353)
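Although (351)–(353) are not reproduced here, integrals of this kind are typically the standard Gaussian tail identities

    \int_{c}^{\infty}\phi(x)\,dx = Q(c), \qquad \int_{c}^{\infty}x\,\phi(x)\,dx = \phi(c), \qquad \int_{c}^{\infty}x^{2}\phi(x)\,dx = c\,\phi(c)+Q(c),

where \phi is the standard normal density and Q the Gaussian tail function; they reduce the piecewise (soft-threshold) expectations in (350) to closed form.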

By applying the above integrals in (350), after some manipulation we obtain

    (354)

with and .

Next, we calculate the expectation as follows:

    (355)


We notice that, given , are jointly Gaussian, with mean zero and covariance matrix

    Then, given is Gaussian with mean

    and variance

Using iterated expectation, we can calculate the expectation in (355) as

    (356)

Using again the integrals (351)–(353), we obtain

    (357)

Finally, replacing all terms in (349), after some simplifications, we obtain (144).

REFERENCES

[1] A. M. Tulino and S. Verdú, "Random matrix theory and wireless communications," Found. Trends Commun. Inf. Theory, vol. 1, no. 1, pp. 1–184, 2004.
[2] S. Aeron, V. Saligrama, and M. Zhao, "Information theoretic bounds for compressed sensing," IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 5111–5130, Oct. 2010.
[3] M. Akcakaya and V. Tarokh, "Shannon-theoretic limits on noisy compressive sampling," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 492–504, Jan. 2009.
[4] A. K. Fletcher, S. Rangan, and V. K. Goyal, "Necessary and sufficient conditions for sparsity pattern recovery," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5758–5772, Dec. 2009.
[5] K. R. Rad, "Nearly sharp sufficient conditions on exact sparsity pattern recovery," IEEE Trans. Inf. Theory, vol. 57, no. 7, pp. 4672–4679, Jul. 2011.
[6] M. J. Wainwright, "Information theoretic limitations on sparsity recovery in the high-dimensional and noisy setting," IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5728–5741, Dec. 2009.
[7] W. Wang, M. J. Wainwright, and K. Ramchandran, "Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices," IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2967–2979, Jun. 2010.
[8] G. Reeves, "Sparsity pattern recovery in compressed sensing," Ph.D. dissertation, Dept. Electr. Eng. Comput. Sci., Univ. California, Berkeley, CA, USA, 2011.
[9] G. Reeves and M. Gastpar, "Fundamental tradeoffs for sparsity pattern recovery," Jun. 2010 [Online]. Available: arXiv:1006.3128v1
[10] G. Reeves and M. Gastpar, "The sampling rate-distortion tradeoff for sparsity pattern recovery in compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3065–3092, May 2012.
[11] G. Reeves and M. Gastpar, "Approximate sparsity pattern recovery: Information-theoretic lower bounds," Feb. 2010 [Online]. Available: arXiv:1002.4458v1 [cs.IT]
[12] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, Feb. 2011.
[13] D. Guo and S. Verdú, "Randomly spread CDMA: Asymptotics via statistical physics," IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 1983–2010, Jun. 2005.
[14] D. L. Donoho, "Precise optimality in compressed sensing: Rigorous theory and ultra fast algorithms," presented at the Workshop Inf. Theory Appl., La Jolla, CA, USA, 2011.
[15] Y. Wu and S. Verdú, "Optimal phase transitions in compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6241–6263, Oct. 2012.
[16] A. M. Tulino, G. Caire, S. Shamai, and S. Verdú, "Capacity of channels with frequency-selective and time-selective fading," IEEE Trans. Inf. Theory, vol. 56, no. 3, pp. 1187–1215, Mar. 2010.
[17] T. Tanaka, "A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors," IEEE Trans. Inf. Theory, vol. 48, no. 11, pp. 2888–2910, Nov. 2002.
[18] S. Rangan, A. K. Fletcher, and V. K. Goyal, "Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1902–1923, Mar. 2012.
[19] D. Guo, D. Baron, and S. Shamai (Shitz), "A single-letter characterization of optimal noisy compressed sensing," presented at the 47th Annu. Allerton Conf. Commun., Control, Comput., Monticello, IL, USA, Sep. 30–Oct. 2, 2009.
[20] S. Shamai and S. Verdú, "The effect of frequency-flat fading on the spectral efficiency of CDMA," IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1302–1327, May 2001.
[21] Y. Wu and S. Verdú, "Rényi information dimension: Fundamental limits of almost lossless analog compression," IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3721–3747, Aug. 2010.
[22] R. Tibshirani, "Regression shrinkage and selection via the Lasso," J. Roy. Statist. Soc. Ser. B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[23] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18915, 2009.
[24] Y. Wu and S. Verdú, "MMSE dimension," IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 4857–4879, Aug. 2011.
[25] B. Wohlberg, "Noise sensitivity of sparse signal representations: Reconstruction error bounds for the inverse problem," IEEE Trans. Signal Process., vol. 51, no. 12, pp. 3053–3060, Dec. 2003.
[26] D. L. Donoho, M. Elad, and V. N. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 6–18, Jan. 2006.
[27] G. N. Lilis, D. Angelosante, and G. B. Giannakis, "Sound field reproduction using the Lasso," IEEE Trans. Audio, Speech Lang. Process., vol. 18, no. 8, pp. 1902–1912, Nov. 2010.
[28] S. Sardy, A. Bruce, and P. Tseng, "Block coordinate relaxation methods for nonparametric wavelet denoising," J. Comput. Graph. Statist., vol. 9, pp. 361–379, 2000.
[29] D. Guo and T. Tanaka, "Generic multiuser detection and statistical physics," in Advances in Multiuser Detection, M. L. Honig, Ed. New York, NY, USA: Wiley-IEEE Press, 2009.
[30] M. Mézard and A. Montanari, Information, Physics, and Computation. Oxford, U.K.: Oxford Univ. Press, 2009.
[31] H. Nishimori, Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford, U.K.: Oxford Univ. Press, 2001.
[32] Harish-Chandra, "Differential operators on a semi-simple Lie algebra," Amer. J. Math., vol. 79, pp. 87–120, 1957.
[33] C. Itzykson and J. B. Zuber, "Planar approximation 2," J. Math. Phys., vol. 21, pp. 411–421, 1980.
[34] A. Guionnet and M. Maïda, "A Fourier view on the R-transform and related asymptotics of spherical integrals," J. Funct. Anal., vol. 222, pp. 435–490, 2005.
[35] T. Tanaka, "Asymptoti