The horseshoe estimator for sparse signals


Page 1: The horseshoe estimator for sparse signals

The horseshoe estimator for sparse signals

CARLOS M. CARVALHO
NICHOLAS G. POLSON

JAMES G. SCOTT

Biometrika (2010)

Presented by Eric Wang, 10/14/2010

Page 2: The horseshoe estimator for sparse signals

Overview

• This paper proposes a highly tractable horseshoe estimator that is more robust and adaptive to different sparsity patterns than existing approaches.

• Two theorems are proved, characterizing the proposed estimator's tail robustness and demonstrating a super-efficient rate of convergence to the true sampling density in sparse situations.

• The proposed estimator's performance is demonstrated using both real and simulated data. The authors show that its answers correspond quite closely to those obtained by Bayesian model averaging.

Page 3: The horseshoe estimator for sparse signals

• Consider a p-dimensional vector of observations y whose underlying mean vector θ is sparse. The authors propose the following model for estimation and prediction:

(y_i | θ_i) ~ N(θ_i, σ²), (θ_i | λ_i, τ) ~ N(0, λ_i² τ²), λ_i ~ C⁺(0, 1),

where C⁺(0, a) denotes a half-Cauchy distribution on the positive reals with location 0 and scale parameter a.

• The name "horseshoe prior" arises from the observation that, for the fixed values τ = σ = 1,

E(θ_i | y_i) = {1 − E(κ_i | y_i)} y_i,

where κ_i = 1/(1 + λ_i²) is the amount of shrinkage toward zero, a posteriori. κ_i has a horseshoe-shaped Beta(1/2, 1/2) prior (a simulation sketch of the hierarchy follows).

The horseshoe estimator
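The hierarchy above is easy to forward-simulate. Below is a minimal sketch in Python/SciPy (illustrative, not the authors' code), with σ = τ = 1, showing how half-Cauchy local scales produce a handful of large means and many near zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = 1000

# Local scales: lambda_i ~ C+(0, 1) is the absolute value of a standard Cauchy.
lam = np.abs(stats.cauchy.rvs(size=p, random_state=rng))

# theta_i | lambda_i ~ N(0, lambda_i^2), then y_i | theta_i ~ N(theta_i, 1).
theta = rng.normal(0.0, lam)
y = theta + rng.normal(size=p)

# Heavy-tailed local scales yield a few very large means and many tiny ones.
print(np.quantile(np.abs(theta), [0.5, 0.9, 0.99]))
```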

Page 4: The horseshoe estimator for sparse signals

The horseshoe estimator

• The meaning of κ_i is as follows: κ_i ≈ 0 yields virtually no shrinkage and describes signals, while κ_i ≈ 1 yields near-total shrinkage and (hopefully) describes noise.

• At right is the prior on the shrinkage coefficient κ_i: the Beta(1/2, 1/2) density, unbounded at both 0 and 1, which gives the horseshoe its name (a quick numerical check follows).
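The Beta(1/2, 1/2) claim is easy to check by simulation; the sketch below (illustrative, not from the paper) draws λ_i ~ C⁺(0, 1), transforms to κ_i = 1/(1 + λ_i²), and compares a histogram of the draws with the Beta(1/2, 1/2) density:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam = np.abs(stats.cauchy.rvs(size=200_000, random_state=rng))
kappa = 1.0 / (1.0 + lam**2)  # shrinkage weight implied by each local scale

# Columns: bin midpoint, empirical density of kappa, Beta(1/2, 1/2) density.
hist, edges = np.histogram(kappa, bins=20, range=(0.0, 1.0), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.c_[mid, hist, stats.beta.pdf(mid, 0.5, 0.5)])
```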

Page 5: The horseshoe estimator for sparse signals

The horseshoe density function

• The horseshoe density lacks an analytic form, but very tight bounds are available:

Theorem 1. The univariate horseshoe density p(θ) satisfies the following:
(a) lim_{θ→0} p(θ) = ∞;
(b) for θ ≠ 0,

(K/2) log(1 + 4/θ²) < p(θ) < K log(1 + 2/θ²),

where K = 1/√(2π³).

• Alternatively, it is possible to integrate over the global scale τ, yielding a joint prior for θ, though the dependence among the θ_i's that this induces causes more issues. Therefore the authors do not take this approach. (The Theorem 1 bounds are checked numerically below.)
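The Theorem 1 bounds can be verified numerically. The sketch below (a check of the statement as reproduced here, not the authors' code) evaluates the horseshoe marginal density by quadrature over the half-Cauchy mixing distribution with τ = 1 and prints it between the claimed lower and upper bounds:

```python
import numpy as np
from scipy import integrate, stats

K = 1.0 / np.sqrt(2.0 * np.pi**3)  # constant from Theorem 1

def horseshoe_pdf(theta):
    # p(theta) = int_0^inf N(theta; 0, lam^2) * (2/pi) / (1 + lam^2) dlam
    f = lambda lam: stats.norm.pdf(theta, scale=lam) * (2.0 / np.pi) / (1.0 + lam**2)
    return integrate.quad(f, 0.0, np.inf)[0]

for theta in [0.1, 0.5, 1.0, 2.0, 5.0]:
    lower = (K / 2.0) * np.log(1.0 + 4.0 / theta**2)
    upper = K * np.log(1.0 + 2.0 / theta**2)
    print(f"theta={theta:4.1f}: {lower:.5f} < {horseshoe_pdf(theta):.5f} < {upper:.5f}")
```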

Page 6: The horseshoe estimator for sparse signals

Horseshoe estimator for sparse signals

Page 7: The horseshoe estimator for sparse signals

Review of similar methods

• Scott & Berger (2006) studied the discrete mixture

p(θ_i) = w N(θ_i; 0, τ²) + (1 − w) δ₀,

where w is the prior inclusion probability and δ₀ is a point mass at zero.

• Tipping (2001) studied the Student-t prior, which is defined by an inverse-gamma mixing density on the variances, λ_i² ~ IG(a, b).

• The double-exponential prior (the Bayesian lasso) has an exponential mixing density on λ_i²; a normal mixed over an exponentially distributed variance yields the double-exponential distribution.

Page 8: The horseshoe estimator for sparse signals

Review of similar methods

• The normal-Jeffreys prior is an improper prior induced by placing Jeffreys' prior on each variance term,

p(λ_i²) ∝ 1/λ_i²,

leading to p(θ_i) ∝ 1/|θ_i|. This choice is commonly used in the absence of a global scale parameter.

• The Strawderman-Berger prior does not have an analytic form, but arises from assuming κ_i ~ Beta(1/2, 1), with κ_i = 1/(1 + λ_i²).

• The normal-exponential-gamma family of priors generalizes the lasso specification by using a gamma distribution to mix over the exponential rate parameter, leading to κ_i ~ Beta(1, c). (A sampling-based comparison of these scale mixtures follows.)
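Apart from the discrete mixture, each prior in this review is a normal scale mixture, so their tails can be compared by drawing the variance from the stated mixing density first. In the sketch below the parameter choices (a = b = 1, unit exponential rate) are illustrative, not taken from the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200_000

lam2_t = stats.invgamma.rvs(a=1.0, scale=1.0, size=n, random_state=rng)  # Student-t
lam2_de = rng.exponential(scale=1.0, size=n)                             # double-exponential
lam_hs = np.abs(stats.cauchy.rvs(size=n, random_state=rng))              # horseshoe

draws = {
    "student-t": rng.normal(0.0, np.sqrt(lam2_t)),
    "double-exp": rng.normal(0.0, np.sqrt(lam2_de)),
    "horseshoe": rng.normal(0.0, lam_hs),
}
for name, th in draws.items():
    # Heavier tails put visibly more mass on large |theta|.
    print(f"{name:>10}: P(|theta| > 5) = {np.mean(np.abs(th) > 5):.4f}")
```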

Page 9: The horseshoe estimator for sparse signals

Review of similar methods

[Slide table comparing the priors above on two criteria: shrinkage of noise and tail robustness to large signals.]

Page 10: The horseshoe estimator for sparse signals

Robustness to large signals

• Theorem 2. Let p(y | θ) be the likelihood, and suppose that p(θ) is a zero-mean scale mixture of normals, (θ | λ²) ~ N(0, λ²), with λ² having proper prior p(λ²). Assume further that the likelihood and p(θ) are such that the marginal density m(y) = ∫ p(y | θ) p(θ) dθ is finite for all y. The theorem defines three pseudo-densities, which may be improper, and expresses the predictive score (d/dy) log m(y), and hence the posterior mean, in terms of them.

Page 11: The horseshoe estimator for sparse signals

• If p(y | θ) is a Gaussian likelihood, then the result of Theorem 2 reduces to

E(θ | y) = y + (d/dy) log m(y).

• A key consequence of Theorem 2 is that if the prior on θ is heavy-tailed enough that the derivative of its log density is bounded, then the derivative of the log predictive density is bounded as well and tends to 0 for large |y|. This happens for heavy-tailed priors, including the proposed horseshoe prior, and yields E(θ | y) ≈ y for large |y|: large signals are left essentially unshrunk (verified numerically below).

Robustness to large signals
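The Gaussian-likelihood identity can be verified directly. In the sketch below (τ = 1, illustrative), θ is integrated out analytically, so y | λ ~ N(0, 1 + λ²); the posterior mean computed by quadrature matches y + (d/dy) log m(y), and both approach y as the signal grows:

```python
import numpy as np
from scipy import integrate, stats

def half_cauchy(lam):
    return (2.0 / np.pi) / (1.0 + lam**2)

def marginal(y):
    f = lambda lam: stats.norm.pdf(y, scale=np.sqrt(1 + lam**2)) * half_cauchy(lam)
    return integrate.quad(f, 0.0, np.inf)[0]

def post_mean(y):
    # E(theta | y) = y * E(1 - kappa | y), with kappa = 1 / (1 + lambda^2).
    f = lambda lam: (lam**2 / (1 + lam**2)) * stats.norm.pdf(y, scale=np.sqrt(1 + lam**2)) * half_cauchy(lam)
    return y * integrate.quad(f, 0.0, np.inf)[0] / marginal(y)

for y in [0.5, 2.0, 5.0, 10.0]:
    eps = 1e-4  # central difference approximates the predictive score
    score = (np.log(marginal(y + eps)) - np.log(marginal(y - eps))) / (2 * eps)
    print(f"y={y:5.1f}  E(theta|y)={post_mean(y):8.4f}  y + dlogm/dy={y + score:8.4f}")
```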

Page 12: The horseshoe estimator for sparse signals

The horseshoe score function

• Theorem 3. Suppose y ~ N(θ, 1). Let m_τ(y) denote the predictive density under the horseshoe prior for known scale parameter τ, i.e. m_τ(y) = ∫ p(y | θ) p_τ(θ) dθ, where (θ | λ, τ) ~ N(0, λ²τ²) and λ ~ C⁺(0, 1). Then |(d/dy) log m_τ(y)| is bounded by a constant that depends upon τ, and lim_{|y|→∞} (d/dy) log m_τ(y) = 0.

• Corollary: lim_{|y|→∞} {E(θ | y) − y} = 0, i.e. the horseshoe estimator is tail-robust.

• Although the horseshoe prior has no analytic form, it does lead to the following posterior mean:

E(θ | y) = y [1 − 2Φ₁(1/2, 1, 5/2, y²/2, 1 − 1/τ²) / {3Φ₁(1/2, 1, 3/2, y²/2, 1 − 1/τ²)}],

where Φ₁ is a degenerate hypergeometric function of two variables.

Page 13: The horseshoe estimator for sparse signals

Estimating τ

• The conditional posterior distribution of τ has a convenient approximate form when the dimensionality p is large.

• This yields, approximately, a distribution for τ whose concentration is governed by how many observations are left unshrunk.

• If most observations are shrunk toward 0, then τ will be small with high probability.

Page 14: The horseshoe estimator for sparse signals

Comparison to double exponential
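The figure on this slide contrasts the two posterior means. The behavior can be reproduced by one-dimensional quadrature, since both priors are normal variance mixtures: v = λ² has density (1/π) v^(−1/2)/(1 + v) under the horseshoe and an Exp(scale = 2) density for the unit-scale double exponential. A rough sketch (not the paper's figure code):

```python
import numpy as np
from scipy import integrate, stats

def hs_mix(v):
    # Density of v = lambda^2 when lambda ~ C+(0, 1).
    return (1.0 / np.pi) / (np.sqrt(v) * (1.0 + v))

def de_mix(v):
    # Exp(scale=2) mixing density yields the Laplace(0, 1) prior.
    return 0.5 * np.exp(-0.5 * v)

def post_mean(y, mix):
    # y | v ~ N(0, 1 + v) and E(theta | y, v) = v / (1 + v) * y.
    w = lambda v: stats.norm.pdf(y, scale=np.sqrt(1 + v)) * mix(v)
    num = integrate.quad(lambda v: (v / (1 + v)) * w(v), 0.0, np.inf)[0]
    den = integrate.quad(w, 0.0, np.inf)[0]
    return y * num / den

for y in [1.0, 3.0, 6.0, 10.0]:
    print(f"y={y:5.1f}  horseshoe bias={y - post_mean(y, hs_mix):6.3f}  "
          f"double-exp bias={y - post_mean(y, de_mix):6.3f}")
```

For large y the double-exponential bias levels off near a constant, while the horseshoe bias decays toward zero, which is the tail-robustness property of Theorems 2 and 3.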

Page 15: The horseshoe estimator for sparse signals

Super-efficient convergence

• Theorem 4. Suppose the true sampling model is y ~ N(θ₀, 1). Then:

(1) For the predictive density under the horseshoe prior, the optimal rate of convergence of the Kullback-Leibler risk when θ₀ = 0 is super-efficient, with a bound involving a constant b. When θ₀ ≠ 0, the optimal rate is the standard one.

(2) Suppose q(θ) is any other prior density that is continuous, bounded above, and strictly positive on a neighborhood of the true value θ₀. For the predictive density under q, the optimal rate of convergence, regardless of θ₀, is the standard rate; no such prior can achieve the super-efficient rate at θ₀ = 0.

Page 16: The horseshoe estimator for sparse signals

Example: simulated data

• Data were generated from a sparse normal-means model (a minimal re-creation follows).
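A minimal stand-in for such an experiment (not the paper's exact design, which also estimates τ and σ): generate a sparse mean vector, apply the horseshoe posterior mean with τ fixed at 1, and compare squared error with the unshrunk maximum-likelihood estimate:

```python
import numpy as np
from scipy import integrate, stats

rng = np.random.default_rng(42)
p, n_signals = 100, 10
theta = np.zeros(p)
theta[:n_signals] = rng.normal(0.0, 5.0, size=n_signals)  # a few large signals
y = theta + rng.normal(size=p)

def post_mean(yi):
    # Horseshoe posterior mean via quadrature over the half-Cauchy local scale.
    hc = lambda lam: (2.0 / np.pi) / (1.0 + lam**2)
    w = lambda lam: stats.norm.pdf(yi, scale=np.sqrt(1 + lam**2)) * hc(lam)
    num = integrate.quad(lambda l: (l**2 / (1 + l**2)) * w(l), 0.0, np.inf)[0]
    return yi * num / integrate.quad(w, 0.0, np.inf)[0]

theta_hat = np.array([post_mean(yi) for yi in y])
print("MSE, MLE:      ", np.mean((y - theta) ** 2))
print("MSE, horseshoe:", np.mean((theta_hat - theta) ** 2))
```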

Page 17: The horseshoe estimator for sparse signals

Example: Vanguard mutual-fund data

• Here the authors show how the horseshoe can provide a regularized estimate of a large covariance matrix whose inverse may be sparse.

• The Vanguard mutual-fund dataset contains n = 86 weekly returns for p = 59 funds.

• Suppose the observation matrix is Y = (y₁, …, y_n)ᵀ, with each p-dimensional vector y_i drawn from a zero-mean Gaussian with covariance matrix Σ.

• We will model the Cholesky decomposition of Σ.

Page 18: The horseshoe estimator for sparse signals

Example: Vanguard mutual-fund data

• The goal is to estimate the ensemble of regression models in the implied triangular system, in which y_j, the j-th column of Y, is regressed on the columns that precede it (a structural sketch follows).

• The regression coefficients are assumed to have a horseshoe prior, and posterior means were computed using MCMC.
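The triangular system has a simple mechanical structure, sketched below. Ordinary least squares stands in for the horseshoe posterior means that the authors compute by MCMC, and random data stand in for the returns matrix; only the bookkeeping is the point here:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 86, 59                 # dimensions of the Vanguard data
Y = rng.normal(size=(n, p))   # placeholder for the weekly-returns matrix

B = np.zeros((p, p))          # strictly lower-triangular coefficients
resid_var = np.zeros(p)
for j in range(p):
    if j == 0:
        resid = Y[:, 0]       # first column has no predecessors
    else:
        beta, *_ = np.linalg.lstsq(Y[:, :j], Y[:, j], rcond=None)
        B[j, :j] = beta
        resid = Y[:, j] - Y[:, :j] @ beta
    resid_var[j] = resid.var()

# (I - B) y = eps with independent components, so
# Sigma^{-1} = (I - B)' diag(1/resid_var) (I - B); zeros in B give sparsity.
T = np.eye(p) - B
Omega = T.T @ np.diag(1.0 / resid_var) @ T
print(Omega.shape)
```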

Page 19: The horseshoe estimator for sparse signals

Conclusions

• This paper introduces the horseshoe prior as a good default prior for sparse problems.

• Empirically, the model performs similarly to Bayesian model averaging, the current standard.

• The model exhibits strong global shrinkage and robust local adaptation to signals.