
Covariance estimation with Cholesky decomposition and generalized linear model

Bo Chang

Graphical Models Reading Group

May 22, 2015


Modified Cholesky decomposition

Goal: Find a re-parameterization of a covariance matrix that is unconstrained and statistically interpretable.

Assume $Y = (Y_1, \ldots, Y_p)'$ is an ordered (time-ordered) random vector with mean 0 and covariance matrix $\Sigma$. Regress each $Y_t$ on its predecessors:

$$Y_t = \sum_{j=1}^{t-1} \phi_{t,j} Y_j + \varepsilon_t.$$

Let $\sigma_t^2 = \mathrm{Var}(\varepsilon_t)$ and

$$\mathrm{Cov}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2) = D.$$


Modified Cholesky decomposition

Rearranging

$$Y_t = \sum_{j=1}^{t-1} \phi_{t,j} Y_j + \varepsilon_t,$$

we have $TY = \varepsilon$, where

$$T = \begin{pmatrix} 1 & & & & \\ -\phi_{2,1} & 1 & & & \\ -\phi_{3,1} & -\phi_{3,2} & 1 & & \\ \vdots & \vdots & & \ddots & \\ -\phi_{p,1} & -\phi_{p,2} & \cdots & -\phi_{p,p-1} & 1 \end{pmatrix}.$$

Then $\mathrm{Cov}(TY) = \mathrm{Cov}(\varepsilon) = T \Sigma T' = D$.


Modified Cholesky decomposition

Definition: For a positive-definite covariance matrix $\Sigma$, its modified Cholesky decomposition is

$$T \Sigma T' = D,$$

where $T$ is a unique lower-triangular matrix with ones on its diagonal and $D$ is a unique diagonal matrix.

The precision matrix can be written as

$$\Sigma^{-1} = T' D^{-1} T.$$

T is unconstrained and statistically meaningful.

T and D can be fitted by regressing each variable $Y_t$ on its predecessors, as in the sketch below.
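To make the regression interpretation concrete, here is a minimal numerical sketch (my illustration, not from the talk) that fits T and D by ordinary least squares and checks that $T S T'$ comes out diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)     # an arbitrary positive-definite covariance
Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y -= Y.mean(axis=0)                 # center; the slides assume mean 0

S = Y.T @ Y / n                     # sample covariance (MLE form)
T = np.eye(p)
d = np.empty(p)
d[0] = S[0, 0]                      # Y_1 has no predecessors
for t in range(1, p):
    phi, *_ = np.linalg.lstsq(Y[:, :t], Y[:, t], rcond=None)
    resid = Y[:, t] - Y[:, :t] @ phi
    T[t, :t] = -phi                 # row t of T holds -phi_{t,j}
    d[t] = resid @ resid / n        # sigma_t^2 = Var(eps_t)

# T S T' is diagonal because each OLS residual is orthogonal to all earlier Y_j.
assert np.allclose(T @ S @ T.T, np.diag(d))
```

Each residual is orthogonal to every earlier variable, which is exactly why the off-diagonal entries of $T S T'$ vanish.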


Sparse estimation

k-banding: fit an AR(k) model,

$$Y_t = \sum_{i=1}^{k} \phi_{t,t-i} Y_{t-i} + \varepsilon_t.$$

The resulting estimate of the precision matrix is also k-banded.
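A hedged sketch of the banded variant (again my illustration; the function and variable names are mine), restricting each regression to the k nearest predecessors:

```python
import numpy as np

def banded_cholesky_fit(Y, k):
    """T and D from AR(k)-style regressions on centered data Y (n x p)."""
    n, p = Y.shape
    T, d = np.eye(p), np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n
    for t in range(1, p):
        lo = max(0, t - k)                         # k closest predecessors only
        phi, *_ = np.linalg.lstsq(Y[:, lo:t], Y[:, t], rcond=None)
        resid = Y[:, t] - Y[:, lo:t] @ phi
        T[t, lo:t] = -phi
        d[t] = resid @ resid / n
    return T, d

rng = np.random.default_rng(1)
Y = rng.standard_normal((500, 8))
Y -= Y.mean(axis=0)
T, d = banded_cholesky_fit(Y, k=2)
Omega = T.T @ np.diag(1.0 / d) @ T                 # precision estimate, k-banded
```

Since T is k-banded and D is diagonal, $T' D^{-1} T$ is k-banded as well, matching the slide's claim.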


Sparse estimation

Nonparametric estimation under k-banding: Wu and Pourahmadi (2003) used local polynomial estimators to smooth the subdiagonals of T. The model is

$$\sum_{j=0}^{k} f_{j,p}(t/p)\, Y_{t-j} = \sigma_p(t/p)\, \varepsilon_t,$$

where $f_{0,p}(\cdot) \equiv 1$, $f_{j,p}(\cdot)$ and $\sigma_p(\cdot)$ are continuous functions on $[0, 1]$, and the $\varepsilon_t$ are independent with mean 0 and variance 1. In the Cholesky parameterization,

$$\phi_{t,t-j} = f_{j,p}(t/p), \qquad \sigma_t = \sigma_p(t/p).$$
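As a rough illustration of the smoothing step, the sketch below fits a plain global polynomial (a simplification standing in for the local polynomial estimators of Wu and Pourahmadi) to each subdiagonal of a raw estimate of T; the function name and the degree are my choices:

```python
import numpy as np

def smooth_subdiagonals(T, k, degree=3):
    """Replace each of the first k subdiagonals of T by polynomial fits in t/p."""
    p = T.shape[0]
    T_smooth = np.eye(p)
    for j in range(1, k + 1):
        rows = np.arange(j, p)            # rows where the j-th subdiagonal exists
        raw = T[rows, rows - j]           # raw entries, i.e. -phi_{t,t-j}
        deg = min(degree, len(rows) - 1)
        coef = np.polyfit(rows / p, raw, deg)
        T_smooth[rows, rows - j] = np.polyval(coef, rows / p)
    return T_smooth
```

The point of the smoothing is that the subdiagonal entries are treated as noisy evaluations of a smooth function of $t/p$, so borrowing strength along each band stabilizes the estimate.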


Sparse estimation

Lasso penalty: Huang et al. (2006)

Minimize

$$n \log|\Sigma| + n\,\mathrm{tr}(D^{-1} T S T') + \lambda \sum_{t=2}^{p} \sum_{j=1}^{t-1} |\phi_{t,j}|.$$

Zeros appear in T with no regular pattern.

Sparsity of the precision matrix is not guaranteed.
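A sketch of the row-by-row idea using scikit-learn's Lasso as an off-the-shelf stand-in for the penalised-likelihood algorithm of Huang et al. (the objective differs in detail: sklearn scales the penalty by 1/n and uses a least-squares loss rather than the full likelihood, so alpha only roughly plays the role of lambda):

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_cholesky_fit(Y, alpha=0.1):
    """Sparse T and D via an l1 penalty on each row's regression coefficients."""
    n, p = Y.shape
    T, d = np.eye(p), np.empty(p)
    d[0] = Y[:, 0] @ Y[:, 0] / n
    for t in range(1, p):
        fit = Lasso(alpha=alpha, fit_intercept=False).fit(Y[:, :t], Y[:, t])
        resid = Y[:, t] - Y[:, :t] @ fit.coef_
        T[t, :t] = -fit.coef_           # zeros can land anywhere in the row
        d[t] = resid @ resid / n
    return T, d
```

Because the penalty shrinks coefficients individually, the zeros land anywhere in each row, which is the slide's caveat about irregular patterns.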


Sparse estimation

Nested lasso penalty / adaptive banding: Levina et al. (2008)

Minimize

$$n \log|\Sigma| + n\,\mathrm{tr}(D^{-1} T S T') + \lambda \sum_{t=2}^{p} P(\phi_t),$$

$$P(\phi_t) = |\phi_{t,t-1}| + \frac{|\phi_{t,t-2}|}{|\phi_{t,t-1}|} + \cdots + \frac{|\phi_{t,1}|}{|\phi_{t,2}|},$$

where $0/0$ is defined to be zero.

This selects the best model that regresses the jth variable on its $k$ closest predecessors, where $k = k_j$ depends on $j$.
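The penalty itself is easy to compute directly; a small sketch (mine) that also shows why it enforces contiguous banding: a ratio with a zero denominator is infinite unless its numerator is zero too.

```python
import numpy as np

def nested_penalty(phi):
    """P(phi_t) for phi = (phi_{t,1}, ..., phi_{t,t-1}); 0/0 is taken as 0."""
    a = np.abs(np.asarray(phi, dtype=float))[::-1]  # closest predecessor first
    total = a[0]
    for prev, cur in zip(a[:-1], a[1:]):
        if cur == 0.0:
            continue                # implements the 0/0 = 0 convention
        total += cur / prev         # inf when prev == 0 but cur != 0
    return total

# A contiguous band (zeros only at the far end) stays finite:
print(nested_penalty([0.0, 0.0, 0.4, 0.9]))   # 0.9 + 0.4/0.9
# A gap inside the band blows up, so estimated bands are contiguous:
print(nested_penalty([0.3, 0.0, 0.4, 0.9]))   # contains 0.3/0.0 -> inf
```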


Sparse estimation

Forward adaptive banding: Leng and Li (2011)

Minimize the modified BIC:

$$n \log|\Sigma| + n\,\mathrm{tr}(D^{-1} T S T') + C_n \log(n) \sum_{j=1}^{p} k_j,$$

$$\text{s.t. } k_j \le \min\{n/(\log n)^2,\ j - 1\},$$

where $k_j$ is the band length.

Fit AR($k_j$) to obtain T and D.
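A simplified per-row sketch (mine, not the authors' algorithm): since the likelihood terms decompose across rows, each $k_j$ can be chosen by searching over band widths with a BIC-type score; `Cn` is a tuning constant here.

```python
import numpy as np

def choose_band(Y, j, k_max, Cn=1.0):
    """Choose k_j for (0-indexed) variable j on centered data Y."""
    n = Y.shape[0]
    best_k, best_score = 0, np.inf
    for k in range(0, min(k_max, j) + 1):
        if k == 0:
            resid = Y[:, j]                    # no predecessors used
        else:
            X = Y[:, j - k:j]                  # the k closest predecessors
            phi, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
            resid = Y[:, j] - X @ phi
        sigma2 = resid @ resid / n
        score = n * np.log(sigma2) + Cn * np.log(n) * k   # fit + complexity
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

The slide's constraint $k_j \le \min\{n/(\log n)^2,\ j-1\}$ corresponds to the `k_max` argument.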


Cholesky decomposition: summary

The Cholesky decomposition depends on the order in which the variables appear in the random vector Y.

It works when the variables have a natural ordering.


GLM for covariance matrices

Another way to reduce the number of covariance parameters is to use covariates, as in modeling the mean vector.

Path of development: linear → log-linear → GLM.


Linear covariance models

Linear covariance models (LCM):

$$\Sigma^{\pm 1} = \alpha_1 U_1 + \cdots + \alpha_q U_q,$$

where the $U_i$'s are known symmetric basis matrices (covariates), the $\alpha_i$'s are unknown parameters, and the model may be placed on either $\Sigma$ or $\Sigma^{-1}$.

For $q = p^2$, any covariance matrix can be written as

$$\Sigma = (\sigma_{ij}) = \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma_{ij} U_{ij},$$

where $U_{ij}$ is the matrix with 1 in the $(i, j)$th position and 0 elsewhere.
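A trivial numerical check of this representation (my illustration):

```python
import numpy as np

p = 3
Sigma = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.5, 0.3],
                  [0.1, 0.3, 1.0]])
recon = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        U_ij = np.zeros((p, p))
        U_ij[i, j] = 1.0                 # elementary basis matrix
        recon += Sigma[i, j] * U_ij
assert np.allclose(recon, Sigma)         # the double sum rebuilds Sigma
```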


Linear covariance models

MLE: the score equation for $\alpha_i$ is

$$\mathrm{tr}(\Sigma^{-1} U_i) - \mathrm{tr}(S \Sigma^{-1} U_i \Sigma^{-1}) = 0,$$

which can be solved by an iterative method (see the sketch below).

Constraint: the $\alpha_i$'s are restricted so that the matrix is positive definite.

Lack of interpretation.
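A sketch (mine) of evaluating the score vector, the quantity an iterative solver such as Newton's method would drive to zero:

```python
import numpy as np

def lcm_score(alpha, U, S):
    """Score vector for Sigma(alpha) = sum_i alpha_i U_i; zero at the MLE."""
    Sigma = sum(a * Ui for a, Ui in zip(alpha, U))
    Sinv = np.linalg.inv(Sigma)
    return np.array([np.trace(Sinv @ Ui) - np.trace(S @ Sinv @ Ui @ Sinv)
                     for Ui in U])
```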


Log-linear covariance models

Log-linear covariance models:

$$\log \Sigma = \alpha_1 U_1 + \cdots + \alpha_q U_q,$$

where the $\alpha_i$'s are now unconstrained.
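A minimal sketch (mine) of why no constraint is needed: any symmetric matrix built from the basis exponentiates, via scipy.linalg.expm, to a valid positive-definite $\Sigma$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
U = [np.eye(3), B + B.T]               # symmetric basis matrices
alpha = rng.standard_normal(2)         # arbitrary values: no constraint needed
log_Sigma = sum(a * Ui for a, Ui in zip(alpha, U))
Sigma = expm(log_Sigma)                # exp of symmetric is positive definite
assert np.all(np.linalg.eigvalsh(Sigma) > 0)
```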


GLM via Cholesky decomposition

Pourahmadi (1999):

Cholesky decomposition: $\Sigma^{-1} = T' D^{-1} T$.

T and $\log D$ are unconstrained.

Parametric models for $\phi_{t,j}$ and $\log \sigma_t^2$:

$$\log \sigma_t^2 = z_t' \lambda, \qquad \phi_{t,j} = w_{t,j}' \gamma,$$

where $z_t$ and $w_{t,j}$ are $q \times 1$ and $d \times 1$ vectors of covariates, and $\lambda$ and $\gamma$ are parameters.

Common covariates are powers of time and lag:

$$z_t = (1, t, t^2, \ldots, t^{q-1})', \qquad w_{t,j} = (1,\ t-j,\ (t-j)^2, \ldots, (t-j)^{d-1})'.$$
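A sketch (mine, with the hypothetical helper name `glm_T_and_D`) that builds T and D from given $\lambda$ and $\gamma$ under these polynomial covariates:

```python
import numpy as np

def glm_T_and_D(lam, gam, p):
    """Build T and D from log sigma_t^2 = z_t' lam and phi_{t,j} = w_{t,j}' gam."""
    q, d_dim = len(lam), len(gam)
    t = np.arange(1, p + 1, dtype=float)
    Z = t[:, None] ** np.arange(q)          # rows z_t' = (1, t, ..., t^{q-1})
    D = np.diag(np.exp(Z @ lam))            # sigma_t^2 = exp(z_t' lam) > 0
    T = np.eye(p)
    for ti in range(2, p + 1):
        for j in range(1, ti):
            w = float(ti - j) ** np.arange(d_dim)  # (1, t-j, ..., (t-j)^{d-1})
            T[ti - 1, j - 1] = -(w @ gam)          # T stores -phi_{t,j}
    return T, D

T, D = glm_T_and_D(lam=np.array([0.0, 0.1]), gam=np.array([0.5, -0.1]), p=6)
```

Whatever the values of $\lambda$ and $\gamma$, the resulting $\Sigma^{-1} = T' D^{-1} T$ is positive definite, which is the sense in which the parameterization is unconstrained.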


GLM via Cholesky decomposition

Number of parameters: $q + d$.

Computing the MLE is relatively simple:

$$-2\,l(\lambda, \gamma) = n \log|D| + n\,\mathrm{tr}(D^{-1} T S T').$$

Given D, the MLE of T has a closed form; similarly, given T, the MLE of D has a closed form. The two updates can be alternated, each step decreasing the objective evaluated below.
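A sketch (mine) of evaluating this objective for a candidate (T, D) pair, the quantity the alternating closed-form updates would decrease:

```python
import numpy as np

def neg2_loglik(T, D, S, n):
    """-2 l = n log|D| + n tr(D^{-1} T S T'), with D diagonal."""
    d = np.diag(D)
    return n * np.log(d).sum() + n * np.trace((T @ S @ T.T) / d[:, None])
```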


References

Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives. Statistical Science, 26(3), 369-387.

Pourahmadi, M. (2013). High-Dimensional Covariance Estimation: With High-Dimensional Data. John Wiley & Sons.

Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika, 86(3), 677-690.

Huang, J. Z., Liu, N., Pourahmadi, M., & Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1), 85-98.

Leng, C., & Li, B. (2011). Forward adaptive banding for estimating large covariance matrices. Biometrika, 98(4), 821-830.

Levina, E., Rothman, A., & Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. The Annals of Applied Statistics, 2(1), 245-263.

Wu, W. B., & Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90(4), 831-844.


The End
