Stat 579: Generalized Linear Models and Extensions
Mixed models
Yan Lu, Feb 2018, week 7 (2nd class)
General Gaussian linear mixed model

A general linear mixed model may be expressed as

y = Xβ + Zα + ε

- y: n × 1 vector of observations
- X: n × p matrix of known covariates
- β: p × 1 vector of unknown fixed-effect regression coefficients
- Z: n × q known matrix
- α: q × 1 vector of random effects
- ε: n × 1 vector of errors (noise)

Assumptions: α ∼ N(0, G) and ε ∼ N(0, R), where G and R involve some unknown dispersion parameters (variance components), and α and ε are independent. Then

Var(y) = Var(Xβ + Zα + ε) = Z Var(α) Z′ + Var(ε) = ZGZ′ + R = V
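The covariance structure above can be checked numerically. A minimal sketch with numpy, assuming a one-way layout (m = 3 groups of k = 2 observations) and illustrative variance components σα² = 4, σ² = 1:

```python
import numpy as np

# Hypothetical one-way layout: m = 3 groups, k = 2 observations each.
m, k = 3, 2
n, q = m * k, m

# Z maps each observation to its group's random effect: Z = I_m kron 1_k.
Z = np.kron(np.eye(m), np.ones((k, 1)))      # n x q

# Assumed variance components (illustrative values, not estimates).
sigma2_alpha, sigma2 = 4.0, 1.0
G = sigma2_alpha * np.eye(q)                 # Var(alpha)
R = sigma2 * np.eye(n)                       # Var(eps)

V = Z @ G @ Z.T + R                          # Var(y) = ZGZ' + R

# Within a group the off-diagonal covariance is sigma2_alpha; on the
# diagonal it is sigma2_alpha + sigma2; across groups it is zero.
print(V[0, 0], V[0, 1], V[0, 2])             # 5.0 4.0 0.0
```

The printed entries match the within-group and between-group covariances derived later for the one-way random effects model.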
Bayesian approach

Recall Bayes' formula:

P(A|B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]

This is related to the conditional density of X given Y = y:

f_{X|Y=y}(x) = f_{Y|X=x}(y) f_X(x) / ∫ f_{Y|X=x}(y) f_X(x) dx

- The distribution of X with density f_X(x) is called the prior distribution.
- The conditional distribution with density function f_{X|Y=y}(x) is called the posterior distribution.
- The conditional density f_{Y|X=x}(y) is called the likelihood function.
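As a quick numeric sanity check of Bayes' formula, here is a sketch with assumed (purely illustrative) probabilities P(A) = 0.3, P(B|A) = 0.8, P(B|Aᶜ) = 0.1:

```python
# Numeric check of Bayes' formula with assumed illustrative values.
p_a = 0.3
p_b_given_a, p_b_given_ac = 0.8, 0.1

# Law of total probability for the denominator.
p_b = p_b_given_a * p_a + p_b_given_ac * (1 - p_a)

# Bayes' formula.
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4))   # 0.7742
```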
Recall also the rules relating conditional and marginal moments:

E[Y] = E_X{E[Y|X]}
Var[Y] = E_X{Var[Y|X]} + Var_X{E[Y|X]}
Cov[Y, Z] = E_X{Cov[Y, Z|X]} + Cov_X{E[Y|X], E[Z|X]}
Consider the mixed model

y = Xβ + Zα + ε   (1)

- β: fixed regression parameters
- θ: the vector of variance components involved in the model
- Under normality, y ∼ N(Xβ, V)

Posterior density:

f(θ|y) = f(y|θ)f(θ) / ∫ f(y|θ)f(θ) dθ

Model (1) can be specified in hierarchical fashion as

1. y|θ ∼ f(y|θ), the conditional distribution of y given θ through the density f(y|θ)
2. θ ∼ f(θ), i.e., θ is random
Posterior distribution for multivariate normal distributions

Let Y|µ ∼ N_p(µ, Σ) and µ ∼ N_p(µ*, Σ0), where Σ and Σ0 are of full rank p. Then the posterior distribution of µ after observation of Y = y is given by

µ|Y = y ∼ N_p(Wµ* + (I − W)y, (I − W)Σ)

where W = Σ(Σ0 + Σ)⁻¹ and I − W = Σ0(Σ0 + Σ)⁻¹.
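The update formula above can be sketched directly in numpy. The prior and observation covariances below are assumed, illustrative values in p = 2 dimensions:

```python
import numpy as np

# Assumed illustrative parameters for the normal-normal update above.
mu_star = np.array([0.0, 0.0])     # prior mean
Sigma0 = np.diag([4.0, 4.0])       # prior covariance
Sigma = np.diag([1.0, 1.0])        # observation covariance
y = np.array([2.0, -1.0])          # observed Y = y

# W = Sigma (Sigma0 + Sigma)^{-1}; here W = 0.2 I.
W = Sigma @ np.linalg.inv(Sigma0 + Sigma)
post_mean = W @ mu_star + (np.eye(2) - W) @ y
post_cov = (np.eye(2) - W) @ Sigma

print(post_mean)   # shrinks y toward the prior mean: [ 1.6 -0.8]
```

With a diffuse prior (large Σ0), W is close to 0 and the posterior mean is close to y; with a tight prior, W is close to I and the data are shrunk heavily toward µ*.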
Example: one-way random effects model (Version 1)

yij = µi + εij, with µi = µ + αi

- εij iid ∼ N(0, σ²); the µi and the εij are independent
- yij|µi independent ∼ N(µi, σ²), with µi iid ∼ N(µ, σα²)
- Marginally, yij ∼ N(µ, σ² + σα²)
- Cov(yij, yik) = σα² for j ≠ k (distinct observations in the same group), and σα² + σ² for j = k
- Cov(yij, yi′j′) = 0 for i ≠ i′: observations from different groups are independent
The posterior distribution of µi after observations of yi1, yi2, · · · , yini is a normal distribution with mean and variance

E[µi|Ȳi· = ȳi·] = (µ/σα² + ni ȳi·/σ²) / (1/σα² + ni/σ²) = wµ + (1 − w)ȳi·

Var[µi|Ȳi· = ȳi·] = 1 / (1/σα² + ni/σ²)

where

w = (1/σα²) / (1/σα² + ni/σ²) = 1 / (1 + ni γ), with γ = σα²/σ²

Here we use Ȳi·|µi ∼ N(µi, σ²/ni).
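The shrinkage form of the posterior mean can be sketched as follows, with assumed illustrative values for µ, the variance components, and the group data:

```python
# Shrinkage form of E[mu_i | ybar_i] with assumed illustrative values.
mu = 10.0                      # overall mean
sigma2_alpha, sigma2 = 2.0, 4.0
gamma = sigma2_alpha / sigma2  # gamma = 0.5
n_i = 8                        # observations in group i
ybar_i = 12.0                  # group sample mean

# Weight on the prior mean: w = 1 / (1 + n_i * gamma).
w = 1.0 / (1.0 + n_i * gamma)
post_mean = w * mu + (1 - w) * ybar_i
post_var = 1.0 / (1.0 / sigma2_alpha + n_i / sigma2)

print(w, post_mean, post_var)   # 0.2 11.6 0.4
```

As ni grows, w → 0 and the posterior mean approaches the group sample mean ȳi·; for small groups, the estimate is pulled toward the overall mean µ.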
Mixed model approach

The model is specified as a hierarchical model, but it is allowed to have nonrandom parameters β:

y|θ ∼ L(y|θ, β), θ ∼ f(θ, β)

L(β) = ∫ L(y|θ, β) f(θ, β) dθ   (2)

- Random effects are unobservable and are integrated out in (2).
- β is estimated.
- The posterior mean (for example, E[µi|y] in the one-way random effects model) is called the estimate of the random effect.
Summary

- Mixed model = Bayesian + frequentist
- As in the Bayesian approach, the mixed model assumes a hierarchical model in which the parameter is treated as random.
- On the other hand, the hyperparameter β is not arbitrarily specified as in the Bayesian approach, but is estimated from the data.
Matrix differentiation

Let

A = | a11 a12 |
    | a21 a22 |,  with aij = f(θ)

Then ∂A/∂θ is the matrix of elementwise derivatives:

∂A/∂θ = | ∂a11/∂θ ∂a12/∂θ |
        | ∂a21/∂θ ∂a22/∂θ |
Let

a = (a1, a2, · · · , ak)′,  θ = (θ1, θ2, · · · , θl)′,  ai = f(θ1, θ2, · · · , θl)

Then ∂a/∂θ′ is the k × l matrix with (i, j) entry ∂ai/∂θj:

∂a/∂θ′ = | ∂a1/∂θ1 ∂a1/∂θ2 · · · ∂a1/∂θl |
         | ∂a2/∂θ1 ∂a2/∂θ2 · · · ∂a2/∂θl |
         |   ...                          |
         | ∂ak/∂θ1 ∂ak/∂θ2 · · · ∂ak/∂θl |  (k × l)
With a and θ as above, the transpose is the l × k matrix

(∂a/∂θ′)′ = ∂a′/∂θ = | ∂a1/∂θ1 · · · ∂ak/∂θ1 |
                     | ∂a1/∂θ2 · · · ∂ak/∂θ2 |
                     |   ...                  |
                     | ∂a1/∂θl · · · ∂ak/∂θl |
1. Inner product:

∂(a′b)/∂θ = (∂a′/∂θ)b + (∂b′/∂θ)a

Example: let

a = (a1, a2)′,  b = (b1, b2)′,  θ = (θ1, θ2)′,  so a′b = a1b1 + a2b2

∂(a′b)/∂θ = | ∂(a1b1 + a2b2)/∂θ1 |
            | ∂(a1b1 + a2b2)/∂θ2 |

          = | (∂a1/∂θ1)b1 + a1(∂b1/∂θ1) + (∂a2/∂θ1)b2 + a2(∂b2/∂θ1) |
            | (∂a1/∂θ2)b1 + a1(∂b1/∂θ2) + (∂a2/∂θ2)b2 + a2(∂b2/∂θ2) |
Here

∂a′/∂θ = ∂(a1, a2)/∂θ = | ∂a1/∂θ1 ∂a2/∂θ1 |
                        | ∂a1/∂θ2 ∂a2/∂θ2 |

∂b′/∂θ = ∂(b1, b2)/∂θ = | ∂b1/∂θ1 ∂b2/∂θ1 |
                        | ∂b1/∂θ2 ∂b2/∂θ2 |

so

(∂a′/∂θ)b = | (∂a1/∂θ1)b1 + (∂a2/∂θ1)b2 |
            | (∂a1/∂θ2)b1 + (∂a2/∂θ2)b2 |

and likewise for (∂b′/∂θ)a; together they give the expression on the previous slide.
2. Quadratic form, A symmetric:

∂/∂x (x′Ax) = 2Ax

3. Inverse, |A| ≠ 0:

∂A⁻¹/∂θi = −A⁻¹ (∂A/∂θi) A⁻¹

4. Log-determinant: if the matrix A above is also positive definite, then, for any component θi of θ,

∂/∂θi log(|A|) = tr(A⁻¹ ∂A/∂θi)
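Rules 3 and 4 can be verified numerically by central finite differences. A sketch on an assumed one-parameter family A(θ) = [[2 + θ, 1], [1, 3]], which is positive definite near θ = 0.5:

```python
import numpy as np

# Assumed illustrative matrix family A(theta) for checking rules 3 and 4.
def A(theta):
    return np.array([[2.0 + theta, 1.0], [1.0, 3.0]])

dA = np.array([[1.0, 0.0], [0.0, 0.0]])   # exact dA/dtheta
theta, h = 0.5, 1e-6
Ainv = np.linalg.inv(A(theta))

# Rule 3: dA^{-1}/dtheta = -A^{-1} (dA/dtheta) A^{-1}
num_inv = (np.linalg.inv(A(theta + h)) - np.linalg.inv(A(theta - h))) / (2 * h)
exact_inv = -Ainv @ dA @ Ainv
print(np.allclose(num_inv, exact_inv, atol=1e-6))   # True

# Rule 4: d log|A| / dtheta = tr(A^{-1} dA/dtheta)
num_ld = (np.log(np.linalg.det(A(theta + h))) -
          np.log(np.linalg.det(A(theta - h)))) / (2 * h)
exact_ld = np.trace(Ainv @ dA)
print(np.isclose(num_ld, exact_ld, atol=1e-6))      # True
```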
5.

∂(a′x)/∂x = a,  d(Ax)/dx′ = A,  ∂(x′A)/∂x = A

Proof of the first identity: by the inner product rule,

∂(a′b)/∂θ = (∂a′/∂θ)b + (∂b′/∂θ)a

so, taking b = x and θ = x,

∂(a′x)/∂x = (∂a′/∂x)x + (∂x′/∂x)a = 0 + Ia = a
Example: let

a = (a1, a2)′,  x = (x1, x2)′,  so a′x = a1x1 + a2x2

∂(a′x)/∂x = | ∂(a1x1 + a2x2)/∂x1 |   | a1 |
            | ∂(a1x1 + a2x2)/∂x2 | = | a2 | = a
Estimation in Gaussian models

- Maximum likelihood
- Restricted maximum likelihood
- Method of moments
y = Xβ + Zα + ε

Assumptions: α ∼ N(0, G) and ε ∼ N(0, R), where G and R involve some unknown dispersion parameters (variance components), and α and ε are independent.

Var(y) = Var(Xβ + Zα + ε) = Z Var(α) Z′ + Var(ε) = ZGZ′ + R = V

Marginally, y ∼ N(Xβ, V), with density

f(y) = (2π)^{−n/2} |V|^{−1/2} exp{ −(1/2)(y − Xβ)′V⁻¹(y − Xβ) }
Maximum likelihood

f(y) = (2π)^{−n/2} |V|^{−1/2} exp{ −(1/2)(y − Xβ)′V⁻¹(y − Xβ) }

ln f(y) = c − (1/2) ln(|V|) − (1/2)(y − Xβ)′V⁻¹(y − Xβ)

where θ is the vector of all the variance components (involved in V), c is a constant, and β are the regression parameters. Expanding the quadratic form:

ln f(y) = c − (1/2) ln(|V|) − (1/2)(y′V⁻¹y − y′V⁻¹Xβ − β′X′V⁻¹y + β′X′V⁻¹Xβ)
∂ ln f(y)/∂β = ∂[(1/2)(y′V⁻¹Xβ + β′X′V⁻¹y − β′X′V⁻¹Xβ)]/∂β
             = (1/2)X′V⁻¹y + (1/2)X′V⁻¹y − (1/2) × 2X′V⁻¹Xβ
             = X′V⁻¹y − X′V⁻¹Xβ   (3)

Set equation (3) equal to 0 and suppose rank(X) = p (full rank):

X′V⁻¹y − X′V⁻¹Xβ = 0

therefore

β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y

We still need to estimate V. Recall general (ordinary least squares) regression:

β̂ = (X′X)⁻¹X′y
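The generalized least squares estimator β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y can be sketched on simulated data from the one-way model, treating V as known for illustration (all values below are assumptions for the example):

```python
import numpy as np

# Simulated one-way data: m = 4 groups, k = 5 observations, intercept-only X.
rng = np.random.default_rng(0)
m, k = 4, 5
n = m * k
X = np.ones((n, 1))                         # fixed-effects design (intercept)
Z = np.kron(np.eye(m), np.ones((k, 1)))     # random-effects design
sigma2_alpha, sigma2 = 2.0, 1.0             # assumed known variance components
V = sigma2_alpha * Z @ Z.T + sigma2 * np.eye(n)

beta_true = np.array([3.0])
y = (X @ beta_true + Z @ rng.normal(0, np.sqrt(sigma2_alpha), m)
     + rng.normal(0, np.sqrt(sigma2), n))

# beta_hat = (X'V^{-1}X)^{-1} X'V^{-1} y, via a linear solve.
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_hat)
```

In this balanced intercept-only case, 1 is an eigenvector of V, so GLS reduces to the ordinary sample mean; with unbalanced data or nontrivial X, the two estimators differ.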
∂ ln f(y)/∂θr = −(1/2) tr(V⁻¹ ∂V/∂θr) + (1/2)(y − Xβ)′V⁻¹(∂V/∂θr)V⁻¹(y − Xβ)
             = (1/2){ (y − Xβ)′V⁻¹(∂V/∂θr)V⁻¹(y − Xβ) − tr(V⁻¹ ∂V/∂θr) }   (4)

By (3) and (4), we can show that at the stationary point

y′P(∂V/∂θr)Py = tr(V⁻¹ ∂V/∂θr),  r = 1, · · · , q   (5)

where

P = V⁻¹ − V⁻¹X(X′V⁻¹X)⁻¹X′V⁻¹

Solve (5) for the variance components θr, then substitute the resulting V̂ into β̂ = (X′V̂⁻¹X)⁻¹X′V̂⁻¹y.
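The matrix P above satisfies PX = 0 (it annihilates the fixed-effects design) and PVP = P, which is why the estimating equations (5) do not involve β. A numeric sketch on an assumed random example:

```python
import numpy as np

# Assumed small random example: n = 6 observations, p = 2 covariates.
rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.normal(size=(n, p))
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)     # symmetric positive definite covariance

# P = V^{-1} - V^{-1} X (X'V^{-1}X)^{-1} X' V^{-1}
Vinv = np.linalg.inv(V)
P = Vinv - Vinv @ X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv

print(np.allclose(P @ X, 0))      # True: P annihilates X
print(np.allclose(P @ V @ P, P))  # True: P is idempotent w.r.t. V
```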
Estimation of the one-way random effects model

yij = µ + αi + εij,  i = 1, · · · , m,  j = 1, · · · , k,  with αi ∼ N(0, σα²), εij ∼ N(0, σ²)

- y = Xµ + Zα + ε
- X = 1m ⊗ 1k = 1mk,  Z (mk × m) = Im ⊗ 1k
- G = σα² Im,  R = σ² Imk
- V = ZGZ′ + R
l(µ, σα², σ²) = c − (1/2)(n − m) log(σ²) − (1/2) Σ_{i=1}^m log(σ² + kσα²)
              − (1/(2σ²)) Σ_{i=1}^m Σ_{j=1}^k (yij − µ)²
              + (σα²/(2σ²)) Σ_{i=1}^m [k²/(σ² + kσα²)] (ȳi· − µ)²

- Find ∂l/∂µ, ∂l/∂σ², and ∂l/∂σα².
- Set them to zero to find µ̂, σ̂α², and σ̂².
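The log-likelihood above (dropping the constant c = −(n/2)log(2π)) can be checked against the exact multivariate normal log-density. A sketch on simulated data with assumed parameter values:

```python
import numpy as np

# Simulated balanced one-way data with assumed illustrative parameters.
rng = np.random.default_rng(2)
m, k = 3, 4
n = m * k
mu, s2a, s2 = 1.0, 2.0, 0.5
y = (mu + np.repeat(rng.normal(0, np.sqrt(s2a), m), k)
        + rng.normal(0, np.sqrt(s2), n)).reshape(m, k)

def loglik(mu, s2a, s2, y):
    """One-way random effects log-likelihood from the slide, without c."""
    m, k = y.shape
    n = m * k
    ybar = y.mean(axis=1)
    return (-0.5 * (n - m) * np.log(s2)
            - 0.5 * m * np.log(s2 + k * s2a)
            - np.sum((y - mu) ** 2) / (2 * s2)
            + s2a / (2 * s2) * np.sum(k ** 2 / (s2 + k * s2a)
                                      * (ybar - mu) ** 2))

# Exact N(mu*1, V) log-density, up to the same constant.
Z = np.kron(np.eye(m), np.ones((k, 1)))
V = s2a * Z @ Z.T + s2 * np.eye(n)
resid = y.ravel() - mu
exact = (-0.5 * np.log(np.linalg.det(V))
         - 0.5 * resid @ np.linalg.inv(V) @ resid)

print(np.isclose(loglik(mu, s2a, s2, y), exact))   # True
```

The agreement reflects that V has eigenvalue σ² + kσα² with multiplicity m and σ² with multiplicity n − m, which is where the two log terms in l come from.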
Asymptotic covariance matrix

Under suitable conditions, the MLE is consistent and asymptotically normal, with asymptotic covariance matrix equal to the inverse of the Fisher information matrix. Let ψ = (β′, θ′)′. Then, under regularity conditions, the Fisher information matrix has the expression

−E( ∂²l / ∂ψ∂ψ′ )