Stat 579: Generalized Linear Models and Extensions
Mixed models
Yan Lu, Feb 2018, week 7 (2nd class)
General Gaussian linear mixed model

A general linear mixed model may be expressed as

y = Xβ + Zα + ε

- y: n × 1 vector of observations
- X: n × p matrix of known covariates
- β: p × 1 vector of unknown fixed-effect regression coefficients
- Z: n × q known matrix
- α: q × 1 vector of random effects
- ε: n × 1 vector of errors (noise)

Assumptions: α ∼ N(0, G) and ε ∼ N(0, R), where G and R involve some unknown dispersion parameters (variance components), and α and ε are independent. Then

Var(y) = Var(Xβ + Zα + ε) = Z Var(α) Z′ + Var(ε) = ZGZ′ + R = V
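The covariance structure above can be checked numerically. A minimal sketch with numpy, assuming a one-way layout (m = 3 groups of k = 2 observations) and illustrative variance components σα² = 4, σ² = 1:

```python
import numpy as np

# Hypothetical one-way layout: m = 3 groups, k = 2 observations each.
m, k = 3, 2
n, q = m * k, m

# Z maps each observation to its group's random effect: Z = I_m kron 1_k.
Z = np.kron(np.eye(m), np.ones((k, 1)))      # n x q

# Assumed variance components (illustrative values, not estimates).
sigma2_alpha, sigma2 = 4.0, 1.0
G = sigma2_alpha * np.eye(q)                 # Var(alpha)
R = sigma2 * np.eye(n)                       # Var(eps)

V = Z @ G @ Z.T + R                          # Var(y) = ZGZ' + R

# Within a group the off-diagonal covariance is sigma2_alpha; on the
# diagonal it is sigma2_alpha + sigma2; across groups it is zero.
print(V[0, 0], V[0, 1], V[0, 2])             # 5.0 4.0 0.0
```

The printed entries match the within-group and between-group covariances derived later for the one-way random effects model.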
Bayesian approach

Recall Bayes' formula:

P(A|B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)]

This is related to the conditional density of X given Y = y:

f_{X|Y=y}(x) = f_{Y|X=x}(y) f_X(x) / ∫ f_{Y|X=x}(y) f_X(x) dx

- The distribution of X with density f_X(x) is called the prior distribution.
- The conditional distribution with density function f_{X|Y=y}(x) is called the posterior distribution.
- The conditional density f_{Y|X=x}(y) is called the likelihood function.
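As a quick numeric sanity check of Bayes' formula, here is a sketch with assumed (purely illustrative) probabilities P(A) = 0.3, P(B|A) = 0.8, P(B|Aᶜ) = 0.1:

```python
# Numeric check of Bayes' formula with assumed illustrative values.
p_a = 0.3
p_b_given_a, p_b_given_ac = 0.8, 0.1

# Law of total probability for the denominator.
p_b = p_b_given_a * p_a + p_b_given_ac * (1 - p_a)

# Bayes' formula.
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_a_given_b, 4))   # 0.7742
```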
Recall also the rules relating conditional and marginal moments:

E[Y] = E_X{E[Y|X]}
Var[Y] = E_X{Var[Y|X]} + Var_X{E[Y|X]}
Cov[Y, Z] = E_X{Cov[Y, Z|X]} + Cov_X{E[Y|X], E[Z|X]}
Consider the mixed model

y = Xβ + Zα + ε   (1)

- β: fixed regression parameters
- θ: the vector of variance components involved in the model
- Under normality, y ∼ N(Xβ, V)

Posterior density:

f(θ|y) = f(y|θ)f(θ) / ∫ f(y|θ)f(θ) dθ

Model (1) can be specified in hierarchical fashion as

1. y|θ ∼ f(y|θ), the conditional distribution of y given θ through the density f(y|θ)
2. θ ∼ f(θ), i.e., θ is random
Posterior distribution for multivariate normal distributions

Let Y|µ ∼ N_p(µ, Σ) and µ ∼ N_p(µ*, Σ0), where Σ and Σ0 are of full rank p. Then the posterior distribution of µ after observation of Y = y is given by

µ|Y = y ∼ N_p(Wµ* + (I − W)y, (I − W)Σ)

where W = Σ(Σ0 + Σ)⁻¹ and I − W = Σ0(Σ0 + Σ)⁻¹.
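The update formula above can be sketched directly in numpy. The prior and observation covariances below are assumed, illustrative values in p = 2 dimensions:

```python
import numpy as np

# Assumed illustrative parameters for the normal-normal update above.
mu_star = np.array([0.0, 0.0])     # prior mean
Sigma0 = np.diag([4.0, 4.0])       # prior covariance
Sigma = np.diag([1.0, 1.0])        # observation covariance
y = np.array([2.0, -1.0])          # observed Y = y

# W = Sigma (Sigma0 + Sigma)^{-1}; here W = 0.2 I.
W = Sigma @ np.linalg.inv(Sigma0 + Sigma)
post_mean = W @ mu_star + (np.eye(2) - W) @ y
post_cov = (np.eye(2) - W) @ Sigma

print(post_mean)   # shrinks y toward the prior mean: [ 1.6 -0.8]
```

With a diffuse prior (large Σ0), W is close to 0 and the posterior mean is close to y; with a tight prior, W is close to I and the data are shrunk heavily toward µ*.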
Example: one-way random effects model (Version 1)

yij = µi + εij, with µi = µ + αi

- εij iid ∼ N(0, σ²); the µi and the εij are independent
- yij|µi independent ∼ N(µi, σ²), with µi iid ∼ N(µ, σα²)
- Marginally, yij ∼ N(µ, σ² + σα²)
- Cov(yij, yik) = σα² for j ≠ k (distinct observations in the same group), and σα² + σ² for j = k
- Cov(yij, yi′j′) = 0 for i ≠ i′: observations from different groups are independent
The posterior distribution of µi after observations of yi1, yi2, · · · , yini is a normal distribution with mean and variance

E[µi|Ȳi· = ȳi·] = (µ/σα² + ni ȳi·/σ²) / (1/σα² + ni/σ²) = wµ + (1 − w)ȳi·

Var[µi|Ȳi· = ȳi·] = 1 / (1/σα² + ni/σ²)

where

w = (1/σα²) / (1/σα² + ni/σ²) = 1 / (1 + ni γ), with γ = σα²/σ²

Here we use Ȳi·|µi ∼ N(µi, σ²/ni).
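The shrinkage form of the posterior mean can be sketched as follows, with assumed illustrative values for µ, the variance components, and the group data:

```python
# Shrinkage form of E[mu_i | ybar_i] with assumed illustrative values.
mu = 10.0                      # overall mean
sigma2_alpha, sigma2 = 2.0, 4.0
gamma = sigma2_alpha / sigma2  # gamma = 0.5
n_i = 8                        # observations in group i
ybar_i = 12.0                  # group sample mean

# Weight on the prior mean: w = 1 / (1 + n_i * gamma).
w = 1.0 / (1.0 + n_i * gamma)
post_mean = w * mu + (1 - w) * ybar_i
post_var = 1.0 / (1.0 / sigma2_alpha + n_i / sigma2)

print(w, post_mean, post_var)   # 0.2 11.6 0.4
```

As ni grows, w → 0 and the posterior mean approaches the group sample mean ȳi·; for small groups, the estimate is pulled toward the overall mean µ.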
Mixed model approach

The model is specified as a hierarchical model, but it is allowed to have nonrandom parameters β:

y|θ ∼ L(y|θ, β), θ ∼ f(θ, β)

L(β) = ∫ L(y|θ, β) f(θ, β) dθ   (2)

- Random effects are unobservable and are integrated out in (2).
- β is estimated.
- The posterior mean (for example, E[µi|y] in the one-way random effects model) is called the estimate of the random effect.
Summary

- Mixed model = Bayesian + frequentist
- As in the Bayesian approach, the mixed model assumes a hierarchical model in which the parameter is treated as random.
- On the other hand, the hyperparameter β is not arbitrarily specified as in the Bayesian approach, but is estimated from the data.
Matrix differentiation

Let

A = | a11 a12 |
    | a21 a22 |,  with aij = f(θ)

Then ∂A/∂θ is the matrix of elementwise derivatives:

∂A/∂θ = | ∂a11/∂θ ∂a12/∂θ |
        | ∂a21/∂θ ∂a22/∂θ |
Let

a = (a1, a2, · · · , ak)′,  θ = (θ1, θ2, · · · , θl)′,  ai = f(θ1, θ2, · · · , θl)

Then ∂a/∂θ′ is the k × l matrix with (i, j) entry ∂ai/∂θj:

∂a/∂θ′ = | ∂a1/∂θ1 ∂a1/∂θ2 · · · ∂a1/∂θl |
         | ∂a2/∂θ1 ∂a2/∂θ2 · · · ∂a2/∂θl |
         |   ...                          |
         | ∂ak/∂θ1 ∂ak/∂θ2 · · · ∂ak/∂θl |  (k × l)
With a and θ as above, the transpose is the l × k matrix

(∂a/∂θ′)′ = ∂a′/∂θ = | ∂a1/∂θ1 · · · ∂ak/∂θ1 |
                     | ∂a1/∂θ2 · · · ∂ak/∂θ2 |
                     |   ...                  |
                     | ∂a1/∂θl · · · ∂ak/∂θl |
1. Inner product:

∂(a′b)/∂θ = (∂a′/∂θ)b + (∂b′/∂θ)a

Example: let

a = (a1, a2)′,  b = (b1, b2)′,  θ = (θ1, θ2)′,  so a′b = a1b1 + a2b2

∂(a′b)/∂θ = | ∂(a1b1 + a2b2)/∂θ1 |
            | ∂(a1b1 + a2b2)/∂θ2 |

          = | (∂a1/∂θ1)b1 + a1(∂b1/∂θ1) + (∂a2/∂θ1)b2 + a2(∂b2/∂θ1) |
            | (∂a1/∂θ2)b1 + a1(∂b1/∂θ2) + (∂a2/∂θ2)b2 + a2(∂b2/∂θ2) |
Here

∂a′/∂θ = ∂(a1, a2)/∂θ = | ∂a1/∂θ1 ∂a2/∂θ1 |
                        | ∂a1/∂θ2 ∂a2/∂θ2 |

∂b′/∂θ = ∂(b1, b2)/∂θ = | ∂b1/∂θ1 ∂b2/∂θ1 |
                        | ∂b1/∂θ2 ∂b2/∂θ2 |

so

(∂a′/∂θ)b = | (∂a1/∂θ1)b1 + (∂a2/∂θ1)b2 |
            | (∂a1/∂θ2)b1 + (∂a2/∂θ2)b2 |

and likewise for (∂b′/∂θ)a; together they give the expression on the previous slide.
2. Quadratic form, A symmetric:

∂/∂x (x′Ax) = 2Ax

3. Inverse, |A| ≠ 0:

∂A⁻¹/∂θi = −A⁻¹ (∂A/∂θi) A⁻¹

4. Log-determinant: if the matrix A above is also positive definite, then, for any component θi of θ,

∂/∂θi log(|A|) = tr(A⁻¹ ∂A/∂θi)
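Rules 3 and 4 can be verified numerically by central finite differences. A sketch on an assumed one-parameter family A(θ) = [[2 + θ, 1], [1, 3]], which is positive definite near θ = 0.5:

```python
import numpy as np

# Assumed illustrative matrix family A(theta) for checking rules 3 and 4.
def A(theta):
    return np.array([[2.0 + theta, 1.0], [1.0, 3.0]])

dA = np.array([[1.0, 0.0], [0.0, 0.0]])   # exact dA/dtheta
theta, h = 0.5, 1e-6
Ainv = np.linalg.inv(A(theta))

# Rule 3: dA^{-1}/dtheta = -A^{-1} (dA/dtheta) A^{-1}
num_inv = (np.linalg.inv(A(theta + h)) - np.linalg.inv(A(theta - h))) / (2 * h)
exact_inv = -Ainv @ dA @ Ainv
print(np.allclose(num_inv, exact_inv, atol=1e-6))   # True

# Rule 4: d log|A| / dtheta = tr(A^{-1} dA/dtheta)
num_ld = (np.log(np.linalg.det(A(theta + h))) -
          np.log(np.linalg.det(A(theta - h)))) / (2 * h)
exact_ld = np.trace(Ainv @ dA)
print(np.isclose(num_ld, exact_ld, atol=1e-6))      # True
```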
5.

∂(a′x)/∂x = a,  d(Ax)/dx′ = A,  ∂(x′A)/∂x = A

Proof of the first identity: by the inner product rule,

∂(a′b)/∂θ = (∂a′/∂θ)b + (∂b′/∂θ)a

so, taking b = x and θ = x,

∂(a′x)/∂x = (∂a′/∂x)x + (∂x′/∂x)a = 0 + Ia = a
Example: let

a = (a1, a2)′,  x = (x1, x2)′,  so a′x = a1x1 + a2x2

∂(a′x)/∂x = | ∂(a1x1 + a2x2)/∂x1 |   | a1 |
            | ∂(a1x1 + a2x2)/∂x2 | = | a2 | = a
Estimation in Gaussian models

- Maximum likelihood
- Restricted maximum likelihood
- Method of moments
y = Xβ + Zα + ε

Assumptions: α ∼ N(0, G) and ε ∼ N(0, R), where G and R involve some unknown dispersion parameters (variance components), and α and ε are independent.

Var(y) = Var(Xβ + Zα + ε) = Z Var(α) Z′ + Var(ε) = ZGZ′ + R = V

Marginally, y ∼ N(Xβ, V), with density

f(y) = (2π)^{−n/2} |V|^{−1/2} exp{ −(1/2)(y − Xβ)′V⁻¹(y − Xβ) }
Maximum likelihood

f(y) = (2π)^{−n/2} |V|^{−1/2} exp{ −(1/2)(y − Xβ)′V⁻¹(y − Xβ) }

ln f(y) = c − (1/2) ln(|V|) − (1/2)(y − Xβ)′V⁻¹(y − Xβ)

where θ is the vector of all the variance components (involved in V), c is a constant, and β are the regression parameters. Expanding the quadratic form:

ln f(y) = c − (1/2) ln(|V|) − (1/2)(y′V⁻¹y − y′V⁻¹Xβ − β′X′V⁻¹y + β′X′V⁻¹Xβ)
∂ ln f(y)/∂β = ∂[(1/2)(y′V⁻¹Xβ + β′X′V⁻¹y − β′X′V⁻¹Xβ)]/∂β
             = (1/2)X′V⁻¹y + (1/2)X′V⁻¹y − (1/2) × 2X′V⁻¹Xβ
             = X′V⁻¹y − X′V⁻¹Xβ   (3)

Set equation (3) equal to 0 and suppose rank(X) = p (full rank):

X′V⁻¹y − X′V⁻¹Xβ = 0

therefore

β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y

We still need to estimate V. Recall general (ordinary least squares) regression:

β̂ = (X′X)⁻¹X′y
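The generalized least squares estimator β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y can be sketched on simulated data from the one-way model, treating V as known for illustration (all values below are assumptions for the example):

```python
import numpy as np

# Simulated one-way data: m = 4 groups, k = 5 observations, intercept-only X.
rng = np.random.default_rng(0)
m, k = 4, 5
n = m * k
X = np.ones((n, 1))                         # fixed-effects design (intercept)
Z = np.kron(np.eye(m), np.ones((k, 1)))     # random-effects design
sigma2_alpha, sigma2 = 2.0, 1.0             # assumed known variance components
V = sigma2_alpha * Z @ Z.T + sigma2 * np.eye(n)

beta_true = np.array([3.0])
y = (X @ beta_true + Z @ rng.normal(0, np.sqrt(sigma2_alpha), m)
     + rng.normal(0, np.sqrt(sigma2), n))

# beta_hat = (X'V^{-1}X)^{-1} X'V^{-1} y, via a linear solve.
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_hat)
```

In this balanced intercept-only case, 1 is an eigenvector of V, so GLS reduces to the ordinary sample mean; with unbalanced data or nontrivial X, the two estimators differ.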
∂ ln f(y)/∂θr = −(1/2) tr(V⁻¹ ∂V/∂θr) + (1/2)(y − Xβ)′V⁻¹(∂V/∂θr)V⁻¹(y − Xβ)
             = (1/2){ (y − Xβ)′V⁻¹(∂V/∂θr)V⁻¹(y − Xβ) − tr(V⁻¹ ∂V/∂θr) }   (4)

By (3) and (4), we can show that at the stationary point

y′P(∂V/∂θr)Py = tr(V⁻¹ ∂V/∂θr),  r = 1, · · · , q   (5)

where

P = V⁻¹ − V⁻¹X(X′V⁻¹X)⁻¹X′V⁻¹

Solve (5) for the variance components θr, then substitute the resulting V̂ into β̂ = (X′V̂⁻¹X)⁻¹X′V̂⁻¹y.
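The matrix P above satisfies PX = 0 (it annihilates the fixed-effects design) and PVP = P, which is why the estimating equations (5) do not involve β. A numeric sketch on an assumed random example:

```python
import numpy as np

# Assumed small random example: n = 6 observations, p = 2 covariates.
rng = np.random.default_rng(1)
n, p = 6, 2
X = rng.normal(size=(n, p))
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)     # symmetric positive definite covariance

# P = V^{-1} - V^{-1} X (X'V^{-1}X)^{-1} X' V^{-1}
Vinv = np.linalg.inv(V)
P = Vinv - Vinv @ X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv

print(np.allclose(P @ X, 0))      # True: P annihilates X
print(np.allclose(P @ V @ P, P))  # True: P is idempotent w.r.t. V
```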
Estimation of the one-way random effects model

yij = µ + αi + εij,  i = 1, · · · , m,  j = 1, · · · , k,  with αi ∼ N(0, σα²), εij ∼ N(0, σ²)

- y = Xµ + Zα + ε
- X = 1m ⊗ 1k = 1mk,  Z (mk × m) = Im ⊗ 1k
- G = σα² Im,  R = σ² Imk
- V = ZGZ′ + R
l(µ, σα², σ²) = c − (1/2)(n − m) log(σ²) − (1/2) Σ_{i=1}^m log(σ² + kσα²)
              − (1/(2σ²)) Σ_{i=1}^m Σ_{j=1}^k (yij − µ)²
              + (σα²/(2σ²)) Σ_{i=1}^m [k²/(σ² + kσα²)] (ȳi· − µ)²

- Find ∂l/∂µ, ∂l/∂σ², and ∂l/∂σα².
- Set them to zero to find µ̂, σ̂α², and σ̂².
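The log-likelihood above (dropping the constant c = −(n/2)log(2π)) can be checked against the exact multivariate normal log-density. A sketch on simulated data with assumed parameter values:

```python
import numpy as np

# Simulated balanced one-way data with assumed illustrative parameters.
rng = np.random.default_rng(2)
m, k = 3, 4
n = m * k
mu, s2a, s2 = 1.0, 2.0, 0.5
y = (mu + np.repeat(rng.normal(0, np.sqrt(s2a), m), k)
        + rng.normal(0, np.sqrt(s2), n)).reshape(m, k)

def loglik(mu, s2a, s2, y):
    """One-way random effects log-likelihood from the slide, without c."""
    m, k = y.shape
    n = m * k
    ybar = y.mean(axis=1)
    return (-0.5 * (n - m) * np.log(s2)
            - 0.5 * m * np.log(s2 + k * s2a)
            - np.sum((y - mu) ** 2) / (2 * s2)
            + s2a / (2 * s2) * np.sum(k ** 2 / (s2 + k * s2a)
                                      * (ybar - mu) ** 2))

# Exact N(mu*1, V) log-density, up to the same constant.
Z = np.kron(np.eye(m), np.ones((k, 1)))
V = s2a * Z @ Z.T + s2 * np.eye(n)
resid = y.ravel() - mu
exact = (-0.5 * np.log(np.linalg.det(V))
         - 0.5 * resid @ np.linalg.inv(V) @ resid)

print(np.isclose(loglik(mu, s2a, s2, y), exact))   # True
```

The agreement reflects that V has eigenvalue σ² + kσα² with multiplicity m and σ² with multiplicity n − m, which is where the two log terms in l come from.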
Asymptotic covariance matrix

Under suitable conditions, the MLE is consistent and asymptotically normal, with asymptotic covariance matrix equal to the inverse of the Fisher information matrix. Let ψ = (β′, θ′)′. Then, under regularity conditions, the Fisher information matrix has the expression

−E( ∂²l / ∂ψ∂ψ′ )