
2006 Hopkins Epi-Biostat Summer Institute

Module 2: Bayesian Hierarchical Models

Instructor: Elizabeth Johnson

Course developed by: Francesca Dominici and Michael Griswold

The Johns Hopkins University

Bloomberg School of Public Health


Key Points from yesterday

“Multi-level” models have covariates from many levels (and their interactions) and acknowledge correlation among observations from within a level (cluster)

Random effect MLMs condition on unobserved “latent variables” to describe correlations

Random Effects models fit naturally into a Bayesian paradigm

Bayesian methods combine prior beliefs with the likelihood of the observed data to obtain posterior inferences


Module 2: Bayesian Hierarchical Models

Example 1: School Test Scores
The simplest two-stage model; WinBUGS

Example 2: Aww, Rats
A normal hierarchical model for repeated measures; WinBUGS


Example 1: School Test Scores


Testing in Schools

Goldstein et al. (1993)
Goal: differentiate between ‘good’ and ‘bad’ schools
Outcome: standardized test scores
Sample: 1978 students from 38 schools
MLM: students (observations) within schools (clusters)

Possible analyses:
1. Calculate each school’s observed average score
2. Calculate an overall average for all schools
3. Borrow strength across schools to improve individual school estimates


Testing in Schools: Why borrow information across schools?

Median number of students per school: 48 (range: 1 to 198)
Suppose a small school (N = 3) has scores 90, 90, 10 (avg = 63)
Suppose a large school (N = 100) has avg = 65
Suppose a school with N = 1 has a score of 69 (avg = 69)
Which school is ‘better’? It is difficult to say: a small N gives highly variable estimates.
For larger schools we have good estimates; for smaller schools we may be able to borrow information from other schools to obtain more accurate estimates.

How? Bayes
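One way to see why small schools give highly variable averages is to compare standard errors. The sketch below assumes a within-school SD of 20 points for every school (a hypothetical value, not taken from the Goldstein data) and uses the three example schools above; the intervals for the small schools are so wide that they cannot tell us which school is ‘better’.

```python
import math

SIGMA = 20.0  # hypothetical within-school SD of test scores (assumption)

def school_summary(avg, n, sigma=SIGMA):
    """Return the school average, its standard error sigma/sqrt(n), and a rough 95% interval."""
    se = sigma / math.sqrt(n)
    return avg, se, (avg - 1.96 * se, avg + 1.96 * se)

# The three example schools from the slide: (average score, number of students)
schools = {
    "small (N=3)": (sum([90, 90, 10]) / 3, 3),
    "large (N=100)": (65.0, 100),
    "single (N=1)": (69.0, 1),
}

for name, (avg, n) in schools.items():
    mean, se, (lo, hi) = school_summary(avg, n)
    print(f"{name:14s} avg={mean:5.1f}  SE={se:5.1f}  approx 95% CI=({lo:5.1f}, {hi:5.1f})")
```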


Testing in Schools: “Direct Estimates”

Model: $E(Y_{ij}) = \mu_j = \mu + b^*_j$

[Figure: mean scores and C.I.s for individual schools, with $b^*_j$ marking each school’s deviation from the overall mean; x-axis: school (0 to 40), y-axis: score (0 to 100)]


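Fixed and Random Effects

Standard normal regression models: $\epsilon_{ij} \sim N(0, \sigma^2)$

1. $Y_{ij} = \mu + \epsilon_{ij}$, with $\hat\mu = \bar X$ (overall avg)

2. $Y_{ij} = \mu_j + \epsilon_{ij} = \mu + b^*_j + \epsilon_{ij}$, with $\hat\mu_j = \bar X_j = \bar X + \hat b^*_j = \bar X + (\bar X_j - \bar X)$ (school avg): Fixed Effects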


A random effects model:

3. $Y_{ij} \mid b_j = \mu + b_j + \epsilon_{ij}$, with $b_j \sim N(0, \tau^2)$: Random Effects

The distribution of $b_j$ represents prior beliefs about similarities between schools!


The random effects estimate is part-way between the model and the data. The amount depends on the variability ($\sigma$) and the underlying truth ($\tau$):

$\hat\mu_j = \bar X + \hat b_j^{\text{BLUP}} = \bar X + \lambda_j \hat b^*_j = \bar X + \lambda_j (\bar X_j - \bar X)$, with shrinkage factor $0 < \lambda_j < 1$
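The slide does not spell out the shrinkage factor. For the normal random effects model with $\sigma^2$ and $\tau^2$ treated as known, the standard result is $\lambda_j = \tau^2 / (\tau^2 + \sigma^2 / n_j)$, so schools with small $n_j$ are pulled further toward the overall average. The sketch below applies it to the three example schools, assuming $\sigma = 20$, $\tau = 5$ and an overall average of 60; all three values are hypothetical. In a full Bayesian fit these quantities are estimated rather than plugged in.

```python
def shrinkage_factor(n, sigma, tau):
    """lambda_j = tau^2 / (tau^2 + sigma^2 / n): the weight given to the school's own average."""
    return tau**2 / (tau**2 + sigma**2 / n)

def shrunk_mean(school_avg, n, overall_avg, sigma, tau):
    """mu_j = Xbar + lambda_j * (Xbar_j - Xbar), treating sigma and tau as known."""
    lam = shrinkage_factor(n, sigma, tau)
    return overall_avg + lam * (school_avg - overall_avg)

# Hypothetical inputs: sigma = 20 (within-school SD), tau = 5 (between-school SD),
# overall average 60; school averages and sizes are the slide's examples.
for label, avg, n in [("N=3", (90 + 90 + 10) / 3, 3), ("N=100", 65.0, 100), ("N=1", 69.0, 1)]:
    lam = shrinkage_factor(n, sigma=20.0, tau=5.0)
    print(label, round(lam, 3), round(shrunk_mean(avg, n, 60.0, 20.0, 5.0), 2))
```

Under these assumptions the large school keeps most of its own average, while the two small schools are pulled almost all the way back to 60.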


Testing in Schools: Shrinkage Plot

[Figure: direct sample estimates ($b^*_j$) and Bayes-shrunk estimates ($b_j$) for each school; x-axis: school (0 to 40), y-axis: score (0 to 100)]


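Testing in Schools: WinBUGS

Data: i = 1, …, 1978 (students), s = 1, …, 38 (schools)

Model:

$Y_{is} \sim \text{Normal}(\alpha_s, \sigma^2_y)$

$\alpha_s \sim \text{Normal}(\alpha_c, \sigma^2_\alpha)$ (priors on school avgs)

Note: WinBUGS uses the precision instead of the variance to specify a normal distribution!

WinBUGS:

$Y_{is} \sim \text{Normal}(\alpha_s, \tau_y)$ with $\sigma^2_y = 1 / \tau_y$

$\alpha_s \sim \text{Normal}(\alpha_c, \tau_\alpha)$ with $\sigma^2_\alpha = 1 / \tau_\alpha$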
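The variance/precision switch is easy to get wrong when reading WinBUGS output. This small sketch converts between the two; the function names are illustrative, and the SD of 15 is a hypothetical value.

```python
import math

def precision_from_variance(variance):
    """WinBUGS dnorm(mu, tau) takes a precision tau = 1 / variance."""
    return 1.0 / variance

def sd_from_precision(tau):
    """Inverse mapping, matching the WinBUGS line: sigma <- 1 / sqrt(y.tau)."""
    return 1.0 / math.sqrt(tau)

# dnorm(0.0, 1.0E-6) is a normal prior with variance 1e6, i.e. an SD of about 1000.
print(sd_from_precision(1.0e-6))
# A within-school SD of 15 (hypothetical) corresponds to a precision of 1/225.
print(precision_from_variance(15.0 ** 2))
```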


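Testing in Schools: WinBUGS

WinBUGS model:

$Y_{is} \sim \text{Normal}(\alpha_s, \tau_y)$ with $\sigma^2_y = 1 / \tau_y$

$\alpha_s \sim \text{Normal}(\alpha_c, \tau_\alpha)$ with $\sigma^2_\alpha = 1 / \tau_\alpha$

$\tau_y \sim \Gamma(0.001, 0.001)$ (prior on precision)

Hyperpriors:

Prior on mean of school means: $\alpha_c \sim \text{Normal}(0, 1/1000000)$

Prior on precision (inverse variance) of school means: $\tau_\alpha \sim \Gamma(0.001, 0.001)$

Using “vague” / “noninformative” priors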
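Why is $\Gamma(0.001, 0.001)$ called vague? WinBUGS’s `dgamma(r, mu)` has shape r and rate mu, so this prior has mean 1 but variance 1000. Once data arrive, the conjugate update for a normal precision adds n/2 to the shape and half the residual sum of squares to the rate, which swamps the 0.001s. The sketch assumes 1978 residuals with a hypothetical within-school variance of 225 (SD 15); the function names are illustrative.

```python
def gamma_moments(shape, rate):
    """Mean and variance of Gamma(shape, rate), the parameterization of WinBUGS dgamma(r, mu)."""
    return shape / rate, shape / rate**2

def gamma_precision_update(shape, rate, n, sum_sq):
    """Conjugate update of a Gamma prior on a normal precision, given n residuals with sum of squares sum_sq."""
    return shape + n / 2, rate + sum_sq / 2

prior_mean, prior_var = gamma_moments(0.001, 0.001)
print("prior mean and variance:", prior_mean, prior_var)

# Hypothetical data summary: 1978 residuals whose average squared size is 225.
post_shape, post_rate = gamma_precision_update(0.001, 0.001, 1978, 1978 * 225.0)
print("posterior mean precision:", post_shape / post_rate)  # close to 1/225
```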


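Testing in Schools: WinBUGS

Full WinBUGS model:

$Y_{is} \sim \text{Normal}(\alpha_s, \tau_y)$ with $\sigma^2_y = 1 / \tau_y$

$\alpha_s \sim \text{Normal}(\alpha_c, \tau_\alpha)$ with $\sigma^2_\alpha = 1 / \tau_\alpha$

$\tau_y \sim \Gamma(0.001, 0.001)$

$\alpha_c \sim \text{Normal}(0, 1/1000000)$

$\tau_\alpha \sim \Gamma(0.001, 0.001)$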


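Testing in Schools: WinBUGS

WinBUGS code:

model
{
   # likelihood: each student's score is normal around the school mean
   for( i in 1 : N ) {
      Y[i] ~ dnorm(mu[i], y.tau)
      mu[i] <- alpha[school[i]]
   }
   # school means come from a common normal distribution
   for( s in 1 : M ) {
      alpha[s] ~ dnorm(alpha.c, alpha.tau)
   }
   # vague priors and hyperpriors (dnorm uses precisions, not variances)
   y.tau ~ dgamma(0.001, 0.001)
   sigma <- 1 / sqrt(y.tau)   # within-school SD
   alpha.c ~ dnorm(0.0, 1.0E-6)
   alpha.tau ~ dgamma(0.001, 0.001)
}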
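WinBUGS fits this model by MCMC. Because every prior here is conditionally conjugate, a hand-written Gibbs sampler is one way to see what such a sampler does. The sketch below is a minimal illustration, not WinBUGS’s own algorithm: it uses the same priors as the code above, indexes schools from 0 (BUGS indexes from 1), and the function and argument names are hypothetical.

```python
import numpy as np

def gibbs_schools(y, school, n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler sketch for the two-stage normal model above.

    y: student scores; school: 0-based school index for each student.
    Priors mirror the WinBUGS code: dgamma(0.001, 0.001) on y.tau and alpha.tau,
    dnorm(0, 1.0E-6) on alpha.c (precision parameterization).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    school = np.asarray(school, dtype=int)
    n_schools = int(school.max()) + 1
    n_obs = y.size
    n_s = np.bincount(school, minlength=n_schools)               # students per school
    sum_s = np.bincount(school, weights=y, minlength=n_schools)  # score totals per school

    alpha_c, y_tau, alpha_tau = y.mean(), 1.0, 1.0  # starting values
    keep = {"alpha.c": [], "sigma": [], "alpha": []}

    for it in range(n_iter):
        # alpha[s] | rest: normal, combining the school's data with the population distribution
        prec = alpha_tau + n_s * y_tau
        mean = (alpha_tau * alpha_c + y_tau * sum_s) / prec
        alpha = rng.normal(mean, 1.0 / np.sqrt(prec))

        # alpha.c | rest: normal (prior mean 0, prior precision 1e-6)
        prec_c = 1.0e-6 + n_schools * alpha_tau
        alpha_c = rng.normal(alpha_tau * alpha.sum() / prec_c, 1.0 / np.sqrt(prec_c))

        # y.tau | rest: Gamma(shape, rate); NumPy's gamma takes scale = 1 / rate
        resid = y - alpha[school]
        y_tau = rng.gamma(0.001 + n_obs / 2, 1.0 / (0.001 + 0.5 * resid @ resid))

        # alpha.tau | rest: Gamma(shape, rate)
        dev = alpha - alpha_c
        alpha_tau = rng.gamma(0.001 + n_schools / 2, 1.0 / (0.001 + 0.5 * dev @ dev))

        if it >= burn_in:
            keep["alpha.c"].append(alpha_c)
            keep["sigma"].append(1.0 / np.sqrt(y_tau))  # same as sigma <- 1 / sqrt(y.tau)
            keep["alpha"].append(alpha)
    return {k: np.array(v) for k, v in keep.items()}
```

The posterior means of `alpha` are the Bayes-shrunk school estimates; plotting them against the raw school averages gives a shrinkage plot like the one earlier in this module.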


Testing in Schools: WinBUGS

Let’s fit this one together! All the “model”, “data” and “inits” files are now posted on the course webpage for you to use for practice!


Example 2: Aww, Rats… A normal hierarchical model for repeated measures


Improving individual-level estimates

Gelfand et al. (1990): 30 young rats, weights measured weekly for five weeks

Dependent variable ($Y_{ij}$) is the weight of rat “i” at week “j”

Data:

Multilevel: weights (observations) within rats (clusters)


Individual & population growth

Rat “i” has its own expected growth line: $E(Y_{ij}) = b_{0i} + b_{1i} X_j$

There is also an overall, average population growth line: $E(Y_{ij}) = \beta_0 + \beta_1 X_j$

[Figure: individual growth lines and the population line (average growth); x-axis: study day (centered), y-axis: weight]


Improving individual-level estimates

Possible analyses:

1. Each rat (cluster) has its own line: intercept $= b_{0i}$, slope $= b_{1i}$

2. All rats follow the same line: $b_{0i} = \beta_0$, $b_{1i} = \beta_1$

3. A compromise between these two: each rat has its own line, BUT… the lines come from an assumed distribution (“Random Effects”):

$E(Y_{ij} \mid b_{0i}, b_{1i}) = b_{0i} + b_{1i} X_j$

$b_{0i} \sim N(\beta_0, \tau_0^2)$

$b_{1i} \sim N(\beta_1, \tau_1^2)$
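Analyses 1 and 2 can be sketched with ordinary least squares. The data below are simulated, not the Gelfand et al. weights: the measurement days 8, 15, 22, 29, 36 and the growth parameters are hypothetical. With the same centered days for every rat, the pooled slope equals the average of the per-rat slopes; the random effects fit (analysis 3) sits between these two extremes for each rat.

```python
import numpy as np

def per_rat_lines(x, Y):
    """Analysis 1: a separate least-squares line for each rat.

    x: (T,) measurement days shared by all rats; Y: (n_rats, T) weights.
    Days are centered, so each intercept is that rat's weight at the mean day.
    """
    xc = x - x.mean()
    slopes = (Y @ xc) / (xc @ xc)
    intercepts = Y.mean(axis=1)
    return intercepts, slopes

def pooled_line(x, Y):
    """Analysis 2: one least-squares line for all rats (ignores clustering)."""
    xc = x - x.mean()
    n_rats = Y.shape[0]
    slope, intercept = np.polyfit(np.tile(xc, n_rats), Y.ravel(), 1)
    return intercept, slope

# Hypothetical simulated data: 30 rats, 5 weekly weights each.
rng = np.random.default_rng(0)
x = np.array([8.0, 15.0, 22.0, 29.0, 36.0])  # assumed measurement days
n_rats = 30
b0 = rng.normal(240.0, 15.0, n_rats)          # rat-specific intercepts
b1 = rng.normal(6.0, 0.5, n_rats)             # rat-specific slopes (g/day)
Y = b0[:, None] + b1[:, None] * (x - x.mean()) + rng.normal(0.0, 6.0, (n_rats, x.size))

intercepts, slopes = per_rat_lines(x, Y)
print(pooled_line(x, Y), slopes.mean())
```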


Bayes-Shrunk Individual Growth Lines

A compromise: each rat has its own line, but information is borrowed across rats to tell us about individual rat growth.

[Figure: Bayes-shrunk individual growth lines and the population line (average growth); x-axis: study day (centered), y-axis: weight]


Rats: WinBUGS (see Help: Examples Vol I)

WinBUGS model:


Rats: WinBUGS (see Help: Examples Vol I)

WinBUGS code:


Rats: WinBUGS (see Help: Examples Vol I)

WinBUGS results: 10000 updates


Interpretation of the results:

The primary parameter of interest is beta.c. Our estimate is 6.185 (95% interval: 5.975 to 6.394).

We estimate that a “typical” rat’s weight will increase by 6.2 g/day.

Among rats with similar “growth influences”, the average weight will increase by 6.2 g/day.

The 95% interval for the expected growth of a rat is 5.975 to 6.394 g/day.


WinBUGS Diagnostics:

MC error tells you to what extent simulation error contributes to the uncertainty in the estimation of the mean. This can be reduced by generating additional samples.

Always examine the trace of the samples. To do this, select the history button on the Sample Monitor Tool. Look for trends and correlations.

[Figure: example trace (history) plot of the node mean against iteration, iterations 1 to 1000]
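The node statistics WinBUGS reports (mean, sd, MC error, 2.5%, median, 97.5%) can be reproduced from saved draws. The sketch below estimates MC error by batch means, one common method; this is an assumption, and WinBUGS’s own computation may differ in detail. The function name is illustrative.

```python
import numpy as np

def node_summary(draws, n_batches=50):
    """Summarize MCMC draws for one node, in the style of a WinBUGS node statistics table.

    MC error = SD of the batch means / sqrt(number of batches): the Monte Carlo
    standard error of the posterior mean, which shrinks as more samples are drawn.
    """
    draws = np.asarray(draws, dtype=float)
    batch_size = len(draws) // n_batches
    batches = draws[: batch_size * n_batches].reshape(n_batches, batch_size)
    mc_error = batches.mean(axis=1).std(ddof=1) / np.sqrt(n_batches)
    q025, median, q975 = np.percentile(draws, [2.5, 50, 97.5])
    return {"mean": draws.mean(), "sd": draws.std(ddof=1), "MC error": mc_error,
            "2.5%": q025, "median": median, "97.5%": q975}
```

For a correlated chain the batch means vary more than for independent draws, so the MC error is larger for the same number of samples.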


Rats: WinBUGS (see Help: Examples Vol I) WinBUGS Diagnostics: history

[Figure: history (trace) plots for alpha0, beta.c and sigma, iterations 1001 to 10000]


WinBUGS Diagnostics:

Examine the sample autocorrelation directly by selecting the ‘auto cor’ button.

If autocorrelation exists, generate additional samples and thin more.

[Figure: example autocorrelation plot for the node mean, lags 0 to 40]
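The ‘auto cor’ plot shows the sample autocorrelation of a chain at each lag. A minimal sketch of that calculation and of thinning, with illustrative function names; the AR(1) chain used to check it is a simulated stand-in for a sticky sampler, not WinBUGS output.

```python
import numpy as np

def autocorr(draws, max_lag=40):
    """Sample autocorrelation of a chain at lags 0..max_lag."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([1.0] + [(x[:-k] @ x[k:]) / denom for k in range(1, max_lag + 1)])

def thin(draws, every):
    """Keep every `every`-th draw; kept draws are further apart, so less correlated."""
    return np.asarray(draws)[::every]
```

Thinning discards information, which is why the advice is to generate additional samples when thinning more.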


Rats: WinBUGS (see Help: Examples Vol I) WinBUGS Diagnostics: autocorrelation

[Figure: autocorrelation plots for alpha0 and beta.c, lags 0 to 40]


Bayes-Shrunk Growth Lines

WinBUGS provides the machinery for obtaining Bayesian “shrinkage estimates” in MLMs.

[Figure: individual growth lines (left) and Bayes-shrunk growth lines (right), each with the population line (average growth); x-axis: study day (centered), y-axis: weight]


School Test Scores Revisited


Testing in Schools revisited

Suppose we wanted to include covariate information in the school test scores example.

Student-level covariates:
Gender
London Reading Test (LRT) score
Verbal reasoning (VR) test category (1, 2 or 3, where 1 represents the highest level of understanding)

School-level covariates:
Gender intake (all girls, all boys or mixed)
Religious denomination (Church of England, Roman Catholic, State school or other)


Testing in Schools revisited: Model

Wow! Can YOU fit this model? Yes, you can! See WinBUGS > Help > Examples Vol II for the data, code, results, etc. More importantly: do you understand this model?


Additional Comments:

Y is actually a standardized score (the difference from the expected norm, in standard deviations).

What are the fixed effects in the model? The β are the fixed effects (measured at both the school and the student level). We assume these are independent normal.


Additional Comments:

What are the random effects in the model? The α are the random effects (at the school level). We assume these are multivariate normal. They may represent a) inherent school differences (random intercept), b) inherent school differences in terms of LRT, and c) inherent school differences in terms of the VR test.

Fixed effects interpretations are conditional on schools where these random effects are similar.

In this example we also put a model on the overall variance: the inverse of the between-pupil variance (the precision) is modeled as a function of LRT score, through the parameters theta and phi in the results table.


Some results:

node      mean      sd        MC error  2.5%      median    97.5%
beta[1]   2.62E-04  9.87E-05  2.73E-06  6.95E-05  2.63E-04  4.58E-04
beta[2]   0.4163    0.06504   0.00332   0.2875    0.4182    0.537
beta[3]   0.1715    0.04775   0.001163  0.07816   0.1714    0.2663
beta[4]   0.1192    0.134     0.006156  -0.1459   0.1206    0.3731
beta[5]   0.06045   0.1044    0.004469  -0.15     0.06354   0.2612
beta[6]   -0.2839   0.1818    0.005977  -0.6371   -0.2868   0.07477
beta[7]   0.1497    0.1062    0.00392   -0.05925  0.1487    0.3657
beta[8]   -0.1574   0.1763    0.006249  -0.4984   -0.1595   0.1949
gamma[1]  -0.6726   0.1003    0.006384  -0.8611   -0.674    -0.4734
gamma[2]  0.03135   0.01022   1.31E-04  0.01128   0.03127   0.05167
gamma[3]  0.9511    0.09027   0.004472  0.7763    0.9532    1.119
max.var   0.6228    0.06987   7.49E-04  0.4967    0.6186    0.7709
min.var   0.5138    0.05349   6.45E-04  0.4181    0.5113    0.6276
phi       -0.00266  0.002843  3.28E-05  -0.00831  -0.00265  0.002981
theta     0.5792    0.03313   3.67E-04  0.5154    0.5795    0.6435


Some results:

gamma[1] to gamma[3] represent the means of the random effects distributions.

gamma[1] is the mean of the random intercept distribution; it is hard to interpret in this case.

gamma[2] is the mean of the random effect of LRT. Among children from schools with similar latent effects, a one-unit increase in LRT yields a 0.03 standard deviation increase in the child’s test score.


Some results:

gamma[3] is the mean of the random effect for the VR test.

Among children from schools with similar latent effects, children with the highest VR scores have test scores that are on average 0.95 standard deviations greater than children with the lowest VR scores (95% CI: 0.78 to 1.12).

Among children from schools with similar latent effects, children with “moderate” VR scores have test scores that are on average 0.42 standard deviations greater than children with the lowest VR scores (95% CI: 0.29 to 0.54).


Some results:

Among children from similar schools, girls have average test scores that are 0.17 standard deviations greater than boys (95% CI: 0.08 to 0.27).

Among similar schools, all-girls schools have average test scores that are 0.12 standard deviations greater than mixed schools (95% CI: -0.15 to 0.37).
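A quick check on these interpretations is whether each 95% interval excludes zero. The sketch copies a few rows (mean, 2.5%, 97.5%) from the results table; the helper name is illustrative. It flags beta[4], the all-girls vs mixed comparison, as the one whose interval spans zero, so that 0.12 difference is not clearly different from no difference.

```python
# (node, posterior mean, 2.5%, 97.5%) copied from the results table
RESULTS = [
    ("beta[2]", 0.4163, 0.2875, 0.537),     # moderate vs lowest VR
    ("beta[3]", 0.1715, 0.07816, 0.2663),   # girls vs boys
    ("beta[4]", 0.1192, -0.1459, 0.3731),   # all-girls vs mixed schools
    ("gamma[2]", 0.03135, 0.01128, 0.05167),  # per unit of LRT
    ("gamma[3]", 0.9511, 0.7763, 1.119),    # highest vs lowest VR
]

def excludes_zero(lo, hi):
    """True if the interval [lo, hi] lies entirely above or entirely below zero."""
    return lo > 0 or hi < 0

for node, mean, lo, hi in RESULTS:
    verdict = "excludes 0" if excludes_zero(lo, hi) else "includes 0"
    print(f"{node:9s} mean={mean:8.4f}  95% interval=({lo}, {hi})  {verdict}")
```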


Bayesian Concepts

Frequentist: Parameters are “the truth”

Bayesian: Parameters have a distribution

“Borrow Strength” from other observations

“Shrink Estimates” towards overall averages

Compromise between model & data

Incorporate prior/other information in estimates

Account for other sources of uncertainty

Posterior ∝ Likelihood × Prior
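A grid approximation makes the last line concrete: evaluate likelihood × prior at many candidate values of a school’s mean, normalize, and compare with the closed-form normal answer. The sketch assumes a known σ = 20 and a N(60, 5²) prior for the school mean (both hypothetical) and uses the small school’s scores 90, 90, 10 from earlier; the posterior mean lands between the prior mean 60 and the school average 63.3, which is shrinkage at work. Function names are illustrative.

```python
import numpy as np

def grid_posterior(data, sigma, prior_mean, prior_sd, grid):
    """Normalized posterior weights for a normal mean on an evenly spaced grid.

    posterior ∝ likelihood × prior, computed on the log scale for numerical stability.
    """
    data = np.asarray(data, dtype=float)
    log_lik = -0.5 * ((data[:, None] - grid[None, :]) ** 2).sum(axis=0) / sigma**2
    log_prior = -0.5 * ((grid - prior_mean) / prior_sd) ** 2
    log_post = log_lik + log_prior
    weights = np.exp(log_post - log_post.max())
    return weights / weights.sum()

def conjugate_posterior(data, sigma, prior_mean, prior_sd):
    """Closed-form posterior mean and SD for the same normal-normal model."""
    data = np.asarray(data, dtype=float)
    precision = 1.0 / prior_sd**2 + data.size / sigma**2
    mean = (prior_mean / prior_sd**2 + data.sum() / sigma**2) / precision
    return mean, 1.0 / np.sqrt(precision)

grid = np.linspace(0.0, 130.0, 130001)
w = grid_posterior([90, 90, 10], sigma=20.0, prior_mean=60.0, prior_sd=5.0, grid=grid)
grid_mean = (grid * w).sum()
print(grid_mean, conjugate_posterior([90, 90, 10], 20.0, 60.0, 5.0))
```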