Bayesian Methods in Multilevel Regression
TRANSCRIPT
Slide 1: Bayesian Methods in Multilevel Regression
Joop Hox, MuLOG, 15 September 2000
MCMC
Slide 2: What is Statistics?
- Statistics is about uncertainty

"To err is human, to forgive divine, but to include errors in your design is statistical"
Leslie Kish, 1977, Presidential address, A.S.A.
Slide 3: Uncertainty in Classical Statistics
- Uncertainty = sampling distribution
  - Estimate the population parameter θ by θ̂
  - Imagine drawing an infinity of samples
  - Distribution of θ̂ over the samples
- We have only one sample
  - Estimate θ̂ and its sampling distribution
  - Estimate a 95% confidence interval
Slide 4: Inference in Classical Statistics
- What does a 95% confidence interval actually mean?
  - Over an infinity of samples, 95% of these contain the true population value θ
- But we have only one sample
- We never know whether our present estimate θ̂ and confidence interval is one of those 95% or not
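The "over an infinity of samples" reading can be checked by simulation. A minimal sketch, assuming a hypothetical normal population with known standard deviation (all values chosen for illustration): we draw many samples, build the usual 95% interval around each sample mean, and count how often the true value is covered.

```python
import random

random.seed(1)
theta = 10.0        # true population mean (hypothetical)
sigma = 2.0         # population s.d., assumed known for simplicity
n, reps = 25, 4000

covered = 0
for _ in range(reps):
    sample = [random.gauss(theta, sigma) for _ in range(n)]
    m = sum(sample) / n
    half = 1.96 * sigma / n ** 0.5   # 95% interval with known sigma
    if m - half <= theta <= m + half:
        covered += 1

print(covered / reps)   # close to 0.95: the long-run coverage
```

Any single one of these intervals either contains θ or it does not; only the long-run proportion is 95%.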
Slide 5: Inference in Classical Statistics
- What does a 95% confidence interval NOT mean?
  - That we have a 95% probability that the true population value θ is within the limits of our confidence interval
- We only have the aggregate assurance that in the long run 95% of our confidence intervals contain the true population value
Slide 6: Uncertainty in Bayesian Statistics
- Uncertainty = probability distribution for the population parameter
- In classical statistics the population parameter θ has one single true value
  - Only we happen not to know it
- In Bayesian statistics we imagine a distribution of possible values of the population parameter θ
- Each unknown parameter must have an associated probability distribution
Slide 7: Uncertainty in Bayesian Statistics
- Each unknown parameter must have an associated probability distribution
  - Before we have data: prior distribution
  - After we have data: posterior distribution = f(prior + data)
- Posterior distribution used to find an estimate for θ and a confidence interval
  - θ̂ = mode
  - confidence interval = central 95% region
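As a concrete sketch of "posterior = f(prior + data)", consider a coin's head probability θ with a flat prior and hypothetical data of 7 heads in 10 flips; the posterior mode and the central 95% region can be read off a fine grid of θ values.

```python
from itertools import accumulate
from bisect import bisect_left

# Posterior for a coin's head probability theta after 7 heads in 10
# flips (hypothetical data), with a flat prior on a grid of values.
grid = [i / 10000 for i in range(10001)]
post = [t**7 * (1 - t)**3 for t in grid]      # flat prior * likelihood
total = sum(post)
post = [p / total for p in post]              # normalise over the grid

mode = grid[max(range(len(post)), key=post.__getitem__)]  # estimate = mode
cdf = list(accumulate(post))
lo = grid[bisect_left(cdf, 0.025)]            # central 95% region:
hi = grid[bisect_left(cdf, 0.975)]            # 2.5% and 97.5% points
print(mode, lo, hi)   # mode = 0.7 = 7/10
```

Unlike the classical interval, this central 95% region may be read directly as "θ lies here with 95% probability, given prior and data".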
Slide 8: Inference in Bayesian Statistics
- Posterior distribution to estimate θ and a confidence interval
- Posterior = f(prior + data)
- The prior distribution influences the posterior
- Bayesian statistical inference depends partly on the prior
  - Which does not depend on the data
  - (in empirical Bayes it does…)
Slide 9: Inference in Bayesian Statistics
- Bayesian statistical inference depends partly on the prior, so: which prior?
- Technical considerations
  - Conjugate prior (posterior belongs to the same distribution family)
  - Proper prior (a real probability distribution)
- Fundamental consideration
  - Informative prior or ignorance prior?
  - Total ignorance does not exist … all priors add some information to the data
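One way to see that total ignorance does not exist: two standard "ignorance" priors for a proportion, the flat Beta(1, 1) and the Jeffreys Beta(½, ½), give different posterior means for the same hypothetical data (7 heads in 10 flips). The posterior-mean formula below is the standard conjugate Beta-binomial result.

```python
heads, flips = 7, 10   # hypothetical data

# Under a Beta(a, b) prior, the posterior mean of the proportion is
# (heads + a) / (flips + a + b): the prior acts like a + b extra flips.
def post_mean(a, b):
    return (heads + a) / (flips + a + b)

uniform = post_mean(1.0, 1.0)     # flat 'ignorance' prior
jeffreys = post_mean(0.5, 0.5)    # Jeffreys 'ignorance' prior
print(uniform, jeffreys)          # not identical: each prior adds
                                  # its own information to the data
```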
Slide 10: Computational Issues in Bayesian Statistics
- Posterior distribution used to find an estimate for θ and a confidence interval
  - θ̂ = mode
  - confidence interval = central 95% region
- This assumes a simple posterior distribution, so that we can compute its characteristics
- In complex models, the posterior is often intractable
Slide 11: Computational Issues in Bayesian Statistics
- In complex models, the posterior is often intractable
- Solution: approximate the posterior by simulation
  - Simulate many draws from the posterior distribution
  - Compute the mode, mean, 95% interval, et cetera from the simulated draws
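A minimal sketch of summarising a posterior from simulated draws: pretend the posterior is a known normal, hypothetically N(3, 0.5²), that we can only sample from, and recover its mean and central 95% region from the draws alone.

```python
import random

random.seed(2)
# Stand-in posterior N(3, 0.5^2) (hypothetical); in practice the draws
# would come from MCMC, but the summaries are computed the same way.
draws = sorted(random.gauss(3.0, 0.5) for _ in range(20000))

mean = sum(draws) / len(draws)
lo = draws[int(0.025 * len(draws))]   # central 95% region from the
hi = draws[int(0.975 * len(draws))]   # empirical 2.5%/97.5% quantiles
print(mean, lo, hi)   # near 3.0, 2.02, 3.98
```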
Slide 12: Why Bayesian Statistics?
- Can do some things that cannot be done in classical statistics
- Valid in small samples
  - Maximum Likelihood is not
  - "Asymptotically we are all dead …" (Novick)
- Always proper estimates
  - No negative variances
Slide 13: Why Bayesian Statistics?
- But prior information induces bias
  - Biased estimates
  - But hopefully more precise

"In a corner of the forest, dwells alone my Hiawatha,
Permanently cogitating on the normal law of error,
Wondering in idle moments whether an increased precision
Might perhaps be rather better, even at the risk of bias,
If thereby one, now and then, could register upon the target"

Kendall, 1959, 'Hiawatha designs an experiment' (italics mine)
Slide 14: Simulating the Posterior Distribution
- Markov Chain Monte Carlo (MCMC)
  - Gibbs sampling
  - Metropolis-Hastings
- Given a draw from a specific probability distribution, MCMC produces a new pseudorandom draw from that distribution
  - The distributions are typically multivariate
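A minimal Gibbs-sampling sketch of such a transition, for an assumed bivariate normal target with correlation ρ = 0.8: each step draws one coordinate from its conditional distribution given the current value of the other, turning the current draw into a new pseudorandom draw from the joint target.

```python
import random

random.seed(3)
rho = 0.8          # correlation of the bivariate normal target (assumed)
x, y = 0.0, 0.0    # current draw

xs = []
for _ in range(20000):
    # Full-conditional draws: x | y ~ N(rho*y, 1 - rho^2), and likewise
    # for y | x; each pair of draws is one Gibbs transition.
    x = random.gauss(rho * y, (1 - rho**2) ** 0.5)
    y = random.gauss(rho * x, (1 - rho**2) ** 0.5)
    xs.append(x)

print(sum(xs) / len(xs))                # marginal mean of x, near 0
print(sum(v * v for v in xs) / len(xs)) # marginal variance of x, near 1
```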
Slide 15: MCMC Issues: Burn In
- Sequence of draws Z(1) → Z(2) → … → Z(t)
  - From the target distribution f(Z)
- Even if Z(1) is not from f(Z), the distribution of Z(t) is f(Z) as t → ∞
- So, for an arbitrary Z(1), if t is sufficiently large, Z(t) is from the target distribution f(Z)
  - But having good starting values helps
- MCMC must run t 'burn in' iterations before we reach the target distribution f(Z)
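A sketch of burn in with a deliberately bad starting value. The chain below is a simple stand-in, not a real Gibbs sampler: an AR(1) Markov chain whose target distribution is N(0, 1). Early draws still remember Z(1) = 50; draws after the burn in do not.

```python
import random

random.seed(4)
phi = 0.9
x = 50.0                        # poor starting value, far from N(0, 1)
chain = []
for _ in range(5000):
    # AR(1) transition whose stationary distribution is N(0, 1)
    x = phi * x + random.gauss(0.0, (1 - phi**2) ** 0.5)
    chain.append(x)

early = chain[:50]              # still drifting in from x = 50
kept = chain[500:]              # after discarding 500 burn-in draws
print(sum(early) / len(early))  # far from 0: not yet the target
print(sum(kept) / len(kept))    # near 0: the target has been reached
```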
Slide 16: MCMC Issues: Burn In
- MCMC must run t 'burn in' iterations before we reach the target distribution f(Z)
- How many iterations are needed to converge on the target distribution?
- Diagnostics
  - examine a graph of the burn in
  - try different starting values
Slide 17: MCMC Issues: Monitoring
- How many iterations must be monitored?
  - Depends on the required accuracy
  - Problem: successive draws are correlated
- Diagnostics
  - Graph successive draws
  - Compute autocorrelations
  - Raftery-Lewis: nhat = minimum number of iterations for a quantile
  - Brooks-Draper: nhat = minimum number of iterations for the mean
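Computing the autocorrelations is straightforward. A sketch with the same kind of AR(1) stand-in chain (φ = 0.8 assumed) shows why correlated draws demand longer monitoring than independent draws would: each new draw adds much less than one draw's worth of information.

```python
import random

random.seed(5)
phi = 0.8
x, chain = 0.0, []
for _ in range(50000):
    x = phi * x + random.gauss(0.0, (1 - phi**2) ** 0.5)
    chain.append(x)

def autocorr(z, lag):
    # Sample autocorrelation of the chain at the given lag
    m = sum(z) / len(z)
    num = sum((z[i] - m) * (z[i + lag] - m) for i in range(len(z) - lag))
    den = sum((v - m) ** 2 for v in z)
    return num / den

print(autocorr(chain, 1))    # near phi = 0.8: far from independent
print(autocorr(chain, 10))   # decays roughly like phi**10
```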
Slide 18: Bayesian Methods in Multilevel Regression
- Software
- BUGS
  - Bayesian inference Using Gibbs Sampling
  - Very general, difficult to use
- MLwiN
  - Special implementation for multilevel regression
  - Limitations
    - No complex 1st-level variances
    - No multivariate models
    - No extrabinomial variation
Slide 19: Example Data
- Popularity data
  - Pupil popularity (0 … 10)
  - Pupil sex (0 = boy, 1 = girl)
  - Teacher experience (years)
- 2000 pupils, 100 classes (artificial data)
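Because the data are artificial, a generating script is easy to sketch. The coefficients and variance components below are hypothetical stand-ins (loosely inspired by the estimates reported later in the slides), not the actual generating values of the original data set.

```python
import random

random.seed(6)
# Hypothetical two-level generating model:
# popularity = b0 + b1*sex + b2*experience + class effect + pupil residual
b0, b1, b2 = 3.5, 0.85, 0.09     # assumed fixed effects
u_sd, e_sd = 0.7, 0.7            # assumed class- and pupil-level s.d.

rows = []
for cls in range(100):            # 100 classes
    exper = random.randint(2, 25) # teacher experience in years
    u = random.gauss(0.0, u_sd)   # class-level random intercept
    for _ in range(20):           # 20 pupils per class -> 2000 pupils
        sex = random.randint(0, 1)   # 0 = boy, 1 = girl
        pop = b0 + b1 * sex + b2 * exper + u + random.gauss(0.0, e_sd)
        rows.append((cls, sex, exper, pop))

print(len(rows))   # 2000 pupil records
```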
Slide 20: Popularity Example: FML Estimates
Slide 21: Popularity Example: MLwiN MCMC Window
MCMC defaults
Slide 22: Popularity Example: Last 500 Iterations
What we want to see is stable distributions, no trends.
β0 looks suspicious, β1 looks fine.
Slide 23: Popularity Example: Diagnostics for Posterior of β0
High autocorrelation, Raftery-Lewis nhat very high. Burn in too short?
Try again: burn in = 5000, iterations = 50000, thinning = 10.
Slide 24: Popularity Example: Diagnostics for Posterior of β1
Everything looks fine!
Slide 25: Popularity Example: Diagnostics for Posterior of β0
This one with burn in = 5000, iterations = 50000, thinning = 10.
Looks better; the Raftery-Lewis nhat is still higher than the number of iterations.
Slide 26: Comparison of Popularity Example Estimates

Different estimates, popularity example (FML: coefficient and s.e.; MCMC: posterior mode and s.d.; MCMC columns labelled iterations/burn in):

| Predictor             | FML  | s.e. | MCMC 50000/5000 | s.d. | MCMC 5000/500 | s.d. |
|-----------------------|------|------|-----------------|------|---------------|------|
| Fixed part: intercept | 3.56 | .17  | 3.57            | .18  | 3.58          | .18  |
| pupil sex             | 0.85 | .03  | 0.85            | .03  | 0.85          | .03  |
| teacher exp.          | 0.09 | .01  | 0.09            | .01  | 0.09          | .01  |
| Random part: intercept1 | 0.46 | .02 | 0.46           | .02  | 0.46          | .02  |
| intercept2            | 0.48 | .07  | 0.50            | .08  | 0.50          | .08  |
Slide 27: Monitoring Burn In
Set burn in to zero and monitor the first 500 iterations.
See if the running mean flattens out.
β0 looks suspicious, β1 looks fine.
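The running-mean check can be sketched as follows, again with an AR(1) stand-in chain (φ and the poor starting value are assumed): the running mean is inflated at first and flattens out as the chain forgets where it started.

```python
import random

random.seed(7)
phi = 0.9
x = 10.0                         # deliberately poor starting value
chain = []
for _ in range(500):
    # AR(1) transition with stationary distribution N(0, 1)
    x = phi * x + random.gauss(0.0, (1 - phi**2) ** 0.5)
    chain.append(x)

# Running mean over the first 500 iterations; a flat tail suggests
# the burn in is over.
running = [sum(chain[:i + 1]) / (i + 1) for i in range(len(chain))]
print(running[9], running[499])  # early mean inflated, late mean settles
```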
Slide 28: Monitoring Convergence
- Different starting values for the chain
  - MLwiN: try IGLS & RIGLS
  - Run both; they should give the same results
- Illustration: enter starting values manually
  - Popularity example
    - Intercept 5, sex 0.5, experience 0.1
    - Variance school level 0.2, pupil level 0.8
  - Then set burn in to zero, monitor 500 iterations
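Comparing chains started from very different values can be sketched like this (all numbers assumed; the chain is the same AR(1) stand-in as above): after discarding the first half as burn in, the chains should agree on the posterior summaries.

```python
import random

def run_chain(start, n=2000, phi=0.9, seed=0):
    # AR(1) stand-in chain with stationary distribution N(0, 1)
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, (1 - phi**2) ** 0.5)
        out.append(x)
    return out[n // 2:]           # discard the first half as burn in

# Two chains from opposite extreme starting values
chains = [run_chain(-20.0, seed=1), run_chain(20.0, seed=2)]
means = [sum(c) / len(c) for c in chains]
print(means)   # both near 0: the chains agree despite different starts
```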
Slide 29: Manual Starting Values (poor estimates)
Most running-mean plots flatten out before the end of the 500 iterations.
With these starting values, burn in = 1000 is safer.
Slide 30: Priors in MLwiN
- Default: uninformative priors
  - Flat, diffuse priors; ignorance priors
- Fixed coefficients
  - Uniform prior: P(β) ∝ 1
- Single variance
  - Diffuse: P(1/σ²) ~ Gamma(ε, ε), ε small
  - Which is close to a uniform prior for log(σ²)
- Covariance matrix
  - P(Ω⁻¹) ~ Wishart(Ω)
  - Which is somewhat informative (N = # variances)
Slide 31: Example with Informative Priors
- For fixed coefficients the prior is N(µ, σ²)
- For variances, enter a value and the number of observations it is worth
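For a fixed coefficient the N(µ, σ²) prior combines with the data by the standard conjugate-normal rule: precisions (inverse variances) add, and the posterior mean is a precision-weighted average. A sketch using the slides' sex coefficient, with the prior N(.2, 5) and the FML estimate 0.85 (s.e. .03) standing in for the data:

```python
# Conjugate normal update for one fixed coefficient:
# prior beta ~ N(mu0, tau0^2); data summarised by estimate b with s.e. se.
mu0, tau0 = 0.2, 5.0 ** 0.5    # the slides' N(.2, 5) prior (variance 5)
b, se = 0.85, 0.03             # FML estimate and s.e. for pupil sex

prior_prec = 1 / tau0**2
data_prec = 1 / se**2
post_var = 1 / (prior_prec + data_prec)       # precisions add
post_mean = post_var * (prior_prec * mu0 + data_prec * b)
print(post_mean, post_var ** 0.5)  # ~0.85, ~0.03: the data dominate
```

With so diffuse a prior and so precise an estimate, the posterior barely moves away from the data, which matches the slides' finding that the informative priors change little here.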
Slide 32: Results with Informative Priors
- For fixed coefficients the prior is N(µ, σ²)
  - Intercept N(5, 5), sex N(.2, 5), age N(.1, 5)
  - No prior on variances
- Diagnostics for β0:
Slide 33: Copies on http://www.fss.uu.nl/ms/jh