Applied Bayesian Inference, KSU, April 29, 2012 / The Bayesian Revolution: Markov Chain Monte Carlo (MCMC). Robert J. Tempelman

TRANSCRIPT

  • Slide 1
  • The Bayesian Revolution: Markov Chain Monte Carlo (MCMC). Robert J. Tempelman. Applied Bayesian Inference, KSU, April 29, 2012.
  • Slide 2
  • Simulation-based inference. Suppose you are interested in the following integral/expectation, where $f(x)$ is a density and $g(x)$ is a function: $E[g(x)] = \int g(x)\,f(x)\,dx$. You can draw random samples $x_1, x_2, \ldots, x_n$ from $f(x)$ and then compute $\hat{g} = \frac{1}{n}\sum_{i=1}^{n} g(x_i)$, with Monte Carlo standard error $\sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(g(x_i)-\hat{g}\right)^2}$. As $n \to \infty$, $\hat{g} \to E[g(x)]$; a minimal sketch follows below.
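    A minimal SAS sketch of this idea (not from the original slides; the target $E[e^x]$ for $x \sim N(0,1)$, the seed, and $n = 100{,}000$ are arbitrary choices). The exact answer is $e^{0.5} \approx 1.649$, so the Monte Carlo mean and its standard error reported by PROC MEANS can be checked directly.

    data mc;
       seed1 = 1234;
       do i = 1 to 100000;
          x = rannor(seed1);   /* draw x ~ N(0,1), i.e., from the density f(x) */
          g = exp(x);          /* g(x), the function whose expectation we want */
          output;
       end;
    run;
    /* Mean = Monte Carlo estimate; Std Error = Monte Carlo standard error */
    proc means data=mc mean stderr;
       var g;
    run;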
  • Slide 3
  • The beauty of Monte Carlo methods: you can determine the distribution of any function of the random variable(s). Distribution summaries include means, medians, key percentiles (2.5%, 97.5%), standard deviations, etc. This is generally more reliable than the delta method, especially for highly non-normal distributions; a small illustration follows below.
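    A small hypothetical illustration (not from the slides): the distribution of a nonlinear function $h = x_1/x_2$ of two independent gamma draws is summarized by its mean, median, standard deviation, and 2.5%/97.5% percentiles, quantities the delta method could only approximate.

    data ratio;
       seed1 = 5678;
       do i = 1 to 100000;
          call rangam(seed1, 4, x1);   /* x1 ~ Gamma(shape=4, scale=1) */
          call rangam(seed1, 8, x2);   /* x2 ~ Gamma(shape=8, scale=1) */
          h = x1/x2;                   /* any function of the draws */
          output;
       end;
    run;
    proc univariate data=ratio noprint;
       var h;
       output out=summary mean=mean median=median std=std
              pctlpts=2.5 97.5 pctlpre=pct;
    run;
    proc print data=summary; run;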
  • Slide 4
  • Using the method of composition for sampling (Tanner, 1996). It involves two stages of sampling. Example: suppose $Y_i \mid \lambda_i \sim \text{Poisson}(\lambda_i)$ and, in turn, $\lambda_i \mid \alpha, \beta \sim \text{Gamma}(\alpha, \beta)$. Then, marginally, $Y_i$ follows a negative binomial distribution with mean $\alpha/\beta$ and variance $(\alpha/\beta)(1+\beta^{-1})$.
  • Slide 5
  • Using the method of composition for sampling from the negative binomial:
    1. Draw $\lambda_i \mid \alpha, \beta \sim \text{Gamma}(\alpha, \beta)$.
    2. Draw $Y_i \mid \lambda_i \sim \text{Poisson}(\lambda_i)$.

    data new;
       seed1 = 2; alpha = 2; beta = 0.25;
       do j = 1 to 10000;
          call rangam(seed1,alpha,x);
          lambda = x/beta;
          call ranpoi(seed1,lambda,y);
          output;
       end;
    run;
    proc means mean var; var y; run;

    The MEANS Procedure reports Mean = 7.9749 and Variance = 39.2638 for y, close to the theoretical $E(y) = \alpha/\beta = 8$ and $\text{Var}(y) = (\alpha/\beta)(1+\beta^{-1}) = 8 \times (1+4) = 40$.
  • Slide 6
  • Another example? Student t.
    1. Draw $\lambda_i \mid \nu \sim \text{Gamma}(\nu/2, \nu/2)$.
    2. Draw $t_i \mid \lambda_i \sim \text{Normal}(0, 1/\lambda_i)$.
    Then $t_i \sim$ Student $t$ with $\nu$ degrees of freedom.

    data new;
       seed1 = 29523; df = 4;
       do j = 1 to 100000;
          call rangam(seed1,df/2,x);
          lambda = x/(df/2);
          t = rannor(seed1)/sqrt(lambda);
          output;
       end;
    run;
    proc means mean var p5 p95; var t; run;

    data new; t5 = tinv(.05,4); t95 = tinv(.95,4); run;
    proc print; run;

    Simulated summaries for t: Mean = -0.00524, Variance = 2.011365, 5th percentile = -2.1376, 95th percentile = 2.122201; the exact t(4) quantiles are t5 = -2.1319 and t95 = 2.13185.
  • Slide 7
  • Expectation-Maximization (EM). OK, I know that EM is not a simulation-based inference procedure; however, it is based on data augmentation and is an important progenitor of Markov chain Monte Carlo (MCMC) methods. Recall the plant genetics example.
  • Slide 8
  • Data augmentation. Augment the data by splitting the first cell into two cells with probabilities $1/2$ and $\theta/4$, giving 5 categories: $\left(\tfrac{1}{2},\ \tfrac{\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{1-\theta}{4},\ \tfrac{\theta}{4}\right)$. As a function of $\theta$, the augmented likelihood looks like a Beta distribution to me!
  • Slide 9
  • Data augmentation (cont'd). So the joint distribution of the complete data is $p(x_2, \mathbf{y} \mid \theta) \propto \left(\tfrac{1}{2}\right)^{y_1-x_2}\left(\tfrac{\theta}{4}\right)^{x_2}\left(\tfrac{1-\theta}{4}\right)^{y_2+y_3}\left(\tfrac{\theta}{4}\right)^{y_4}$. Considering the part including just the missing data: $x_2 \mid y_1, \theta \sim \text{Binomial}\!\left(y_1,\ \tfrac{\theta}{\theta+2}\right)$.
  • Slide 10
  • Expectation-Maximization. Start with the complete-data log-likelihood, $\ell(\theta \mid \mathbf{y}, x_2) = (x_2+y_4)\log\theta + (y_2+y_3)\log(1-\theta) + \text{constant}$. 1. Expectation (E-step): replace $x_2$ by its conditional expectation given the data and the current estimate, $E\!\left[x_2 \mid y_1, \theta^{[t]}\right] = \dfrac{y_1\,\theta^{[t]}}{\theta^{[t]}+2}$.
  • Slide 11
  • 2. Maximization (M-step): use first- or second-derivative methods to maximize the expected complete-data log-likelihood. Setting the derivative to 0, $\dfrac{E[x_2]+y_4}{\theta} - \dfrac{y_2+y_3}{1-\theta} = 0$, gives $\theta^{[t+1]} = \dfrac{E[x_2]+y_4}{E[x_2]+y_2+y_3+y_4}$.
  • Slide 12
  • Recall the data (probability, genotype, counts):
    Prob(A_B_) = $(2+\theta)/4$, $y_1 = 1997$
    Prob(aaB_) = $(1-\theta)/4$, $y_2 = 906$
    Prob(A_bb) = $(1-\theta)/4$, $y_3 = 904$
    Prob(aabb) = $\theta/4$, $y_4 = 32$
    with $0 \le \theta \le 1$; $\theta = 0$: close linkage in repulsion; $\theta = 1$: close linkage in coupling.
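    For reference (this display is reconstructed from the cell probabilities above, not taken from the transcript), the observed-data likelihood and log-likelihood that EM is maximizing are
    $$L(\theta \mid \mathbf{y}) \propto (2+\theta)^{y_1}\,(1-\theta)^{y_2+y_3}\,\theta^{y_4}, \qquad \ell(\theta \mid \mathbf{y}) = y_1\log(2+\theta) + (y_2+y_3)\log(1-\theta) + y_4\log\theta + \text{const}.$$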
  • Slide 13
  • PROC IML code:

    proc iml;
       y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
       theta = 0.20;                        /* starting value */
       do iter = 1 to 20;
          Ex2 = y1*(theta)/(theta+2);       /* E-step */
          theta = (Ex2+y4)/(Ex2+y2+y3+y4);  /* M-step */
          print iter theta;
       end;
    run;

    iter  theta
      1   0.1055303
      2   0.0680147
      3   0.0512031
      4   0.0432646
      5   0.0394234
      6   0.0375429
      7   0.0366170
      8   0.0361598
      9   0.0359338
     10   0.0358219
     11   0.0357666
     12   0.0357392
     13   0.0357256
     14   0.0357189
     15   0.0357156
     16   0.0357139
     17   0.0357131
     18   0.0357127
     19   0.0357125
     20   0.0357124

    Slower than Newton-Raphson/Fisher scoring, but generally more robust to poor starting values.
  • Slide 14
  • How to derive an asymptotic standard error using EM? From Louis (1982), the observed-data information is the complete-data information minus the missing information. Given: $I(\theta \mid \mathbf{y}) = E_{x_2 \mid \mathbf{y},\theta}\!\left[I_c(\theta \mid \mathbf{y}, x_2)\right] - \operatorname{Var}_{x_2 \mid \mathbf{y},\theta}\!\left[S_c(\theta \mid \mathbf{y}, x_2)\right]$, where $I_c$ and $S_c$ are the complete-data information and score.
  • Slide 15
  • Finish off. Now, for this example, $I_c(\theta) = \dfrac{x_2+y_4}{\theta^2} + \dfrac{y_2+y_3}{(1-\theta)^2}$, $E[x_2 \mid y_1,\theta] = \dfrac{y_1\theta}{\theta+2}$, and $\operatorname{Var}\!\left[S_c(\theta)\right] = \dfrac{\operatorname{Var}(x_2 \mid y_1,\theta)}{\theta^2} = \dfrac{2y_1}{\theta(\theta+2)^2}$. Hence $I(\hat\theta \mid \mathbf{y}) = \dfrac{E[x_2]+y_4}{\hat\theta^2} + \dfrac{y_2+y_3}{(1-\hat\theta)^2} - \dfrac{2y_1}{\hat\theta(\hat\theta+2)^2}$, giving an asymptotic standard error $\sqrt{1/I(\hat\theta \mid \mathbf{y})} \approx 0.0060$.
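    As a numerical check (this code is not from the original slides), one can also compute the observed information directly from the incomplete-data log-likelihood $\ell(\theta \mid \mathbf{y}) = y_1\log(2+\theta) + (y_2+y_3)\log(1-\theta) + y_4\log\theta$ shown earlier and evaluate it at the EM estimate:

    proc iml;
       y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
       thetahat = 0.0357124;      /* EM estimate from the earlier PROC IML run */
       /* observed information = minus the second derivative of the incomplete-data log-likelihood */
       info = y1/((2+thetahat)**2) + (y2+y3)/((1-thetahat)**2) + y4/(thetahat**2);
       se = sqrt(1/info);         /* asymptotic standard error, approximately 0.0060 */
       print thetahat info se;
    quit;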
  • Slide 16
  • Stochastic data augmentation (Tanner, 1996). Posterior identity: $p(\theta \mid \mathbf{y}) = \int p(\theta \mid x, \mathbf{y})\, p(x \mid \mathbf{y})\, dx$. Predictive identity: $p(x \mid \mathbf{y}) = \int p(x \mid \theta, \mathbf{y})\, p(\theta \mid \mathbf{y})\, d\theta$. Together these imply a transition function for a Markov chain, $K(\theta^*, \theta) = \int p(\theta \mid x, \mathbf{y})\, p(x \mid \theta^*, \mathbf{y})\, dx$, which suggests an iterative method-of-composition approach for sampling.
  • Slide 17
  • Sampling strategy for $p(\theta \mid \mathbf{y})$. Start somewhere (starting value $\theta^{[0]}$). Cycle 1: sample $x^{[1]}$ from $p(x \mid \theta^{[0]}, \mathbf{y})$, then sample $\theta^{[1]}$ from $p(\theta \mid x^{[1]}, \mathbf{y})$. Cycle 2: sample $x^{[2]}$ from $p(x \mid \theta^{[1]}, \mathbf{y})$, then sample $\theta^{[2]}$ from $p(\theta \mid x^{[2]}, \mathbf{y})$; etc. It's like sampling from E-steps and M-steps.
  • Slide 18
  • What are these full conditional densities (FCD)? Recall the complete-data likelihood $L(\theta \mid \mathbf{y}, x) \propto \left(\tfrac{1}{2}\right)^{x}\left(\tfrac{\theta}{4}\right)^{y_1-x}\left(\tfrac{1-\theta}{4}\right)^{y_2+y_3}\left(\tfrac{\theta}{4}\right)^{y_4}$, where $x$ is the count in the probability-$1/2$ cell, and assume the prior on $\theta$ is flat. Then the FCDs are $\theta \mid x, \mathbf{y} \sim \text{Beta}\!\left(\alpha = y_1 - x + y_4 + 1,\ \beta = y_2 + y_3 + 1\right)$ and $x \mid \theta, \mathbf{y} \sim \text{Binomial}\!\left(n = y_1,\ p = \tfrac{2}{\theta+2}\right)$.
  • Slide 19
  • IML code for the chained data augmentation example:

    proc iml;
       seed1 = 4;
       ncycle = 10000;                  /* total number of samples */
       theta = j(ncycle,1,0);
       y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
       beta = y2+y3+1;
       theta[1] = ranuni(seed1);        /* starting value: initial draw between 0 and 1 */
       do cycle = 2 to ncycle;
          p = 2/(2+theta[cycle-1]);
          xvar = ranbin(seed1,y1,p);              /* draw x from its FCD */
          alpha = y1+y4-xvar+1;
          xalpha = rangam(seed1,alpha);
          xbeta = rangam(seed1,beta);
          theta[cycle] = xalpha/(xalpha+xbeta);   /* Beta(alpha,beta) draw via two gammas */
       end;
       create parmdata var {theta xvar};
       append;
    run;

    data parmdata;
       set parmdata;
       cycle = _n_;
    run;
  • Slide 20
  • Trace plot:

    proc gplot data=parmdata;
       plot theta*cycle;
    run;

    Burn-in? With a bad starting value, one should discard the first few samples to ensure that one is truly sampling from $p(\theta \mid \mathbf{y})$; the starting value should have no impact (convergence in distribution). How to decide on this? See Cowles and Carlin (1996). Here, throw away the first 1000 samples as burn-in.
  • Slide 21
  • Histogram of samples post burn-in:

    proc univariate data=parmdata;
       where cycle > 1000;
       var theta;
       histogram / normal(color=red mu=0.0357 sigma=0.0060);
    run;

    Bayesian inference: N = 9000, posterior mean = 0.03671503, posterior standard deviation = 0.00607971. Quantiles: at 5.0%, observed (Bayesian) = 0.02702 versus asymptotic (likelihood) = 0.02583; at 95.0%, observed (Bayesian) = 0.04728 versus asymptotic (likelihood) = 0.04557. The normal overlay (mu = 0.0357, sigma = 0.0060) corresponds to the asymptotic likelihood inference.
  • Slide 22
  • Zooming in on the trace plot: hints of autocorrelation, which is expected with Markov chain Monte Carlo simulation schemes. The number of drawn samples is NOT equal to the number of independent draws; the greater the autocorrelation, the greater the problem, and the more samples are needed!
  • Slide 23
  • Sample autocorrelation:

    proc arima data=parmdata;
       where cycle > 1000;
       identify var=theta nlag=1000 outcov=autocov;
    run;

    The autocorrelation check for white noise rejects independence (to lag 6: chi-square = 3061.39, DF = 6, Pr > ChiSq < .0001): the draws are strongly autocorrelated.
  • Slide 24
  • How to estimate the effective number of independent samples (effective sample size, ESS)? Consider the posterior mean based on $m$ samples, $\bar\theta = \frac{1}{m}\sum_{j=1}^{m}\theta^{[j]}$. The initial positive sequence estimator (Geyer, 1992; Sorensen and Gianola, 1995) of its Monte Carlo variance is $\widehat{\operatorname{Var}}(\bar\theta) = \frac{1}{m}\left(-\hat\gamma_0 + 2\sum_{t=1}^{T}\hat\Gamma_t\right)$, where $\hat\gamma_k$ is the lag-$k$ autocovariance ($\hat\gamma_0$ is the sample variance) and $\hat\Gamma_t = \hat\gamma_{2(t-1)} + \hat\gamma_{2t-1}$ is the sum of adjacent lag autocovariances. Then $\text{ESS} = \hat\gamma_0 / \widehat{\operatorname{Var}}(\bar\theta)$.
  • Slide 25
  • Initial positive sequence estimator: choose the cutoff $T$ such that all $\hat\Gamma_t > 0$ for $t = 1, \ldots, T$ (i.e., stop summing at the first negative $\hat\Gamma_t$). SAS PROC MCMC chooses a slightly different cutoff (see its documentation). Extensive autocorrelation across lags leads to a smaller ESS.
  • Slide 26
  • SAS code:

    %macro ESS1(data,variable,startcycle,maxlag);
       data _null_;
          set &data nobs=_n;
          call symputx('nsample',_n);
       run;
       proc arima data=&data;
          where cycle > &startcycle;
          identify var=&variable nlag=&maxlag outcov=autocov;
       run;
       proc iml;
          use autocov;
          read all var{'COV'} into cov;
          nsample = &nsample;
          nlag2 = nrow(cov)/2;
          Gamma = j(nlag2,1,0);
          cutoff = 0;
          t = 0;
          do while (cutoff = 0);
             t = t+1;
             Gamma[t] = cov[2*(t-1)+1] + cov[2*(t-1)+2];   /* sum of adjacent lag autocovariances */
             if Gamma[t] < 0 then cutoff = 1;
             if t = nlag2 then do;
                print "Too much autocorrelation";
                print "Specify a larger max lag";
                stop;
             end;
          end;
          varm = (-Cov[1] + 2*sum(Gamma)) / nsample;   /* Monte Carlo variance of the posterior mean */
          ESS = Cov[1]/varm;                           /* effective sample size */
          stdm = sqrt(varm);                           /* Monte Carlo standard error */
          parameter = "&variable";
          print parameter stdm ESS;
       run;
    %mend ESS1;

    Recall: 9000 MCMC post-burn-in cycles.
  • Slide 27
  • Executing %ESS1:

    %ESS1(parmdata,theta,1000,1000);

    Recall: 1000 MCMC burn-in cycles. Output: parameter = theta, stdm = 0.0001116 (Monte Carlo standard error of the posterior mean), ESS = 2967.1289; i.e., the retained samples carry information equivalent to drawing about 2967 independent draws from the posterior density.
  • Slide 28
  • How large an ESS should I target? Routinely, in the thousands or greater; it depends on what you want to estimate. Recommend no less than
