Introduction to Bayesian methods - II
Rita Almeida
2nd of February, 2016
Overview
Previous part:
Bayes theorem
Examples with discrete and continuous normal variables
Posteriors, priors and likelihood
Comparison with frequentist approaches
This part:
Bayesian estimation
Bayesian testing
Bayesian model comparison
Methods to calculate the posterior
Bayesian linear and hierarchical linear models
Overview
Bayesian estimation
Bayesian testing
Bayesian model comparison
Methods to calculate the posterior
Bayesian linear and hierarchical linear models
Bayesian estimation
The posterior represents all the information about θ.
Sometimes one wants to report a single value: an estimator.
For δ(Y) to be a good estimator of θ, the probability of δ(Y) − θ being close to 0 must be high.
Let L(θ, a) be the loss when the estimate is a and the true value is θ.
• A common loss function is L(θ, a) = (θ − a)2.
The Bayes estimator δ*(y) of θ is the action that minimizes the posterior expected loss:
$$E[L(\theta, \delta^*(y)) \mid y] = \min_{a \in \Omega} E[L(\theta, a) \mid y]$$
For squared-error loss this minimum is attained at the posterior mean, so δ*(y) = E[θ | y].
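As a quick illustration, a minimal sketch of this minimization by grid search over actions a, using Monte Carlo draws from a Beta(10, 4) posterior (the posterior of the example on the next slide); the sample size and grid resolution are arbitrary choices:

```python
# Minimal sketch: the Bayes estimate minimizes posterior expected loss.
# Grid search over actions a, squared-error loss, Beta(10, 4) posterior.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.beta(10, 4, size=200_000)        # draws from p(theta | y)
actions = np.linspace(0.0, 1.0, 1001)
risk = [np.mean((theta - a) ** 2) for a in actions]
print(actions[np.argmin(risk)])              # ≈ posterior mean 10/14 ≈ 0.714
```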
Bayesian estimation - example
Someone answers a test with 12 true-or-false equally difficult
questions. The answers are independent. 9 correct answers.
Estimate the probability of answering a question correctly.
Likelihood: binomial n = 12, y = 9
$$p(y \mid \theta) = \binom{n}{y} \theta^{y} (1-\theta)^{n-y}$$
Prior: uniform - Beta distribution with parameters α0 = 1, β0 = 1
• The mean of a Beta distribution with parameters α, β is α/(α + β).
Figure adapted from Wagenmakers 2007
Bayesian estimation - example
Posterior: Beta with parameters α = α0 + y = 10, β = β0 + n − y = 4 (with n = 12)
$$p(\theta \mid y = 9) = 13 \binom{12}{9} \theta^{9} (1-\theta)^{3}$$
Considering L(θ, a) = (θ − a)²:
$$\delta^*(y) = \frac{\alpha_0 + y}{\alpha_0 + y + \beta_0 + n - y} = \frac{10}{14} \approx 0.71$$
Figure adapted from Wagenmakers 2007
The estimator is calculated from the posterior.
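A minimal sketch of this example with scipy (the conjugate Beta update, values as on the slide):

```python
# Sketch of the test-score example: n = 12, y = 9, uniform Beta(1, 1) prior.
from scipy import stats

n, y = 12, 9
alpha0, beta0 = 1.0, 1.0
posterior = stats.beta(alpha0 + y, beta0 + n - y)   # Beta(10, 4)

# Bayes estimate under squared-error loss: the posterior mean.
print(posterior.mean())                              # 10/14 ≈ 0.71
```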
Bayesian estimation - example with normal
Suppose we have a random sample y1, y2, . . . , yn from a
normal distribution with unknown mean θ and known
variance σ2.
Considering a normal prior with mean µ and variance ν2:
p(θ) is N(µ, ν2)
The posterior becomes:
$\theta \mid y_1, y_2, \ldots, y_n \sim N(\mu_1, \nu_1^2)$ with $\mu_1 = w\,\mu + (1-w)\,\bar{y}_n$, where $\bar{y}_n$ is the sample mean, $w = \sigma^2/(\sigma^2 + n\nu^2)$ is the weight on the prior mean, and $\nu_1^2 = (1/\nu^2 + n/\sigma^2)^{-1}$.
Considering L(θ, a) = (θ − a)²: δ*(y) = µ1
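A minimal sketch of this conjugate update, with illustrative values for σ², µ, ν² (none of these numbers come from the slide):

```python
# Sketch: normal likelihood with known variance, normal prior on the mean.
import numpy as np

rng = np.random.default_rng(0)
sigma2, mu, nu2 = 4.0, 0.0, 1.0                  # assumed illustrative values
y = rng.normal(1.5, np.sqrt(sigma2), size=20)    # simulated sample
n, ybar = len(y), y.mean()

w = (sigma2 / n) / (nu2 + sigma2 / n)            # weight on the prior mean
mu1 = w * mu + (1 - w) * ybar                    # posterior mean = Bayes estimate
nu1_2 = 1.0 / (1.0 / nu2 + n / sigma2)           # posterior variance
print(mu1, nu1_2)
```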
Overview
Bayesian estimation
Bayesian testing
Bayesian model comparison
Methods to calculate the posterior
Bayesian linear and hierarchical linear models
Bayesian testing
Hypotheses can be described as prior beliefs:
p(H0), p(H1), p(H2),...
Posteriors after observing the data (2 hypotheses):
p(H0|y), p(H1|y)
Posterior odds in favor of H0 relative to H1:
$$\underbrace{\frac{p(H_0 \mid y)}{p(H_1 \mid y)}}_{\text{posterior odds}} = \underbrace{\frac{p(y \mid H_0)}{p(y \mid H_1)}}_{\text{Bayes factor } B} \; \underbrace{\frac{p(H_0)}{p(H_1)}}_{\text{prior odds}}$$
If H0 is the complement of H1:
$$p(H_0 \mid y) = \left( \frac{1}{B}\,\frac{p(H_1)}{p(H_0)} + 1 \right)^{-1}$$
If p(H0) = p(H1), one can simply report the Bayes factor.
Bayesian testing - example
Someone answers a test with 12 true-or-false equally difficult
questions. The answers are independent. 9 correct answers.
What is the probability that the person was guessing at random?
H0: θ = 1/2, θ is fixed
H1: θ ≠ 1/2, θ ∼ Beta(1, 1)
The Bayes factor is:
$$B = \frac{p(y \mid \theta = 1/2)}{p(y \mid \theta \sim \text{Beta}(1,1))} = \frac{\binom{12}{9}\left(\frac{1}{2}\right)^{12}}{\int_0^1 p(y \mid \theta)\, p(\theta)\, d\theta} \approx 0.7$$
The data are 1/0.7 ≈ 1.4 times more likely under H1.
If the prior probabilities are equal, p(H0|y) ≈ 0.41.
One determines p(H0|y) instead of p(y |H0)!
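A minimal sketch of this computation, using numerical integration for the marginal likelihood under H1:

```python
# Sketch: Bayes factor for H0: theta = 1/2 vs H1: theta ~ Beta(1, 1).
from scipy import stats
from scipy.integrate import quad

n, y = 12, 9
m0 = stats.binom.pmf(y, n, 0.5)                              # p(y | H0)
m1, _ = quad(lambda t: stats.binom.pmf(y, n, t), 0.0, 1.0)   # p(y | H1) = 1/(n+1)

B = m0 / m1
print(B)             # ≈ 0.70
print(B / (1 + B))   # p(H0 | y) with equal prior odds: ≈ 0.41
```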
Summary
Advantages of Bayesian methods:
Focus is on the data actually collected, not on averages over hypothetical repeated data sets.
More logical / intuitive.
Frequentist hypothesis testing does not necessarily do what one would like:
• It does not give the probability of the null hypothesis.
• If the sample is large enough, even a tiny effect becomes significant.
Principled way to take into consideration prior knowledge
and incorporate new evidence.
A unified, flexible approach.
Overview
Bayesian estimation
Bayesian testing
Bayesian model comparison
Methods to calculate the posterior
Bayesian linear and hierarchical linear models
Bayesian model comparison
In general, for a model m:
$$p(m \mid y) = \frac{p(y \mid m)\, p(m)}{p(y)}$$
where p(y|m) is the model evidence.
Hypotheses can be seen as models.
One can compute a Bayes factor $B_{ij}$ for two models $m_i$ and $m_j$:
$$B_{ij} = \frac{p(y \mid m_i)}{p(y \mid m_j)}$$
With equal priors for models i and j, $B_{ij}$ alone is enough to compare the models.
If $B_{ij}$ is large, model i is more probable than model j.
Bayesian model comparison
For a model m with parameters θ:
$$p(\theta \mid y, m) = \frac{p(y \mid \theta, m)\, p(\theta \mid m)}{p(y \mid m)}$$
and the model evidence is:
$$p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta$$
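A minimal sketch computing this integral numerically for the binomial example, comparing two priors as two models (the Beta(20, 20) prior is an assumption made here for illustration):

```python
# Sketch: model evidence p(y|m) by quadrature, and a Bayes factor
# between two priors on theta for the binomial data n = 12, y = 9.
from scipy import stats
from scipy.integrate import quad

n, y = 12, 9

def evidence(a, b):
    """p(y | m) = integral of p(y | theta) p(theta | m) dtheta."""
    return quad(lambda t: stats.binom.pmf(y, n, t) * stats.beta.pdf(t, a, b),
                0.0, 1.0)[0]

m_flat = evidence(1, 1)      # flat Beta(1, 1) prior: evidence = 1/(n + 1)
m_peak = evidence(20, 20)    # a prior concentrated near 1/2 (illustrative)
print(m_flat / m_peak)       # Bayes factor between the two models
```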
Overview
Bayesian estimation
Bayesian testing
Bayesian model comparison
Methods to calculate the posterior
Bayesian linear and hierarchical linear models
Posterior distribution calculations
Bayesian inference is based on the posterior distribution
and functions of the posterior distribution.
These calculations are often difficult.
Methods used:
Monte Carlo / sampling approximations: asymptotically exact, simple idea, computationally expensive.
• Markov chain Monte Carlo (MCMC) sampling
Deterministic approximations: not exact, but computationally efficient.
• Variational Bayes
Monte Carlo methods
Idea: instead of calculating the posterior analytically,
approximate it (or a function of it) by sampling.
Example:
g(θ) is a function of θ.
The aim is to estimate E[g(θ)|y], but p(θ|y) cannot be integrated analytically.
Example from estimation: E[L(θ, a)|y] with L(θ, a) = (θ − a)².
One generates a sample of size n from the posterior: independent and identically distributed draws θ1, θ2, …, θn.
E[g(θ)|y] is estimated by:
$$\bar{g} = \frac{1}{n} \sum_{i=1}^{n} g(\theta_i)$$
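A minimal sketch of this estimate, reusing the Beta(10, 4) posterior from the running example:

```python
# Monte Carlo sketch: estimate E[L(theta, a) | y] for the Beta(10, 4) posterior.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.beta(10, 4, size=100_000)   # i.i.d. draws from p(theta | y)
a = 0.71
print(np.mean((theta - a) ** 2))        # posterior expected squared-error loss
```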
Markov Chain Monte Carlo
Problem: How to generate a sample θ1, θ2, . . . , θn from a
posterior distribution p(θ|y)?
In general it is not simple to generate random numbers
according to a given distribution.
Markov Chain Monte Carlo (MCMC) methods provide a way to generate the samples.
MCMC methods are based on constructing a Markov chain on the space of θ whose stationary distribution is the desired posterior p(θ|y).
The Metropolis-Hastings algorithm and Gibbs sampling are
MCMC methods.
Computationally expensive.
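A minimal Metropolis-Hastings sketch targeting the unnormalized Beta(10, 4) posterior from the running example (the proposal scale, chain length, and burn-in are arbitrary choices, not from the slides):

```python
# Random-walk Metropolis-Hastings for p(theta | y) ∝ theta^9 (1 - theta)^3.
import numpy as np

def log_post(theta):
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf                           # outside the support
    return 9 * np.log(theta) + 3 * np.log(1 - theta)

rng = np.random.default_rng(0)
theta, chain = 0.5, []
for _ in range(50_000):
    prop = theta + rng.normal(0.0, 0.1)          # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                             # accept the move
    chain.append(theta)

samples = np.array(chain[5_000:])                # discard burn-in
print(samples.mean())                            # ≈ 10/14 ≈ 0.714
```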
Variational Bayes
Idea: approximate the posterior distribution by a distribution
that is easier to use.
Variational Bayes (VB) is a particular approach to
approximate a distribution.
It is not exact.
It is less computationally expensive - fast.
Often difficult to derive.
It has been widely used in neuroimaging.
Overview
Bayesian estimation
Bayesian testing
Bayesian model comparison
Methods to calculate the posterior
Bayesian linear and hierarchical linear models
General linear model
$$Y = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p + \epsilon$$
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \beta_1 \begin{pmatrix} x_{11} \\ \vdots \\ x_{n1} \end{pmatrix} + \beta_2 \begin{pmatrix} x_{12} \\ \vdots \\ x_{n2} \end{pmatrix} + \ldots + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
$$Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I)$$
One finds the parameters β so that ŷ = β1X1 + β2X2 + … is as close as possible to y.
The ordinary least squares estimator of β is:
$$\hat{\beta} = (X'X)^{-1} X' y$$
If ε ∼ N(0, Σ), the generalized least squares estimator is:
$$\hat{\beta}_{GLS} = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1} y$$
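A minimal sketch of both estimators on simulated data (the design, coefficients, and noise level are arbitrary illustrative choices):

```python
# Sketch: OLS and GLS estimates for a simulated linear model.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)          # (X'X)^{-1} X'y
print(beta_ols)

Sigma_inv = np.eye(n) / 0.25                          # known noise covariance 0.25 I
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(beta_gls)                                       # equals OLS when Sigma ∝ I
```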
Bayesian analysis of the general linear model
$$Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \Sigma)$$
Likelihood: $p(y \mid \beta) \sim N(X\beta, \Sigma)$
Assuming a normal prior: $p(\beta) \sim N(\beta_0, \Sigma_0)$
The posterior is normal, $p(\beta \mid y) \sim N(\beta^*, \Sigma_p)$, with
$$\beta^* = (X'\Sigma^{-1}X + \Sigma_0^{-1})^{-1}(X'\Sigma^{-1}y + \Sigma_0^{-1}\beta_0)$$
If Σ0 is large the prior is weak and one recovers the GLS estimate: $\beta^* \approx (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$
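A minimal, self-contained sketch of the posterior mean β*, with an assumed wide prior so the result is close to GLS (all numbers illustrative):

```python
# Sketch: posterior mean for Y = X beta + eps with a normal prior on beta.
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.5, size=n)

Sigma_inv = np.eye(n) / 0.25               # noise covariance Sigma = 0.25 I
beta0 = np.zeros(p)                        # prior mean
Sigma0_inv = np.eye(p) / 10.0              # wide prior N(0, 10 I)

beta_star = np.linalg.solve(X.T @ Sigma_inv @ X + Sigma0_inv,
                            X.T @ Sigma_inv @ y + Sigma0_inv @ beta0)
print(beta_star)                           # ≈ the GLS estimate for this wide prior
```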
Bayesian analysis of the general linear model
Bayesian approach: calculate the probability φ that β exceeds some specific threshold γ given the data:
$$\varphi = p(\beta > \gamma \mid y)$$
Frequentist approach: p values reflect how probable the data are given that there is no effect:
$$p = p(f(y) > u \mid \beta = 0)$$
[Figure: posterior density of β with threshold γ and shaded tail area φ; null distribution of t = f(y) with threshold u and shaded tail area p.]
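A minimal sketch of the Bayesian quantity, assuming for illustration a scalar posterior β | y ∼ N(0.8, 0.3²) (these numbers are not from the slide):

```python
# Sketch: posterior probability that the effect beta exceeds a threshold gamma.
from scipy import stats

gamma = 0.5
phi = 1 - stats.norm.cdf(gamma, loc=0.8, scale=0.3)   # p(beta > gamma | y)
print(phi)                                            # ≈ 0.84
```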
Multilevel/hierarchical models
Multilevel / hierarchical data: data with a clustered structure.
• Example: repeated measures nested in subjects, subjects nested in medical doctors, doctors nested in hospitals.
For repeated measures in subjects one could consider:
taking one measure per subject
[Diagram: one common parameter θ generating a single measurement y1, …, yk per subject]
or analyzing each subject separately
[Diagram: separate parameters θ1, …, θm, each generating its own measurements y_{i,1}, …, y_{i,n}]
Multilevel/hierarchical models
Multilevel / hierarchical models can be used to analyze such data.
Some parameters model the data within a cluster and others across clusters.
They combine information across groups, but allow group-specific characteristics.
• Example: each subject can have its specific response, but
there is also a common aspect.
[Diagram: a group-level parameter π above the subject-level parameters θ1, …, θm, each generating its own measurements y_{i,1}, …, y_{i,n}]
Hierarchical Bayesian linear models
Linear models are very commonly used.
For clustered data linear models can be written as
hierarchical models.
Bayesian approach - hierarchical Bayesian linear models
using priors and estimating posterior distributions.
Hierarchical Bayesian linear models - example
Single-subject level:
$$Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \Sigma), \qquad p(y \mid \beta) \sim N(X\beta, \Sigma)$$
Group level:
$$\beta = \beta_g + \eta, \qquad \eta \sim N(0, \Sigma_g), \qquad p(\beta \mid \beta_g) \sim N(\beta_g, \Sigma_g), \qquad \beta_g \sim f$$
[Figure: subject-level distributions (Subjects 1-5) under a common Population distribution]
Shrinkage: the cluster-specific estimates get shrunk towards the common mean.
More uncertainty implies more shrinkage, as in the sketch below.
• Example: unreliable subjects will count less.
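A minimal numerical sketch of shrinkage, assuming known subject-level and group-level variances (all values illustrative):

```python
# Sketch: precision-weighted shrinkage of per-subject estimates.
import numpy as np

subj_mean = np.array([2.0, -1.0, 0.5])   # per-subject estimates
subj_var = np.array([0.1, 1.0, 5.0])     # their (known) variances
group_mean, group_var = 0.0, 1.0         # group-level distribution

w = group_var / (group_var + subj_var)   # weight on the subject's own data
shrunk = w * subj_mean + (1 - w) * group_mean
print(shrunk)    # the noisiest subject (variance 5) is shrunk the most
```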
Hierarchical Bayesian linear models - example
Example: response measured across time for each subject.
$y_{j|i}$: response for subject i at time $x_{j|i}$
$$y_{j|i} \sim N(\omega_{j|i}, \lambda), \qquad \omega_{j|i} = \varphi_i + \xi_i\, x_{j|i}$$
$\varphi_i$: individual intercept, $\xi_i$: individual slope
$$\varphi_i \sim N(\kappa, \delta), \qquad \xi_i \sim N(\zeta, \gamma)$$
$$\kappa, \zeta \sim N(M, H), \qquad \lambda, \delta, \gamma \sim \Gamma(K, I)$$
Figure adapted from Kruschke 2014.
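A minimal sketch simulating data from this random-intercept, random-slope structure (all hyperparameter values are assumptions made here for illustration):

```python
# Sketch: simulate responses over time for m subjects, each with its own
# intercept phi_i and slope xi_i drawn from group-level normals.
import numpy as np

rng = np.random.default_rng(0)
m, n_per = 8, 10                        # subjects, time points per subject
kappa, zeta = 1.0, 0.5                  # group-level intercept and slope means
phi = rng.normal(kappa, 0.3, size=m)    # individual intercepts
xi = rng.normal(zeta, 0.1, size=m)      # individual slopes

x = np.tile(np.arange(n_per, dtype=float), (m, 1))
y = phi[:, None] + xi[:, None] * x + rng.normal(0.0, 0.5, size=(m, n_per))
print(y.shape)                          # (subjects, time points)
```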
Empirical Bayes methods
If a prior distribution is known - full Bayesian approach.
If a prior distribution is not known, the data can be used to estimate:
the parameters of the prior distribution (the hyperparameters), or
the prior distribution itself.
Empirical Bayes is used in inference with hierarchical
Bayesian models.
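A minimal sketch of the idea for a normal-normal model with known observation variance, estimating the prior variance by the method of moments (the model and numbers are illustrative, not from the slides):

```python
# Sketch of empirical Bayes: estimate the prior variance from the data,
# then shrink the raw estimates with the estimated weight.
import numpy as np

rng = np.random.default_rng(0)
m, s2 = 50, 1.0                                   # groups, known noise variance
theta = rng.normal(0.0, np.sqrt(2.0), size=m)     # true group effects
yhat = theta + rng.normal(0.0, np.sqrt(s2), size=m)

tau2_hat = max(yhat.var(ddof=1) - s2, 0.0)        # moment estimate of prior variance
w = tau2_hat / (tau2_hat + s2)                    # estimated shrinkage weight
theta_eb = w * yhat + (1 - w) * yhat.mean()       # empirical Bayes estimates
print(round(tau2_hat, 2), theta_eb[:3])
```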
References
M. DeGroot and M. Schervish, Probability and Statistics,
2002.
A. Gelman et al., Bayesian Data Analysis, 2014.
J. K. Kruschke, Doing Bayesian Data Analysis, 2014.
E.-J. Wagenmakers, A practical solution to the pervasive problems of p values, Psychonomic Bulletin & Review, 2007.