
Page 1: §❷ An Introduction to Bayesian Inference

Robert J. Tempelman

Applied Bayesian Inference, KSU, April 29, 2012

Page 2: Bayes Theorem

• Recall a basic axiom of probability: $f(\boldsymbol{\theta}, \mathbf{y}) = f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})$
• Also: $f(\boldsymbol{\theta}, \mathbf{y}) = f(\boldsymbol{\theta} \mid \mathbf{y})\, f(\mathbf{y})$
• Combine both expressions to get:

$$f(\boldsymbol{\theta} \mid \mathbf{y}) = \frac{f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})}{f(\mathbf{y})} \qquad \text{or} \qquad f(\boldsymbol{\theta} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})$$

• Posterior ∝ Likelihood × Prior

Page 3: Prior densities/distributions

• What can we specify for $f(\theta)$?
  – Anything that reflects our prior beliefs.
  – Common choice: a "conjugate" prior.
• A conjugate $f(\theta)$ is chosen such that $f(\theta \mid y)$ is recognizable and of the same form.
• "Flat" prior: $f(\theta) \propto \text{constant}$. Then

$$f(\theta \mid y) \propto f(y \mid \theta) \times \text{constant} \propto f(y \mid \theta)$$

• Flat priors can be dangerous… they can lead to an improper $f(\theta \mid y)$; i.e.,

$$\int f(\theta \mid y)\, d\theta \rightarrow \infty$$

Page 4: Prior information / Objective?

• Introducing prior information may somewhat "bias" sample information; nevertheless, ignoring existing prior information is inconsistent with
  1) human rational behavior, and
  2) the nature of the scientific method.
  – Memory property: past inference (the posterior) can be used as an updated prior in future inference.
• Nevertheless, many applied Bayesian data analysts try to be as "objective" as possible by using diffuse (e.g., flat) priors.

Page 5: Example of conjugate prior

• Recall the binomial distribution:

$$\Pr(Y = y \mid n, p) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$$

• Suppose we express prior belief on p using a beta distribution:

$$f(p \mid \alpha, \beta) \propto p^{\alpha-1} (1-p)^{\beta-1}$$

  – Denoted as Beta(α, β)
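As a quick check, the beta priors plotted on the next slide can be evaluated with SAS's built-in PDF function; this is a minimal sketch of my own (the dataset name betadens and the SGPLOT display are not from the original slides):

data betadens;
  do p = 0.005 to 0.995 by 0.005;
    flat    = pdf('BETA', p, 1, 1);   /* Beta(1,1): flat, bounded */
    skewed  = pdf('BETA', p, 9, 1);   /* Beta(9,1): mass near 1 */
    lowmean = pdf('BETA', p, 2, 18);  /* Beta(2,18): mass near 0 */
    output;
  end;
run;

proc sgplot data=betadens;
  series x=p y=flat;
  series x=p y=skewed;
  series x=p y=lowmean;
  yaxis label="Beta Densities";
run;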

Page 6: Examples of different beta densities

[Figure: beta densities over p ∈ (0, 1) for (α=9, β=1), (α=1, β=1), and (α=2, β=18).]

$$E(p \mid \alpha, \beta) = \frac{\alpha}{\alpha+\beta} \qquad\qquad \text{var}(p \mid \alpha, \beta) = \frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$$

• Beta(1,1) is a diffuse (flat) bounded prior (but it is proper since it is bounded!)

Page 7: Posterior density of p

• Posterior ∝ Likelihood × Prior:

$$f(p \mid \alpha, \beta, n, y) \propto \Pr(Y = y \mid n, p)\; f(p \mid \alpha, \beta) \propto p^{y}(1-p)^{n-y}\; p^{\alpha-1}(1-p)^{\beta-1} = p^{y+\alpha-1}(1-p)^{n-y+\beta-1}$$

• i.e., Beta(y+α, n−y+β)
• The beta is conjugate to the binomial.

Page 8: Suppose we observe data

• y = 10, n = 15.
• Consider three alternative priors:
  – Beta(1,1)
  – Beta(9,1)
  – Beta(2,18)
• Posterior densities: Beta(y+α, n−y+β)

[Figure: posterior beta densities for (α=11, β=6), (α=19, β=6), and (α=12, β=23).]
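Using $E(p \mid \alpha, \beta) = \alpha/(\alpha+\beta)$ from Page 6, the corresponding posterior means work out as follows (arithmetic added here for concreteness):

$$E(p \mid y) = \frac{11}{17} \approx 0.647 \;\;\text{[Beta(1,1) prior]}, \qquad \frac{19}{25} = 0.760 \;\;\text{[Beta(9,1)]}, \qquad \frac{12}{35} \approx 0.343 \;\;\text{[Beta(2,18)]}$$

each pulled from the maximum likelihood estimate $\hat{p} = 10/15 \approx 0.667$ toward the respective prior mean.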

Page 9: Suppose we observed a larger dataset

• y = 100, n = 150.
• Consider the same alternative priors:
  – Beta(1,1)
  – Beta(9,1)
  – Beta(2,18)
• Posterior densities

[Figure: posterior beta densities for (α=109, β=51), (α=101, β=51), and (α=102, β=68).]
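The same added arithmetic with the larger dataset: the posterior means are $101/152 \approx 0.664$, $109/160 \approx 0.681$, and $102/170 = 0.600$, all much closer to $\hat{p} = 100/150 \approx 0.667$ than with n = 15. As n grows, the likelihood dominates the prior.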

Page 10: Posterior information

• Given:

$$f(\boldsymbol{\theta} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\theta})\, f(\boldsymbol{\theta})$$

so that

$$\ln f(\boldsymbol{\theta} \mid \mathbf{y}) = \text{constant} + \ln f(\mathbf{y} \mid \boldsymbol{\theta}) + \ln f(\boldsymbol{\theta})$$

and

$$-\frac{\partial^2 \ln f(\boldsymbol{\theta} \mid \mathbf{y})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'} = -\frac{\partial^2 \ln f(\mathbf{y} \mid \boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'} - \frac{\partial^2 \ln f(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'}$$

• Posterior information = likelihood information + prior information.
• One option for a point estimate: the joint posterior mode of θ, found by Newton-Raphson (see the update formula below).
  – Also called the MAP (maximum a posteriori) estimate of θ.
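For reference, the Newton-Raphson iteration for the posterior mode takes the standard form below; it is exactly the update theta + firstder/(-secndder) coded in the SAS program on Page 12:

$$\boldsymbol{\theta}^{[t+1]} = \boldsymbol{\theta}^{[t]} - \left[ \frac{\partial^2 \ln f(\boldsymbol{\theta} \mid \mathbf{y})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'} \right]^{-1} \frac{\partial \ln f(\boldsymbol{\theta} \mid \mathbf{y})}{\partial \boldsymbol{\theta}} \,\Bigg|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{[t]}}$$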

Page 11: Recall the plant genetic linkage example

• Recall:

$$p(\mathbf{y} \mid \theta) = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!} \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2} \left(\frac{1-\theta}{4}\right)^{y_3} \left(\frac{\theta}{4}\right)^{y_4}$$

• Suppose

$$f(\theta \mid \alpha, \beta) \propto \theta^{\alpha-1} (1-\theta)^{\beta-1}$$

• Then

$$f(\theta \mid \mathbf{y}, \alpha, \beta) \propto p(\mathbf{y} \mid \theta)\, f(\theta \mid \alpha, \beta) \propto (2+\theta)^{y_1}\, (1-\theta)^{y_2+y_3+\beta-1}\, \theta^{y_4+\alpha-1}$$

• Almost as if you increased the number of plants in genotypes 2 and 3 by β−1… and in genotype 4 by α−1.
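Taking logs of this posterior gives the objective and derivatives that the SAS program on the next slide computes directly (logpost, firstder, secndder):

$$\ln f(\theta \mid \mathbf{y}, \alpha, \beta) = \text{const} + y_1 \ln(2+\theta) + (y_2+y_3+\beta-1)\ln(1-\theta) + (y_4+\alpha-1)\ln\theta$$

$$\frac{\partial \ln f}{\partial \theta} = \frac{y_1}{2+\theta} - \frac{y_2+y_3+\beta-1}{1-\theta} + \frac{y_4+\alpha-1}{\theta}, \qquad \frac{\partial^2 \ln f}{\partial \theta^2} = -\frac{y_1}{(2+\theta)^2} - \frac{y_2+y_3+\beta-1}{(1-\theta)^2} - \frac{y_4+\alpha-1}{\theta^2}$$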

Page 12: Plant linkage example cont’d.

Suppose $\theta \sim \text{Beta}(\alpha = 50, \beta = 500)$.

data newton;
  y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
  alpha = 50; beta = 500;
  theta = 0.01; /* try starting value of 0.50 too */
  do iterate = 1 to 10;
    logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta)
             + (y4+alpha-1)*log(theta);
    firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta)
             + (y4+alpha-1)/theta;
    secndder = -y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2
             - (y4+alpha-1)/theta**2;
    theta = theta + firstder/(-secndder); /* Newton-Raphson update */
    output;
  end;
  asyvar = 1/(-secndder); /* asymptotic variance of theta_hat at convergence */
  poststd = sqrt(asyvar);
  call symputx("poststd", poststd);
  output;
run;
title "Posterior Standard Error = &poststd";
proc print;
  var iterate theta logpost;
run;

Posterior standard error:

$$sd(\theta \mid \mathbf{y}) \approx \left[ -\frac{\partial^2 \ln f(\theta \mid \mathbf{y})}{\partial \theta^2} \right]^{-1/2}_{\theta = \hat{\theta}}$$

Page 13: Output

Posterior Standard Error = 0.0057929339

Obs   iterate    theta      logpost
  1      1      0.018318    997.95
  2      2      0.030841   1035.74
  3      3      0.044771   1060.65
  4      4      0.053261   1071.06
  5      5      0.054986   1072.79
  6      6      0.055037   1072.84
  7      7      0.055037   1072.84
  8      8      0.055037   1072.84
  9      9      0.055037   1072.84
 10     10      0.055037   1072.84
 11     11      0.055037   1072.84
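A normal-approximation 95% posterior interval follows directly from these two numbers (my added arithmetic, not on the original slide): $0.055037 \pm 1.96 \times 0.0057929$, i.e., approximately $(0.0437,\; 0.0664)$.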

Page 14: Additional elements of Bayesian inference

• Suppose that θ can be partitioned into two components, a p×1 vector θ1 and a q×1 vector θ2:

$$\boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{bmatrix}$$

• If you want to make probability statements about θ, use probability calculus:

$$\Pr(a < \theta < b \mid \mathbf{y}) = \int_a^b p(\theta \mid \mathbf{y})\, d\theta$$

• There is NO repeated-sampling concept.
  – Condition on the one observed dataset.
  – However, Bayes estimators typically do have very good frequentist properties!
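For instance, an exact equal-tailed interval for the Beta(11, 6) posterior from Page 8 can be read off from beta quantiles; a minimal SAS sketch (the dataset name cred is my own):

data cred;
  /* 95% equal-tailed credible interval for p | y ~ Beta(11, 6) */
  lower = quantile('BETA', 0.025, 11, 6);
  upper = quantile('BETA', 0.975, 11, 6);
  put lower= upper=;
run;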

Page 15: Marginal vs. conditional inference

• Suppose you’re primarily interested in θ1:

$$p(\boldsymbol{\theta}_1 \mid \mathbf{y}) = \int_{\boldsymbol{\theta}_2} p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2 \mid \mathbf{y})\, d\boldsymbol{\theta}_2 = \int_{\boldsymbol{\theta}_2} p(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \mathbf{y})\, p(\boldsymbol{\theta}_2 \mid \mathbf{y})\, d\boldsymbol{\theta}_2 = E_{\boldsymbol{\theta}_2 \mid \mathbf{y}}\!\left[ p(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \mathbf{y}) \right]$$

  – i.e., average over the uncertainty on θ2 (nuisance variables).
• Of course, if θ2 were known, you would condition your inference accordingly:

$$p(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \mathbf{y})$$

Page 16: Two-stage model example

• Given $\mathbf{y}' = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}$ with $y_i \sim \text{NIID}(\mu, \sigma^2)$, where $\sigma^2$ is known. We wish to infer μ. From Bayes theorem:

$$f(\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2) \propto f(\mathbf{y} \mid \mu, \sigma^2)\, f(\mu \mid \mu_a, \sigma_a^2)$$

• Suppose $\mu \sim N(\mu_a, \sigma_a^2)$, i.e.,

$$f(\mu \mid \mu_a, \sigma_a^2) = \frac{1}{\sqrt{2\pi}\,\sigma_a} \exp\!\left( -\frac{1}{2} \frac{(\mu - \mu_a)^2}{\sigma_a^2} \right)$$

Page 17: Simplify likelihood

$$f(\mathbf{y} \mid \mu, \sigma^2) = \prod_{i=1}^{n} f(y_i \mid \mu, \sigma^2) = \left(2\pi\right)^{-n/2} \left(\sigma^2\right)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right)$$

Writing $y_i - \mu = (y_i - \bar{y}) + (\bar{y} - \mu)$ with $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, the cross-product term $2(\bar{y}-\mu)\sum_i (y_i - \bar{y})$ vanishes, so

$$\sum_{i=1}^{n} (y_i - \mu)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2$$

and hence, viewed as a function of μ,

$$f(\mathbf{y} \mid \mu, \sigma^2) \propto \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n}(y_i-\bar{y})^2\right) \exp\!\left(-\frac{n}{2\sigma^2} (\bar{y} - \mu)^2\right) \propto \exp\!\left(-\frac{n}{2\sigma^2} (\bar{y} - \mu)^2\right)$$

Page 18: Posterior density

• The posterior combines the likelihood kernel with the prior:

$$f(\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2) \propto \exp\!\left(-\frac{n}{2\sigma^2}(\bar{y} - \mu)^2\right) \exp\!\left(-\frac{1}{2\sigma_a^2}(\mu - \mu_a)^2\right)$$

• Consider the following limit:

$$\lim_{\sigma_a^2 \to \infty} f(\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2) \propto \exp\!\left(-\frac{n}{2\sigma^2}(\bar{y} - \mu)^2\right)$$

• Consistent with $f(\mu) \propto \text{constant}$, or

$$\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2 \sim N\!\left(\bar{y}, \frac{\sigma^2}{n}\right)$$

Page 19: Interpretation of Posterior Density with Flat Prior

• So

$$f(\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2) \propto f(\mathbf{y} \mid \mu, \sigma^2)$$

• Then

$$\operatorname*{ArgMax}_{\mu}\; f(\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2) = \operatorname*{ArgMax}_{\mu}\; f(\mathbf{y} \mid \mu, \sigma^2)$$

• i.e.,

$$\text{Posterior mode}(\mu \mid \mathbf{y}, \sigma^2) = \text{ML}(\mu \mid \mathbf{y}, \sigma^2) = \bar{y}$$

Page 20: Posterior density with informative prior

• Now

$$f(\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2) \propto \exp\!\left(-\frac{n}{2\sigma^2}(\bar{y} - \mu)^2\right) \exp\!\left(-\frac{1}{2\sigma_a^2}(\mu - \mu_a)^2\right)$$

• After algebraic simplification:

$$\mu \mid \mathbf{y}, \mu_a, \sigma_a^2, \sigma^2 \sim N\!\left(\tilde{\mu}, \tilde{\sigma}^2\right), \qquad \tilde{\mu} = \frac{\dfrac{n}{\sigma^2}\,\bar{y} + \dfrac{1}{\sigma_a^2}\,\mu_a}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}, \qquad \tilde{\sigma}^2 = \left(\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}\right)^{-1}$$

Page 21: (cont’d)

• Note that

$$\tilde{\mu} = \frac{\dfrac{n}{\sigma^2}}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}\, \bar{y} \;+\; \frac{\dfrac{1}{\sigma_a^2}}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_a^2}}\, \mu_a$$

i.e., a weighted average of the data mean and the prior mean, and

$$\frac{1}{\tilde{\sigma}^2} = \frac{1}{\sigma_a^2} + \frac{n}{\sigma^2}$$

• Posterior precision = prior precision + sample (likelihood) precision.
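To see the weighting with numbers (illustrative values of my own): take $\bar{y} = 10$ with $n/\sigma^2 = 1$ (say n = 4, σ² = 4) and a prior with $\mu_a = 8$, $1/\sigma_a^2 = 1$. Then $\tilde{\mu} = (1 \cdot 10 + 1 \cdot 8)/(1+1) = 9$ and $\tilde{\sigma}^2 = 1/(1+1) = 0.5$: equal precisions split the difference between the data mean and the prior mean.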

Page 22: Hierarchical models

• Given data $\mathbf{y}$:
• Two-stage:

$$p(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol{\theta}_1)\, p(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2)$$

• Three-stage:

$$p(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2 \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol{\theta}_1)\, p(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2)\, p(\boldsymbol{\theta}_2)$$

  – What’s the difference? When do you consider one over the other?

Page 23: Simple hierarchical model

• Random effects model: $Y_{ij} = \mu + a_i + e_{ij}$
  – μ: overall mean; $a_i \sim \text{NIID}(0, \tau^2)$; $e_{ij} \sim \text{NIID}(0, \sigma^2)$.
• Suppose we knew μ, σ², and τ². Then:

$$E(\mu + a_i \mid \mathbf{y}) = (1-B)\,\bar{y}_i + B\mu \qquad\qquad \text{var}(\mu + a_i \mid \mathbf{y}) = (1-B)\,\frac{\sigma^2}{n}$$

where

$$B = \frac{\sigma^2}{\sigma^2 + n\tau^2}$$

is the shrinkage factor.
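A quick illustration with made-up values: if $\sigma^2 = 4$, $\tau^2 = 1$, and $n = 4$, then $B = 4/(4 + 4 \cdot 1) = 0.5$, so with $\mu = 8$ and $\bar{y}_i = 10$ the estimate is $0.5 \cdot 10 + 0.5 \cdot 8 = 9$: the group mean is shrunk halfway toward the overall mean, with posterior variance $(1-B)\sigma^2/n = 0.5$.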

Page 24: What if we don’t know μ, σ², or τ²?

• Option 1: Estimate them, e.g., by method of moments:

$$\hat{\mu} = \frac{\sum_{i=1}^{k} \bar{y}_i}{k}, \qquad \hat{\sigma}^2 = \frac{\sum_{i,j}\left(y_{ij} - \bar{y}_i\right)^2}{k(n-1)}, \qquad \hat{\tau}^2 = \frac{\sum_{i=1}^{k}\left(\bar{y}_i - \bar{y}\right)^2}{k-1} - \frac{\hat{\sigma}^2}{n}$$

• Then “plug them in”:

$$E(\mu + a_i \mid \mathbf{y}) \approx (1-\hat{B})\,\bar{y}_i + \hat{B}\hat{\mu} \qquad\qquad \text{var}(\mu + a_i \mid \mathbf{y}) \approx (1-\hat{B})\,\frac{\hat{\sigma}^2}{n}$$

• Not truly Bayesian.
  – Empirical Bayes (EB) (next section).
  – Most of us using PROC MIXED/GLIMMIX are EB! (See the sketch below.)
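For context, this EB "plug-in" route is essentially what a standard mixed-model fit does; a minimal sketch (the dataset mydata and class variable group are hypothetical):

proc mixed data=mydata method=reml;
  class group;                 /* random-effects grouping factor */
  model y = / solution;        /* intercept-only fixed part: overall mean */
  random group / solution;     /* EBLUPs: shrunken group effects */
run;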

Page 25: A truly Bayesian approach

• 1) $Y_{ij} \mid \theta_i \sim N(\theta_i, \sigma^2)$ for all i, j
• 2) $\theta_1, \theta_2, \ldots, \theta_k$ are iid $N(\mu, \tau^2)$
  o Structural prior (exchangeable entities)
• 3) $\mu \sim p(\mu)$; $\tau^2 \sim p(\tau^2)$; $\sigma^2 \sim p(\sigma^2)$
  o Subjective prior

The joint posterior is

$$p(\theta_1, \theta_2, \ldots, \theta_k, \mu, \tau^2, \sigma^2 \mid \mathbf{y}) \propto \left[\prod_{i=1}^{k} \prod_{j=1}^{n_i} p(y_{ij} \mid \theta_i, \sigma^2)\right] \left[\prod_{i=1}^{k} p(\theta_i \mid \mu, \tau^2)\right] p(\mu)\, p(\tau^2)\, p(\sigma^2)$$

and marginal inference on any one $\theta_i$ integrates over all remaining unknowns:

$$p(\theta_i \mid \mathbf{y}) = \int \!\cdots\! \int p(\theta_1, \ldots, \theta_k, \mu, \tau^2, \sigma^2 \mid \mathbf{y})\; d\theta_1 \cdots d\theta_{i-1}\, d\theta_{i+1} \cdots d\theta_k\; d\mu\; d\tau^2\; d\sigma^2$$

Fully Bayesian inference (next section after that!)