
Applied Bayesian Inference, KSU, April 29, 2012

§.❶/ 1

§❶ Review of Likelihood Inference

Robert J. Tempelman


§.❶/ 2

Likelihood Inference

• Necessary prerequisites to understanding Bayesian inference:
– Distribution theory
– Calculus
– Asymptotic theory (e.g., Taylor expansions)
– Numerical methods/optimization
– Simulation-based analyses
– Programming skills

• SAS PROC ???? or R package ???? is only really a start to understanding data analysis.

• I don’t think that SAS PROC MCMC (version 9.3) or WinBUGS is a fix for all of your potential Bayesian inference problems.

Data Analysts: Don’t throw away that Math Stats text just yet!!!

Meaningful computing skills are a plus!


§.❶/ 3

The “simplest” model

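• Basic mean model:

$$y_i = \mu + e_i; \quad i = 1, 2, \ldots, n$$

– Common distributional assumption: $e_i \sim NIID(0, \sigma^2_e)$

• What does this mean? Think pdf (probability density function)!!!

$$y_i \sim p(y_i \mid \mu, \sigma^2_e) = \frac{1}{\sqrt{2\pi\sigma^2_e}} \exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2_e}\right)$$

• Conditional independence: the joint pdf is the product of the independent pdf’s. With $\mathbf{y} = [y_1 \; y_2 \; \cdots \; y_n]'$,

$$\mathbf{y} \sim p(\mathbf{y} \mid \mu, \sigma^2_e) = \prod_{i=1}^{n} p(y_i \mid \mu, \sigma^2_e) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2_e}} \exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2_e}\right)$$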


§.❶/ 4

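Likelihood function

• Simplify joint pdf further

$$p(\mathbf{y} \mid \mu, \sigma^2_e) = \prod_{i=1}^{n} (2\pi\sigma^2_e)^{-1/2} \exp\left(-\frac{(y_i-\mu)^2}{2\sigma^2_e}\right) = (2\pi\sigma^2_e)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2_e}\right)$$

• Regard joint pdf as function of parameters

$$L(\mu, \sigma^2_e \mid \mathbf{y}) = (2\pi\sigma^2_e)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2_e}\right)$$

$$L(\mu, \sigma^2_e \mid \mathbf{y}) \propto (\sigma^2_e)^{-n/2} \exp\left(-\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2_e}\right)$$

($\propto$: ‘proportional to’)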


§.❶/ 5

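Maximum likelihood estimation

• Maximize $L(\mu, \sigma^2_e \mid \mathbf{y})$ with respect to unknowns.

– Well actually, we directly maximize the log likelihood:

$$l(\mu, \sigma^2_e \mid \mathbf{y}) = \log L(\mu, \sigma^2_e \mid \mathbf{y}) = (\text{constant}) - \frac{n}{2}\log\sigma^2_e - \frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2_e}$$

– One strategy: Use first derivatives:

• i.e., determine $\partial l / \partial \mu$ and $\partial l / \partial \sigma^2_e$ and set to 0.

– Result?

$$\hat\mu_{ML} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$$

$$\hat\sigma^2_{ML} = \frac{\sum_{i=1}^{n}(y_i - \hat\mu)^2}{n}$$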


§.❶/ 6

Example Data, Log Likelihood & Maximum Likelihood estimates

$$\mathbf{y} = [55 \;\; 33 \;\; 45 \;\; 49 \;\; 38]'$$

$$\hat\mu_{ML} = 44 \qquad \hat\sigma^2_{ML} = 60.8$$
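The closed-form results for the mean model can be checked numerically. The following is a minimal Python sketch (Python is not used in the original slides; the variable names are illustrative) that applies the two ML formulas to the five example observations.

```python
# Closed-form ML estimates for the normal mean model y_i = mu + e_i.
y = [55, 33, 45, 49, 38]
n = len(y)

mu_hat = sum(y) / n                                   # sample mean
sigma2_hat = sum((yi - mu_hat) ** 2 for yi in y) / n  # ML divisor is n, not n - 1

print(mu_hat, sigma2_hat)
```

Note the divisor n: the ML estimate of the variance is not the usual unbiased estimate with divisor n - 1.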


§.❶/ 7

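Likelihood inference for discrete data

• Consider the binomial distribution:

$$\text{Prob}(Y = y \mid n, p) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$$

$$L(p \mid y, n) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y} \propto p^{y}(1-p)^{n-y}$$

$$l(p \mid y, n) = \text{constant} + y \log p + (n-y)\log(1-p)$$

$$\frac{\partial l(p \mid y, n)}{\partial p} = \frac{y}{p} + \frac{n-y}{1-p}(-1)$$

Set to zero:

$$\frac{y}{\hat{p}} - \frac{n-y}{1-\hat{p}} = 0 \;\rightarrow\; \hat{p} = \frac{y}{n}$$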


§.❶/ 8

Sometimes iterative solutions are required

• First derivative based methods can be slow for some problems.

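• Second-derivative methods are often desirable, e.g. Newton-Raphson

– Generally faster

– Provide asymptotic standard errors as useful by-product

$$\hat\theta^{[i+1]} = \hat\theta^{[i]} - \left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\bigg|_{\theta=\hat\theta^{[i]}}\right]^{-1} \frac{\partial l(\theta \mid \mathbf{y})}{\partial\theta}\bigg|_{\theta=\hat\theta^{[i]}}$$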


§.❶/ 9

Plant Genetics Example (Rao, 1971)

• y1, y2, y3, and y4 are the observed numbers of 4 different phenotypes, involving genotypes that differ at two loci, from the progeny of self-fertilized heterozygotes (AaBb). Under genetic theory, the distribution of the four phenotypes (with complete dominance at each locus) is known to be multinomial.


§.❶/10

Probabilities

Genotype      Probability              Data (Counts)
Prob(A_B_)    $p_1 = (2+\theta)/4$     y1 = 1997
Prob(aaB_)    $p_2 = (1-\theta)/4$     y2 = 906
Prob(A_bb)    $p_3 = (1-\theta)/4$     y3 = 904
Prob(aabb)    $p_4 = \theta/4$         y4 = 32

$0 \le \theta \le 1$; $\theta \to 0$: close linkage in repulsion; $\theta \to 1$: close linkage in coupling


§.❶/11

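Genetic Illustration of Coupling/Repulsion

[Figure: two chromosome pairs. Coupling ($\theta = 1$): A and B on one chromosome, a and b on the other. Repulsion ($\theta = 0$): A and b on one chromosome, a and B on the other.]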


§.❶/12

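Likelihood function

• Given:

$$p(\mathbf{y} \mid \theta) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!} \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2} \left(\frac{1-\theta}{4}\right)^{y_3} \left(\frac{\theta}{4}\right)^{y_4}$$

$$L(\theta \mid \mathbf{y}) \propto \left(\frac{2+\theta}{4}\right)^{y_1} \left(\frac{1-\theta}{4}\right)^{y_2} \left(\frac{1-\theta}{4}\right)^{y_3} \left(\frac{\theta}{4}\right)^{y_4} \propto (2+\theta)^{y_1} (1-\theta)^{y_2} (1-\theta)^{y_3} \theta^{y_4}$$

$$l(\theta \mid \mathbf{y}) = \log L(\theta \mid \mathbf{y}) = y_1 \log(2+\theta) + y_2 \log(1-\theta) + y_3 \log(1-\theta) + y_4 \log\theta$$

(up to an additive constant).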


§.❶/13

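First and second derivatives

• First derivative:

$$\frac{\partial l(\theta \mid \mathbf{y})}{\partial\theta} = \frac{y_1}{2+\theta} - \frac{y_2 + y_3}{1-\theta} + \frac{y_4}{\theta}$$

• Second derivative:

$$\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2} = -\frac{y_1}{(2+\theta)^2} - \frac{y_2 + y_3}{(1-\theta)^2} - \frac{y_4}{\theta^2}$$

• Recall Newton Raphson algorithm:

$$\hat\theta^{[i+1]} = \hat\theta^{[i]} - \left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\bigg|_{\theta=\hat\theta^{[i]}}\right]^{-1} \frac{\partial l(\theta \mid \mathbf{y})}{\partial\theta}\bigg|_{\theta=\hat\theta^{[i]}}$$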


§.❶/14

Newton Raphson: SAS data step and output

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   theta = 0.01;   /* try starting value of 0.50 too */
   do iterate = 1 to 5;
      loglike  = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta);
      firstder = y1/(2+theta) - (y2+y3)/(1-theta) + y4/theta;
      secndder = (-y1/(2+theta)**2 - (y2+y3)/(1-theta)**2 - y4/theta**2);
      theta = theta + firstder/(-secndder);
      output;
   end;
   asyvar = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   output;
run;
proc print data=newton;
   var iterate theta loglike;
run;

iterate    theta       loglike
1          0.034039    1228.62
2          0.035608    1247.07
3          0.035706    1247.10
4          0.035712    1247.10
5          0.035712    1247.10

$$\hat\theta_{ML} = 0.0357$$


§.❶/15

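Asymptotic standard errors

• Given:

$$\widehat{\text{var}}(\hat\theta) = \left[-\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\bigg|_{\theta=\hat\theta}\right]^{-1} = 3.6 \times 10^{-5}$$

where the bracketed term is the observed information.

$$se(\hat\theta) = \sqrt{\left[-\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\bigg|_{\theta=\hat\theta}\right]^{-1}} = 0.0060$$

proc print data=newton;
   var asyvar;
run;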
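For readers without SAS, the Newton Raphson update and the observed-information standard error can be sketched in Python. This is an illustrative translation rather than the author's code: the function names and the convergence tolerance are assumptions, and the loop runs to convergence instead of a fixed five iterations.

```python
from math import log, sqrt

y1, y2, y3, y4 = 1997, 906, 904, 32

def loglike(theta):
    # Log likelihood up to an additive constant
    return y1 * log(2 + theta) + (y2 + y3) * log(1 - theta) + y4 * log(theta)

def first_der(theta):
    return y1 / (2 + theta) - (y2 + y3) / (1 - theta) + y4 / theta

def second_der(theta):
    return -y1 / (2 + theta) ** 2 - (y2 + y3) / (1 - theta) ** 2 - y4 / theta ** 2

def newton_raphson(theta, tol=1e-10, max_iter=50):
    # theta_new = theta + firstder / (-secndder), repeated until the step is tiny
    for _ in range(max_iter):
        step = first_der(theta) / (-second_der(theta))
        theta += step
        if abs(step) < tol:
            break
    return theta

theta_hat = newton_raphson(0.01)
obs_info = -second_der(theta_hat)   # observed information at the ML estimate
se_obs = sqrt(1 / obs_info)         # asymptotic standard error

print(round(theta_hat, 4), round(se_obs, 4))
```

Evaluating the second derivative at the converged estimate gives the asymptotic variance directly, which is the "useful by-product" of second-derivative methods.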


§.❶/16

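Alternative to Newton Raphson

• Fisher’s scoring

– Substitute $E_{\mathbf{y}}\left[\dfrac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right]$ for $\dfrac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}$ in Newton Raphson.

– Now

$$E(y_1) = np_1 = \frac{n(2+\theta)}{4}, \quad E(y_2) = np_2 = \frac{n(1-\theta)}{4}, \quad E(y_3) = np_3 = \frac{n(1-\theta)}{4}, \quad E(y_4) = np_4 = \frac{n\theta}{4}$$

– Then

$$E_{\mathbf{y}}\left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right] = -\frac{n(2+\theta)/4}{(2+\theta)^2} - \frac{n(1-\theta)/4}{(1-\theta)^2} - \frac{n(1-\theta)/4}{(1-\theta)^2} - \frac{n\theta/4}{\theta^2} = -\frac{n}{4(2+\theta)} - \frac{n}{4(1-\theta)} - \frac{n}{4(1-\theta)} - \frac{n}{4\theta}$$

The negative of this expectation is the expected information.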


§.❶/17

Fisher scoring: SAS data step and output

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   n = y1 + y2 + y3 + y4;   /* total count, needed for the expected second derivative */
   theta = 0.01;   /* try starting value of 0.50 too */
   do iterate = 1 to 5;
      loglike  = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta);
      firstder = y1/(2+theta) - (y2+y3)/(1-theta) + y4/theta;
      secndder = (n/4)*(-1/(2+theta) - 2/(1-theta) - 1/theta);   /* expected second derivative */
      theta = theta + firstder/(-secndder);
      output;
   end;
   asyvar = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   output;
run;
proc print data=newton;
   var iterate theta loglike;
run;

iterate    theta       loglike
1          0.034039    1228.62
2          0.035608    1247.07
3          0.035706    1247.10
4          0.035712    1247.10
5          0.035712    1247.10

$$se(\hat\theta) = \sqrt{\left[-E_{\mathbf{y}}\left(\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right)\bigg|_{\theta=\hat\theta}\right]^{-1}} = 0.0058$$

In some applications, Fisher’s scoring is easier than Newton Raphson… but observed information is probably more reliable than expected information (Efron and Hinkley, 1978).
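The same scoring iteration can be sketched in Python. This is an illustrative translation under the same assumptions as the SAS step (starting value 0.01); the function names and tolerance are not from the original.

```python
from math import sqrt

y1, y2, y3, y4 = 1997, 906, 904, 32
n = y1 + y2 + y3 + y4

def first_der(theta):
    return y1 / (2 + theta) - (y2 + y3) / (1 - theta) + y4 / theta

def expected_info(theta):
    # Negative expected second derivative: (n/4) * [1/(2+theta) + 2/(1-theta) + 1/theta]
    return (n / 4) * (1 / (2 + theta) + 2 / (1 - theta) + 1 / theta)

def fisher_scoring(theta, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        step = first_der(theta) / expected_info(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta

first_iterate = 0.01 + first_der(0.01) / expected_info(0.01)
theta_fs = fisher_scoring(0.01)
se_exp = sqrt(1 / expected_info(theta_fs))

print(round(first_iterate, 6), round(theta_fs, 4), round(se_exp, 4))
```

The first iterate reproduces the first row of the printed table, and the expected-information standard error is slightly smaller than the observed-information one.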


§.❶/18

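Extensions to multivariate θ

• Suppose that $\boldsymbol\theta$ is a p x 1 vector.

• Newton Raphson

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} - \left[\frac{\partial^2 l(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\bigg|_{\boldsymbol\theta=\hat{\boldsymbol\theta}^{[t]}}\right]^{-1} \frac{\partial l(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta=\hat{\boldsymbol\theta}^{[t]}}$$

• Fisher’s scoring

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} - \left[E_{\mathbf{y}}\left(\frac{\partial^2 l(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right)\bigg|_{\boldsymbol\theta=\hat{\boldsymbol\theta}^{[t]}}\right]^{-1} \frac{\partial l(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta=\hat{\boldsymbol\theta}^{[t]}}$$

• or

$$\hat{\boldsymbol\theta}^{[t+1]} = \hat{\boldsymbol\theta}^{[t]} + \left[-E_{\mathbf{y}}\left(\frac{\partial^2 l(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta\,\partial\boldsymbol\theta'}\right)\bigg|_{\boldsymbol\theta=\hat{\boldsymbol\theta}^{[t]}}\right]^{-1} \frac{\partial l(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta}\bigg|_{\boldsymbol\theta=\hat{\boldsymbol\theta}^{[t]}}$$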


§.❶/19

Generalized linear models

• For multifactorial analysis of non-normal (binary, count) data.

• Consider the probit link binary model.

– Implies the existence of normally distributed latent (underlying) variables ($\ell_i$).

– Could do something similar for the logistic link binary model

• Consider a simple population mean model:

– $\ell_i = \mu + e_i$; $e_i \sim N(0, \sigma^2_e)$

– Let $\mu = 10$ and $\sigma_e = 2$


§.❶/20

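The liability (latent variable) concept

[Figure: normal density pdf($\ell_i$) of the liability $\ell_i$ (horizontal axis LIABLTY, 4 to 16; vertical axis DENSITY, 0.00 to 0.20) with $\mu = 10$, $\sigma_e = 2$ and threshold $\delta = 12$ (“THRESHOLD”). The area to the right of the threshold corresponds to Y=1 (“success”), the area to the left to Y=0 (“failure”).]

$$\text{Prob}(\ell_i > 12) = \text{Prob}\left(\frac{\ell_i - 10}{2} > \frac{12 - 10}{2}\right) = \text{Prob}(z > 1) = 1 - \Phi(1) = 0.1587$$

i.e. probability of “success” = 15.87%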


§.❶/21

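Inferential utopia!

• Suppose we’re able to measure the liabilities directly

– Also suppose a more general multi-population (trt) model

$$\boldsymbol\ell = \mathbf{X}\boldsymbol\alpha + \mathbf{e}; \quad \mathbf{e} \sim N(\mathbf{0}, \mathbf{R}); \quad \text{typically } \mathbf{R} = \mathbf{I}\sigma^2$$

with $\boldsymbol\ell = [\ell_1 \; \ell_2 \; \cdots \; \ell_n]'$.

$\hat{\boldsymbol\alpha}$ = ML($\boldsymbol\alpha$) = OLS($\boldsymbol\alpha$):

$$\hat{\boldsymbol\alpha} = \left(\mathbf{X}'\mathbf{R}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{R}^{-1}\boldsymbol\ell$$

But (sigh…), we of course don’t generally observe $\boldsymbol\ell$.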


§.❶/22

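Suppose there are 3 subclasses

Mean liabilities:

$$[\mu_1 \;\; \mu_2 \;\; \mu_3]' = [9 \;\; 10 \;\; 11]'$$

• Use “corner parameterization”: $\boldsymbol\ell = \mathbf{X}\boldsymbol\alpha + \mathbf{e}$ with

$$\boldsymbol\alpha = [\mu \;\; \alpha_1 \;\; \alpha_2]' = [11 \;\; -2 \;\; -1]'$$

Herd 1: $\mathbf{x}_i' = [1 \;\; 1 \;\; 0]$
Herd 2: $\mathbf{x}_i' = [1 \;\; 0 \;\; 1]$
Herd 3: $\mathbf{x}_i' = [1 \;\; 0 \;\; 0]$

$$\mathbf{X} = [\mathbf{x}_1 \;\; \mathbf{x}_2 \;\; \cdots \;\; \mathbf{x}_n]'$$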


§.❶/23

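Probability of success as function of effects (can’t observe liabilities… just observed binary data)

• Shaded areas

[Figure: normal liability densities (liability from 6 to 18, density from 0.00 to 0.20) for Herd 1, Herd 2 and Herd 3, with the area beyond the threshold of 12 shaded.]

$$\text{Prob}(\ell > 12 \mid \text{herd } 1) = \text{Prob}\left(\frac{\ell - 9}{2} > \frac{12 - 9}{2}\right) = \text{Prob}(z > 1.5) = 1 - \Phi(1.5) = 0.067$$

$$\text{Prob}(\ell > 12 \mid \text{herd } 2) = \text{Prob}\left(\frac{\ell - 10}{2} > \frac{12 - 10}{2}\right) = \text{Prob}(z > 1.0) = 1 - \Phi(1.0) = 0.1587$$

$$\text{Prob}(\ell > 12 \mid \text{herd } 3) = \text{Prob}\left(\frac{\ell - 11}{2} > \frac{12 - 11}{2}\right) = \text{Prob}(z > 0.5) = 1 - \Phi(0.5) = 0.309$$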
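These herd-specific probabilities are easy to verify. The sketch below is in Python (standard library only; variable names are illustrative). It standardizes the threshold for each herd and evaluates the upper tail of the standard normal.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1
threshold, sigma_e = 12.0, 2.0
herd_means = {1: 9.0, 2: 10.0, 3: 11.0}

# Prob(liability > threshold | herd) = 1 - Phi((threshold - mean) / sigma_e)
prob_success = {
    herd: 1 - std_normal.cdf((threshold - mu) / sigma_e)
    for herd, mu in herd_means.items()
}

for herd, p in prob_success.items():
    print(f"Herd {herd}: {p:.4f}")
```

Only the standardized distance (threshold - mean) / sigma_e enters the calculation, which is why the threshold, the means, and the residual standard deviation cannot all be estimated from binary data.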


§.❶/24

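Reparameterize model

• Let

$$\ell_i = \mathbf{x}_i'\boldsymbol\alpha + e_i = \mu + \mathbf{x}_i^{*\prime}\boldsymbol\alpha^* + e_i, \qquad \boldsymbol\alpha^* = [\alpha_1 \;\; \alpha_2]'$$

Herd 1: $\mathbf{x}_i^{*\prime} = [1 \;\; 0]$
Herd 2: $\mathbf{x}_i^{*\prime} = [0 \;\; 1]$
Herd 3: $\mathbf{x}_i^{*\prime} = [0 \;\; 0]$

$$\text{Prob}(i\text{th animal is diseased}) = \text{Prob}(\ell_i > \delta) = \text{Prob}\left(e_i > \delta - (\mu + \mathbf{x}_i^{*\prime}\boldsymbol\alpha^*)\right) = \text{Prob}\left(\frac{e_i}{\sigma_e} > \frac{\delta - (\mu + \mathbf{x}_i^{*\prime}\boldsymbol\alpha^*)}{\sigma_e}\right) = 1 - \Phi\left(\frac{\delta - (\mu + \mathbf{x}_i^{*\prime}\boldsymbol\alpha^*)}{\sigma_e}\right)$$

• $\delta$ and $\mathbf{x}_i'\boldsymbol\alpha = (\mu + \mathbf{x}_i^{*\prime}\boldsymbol\alpha^*)$ cannot be estimated separately from $\sigma^2_e$, i.e., $\sigma^2_e$ is not identifiable.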


§.❶/25

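Reparameterize the model again.

• Consider the remaining parameters as standardized ratios: $\tau = \delta/\sigma_e$, $\xi = \mu/\sigma_e$, and $\boldsymbol\beta = \boldsymbol\alpha^*/\sigma_e$ → same as constraining $\sigma_e = 1$.

$$\text{Prob}(i\text{th animal is diseased}) = 1 - \Phi\left(\tau - (\xi + \mathbf{x}_i^{*\prime}\boldsymbol\beta)\right)$$

[Figure: liability densities (liability from 3 to 9, density from 0.0 to 0.4) for Herd 1, Herd 2 and Herd 3 on the standardized scale.]

Notice that the new threshold is now 12/2 = 6, whereas the mean responses for the three herds are now 9/2, 10/2 and 11/2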


§.❶/26

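There is still another identifiability problem

• Between $\tau$ and $\xi$

• One solution?

– “zero out” $\tau$.

$$\text{Prob}(i\text{th animal is diseased}) = 1 - \Phi\left(\tau - (\xi + \mathbf{x}_i^{*\prime}\boldsymbol\beta)\right) = 1 - \Phi\left(0 - (\xi^* + \mathbf{x}_i^{*\prime}\boldsymbol\beta)\right) = 1 - \Phi\left(0 - \mathbf{x}_i'\boldsymbol\beta\right)$$

where $\xi^* = \xi - \tau$ becomes the intercept in $\mathbf{x}_i'\boldsymbol\beta$.

[Figure: liability densities (liability from -4 to 4, density from 0.0 to 0.4) for Herd 1, Herd 2 and Herd 3 with the threshold at 0.]

Notice that the new threshold is now 0, whereas the mean responses for the three herds are now -1.5, -1 and -0.5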


§.❶/27

• Note that

[Figure: liability densities (liability from -4 to 4, density from 0.0 to 0.4) for Herd 1, Herd 2 and Herd 3.]

higher values of translate into lower probabilities of disease

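$$p_i = \text{Prob}(i\text{th animal is diseased}) = 1 - \Phi\left(0 - \mathbf{x}_i'\boldsymbol\beta\right) = 1 - \Phi\left(-\mathbf{x}_i'\boldsymbol\beta\right) = \Phi\left(\mathbf{x}_i'\boldsymbol\beta\right)$$

$$\text{Prob}(i\text{th animal is healthy}) = 1 - p_i = 1 - \Phi\left(\mathbf{x}_i'\boldsymbol\beta\right)$$

$$\eta_i = \mathbf{x}_i'\boldsymbol\beta; \qquad p_i = \Phi(\eta_i) \;\Leftrightarrow\; \eta_i = \Phi^{-1}(p_i)$$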


§.❶/28

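Deriving likelihood function

• Given:

$$\text{Prob}(Y_i = y_i) = p_i^{y_i}(1 - p_i)^{1 - y_i}; \quad y_i = 0, 1$$

• i.e.,

$$\text{Prob}(Y_i = 1) = p_i^{1}(1 - p_i)^{1-1} = p_i$$

$$\text{Prob}(Y_i = 0) = p_i^{0}(1 - p_i)^{1-0} = 1 - p_i$$

• Suppose you have a second animal (i’)

$$\text{Prob}(Y_{i'} = y_{i'}) = p_{i'}^{y_{i'}}(1 - p_{i'})^{1 - y_{i'}}$$

• Suppose animals i and i’ are conditionally independent

$$\text{Prob}(Y_i = z_1, Y_{i'} = z_2) = p_i^{z_1}(1 - p_i)^{1 - z_1}\, p_{i'}^{z_2}(1 - p_{i'})^{1 - z_2}$$

• Example

$$\text{Prob}(Y_i = 1, Y_{i'} = 0) = p_i^{1}(1 - p_i)^{0}\, p_{i'}^{0}(1 - p_{i'})^{1} = p_i(1 - p_{i'})$$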


§.❶/29

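Deriving likelihood function

• More general case

$$\text{Prob}(Y_1 = z_1, Y_2 = z_2, \ldots, Y_n = z_n) = p_1^{z_1}(1 - p_1)^{1 - z_1}\, p_2^{z_2}(1 - p_2)^{1 - z_2} \cdots p_n^{z_n}(1 - p_n)^{1 - z_n} = \prod_{i=1}^{n} p_i^{z_i}(1 - p_i)^{1 - z_i}$$

– conditional independence

• So… likelihood function for probit model:

$$L(\boldsymbol\beta \mid \mathbf{y}) = \prod_{i=1}^{n} \Phi\left(\mathbf{x}_i'\boldsymbol\beta\right)^{y_i} \left(1 - \Phi\left(\mathbf{x}_i'\boldsymbol\beta\right)\right)^{1 - y_i}$$

• Alternative: logistic model:

$$L(\boldsymbol\beta \mid \mathbf{y}) = \prod_{i=1}^{n} \left(\frac{\exp(\mathbf{x}_i'\boldsymbol\beta)}{1 + \exp(\mathbf{x}_i'\boldsymbol\beta)}\right)^{y_i} \left(\frac{1}{1 + \exp(\mathbf{x}_i'\boldsymbol\beta)}\right)^{1 - y_i}$$

with

$$p_i = \frac{\exp(\mathbf{x}_i'\boldsymbol\beta)}{1 + \exp(\mathbf{x}_i'\boldsymbol\beta)}$$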


§.❶/30

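Small probit regression example

• Data

Yi   1  1  0  0  1  0  0  0  1  0  0  0
Xi  40 40 40 43 43 43 47 47 47 50 50 50

$\mathbf{X} = [\mathbf{x}_1 \;\; \mathbf{x}_2 \;\; \cdots \;\; \mathbf{x}_{12}]'$ is the 12 x 2 matrix with rows $\mathbf{x}_i' = [1 \;\; x_i]$, and $\mathbf{y} = [1 \; 1 \; 0 \; 0 \; 1 \; 0 \; 0 \; 0 \; 1 \; 0 \; 0 \; 0]'$.

$$\text{Prob}(Y_i = 1 \mid \boldsymbol\beta) = \Phi\left(\mathbf{x}_i'\boldsymbol\beta\right) = \Phi(\beta_0 + \beta_1 x_i)$$

Link function = probit:

$$\Phi^{-1}\left(E(y_i \mid \beta_0, \beta_1)\right) = \beta_0 + \beta_1 x_i$$

$$L(\beta_0, \beta_1 \mid \mathbf{y}) = \prod_{i=1}^{n} \Phi(\beta_0 + \beta_1 x_i)^{y_i} \left(1 - \Phi(\beta_0 + \beta_1 x_i)\right)^{1 - y_i}$$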


§.❶/31

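Log likelihood

$$\log L(\beta_0, \beta_1 \mid \mathbf{y}) = \sum_{i=1}^{n} \left[ y_i \log \Phi(\beta_0 + \beta_1 x_i) + (1 - y_i) \log\left(1 - \Phi(\beta_0 + \beta_1 x_i)\right) \right]$$

• Newton Raphson equations can be written as:

$$\mathbf{X}'\mathbf{W}\mathbf{X}\left(\hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]}\right) = \mathbf{X}'\mathbf{v}$$

with $\eta_i = \mathbf{x}_i'\boldsymbol\beta = \beta_0 + \beta_1 x_i$ and

$$w_{ii} = -\frac{\partial^2 \log L(\beta_0, \beta_1 \mid \mathbf{y})}{\partial \eta_i^2}, \qquad v_i = \frac{\partial \log L(\beta_0, \beta_1 \mid \mathbf{y})}{\partial \eta_i}, \qquad \mathbf{W} = \text{diag}\{w_{ii}\}, \qquad \mathbf{v} = \{v_i\}$$

Fisher’s scoring:

$$E_{\mathbf{y}}(w_{ii}) = -E_{\mathbf{y}}\left(\frac{\partial^2 \log L(\beta_0, \beta_1 \mid \mathbf{y})}{\partial \eta_i^2}\right), \qquad E_{\mathbf{y}}(\mathbf{W}) = \text{diag}\{E_{\mathbf{y}}(w_{ii})\}$$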


§.❶/32

A SAS program

data example_binary;
   input x y;
   cards;
40 1
40 1
40 0
43 0
43 1
43 0
47 0
47 0
47 1
50 0
50 0
50 0
;

proc genmod data=example_binary descending;
   class y;
   model y = x / dist=bin link=probit;
   contrast 'slope ' x 1;
run;


§.❶/33

Key output

Criteria For Assessing Goodness Of Fit
Criterion         DF    Value      Value/DF
Log Likelihood          -6.2123

Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept   1    7.8233     5.2657           -2.4974    18.1439           2.21              0.1374
x           1    -0.1860    0.1194           -0.4199    0.0480            2.43              0.1192
Scale       0    1.0000     0.0000           1.0000     1.0000

Contrast Results
Contrast   DF   Chi-Square   Pr > ChiSq   Type
slope      1    2.85         0.0913       LR
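The PROC GENMOD estimates can be reproduced by coding Fisher's scoring directly. The sketch below is in Python with the standard library only; the function names, the starting values of zero, and the hand-coded 2 x 2 solve are illustrative assumptions, not what GENMOD does internally. It uses the expected weights $E(w_{ii}) = \phi(\eta_i)^2 / [p_i(1-p_i)]$ and scores $v_i = (y_i - p_i)\phi(\eta_i) / [p_i(1-p_i)]$ implied by the probit log likelihood.

```python
from math import log
from statistics import NormalDist

Z = NormalDist()  # standard normal: Z.cdf is Phi, Z.pdf is phi

x = [40, 40, 40, 43, 43, 43, 47, 47, 47, 50, 50, 50]
y = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

def loglike(b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = Z.cdf(b0 + b1 * xi)
        total += yi * log(p) + (1 - yi) * log(1 - p)
    return total

def scoring_step(b0, b1):
    """Solve (X'WX) delta = X'v for the intercept-plus-slope (2 x 2) case."""
    a = b = c = g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        p, d = Z.cdf(eta), Z.pdf(eta)
        w = d * d / (p * (1 - p))          # expected weight E(w_ii)
        v = (yi - p) * d / (p * (1 - p))   # derivative of log L w.r.t. eta_i
        a += w
        b += w * xi
        c += w * xi * xi
        g0 += v
        g1 += v * xi
    det = a * c - b * b
    return (c * g0 - b * g1) / det, (a * g1 - b * g0) / det

b0, b1 = 0.0, 0.0
for _ in range(100):
    d0, d1 = scoring_step(b0, b1)
    b0, b1 = b0 + d0, b1 + d1
    if abs(d0) + abs(d1) < 1e-10:
        break

print(round(b0, 4), round(b1, 4), round(loglike(b0, b1), 4))
```

The standard errors are deliberately not reproduced here: inverting the expected information need not match software that reports observed-information standard errors for a non-canonical link such as the probit.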


§.❶/34

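Wald test

• Asymptotic inference:

$$\text{var}(\hat{\boldsymbol\beta}) = \left[-\frac{\partial^2 l(\boldsymbol\beta \mid \mathbf{y})}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'}\bigg|_{\boldsymbol\beta=\hat{\boldsymbol\beta}}\right]^{-1} = \left(\mathbf{X}'\mathbf{W}\mathbf{X}\right)^{-1}$$

– Reported standard errors are square roots of diagonals.

• Hypothesis test on $\mathbf{K}'\boldsymbol\beta = \mathbf{0}$:

$$\left(\mathbf{K}'\hat{\boldsymbol\beta}\right)' \left(\mathbf{K}'\left(\mathbf{X}'\mathbf{W}\mathbf{X}\right)^{-1}\mathbf{K}\right)^{-1} \left(\mathbf{K}'\hat{\boldsymbol\beta}\right) \sim \chi^2_{nrow(\mathbf{K}')}$$

When is n “large enough” for this to be trustworthy????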


§.❶/35

Likelihood ratio test

Reduced Model:

$$L(\beta_0, \beta_1 = 0 \mid \mathbf{y}) = \prod_{i=1}^{n} \Phi(\beta_0)^{y_i} \left(1 - \Phi(\beta_0)\right)^{1 - y_i}$$

proc genmod data=example_binary descending;
   class y;
   model y = / dist=bin link=probit;
run;

Criteria For Assessing Goodness Of Fit
Criterion         DF    Value      Value/DF
Log Likelihood          -7.6382

-2 (logLreduced - logLfull) = -2(-7.6382 - (-6.2123)) = 2.85

The p-value for Ho: $\beta_1 = 0$ is Prob($\chi^2_1$ > 2.85) = .09.

Again… asymptotic
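Both test statistics can be recomputed from the reported log likelihoods, estimate, and standard error. The Python sketch below uses only the standard library; the helper name `chi2_1_sf` is illustrative. It relies on the identity that a chi-square variable with 1 df is a squared standard normal, so its upper-tail probability is `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt

def chi2_1_sf(x):
    """Upper-tail probability Prob(chi-square with 1 df > x)."""
    return erfc(sqrt(x / 2))

# Likelihood ratio test: full model (intercept + slope) vs. reduced (intercept only)
loglik_full, loglik_reduced = -6.2123, -7.6382
lr_stat = -2 * (loglik_reduced - loglik_full)
lr_p = chi2_1_sf(lr_stat)

# Wald test for the slope: (estimate / standard error) squared
wald_stat = (-0.1860 / 0.1194) ** 2
wald_p = chi2_1_sf(wald_stat)

print(round(lr_stat, 2), round(lr_p, 4))
print(round(wald_stat, 2), round(wald_p, 4))
```

With only 12 binary observations the two asymptotic tests give noticeably different p-values, which is one practical sign that n is not yet "large enough".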


§.❶/36

A PROC GLIMMIX “fix” for uncertainty: use asymptotic F-tests rather than χ²-tests

proc glimmix data=example_binary;
   model y = x / dist=bin link=probit;
   contrast 'slope ' x 1;
run;

Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
x        1        10       2.43      0.1503

Contrasts
Label    Num DF   Den DF   F Value   Pr > F
slope    1        10       2.43      0.1503

“less asymptotic?”


§.❶/37

Ordinal Categorical Data

• How I learned this?

– “Sire evaluation for ordered categorical data with a threshold model” by Dan Gianola and Jean Louis Foulley (1983) in Genetics, Selection, Evolution 15:201-224. (GF83)

– See also Harville and Mee (1984) Biometrics (HM84)

• Application:

– Calving ease scores (0 = unassisted, 5 = Caesarean)

– Determined by underlying continuous liability relative to a set of thresholds:

$$\tau_0 < \tau_1 < \tau_2 < \cdots < \tau_{m-1} < \tau_m$$


§.❶/38

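• Liabilities:

$$\mathbf{L} = \mathbf{X}\boldsymbol\mu + \mathbf{e}; \qquad \mathbf{e} \mid \sigma^2_e \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$$

$$Y_i = \begin{cases} 1 & \text{if } \tau_0 < L_i \le \tau_1 \\ 2 & \text{if } \tau_1 < L_i \le \tau_2 \\ \vdots & \\ m & \text{if } \tau_{m-1} < L_i \le \tau_m \end{cases}$$

• Consider three different herds/subclasses:

$$\boldsymbol\mu = [\mu_1 \;\; \mu_2 \;\; \mu_3]' = [9 \;\; 10 \;\; 11]'; \qquad \sigma_e = 2$$

$$Y_i = \begin{cases} 1 & \text{if } \tau_0 < L_i \le 8 \\ 2 & \text{if } 8 < L_i \le 12 \\ 3 & \text{if } L_i > 12 \end{cases}$$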


§.❶/39

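Underlying normal densities for each of three herds.

• Probabilities highlighted for Herd 2

[Figure: normal liability densities (liability from 5 to 15, density from 0.00 to 0.20) for Herd 1, Herd 2 and Herd 3, with thresholds $\tau_1 = 8$ and $\tau_2 = 12$ and the three category areas highlighted for Herd 2.]

$$\text{Prob}(L \le 8 \mid \mu_2 = 10, \sigma_e = 2) = \text{Prob}\left(\frac{L - \mu_2}{\sigma_e} \le \frac{\tau_1 - \mu_2}{\sigma_e}\right) = \text{Prob}(z \le -1.0) = \Phi(-1.0) = 0.1587$$

$$\text{Prob}(8 < L \le 12 \mid \mu_2 = 10, \sigma_e = 2) = \text{Prob}\left(\frac{\tau_1 - \mu_2}{\sigma_e} < \frac{L - \mu_2}{\sigma_e} \le \frac{\tau_2 - \mu_2}{\sigma_e}\right) = \text{Prob}(-1.0 < z \le 1.0) = \Phi(1.0) - \Phi(-1.0) = 0.6827$$

$$\text{Prob}(L > 12 \mid \mu_2 = 10, \sigma_e = 2) = \text{Prob}\left(\frac{L - \mu_2}{\sigma_e} > \frac{\tau_2 - \mu_2}{\sigma_e}\right) = \text{Prob}(z > 1.0) = 1 - \Phi(1.0) = 0.1587$$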
§.❶/40

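Constraints

• Not really possible to separately estimate $\sigma_e$ from $\tau_1$, $\tau_2$, $\mu_1$, $\mu_2$, and $\mu_3$. Define then $L^* = L/\sigma_e$, $\tau_1^* = \tau_1/\sigma_e$, $\tau_2^* = \tau_2/\sigma_e$, $\mu_1^* = \mu_1/\sigma_e$, $\mu_2^* = \mu_2/\sigma_e$, and $\mu_3^* = \mu_3/\sigma_e$.

[Figure: liability densities on the standardized scale (liability from 2 to 8, density from 0.0 to 0.4) for Herd 1, Herd 2 and Herd 3.]

$$\text{Prob}(L^* \le \tau_1^* \mid \mu_2^*) = \text{Prob}\left(L^* \le \tfrac{8}{2} \,\Big|\, \mu_2^* = \tfrac{10}{2}\right) = \text{Prob}(L^* - \mu_2^* \le \tau_1^* - \mu_2^*) = \Phi(-1.0) = 0.1587$$

$$\text{Prob}(\tau_1^* < L^* \le \tau_2^* \mid \mu_2^*) = \text{Prob}\left(\tfrac{8}{2} < L^* \le \tfrac{12}{2} \,\Big|\, \mu_2^* = \tfrac{10}{2}\right) = \text{Prob}(\tau_1^* - \mu_2^* < L^* - \mu_2^* \le \tau_2^* - \mu_2^*) = \text{Prob}(-1.0 < z \le 1.0) = \Phi(1.0) - \Phi(-1.0) = 0.6827$$

$$\text{Prob}(L^* > \tau_2^* \mid \mu_2^*) = \text{Prob}\left(L^* > \tfrac{12}{2} \,\Big|\, \mu_2^* = \tfrac{10}{2}\right) = \text{Prob}(L^* - \mu_2^* > \tau_2^* - \mu_2^*) = \text{Prob}(z > 1.0) = 1 - \Phi(1.0) = 0.1587$$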


§.❶/41

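Yet another constraint required

Suppose we use the corner parameterization:

$$[\mu \;\; \alpha_1 \;\; \alpha_2]' = [11 \;\; -2 \;\; -1]'$$

when expressed as a ratio over $\sigma_e$ is

$$[\mu^* \;\; \alpha_1^* \;\; \alpha_2^*]' = [5.5 \;\; -1 \;\; -0.5]'$$

$$\mu_1^* = \mu^* + \alpha_1^* = 4.5; \qquad \mu_2^* = \mu^* + \alpha_2^* = 5.0; \qquad \mu_3^* = \mu^* = 5.5$$

Such that $\tau_1^*$ or $\tau_2^*$ are not separately identifiable from $\mu^*$

$\tau_1^{**} = \tau_1^* - \mu^* = 4.0 - 5.5 = -1.5$

$\tau_2^{**} = \tau_2^* - \mu^* = 6.0 - 5.5 = +0.5$

i.e., zero out $\mu^*$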


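§.❶/42

[Figure: liability densities (liability from -4 to 4, density from 0.0 to 0.4) for Herd 1, Herd 2 and Herd 3 after zeroing out $\mu^*$, with $\mu^{**}$ marked.]

$$\mu_1^{**} = -1.0; \qquad \mu_2^{**} = -0.5; \qquad \mu_3^{**} = 0.0$$

$\tau_1^{**} = \tau_1^* - \mu^* = 4.0 - 5.5 = -1.5$; $\tau_2^{**} = \tau_2^* - \mu^* = 6.0 - 5.5 = +0.5$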


§.❶/43

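Alternative constraint.

Estimate $\mu$ but “zero out” one of $\tau_1$ or $\tau_2$, say $\tau_1$

Start with

$$[\mu^* \;\; \alpha_1^* \;\; \alpha_2^*]' = [5.5 \;\; -1 \;\; -0.5]'$$

and $\tau_1^* = 4.0$ and $\tau_2^* = 6.0$.

Then:

$\mu^{**} = \mu^* - \tau_1^* = 5.5 - 4.0 = 1.5$

$\tau_2^{**} = \tau_2^* - \tau_1^* = 6.0 - 4.0 = 2.0$

$$[\mu^{**} \;\; \alpha_1^* \;\; \alpha_2^*]' = [1.5 \;\; -1 \;\; -0.5]'$$

[Figure: liability densities (liability from -4 to 4, density from 0.0 to 0.4) for Herd 1, Herd 2 and Herd 3 with $\tau_1$ zeroed out.]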


§.❶/44

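One last constraint possibility

• Setting $\tau_1 = 0$ and $\tau_2$ to an arbitrary value > $\tau_1$ and infer upon $\sigma_e$

• Say $\sigma_e = 2$.

• $\tau_1$ fixed to 0; $\tau_2$ fixed to 4

$$[\mu^{**} \;\; \alpha_1 \;\; \alpha_2]' = [3.0 \;\; -2 \;\; -1]'$$

[Figure: liability densities (liability from -5 to 5, density from 0.00 to 0.20) for Herd 1, Herd 2 and Herd 3.]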
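The constraint sets discussed in this part of the talk are different labelings of the same model, which can be confirmed numerically. The Python sketch below (standard library only) lists each parameterization as thresholds, herd means, and residual standard deviation, using the numbers from the slides; the dictionary keys and the helper function are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

def category_probs(tau1, tau2, mean, sigma_e):
    c1 = Z.cdf((tau1 - mean) / sigma_e)
    c2 = Z.cdf((tau2 - mean) / sigma_e)
    return (c1, c2 - c1, 1 - c2)

# (tau_1, tau_2, herd means 1-3, sigma_e) under each constraint
parameterizations = {
    "original scale":         (8.0, 12.0, (9.0, 10.0, 11.0), 2.0),
    "sigma_e = 1":            (4.0, 6.0, (4.5, 5.0, 5.5), 1.0),
    "sigma_e = 1, mu = 0":    (-1.5, 0.5, (-1.0, -0.5, 0.0), 1.0),
    "sigma_e = 1, tau_1 = 0": (0.0, 2.0, (0.5, 1.0, 1.5), 1.0),
    "tau_1 = 0, tau_2 = 4":   (0.0, 4.0, (1.0, 2.0, 3.0), 2.0),
}

results = {
    name: [category_probs(t1, t2, m, s) for m in means]
    for name, (t1, t2, means, s) in parameterizations.items()
}

for name, herd_probs in results.items():
    print(name, [tuple(round(p, 4) for p in hp) for hp in herd_probs])
```

Every row prints the same three sets of category probabilities, so the data cannot distinguish between these constraints; the choice is one of convenience.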


§.❶/45

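Likelihood function for Ordinal Categorical Data

Based on the multinomial (m categories)

$$\text{Prob}(Y_i = y_i) = P_{i1}^{I(y_i = 1)} P_{i2}^{I(y_i = 2)} \cdots P_{im}^{I(y_i = m)} = \prod_{k=1}^{m} P_{ik}^{I(y_i = k)}$$

where

$$P_{ik} = \Phi(\tau_k - \eta_i) - \Phi(\tau_{k-1} - \eta_i)$$

and

$$\eta_i = \mathbf{x}_i'\boldsymbol\beta$$

Likelihood:

$$L = \prod_{i=1}^{n} \text{Prob}(Y_i = y_i) = \prod_{i=1}^{n} \prod_{k=1}^{m} P_{ik}^{I(y_i = k)}$$

Log Likelihood:

$$\log L = \sum_{i=1}^{n} \sum_{k=1}^{m} I(Y_i = k) \log P_{ik}$$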


§.❶/46

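Hypothetical small example

• Ordinal outcome having 3 possible categories:

• Two subjects in the dataset:

– first subject has a response of 1 whereas the second has a response of 3.

– Their contribution to the log likelihood:

$$\log\left(\Phi(\tau_1 - \mathbf{x}_1'\boldsymbol\beta) - \Phi(\tau_0 - \mathbf{x}_1'\boldsymbol\beta)\right) + \log\left(\Phi(\tau_3 - \mathbf{x}_2'\boldsymbol\beta) - \Phi(\tau_2 - \mathbf{x}_2'\boldsymbol\beta)\right) = \log \Phi(\tau_1 - \mathbf{x}_1'\boldsymbol\beta) + \log\left(1 - \Phi(\tau_2 - \mathbf{x}_2'\boldsymbol\beta)\right)$$

since $\tau_0 = -\infty$ and $\tau_3 = +\infty$.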

§.❶/48

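Setting up Fisher’s scoring: 2nd derivatives (see GF83 or HM84 for details)

• Now, writing the elements of the expected information in terms of $\phi(\cdot)$, the standard normal density:

$$t_{kk} = \sum_{i=1}^{n} \frac{P_{ik} + P_{i(k+1)}}{P_{ik}\,P_{i(k+1)}}\, \phi^2(\tau_k - \eta_i); \quad k = 1, 2, \ldots, m-1$$

$$t_{k,k+1} = -\sum_{i=1}^{n} \frac{\phi(\tau_k - \eta_i)\,\phi(\tau_{k+1} - \eta_i)}{P_{i(k+1)}}; \quad k = 1, 2, \ldots, m-2$$

$$l_{ik} = -\phi(\tau_k - \eta_i) \left[ \frac{\phi(\tau_k - \eta_i) - \phi(\tau_{k-1} - \eta_i)}{P_{ik}} - \frac{\phi(\tau_{k+1} - \eta_i) - \phi(\tau_k - \eta_i)}{P_{i(k+1)}} \right]; \quad i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, m-1$$

$$w_{ii} = \sum_{k=1}^{m} \frac{\left(\phi(\tau_{k-1} - \eta_i) - \phi(\tau_k - \eta_i)\right)^2}{P_{ik}}; \quad i = 1, 2, \ldots, n$$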


§.❶/49

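Setting up Fisher’s scoring: 1st derivatives (see GF83 for details)

• Now

$$\frac{\partial \log L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\theta} = \begin{bmatrix} \dfrac{\partial \log L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\tau} \\[2ex] \dfrac{\partial \log L(\boldsymbol\theta \mid \mathbf{y})}{\partial\boldsymbol\beta} \end{bmatrix} = \begin{bmatrix} \mathbf{p} \\ \mathbf{X}'\mathbf{v} \end{bmatrix}$$

• with

$$p_k = \sum_{i=1}^{n} \left[ \frac{I(Y_i = k)}{P_{ik}} - \frac{I(Y_i = k+1)}{P_{i(k+1)}} \right] \phi(\tau_k - \eta_i)$$

$$v_i = \sum_{k=1}^{m} I(Y_i = k)\, \frac{\phi(\tau_{k-1} - \eta_i) - \phi(\tau_k - \eta_i)}{P_{ik}}$$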

§.❶/50

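Fisher’s scoring algorithm

• So

$$\begin{bmatrix} \mathbf{T} & \mathbf{L}'\mathbf{X} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X} \end{bmatrix} \begin{bmatrix} \hat{\boldsymbol\tau}^{[t+1]} - \hat{\boldsymbol\tau}^{[t]} \\ \hat{\boldsymbol\beta}^{[t+1]} - \hat{\boldsymbol\beta}^{[t]} \end{bmatrix} = \begin{bmatrix} \mathbf{p} \\ \mathbf{X}'\mathbf{v} \end{bmatrix}$$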


§.❶/51

Data from GF (1983)

H A G S Y    H A G S Y    H A G S Y
1 2 M 1 1    1 2 F 1 1    1 3 M 1 1
1 2 F 2 2    1 3 M 2 1    1 3 M 2 3
1 3 F 2 1    1 3 F 2 1    1 3 F 2 1
1 2 M 3 1    1 2 M 3 2    1 3 F 3 2
1 3 M 3 1    2 2 F 1 1    2 2 F 1 1
2 2 M 1 1    2 3 M 1 3    2 2 F 2 1
2 2 F 2 3    2 3 M 2 1    2 2 F 3 2
2 3 M 3 3    2 2 M 4 2    2 2 F 4 1
2 3 F 4 1    2 3 F 4 1    2 3 M 4 1
2 3 M 4 1

H: Herd (1 or 2)
A: Age of Dam (2 = Young heifer, 3 = Older cow)
G: Gender or sex (M and F)
S: Sire of calf (1, 2, 3, or 4)
Y: Ordinal Response (1, 2, or 3)


§.❶/52

SAS code: Let’s just consider sex in model

proc glimmix data = gf83;
   model y = sex / dist=mult link=cumprobit solutions;
   estimate 'Category 1 Female '   intercept 1 0 sex 1 / ilink;
   estimate 'Category 1 Male '     intercept 1 0 sex 0 / ilink;
   estimate 'Category <=2 Female ' intercept 0 1 sex 1 / ilink;
   estimate 'Category <=2 Male '   intercept 0 1 sex 0 / ilink;
run;

Subtle difference in parameterization:

Gianola & Foulley, 1983: $P_{ik} = \Phi(\tau_k - \mathbf{x}_i'\boldsymbol\beta) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol\beta)$

PROC GLIMMIX: $P_{ik} = \Phi(\tau_k + \mathbf{x}_i'\boldsymbol\beta) - \Phi(\tau_{k-1} + \mathbf{x}_i'\boldsymbol\beta)$

$x_i$ = 1 if females, 0 if males


§.❶/53

Parameter Estimates

Effect                  y    Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept (τ1 - μ)      1    0.3007     0.3373           25   0.89      0.3812
Intercept (τ2 - μ)      2    0.9154     0.3656           25   2.50      0.0192
Sex (β1)                     0.3290     0.4738           25   0.69      0.4938

Type III Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
sex      1        25       0.48      0.4938


§.❶/54

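Estimated Cumulative Probabilities

Label                  Estimate   Standard Error   DF   t Value   Pr > |t|   Mean     Standard Error Mean
Category 1 Female      0.6297     0.3478           25   1.81      0.0822     0.7355   0.1138
Category 1 Male        0.3007     0.3373           25   0.89      0.3812     0.6182   0.1286
Category <=2 Female    1.2444     0.3930           25   3.17      0.0040     0.8933   0.07228
Category <=2 Male      0.9154     0.3656           25   2.50      0.0192     0.8200   0.09594

The Mean column is the inverse link applied to the estimate, e.g. $\hat{P}(Y \le 2 \mid \text{males}) = \Phi(\widehat{\tau_2 - \mu})$ and $\hat{P}(Y \le 2 \mid \text{females}) = \Phi(\widehat{\tau_2 - \mu} + \hat\beta_1)$.

Asymptotics?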


§.❶/55

PROC NLMIXED (fix β0, σe)

proc nlmixed data=gf83;
   parms beta1=0 thresh1=-1.5 thresh2=0.5;
   eta = beta1*sex;
   if (y=1) then p = probnorm(thresh1-eta) - 0;
   else if (y=2) then p = probnorm(thresh2-eta) - probnorm(thresh1-eta);
   else if (y=3) then p = 1 - probnorm(thresh2-eta);
   if (p > 1e-8) then ll = log(p);
   else ll = -1e100;
   model y ~ general(ll);
   estimate 'Category 1 Female '   probnorm(thresh1-beta1);
   estimate 'Category 1 Male '     probnorm(thresh1-0);
   estimate 'Category <=2 Female ' probnorm(thresh2-beta1);
   estimate 'Category <=2 Male '   probnorm(thresh2-0);
run;

Estimate β1, τ1, τ2

$$\text{Prob}(Y_i = y_i) = \prod_{k=1}^{m} P_{ik}^{I(y_i = k)}$$

$$\log L = \sum_{i=1}^{n} \sum_{k=1}^{m} I(Y_i = k) \log P_{ik}$$


§.❶/56

Key output from PROC NLMIXED

Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|
beta1       -0.3290    0.4738           28   -0.69     0.4931
thresh1     0.3007     0.3373           28   0.89      0.3803
thresh2     0.9154     0.3656           28   2.50      0.0184

Additional Estimates

Label                  Estimate   Standard Error
Category 1 Female      0.7355     0.1138
Category 1 Male        0.6182     0.1286
Category <=2 Female    0.8933     0.07228
Category <=2 Male      0.8200     0.09594


§.❶/57

Yet another alternative (fix τ1, τ2)

proc nlmixed data=gf83;
   parms beta1=0 sigmae=1 mu=0;
   thresh1 = 0;
   thresh2 = 0.5;
   eta = mu + beta1*sex;
   if (y=1) then p = probnorm((thresh1-eta)/sigmae);
   else if (y=2) then p = probnorm((thresh2-eta)/sigmae) - probnorm((thresh1-eta)/sigmae);
   else if (y=3) then p = 1 - probnorm((thresh2-eta)/sigmae);
   if (p > 1e-8) then ll = log(p);
   else ll = -1e100;
   model y ~ general(ll);
   estimate 'Category 1 Female '   probnorm((thresh1-(mu+beta1))/sigmae);
   estimate 'Category 1 Male '     probnorm((thresh1-mu)/sigmae);
   estimate 'Category <=2 Female ' probnorm((thresh2-(mu+beta1))/sigmae);
   estimate 'Category <=2 Male '   probnorm((thresh2-mu)/sigmae);
run;

Estimate β1, σe, β0 (μ)


§.❶/58

Parameter Estimates

Parameter   Estimate   Standard Error
beta1       -0.2676    0.3946
sigmae      0.8134     0.3327
mu          -0.2446    0.3151

Additional Estimates

Label                  Estimate   Standard Error
Category 1 Female      0.7356     0.1138
Category 1 Male        0.6182     0.1286
Category <=2 Female    0.8933     0.07228
Category <=2 Male      0.8200     0.09594

This is not inference on overdispersion!!… it’s merely a reparameterization
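That the two PROC NLMIXED fits describe the same model can be checked by plugging the reported estimates into the cumulative probit formula. The Python sketch below uses only the standard library; the function and variable names are illustrative, and the inputs are the rounded estimates printed in the two output tables.

```python
from statistics import NormalDist

Z = NormalDist()

def cumulative_probs(thresholds, mu, beta1, sigma_e):
    """Prob(Y <= k) = Phi((tau_k - eta) / sigma_e), with sex = 1 for females."""
    out = {}
    for label, sex in (("Female", 1), ("Male", 0)):
        eta = mu + beta1 * sex
        out[label] = [Z.cdf((t - eta) / sigma_e) for t in thresholds]
    return out

# (a) thresholds estimated; mu = 0 and sigma_e = 1 fixed
fit_a = cumulative_probs([0.3007, 0.9154], mu=0.0, beta1=-0.3290, sigma_e=1.0)

# (b) thresholds fixed at 0 and 0.5; mu and sigma_e estimated
fit_b = cumulative_probs([0.0, 0.5], mu=-0.2446, beta1=-0.2676, sigma_e=0.8134)

for label in ("Female", "Male"):
    print(label, [round(p, 4) for p in fit_a[label]], [round(p, 4) for p in fit_b[label]])
```

Dividing the second set of estimates by sigmae recovers the first set (for example, 0.2446 / 0.8134 is about 0.3007), which is the sense in which sigmae here is a reparameterization and not an overdispersion parameter.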


§.❶/59

What is overdispersion from an experimental design perspective?

• No overdispersion is identifiable for binary data… then why is overdispersion possible for binomial data?

– It’s merely a cluster (block) effect.

• Binomial responses.

– Consist of y/n responses.

– Actually each “response” is a combined total for a cluster with n contributing binary responses; y of them being successes, n-y being failures.

• Similar arguments hold for overdispersion in Poisson and n=1 vs. n>1 multinomials.


§.❶/60

Hessian Fly Data Example (Gotway and Stroup, 1997)

Obs Y n block entry lat lng rep

1 2 8 1 14 1 1 11

2 1 9 1 16 1 2 12

3 9 13 1 7 1 3 13

4 9 9 1 6 1 4 14

5 2 9 1 13 2 1 21

6 7 14 1 15 2 2 22

7 6 8 1 8 2 3 23

8 8 11 1 5 2 4 24

9 7 12 1 11 3 1 31

10 8 11 1 12 3 2 32

Available from SAS PROC GLIMMIX documentation


§.❶/61

PROC GLIMMIX code

title "G side independence";
proc glimmix data=HessianFly;
   class block entry rep;
   model y/n = entry;
   random rep / subject=intercept;
run;

Much richer (e.g. spatial) analysis provided by Gotway and Stroup (1997); Stroup’s workshop (2011)


§.❶/62

Key portions of output

Number of Observations Read 64

Number of Observations Used 64

Number of Events 396

Number of Trials 736

Covariance Parameter Estimates

Cov Parm Subject Estimate Standard Error

rep Intercept 0.6806 0.2612


§.❶/63

Hessian Fly Data in “individual” binary form:

Obs entry rep z

1 14 11 1

2 14 11 1

3 14 11 0

4 14 11 0

5 14 11 0

6 14 11 0

7 14 11 0

8 14 11 0

9 16 12 1

10 16 12 0

11 16 12 0

12 16 12 0

13 16 12 0

14 16 12 0

15 16 12 0

16 16 12 0

17 16 12 0

2/8

1/9
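The conversion from the grouped y/n records to individual binary records is mechanical. A small Python sketch follows (illustrative names, using only the first two Hessian fly records shown earlier in this section). Each cluster with y successes out of n trials becomes n rows, y of them with z = 1.

```python
# First two grouped records: Y successes out of n plants per entry/rep cluster
grouped = [
    {"entry": 14, "rep": 11, "y": 2, "n": 8},
    {"entry": 16, "rep": 12, "y": 1, "n": 9},
]

def expand(records):
    """Turn each y/n record into n binary rows (successes listed first)."""
    rows = []
    for r in records:
        for j in range(r["n"]):
            z = 1 if j < r["y"] else 0
            rows.append({"entry": r["entry"], "rep": r["rep"], "z": z})
    return rows

individual = expand(grouped)

for obs, row in enumerate(individual, start=1):
    print(obs, row["entry"], row["rep"], row["z"])
```

The rep identifier is carried onto every binary row, so a random effect for rep in the individual-level analysis plays the same role as the cluster effect in the y/n analysis.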


§.❶/64

PROC GLIMMIX code for “individual” data

title "G side independence";
proc glimmix data=HessianFlyindividual;
   class rep entry;
   model z = entry / dist=bin;
   random intercept / subject=rep;
run;

random rep ;


§.❶/65

Key portions of output

Number of Observations Read 736

Number of Observations Used 736

Covariance Parameter Estimates

Cov Parm Subject Estimate Standard Error

Intercept rep 0.6806 0.2612
