Applied Bayesian Inference, KSU, April 29, 2012
§❶ Review of Likelihood Inference
Robert J. Tempelman
Likelihood Inference
• Necessary prerequisites to understanding Bayesian inference:
– Distribution theory
– Calculus
– Asymptotic theory (e.g., Taylor expansions)
– Numerical methods/optimization
– Simulation-based analyses
– Programming skills
• SAS PROC ???? or R package ???? is only really a start to understanding data analysis.
• I don’t think that SAS PROC MCMC (version 9.3)/WinBUGS is a fix for all of your potential Bayesian inference problems.
Data analysts: Don’t throw away that math stats text just yet!!!
Meaningful computing skills are a plus!
The “simplest” model
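• Basic mean model: $y_i = \mu + e_i;\ i = 1, 2, \ldots, n$
– Common distributional assumption: $e_i \sim NIID(0, \sigma_e^2)$
• What does this mean? Think pdf!!!
$$y_i \sim p(y_i \mid \mu, \sigma_e^2) = \frac{1}{\sqrt{2\pi\sigma_e^2}}\exp\left(-\frac{(y_i - \mu)^2}{2\sigma_e^2}\right)$$
$$\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]' \sim p(\mathbf{y} \mid \mu, \sigma_e^2) = \prod_{i=1}^{n} p(y_i \mid \mu, \sigma_e^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma_e^2}}\exp\left(-\frac{(y_i - \mu)^2}{2\sigma_e^2}\right)$$
pdf: probability density function
The joint pdf is the product of the independent pdfs (conditional independence).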
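Likelihood function
• Simplify joint pdf further:
$$p(\mathbf{y} \mid \mu, \sigma_e^2) = \prod_{i=1}^{n}\frac{1}{(2\pi\sigma_e^2)^{1/2}}\exp\left(-\frac{(y_i - \mu)^2}{2\sigma_e^2}\right) = \frac{1}{(2\pi)^{n/2}(\sigma_e^2)^{n/2}}\exp\left(-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\sigma_e^2}\right)$$
• Regard joint pdf as function of parameters:
$$L(\mu, \sigma_e^2 \mid \mathbf{y}) = \frac{1}{(2\pi)^{n/2}(\sigma_e^2)^{n/2}}\exp\left(-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\sigma_e^2}\right) \propto (\sigma_e^2)^{-n/2}\exp\left(-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\sigma_e^2}\right)$$
$\propto$: ‘proportional to’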
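Maximum likelihood estimation
• Maximize $L(\mu, \sigma_e^2 \mid \mathbf{y})$ with respect to the unknowns.
– Well actually, we directly maximize the log likelihood:
$$l(\mu, \sigma_e^2 \mid \mathbf{y}) = \log L(\mu, \sigma_e^2 \mid \mathbf{y}) = \text{(constant)} - \frac{n}{2}\log\sigma_e^2 - \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\sigma_e^2}$$
– One strategy: use first derivatives, i.e., determine $\partial l/\partial\mu$ and $\partial l/\partial\sigma_e^2$ and set them to 0.
– Result?
$$\hat{\mu}_{ML} = \frac{\sum_{i=1}^{n} y_i}{n} = \bar{y}, \qquad \hat{\sigma}^2_{e,ML} = \frac{\sum_{i=1}^{n}(y_i - \hat{\mu}_{ML})^2}{n}$$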
Example Data, Log Likelihood & Maximum Likelihood estimates
$\mathbf{y} = [55\ \ 33\ \ 45\ \ 49\ \ 38]'$
$\hat{\mu}_{ML} = 44$, $\hat{\sigma}^2_{e,ML} = 60.8$
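The closed-form ML estimates can be checked numerically. This is a minimal Python sketch rather than the course's SAS; it uses only the five observations above, and the function names are illustrative.

```python
import math

# Example data from the slide
y = [55, 33, 45, 49, 38]


def log_likelihood(mu, sigma2, data):
    """Normal log likelihood l(mu, sigma2 | y), including the -(n/2) log(2*pi) constant."""
    n = len(data)
    ss = sum((yi - mu) ** 2 for yi in data)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * n * math.log(sigma2) - ss / (2 * sigma2)


def ml_estimates(data):
    """Closed-form ML estimates: the sample mean and the divisor-n variance."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((yi - mu_hat) ** 2 for yi in data) / n
    return mu_hat, sigma2_hat


mu_hat, sigma2_hat = ml_estimates(y)
l_max = log_likelihood(mu_hat, sigma2_hat, y)
print(mu_hat, sigma2_hat)  # 44.0 60.8
# Moving either parameter away from its ML value lowers the log likelihood
print(l_max > log_likelihood(mu_hat + 1, sigma2_hat, y),
      l_max > log_likelihood(mu_hat, 1.2 * sigma2_hat, y))  # True True
```

Note the divisor n (not n - 1) in the variance: that is what maximizing the likelihood gives.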
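Likelihood inference for discrete data
• Consider the binomial distribution:
$$\text{Prob}(Y = y \mid n, p) = \frac{n!}{y!(n-y)!}\,p^{y}(1-p)^{n-y}$$
$$L(p \mid y, n) = \frac{n!}{y!(n-y)!}\,p^{y}(1-p)^{n-y} \propto p^{y}(1-p)^{n-y}$$
$$l(p \mid y, n) = \text{constant} + y\log p + (n-y)\log(1-p)$$
$$\frac{\partial l(p \mid y, n)}{\partial p} = \frac{y}{p} - \frac{n-y}{1-p}$$
Set to zero: $\frac{y}{\hat{p}} - \frac{n-y}{1-\hat{p}} = 0 \;\rightarrow\; \hat{p} = \frac{y}{n}$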
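A quick numerical check of $\hat{p} = y/n$: the Python sketch below evaluates $l(p \mid y, n)$ on a grid and confirms that the maximizer coincides with y/n and that the score vanishes there. The values y = 7 and n = 20 are hypothetical, chosen only for illustration.

```python
import math

y, n = 7, 20  # hypothetical data, for illustration only


def loglik(p):
    """Binomial log likelihood without the constant log[n!/(y!(n-y)!)]."""
    return y * math.log(p) + (n - y) * math.log(1 - p)


def score(p):
    """First derivative dl/dp = y/p - (n - y)/(1 - p)."""
    return y / p - (n - y) / (1 - p)


grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=loglik)   # grid maximizer of the log likelihood
p_hat = y / n                    # closed-form ML estimate
print(p_grid, p_hat, score(p_hat))
```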
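Sometimes iterative solutions are required
• First-derivative-based methods can be slow for some problems.
• Second-derivative methods are often desirable, e.g., Newton-Raphson:
– Generally faster
– Provide asymptotic standard errors as a useful by-product
$$\hat{\theta}^{[i+1]} = \hat{\theta}^{[i]} - \left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right]^{-1}_{\theta = \hat{\theta}^{[i]}}\left.\frac{\partial l(\theta \mid \mathbf{y})}{\partial\theta}\right|_{\theta = \hat{\theta}^{[i]}}$$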
Plant Genetics Example (Rao, 1971)
• $y_1$, $y_2$, $y_3$, and $y_4$ are the observed numbers of 4 different phenotypes involving genotypes differing at two loci, from the progeny of self-fertilized heterozygotes (AaBb). Under genetic theory, the distribution of the four phenotypes (with complete dominance at each locus) is known to be multinomial.
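Probabilities
Genotype   Probability              Data (Counts)
A_B_       $p_1 = (2+\theta)/4$     $y_1 = 1997$
aaB_       $p_2 = (1-\theta)/4$     $y_2 = 906$
A_bb       $p_3 = (1-\theta)/4$     $y_3 = 904$
aabb       $p_4 = \theta/4$         $y_4 = 32$
$0 \le \theta \le 1$; $\theta \rightarrow 0$: close linkage in repulsion; $\theta \rightarrow 1$: close linkage in coupling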
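Genetic Illustration of Coupling/Repulsion
[Figure: homologous chromosome pairs. Coupling ($\theta = 1$): A with B on one homolog, a with b on the other. Repulsion ($\theta = 0$): A with b on one homolog, a with B on the other.]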
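Likelihood function
• Given:
$$p(\mathbf{y} \mid \theta) = \frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}\left(\frac{2+\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}$$
$$L(\theta \mid \mathbf{y}) \propto (2+\theta)^{y_1}(1-\theta)^{y_2+y_3}\theta^{y_4}$$
$$l(\theta \mid \mathbf{y}) = \log L(\theta \mid \mathbf{y}) = \text{constant} + y_1\log(2+\theta) + (y_2+y_3)\log(1-\theta) + y_4\log\theta$$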
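First and second derivatives
• First derivative:
$$\frac{\partial l(\theta \mid \mathbf{y})}{\partial\theta} = \frac{y_1}{2+\theta} - \frac{y_2+y_3}{1-\theta} + \frac{y_4}{\theta}$$
• Second derivative:
$$\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2} = -\frac{y_1}{(2+\theta)^2} - \frac{y_2+y_3}{(1-\theta)^2} - \frac{y_4}{\theta^2}$$
• Recall the Newton-Raphson algorithm:
$$\hat{\theta}^{[i+1]} = \hat{\theta}^{[i]} - \left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right]^{-1}_{\theta = \hat{\theta}^{[i]}}\left.\frac{\partial l(\theta \mid \mathbf{y})}{\partial\theta}\right|_{\theta = \hat{\theta}^{[i]}}$$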
Newton-Raphson: SAS data step and output
data newton;
  y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
  theta = 0.01;                  /* try starting value of 0.50 too */
  do iterate = 1 to 5;
    /* log likelihood, first and second derivatives at the current theta */
    loglike  = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta);
    firstder = y1/(2+theta) - (y2+y3)/(1-theta) + y4/theta;
    secndder = (-y1/(2+theta)**2 - (y2+y3)/(1-theta)**2 - y4/theta**2);
    /* Newton-Raphson update; loglike in each output row refers to theta before this update */
    theta = theta + firstder/(-secndder);
    output;
  end;
  asyvar = 1/(-secndder);        /* asymptotic variance of theta_hat at convergence */
  output;
run;
proc print data=newton;
  var iterate theta loglike;
run;
iterate theta loglike
1 0.034039 1228.62
2 0.035608 1247.07
3 0.035706 1247.10
4 0.035712 1247.10
5 0.035712 1247.10
$\hat{\theta}_{ML} = 0.0357$
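To reproduce the Newton-Raphson iteration outside SAS, here is a minimal Python sketch for the linkage parameter θ using the Rao (1971) counts. The stopping rule (step smaller than 1e-10) and the function names are illustrative choices; the sketch also returns the inverse observed information, i.e., the asymptotic variance of $\hat{\theta}$.

```python
Y1, Y2, Y3, Y4 = 1997, 906, 904, 32  # Rao (1971) phenotype counts


def score(theta):
    """First derivative of l(theta | y)."""
    return Y1 / (2 + theta) - (Y2 + Y3) / (1 - theta) + Y4 / theta


def hessian(theta):
    """Second derivative of l(theta | y)."""
    return -Y1 / (2 + theta) ** 2 - (Y2 + Y3) / (1 - theta) ** 2 - Y4 / theta ** 2


def newton_raphson(theta=0.01, tol=1e-10, max_iter=50):
    """theta[i+1] = theta[i] - score/hessian, stopping when the step is tiny."""
    for _ in range(max_iter):
        step = -score(theta) / hessian(theta)
        theta += step
        if abs(step) < tol:
            break
    return theta


theta_hat = newton_raphson()
asy_var = 1.0 / -hessian(theta_hat)  # inverse observed information
print(round(theta_hat, 4), asy_var, asy_var ** 0.5)  # about 0.0357, 3.6e-05, 0.0060
```

Setting the score to zero here is equivalent to the quadratic $3839\theta^2 + 1655\theta - 64 = 0$, so the iterate can be checked against its positive root.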
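Asymptotic standard errors
• Given:
$$\widehat{\text{var}}(\hat{\theta}) = [I(\hat{\theta})]^{-1} = \left[-\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right]^{-1}_{\theta = \hat{\theta}} \approx 3.6 \times 10^{-5}$$
$$\text{se}(\hat{\theta}) = \left[-\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right]^{-1/2}_{\theta = \hat{\theta}} = 0.0060$$
Observed information
proc print data=newton;
  var asyvar;
run;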
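Alternative to Newton-Raphson
• Fisher’s scoring
– Substitute $E_{\mathbf{y}}\left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right]$ for $\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}$ in Newton-Raphson.
– Now
$E(y_1) = np_1 = n\frac{2+\theta}{4}$, $E(y_2) = np_2 = n\frac{1-\theta}{4}$, $E(y_3) = np_3 = n\frac{1-\theta}{4}$, $E(y_4) = np_4 = n\frac{\theta}{4}$
– Then
$$E_{\mathbf{y}}\left[\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right] = -\frac{E(y_1)}{(2+\theta)^2} - \frac{E(y_2) + E(y_3)}{(1-\theta)^2} - \frac{E(y_4)}{\theta^2} = -\frac{n}{4}\left[\frac{1}{2+\theta} + \frac{2}{1-\theta} + \frac{1}{\theta}\right]$$
Expected information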
Fisher scoring: SAS data step and output
data fisher;
  y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
  n = y1 + y2 + y3 + y4;         /* total count, needed for the expected information */
  theta = 0.01;                  /* try starting value of 0.50 too */
  do iterate = 1 to 5;
    loglike  = y1*log(2+theta) + (y2+y3)*log(1-theta) + y4*log(theta);
    firstder = y1/(2+theta) - (y2+y3)/(1-theta) + y4/theta;
    /* expected second derivative replaces the observed one */
    secndder = (n/4)*(-1/(2+theta) - 2/(1-theta) - 1/theta);
    theta = theta + firstder/(-secndder);
    output;
  end;
  asyvar = 1/(-secndder);        /* inverse expected information at convergence */
  output;
run;
proc print data=fisher;
  var iterate theta loglike;
run;
iterate theta loglike
1 0.034039 1228.62
2 0.035608 1247.07
3 0.035706 1247.10
4 0.035712 1247.10
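5 0.035712 1247.10
$$\text{se}(\hat{\theta}) = \left[I_E(\hat{\theta})\right]^{-1/2} = \left[-E_{\mathbf{y}}\left(\frac{\partial^2 l(\theta \mid \mathbf{y})}{\partial\theta^2}\right)\right]^{-1/2}_{\theta = \hat{\theta}} = 0.0058$$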
In some applications, Fisher’s scoring is easier than Newton-Raphson… but observed information is probably more reliable than expected information (Efron and Hinkley, 1978).
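The same computation with Fisher’s scoring, as a minimal Python sketch: the expected information $(n/4)[1/(2+\theta) + 2/(1-\theta) + 1/\theta]$, with $n = y_1 + y_2 + y_3 + y_4$, replaces the observed one. Five iterations from θ = 0.01 mirror the SAS data step, and both standard errors are evaluated at the final iterate so the two kinds of information can be compared.

```python
Y1, Y2, Y3, Y4 = 1997, 906, 904, 32
N = Y1 + Y2 + Y3 + Y4  # total count n


def score(theta):
    """First derivative of l(theta | y)."""
    return Y1 / (2 + theta) - (Y2 + Y3) / (1 - theta) + Y4 / theta


def expected_info(theta):
    """-E[d2 l / d theta2] = (n/4) * (1/(2+theta) + 2/(1-theta) + 1/theta)."""
    return (N / 4) * (1 / (2 + theta) + 2 / (1 - theta) + 1 / theta)


def observed_info(theta):
    """-d2 l / d theta2 evaluated at the observed counts."""
    return Y1 / (2 + theta) ** 2 + (Y2 + Y3) / (1 - theta) ** 2 + Y4 / theta ** 2


def fisher_scoring(theta=0.01, n_iter=5):
    """Return the scoring iterates theta[1], ..., theta[n_iter]."""
    path = []
    for _ in range(n_iter):
        theta = theta + score(theta) / expected_info(theta)
        path.append(theta)
    return path


path = fisher_scoring()
theta_hat = path[-1]
print([round(t, 6) for t in path])
print("se (expected info):", expected_info(theta_hat) ** -0.5)
print("se (observed info):", observed_info(theta_hat) ** -0.5)
```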
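Extensions to multivariate θ
• Suppose that θ is a p × 1 vector.
• Newton-Raphson:
$$\hat{\boldsymbol{\theta}}^{[t+1]} = \hat{\boldsymbol{\theta}}^{[t]} - \left[\frac{\partial^2 l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right]^{-1}_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}\left.\frac{\partial l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}$$
• Fisher’s scoring:
$$\hat{\boldsymbol{\theta}}^{[t+1]} = \hat{\boldsymbol{\theta}}^{[t]} - \left[E_{\mathbf{y}}\left(\frac{\partial^2 l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right)\right]^{-1}_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}\left.\frac{\partial l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}$$
• or
$$-E_{\mathbf{y}}\left(\frac{\partial^2 l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right)_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}\hat{\boldsymbol{\theta}}^{[t+1]} = -E_{\mathbf{y}}\left(\frac{\partial^2 l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right)_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}\hat{\boldsymbol{\theta}}^{[t]} + \left.\frac{\partial l(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}$$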
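A minimal Python sketch of the multivariate Newton-Raphson update for p = 2. It reuses the five observations from the “simplest” model, so the answer ($\hat{\mu} = 44$, $\hat{\sigma}^2_e = 60.8$) is known in advance. The starting values (40, 60), the stopping rule and the hand-coded 2 × 2 solver are illustrative choices; the Hessian is the analytic second-derivative matrix of the normal log likelihood.

```python
# Data from the "simplest" model example
y = [55, 33, 45, 49, 38]
n = len(y)


def gradient(mu, s2):
    """Score vector [dl/dmu, dl/dsigma2] of the normal log likelihood."""
    r = [yi - mu for yi in y]
    ss = sum(ri * ri for ri in r)
    return [sum(r) / s2, -n / (2 * s2) + ss / (2 * s2 ** 2)]


def hessian(mu, s2):
    """2 x 2 matrix of second derivatives d2l / dtheta dtheta'."""
    r = [yi - mu for yi in y]
    ss = sum(ri * ri for ri in r)
    h_mm = -n / s2
    h_ms = -sum(r) / s2 ** 2
    h_ss = n / (2 * s2 ** 2) - ss / s2 ** 3
    return [[h_mm, h_ms], [h_ms, h_ss]]


def solve2(a, b):
    """Solve the 2 x 2 system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def newton_raphson(mu=40.0, s2=60.0, tol=1e-10, max_iter=100):
    """theta[t+1] = theta[t] - H^{-1} g, implemented as: solve H step = -g."""
    for _ in range(max_iter):
        g = gradient(mu, s2)
        step = solve2(hessian(mu, s2), [-g[0], -g[1]])
        mu, s2 = mu + step[0], s2 + step[1]
        if max(abs(step[0]), abs(step[1])) < tol:
            break
    return mu, s2


mu_hat, s2_hat = newton_raphson()
print(mu_hat, s2_hat)  # converges to 44 and 60.8 (up to rounding error)
```

Solving the linear system instead of forming $\mathbf{H}^{-1}$ explicitly matches the “or” form of the update above.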
Generalized linear models
• For multifactorial analysis of non-normal (binary, count) data.
• Consider the probit link binary model.
– Implies the existence of normally distributed latent (underlying) variables ($\lambda_i$).
– Could do something similar for the logistic link binary model.
• Consider a simple population mean model:
– $\lambda_i = \mu + e_i$; $e_i \sim N(0, \sigma_e^2)$
– Let $\mu = 10$ and $\sigma_e = 2$
The liability (latent variable) concept
[Figure: density pdf($\lambda_i$) of the liability $\lambda_i$ (horizontal axis 4 to 16) with $\mu = 10$, $\sigma_e = 2$ and threshold $\delta = 12$ (“THRESHOLD”); liabilities above the threshold give Y = 1 (“success”), those below give Y = 0 (“failure”).]
$$\text{Prob}(\lambda_i > 12) = \text{Prob}\left(\frac{\lambda_i - 10}{2} > \frac{12 - 10}{2}\right) = \text{Prob}(z > 1) = 1 - \Phi(1) = 0.1587$$
i.e., probability of “success” = 15.87%
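Inferential utopia!
• Suppose we’re able to measure the liabilities directly.
– Also suppose a more general multi-population (trt) model:
$\boldsymbol{\lambda} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{e}$; $\mathbf{e} \sim N(\mathbf{0}, \mathbf{R})$; typically $\mathbf{R} = \mathbf{I}\sigma_e^2$, where $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2\ \cdots\ \lambda_n]'$
$$\hat{\boldsymbol{\alpha}}_{ML} = \hat{\boldsymbol{\alpha}}_{OLS} = (\mathbf{X}'\mathbf{R}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{R}^{-1}\boldsymbol{\lambda}$$
But (sigh…), we of course don’t generally observe $\boldsymbol{\lambda}$.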
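Suppose there are 3 subclasses
Mean liabilities: $\mu_1 = 9$ (Herd 1), $\mu_2 = 10$ (Herd 2), $\mu_3 = 11$ (Herd 3)
• Use “corner parameterization”:
$$\boldsymbol{\alpha} = \begin{bmatrix}\mu \\ \alpha_1 \\ \alpha_2\end{bmatrix} = \begin{bmatrix}11 \\ -2 \\ -1\end{bmatrix}$$
$\mathbf{x}_i' = [1\ 1\ 0]$ (Herd 1), $\mathbf{x}_i' = [1\ 0\ 1]$ (Herd 2), $\mathbf{x}_i' = [1\ 0\ 0]$ (Herd 3)
$$\mathbf{X} = \begin{bmatrix}\mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n'\end{bmatrix}, \qquad \boldsymbol{\lambda} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{e}$$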
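To see the corner parameterization at work, the Python sketch below builds $\mathbf{X}$ for a hypothetical design with two animals per herd. In the “utopia” where liabilities are observed, it solves the normal equations $(\mathbf{X}'\mathbf{X})\boldsymbol{\alpha} = \mathbf{X}'\boldsymbol{\lambda}$ ($\mathbf{R} = \mathbf{I}\sigma_e^2$, so $\sigma_e^2$ cancels). The error-free liabilities, the design and the small Gaussian-elimination helper are assumptions made for illustration only.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Corner parameterization rows x_i' = [1, herd-1 indicator, herd-2 indicator]
row_for_herd = {1: [1, 1, 0], 2: [1, 0, 1], 3: [1, 0, 0]}
mean_liability = {1: 9.0, 2: 10.0, 3: 11.0}
herds = [1, 1, 2, 2, 3, 3]                      # hypothetical: two animals per herd
X = [row_for_herd[h] for h in herds]
lam = [mean_liability[h] for h in herds]        # hypothetical error-free liabilities

# Normal equations (X'X) alpha = X'lambda; with R = I sigma_e^2 the variance cancels
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xtl = [sum(row[i] * li for row, li in zip(X, lam)) for i in range(3)]
alpha_hat = solve(XtX, Xtl)
print([round(a, 6) for a in alpha_hat])  # [11.0, -2.0, -1.0]
```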
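Probability of success as function of effects (can’t observe liabilities… just observed binary data)
• Shaded areas:
[Figure: liability densities for Herds 1, 2 and 3 (horizontal axis 6 to 18), with the areas above the threshold of 12 shaded.]
$$\text{Prob}(\lambda > 12 \mid \text{herd 1}) = \text{Prob}\left(\frac{\lambda - 9}{2} > \frac{12 - 9}{2}\right) = \text{Prob}(z > 1.5) = 1 - \Phi(1.5) = 0.067$$
$$\text{Prob}(\lambda > 12 \mid \text{herd 2}) = \text{Prob}\left(\frac{\lambda - 10}{2} > \frac{12 - 10}{2}\right) = \text{Prob}(z > 1.0) = 1 - \Phi(1.0) = 0.1587$$
$$\text{Prob}(\lambda > 12 \mid \text{herd 3}) = \text{Prob}\left(\frac{\lambda - 11}{2} > \frac{12 - 11}{2}\right) = \text{Prob}(z > 0.5) = 1 - \Phi(0.5) = 0.309$$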
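The three shaded areas can be computed directly. This minimal Python sketch uses only the standard library and evaluates $1 - \Phi(z)$ as $\text{erfc}(z/\sqrt{2})/2$, which avoids cancellation in the upper tail. That is an implementation choice, not something taken from the course.

```python
import math


def prob_success(mean, threshold=12.0, sigma_e=2.0):
    """Prob(lambda > threshold) for lambda ~ N(mean, sigma_e^2), i.e. 1 - Phi((threshold - mean)/sigma_e)."""
    z = (threshold - mean) / sigma_e
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for herd, mean in [(1, 9.0), (2, 10.0), (3, 11.0)]:
    print(herd, round(prob_success(mean), 4))
# 1 0.0668
# 2 0.1587
# 3 0.3085
```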
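Reparameterize model
• Let $\mathbf{x}_i'\boldsymbol{\alpha} = \mu + \mathbf{x}_i^{*\prime}\boldsymbol{\alpha}^*$, with $\mathbf{x}_i^{*\prime} = [1\ 0]$ (Herd 1), $[0\ 1]$ (Herd 2), $[0\ 0]$ (Herd 3) and $\boldsymbol{\alpha}^* = [\alpha_1\ \alpha_2]'$, so that $\lambda_i = \mu + \mathbf{x}_i^{*\prime}\boldsymbol{\alpha}^* + e_i$.
$$\text{Prob}(i\text{th animal is diseased}) = \text{Prob}(\lambda_i > \delta) = \text{Prob}\left(\frac{e_i}{\sigma_e} > \frac{\delta - \mu - \mathbf{x}_i^{*\prime}\boldsymbol{\alpha}^*}{\sigma_e}\right) = 1 - \Phi\left(\frac{\delta - \mu - \mathbf{x}_i^{*\prime}\boldsymbol{\alpha}^*}{\sigma_e}\right)$$
• $\delta$ and $\mathbf{x}_i'\boldsymbol{\alpha} = \mu + \mathbf{x}_i^{*\prime}\boldsymbol{\alpha}^*$ cannot be estimated separately from $\sigma_e^2$, i.e., $\sigma_e^2$ is not identifiable.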
Reparameterize the model again.
• Consider the remaining parameters as standardized ratios: $\tau = \delta/\sigma_e$, $\xi = \mu/\sigma_e$, and $\boldsymbol{\beta}^* = \boldsymbol{\alpha}^*/\sigma_e$; this is the same as constraining $\sigma_e = 1$.
$$\text{Prob}(i\text{th animal is diseased}) = 1 - \Phi(\tau - \xi - \mathbf{x}_i^{*\prime}\boldsymbol{\beta}^*)$$
[Figure: standardized liability densities for Herds 1, 2 and 3 (horizontal axis 3 to 9).]
Notice that the new threshold is now 12/2 = 6, whereas the mean responses for the three herds are now 9/2, 10/2 and 11/2.
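There is still another identifiability problem
• Between $\tau$ and $\xi$
• One solution?
– “zero out” $\tau$.
$$\text{Prob}(i\text{th animal is diseased}) = 1 - \Phi(0 - \xi - \mathbf{x}_i^{*\prime}\boldsymbol{\beta}^*) = 1 - \Phi(0 - \mathbf{x}_i'\boldsymbol{\beta})$$
where $\mathbf{x}_i' = [1\ \mathbf{x}_i^{*\prime}]$ and $\boldsymbol{\beta} = [\xi\ \boldsymbol{\beta}^{*\prime}]'$.
[Figure: standardized liability densities for Herds 1, 2 and 3 (horizontal axis -4 to 4), threshold at 0.]
Notice that the new threshold is now 0, whereas the mean responses for the three herds are now -1.5, -1 and -0.5.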
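The identifiability argument can be checked numerically. Dividing δ, the herd means and $\sigma_e$ by a common constant, or shifting the threshold to 0 and the means by the same amount, leaves every disease probability unchanged, so the binary data cannot distinguish these parameterizations. A minimal Python sketch; the rescaling factor 7 is arbitrary.

```python
import math


def prob_diseased(threshold, mean, sigma_e):
    """1 - Phi((threshold - mean)/sigma_e), computed with erfc."""
    return 0.5 * math.erfc((threshold - mean) / (sigma_e * math.sqrt(2.0)))


herd_means = (9.0, 10.0, 11.0)
original = [prob_diseased(12.0, m, 2.0) for m in herd_means]
standardized = [prob_diseased(12.0 / 2, m / 2, 1.0) for m in herd_means]   # sigma_e = 1
zeroed = [prob_diseased(0.0, (m - 12.0) / 2, 1.0) for m in herd_means]     # tau = 0
rescaled = [prob_diseased(12.0 * 7, m * 7, 2.0 * 7) for m in herd_means]   # arbitrary scale

for row in zip(original, standardized, zeroed, rescaled):
    print([round(p, 6) for p in row])
```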
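• Note that higher values of […] translate into lower probabilities of disease.
[Figure: standardized liability densities for Herds 1, 2 and 3 (horizontal axis -4 to 4), threshold at 0.]
$$\text{Prob}(i\text{th animal is healthy}) = 1 - p_i = \Phi(-\mathbf{x}_i'\boldsymbol{\beta})$$
$$p_i = \text{Prob}(i\text{th animal is diseased}) = 1 - \Phi(0 - \xi - \mathbf{x}_i^{*\prime}\boldsymbol{\beta}^*) = 1 - \Phi(-\mathbf{x}_i'\boldsymbol{\beta}) = \Phi(\mathbf{x}_i'\boldsymbol{\beta})$$
With $\eta_i = \mathbf{x}_i'\boldsymbol{\beta}$: $p_i = \Phi(\eta_i)$ and $1 - p_i = \Phi(-\eta_i)$.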
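Deriving likelihood function
• Given: $\text{Prob}(Y_i = y) = p_i^{y}(1 - p_i)^{1-y}$, $y = 0, 1$
• i.e., $\text{Prob}(Y_i = 1) = p_i^{1}(1 - p_i)^{0} = p_i$ and $\text{Prob}(Y_i = 0) = p_i^{0}(1 - p_i)^{1} = 1 - p_i$
• Suppose you have a second animal ($i'$): $\text{Prob}(Y_{i'} = y) = p_{i'}^{y}(1 - p_{i'})^{1-y}$
• Suppose animals $i$ and $i'$ are conditionally independent:
$$\text{Prob}(Y_i = z_1, Y_{i'} = z_2) = p_i^{z_1}(1 - p_i)^{1-z_1}\,p_{i'}^{z_2}(1 - p_{i'})^{1-z_2}$$
• Example:
$$\text{Prob}(Y_i = 1, Y_{i'} = 0) = p_i^{1}(1 - p_i)^{0}\,p_{i'}^{0}(1 - p_{i'})^{1} = p_i(1 - p_{i'})$$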
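Deriving likelihood function
• More general case
– conditional independence:
$$\text{Prob}(Y_1 = z_1, Y_2 = z_2, \ldots, Y_n = z_n) = p_1^{z_1}(1 - p_1)^{1-z_1}\,p_2^{z_2}(1 - p_2)^{1-z_2}\cdots p_n^{z_n}(1 - p_n)^{1-z_n} = \prod_{i=1}^{n} p_i^{z_i}(1 - p_i)^{1-z_i}$$
• So… likelihood function for probit model:
$$L(\boldsymbol{\beta} \mid \mathbf{y}) = \prod_{i=1}^{n}\left[\Phi(\mathbf{x}_i'\boldsymbol{\beta})\right]^{y_i}\left[1 - \Phi(\mathbf{x}_i'\boldsymbol{\beta})\right]^{1-y_i}$$
• Alternative: logistic model:
$$L(\boldsymbol{\beta} \mid \mathbf{y}) = \prod_{i=1}^{n}\left[\frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}\right]^{y_i}\left[\frac{1}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}\right]^{1-y_i}, \qquad p_i = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}$$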
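Small probit regression example
• Data:
$y_i$: 1 1 0 0 1 0 0 0 1 0 0 0
$x_i$: 40 40 40 43 43 43 47 47 47 50 50 50
$$\mathbf{X} = \begin{bmatrix}\mathbf{x}_1' \\ \vdots \\ \mathbf{x}_{12}'\end{bmatrix} = \begin{bmatrix}1 & 40 \\ 1 & 40 \\ 1 & 40 \\ 1 & 43 \\ 1 & 43 \\ 1 & 43 \\ 1 & 47 \\ 1 & 47 \\ 1 & 47 \\ 1 & 50 \\ 1 & 50 \\ 1 & 50\end{bmatrix}, \qquad \mathbf{y} = [1\ 1\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0\ 0\ 0]'$$
$$\text{Prob}(Y_i = 1) = \Phi(\mathbf{x}_i'\boldsymbol{\beta}) = \Phi(\beta_o + \beta_1 x_i), \qquad \Phi^{-1}\left[E(y_i \mid \beta_o, \beta_1)\right] = \beta_o + \beta_1 x_i$$
Link function = probit
$$L(\beta_o, \beta_1 \mid \mathbf{y}) = \prod_{i=1}^{n}\left[\Phi(\beta_o + \beta_1 x_i)\right]^{y_i}\left[1 - \Phi(\beta_o + \beta_1 x_i)\right]^{1-y_i}$$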
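Log likelihood
$$\log L(\beta_o, \beta_1 \mid \mathbf{y}) = \sum_{i=1}^{n}\left\{y_i\log\Phi(\beta_o + \beta_1 x_i) + (1 - y_i)\log\left[1 - \Phi(\beta_o + \beta_1 x_i)\right]\right\}$$
• Newton-Raphson equations can be written as:
$$\mathbf{X}'\mathbf{W}\mathbf{X}\,\hat{\boldsymbol{\beta}}^{[t+1]} = \mathbf{X}'\mathbf{W}\mathbf{X}\,\hat{\boldsymbol{\beta}}^{[t]} + \mathbf{X}'\mathbf{v}$$
where $\mathbf{W} = \text{diag}(w_i)$, $\mathbf{v} = \{v_i\}$, $\eta_i = \mathbf{x}_i'\boldsymbol{\beta} = \beta_o + \beta_1 x_i$, and
$$w_i = -\frac{\partial^2\log L(\beta_o, \beta_1 \mid \mathbf{y})}{\partial\eta_i^2}, \qquad v_i = \frac{\partial\log L(\beta_o, \beta_1 \mid \mathbf{y})}{\partial\eta_i}$$
• Fisher’s scoring: $\mathbf{W} = \text{diag}\{E_{\mathbf{y}}(w_i)\}$, with
$$E_{\mathbf{y}}(w_i) = -E_{\mathbf{y}}\left[\frac{\partial^2\log L(\beta_o, \beta_1 \mid \mathbf{y})}{\partial\eta_i^2}\right]$$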
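A minimal Python sketch of these Fisher-scoring equations for the 12-observation probit example. It uses the expected weights $E(w_i) = \phi(\eta_i)^2/\{\Phi(\eta_i)[1 - \Phi(\eta_i)]\}$ and $v_i = \phi(\eta_i)[y_i - \Phi(\eta_i)]/\{\Phi(\eta_i)[1 - \Phi(\eta_i)]\}$. The zero starting values and the tolerance are my own choices. The estimates and log likelihood should agree with the PROC GENMOD output to the printed precision. The standard errors here come from $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$ with expected weights, so they may differ slightly from GENMOD’s if GENMOD bases its covariance on the observed information.

```python
import math

# Small probit regression example data
x = [40, 40, 40, 43, 43, 43, 47, 47, 47, 50, 50, 50]
y = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def loglik(b0, b1):
    """Probit log likelihood; 1 - Phi(eta) is computed as Phi(-eta)."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * math.log(Phi(eta)) + (1 - yi) * math.log(Phi(-eta))
    return total


def fisher_scoring(b0=0.0, b1=0.0, tol=1e-10, max_iter=100):
    """Solve X'WX beta[t+1] = X'WX beta[t] + X'v with expected weights W."""
    for _ in range(max_iter):
        a00 = a01 = a11 = g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p, q, d = Phi(eta), Phi(-eta), phi(eta)
            w = d * d / (p * q)          # E(w_i) for the probit link
            v = d * (yi - p) / (p * q)   # d log L / d eta_i
            a00 += w
            a01 += w * xi
            a11 += w * xi * xi
            g0 += v
            g1 += v * xi
        det = a00 * a11 - a01 * a01
        s0 = (a11 * g0 - a01 * g1) / det   # step = (X'WX)^{-1} X'v
        s1 = (a00 * g1 - a01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    se = (math.sqrt(a11 / det), math.sqrt(a00 / det))  # from (X'WX)^{-1}
    return b0, b1, se


b0, b1, se = fisher_scoring()
print(round(b0, 4), round(b1, 4), round(loglik(b0, b1), 4), se)
```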
A SAS program
data example_binary;
  input x y @@;                  /* @@: several (x, y) pairs per data line */
  cards;
40 1 40 1 40 0 43 0 43 1 43 0
47 0 47 0 47 1 50 0 50 0 50 0
;
run;
proc genmod data=example_binary descending;
  class y;
  model y = x / dist=bin link=probit;
  contrast 'slope' x 1;
run;
Key output
Criteria For Assessing Goodness Of Fit
Criterion        DF   Value     Value/DF
Log Likelihood        -6.2123

Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept   1    7.8233     5.2657           -2.4974    18.1439           2.21              0.1374
x           1    -0.1860    0.1194           -0.4199    0.0480            2.43              0.1192
Scale       0    1.0000     0.0000           1.0000     1.0000

Contrast Results
Contrast   DF   Chi-Square   Pr > ChiSq   Type
slope      1    2.85         0.0913       LR
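Wald test
• Asymptotic inference:
$$\widehat{\text{var}}(\hat{\boldsymbol{\beta}}) = \left[-\frac{\partial^2 l(\boldsymbol{\beta} \mid \mathbf{y})}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}'}\right]^{-1}_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}} = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}$$
– Reported standard errors are square roots of the diagonals.
• Hypothesis test on $\mathbf{K}'\boldsymbol{\beta} = \mathbf{0}$:
$$(\mathbf{K}'\hat{\boldsymbol{\beta}})'\left[\mathbf{K}'(\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{K}\right]^{-1}\mathbf{K}'\hat{\boldsymbol{\beta}} \sim \chi^2_{\text{nrow}(\mathbf{K}')}$$
When is n “large enough” for this to be trustworthy????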
Likelihood ratio test
Reduced Model:
proc genmod data=example_binary descending;
  class y;
  model y = / dist=bin link=probit;
run;
Criteria For Assessing Goodness Of Fit
Criterion        DF   Value     Value/DF
Log Likelihood        -7.6382
$$L(\beta_o, \beta_1 = 0 \mid \mathbf{y}) = \prod_{i=1}^{n}\left[\Phi(\beta_o)\right]^{y_i}\left[1 - \Phi(\beta_o)\right]^{1-y_i}$$
$$-2(\log L_{reduced} - \log L_{full}) = -2\left(-7.6382 - (-6.2123)\right) = 2.85$$
The p-value for $H_o: \beta_1 = 0$ is $\text{Prob}(\chi^2_1 > 2.85) = 0.09$.
Again… asymptotic
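Both asymptotic tests can be reproduced from the printed output. A minimal Python sketch, assuming only the two log likelihoods and the slope estimate and standard error reported by PROC GENMOD; the $\chi^2_1$ upper tail uses the identity $P(\chi^2_1 > s) = P(|Z| > \sqrt{s}) = \text{erfc}(\sqrt{s/2})$.

```python
import math


def chi2_1_sf(stat):
    """P(chi-square with 1 df > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


loglik_full, loglik_reduced = -6.2123, -7.6382   # PROC GENMOD log likelihoods
lr = -2.0 * (loglik_reduced - loglik_full)        # likelihood ratio statistic
wald = (-0.1860 / 0.1194) ** 2                    # Wald chi-square for x: (estimate / se)^2

print("LR  :", round(lr, 2), round(chi2_1_sf(lr), 4))
print("Wald:", round(wald, 2), round(chi2_1_sf(wald), 4))
```

The two asymptotic tests of the same hypothesis disagree noticeably (p of about 0.09 versus 0.12) with only 12 observations.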
A PROC GLIMMIX “fix” for uncertainty: use asymptotic F-tests rather than χ²-tests
proc glimmix data=example_binary;
  model y = x / dist=bin link=probit;
  contrast 'slope' x 1;
run;
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
x        1        10       2.43      0.1503
Contrasts
Label    Num DF   Den DF   F Value   Pr > F
slope    1        10       2.43      0.1503
“less asymptotic?”
Ordinal Categorical Data
• How I learned this?
– “Sire evaluation for ordered categorical data with a threshold model” by Dan Gianola and Jean Louis Foulley (1983) in Genetics, Selection, Evolution 15:201-224. (GF83)
– See also Harville and Mee (1984) Biometrics (HM84)
• Application:
– Calving ease scores (0 = unassisted, 5 = Caesarean)
– Determined by an underlying continuous liability relative to a set of thresholds: $\tau_0 = -\infty < \tau_1 < \tau_2 < \cdots < \tau_{m-1} < \tau_m = +\infty$
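• Liabilities: $\mathbf{L} = \mathbf{X}\boldsymbol{\mu} + \mathbf{e}$, $\mathbf{e} \mid \sigma_e^2 \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$
$$y_i = \begin{cases}1 & \text{if } \tau_0 < L_i \le \tau_1 \\ 2 & \text{if } \tau_1 < L_i \le \tau_2 \\ \vdots & \\ m & \text{if } \tau_{m-1} < L_i \le \tau_m\end{cases}$$
• Consider three different herds/subclasses:
$$\boldsymbol{\mu} = \begin{bmatrix}\mu_1 \\ \mu_2 \\ \mu_3\end{bmatrix} = \begin{bmatrix}9 \\ 10 \\ 11\end{bmatrix}, \qquad y_i = \begin{cases}1 & \text{if } L_i \le \tau_1 = 8 \\ 2 & \text{if } 8 < L_i \le \tau_2 = 12 \\ 3 & \text{if } L_i > 12\end{cases}$$
$\sigma_e = 2$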
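Underlying normal densities for each of three herds.
• Probabilities highlighted for Herd 2
[Figure: liability densities for Herds 1, 2 and 3 (horizontal axis 5 to 15), thresholds at 8 and 12.]
$$\text{Prob}(L \le 8 \mid \mu_2 = 10, \sigma_e = 2) = \text{Prob}\left(\frac{L - \mu_2}{\sigma_e} \le \frac{\tau_1 - \mu_2}{\sigma_e}\right) = \text{Prob}(z \le -1.0) = \Phi(-1.0) = 0.1587$$
$$\text{Prob}(8 < L \le 12 \mid \mu_2 = 10, \sigma_e = 2) = \text{Prob}\left(\frac{\tau_1 - \mu_2}{\sigma_e} < \frac{L - \mu_2}{\sigma_e} \le \frac{\tau_2 - \mu_2}{\sigma_e}\right) = \text{Prob}(-1.0 < z \le 1.0) = \Phi(1.0) - \Phi(-1.0) = 0.6827$$
$$\text{Prob}(L > 12 \mid \mu_2 = 10, \sigma_e = 2) = \text{Prob}\left(\frac{L - \mu_2}{\sigma_e} > \frac{\tau_2 - \mu_2}{\sigma_e}\right) = \text{Prob}(z > 1.0) = 1 - \Phi(1.0) = 0.1587$$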
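The three category probabilities for Herd 2 follow from differences of normal cdfs at the thresholds. The minimal Python sketch below also repeats the calculation after dividing everything by $\sigma_e = 2$, which gives identical probabilities: the reason $\sigma_e$ cannot be estimated separately. The function name is illustrative.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def category_probs(thresholds, mean, sigma_e):
    """P(Y = k) = Phi((tau_k - mean)/sigma_e) - Phi((tau_{k-1} - mean)/sigma_e),
    with tau_0 = -infinity and tau_m = +infinity."""
    cdf = [0.0] + [Phi((t - mean) / sigma_e) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]


herd2 = category_probs([8.0, 12.0], mean=10.0, sigma_e=2.0)
herd2_std = category_probs([4.0, 6.0], mean=5.0, sigma_e=1.0)  # everything divided by sigma_e
print([round(p, 4) for p in herd2])      # [0.1587, 0.6827, 0.1587]
print([round(p, 4) for p in herd2_std])  # [0.1587, 0.6827, 0.1587]
```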
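Constraints
• It is not really possible to estimate $\sigma_e$ separately from $\tau_1$, $\tau_2$, $\mu_1$, $\mu_2$, and $\mu_3$. Define then $L^* = L/\sigma_e$, $\tau_1^* = \tau_1/\sigma_e$, $\tau_2^* = \tau_2/\sigma_e$, $\mu_1^* = \mu_1/\sigma_e$, $\mu_2^* = \mu_2/\sigma_e$, and $\mu_3^* = \mu_3/\sigma_e$.
[Figure: standardized liability densities for Herds 1, 2 and 3 (horizontal axis 2 to 8).]
$$\text{Prob}(L^* \le \tau_1^* \mid \mu_2^*) = \text{Prob}\left(L^* - \mu_2^* \le \frac{8}{2} - \frac{10}{2}\right) = \text{Prob}(z \le -1.0) = \Phi(-1.0) = 0.1587$$
$$\text{Prob}(\tau_1^* < L^* \le \tau_2^* \mid \mu_2^*) = \text{Prob}\left(\frac{8}{2} - \frac{10}{2} < L^* - \mu_2^* \le \frac{12}{2} - \frac{10}{2}\right) = \Phi(1.0) - \Phi(-1.0) = 0.6827$$
$$\text{Prob}(L^* > \tau_2^* \mid \mu_2^*) = \text{Prob}\left(L^* - \mu_2^* > \frac{12}{2} - \frac{10}{2}\right) = 1 - \Phi(1.0) = 0.1587$$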
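Yet another constraint required
Suppose we use the corner parameterization:
$$\begin{bmatrix}\mu \\ \alpha_1 \\ \alpha_2\end{bmatrix} = \begin{bmatrix}11 \\ -2 \\ -1\end{bmatrix}$$
which, when expressed as a ratio over $\sigma_e$, is
$$\begin{bmatrix}\mu^* \\ \alpha_1^* \\ \alpha_2^*\end{bmatrix} = \begin{bmatrix}5.5 \\ -1 \\ -0.5\end{bmatrix}, \qquad \mu_1^* = \mu^* + \alpha_1^* = 4.5,\ \ \mu_2^* = \mu^* + \alpha_2^* = 5.0,\ \ \mu_3^* = \mu^* = 5.5$$
such that $\tau_1^*$ or $\tau_2^*$ are not separately identifiable from $\mu^*$:
$\tau_1^{**} = \tau_1^* - \mu^* = 4.0 - 5.5 = -1.5$
$\tau_2^{**} = \tau_2^* - \mu^* = 6.0 - 5.5 = +0.5$
i.e., zero out $\mu^*$.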
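[Figure: standardized liability densities for Herds 1, 2 and 3 (horizontal axis -4 to 4) after zeroing out $\mu^*$ ($\mu^{**} = 0$); herd means $\mu_1^{**} = -1.0$, $\mu_2^{**} = -0.5$, $\mu_3^{**} = 0.0$.]
$\tau_1^{**} = \tau_1^* - \mu^* = 4.0 - 5.5 = -1.5$, $\tau_2^{**} = \tau_2^* - \mu^* = 6.0 - 5.5 = +0.5$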
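Alternative constraint.
Estimate $\mu$ but “zero out” one of $\tau_1$ or $\tau_2$, say $\tau_1$.
Start with $[\mu^*\ \alpha_1^*\ \alpha_2^*]' = [5.5\ \ -1\ \ -0.5]'$ and $\tau_1^* = 4.0$ and $\tau_2^* = 6.0$.
Then: $\mu^{**} = \mu^* - \tau_1^* = 5.5 - 4.0 = 1.5$ and $\tau_2^{**} = \tau_2^* - \tau_1^* = 6.0 - 4.0 = 2.0$, so that $[\mu^{**}\ \alpha_1^*\ \alpha_2^*]' = [1.5\ \ -1\ \ -0.5]'$.
[Figure: standardized liability densities for Herds 1, 2 and 3 (horizontal axis -4 to 4) under this constraint.]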
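One last constraint possibility
• Set $\tau_1 = 0$ and $\tau_2$ to an arbitrary value $> \tau_1$, and infer upon $\sigma_e$.
• Say $\sigma_e = 2$.
• $\tau_1$ fixed to 0; $\tau_2$ fixed to 4:
$$\begin{bmatrix}\mu^{**} \\ \alpha_1 \\ \alpha_2\end{bmatrix} = \begin{bmatrix}3.0 \\ -2 \\ -1\end{bmatrix}$$
[Figure: liability densities for Herds 1, 2 and 3 (horizontal axis -5 to 5) under this constraint.]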
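Likelihood function for Ordinal Categorical Data
Based on the multinomial (m categories):
$$\text{Prob}(Y_i = y_i) = P_{i1}^{I(y_i=1)}P_{i2}^{I(y_i=2)}\cdots P_{im}^{I(y_i=m)} = \prod_{k=1}^{m}P_{ik}^{I(y_i=k)}$$
where
$$P_{ik} = \Phi(\tau_k - \eta_i) - \Phi(\tau_{k-1} - \eta_i) \quad\text{and}\quad \eta_i = \mathbf{x}_i'\boldsymbol{\beta}$$
Likelihood:
$$L = \prod_{i=1}^{n}\text{Prob}(Y_i = y_i) = \prod_{i=1}^{n}\prod_{k=1}^{m}P_{ik}^{I(y_i=k)}$$
Log Likelihood:
$$\log L = \sum_{i=1}^{n}\sum_{k=1}^{m}I(y_i = k)\log P_{ik}$$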
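Hypothetical small example
• Ordinal outcome having 3 possible categories
• Two subjects in the dataset:
– first subject has a response of 1, whereas the second has a response of 3.
– Their contribution to the log likelihood:
$$\log\left[\Phi(\tau_1 - \mathbf{x}_1'\boldsymbol{\beta}) - \Phi(\tau_0 - \mathbf{x}_1'\boldsymbol{\beta})\right] + \log\left[\Phi(\tau_3 - \mathbf{x}_2'\boldsymbol{\beta}) - \Phi(\tau_2 - \mathbf{x}_2'\boldsymbol{\beta})\right] = \log\Phi(\tau_1 - \mathbf{x}_1'\boldsymbol{\beta}) + \log\left[1 - \Phi(\tau_2 - \mathbf{x}_2'\boldsymbol{\beta})\right]$$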
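The simplification above uses $\Phi(\tau_0 - \eta) = \Phi(-\infty) = 0$ and $\Phi(\tau_3 - \eta) = \Phi(+\infty) = 1$. The minimal Python sketch below codes the general ordinal log likelihood with $\pm\infty$ end thresholds and checks it against the simplified form. The threshold and linear-predictor values are hypothetical, chosen only for illustration.

```python
import math


def Phi(z):
    """Standard normal cdf; erfc(+inf) = 0 and erfc(-inf) = 2, so Phi(-inf) = 0 and Phi(+inf) = 1."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ordinal_loglik(y, eta, inner_thresholds):
    """sum_i log[Phi(tau_{y_i} - eta_i) - Phi(tau_{y_i - 1} - eta_i)],
    with tau_0 = -inf and tau_m = +inf added around the inner thresholds."""
    tau = [-math.inf] + list(inner_thresholds) + [math.inf]
    return sum(math.log(Phi(tau[k] - e) - Phi(tau[k - 1] - e)) for k, e in zip(y, eta))


tau = (0.3, 0.9)     # hypothetical tau_1, tau_2
eta = (0.2, -0.1)    # hypothetical x_1'beta, x_2'beta
general = ordinal_loglik([1, 3], eta, tau)
simplified = math.log(Phi(tau[0] - eta[0])) + math.log(1.0 - Phi(tau[1] - eta[1]))
print(general, simplified)
```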
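Solving for ML
• Let’s use Fisher’s scoring:
– For a three+ category problem:
$$\hat{\boldsymbol{\theta}}^{[t+1]} = \hat{\boldsymbol{\theta}}^{[t]} + \left[-E\left(\frac{\partial^2\log L(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right)\right]^{-1}_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}\left.\frac{\partial\log L(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}^{[t]}}$$
• Now $\boldsymbol{\theta}' = [\boldsymbol{\tau}'\ \boldsymbol{\beta}']$ and
$$-E\left(\frac{\partial^2\log L(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right) = \begin{bmatrix}-E\left(\frac{\partial^2\log L}{\partial\boldsymbol{\tau}\,\partial\boldsymbol{\tau}'}\right) & -E\left(\frac{\partial^2\log L}{\partial\boldsymbol{\tau}\,\partial\boldsymbol{\beta}'}\right) \\ -E\left(\frac{\partial^2\log L}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\tau}'}\right) & -E\left(\frac{\partial^2\log L}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}'}\right)\end{bmatrix} = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X}\end{bmatrix}$$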
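Setting up Fisher’s scoring: 2nd derivatives (see GF83 or HM84 for details)
• Now, with $\phi_{ik} = \phi(\tau_k - \eta_i)$ and $\phi_{i0} = \phi_{im} = 0$:
$$t_{kk} = \sum_{i=1}^{n}\phi_{ik}^2\,\frac{P_{ik} + P_{i(k+1)}}{P_{ik}P_{i(k+1)}}; \quad k = 1, 2, \ldots, m-1$$
$$t_{k,k+1} = -\sum_{i=1}^{n}\frac{\phi_{ik}\,\phi_{i(k+1)}}{P_{i(k+1)}}; \quad k = 1, 2, \ldots, m-2$$
$$l_{ik} = \phi_{ik}\left[\frac{\phi_{i(k-1)} - \phi_{ik}}{P_{ik}} - \frac{\phi_{ik} - \phi_{i(k+1)}}{P_{i(k+1)}}\right], \qquad (\mathbf{X}'\mathbf{L})_{jk} = \sum_{i=1}^{n}x_{ij}\,l_{ik}; \quad j = 1, 2, \ldots, p;\ k = 1, 2, \ldots, m-1$$
$$w_{ii} = \sum_{k=1}^{m}\frac{(\phi_{i(k-1)} - \phi_{ik})^2}{P_{ik}}; \quad i = 1, 2, \ldots, n$$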
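Setting up Fisher’s scoring: 1st derivatives (see GF83 for details)
• Now
$$\frac{\partial\log L(\boldsymbol{\theta} \mid \mathbf{y})}{\partial\boldsymbol{\theta}} = \begin{bmatrix}\partial\log L(\boldsymbol{\theta} \mid \mathbf{y})/\partial\boldsymbol{\tau} \\ \partial\log L(\boldsymbol{\theta} \mid \mathbf{y})/\partial\boldsymbol{\beta}\end{bmatrix} = \begin{bmatrix}\mathbf{p} \\ \mathbf{X}'\mathbf{v}\end{bmatrix}$$
• with
$$p_k = \sum_{i=1}^{n}\phi_{ik}\left[\frac{I(Y_i = k)}{P_{ik}} - \frac{I(Y_i = k+1)}{P_{i(k+1)}}\right]; \quad k = 1, 2, \ldots, m-1$$
$$v_i = \sum_{k=1}^{m}I(Y_i = k)\,\frac{\phi_{i(k-1)} - \phi_{ik}}{P_{ik}}$$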
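Fisher’s scoring algorithm
• So
$$\begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\tau}}^{[t+1]} \\ \hat{\boldsymbol{\beta}}^{[t+1]}\end{bmatrix} = \begin{bmatrix}\mathbf{T} & \mathbf{L}'\mathbf{X} \\ \mathbf{X}'\mathbf{L} & \mathbf{X}'\mathbf{W}\mathbf{X}\end{bmatrix}\begin{bmatrix}\hat{\boldsymbol{\tau}}^{[t]} \\ \hat{\boldsymbol{\beta}}^{[t]}\end{bmatrix} + \begin{bmatrix}\mathbf{p} \\ \mathbf{X}'\mathbf{v}\end{bmatrix}$$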
Data from GF (1983)
H A G S Y    H A G S Y    H A G S Y
1 2 M 1 1    1 2 F 1 1    1 3 M 1 1
1 2 F 2 2    1 3 M 2 1    1 3 M 2 3
1 3 F 2 1    1 3 F 2 1    1 3 F 2 1
1 2 M 3 1    1 2 M 3 2    1 3 F 3 2
1 3 M 3 1    2 2 F 1 1    2 2 F 1 1
2 2 M 1 1    2 3 M 1 3    2 2 F 2 1
2 2 F 2 3    2 3 M 2 1    2 2 F 3 2
2 3 M 3 3    2 2 M 4 2    2 2 F 4 1
2 3 F 4 1    2 3 F 4 1    2 3 M 4 1
2 3 M 4 1
H: Herd (1 or 2); A: Age of dam (2 = young heifer, 3 = older cow); G: Gender or sex (M and F); S: Sire of calf (1, 2, 3, or 4); Y: Ordinal response (1, 2, or 3)
SAS code: Let’s just consider sex in model
proc glimmix data=gf83;
  model y = sex / dist=mult link=cumprobit solution;
  estimate 'Category 1 Female'   intercept 1 0 sex 1 / ilink;
  estimate 'Category 1 Male'     intercept 1 0 sex 0 / ilink;
  estimate 'Category <=2 Female' intercept 0 1 sex 1 / ilink;
  estimate 'Category <=2 Male'   intercept 0 1 sex 0 / ilink;
run;
Subtle difference in parameterization:
Gianola & Foulley, 1983: $P_{ik} = \Phi(\tau_k - \mathbf{x}_i'\boldsymbol{\beta}) - \Phi(\tau_{k-1} - \mathbf{x}_i'\boldsymbol{\beta})$
PROC GLIMMIX: $P_{ik} = \Phi(\tau_k + \mathbf{x}_i'\boldsymbol{\beta}) - \Phi(\tau_{k-1} + \mathbf{x}_i'\boldsymbol{\beta})$
where sex = 1 if female, 0 if male.
Parameter Estimates
Effect                         y   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept ($\tau_1 - \mu$)     1   0.3007     0.3373           25   0.89      0.3812
Intercept ($\tau_2 - \mu$)     2   0.9154     0.3656           25   2.50      0.0192
sex ($\beta_1$)                    0.3290     0.4738           25   0.69      0.4938
Type III Tests of Fixed Effects
Effect   Num DF   Den DF   F Value   Pr > F
sex      1        25       0.48      0.4938
Estimated Cumulative Probabilities
Label                 Estimate   Standard Error   DF   t Value   Pr > |t|   Mean     Standard Error Mean
Category 1 Female     0.6297     0.3478           25   1.81      0.0822     0.7355   0.1138
Category 1 Male       0.3007     0.3373           25   0.89      0.3812     0.6182   0.1286
Category <=2 Female   1.2444     0.3930           25   3.17      0.0040     0.8933   0.07228
Category <=2 Male     0.9154     0.3656           25   2.50      0.0192     0.8200   0.09594
$$\hat{P}_{2(\text{males})} = \Phi(\hat{\tau}_2 - \hat{\mu}) - \Phi(\hat{\tau}_1 - \hat{\mu}) = 0.8200 - 0.6182$$
Asymptotics?
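The “Mean” column is the inverse link (Φ) applied to the “Estimate” column, and individual category probabilities are differences of these cumulative probabilities. A minimal Python sketch, assuming the GLIMMIX form $P(Y \le k) = \Phi(\text{intercept}_k + \beta_1\,\text{sex})$ with the rounded estimates printed above. Results can therefore differ from the SAS output in the fourth decimal.

```python
import math


def Phi(z):
    """Standard normal cdf (inverse probit link)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# GLIMMIX estimates: intercepts are tau_k - mu, b_sex is the female effect
int1, int2, b_sex = 0.3007, 0.9154, 0.3290

results = {}
for label, sex in [("Female", 1), ("Male", 0)]:
    c1 = Phi(int1 + b_sex * sex)   # P(Y <= 1)
    c2 = Phi(int2 + b_sex * sex)   # P(Y <= 2)
    results[label] = (c1, c2, (c1, c2 - c1, 1.0 - c2))
    print(label, round(c1, 4), round(c2, 4), [round(p, 4) for p in results[label][2]])
```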
PROC NLMIXED (fix $\beta_0$, $\sigma_e$)
proc nlmixed data=gf83;
  parms beta1=0 thresh1=-1.5 thresh2=0.5;
  eta = beta1*sex;
  /* category probabilities P_ik under the cumulative probit model */
  if (y=1) then p = probnorm(thresh1-eta) - 0;
  else if (y=2) then p = probnorm(thresh2-eta) - probnorm(thresh1-eta);
  else if (y=3) then p = 1 - probnorm(thresh2-eta);
  /* guard against log(0) */
  if (p > 1e-8) then ll = log(p);
  else ll = -1e100;
  model y ~ general(ll);
  estimate 'Category 1 Female'   probnorm(thresh1-beta1);
  estimate 'Category 1 Male'     probnorm(thresh1-0);
  estimate 'Category <=2 Female' probnorm(thresh2-beta1);
  estimate 'Category <=2 Male'   probnorm(thresh2-0);
run;
Estimate $\beta_1$, $\tau_1$, $\tau_2$
$$\text{Prob}(Y_i = y_i) = \prod_{k=1}^{m}P_{ik}^{I(y_i=k)}, \qquad \log L = \sum_{i=1}^{n}\sum_{k=1}^{m}I(y_i = k)\log P_{ik}$$
Key output from PROC NLMIXED
Parameter   Estimate   Standard Error   DF   t Value   Pr > |t|
beta1       -0.3290    0.4738           28   -0.69     0.4931
thresh1     0.3007     0.3373           28   0.89      0.3803
thresh2     0.9154     0.3656           28   2.50      0.0184
Additional Estimates
Label                 Estimate   Standard Error
Category 1 Female     0.7355     0.1138
Category 1 Male       0.6182     0.1286
Category <=2 Female   0.8933     0.07228
Category <=2 Male     0.8200     0.09594
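Because sex is the only covariate, the likelihood depends on the data only through the response counts by sex. I tallied these from the GF83 table (females: ten 1s, three 2s, one 3; males: nine 1s, two 2s, three 3s). The Python sketch below is a rough check, not a fitting routine. At the reported NLMIXED estimates, the central-difference score should be close to zero, as it must be at an interior maximum. The tally, the step size h and the function names are my own.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# (sex, y) records tallied from the GF83 data table (sex = 1 female, 0 male)
data = ([(1, 1)] * 10 + [(1, 2)] * 3 + [(1, 3)] * 1 +
        [(0, 1)] * 9 + [(0, 2)] * 2 + [(0, 3)] * 3)


def loglik(beta1, thresh1, thresh2):
    """Cumulative probit log likelihood, NLMIXED form P(Y <= k) = Phi(thresh_k - beta1*sex)."""
    total = 0.0
    for sex, y in data:
        eta = beta1 * sex
        cdf = (0.0, Phi(thresh1 - eta), Phi(thresh2 - eta), 1.0)
        total += math.log(cdf[y] - cdf[y - 1])
    return total


def numeric_gradient(f, point, h=1e-5):
    """Central-difference gradient of f at point."""
    grad = []
    for j in range(len(point)):
        up = list(point)
        down = list(point)
        up[j] += h
        down[j] -= h
        grad.append((f(*up) - f(*down)) / (2 * h))
    return grad


mle = (-0.3290, 0.3007, 0.9154)   # estimates reported by PROC NLMIXED
grad = numeric_gradient(loglik, mle)
print(len(data), [round(g, 4) for g in grad])
```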
Yet another alternative (fix $\tau_1$, $\tau_2$)
proc nlmixed data=gf83;
  parms beta1=0 sigmae=1 mu=0;
  thresh1 = 0;                   /* thresholds fixed, so sigmae becomes estimable */
  thresh2 = 0.5;
  eta = mu + beta1*sex;
  if (y=1) then p = probnorm((thresh1-eta)/sigmae);
  else if (y=2) then p = probnorm((thresh2-eta)/sigmae) - probnorm((thresh1-eta)/sigmae);
  else if (y=3) then p = 1 - probnorm((thresh2-eta)/sigmae);
  if (p > 1e-8) then ll = log(p);
  else ll = -1e100;
  model y ~ general(ll);
  estimate 'Category 1 Female'   probnorm((thresh1-(mu+beta1))/sigmae);
  estimate 'Category 1 Male'     probnorm((thresh1-mu)/sigmae);
  estimate 'Category <=2 Female' probnorm((thresh2-(mu+beta1))/sigmae);
  estimate 'Category <=2 Male'   probnorm((thresh2-mu)/sigmae);
run;
Estimate $\beta_1$, $\sigma_e$, $\beta_0$ ($\mu$)
Parameter Estimates
Parameter   Estimate   Standard Error
beta1       -0.2676    0.3946
sigmae      0.8134     0.3327
mu          -0.2446    0.3151
Additional Estimates
Label                 Estimate   Standard Error
Category 1 Female     0.7356     0.1138
Category 1 Male       0.6182     0.1286
Category <=2 Female   0.8933     0.07228
Category <=2 Male     0.8200     0.09594
This is not inference on overdispersion!!… it’s merely a reparameterization.
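The two NLMIXED fits describe the same model. Standardizing the second fit, with $(\tau_k - \mu)/\sigma_e$ for the thresholds and $\beta_1/\sigma_e$ for the sex effect, recovers the first fit’s estimates, and the fitted probabilities coincide. A minimal Python sketch of that mapping, using the rounded estimates printed above, so agreement is only to about the fourth decimal.

```python
import math


def Phi(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Estimates with tau_1 = 0 and tau_2 = 0.5 fixed (second NLMIXED run)
beta1, sigmae, mu = -0.2676, 0.8134, -0.2446
tau1, tau2 = 0.0, 0.5

# Implied estimates in the first parameterization (mu = 0, sigma_e = 1)
thresh1 = (tau1 - mu) / sigmae
thresh2 = (tau2 - mu) / sigmae
beta1_std = beta1 / sigmae
print(round(thresh1, 4), round(thresh2, 4), round(beta1_std, 4))

# Same cumulative probability under either parameterization
female_cat1 = Phi((tau1 - (mu + beta1)) / sigmae)
print(round(female_cat1, 4), round(Phi(thresh1 - beta1_std), 4))
```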
What is overdispersion from an experimental design perspective?
• No overdispersion is identifiable for binary data… then why is overdispersion possible for binomial data?
– It’s merely a cluster (block) effect.
• Binomial responses:
– Consist of a y/n response.
– Actually, each “response” is a combined total for a cluster with n contributing binary responses, y of them being successes and n - y being failures.
• Similar arguments hold for overdispersion in Poisson data and for n = 1 vs. n > 1 multinomials.
Hessian Fly Data Example (Gotway and Stroup, 1997)
Obs Y n block entry lat lng rep
1 2 8 1 14 1 1 11
2 1 9 1 16 1 2 12
3 9 13 1 7 1 3 13
4 9 9 1 6 1 4 14
5 2 9 1 13 2 1 21
6 7 14 1 15 2 2 22
7 6 8 1 8 2 3 23
8 8 11 1 5 2 4 24
9 7 12 1 11 3 1 31
10 8 11 1 12 3 2 32
Available from SAS PROC GLIMMIX documentation
PROC GLIMMIX code
title "G side independence";
proc glimmix data=HessianFly;
  class block entry rep;
  model y/n = entry;
  random rep / subject=intercept;
run;
Much richer (e.g., spatial) analysis provided by Gotway and Stroup (1997); Stroup’s workshop (2011)
Key portions of output
Number of Observations Read 64
Number of Observations Used 64
Number of Events 396
Number of Trials 736
Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error
rep Intercept 0.6806 0.2612
Hessian Fly Data in “individual” binary form:
Obs entry rep z
1 14 11 1
2 14 11 1
3 14 11 0
4 14 11 0
5 14 11 0
6 14 11 0
7 14 11 0
8 14 11 0
9 16 12 1
10 16 12 0
11 16 12 0
12 16 12 0
13 16 12 0
14 16 12 0
15 16 12 0
16 16 12 0
17 16 12 0
Obs 1 to 8: first binomial record (y/n = 2/8); Obs 9 to 17: second binomial record (y/n = 1/9)
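The expansion from binomial (y/n) records to individual binary records is mechanical: y rows with z = 1 followed by n - y rows with z = 0, each carrying the record’s entry and rep. A minimal Python sketch applied to the first two Hessian fly records; the function name is illustrative, and the ordering (successes first) simply matches the listing above.

```python
def expand_binomial(records):
    """Turn (entry, rep, y, n) binomial records into one (entry, rep, z) row per plant:
    y rows with z = 1 followed by n - y rows with z = 0."""
    rows = []
    for entry, rep, y, n in records:
        rows.extend((entry, rep, 1) for _ in range(y))
        rows.extend((entry, rep, 0) for _ in range(n - y))
    return rows


# First two Hessian fly records (Obs 1 and 2 of the binomial data)
binary = expand_binomial([(14, 11, 2, 8), (16, 12, 1, 9)])
for obs, row in enumerate(binary, start=1):
    print(obs, *row)
```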
PROC GLIMMIX code for “individual” data
title "G side independence";
proc glimmix data=HessianFlyindividual;
  class rep entry;
  model z = entry / dist=bin;
  random intercept / subject=rep;   /* equivalently: random rep; */
run;
Key portions of output
Number of Observations Read 736
Number of Observations Used 736
Covariance Parameter Estimates
Cov Parm Subject Estimate Standard Error
Intercept rep 0.6806 0.2612