Bayesian Multivariate Logistic Regression by
Sean O’Brien and David Dunson
(Biometrics, 2004)
Presented by Lihan He
ECE, Duke University
May 16, 2008
Outline
Univariate logistic regression
Multivariate logistic regression
Prior specification and convergence
Posterior computation
Experimental result
Conclusions
Univariate Logistic Regression Model
$\Pr(y_i = 1 \mid x_i, \beta) = \frac{1}{1 + e^{-x_i'\beta}}$
Equivalent latent-variable form:
$y_i = 1(z_i > 0)$, $z_i \sim L(\cdot \mid x_i'\beta)$
$z_i$: latent variable
$L(\cdot \mid \mu)$: logistic density with location $\mu$
Logistic density: $L(z \mid \mu) = \frac{e^{-(z-\mu)}}{[1 + e^{-(z-\mu)}]^2}$
CDF: $F(z \mid \mu) = \frac{1}{1 + e^{-(z-\mu)}}$
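$\Pr(y_i = 1) = \Pr(z_i > 0) = 1 - F(0 \mid x_i'\beta) = \frac{1}{1 + e^{-x_i'\beta}}$
$\Pr(y_i = 0) = \Pr(z_i \le 0) = F(0 \mid x_i'\beta) = \frac{e^{-x_i'\beta}}{1 + e^{-x_i'\beta}}$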
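A minimal Python sketch (standard library only) of the equivalence above: the direct logit formula, the latent-variable form $1 - F(0 \mid x_i'\beta)$, and a Monte Carlo check that thresholding a logistic draw at zero reproduces $\Pr(y_i = 1)$. The function names and the value $x_i'\beta = 0.7$ are illustrative assumptions.

```python
import math
import random


def inv_logit(eta: float) -> float:
    """Pr(y_i = 1 | x_i, beta) = 1 / (1 + exp(-eta)) with eta = x_i' beta."""
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_cdf(z: float, mu: float) -> float:
    """Logistic CDF F(z | mu) = 1 / (1 + exp(-(z - mu)))."""
    return 1.0 / (1.0 + math.exp(-(z - mu)))


def sample_y_via_latent(eta: float, rng: random.Random) -> int:
    """Draw z ~ L(. | eta) by inverse-CDF sampling, then return y = 1(z > 0)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0; avoid log(0)
        u = rng.random()
    z = eta + math.log(u / (1.0 - u))  # inverse of F(z | eta)
    return 1 if z > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    eta = 0.7  # illustrative value of x_i' beta
    n = 100_000
    freq = sum(sample_y_via_latent(eta, rng) for _ in range(n)) / n
    print(inv_logit(eta), 1.0 - logistic_cdf(0.0, eta), freq)
```

`inv_logit(0.7)` and `1 - logistic_cdf(0.0, 0.7)` agree to rounding error, and the simulated frequency approaches the same value as `n` grows.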
Univariate Logistic Regression Model
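Approximation using t distribution
$L(z \mid \mu) \approx t_\nu(z \mid \mu, \tilde\sigma^2)$, with $\nu = 7.3$ and $\tilde\sigma^2 = \pi^2(\nu - 2)/(3\nu)$
(this scale gives the t distribution the same variance as the logistic: $\tilde\sigma^2 \nu/(\nu - 2) = \pi^2/3$)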
[Figure: logistic density vs. t density, z from -8 to 8]
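The following sketch, assuming only the settings stated above ($\nu = 7.3$, $\tilde\sigma^2 = \pi^2(\nu - 2)/(3\nu)$), checks the variance match and compares the standard logistic density with the scaled t density on a grid over $[-8, 8]$; the helper names are hypothetical.

```python
import math

NU = 7.3  # degrees of freedom from the slide
SIGMA2 = math.pi ** 2 * (NU - 2) / (3 * NU)  # tilde-sigma^2 = pi^2 (nu - 2) / (3 nu)


def logistic_pdf(z: float) -> float:
    """Standard logistic density e^{-z} / (1 + e^{-z})^2 (location 0)."""
    e = math.exp(-abs(z))  # the density is symmetric, so |z| avoids overflow
    return e / (1.0 + e) ** 2


def scaled_t_pdf(z: float, nu: float = NU, sigma2: float = SIGMA2) -> float:
    """Density of sigma * T, where T is a standard Student t with nu d.f."""
    s = math.sqrt(sigma2)
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(s))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p((z / s) ** 2 / nu))


# Var(sigma * T) = sigma^2 nu / (nu - 2), which should equal the logistic variance pi^2 / 3.
t_variance = SIGMA2 * NU / (NU - 2)

# Largest pointwise gap between the two densities on a grid over [-8, 8].
grid = [i / 100 for i in range(-800, 801)]
max_gap = max(abs(logistic_pdf(z) - scaled_t_pdf(z)) for z in grid)
```

On this grid `max_gap` stays below 0.01, which is the sense in which the two curves nearly coincide.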
Multivariate Logistic Regression Model
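Binary variable for each output: $y_{ij} = 1(z_{ij} > 0)$, $j = 1, \dots, p$
with $z_i = (z_{i1}, \dots, z_{ip})' \sim L_p(X_i\beta, R)$, where
$L_p(z \mid \mu, R) = T_{p,\nu}(t \mid R) \prod_{j=1}^{p} \frac{L(z_j \mid \mu_j)}{T_\nu(t_j)}$
each $L(z_j \mid \mu_j)$ is a univariate logistic density (the marginal pdf of $z_j$)
$t_j = F^{-1}\{(1 + e^{-(z_j - \mu_j)})^{-1}\}$, where $F^{-1}(\cdot)$ is the inverse CDF of the $T_\nu$ density and $T_{p,\nu}(\cdot \mid R)$ is the $p$-variate t density with $\nu$ degrees of freedom and correlation matrix $R$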
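A Python sketch for $p = 2$, assuming the copula form written above: draw $t$ from a bivariate t with correlation $\rho$ (a correlated normal pair divided by $\sqrt{\chi^2_\nu/\nu}$), map each coordinate through the $T_\nu$ CDF, and apply the logistic inverse CDF, so that each $z_j$ has marginal density $L(z_j \mid \mu_j)$. Since the standard library has no t CDF, it is computed here from the regularized incomplete beta function via the usual continued fraction; all function names are hypothetical.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=1e-13):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def t_tail(t, nu):
    """Pr(T > |t|) for a standard Student t: 0.5 * I_x(nu/2, 1/2), x = nu / (nu + t^2)."""
    if t * t == 0.0:
        return 0.5
    x = nu / (nu + t * t)
    one_minus_x = t * t / (nu + t * t)
    a, b = nu / 2.0, 0.5
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(one_minus_x))
    if x < (a + 1.0) / (a + b + 2.0):
        ix = front * _betacf(a, b, x) / a
    else:
        ix = 1.0 - front * _betacf(b, a, one_minus_x) / b
    return 0.5 * ix


def t_cdf(t, nu):
    """CDF F_nu(t) of the standard Student t."""
    tail = t_tail(t, nu)
    return 1.0 - tail if t > 0 else tail


def sample_bivariate_logistic(mu, rho, nu, rng):
    """One draw z = (z1, z2) from the t-copula form with logistic margins (p = 2)."""
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    normals = (g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2)  # correlation rho
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)                    # chi^2_nu / nu
    z = []
    for m, e in zip(mu, normals):
        t = e / math.sqrt(w)      # coordinate of a bivariate t vector
        tail = t_tail(t, nu)      # Pr(T > |t|)
        # logit(F_nu(t)), computed from the tail probability for numerical stability
        logit_u = math.copysign(math.log1p(-tail) - math.log(tail), t)
        z.append(m + logit_u)     # logistic inverse CDF with location mu_j
    return z
```

With a positive $\rho$, the implied binary outcomes $y_{ij} = 1(z_{ij} > 0)$ keep their univariate logistic rates while being positively associated.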
Multivariate Logistic Regression Model
Property
The marginal univariate densities of $z_j$, $j = 1, \dots, p$, have univariate logistic form
For $p = 1$, the density reduces to the univariate logistic density
$R$ is a correlation matrix (with 1’s on the diagonal) reflecting the correlations among the $z_j$, and hence among the $y_j$
For $R = \mathrm{diag}(1, \dots, 1)$, the density reduces to a product of univariate logistic densities, and the elements of $z$ are uncorrelated
Good convergence properties for MCMC sampling
Multivariate Logistic Regression Model
Likelihood
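M-ary (ordered) variable for each output, $y_{ij} \in \{1, \dots, d\}$
Assume $\Pr(y_{ij} \le k \mid \beta, \gamma, X_i) = \frac{1}{1 + e^{-(\gamma_k - x_{ij}'\beta)}}$
Define $y_{ij} = \sum_{k=1}^{d} k \, 1(\gamma_{k-1} < z_{ij} \le \gamma_k)$, with cut points $\gamma_0 = -\infty < \gamma_1 < \dots < \gamma_d = \infty$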
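A small sketch of the ordinal construction, assuming the interior cut points $\gamma_1 < \dots < \gamma_{d-1}$ are given as a sorted list (any values used with it are illustrative): it maps a latent $z_{ij}$ to its category and computes the category probabilities implied by the cumulative logits. The function names are hypothetical.

```python
import math


def ordinal_from_latent(z, gammas):
    """y = sum_k k * 1(gamma_{k-1} < z <= gamma_k), with gamma_0 = -inf, gamma_d = +inf.

    `gammas` holds the interior cut points gamma_1 < ... < gamma_{d-1}."""
    for k, g in enumerate(gammas, start=1):
        if z <= g:
            return k
    return len(gammas) + 1


def cumulative_prob(k, eta, gammas):
    """Pr(y <= k) = 1 / (1 + exp(-(gamma_k - eta))) with eta = x_ij' beta; 1 for k = d."""
    if k >= len(gammas) + 1:
        return 1.0
    return 1.0 / (1.0 + math.exp(-(gammas[k - 1] - eta)))


def category_probs(eta, gammas):
    """Pr(y = k) = Pr(y <= k) - Pr(y <= k - 1) for k = 1, ..., d."""
    d = len(gammas) + 1
    cum = [0.0] + [cumulative_prob(k, eta, gammas) for k in range(1, d + 1)]
    return [cum[k] - cum[k - 1] for k in range(1, d + 1)]
```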
Prior specification and convergence
or
$R$: uniform density on $[-1, 1]$ for each off-diagonal element
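One way to draw $R$ from this prior, sketched in Python under an assumption the slide does not state: because $R$ must be a valid (positive-definite) correlation matrix, off-diagonal draws that fail a Cholesky check are rejected. For $p = 2$ every $\rho \in (-1, 1)$ is valid, so the rejection step essentially never triggers. The function names are hypothetical.

```python
import random


def cholesky_ok(a):
    """Return True if the symmetric matrix `a` (list of lists) is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                low[i][i] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True


def sample_uniform_correlation(p, rng, max_tries=100000):
    """Draw each off-diagonal element of R from Uniform(-1, 1); keep R only if it is
    positive definite (assumed rejection step)."""
    for _ in range(max_tries):
        r = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
        for i in range(p):
            for j in range(i + 1, p):
                r[i][j] = r[j][i] = rng.uniform(-1.0, 1.0)
        if cholesky_ok(r):
            return r
    raise RuntimeError("no positive-definite draw found")
```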
Posterior Computation
Posterior: $\pi(\beta, R \mid y) \propto \pi(\beta, R) \prod_{i} \Pr(y_i \mid \beta, R)$
Prior and likelihood are not conjugate
Proposal distribution $\pi^*(\beta, R \mid y)$: use a multivariate t distribution to approximate the multivariate logistic density in the likelihood
Importance sampling: sample from the proposal $\pi^*(\beta, R \mid y)$ to approximate samples from $\pi(\beta, R \mid y)$, and use importance weights for exact inference
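A one-dimensional sketch of the same idea, assuming the univariate t approximation from earlier ($\nu = 7.3$): draws come from the scaled t proposal, and the weights $L(z)/t_\nu(z)$ correct them to the logistic target. Because both densities are normalized, the plain average of weight times $g(z)$ is unbiased; with $g(z) = z^2$ it estimates the standard logistic variance $\pi^2/3$. The helper names are hypothetical.

```python
import math
import random

NU = 7.3
SIGMA = math.sqrt(math.pi ** 2 * (NU - 2) / (3 * NU))


def logistic_pdf(z):
    """Standard logistic density (target)."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2


def scaled_t_pdf(z):
    """Density of SIGMA * T with T ~ Student t(NU) (proposal)."""
    log_c = (math.lgamma((NU + 1) / 2) - math.lgamma(NU / 2)
             - 0.5 * math.log(NU * math.pi) - math.log(SIGMA))
    return math.exp(log_c - (NU + 1) / 2 * math.log1p((z / SIGMA) ** 2 / NU))


def draw_scaled_t(rng):
    """SIGMA * T, using T = N(0, 1) / sqrt(chi^2_nu / nu)."""
    w = rng.gammavariate(NU / 2.0, 2.0 / NU)
    return SIGMA * rng.gauss(0.0, 1.0) / math.sqrt(w)


def is_estimate(g, n, rng):
    """Estimate E_L[g(z)] under the logistic target from n draws of the t proposal."""
    total = 0.0
    for _ in range(n):
        z = draw_scaled_t(rng)
        total += logistic_pdf(z) / scaled_t_pdf(z) * g(z)  # importance weight * g(z)
    return total / n
```

The t proposal has heavier tails than the logistic target, so the weights stay bounded, which keeps the variance of the estimator under control.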
Posterior Computation
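Introducing latent scale variables $\phi_i$ (writing the multivariate t as a scale mixture of normals) and the latent $z$, the proposal can be expressed hierarchically
Sample $\beta$, $\phi$ and $z$ from their full conditionals, which are available in closed form because the prior is conditionally conjugate to the likelihood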
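The latent scale variables can be illustrated in one dimension. Assuming the usual normal scale-mixture representation of the t distribution, with $\phi_i \sim \mathrm{Gamma}(\nu/2, \text{rate } \nu/2)$ and $z_i \mid \phi_i \sim N(\mu, \tilde\sigma^2/\phi_i)$, the sketch below draws $z$ through the mixture and draws $\phi$ from its gamma full conditional. The names and the $p = 1$ restriction are illustrative assumptions.

```python
import math
import random

NU = 7.3
SIGMA2 = math.pi ** 2 * (NU - 2) / (3 * NU)


def draw_z_from_mixture(mu, rng):
    """phi ~ Gamma(shape nu/2, rate nu/2), then z | phi ~ N(mu, SIGMA2 / phi).
    Marginally z is t with nu d.f., location mu and squared scale SIGMA2."""
    phi = rng.gammavariate(NU / 2.0, 2.0 / NU)  # Python's second argument is the scale = 1/rate
    return rng.gauss(mu, math.sqrt(SIGMA2 / phi))


def draw_phi_given_z(z, mu, rng):
    """Full conditional for p = 1:
    phi | z ~ Gamma(shape (nu + 1)/2, rate (nu + (z - mu)^2 / SIGMA2) / 2)."""
    shape = (NU + 1.0) / 2.0
    rate = (NU + (z - mu) ** 2 / SIGMA2) / 2.0
    return rng.gammavariate(shape, 1.0 / rate)
```

For a $p$-vector residual $r_i = z_i - X_i\beta$ with scale matrix $\tilde\sigma^2 R$, the same algebra gives shape $(\nu + p)/2$ and rate $\{\nu + r_i' R^{-1} r_i/\tilde\sigma^2\}/2$.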
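Update $R$ using a Metropolis step (accept/reject): propose a candidate $R^*$, then
set $R = R^*$ with probability $\min\{1, \pi^*(R^* \mid \cdot)/\pi^*(R \mid \cdot)\}$, where $\pi^*(R \mid \cdot)$ is the full conditional of $R$ under the proposal given the other unknowns and $z$
keep the current $R$ otherwise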
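For $p = 2$, $R$ has a single free correlation $\rho$, so the Metropolis step can be sketched as a random-walk update on $\rho$. This assumes the standardized latent pairs are $N(0, R)$ given the other unknowns (as under the normal scale-mixture proposal) and uses the uniform prior on $(-1, 1)$; the sufficient statistics, step size and function names are hypothetical.

```python
import math
import random


def log_cond_rho(rho, sxx, syy, sxy, n):
    """Log full conditional of rho (up to a constant) for p = 2: n standardized latent
    pairs (u1, u2) ~ N(0, R), R = [[1, rho], [rho, 1]], Uniform(-1, 1) prior."""
    if not -1.0 < rho < 1.0:
        return -math.inf  # outside the prior support
    one_m = 1.0 - rho * rho
    return -0.5 * n * math.log(one_m) - (sxx - 2.0 * rho * sxy + syy) / (2.0 * one_m)


def metropolis_rho(rho, stats, step, rng):
    """One random-walk Metropolis update with a symmetric N(0, step^2) proposal."""
    candidate = rho + rng.gauss(0.0, step)
    log_ratio = log_cond_rho(candidate, *stats) - log_cond_rho(rho, *stats)
    # Accept with probability min(1, exp(log_ratio)); otherwise keep the current value.
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return candidate
    return rho


def sufficient_stats(pairs):
    """(sum u1^2, sum u2^2, sum u1*u2, n) for a list of latent pairs."""
    return (sum(a * a for a, _ in pairs), sum(b * b for _, b in pairs),
            sum(a * b for a, b in pairs), len(pairs))
```

Candidates outside $(-1, 1)$ get log target $-\infty$ and are always rejected, so the chain never leaves the prior support.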
Posterior Computation
Importance weights for inference
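$E\{g(\beta, R) \mid y\} = \int\!\!\int g(\beta, R)\, \frac{\pi(\beta, R \mid y)}{\pi^*(\beta, R \mid y)}\, \pi^*(\beta, R \mid y)\, d\beta\, dR$
weights: $w(\beta, R) = \pi(\beta, R \mid y)/\pi^*(\beta, R \mid y)$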
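In practice the posterior densities are usually known only up to a constant, so a common choice (assumed here, not stated on the slide) is the self-normalized estimator $\sum_s w_s\, g(\beta_s, R_s)/\sum_s w_s$ over proposal draws $(\beta_s, R_s)$, computed on the log scale; the effective sample size is a standard diagnostic for uneven weights. The function names are hypothetical.

```python
import math


def normalized_weights(log_target, log_proposal):
    """w_s proportional to pi(theta_s | y) / pi*(theta_s | y), computed on the log scale.
    Unknown normalizing constants cancel after normalization."""
    log_w = [a - b for a, b in zip(log_target, log_proposal)]
    m = max(log_w)  # subtract the max before exponentiating to avoid overflow
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def weighted_posterior_mean(values, weights):
    """Importance-sampling estimate of E{g(beta, R) | y} from values g(beta_s, R_s)."""
    return sum(v * w for v, w in zip(values, weights))


def effective_sample_size(weights):
    """Kish's ESS = 1 / sum(w_s^2) for normalized weights; equals S for equal weights."""
    return 1.0 / sum(w * w for w in weights)
```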
Application
Subject: 584 twin pregnancies
Output: small for gestational age (SGA), defined as a birthweight below the 10th percentile for a given gestational age in a reference population.
Binary output, $y_{ij} \in \{0, 1\}$, $i = 1, \dots, 584$, $j = 1, 2$
Covariates: $x_{ij}$ for the $i$th pregnancy and the $j$th infant
Application
Estimates of the regression coefficients are nearly identical to those of the AP study. Female gender ($\beta_1$), prior preterm delivery ($\beta_4$, $\beta_5$) and smoking ($\beta_8$) are associated with an increased risk of SGA. Outcomes for twins are highly correlated, as captured by $R$.
Conclusions
Proposes a multivariate logistic density for the multivariate logistic regression model.
The proposed multivariate logistic density is closely approximated by a multivariate t distribution.
Its properties facilitate efficient sampling with guaranteed convergence.
The marginals are univariate logistic densities.
Embeds the correlation structure within the model.