multivariate bayesian logistic regression for …of clinical safety data called multivariate...

22
arXiv:1210.0385v1 [stat.ME] 1 Oct 2012 Statistical Science 2012, Vol. 27, No. 3, 319–339 DOI: 10.1214/11-STS381 c Institute of Mathematical Statistics, 2012 Multivariate Bayesian Logistic Regression for Analysis of Clinical Study Safety Issues 1 William DuMouchel Abstract. This paper describes a method for a model-based analysis of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a set of medically related issues, or response variables, and MBLR allows information from the different issues to “borrow strength” from each other. The method is especially suited to sparse response data, as often occurs when fine-grained adverse events are collected from subjects in studies sized more for efficacy than for safety investigations. A combined anal- ysis of data from multiple studies can be performed and the method enables a search for vulnerable subgroups based on the covariates in the regression model. An example involving 10 medically related issues from a pool of 8 studies is presented, as well as simulations showing distributional properties of the method. Key words and phrases: Adverse drug reactions, Bayesian shrinkage, drug safety, data granularity, hierarchical Bayesian model, parallel lo- gistic regressions, sparse data, variance component estimation. 1. INTRODUCTION This paper introduces an analysis method for safe- ty data from a pool of clinical studies called multi- variate Bayesian logistic regression analysis (MBLR). The dependent or response variables in the MBLR are defined at the subject level, that is, for each sub- ject the response is either 0 or 1 for each safety issue, depending on whether that subject has been deter- mined to be affected by that issue based on the data William DuMouchel is Chief Statistical Scientist, Oracle Health Sciences, Van De Graaff Drive, Burlington, Massachusetts 01803, USA e-mail: [email protected]. 1 Discussed in 10.1214/12-STS381A, 10.1214/12-STS381B and 10.1214/12-STS381C; rejoinder at 10.1214/12-STS381REJ . This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2012, Vol. 27, No. 3, 319–339. This reprint differs from the original in pagination and typographic detail. available at the time of the analysis. Safety issues can include occurrence of specific adverse events as well as clinically significant lab tests or other safety- related measurements. The predictor variables, as- sumed to be dichotomous or categorical, are all as- sumed to be observable at the time of subject ran- domization. The analysis is cross-sectional rather than longitudinal, and does not take into account the variability, if any, of the length of time different subjects have been observed. The primary predictor is study Arm, assumed to be dichotomous with val- ues “Treatment” or “Comparator.” Other subject- level covariates may be included, such as gender or age categories, or medical history variables. One fea- ture of the MBLR approach is that the interactions of treatment arm with each of the other covariates are automatically included in the analysis model. Data from a pool of multiple studies (having com- mon treatment arm definitions) may be included in the same analysis, in which case the study identifier would be considered a subject covariate. Analyses involving a pool of studies are similar in spirit to a full-data meta-analysis. 1

Upload: others

Post on 14-Aug-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

arX

iv:1

210.

0385

v1 [

stat

.ME

] 1

Oct

201

2

Statistical Science

2012, Vol. 27, No. 3, 319–339DOI: 10.1214/11-STS381c© Institute of Mathematical Statistics, 2012

Multivariate Bayesian Logistic Regressionfor Analysis of Clinical Study SafetyIssues1

William DuMouchel

Abstract. This paper describes a method for a model-based analysisof clinical safety data called multivariate Bayesian logistic regression(MBLR). Parallel logistic regression models are fit to a set of medicallyrelated issues, or response variables, and MBLR allows informationfrom the different issues to “borrow strength” from each other. Themethod is especially suited to sparse response data, as often occurswhen fine-grained adverse events are collected from subjects in studiessized more for efficacy than for safety investigations. A combined anal-ysis of data from multiple studies can be performed and the methodenables a search for vulnerable subgroups based on the covariates inthe regression model. An example involving 10 medically related issuesfrom a pool of 8 studies is presented, as well as simulations showingdistributional properties of the method.

Key words and phrases: Adverse drug reactions, Bayesian shrinkage,drug safety, data granularity, hierarchical Bayesian model, parallel lo-gistic regressions, sparse data, variance component estimation.

1. INTRODUCTION

This paper introduces an analysis method for safe-ty data from a pool of clinical studies called multi-variate Bayesian logistic regression analysis (MBLR).The dependent or response variables in the MBLRare defined at the subject level, that is, for each sub-ject the response is either 0 or 1 for each safety issue,depending on whether that subject has been deter-mined to be affected by that issue based on the data

William DuMouchel is Chief Statistical Scientist,

Oracle Health Sciences, Van De Graaff Drive,

Burlington, Massachusetts 01803, USA e-mail:

[email protected] in 10.1214/12-STS381A, 10.1214/12-STS381B

and 10.1214/12-STS381C; rejoinder at10.1214/12-STS381REJ.

This is an electronic reprint of the original articlepublished by the Institute of Mathematical Statistics inStatistical Science, 2012, Vol. 27, No. 3, 319–339. Thisreprint differs from the original in pagination andtypographic detail.

available at the time of the analysis. Safety issuescan include occurrence of specific adverse events aswell as clinically significant lab tests or other safety-related measurements. The predictor variables, as-sumed to be dichotomous or categorical, are all as-sumed to be observable at the time of subject ran-domization. The analysis is cross-sectional ratherthan longitudinal, and does not take into accountthe variability, if any, of the length of time differentsubjects have been observed. The primary predictoris study Arm, assumed to be dichotomous with val-ues “Treatment” or “Comparator.” Other subject-level covariates may be included, such as gender orage categories, or medical history variables. One fea-ture of the MBLR approach is that the interactionsof treatment arm with each of the other covariatesare automatically included in the analysis model.Data from a pool of multiple studies (having com-mon treatment arm definitions) may be included inthe same analysis, in which case the study identifierwould be considered a subject covariate. Analysesinvolving a pool of studies are similar in spirit toa full-data meta-analysis.

1

Page 2: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

2 W. DUMOUCHEL

Estimation of effects involves a hierarchical Bayesi-an algorithm as described below. There are two pri-mary rationales for the Bayesian approach. First,data concerning safety issues are often sparse, lead-ing to high variability in relative rates of rare eventsamong subject subgroups, and the smoothing inher-ent in empirical Bayes shrinkage estimates can al-leviate problems with estimation of ratios of smallrates and the use of multiple post-hoc comparisonswhen encountering unexpected effects. Second,MBLR fits the same analytical model to each re-sponse variable and then allows the estimates of ef-fects for the different responses to “borrow strength”from each other, to the extent that the patterns ofcoefficient estimates across different responses aresimilar. This implies that the different safety issuesshould be medically related, so that it is plausiblethat the different issues have related mechanismsof causation or are different expressions of a broadsyndrome, such as being involved in the same bodysystem, or different MedDRA terms in nearby lo-cations in the MedDRA hierarchy of adverse eventdefinitions. The goal is to assist with the problemof uncertain granularity of analysis. The questionof how to classify and group adverse drug reactionreports can be controversial because different assign-ments can change the statistical significance of countdata treatment effects, and methods and definitionsfor comparing adverse drug event rates are not wellstandardized (Dean, 2003). Sometimes the amountof data available for each of the related safety is-sues is too little for reliable comparisons, whereasdoing a single analysis on a transformed response,defined as present when any of the original issuesare present, risks submerging a few potentially sig-nificant issues among others having no treatment as-sociation. The Bayesian approach is a compromisebetween these two extremes.The proposed methodology is not intended to re-

place or replicate other processes for evaluating safe-ty risk but rather to support and augment them.In spite of the formal modeling structure, its spiritis more a mixture of exploratory and confirmatoryanalysis, a way to get a big picture review whenthere are very many parameters of interest. The re-sulting estimates with confidence intervals can pro-vide a new approach to the problem of how to bestevaluate safety risk from clinical studies designed totest efficacy.This paper describes the statistical model and the

estimation algorithms used in a commercial imple-mentation of MBLR. There is also some discussion

of alternate models and algorithms with reasons forour choices. An example analysis utilizes data froma set of clinical studies generously provided by anindustry partner, and a simulation provides infor-mation on the statistical properties of the method.

2. THE BAYESIAN MODEL FOR MBLR

As with standard logistic regression, MBLR pro-duces parameter estimates interpretable as log odds,and provides upper and lower confidence bounds forthese estimates. The method is based on the hier-archical Bayesian model described below. Identicalregression models (i.e., the same predictor variablesfor different response variables) are estimated as-suming that the relationships being examined areall based on the same underlying process. The re-sponse variables represent issues comprising a po-tentially common safety problem and the underlyingprocess is an adverse reaction caused by the treat-ment compound. The regression models are variousexaminations of relationships between subgroups de-fined by the covariates and the response issues. TheBayesian estimates of treatment-by-covariate inter-actions are conservative (estimates are “shrunk” to-ward null hypothesis values), in order to reduce thefalse alarm due to high variance in small samplesizes. This conservatism is a form of adjustment formultiple comparisons.It is natural to desire a comparison of MBLR with

a more standard analysis, which, for the present pur-pose, means a logistic regression model where theestimates for the different responses are not shrunktoward each other, and where interactions betweentreatment and other covariates are not being esti-mated. However, it can often happen, with sparsesafety data involving rare adverse events and the useof other predictors in addition to the treatment ef-fect, that standard logistic regression estimation canfail, because the likelihood function has no uniquefinite maximizing set of parameters. Gelman et al.(2008) discuss this problem, caused by what theycall separation and sparsity, and suggest the auto-matic use of weakly informative prior distributionsas a default choice for such analyses. Along the linesof the Gelman et al. (2008) suggestion, we will com-pare MBLR to a “weak Bayes” method that corre-sponds to setting certain variance components (thatare estimated by MBLR) to values selected to be solarge that the resulting estimates would be virtuallythe same as those of standard logistic regression ifthe data are not so sparse as to be unidentifiable.

Page 3: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 3

This comparison method will be denoted regularized

logistic regression (RLR).The event data can be considered as a K-column

matrix Y , with a row for each subject and a columnfor each issue, and where Ysk = 1 if subject s experi-enced issue k, 0 otherwise. Since all subject covari-ates are assumed categorical, we will use a grouped-data approach, where there are ni subjects (i =1, . . . ,m) that have identical covariates and treat-ment allocation in the ith group, and where Nik ofthese subjects experienced issue k.MBLR requires the inclusion of the treatment arm

and of one or more additional predictors in the model,where all predictors are categorical.Across the set of issues a single regression model is

used. If there are J predictors excluding Treatment,and the jth predictor has gj categories, j = 1, . . . , J ,then there will be G =

gj subgroups analyzed.The model will usually have 2G+2− 2J degrees offreedom and estimation is performed by constrain-ing sums of coefficients involving the same covari-ate to add to 0. The Bayesian methods presentedhere allow estimation in the presence of additionalcollinearity of predictors, in which case the com-puted posterior standard deviations would then re-flect the uncertainty inherent in a deficient design.For the ith group of subjects, the modeled prob-

ability of experiencing issue k is Pik, where

Pik = 1/[1 + exp(−Zik)] and where(1)

Zik = α0k +∑

1≤g≤G

Xigαgk

(2)

+ Ti

(

β0k +∑

1≤g≤G

Xigβgk

)

.

The G columns of X define the G dummy vari-ables for the J covariates, and Ti is an indicator forthe treatment status of the ith group. The valuesof αgk (g = 0, . . . ,G; k = 1, . . . ,K) define the risk ofissue k for the comparator subjects. As mentionedabove, the sums

g αgk = 0, where the sums areover the categories of each covariate for each k. Themore natural quantities (α0k+αgk) are the log oddsthat a comparator subject in subgroup g will ex-perience issue k, g = 1, . . . ,G, averaged across thecategories of other predictors not defined by sub-group g.Concerning treatment effects, the quantities (β0k+

βgk) are the estimated log odds ratios for the riskof issue k (treatment versus comparator) that sub-jects in group g experience, g = 1, . . . ,G, averaged

across the categories of other predictors not definedby subgroup g. The sums

g βgk are constrainedjust as the α’s were.If G is large, there will be many possible subgroup

comparisons, and, since these confidence intervalshave not been adjusted for multiple comparisons,caution is advised in interpreting the largest few ofsuch observed subgroup estimates. The MBLR es-timates of these quantities are designed to be morereliable in the presence of multiple comparisons be-cause subgroup-by-treatment interaction effects are“shrunk” toward 0 in a statistically appropriate way,and there is also a partial averaging across issues, sothat subgroup and treatment effects and subgroup-by-treatment interactions can “borrow strength” ifthere is an observed similar pattern of treatmentand subgroup effects in most of the K issues beinganalyzed. When configuring a multivariate Bayesianlogistic regression, the analyst should try to selectthose issues for which there is some suspicion ofa common medical mechanism involved. If the Bayes-ian algorithm does not detect a common pattern ofsubgroup effects, then the Bayesian algorithm willperform little partial averaging across issues, be-cause corresponding variance component estimateswill be large.The Bayesian model is a two-stage hierarchical

prior specification:

αgk|Ag ∼N(Ag, σ2A),

(3)k = 1, . . . ,K;g = 1, . . . ,G,

β0k|B0 ∼N(B0, σ20), k = 1, . . . ,K,(4)

βgk|Bg ∼N(Bg, σ2B),

(5)k = 1, . . . ,K;g = 1, . . . ,G,

Bg ∼N(0, τ2), g = 1, . . . ,G.(6)

The prior distributions of α0k, k = 1, . . . ,K, andof Ag, g = 1, . . . ,G, and of B0 are assumed uniformwithin (−∞,+∞). Equations (3)–(5) embody theassumption that coefficients for the same predictoracross multiple issues cluster around the predictor-specific values (A1, . . . ,AG,B0, . . . ,BG) with the de-gree of clustering dependent on the magnitude ofthree variances (σ2

A, σ20, σ

2B). If any of these vari-

ances are near 0, there will be a tight cluster of thecorresponding regression coefficients across the Kresponses, whereas if they are large, there may beno noticeable common pattern across k for predic-tor g. The values of α0k correspond to the constantterms in the regressions, and we assume no com-

Page 4: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

4 W. DUMOUCHEL

mon shrinkage of constant terms across issues, sincethe absolute frequencies of the issues are not beingmodeled here.Equation (6) embodies the assumption that the

null hypotheses Bg = 0 (i.e., no treatment-by-covari-ate interactions when averaged across responses) aregiven priority in the analyses. This is the assumptionthat helps protect against the multiple comparisonsfallacy when searching for vulnerable covariate sub-groups. The value of τ2 determines how strongly toshrink the G prior means Bg toward 0 in the secondlevel of the prior specification.The four standard deviations (σA, σ0, σB, τ) have

prior distributions assumed to be uniform in thefour-dimensional cube 0≤ σ, τ ≤ d. Their joint pos-terior distribution is approximated by a discrete dis-tribution for computational convenience, as describedbelow. The posterior distribution of the coefficients{Ag,Bg, αgk, βgk} is defined as a mixture of the dis-tributions of the coefficients conditional on the pos-sible values of the variance components. The methodproduces an approximate variance–covariance ma-trix for all the coefficients, and this also allows theestimation of standard deviations and confidence in-tervals (credible intervals) for linear combinationsof parameters such as the quantities (β0k + βgk) de-scribing the total treatment effect estimates for eachsubgroup.General discussion of hierarchical Bayesian regres-

sion models is available in Carlin and Louis (2000),although the particular model (involving multipleresponses) and estimation methods used in this pa-per are not discussed there. Searle, Casella and Mc-Culloch [(1992), Chapter 9] also discuss related meth-ods, including a logit-normal model somewhat sim-ilar to this one.

3. ESTIMATION DETAILS

Estimation of MBLR Parameters

The estimation algorithm for MBLR is based onseparate maximizations of the posterior distributionsof the coefficients, conditional on the values of thevariance components. Then these posterior distri-butions are averaged to provide an overall posteriordistribution, where the weights in the average aredetermined by the Bayes factors for different val-ues of the vector of the four variance components.First we assume that the four standard deviations(σA, σ0, σB , τ ) are fixed and known and consider es-timation of the other parameters.

Estimation of Coefficients and Prior MeansConditional on Prior Standard Deviations

There are M = 2(G+1)(K +1)− 1 such parame-ters: 2G+ 1 prior means, (G+ 1)K values αgk and(G+ 1)K values of βgk. However, 2J(K + 1) sumsof these parameters are defined as 0, leaving M∗ =2(G−J+1)(K+1)−1 dimensions for estimation. Itis convenient to imagine that subjects are groupedaccording to unique values of their covariates andtreatment allocation, so that the data are the sam-ple sizes ni and the counts Nik (i = 1, . . . ,m; k =1, . . . ,K), where i indexesm strata defined by uniquevalues of covariates and treatment. The joint distri-bution of the parameters and the data can be rep-resented as

p(A1, . . . ,AG,B0, . . . ,BG)

·∏

k

p(α0k, . . . , αGk, β0k, . . . , βGk|{A}{B})(7)

· p({Nik}|{A}{B}{α}{β}).

The prior distributions of A1, . . . ,AG, B0 and the{α0k} are assumed uniform over (−∞,+∞), whereasall the remaining parameters have prior distribu-tions as given in equations (3)–(6).Therefore, if logL is the log posterior joint distri-

bution of all the parameters, then, up to a constant,

2 logL=−

[

g>0

B2g/τ

2 + (G− J) log(τ2)

]

[

g>0

k

(αgk −Ag)2/σ2

A

+ (G− J)K log(σ2A)

]

[

k

(β0k −B0)2/σ2

0 +K log(σ20)

]

(8)

[

g>0

k

(βgk −Bg)2/σ2

B

+ (G− J)K log(σ2B)

]

+2∑

i

k

[Nik log(Pik)

+ (ni −Nik) log(1−Pik)].

In (8), the terms involving log(τ2), log(σ2A) and

log(σ2B) all have a factor (G − J), rather than the

Page 5: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 5

more natural G, since there are G values {Ag} and{Bg}. But since they are being estimated subjectto J constraints where subsets of them add to 0,the factor (G− J) is substituted, analogous to theway REML estimates are defined for variance com-ponents in a frequentist analysis. For fixed variancecomponents, maximization of (8) with respect to allother parameters, remembering that the Pik are de-fined by (1) and (2), involves a relatively straight-forward modification of the usual logistic regressioncalculations. The prior means {Ag} and {Bg} aretreated analogously to the coefficients {αgk} and{βgk} during the Newton–Raphson maximization oflogL. Each iteration involves calculation of the vec-tor S of M first derivatives of logL with respect tothe parameters in (8) and the M ×M Hessian ma-trix H of the negative second derivatives of logL.The initial values of α0k are log(N+k/(n+ −N+k)),k = 1, . . . ,K (the subscript “+” means sum over thevalues of i), whereas the initial values of all otherparameters are 0.Upon convergence of the maximization, the vari-

ance–covariance matrix of the estimated parametersis assumed to be

V = V (σA, σ0, σB, τ) =H−1.(9)

[Actually, the matrix H will be singular becauseof the constraints that reduce the rank of H . Theinterpretation of (9) is as follows. Define a subset θ∗

of M∗ parameters out of the M -vector θ, where oneparameter from each constrained subset has beenomitted, but will be constrained to be equal to thenegative of the sum of the other parameters in itssubset. Define the M ×M∗ matrix Z that convertsfrom θ∗ to θ, that is, θ =Zθ∗. Then (9) is interpretedas V = Z(ZtHZ)−1Zt. The same transformation isused during the Newton–Raphson maximization oflogL. Also, in (11b) and later, the determinant of Vis computed as the determinant of V ∗ = (ZtHZ)−1.]The computation of V as H−1 uses the assump-

tion that the counts {Nik} are independent acrossboth i and k, conditional on the parameters. Theoccurrence of different events in the same subjectmay be connected via the parameters, but not oth-erwise correlated in this model. If this assumptionis violated, the variances in V may be underesti-mated. Since the M parameters include both all thecoefficients as well as their prior means, the vari-ances in V for any one component automaticallyinclude uncertainty due to correlation with all othercomponents. In particular, uncertainty in the priormeans {Ag,B0,Bg} is taken account of in the esti-mated posterior variances of the {αgk, βgk} (up to

the accuracy of the approximate multivariate nor-mality of the joint posterior distribution of the pa-rameters).

Accounting for Uncertainty in the Prior StandardDeviations

The prior distribution of the set of possible valuesof (σA, σ0, σB , τ ) is assumed to be uniform withinthe four-dimensional cube with limits (0, d), wherea default value of d = 1.5 is selected as discussedbelow. A discrete search method approximates theposterior distribution within this cube. Before dis-cussing the details, consider the situation where theprior standard deviation vector φ= (σA, σ0, σB, τ) isassumed to be one of S discrete values φ1, φ2, . . . , φS .Denote the vector of coefficients and prior means byθ = (A1, . . . ,AG,B0, . . . ,BG, α01, . . . , αGK , β01, . . . ,βGK), and assume that the maximized logL and theestimated posterior mean and covariance matrix of θare (logLs, θs, Vs) if φ = φs, s = 1, . . . , S. Then themarginal posterior distribution of θ, adjusting foruncertainty in φ, is assumed to be multivariate nor-mal with mean θ̂ and covariance matrix V , where

θ̂ =∑

s

πsθs,(10a)

V =∑

s

πs[Vs + (θs − θ̂)(θs − θ̂)t],(10b)

and where πs, the posterior weight given to φ= φs,s= 1, . . . , S, is defined by

πs = BF s/(BF 1 + · · ·+BFS),(11a)

BF s = exp(logLs)√

det(Vs).(11b)

The quantity BF s is the (relative) Bayes factor

for the hypothesis φ = φs. The usual definition ofthe Bayes factor requires the integration of the jointlikelihood over the space of all parameters not spec-ified by the hypothesis—in this case the space ofall θ. Using the approximation of this likelihood asproportional to a multivariate normal density withcovariance matrix Vs, and the known fact that vol-ume under the multivariate exponential formexp[−θt(Vs)

−1θ/2] is proportional to the square rootof the determinant of Vs, the definition of BF s is asgiven in (11). The approximation (11) is the stan-dard Laplace approximation often used for numeri-cal integration in Bayesian methods. However, a dif-ferent justification for computing (11b) in order toobtain estimates for variance components is givenby the theory of h-likelihood (Lee and Nelder, 1996;Lee, Nelder and Pawitan, 2006; Meng, 2009).

Page 6: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

6 W. DUMOUCHEL

Selection and computation of the values(φs, πs), s= 1, . . . , S

Representing the 4-dimensional naturally contin-uous distribution of φ by a set of discrete points isa challenge. Assuming a range of d = 1.5 for eachelement of φ and a spacing of 0.1 would mean a gridof S = 154 > 50,000 points, the vast majority ofwhich would have values of πs nearly 0. Determi-nation of a set of just S = 33 points to representthe approximate posterior distribution of φ is per-formed as outlined next. A logistic transformationis used to convert the bounded cube (0, d)4 to theunbounded region where all four elements can rangefrom (−∞,+∞) by defining

λ= (λA, λ0, λB , λτ ) where

σA = d/(1 + e−λA), σ0 = d/(1 + e−λ0),(12)

σB = d/(1 + e−λB ), τ = d/(1 + e−λτ ).

With this transformation, a uniform prior distri-bution on (0, d) for each σ corresponds to a priordistribution for each λ over the real line of f(λ)∝σ(λ)(d − σ(λ)). The purpose of this transform isto allow simpler search procedures that don’t haveto worry about boundary constraints, as well as tomake approximation of the posterior by a multi-variate normal distribution more accurate. Then theposterior density of λ is assumed to be

g(λ) = g(λA, λ0, λB, λτ )

∝ f(λA)f(λ0)f(λB)f(λτ )(13)

· exp(logLs)√

det(Vs),

where logL and V in (13) are now functions of λ,and the λ’s vary over (−∞,+∞).The determination of the discrete distribution (φs,

πs), s= 1, . . . , S, is a five-step process:Step 1: Use the method of steepest ascent to find

the value λmax that maximizes g(λ) in (13). Deriva-tives of g are computed numerically as first differ-ence ratios with respect to each of the four argu-ments. The starting value for the search is λ= (0,0,0,0).Step 2: Construct a design of S = 33 λ-values by

adding 16 points on the surface of each of two con-centric spheres centered at λmax. The points on theinner sphere consist of 8 star points, where one com-ponent of λ is λmax ± 2δ0 and the other three com-ponents equal λmax, and 8 half-fractional factorialpoints, where all components are λmax ± δ0. Thepoints on the outer sphere are similar to those on

the inner sphere, except that δ0 is replaced by 1.5δ0and the fractional factorial points are from the op-posite half fraction as the fractional factorial pointson the inner sphere. The default value of δ0 = 0.3 onthe scale of λ. Visualizing the geometry of the de-sign, if a 4-dimensional sphere has radius 1.5 timesanother, it encloses about 5 times the volume.Step 3: The double central composite design of

Step 2 is centered but not scaled to the actual dis-tribution g(λ). To find the appropriate scale factorsin each dimension, δ = (δ1, δ2, δ3, δ4), for a better fit-ting design, a quadratic response surface model is fitto values of log g(λ) across the S points of this initialdesign. The fitted model is

log g(λ) = c0 +∑

i

ciλi +∑

i≤j

cijλiλj.(14)

Now if the quadratic model fit exactly (i.e., if gwere exactly multivariate normal), then the second-order coefficients cij would specify the elements ofthe inverse of the posterior covariance matrix of λ.Accordingly, we get what are hoped to be approx-imate posterior standard deviations by setting δ =vector of square roots of the diagonal of H−1, where

2H =

2c11 c12 c13 c14c12 2c22 c23 c24c13 c23 2c33 c34c14 c24 c34 2c44

.(15)

Step 4: Next a new design like that of Step 2 isconstructed except that the δ0 used in Step 2 for all4 dimensions is replaced by δ = (δ1, δ2, δ3, δ4) fromStep 3, so that the spheres are scaled differently ineach dimension. The values of log g(λ) are computedfor these 32 new points and a new quadratic re-sponse surface is fit to this 33-point final design.Let the peak of this fitted surface be denoted λfit,which will not exactly equal λmax, and redefine δ =(δ1, δ2, δ3, δ4) by using the coefficients from the newquadratic response surface in (15).Step 5: The discrete distribution defined by {λ(s),

g(λ(s)), s= 1, . . . , S} as computed in Step 4 will rough-ly approximate the continuous distribution definedby g(λ), but the approximation can be improved bymodifying the S = 33 probabilities to constrain the4 means and 4 standard deviations of the discretedistribution to exactly match the values λfit and δthat were computed from the response surface fitof Step 4. The final probabilities πs, s = 1, . . . , S,are computed as the solution to the following con-strained optimization problem:

Page 7: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 7

Find positive π1, . . . , πS that minimize the Kul-lback–Leibler divergence

KL=∑

s

g(λ(s)) log[g(λ(s))/πs],

subject to the 9-dimensional constraints(16)

s

πs = 1;∑

s

πsλ(s) = λfit;

s

πs(λ(s) − λfit)2 = δ2,

where the last two equations are each interpretedas 4 constraints, one for each component of λ. Theconstrained minimization problem of (16) is solvedusing the method of Lagrange multipliers combinedwith a Newton–Raphson solution of the resulting 9equations.Thus, the {πs} used in (10) are the solution to (16)

rather than the more direct values in (11). They dif-fer from (11) by incorporating the Jacobian termsof (13) and the further modifications needed to sat-isfy the constraints in (16). The values of {φs} usedin (10) are the back-transformations defined by (12)of the final S = 33 points {λs} used in Steps 4 and 5.

Estimates Using Regularized Logistic Regression(RLR)

To compare the MBLR results to standard logis-tic regression, and still be able to avoid problemswith nonidentifiability, as discussed above, the RLRalgorithm is defined by fitting MBLR under the con-straints

σA = 5, σ0 = 5, σB = 0.001, τ = 0.001.(17)

Setting σB and τ very close to 0 effectively con-strains the estimates of covariate-by-treatment inter-actions to be 0. Setting σA and σ0 to be very large pre-vents the estimates across different response eventsfrom shrinking toward each other The rationale forthinking that a prior standard deviation of 5 is verylarge for a logistic regression coefficient is as fol-lows. Remembering that the coefficients are inter-preted as logs of odds ratios, an increase of 5 ina coefficient corresponds to a multiplicative factorof e5 = 148.4 in an odds ratio. With respect to theassumed normal prior distributions in equations (3)–(6), the prior standard deviation of 5 implies thatabout one-third of all estimated odds ratios are ex-pected to be outside the range of (1/148 = 0.007,148). This certainly seems to be well beyond the

range of expected odds ratios in any medical risk es-timation situation. See Gelman et al. (2008) for a re-lated discussion. [In the Bayesian setup describedabove, we use as default limits for the prior standarddeviations (0, d= 1.5). Considering a prior standarddeviation to be as large as 1.5, where e1.5 = 4.5, im-plies that about one-third of the estimated odds ra-tios would be outside the range of (1/4.5 = 0.22,4.5), which seems a bit of a stretch, but barely con-ceivable.]Using the values in (17) for the prior standard de-

viations, this alternative weak Bayesian prior methodestimates the parameters and their variances us-ing the iterative Newton–Raphson estimation de-scribed above. The resulting estimates are computa-tionally reliable even if many of the response eventsare sparse. Such estimates perform very little shrink-age across response models because the prior stan-dard deviations in equations (3)–(6) are large com-pared to the standard errors of the (estimable) logis-tic regression coefficients. However, the MBLR andRLR models as formulated will not protect againstproblems of estimability in case every response isquite sparse, because of the use of an improper priorfor the prior means (A1, . . . ,AG,B0). If certain co-variate or treatment categories are perfectly corre-lated with every response, then one must either dropsuch predictors or add additional response variables.The Bayes factor for φ0 = (5,5,0.001,0.001) can

be computed and compared to the 33 values foundin the final grid of the Bayesian estimation describedabove, which provides further evidence regardingthe prior standard deviations. In particular, largeBayes factors against φ0 imply that the MBLR modelfits the data better than the RLR model, meaningthat there is significant evidence that either the re-sponses have similar covariate profiles or that thereare significant covariate-by-treatment interactions.

Confidence Intervals for Odds Ratios

Let the final estimate of, for example, βgk be bgk,so that the odds ratio point estimate is ORgk =exp(bgk). Using the normal approximation to theposterior distribution of the coefficients and the es-timates of V in equation (10), 90% confidence in-tervals (posterior credible intervals) for the corre-sponding odds ratios are given by

OR.05 = exp[bgk − 1.645√

v(βgk)]<OR(18)

< exp[bgk +1.645√

v(βgk)] = OR.95.

Page 8: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

8 W. DUMOUCHEL

For the main effects of covariates or for treatment,these provide confidence (credible) intervals for oddsratios of the predictor vs the response outcome. Theodds ratio comparing two categories of a multicat-egory covariate would be found by taking the ratioof the corresponding exponentiated coefficients.Interpreting the interaction effects of covariates

with treatment arm is tricky, since it would involveratios of odds ratios. To aid in interpretation, onecan present in addition to the interaction coefficientsthemselves, the sums of the treatment coefficientplus the interaction coefficients. Confidence intervalsfor these sums are formed in the usual way, takinginto account the covariances between the treatmentcoefficient and the interaction coefficients. Whenthese sums and their confidence limits are expo-nentiated, we get estimates and limits for subgrouptreatment-by-outcome odds ratios. These estimatesare oriented toward finding potentially vulnerablesubgroups where the adverse effect risk of treatmentis especially high.

4. DISCUSSION OF METHODS ANDALTERNATE MODELS

The philosophy of estimation is not to try to modelthe medical mechanisms perfectly, but to providea reliable compromise between pooling related sparseevents in order to increase the sample size, and fit-ting separate models to each event, with the corre-sponding loss of power due to small samples. Theselection of which issues to include in an MBLR isimportant. There needs to be at least a superficialplausibility that all or many of the selected outcomeissues might have similar odds ratios with treat-ment and with the covariates in the model, whatBayesians call exchangeability. Sometimes it may bedifficult to decide what other issues to include if at-tention has focused primarily on a single and seem-ingly unique issue such as subject death. Because ittakes several degrees of freedom to estimate a vari-ance component, the values of some of the standarddeviations in equations (3)–(6) may be poorly esti-mated if K and/or G are not large, but the use ofBayes factors and the computation of the πs in (11)and (16) allow some assessment and adjustment forthis uncertainty.The current model is quite similar in spirit to, and

somewhat inspired by, that proposed by Berry andBerry (2004). They also assume that drug adversereactions are classified into similar medical group-ings in order to use a shrinkage model to allow bor-rowing strength across similar medical events. They

focus on treatment/comparator odds ratios only anddo not consider covariates or the use of logistic re-gression. They also define a more complex modelhaving many more variance components than theone proposed here.One might ask the question of why estimate co-

variate effects at all, since in a randomized studythe covariates should all be nearly orthogonal to thetreatment variable? The rationale in MBLR is not somuch to adjust for potential biases in the treatmentmain effect, but to be able to include treatment-by-covariate interactions in order to detect possiblyvulnerable subgroups that might react differently tothe treatment. When G is large (many covariate cat-egories) it will often be difficult to estimate so manyparameters unless all the issues being modeled oc-cur frequently. The multiple comparisons involvedmake any search for vulnerable subgroups difficultand subject to false alarms, especially for sparseevents. This makes the use of Bayesian shrinkage ofthe interaction terms in (6) especially valuable: it ne-gotiates the bias-variance trade-off among multipleevent rates having possibly very different samplingvariances. Without this smoothing effect, estimatesof interactions affecting rare events will be so vari-able as to be useless, which is why the RLR methodis defined to estimate only main effects.The importance of avoiding undue rejection of the

null hypothesis in the presence of multiple post-hoccomparisons is central to being properly conserva-tive when evaluating treatment efficacy. There isa question as to how much this conservatism shouldextend to exploratory analyses of safety issues. Forexample, the prior specification (6) shrinks the inter-action prior means Bg toward 0, whereas the maineffect prior means Ag and B0 are not shrunk to-ward 0. We prefer to maintain maximum sensitivityto safety main effects, while accepting that true in-teraction effects are less likely and need more falsealarm protection. We also encourage parallel com-putation of the minimal-shrinkage regularized LRestimates discussed above, so that the analyst canperform an easy comparison and sensitivity analysisof the effects of shrinkage.The prior distributions in equations (3)–(6) are all

assumed to be normal distributions. Many Bayesianresearchers have pointed out that since normal dis-tributions generate few outliers, outliers may be cor-respondingly suppressed under this assumption.Commonly suggested alternative prior distributionsare the double exponential and Student’s t, whichtend to shrink outliers less. The double exponential

Page 9: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 9

Issue Treatment events Comparator events 95% C.I. for Odds Ratio

Anuria 8 0 (1.0 , 295.4)Dry mouth 308 65 (3.9 , 6.7)Hyperkalaemia 218 162 (1.1 , 1.7)Micturition urgency 13 3 (1.2 , 12.6)Nocturia 19 7 (1.1 , 6.1)Pollakiuria 193 34 (4.1 , 8.5)Polydipsia 49 4 (4.2 , 29.3)Polyuria 100 17 (3.5 , 9.8)Thirst 543 66 (7.5 , 12.6)Urine output increased 13 1 (1.7 , 48.8)

Subject counts: Treatment = 3110 Comparator = 2642

Display 1. Statistics for ten issues related to dehydration/renal function for the pooled studies.

(“lasso”) prior has nonstandard theoretical proper-ties that make computation of standard errors of co-efficients problematical, and so have been ruled outfor this application. Alternative distributions likeStudent’s t are difficult to handle computationally inour complex situation where there are hundreds ofcoefficients and multiple variance components. Thenormal model that we use has a concave log pos-terior density function and the iterative estimationalgorithm is guaranteed to converge.There is a similar computational feasibility ra-

tionale for using the discrete approximation to thedistribution of prior standard deviations. It is morecommon in the recent Bayesian literature to useGibbs sampling or another Markov chain MonteCarlo (MCMC) method to estimate the posteriordistributions of all parameters. Two reasons for pre-ferring to avoid such methods are as follows: first,we want to allow scientists without much statisti-cal sophistication, much less experience with fancyBayesian computational methods, to use MBLR andthese users would have trouble assessing convergenceof such high-dimensional MCMC runs. Second, theseusers might also be uncomfortable with the fact thatrepeating an analysis on the same data typicallyleads to slightly, but noticeably, different answers.The method for handling the variance componentestimation outlined above provides computationallyand statistically reliable answers within a feasiblecomputational burden. As described above, thereare three roughly equally expensive stages in themodel fitting computations: the two preparatorystages of finding the maximum of the posterior dis-tribution and then evaluating it on an initial grid tofind scale parameters in each direction, and the last

stage of evaluating the model on the final grid to ap-proximate the posterior distribution of the variancecomponents.

5. EXAMPLE ANALYSIS

Data Description

The data used for the example analyses are froma pool of eight studies, kindly contributed by ananonymous partner. Four of the studies were for oneindication and four were for a second indication.There were a total of 5752 subjects in the pooledstudies, 3110 in the Treatment arm and 2642 in theComparator arm.Display 1 shows statistics from these studies for

a set of ten issues related to dehydration and/or re-nal function. All ten issues show up with greaterfrequency in the treatment arm than in the com-parator. The final two columns are the endpoints of95% confidence intervals for the odds ratios compar-ing treatment and comparator groups in the pooleddata, computed using a normal approximation forthe log (odds ratio) after adding 0.5 to every cellof each 2 × 2 table. It is clear that many of theseissues are associated with treatment, and we wishto investigate the commonality of these medicallyrelated issues, as well as the possibility that certainsubgroups of subjects may be more or less affectedby these associations.Display 2 shows the four covariates selected as

grouping variables for this analysis: Gender, StudyID, Renal History and Age. Recall that of the 8 stud-ies being pooled, there were 4 studies for each of twopotential indications for the drug. The Study ID val-ues of A1–A4 and B1–B4 distinguish the studies for

Page 10: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

10 W. DUMOUCHEL

Treatment Comparator

Gender = F 908 685Gender = M 2202 1957Study = A1 246 84Study = A2 120 120Study = A3 239 80Study = A4 191 63Study = B1 102 103Study = B2 17 11Study = B3 123 120Study = B4 2072 2061Renal history = Y 190 191Renal history = N 2920 2451Age = 50 or under 382 348Age = 51 to 65 1089 902Age = 66 to 75 948 820Age = Over 75 691 572

All patients 3110 2642

Display 2. Distribution of subjects by covariates and treat-ment arm.

indication A and indication B. The Renal Historyvariable distinguishes those subjects whose medicalhistory (before randomization) includes one or morerenal problems. As can be seen from Display 2, thereare many more male than female subjects, and theage range 51 to 75 predominates. Three of the stud-ies for indication A had about a 3:1 split of Treat-ment to Comparator subject counts, while the otherfive studies are more equally split. Study B2 hadonly 28 subjects total, while Study B4 had 4133 sub-jects, over two-thirds of the total in the pool. Onlyabout 7% of the subjects had a previous history ofrenal problems.Display 1 shows that five of the ten issues af-

fected fewer than 10 Comparator-group subjects,whereas there are 16 separate covariate groups inDisplay 2. This makes it unlikely that those rare is-sues would occur in every treatment–covariate com-bination, which is necessary for convergence of a stan-dard LR where the model includes all treatment–covariate interaction terms. In fact, only 3 of the10 issues satisfy this condition, confirming the ne-cessity of some special technique such as MBLRto try to estimate treatment-by-covariate interac-tions, and, in fact, even a main-effects only modelwould not be estimable by standard logistic regres-sion applied to the rarer of these response issues,making the regularized LR necessary for this exam-ple.

Posterior Distributions for Prior StandardDeviations

This example has K = 10, J = 4 and G= 16, withthe total number of parameters (elements of θ) toestimate being M = 2(G+1)(K+1)−1 = 373, withM∗ = 285 degrees of freedom. Display 3 shows var-ious results as a function of the four prior stan-dard deviations. The top row 0 describes the reg-ularized LR case where σA = 5, σ0 = 5, σB = 0.001,τ = 0.001. The rows labeled 1–33 in Display 3 showresults for the final grid used to approximate theposterior distribution of φ. The row 1 values are themaximum posterior estimates (transformed from thescale of λ to that of φ) estimated by the final re-sponse surface fit described above. Rows 2–33 showthe remaining values of the final stage grid. In thisexample, all stages of the estimation required a to-tal of about 400 iterations through the data, thatis, about 400 evaluations of (8) and its first and sec-ond derivatives with respect to M∗ = 285 parame-ters.The rightmost column in Display 3, headed “PROB,”

shows the values of 100π%, as defined by (16). Asdiscussed above, these probabilities have been ad-justed so that the discrete distribution of λ matchesthe means and variances of the continuous distribu-tion of λ as estimated by the response surface fit tothe values of log g.The bottom two rows of Display 3 show the pos-

terior mean and standard deviations of the compo-nents of φ using this 33-point discrete approxima-tion. It can be seen that the values are approxi-mately (σA = 0.34, σ0 = 0.76, σB = 0.15, τ = 0.20).The value in the row marked “Mean” and the col-umn marked “PROB” is computed as

s π2s = 0.0639,

which is a measure of the dispersion of the probabil-ities πs. The smaller it is, the more spread out arethe probabilities among the 33 grid points. Largevalues of

s π2s , say, values above 0.2, would imply

that the scale or location of the grid might be poorlychosen, so that only a few points on the grid are veryprobable.

Comparison of MBLR and RLR Estimates ofTreatment Effects

Display 4 shows estimation results for the treat-ment main effects for each of the two methods andfor each response event and for the prior mean ofall responses. The prior mean odds ratio is definedas exp(B0), whereas the treatment odds ratio forthe kth response is exp(β0k). For each combination

Page 11: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 11

σA σ0 σB τ PROB

0 5.000 5.000 0.001 0.001 0.00%1 0.327 0.688 0.161 0.196 15.90%2 0.276 0.505 0.110 0.312 1.87%3 0.276 0.505 0.232 0.118 3.02%4 0.276 0.879 0.110 0.118 2.71%5 0.276 0.879 0.232 0.312 4.32%6 0.384 0.505 0.110 0.118 4.25%7 0.384 0.505 0.232 0.312 1.83%8 0.384 0.879 0.110 0.312 5.01%9 0.384 0.879 0.232 0.118 1.75%10 0.252 0.423 0.090 0.091 0.14%11 0.252 0.423 0.276 0.387 0.42%12 0.252 0.969 0.090 0.387 0.88%13 0.252 0.969 0.276 0.091 0.83%14 0.416 0.423 0.090 0.387 0.51%15 0.416 0.423 0.276 0.091 0.10%16 0.416 0.969 0.090 0.091 5.40%17 0.416 0.969 0.276 0.387 0.13%18 0.231 0.688 0.161 0.196 1.86%19 0.448 0.688 0.161 0.196 2.13%20 0.327 0.350 0.161 0.196 1.37%21 0.327 1.053 0.161 0.196 6.96%22 0.327 0.688 0.074 0.196 8.12%23 0.327 0.688 0.327 0.196 0.75%24 0.327 0.688 0.161 0.070 5.80%25 0.327 0.688 0.161 0.473 3.21%26 0.192 0.688 0.161 0.196 0.87%27 0.518 0.688 0.161 0.196 2.43%28 0.327 0.232 0.161 0.196 0.02%29 0.327 1.196 0.161 0.196 4.69%30 0.327 0.688 0.049 0.196 5.54%31 0.327 0.688 0.447 0.196 0.00%32 0.327 0.688 0.161 0.041 6.18%33 0.327 0.688 0.161 0.670 1.01%

Mean 0.336 0.756 0.146 0.196 6.39%St.Dev. 0.053 0.183 0.053 0.105

Display 3. Calculation summary for the final grid of prior standard deviations.

the odds ratio and its approximate 90% confidence(credible) interval are shown, based on (18). Com-paring the MBLR to the RLR estimates, we see thatthe MBLR estimates are pulled away from the RLRestimates and “shrunk” toward the MBLR priormean, which represents the average or typical oddsratio across response issues. The degree of shrinkageis greatest for the highest-variance RLR estimates,corresponding to the rare issues such as Anuria andUrine output increased. For these two issues, al-

though the MBLR odds ratio estimate is smallerthan the corresponding RLR odds ratio, but so aretheir posterior variances, so that the lower boundsof the MBLR intervals are greater, providing greaterstatistical significance from the null hypothesis ofOR = 1. Even though all 8 occurrences of Anuriawere in the treatment arm, the treatment effect doesnot show up as significant with the multiple-predictorRLR model—the MBLR estimate of the effect onAnuria seems more reasonable.

Page 12: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

12 W. DUMOUCHEL

Display 4. Estimates of main effect of treatment by method and response variable.

Inspection of Display 4 shows that not all of theMBLR confidence intervals are narrower than thecorresponding RLR interval. The reverse is true forthe more frequent responses such as Hyperkalaemiaand Thirst. In these cases, the MBLR estimates donot benefit much from the relatively weak prior dis-tribution, and their posterior variances are adverselyimpacted by the uncertainty in the variance compo-nent estimation as well as the need to estimate all ofthe interaction parameters, which are assumed awayby the RLR model.

MBLR Estimates of Prior Means

Display 5 graphs the MBLR estimates of the (ex-ponentiated) prior means {Ag,B0,Bg}, with their90% CIs. These are interpreted as effects for a “typ-ical” response variable. Remembering that coeffi-cients for categories of each covariate must sum to 0,the corresponding odds ratios must average to 1when plotted on a log scale. The middle intervalshows the main effect of treatment, the intervalsabove show covariate main effects, and the intervalsbelow show treatment interactions. As also shown inDisplay 4, the treatment effect prior mean is about4.4 on the odds ratio scale, with 90% limits of (2.7,

7.1). The main effects of covariate estimates, shownabove the treatment line, can be thought of as theeffects of covariate categories within the compara-tor arm, and as centers of shrinkage across the re-sponses. Thus, the rates of these events in the com-parator arm are somewhat less for Age:50 and un-der and for Renal History:N. Also, Study:A2 hada particularly high event rate, while Study:B4 hada particularly low event rate. But none of these dif-ferences in groups based on covariates are as largeas the treatment effect.The lower set of estimates in Display 5 portray the

treatment–covariate interactions. As can be seen,these effects are smaller than the main covariate ef-fects and much smaller than the main treatment ef-fect. The treatment effect estimates within the fourstudies for Indication A are all larger than the fourestimates for the Indication B studies, but the uncer-tainty intervals all overlap considerably. Althoughthis does not rule out larger interaction effects forsome of the response variables, the fact that σA, isabout 0.3 and both σB , and τ are each less than0.2 means that such effects for individual responsesare also likely to be fairly small. Since σ0 is about0.76, there is more room for variation in treatment

Page 13: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 13

Display 5. Estimates of PRIOR MEAN from MBLR.

main effect among the responses, as we also saw inDisplay 4, where the Treatment odds ratios rangedfrom 1.3 for Hyperkalaemia to 7.4 for Polydipsia.The prior means of the treatment by covariate

interactions (the bottom 16 intervals of Display 5)have especially small posterior means, as might beexpected given that they have been shrunk toward 0because of the small value of τ , with posterior mean= 0.196 in Display 3. Another way of saying this isthat the estimates of Bg were so small compared totheir sampling variances that only a small value of τis compatible with these results and the assumptionof (6).The estimates of prior means under the regular-

ized LRmodel are less interesting. Assuming that σAand σ0 are large implies that Ag and B0 cannot beestimated well and will thus have wide confidenceintervals, and of course assuming no interactionsmeans that the Bg = 0 for g > 0.

Breakdown of Estimates by Study for IssuePollakiuria

Display 6 shows the MBLR 90% intervals for oddsratios relating to the Study ID covariate and the is-sue Pollakiuria (very frequent daytime urination).

The 2× 2 table information in Display 1 shows thatthis was highly associated with treatment (193:34split by treatment:comparator). Our discussion fo-cuses on whether and how the results differ by thestudies being pooled, and what summary conclu-sions are justified across studies. The goal is sim-ilar to meta-analysis, except that we have completedata from each study and so can adjust for morepotentially biasing between-study differences.The top eight intervals in Display 6 show the Study

main effects, corresponding to relative differencesamong the comparator arm odds of reporting Pol-lakiura within the studies. These differential esti-mates are adjusted for the other covariates Age,Gender and RenalHistory. There are relatively largeand significant study effects, especially betweenStudy A2 and Study B4, where the estimated oddsratio is over 8 (2.8 versus 0.34 on the horizontalaxis), with relatively narrow 90% intervals.The next set of eight intervals shows the Treat-

ment by Study interaction estimates. Although thedifferences are not as large as in the comparatorarms, the pattern is similar, in that the studies thathad a large base rate of Pollakiuria tended to havelarger increases in adjusted Pollakiura rates. The

Page 14: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

14 W. DUMOUCHEL

Display 6. MBLR estimates of odds ratios relating to the Study covariate for the response Pollakiuria.

three studies having the largest treatment effects(A1, A2, A4) are all based on Indication A. Theseestimates are somewhat hard to interpret, being ra-tios of odds ratio estimates. The lower set of inter-vals return to the simple odds ratio scale by adding(on the log scale) the interaction estimates to themain effect of treatment. The very bottom intervalshows the 90% interval for the treatment main ef-fect, and the central points for the eight intervalsabove it average to the center point at the bottom.These last 9 intervals in Display 6 are reminis-

cent of the way a meta-analysis is often presented ina “ladder plot,” with estimates of effect for each study,and followed by a combined treatment estimate atthe bottom. However, there are certain differencesdue to the more complex MBLR model. First, asmentioned above, these estimates have been adjustedfor differential covariate distributions across stud-ies. Second, the Pollakiuria estimates here have beenshrunk toward the prior mean estimates of the oddsratios involving all responses. Third, the shrinkage ofinteraction estimates toward 0, governed by τ in (6),is similar to the shrinkage toward a common meaneffect that occurs in a random effects meta-analysis.Fourth, the weight that each study contributes to

the overall estimate is governed by a more complexformula than in either the standard fixed or ran-dom effects meta-analyses. However, it does sharewith the random effects methodology the fact thatrelative weights are much attenuated compared torelative sample sizes. Finally, this more complex cal-culation means that the single-study treatment es-timates in the above MBLR graph do not preservethe between-study differences, as might be shown ina standard meta-analysis presentation.The response Pollakiuria was chosen as the exam-

ple for Display 6 because that issue showed a greaterTreatment-by-Study effect than other issues: for ex-ample, in Display 6 the Trt*Study:A1 effect is 1.33,while the Trt*Study:B4 effect is 0.79, for a ratio of1.68, and the two 90% intervals barely overlap. Isthis post-hoc selection legitimate? Clearly, this wayof finding “interesting” results is biased in manystandard settings. However, the Bayesian shrinkagemethodology tends to offset such biases, as will beseen in the simulation results to follow.

6. SIMULATION STUDY OF MBLR AND RLR

The statistical properties of MBLR are studiedusing a simulation of the model that MBLR as-

Page 15: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 15

sumes. The purpose is to compare the accuracy ofthe MBLR results with that of the RLR results inthe context of a situation like that in the exampleof Section 5, where there are rare events and sparsedata. The simulation emulates that example in thesense that the distribution of subject covariates andtreatment assignment matches the data in Section 5exactly. Also, the list of response issues is the sameand the baseline probabilities (as measured by theintercept term in the logistic regressions) of each re-sponse in the simulation are similar to that in thedata of Section 5. The protocol for each simulationinvolves the following steps:

1. Set the K intercept term values α0k, one foreach of the responses.

2. Set the G+1 prior means A1,A2, . . . ,AG,B0.3. Set the four prior standard deviations φ= (σA,

σ0, σB, τ).4. Repeat steps (5 through 12) NSIM times:

5. Draw {αgk} from N(Ag, σ2A), g = 1, . . . ,G;

k = 1, . . . ,K.(Note: all random variable generation is

performed using built-in R functions. Also,constraints that αgk must sum to 0 as gvaries over the categories of each single co-variate are enforced by subtracting meansover the corresponding covariate from theoriginally drawn αgk. An analogous proce-dure is used in steps 7 and 8.)

6. Draw {β0k} from N(B0, σ20), k = 1, . . . ,K.

7. Draw {Bg} from N(0, τ2), g = 1, . . . ,G.8. Draw {βgk} from N(Bg, σ

2B), g = 1, . . . ,G;

k = 1, . . . ,K.9. For each set of ni subjects having the same

covariate values and treatment assignment,compute Zik and Pik using (1) and (2), i=1, . . . ,m; k = 1, . . . ,K.

10. Draw {Nik} from binomial (ni, Pik), i =1, . . . ,m; k = 1, . . . ,K.

11. Fit both the MBLR and the RLR model tothe counts {Nik}.

12. Update cumulative summaries of estimationresults for each simulation as described be-low.

13. Create reports summarizing the estimation ac-curacy of the two methods regarding all param-eters.

Simulation Summary Statistics

There areM = 2(G+1)(K+1)−1 parameters be-ing estimated and two estimation methods: MBLRand RLR, so the total number of estimators being

evaluated is R = 2M . For simulation s (s = 1, . . . ,NSIM) and for estimate r (r = 1, . . . ,R), let:

θrs = true value of parameter r for simulation s,as defined by steps 1, 2, 5, 6, 7 and 8,qrs = estimated value (posterior mean) of param-eter r for simulation s,sers = estimated SE (posterior standard devia-tion) for parameter estimate r for simulation s,BIASr =

s(qrs− θrs)/NSIM [average estimationerror],RMSEr =

(∑

s(qrs − θrs)2/NSIM) [square rootof mean squared estimation error],Z2r = (

s(qrs−θrs)2/se2rs)/NSIM [average squared

standardized estimation error],CI.05r = (# times qrs + 1.645sers < θrs)/NSIM

[proportion of times 90% CI is too low],CI.95r = (# times qrs − 1.645sers > θrs)/NSIM

[proportion of times 90% CI is too high].

These summary statistics focus on the estimationaccuracy of qrs and also on the calibration accuracyof sers. We want BIAS and RMSE to be as close to0 as possible, we want Z2 to be near 1, and we wantCI.05 and CI.95 to be near 0.05. The R estimatescan be grouped by the two methods, the (K + 1)responses (counting PRIOR MEAN as a general-ized response) and the 2G + 2 different term defi-nitions. The term definitions fall into three generalterm types:

COV = {Ag, αgk},

TREAT= {B0, β0k},

TRT∗COV= {Bg, βgk}.

We can summarize the simulation of the R es-timates by averaging the six accuracy summarieslisted above over groups defined by method, responseand/or term type.Finally, for the MBLR, we can summarize the pos-

terior means and standard deviations of the four es-timated prior standard deviations.

Simulation Design

The simulations are designed to compare varia-tions in three design factors, each at two levels, sothat 8 separate simulations were performed, witheach simulation having NSIM = 250 replications, andso that a simple comparison of the two levels of eachfactor will be based on 1000 replications at eachlevel. The three design factors correspond to twodifferent choices at each of the first three steps inthe simulation protocol given above:

Page 16: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

16 W. DUMOUCHEL

Response Intercept Base.Prob

1. Hyperkalaemia −2.906 0.05472. Thirst −3.333 0.03573. Dry mouth −3.429 0.03244. Pollakiuria −3.645 0.02615. Polyuria −4.787 0.0083

6. Nocturia −5.646 0.00357. Polydipsia −5.819 0.00308. Micturition urgency −6.618 0.00139. Urine output increased −6.972 0.000910. Anuria −7.722 0.0004

Display 7. Estimated values of intercept terms α0k thatare used in the simulations. When K = 5, only the first 5responses in the display are used. The baseline probabilitiesare defined as exp(α0k), which range from 0.055 for Hyper-kalaemia to 0.00044 for Anuria.

Factor 1, Level 1: Frequent responses only (most fre-quent 5 in the example data).

Factor 1, Level 2: Both frequent and rare responses(all 10 issues in the example data).The K = 10 situation uses the same 10 issues

as in the example of Section 5, with the valuesof the intercept terms α0k set equal to the esti-mated values from the real data, shown in Dis-play 7. The baseline probabilities are defined asexp(α0k), also shown, which range from 0.055for Hyperkalaemia to 0.00044 for Anuria. WhenK = 5, the most frequent 5 response issues areused, as shown in rows 1–5 of Display 7.

Factor 2, Level 1: Average of main effects = 0 (Ag =0 for all g, B0 = 0).

Factor 2, Level 2: Prior Means of main effects rela-tively large nonzero values.For Level 2, the values of A1, . . . ,AG,B0 are set

to two times the values estimated in the analy-sis of the actual data. Display 8 shows the co-efficient values as estimated and as used in thesimulations.

Factor 3, Level 1: σA = 0.4, σ0 = 0.6, σB = 0.2, τ =0.2 (Small PSDs).

Factor 3, Level 2: σA = 1.0, σ0 = 1.2, σB = 0.8, τ =0.8 (Large PSDs).The Level 1 values of prior standard deviations

are similar to those estimated from the exampledata, while the Level 2 values are significantlylarger.

All simulations create 5752 subjects having thesame joint distribution of covariates and treatmentallocations as the actual data and as summarized

Term Estimated Level 1 Level 2

Gender: F 0.028 0 0.056Gender: M −0.028 0 −0.056Study: A1 0.094 0 0.188Study: A2 0.529 0 1.058Study: A3 −0.325 0 −0.65Study: A4 0.33 0 0.66Study: B1 0.293 0 0.586Study: B2 −0.232 0 −0.464Study: B3 −0.273 0 −0.546Study: B4 −0.417 0 −0.834RenalHistory: N −0.187 0 −0.374RenalHistory: Y 0.187 0 0.374Age: 50 and under −0.251 0 −0.502Age: 51–65 0.109 0 0.218Age: 66–75 0.239 0 0.478Age: over 75 −0.097 0 −0.194Treatment 1.484 0 2.968

Display 8. Prior means for the main effects, as estimatedby MBLR from the real data, and as varied in the simulations,either all zeros (Level 1), or set to twice the estimated values(Level 2).

in Display 2. Thus, G = 16 and M = 203, R = 566whenK = 5, whileM = 373, R= 1066 whenK = 10.

Simulation Results

Display 9 shows summaries of the distributionsof (square roots of) variance component estimates,which are denoted PSDs for prior standard devia-tions in the model equations (3)–(6). Since there are4 separate PSDs in the model, and the simulationsare run at two sets of PSDs, the scales of all thePSDs in Display 9 have been normalized by divid-ing each estimated PSD and each estimated sam-pling standard deviation by the true PSD used inthe corresponding simulation. Thus, a value of 1 foran average estimated PSD in Display 9 is interpretedas an unbiased estimate, and a value of 0.1 for thestandard deviation of the sampling distribution ofa PSD in Display 9 is interpreted as a coefficient ofvariation of 10%.Display 9 has 8 columns and 7 rows. There are

4 pairs of columns, corresponding to the samplingmeans and standard deviations of the estimates ofeach of the four PSDs in the model. The 7 rows ofDisplay 9 correspond to different subsets of the 2000simulations. Row 1 shows averages over all simula-tions, whereas the other rows show averages overa subset of 1000 simulations corresponding to the

Page 17: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 17

σA SDσA σ0 SDσ0 σB SDσB τ SDτ

All MBLR simulations 1.035 0.130 1.005 0.271 0.988 0.236 1.088 0.368Responses: Frequent 1.044 0.144 1.004 0.283 1.004 0.253 1.073 0.373Responses: Freq + Rare 1.026 0.116 1.007 0.260 0.971 0.219 1.102 0.363Mean effects: Zero 1.034 0.133 1.005 0.276 1.000 0.245 1.091 0.376Mean effects: Large 1.035 0.126 1.006 0.267 0.975 0.227 1.084 0.360Prior SDs: Small 1.037 0.153 1.131 0.353 0.954 0.340 1.136 0.476Prior SDs: Large 1.033 0.106 0.879 0.190 1.022 0.132 1.039 0.259

Display 9. Summary of estimation of prior standard deviations (PSD) in the MBLR simulations. All estimated PSDs aredivided by the true PSD to put their sampling distributions on a common scale. The row “All Simulations” shows meansand standard deviations of normalized estimates across all 2000 simulations. Other rows show results for subsets of 1000simulations broken down by the two levels of each of the three design factors in the experiment. See text for explanation of thedesign factors and their levels.

levels of each of the factors in the experimental de-sign. For example, consider the columns labeled σ0and SDσ0 in Display 9. In row 1, 1.005 implies thatoverall the mean of estimates of the Treatment PSDare within 0.5% of the true value, and the next valueof SDσ0 = 0.271 implies that individual estimatesare typically about 27% off the true value. Of coursethat value is principally reflective of the sample sizeand the experimental design of the clinical studies.All simulations used the same clinical study setups,but there was variation according to the three fac-tors in the simulation. Going down the rows in thesesame two columns, we see that estimates of σ0 hadalmost exactly the same means and standard devia-tions whether all ten responses were being simulatedor whether just the most frequent five responseswere simulated. Similarly, the next two rows showthat there was virtually no difference in the sam-pling means and standard deviations of σ0 betweenthe situation where the average effects are about 0versus relatively large effects. However, the final tworows of the Display show that when all four PSDs aresmall (σA = 0.4, σ0 = 0.6, σB = 0.2, τ = 0.2) the es-timate of σ0 is biased upward about 13%, and whenall four PSDs are large (σA = 1.0, σ0 = 1.2, σB = 0.8,τ = 0.8) it is biased downward by about the samepercentage. The direction of the biases implies thatestimates tend to be somewhat more central with re-spect to the restricted range imposed (0< σ0 < 1.5)than the true value, thus moderating the estimates.The coefficient of variation of σ0 is about 35% in theformer case and about 19% in the latter case. Thiscorresponds to roughly the same standard deviationof the estimate of σ0 whether σ0 is 0.6 or 1.2.This effect only shows up with respect to σ0; the

other columns in Display 9 show that mean esti-mates of σA, σB and τ are relatively unaffected by

any of the three factors in the simulation, especial-ly σA and σB . Consideration of degrees of freedommay explain this—these two variance componentshave (G − J)(K − 1) degrees of freedom, whereasσ0 has K − 1 df and τ has G− J df, so one mightexpect them to be harder to estimate (although thedefinition of degrees of freedom is somewhat fuzzyin this nonlinear Bayesian setting). Estimates of τseem to be most variable percentagewise, with co-efficient of variation in the 30–40 percent range. Inall cases the coefficient of variation is larger for thesmaller true PSDs. The standard deviation of esti-mation decreases when the true PSD decreases, butnot fully proportionally.However, remember that the goal of the analysis is

not to estimate the variance components per se, butto use them to define a model that can better esti-mate the logistic coefficients by adjusting to globalpatterns in the data across responses and predic-tor categories. Each individual estimation does notassume that the PSDs are exactly equal to their pos-terior mean, but rather the estimation involves anintegration across the posterior distribution of thePSDs. In that respect, it is interesting to examinethe posterior standard deviations of the PSDs. Theyhave not been included in Display 9 in order to savespace, but in fact the average of the posterior stan-dard deviations across simulations was remarkablysimilar to the sampling standard deviations of theposterior means of each PSD. They typically differedby only 10% or so for each of the 8 sets of 250 simula-tions. Thus, our model expects that the PSDs will behard to estimate and works within that uncertainty.

Estimation of Logistic Coefficients

Display 10 summarizes the simulation distribu-tions of the various logistic regression coefficients.

Page 18: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

18 W. DUMOUCHEL

(a) Treatment effect prior mean B0 Treatment effect for responses β0k

RMSE Z2 CI.05 CI.95 RMSE Z2 CI.05 CI.95All RLR simulations 0.719 0.192 0.000 0.003 1.066 26.026 0.108 0.354All MBLR simulations 0.383 1.167 0.056 0.070 0.466 1.248 0.061 0.074Responses: Frequent 0.424 1.235 0.068 0.067 0.314 1.200 0.059 0.073Responses: Rare 0.343 1.099 0.044 0.073 0.619 1.297 0.063 0.075Mean effects: Zero 0.386 1.163 0.046 0.076 0.491 1.284 0.052 0.090Mean effects: Large 0.381 1.170 0.066 0.064 0.442 1.212 0.070 0.058Prior SDs: Small 0.286 1.023 0.043 0.062 0.375 1.217 0.058 0.073Prior SDs: Large 0.481 1.310 0.069 0.078 0.557 1.280 0.064 0.076

(b) Covariate effect prior means Ag Covariate effect for responses αgk

RMSE Z2 CI.05 CI.95 RMSE Z2 CI.05 CI.95All RLR simulations 0.490 0.105 0.000 0.000 0.819 11.662 0.166 0.190All MBLR simulations 0.297 0.972 0.044 0.052 0.373 1.041 0.049 0.057Responses: Frequent 0.323 0.959 0.043 0.052 0.280 1.041 0.049 0.056Responses: Rare 0.272 0.986 0.045 0.052 0.466 1.042 0.049 0.057Mean effects: Zero 0.300 0.959 0.042 0.051 0.388 1.026 0.047 0.056Mean effects: Large 0.295 0.986 0.046 0.053 0.358 1.056 0.051 0.057Prior SDs: Small 0.205 0.990 0.044 0.055 0.283 1.048 0.050 0.057Prior SDs: Large 0.390 0.954 0.043 0.049 0.463 1.034 0.048 0.057

(c) Interaction effect prior means Bg Interaction effect for responses βgk

RMSE Z2 CI.05 CI.95 RMSE Z2 CI.05 CI.95All MBLR simulations 0.347 3.189 0.151 0.144 0.346 1.116 0.059 0.057Responses: Frequent 0.353 2.940 0.144 0.141 0.292 1.110 0.059 0.056Responses: Rare 0.340 3.439 0.159 0.147 0.399 1.122 0.059 0.057Mean effects: Zero 0.347 3.114 0.150 0.140 0.360 1.120 0.060 0.057Mean effects: Large 0.346 3.265 0.152 0.148 0.331 1.112 0.059 0.056Prior SDs: Small 0.171 2.489 0.130 0.122 0.203 1.174 0.062 0.061Prior SDs: Large 0.522 3.890 0.173 0.166 0.488 1.058 0.056 0.053

Display 10. Summary of estimated logistic coefficient distributions within the simulations. Separate subtables for (a) treat-ment effects B0 and β0k , (b) covariate main effects Ag and αgk, (c) treatment-by-covariate interactions Bg and βgk

(g = 1, . . . ,G). See text for explanation of the summary statistics.

Part (a) of Display 10 focuses on the main effect ofTreatment. The first two rows of Display 10(a) com-pare the Treatment effect accuracy of the RLR esti-mates to that of the MBLR estimates. The first fourcolumns refer to the estimation of the prior mean co-efficient, B0, what might be called the “all responsesummary,” while the last four columns refer to theestimation of coefficients, β0k, for the individual re-sponses. Across all 2000 simulations, the RMSE forRLR is almost double that of MBLR for estima-tion of B0, and more than double, on average, forestimating the β0k. Since statistical efficiency is typ-ically inversely proportional to the square of RMSE,this implies that MBLR is about 4 times as efficientas RLR at estimating treatment/comparator oddsratios in this setting.

The statistic Z2 is designed to measure the cal-ibration of the posterior standard deviations com-puted by a method to the actual sampling distribu-tion, where Z2 = 1 implies perfect calibration. WhenZ2 ≫ 1, the claimed standard errors of coefficientsare too optimistic (too small), and the reverse istrue when Z2 ≪ 1. The values of Z2 provide similarinformation to the counts of times confidence inter-vals fail to enclose the true values of coefficients.When Z2 is too large and putative standard errorsare too small, the too-short confidence intervals willmiss the true values more than the nominal percentof times, and conversely. Looking at the first tworows of Display 10(a), we see that RLR is poorlycalibrated in this sense. Computed standard errorsare too large for the all-response summary and too

Page 19: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 19

small for the individual response treatment effects.As a result, supposedly 90% confidence intervals had99.7% coverage for the all-response summaries andonly 53.8% coverage for individual response treat-ment effects. In contrast, the MBLR estimates aremuch better calibrated, with Z2 about 1.2 and nomi-nally 90% intervals having coverage probabilities av-eraging about 87%.The remaining rows of Display 10(a) show the be-

havior of the MBLR estimation for subsets of simu-lations defined by the three two-level factors. Rows 3and 4 compare results for simulations with the 5more frequent responses to those for the 5 less fre-quent responses. In the latter case, although theruns generated all 10 responses and all 10 were usedin the analysis, the results in the row labeled “Re-sponses: Rare” are based only on accuracy statisticsfor the 5 least frequent responses, in order to bet-ter isolate the estimation ability of MBLR for rareevents. We see that in fact the RMSE, Z2, and 90%interval coverage probabilities are roughly the samefor the rare and frequent events. (Of course, we as-sume that a run with only the five rare events wouldlead to much more variable estimation—it is theability of the Bayesian algorithm to detect and mea-sure similarities between frequent and rare events,and to “borrow strength” appropriately, that allowssuch accuracy.) The next two rows of Display 10(a)show that whether the true prior means are 0 ornot makes no difference in the estimation properties.The final two rows of Display 10(a) show that esti-mation is significantly more accurate when PSDs aresmall than when they are large, which make sense,because small PSDs imply more commonality acrossthe responses, and thus more opportunity to borrowstrength and increase estimation accuracy. But evenwith the larger set of PSDs, MBLR quite outper-forms RLR.

Display 10(b) shows the corresponding results forthe estimation of covariate main effects. For this ex-ample, the estimation of Ag and αgk seems to bemore accurate, using either RLR or MBLR, on av-erage for the 16 covariate effects (indexed by g) thanit was for the single main effect of treatment. How-ever, the advantage of MBLR over RLR is aboutthe same, both in terms of RMSE and in terms ofstandard error calibration as measured by Z2 andthe coverage probabilities of nominal 90% intervals.Display 10(c) shows the simulation accuracy of

estimation of covariate–treatment interaction coeffi-cients. Since the RLR model does not estimate inter-actions, only MBLR results are presented. Lookingfirst at the right-hand set of four columns in Dis-play 10(c) that refer to estimation of the βgk, allfour accuracy measures seem to mimic the valuesin Display 10(b)—it appears that MBLR can es-timate individual covariate–treatment interactionsas accurately as the main effects of covariates. Nowlooking at the first four columns of Display 10(c),where the Bg are being estimated, the entire col-umn of RMSE values are about the same as in Dis-plays 10(a) and 10(b), so the posterior mean esti-mates of the Bg are about as accurate as those of B0

and of Ag. However, it seems that the posterior stan-dard deviations of the Bg are too optimistic, sincethe values of Z2 are about 3 times too large, andthe error rate of the corresponding nominal 90% in-tervals is about 30% instead of 10%. This result ispuzzling and awaits further investigation.

Bayesian Shrinkage Estimates are Resistant tothe Multiple Comparisons Fallacy

Display 11 shows the remarkable power of Bayesianshrinkage estimates to avoid bias even in the pres-ence of post-hoc selection of the most significant ofmany estimates. For each simulation, the task is

True Int. Bias RMSE Z2 CI.05 CI.95All MBLR simulations 0.976 0.004 0.173 1.314 0.070 0.070Responses: Frequent 0.985 0.007 0.179 1.346 0.078 0.077Responses: Freq +Rare 0.968 0.000 0.167 1.281 0.061 0.062Mean effects: Zero 0.972 0.011 0.171 1.312 0.074 0.068Mean effects: Large 0.980 −0.004 0.175 1.315 0.065 0.071Prior SDs: Small 0.337 0.019 0.163 1.600 0.084 0.093Prior SDs: Large 1.616 −0.011 0.184 1.027 0.055 0.046

Display 11. Simulation of the resistance to multiple comparisons bias of MBLR. At each simulation, the most significanttreatment × covariate interaction was singled out across all responses by selecting the largest of the GK values (G= 16, K = 5or 10, GK = 80 or 160) of (estimated interaction coefficient)/(estimated posterior s.d. of coefficient). The MBLR estimatesare unbiased with relatively small RMSE. See text for discussion of other columns.

Page 20: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

20 W. DUMOUCHEL

to find the most significant treatment × covariateinteraction among all the responses. There are 16covariate-based subsets in the model that get inter-action estimates for every response variable andK =10 or 5 responses, making a total of 16K = 80 or 160ratios (estimated interaction coefficient)/(estimateds.e. of interaction coefficient). The largest ratio (one-sided alternative) in each MBLR analysis is selectedand then the known true value for the selected in-teraction is used to compute the accuracy measures.This selection and assessment is repeated for eachof the 2000 simulations. The first column of Dis-play 11 is the average of the true coefficients for theselected interactions. Remembering that the true in-teractions are generated from a N(0, σ2

B + τ2) dis-

tribution, where√

(σ2B + τ2) = either 0.283 (smaller

PSDs) or 1.13 (larger PSDs), it is clear from the“True Int” column that MBLR is selecting fairlylarge interactions. The column headed BIAS con-tains the average difference between the selected es-timate and its true value. Remarkably, the MBLRpost-hoc selections have virtually no bias, either over-all or in any of the six factor-based subsets. The finalfour columns in Display 11 show the same accuracymeasures as those of Display 10. The RMSE valuesin Display 11 are smaller than any of those in Dis-play 10, which at first might seem surprising, but isa consequence of the fact that the maximum of 80or 160 identically normally distributed variates willhave smaller standard deviation than a single suchvariate, due to the short tail of the normal distribu-tion. The calibration of the posterior standard de-viations of the selected most significant interaction,as measured by the values of Z2 and the coverageprobabilities, is not perfect but is similar to that ofthe treatment main effects in Display 10(a). Thisexcellent accuracy of MBLR shrinkage estimates inthe face of post-hoc selection is in spite of the factthat the variance components which determine theamount of shrinkage were not known in advance butwere estimated separately for each simulation.

Discussion of the Simulation

It should not be surprising that data generated bya specific Bayesian model can be better analyzed byfitting that model. But these simulations show thatthere is a surprisingly large advantage to doing so,and that you give up a lot of efficiency (equivalentlywaste clinical resources) by forgoing such an analysisif, in fact, such a model is realistic. With the RLRapproach, which itself is probably a more efficientanalysis than straight logistic regression, you give

up the possibility of estimating treatment–covariateinteractions and yet still lose accuracy in estimationof main effects. The principal nonBayesian alterna-tive is to form a single pooled response, treating thedifferent issues as equivalent. But then you don’teven get estimates for the separate issues and youwould be submerging completely the medical dis-tinction between, say, such a serious adverse eventas Anuria and Dry mouth or Thirst. Our methodol-ogy is a “Goldilocks alternative” to the bias-variancetrade-off, neither as variable as estimating so manyparameters with no prior shrinkage, nor as biased asassuming that all issues have the same response totreatment and that all interactions are 0.

7. SUMMARY AND CONCLUSIONS

Safety issues with low observed frequencies willproduce standard logistic regression estimates withwide confidence intervals (based on highly variablesampling distributions). Clinical safety data is oftenof very fine granularity. Each observation of a sub-ject’s adverse event is described with great precision,providing a great multiplicity of events to be tabu-lated and whose event frequencies must be comparedacross treatment arms. Defining event groupings forthe purpose of getting pooled events with more reli-able relative frequencies is hard to do in advance, be-fore the set of somewhat frequent events is observed.After the data are collected, it can be controver-sial to lump events together because the selection ofevents to pool can determine how significant Treat-ment/Comparator odds ratios become. The multi-variate Bayesian logistic regression methodology de-scribed here is designed to be a compromise betweenseparate analyses of finely distinguished events anda single analysis of a pooled event. It requires theselection of a set of medically related issues, poten-tially exchangeable with respect to their dependenceon treatment and covariates.A key concept underlying the proposed methodol-

ogy is that a set of K issues have been prespecifiedas important and likely to be biologically and clin-ically related. It would be a misuse of the methodto try very many subsets of a large set of issues,stopping only when an “interesting” result is ob-tained. A similar caution pertains to selection ofcovariates—only those with some prior justificationshould be included. When too many extraneous co-variates are entered, the estimated variance compo-nents may lead to over-shrinking those effects and/orinteractions that are present, and lead to overly nar-row confidence intervals.

Page 21: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

MBLR FOR CLINICAL SAFETY DATA 21

The methodology is exploratory in nature, in thatthe analyst is encouraged to examine the relation-ship of the adverse event frequencies to multiplecovariates and to treatment by covariate interac-tions. These more complicated models may not beestimable by a standard logistic regression algorithmbecause the data are often too sparse for the num-ber of parameters being estimated. Two strategiesare used to cope with this sparsity. First, a Bayesianmodel allows the analysis of each issue to borrowstrength from the other issues, assumed medicallyrelated so that this sort of averaging is not unrea-sonable. The fitting of the MBLR model is accom-plished by the multiple runs of a maximum like-lihood algorithm, together with the estimation ofBayes factors for a range of values of the unknownvariance components. The MBLR algorithm is in-tended to be able to measure the degree to which theissues have similar main effects and interactions withtreatment on the logit scale. The hierarchical priorspecification in equations (3)–(5) allows for partialaveraging across issues for those model coefficientsthat seem similar. There is also a tendency for thetreatment × covariate interaction coefficients to beshrunk toward the null value of 0, to an extent con-trolled by an estimated EB variance parameter asin (6). This shrinkage is intended to offset the ten-dency of exploratory methods to find “significant”subgroup effects purely by chance. Second, a com-parison method, denoted regularized logistic regres-sion, sets particular values of the variance compo-nents in the empirical Bayes model to emulate stan-dard logistic regression (without interactions) whileavoiding computational problems and inestimableeffects that can be caused by low counts. This mod-ification is designed to hardly affect the estimatesfrom standard logistic regression when the data arenot sparse.Since treatment-by-covariate interaction coeffi-

cients are difficult to interpret, the sums of the treat-ment main effect plus the covariate interactions arealso presented and interpreted as the estimated treat-ment effect that would hold for subjects in the sub-group identified by the covariate value. This com-bined effect is apropos to the search for a subjectsubgroup that might be particularly vulnerable toan adverse reaction to treatment.The goal is to allow safety review of a large amount

of clinical data using a sophisticated methodologythat can nevertheless be mastered by those withoutadvanced training in Bayesian methods or the the-

ory of variance component estimation, or the inter-pretation of large masses of sparse data. Section 5shows an example partial analysis of 10 medicallyrelated issues within a pool of 8 studies involvingover 5700 subjects with a model involving treat-ment, 5 covariates involving 13 defined covariatevalues, and including examination of treatment-by-covariate interactions. Section 6 describes a simula-tion study that measures the large gains in efficiencythat MBLR can attain, compared to separate analy-ses for each issue. The striking results in Display 11show the ability of Bayesian modeling to greatly re-duce bias due to post-hoc selection of the most sig-nificant contrast.The Multivariate Bayesian Logistic Regression is

a technique that can add to the tools available to thedata analyst or medical reviewer. The method doesnot eliminate the need for experimental replicabil-ity and convergence with medical knowledge. A sig-nificant Bayesian result found in one sample thatis not replicable may just be indicative of a sam-pling problem. With that said, it is hoped that thisnew tool will ease the burden of seeing the forestfor the trees during the analysis of clinical safetydata.

ACKNOWLEDGMENTS

Thanks to Sally Cassells, Rick Ferris and RaveHarpaz for their data management and computerassistance, and to Ram Tiwari and Brad McEvoyfor extensive and helpful comments on an earlierversion of this paper. Any remaining flaws in theconception and execution of this research are solelydue to the author.

REFERENCES

Berry, S. M. and Berry, D. A. (2004). Accounting for mul-tiplicities in assessing drug safety: A three-level hierarchicalmixture model. Biometrics 60 418–426. MR2066276

Carlin, B. P. and Louis, T. A. (2000). Bayes and Empiri-cal Bayes Methods for Data Analysis, 2nd ed. Chapman &Hall/CRC, Boca Raton, FL.

Dean, B. (2003). Adverse drug events: What’s the truth?Qual. Saf. Health Care 12 165.

Gelman, A., Jakulin, A., Pittau, M. G. and Su, Y.-S.

(2008). A weakly informative default prior distribution forlogistic and other regression models. Ann. Appl. Stat. 21360–1383. MR2655663

Lee, Y. and Nelder, J. A. (1996). Hierarchical general-ized linear models. J. Roy. Statist. Soc. Ser. B 58 619–678.MR1410182

Page 22: Multivariate Bayesian Logistic Regression for …of clinical safety data called multivariate Bayesian logistic regression (MBLR). Parallel logistic regression models are fit to a

22 W. DUMOUCHEL

Lee, Y., Nelder, J. A. and Pawitan, Y. (2006). Gener-

alized Linear Models with Random Effects. Unified Analy-sis via H-Likelihood. Monographs on Statistics and AppliedProbability 106. Chapman & Hall/CRC, Boca Raton, FL.MR2259540

Meng, X.-L. (2009). Decoding the H-likelihood. Statist. Sci.24 280–293. MR2757430

Searle, S. R., Casella, G. and McCulloch, C. E. (1992).Variance Components. Wiley, New York. MR1190470