
JOURNAL OF APPLIED STATISTICS
https://doi.org/10.1080/02664763.2019.1573216

NOTE

How Cox models react to a study-specific confounder in a patient-level pooled dataset: random effects better cope with an imbalanced covariate across trials unless baseline hazards differ

Thomas McAndrew (a), Bjorn Redfors (a,b), Aaron Crowley (a), Yiran Zhang (a), Shmuel Chen (a,c), Mordechai Golomb (a,c), Maria C. Alu (a,d), Dominic P. Francese (a), Ori Ben-Yehuda (a,d), Akiko Maehara (a,d), Gary S. Mintz (a,d), Gregg W. Stone (a,d) and Paul L. Jenkins (e)

(a) Cardiovascular Research Foundation, New York, NY, USA; (b) Department of Cardiology, Sahlgrenska University Hospital, Gothenburg, Sweden; (c) Hadassah Medical Center, Jerusalem, Israel; (d) Columbia University Medical Center, New York, NY, USA; (e) Bassett Research Institute, Cooperstown, NY, USA

ABSTRACT
Combining patient-level data from clinical trials can connect rare phenomena with clinical endpoints, but statistical techniques applied to a single trial may become problematic when trials are pooled. Estimating the hazard of a binary variable unevenly distributed across trials showcases a common pooled-database issue. We studied how an unevenly distributed binary variable can compromise the integrity of fixed and random effects Cox proportional hazards (cph) models. We compared fixed effect and random effects cph models on a set of simulated datasets inspired by a 17-trial pooled database of patients presenting with ST segment elevation myocardial infarction (STEMI) and non-STEMI undergoing percutaneous coronary intervention. An unevenly distributed covariate can bias hazard ratio estimates, inflate standard errors, raise type I error, and reduce power. While unevenness causes problems for all cph models, random effects models suffer least. Compared to fixed effect models, random effects models show lower bias and trade inflated type I error for improved power. Contrasting hazard rates between trials prevent accurate estimates from both fixed and random effects models.

ARTICLE HISTORY
Received 28 June 2018; Accepted 3 December 2018

KEYWORDS
Cox proportional hazards; frailty; fixed effects; random effects; pooling data

CONTACT Thomas McAndrew [email protected]; [email protected], Cardiovascular Research Foundation, New York, NY, USA

© 2019 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

Pooling data from several clinical trials [12,15,16] can create robust results for endpoints too rare to study within any single trial [3,4,6,10,14,17–19], but analyzing patient differences from a study-level covariate unequally distributed across trials (due to differences in eligibility criteria, definitions, or other study-specific factors) could lead to inaccurate conclusions [2,5,8]. Cox proportional hazards (cph) models [7] associate endpoints with covariates, controlling for between-trial differences through stratification or by including trial as a fixed or random effect. Stratifying allows the per-trial baseline hazard rates to take any form, while both fixed and random effects models impose stronger assumptions about how the hazard rate varies by trial.

Past studies compared the performance of these three types of models [1,9,13] by modifying a simulated trial effect (i.e. differing numbers of trials, more varied trial baseline hazards), but previous studies have not considered modeling a study-level binary covariate with only one disease level per clinical trial. If we can only observe one level of a binary covariate per trial, stratified models cannot fit these data, and we must turn to fixed or random effects models. Studying an imbalanced study-level covariate across pooled clinical trials will answer whether we can glean sensible statistical estimates from clinical trials with varied purposes.

We simulated time-to-event data for a set of trials where each trial contains patients with only one level of a binary study-level covariate. After simulation, we compared fixed and random effects cph models' ability to estimate our binary study-level covariate's hazard ratio in the presence of confounding by trial.

2. Methods

2.1. Pooled DES study data

We pooled data from 17 coronary stent trials comparing drug-eluting stents (first and second generation) to bare-metal stents from 2006 to 2013 into a single dataset (pooled-DES). This enabled us to study how the 26,564 patients' clinical presentation (ST segment elevation myocardial infarction [STEMI] versus non-ST segment elevation myocardial infarction [NSTEMI], two prevalent disease types in cardiovascular science) impacts mortality, myocardial infarction, bleeding, revascularization, and stent thrombosis at 5 years while also adjusting for trial-specific differences in baseline hazard rates.

The majority of trials enrolled NSTEMI patients, and only one pooled trial (HORIZONS-AMI) enrolled STEMI patients. This pooled dataset inspired our simulated datasets to capture key characteristics: (i) a single clinical presentation per trial, (ii) trial-specific baseline hazard rates, and (iii) an association between clinical presentation and endpoint.

2.2. Simulated data

Our simulated trial data considered: (i) within-trial assignment to disease type A or B, (ii) the number of pooled trials, (iii) variable baseline hazard rates (frailty) between trials, and (iv) a hazard ratio of 1.0 (to study type I error) or 2.0 (to study power) between patients with disease type A versus type B. We fixed the number of patients studied at 2000, generated 1000 simulated datasets, and uniformly divided patients into T trials. For each trial t, we either assigned all patients to disease type B with probability p (unevenness) or all patients to disease type A with probability 1 − p. Thus, within each trial, disease type was constant (either all A or all B). Assuming constant hazards (h), we drew event and censoring times from an exponential distribution

$$p(T = t) \propto e^{-ht},$$

with a 15% event rate and a 25% censoring rate at 1825 days.
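As a concrete illustration, the sketch below (Python with numpy, which the paper does not prescribe; variable names are ours) converts the 5-year cumulative event and censoring rates into constant exponential hazards, mirroring the first two lines of Algorithm 1 in the appendix, and draws observed times.

import numpy as np

rng = np.random.default_rng(0)

# Convert a 5-year cumulative rate into a constant exponential hazard:
# S(1825) = exp(-h * 1825) = 1 - rate  =>  h = -log(1 - rate) / 1825.
def rate_to_hazard(five_year_rate, horizon=1825.0):
    return -np.log(1.0 - five_year_rate) / horizon

event_hazard = rate_to_hazard(0.15)   # 15% events by day 1825
censor_hazard = rate_to_hazard(0.25)  # 25% censoring by day 1825

n = 2000
# numpy parameterizes the exponential by its scale, i.e. 1/rate.
event_times = rng.exponential(1.0 / event_hazard, size=n)
censor_times = np.minimum(rng.exponential(1.0 / censor_hazard, size=n), 1825.0)

observed_time = np.minimum(event_times, censor_times)
had_event = (event_times <= censor_times).astype(int)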


We defined the hazard rates for types A and B as

$$h(B) = h_0\, e^{\log(HR) + s}, \qquad h(A) = h_0\, e^{s}, \qquad s \sim LN(\nu, \tau),$$

where h_0 represents the baseline hazard, HR is the assumed hazard ratio between disease types A and B, and s is a trial-specific quantity (one per trial) drawn from a Log-Normal distribution (LN) centered at ν with standard deviation τ. Our simulated data: (i) varied the number of pooled trials, T, from 3 to 10, (ii) unevenly assigned the proportion (p) of patients among all pooled data to either type A or B, and (iii) multiplied half of all pooled trials' baseline hazard rates by ν on average (contrasting baseline hazard rates). Unevenly assigning disease types per trial and separating baseline hazard rates mimics our pooled trial data and demonstrates how pooling trials can bias the cph model.
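A minimal sketch of this per-trial hazard construction, following Algorithm 1 in the appendix (which draws the trial effect s as a Normal on the log-hazard scale, so e^s is Log-Normally distributed); h0, hr, nu, tau, and p below are illustrative placeholders, not values from the paper.

import numpy as np

rng = np.random.default_rng(1)

def per_trial_hazards(h0, hr, n_trials, nu=0.0, tau=0.5, p=0.5):
    """Assign each trial one disease type and one hazard (illustrative sketch)."""
    hazards = []
    for _ in range(n_trials):
        s = rng.normal(nu, tau)          # trial effect on the log-hazard scale
        if rng.uniform() < p:            # whole trial studies disease type B
            hazards.append(("B", h0 * np.exp(np.log(hr) + s)))
        else:                            # whole trial studies disease type A
            hazards.append(("A", h0 * np.exp(s)))
    return hazards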

2.3. Survival analysis

We estimated hazard ratios from the simulated data using (i) a stratified cph model (cph-S), (ii) a cph model including trial as a fixed effect (cph-F), (iii) a cph model including trial as a Gamma distributed random effect (cph-G), and (iv) a cph model including trial as a Log-Normal distributed random effect (cph-L). Each model controls for unmeasured trial-related effects differently.

The cph-S model breaks the overall baseline hazard rate into separate trial-specific baseline hazard rates. Given the pth patient within the τth trial,

$$h_{\tau,p}(t \mid x, \beta) = h_{\tau,0}(t) \times g(x, \beta)$$

describes a trial-specific baseline hazard rate h_{τ,0}(t) and a patient-specific function g(x, β) that depends on patient p's set of covariates (x) and population parameters (β). While we can include patient-specific covariates in this model, we focus on models that consider trial and a single binary covariate. By separating trials into strata, the cph-S model copes with non-proportional hazards across trials but cannot estimate a hazard ratio from a trial with only a single level of the effect of interest. Stratified models can only include trials that contain both levels of a binary variable.
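To make the stratified model concrete, here is a toy fit using the lifelines package (our choice; the paper names no software). Note the simulated trials below each contain both disease types; with one type per trial, as in the paper's setting, the disease column would be constant within every stratum and the stratified fit could not estimate the hazard ratio.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 300
trial = rng.integers(0, 3, n)
disease = rng.integers(0, 2, n)                        # both levels in every trial
time = rng.exponential(1000 / (1 + disease), n).clip(max=1825.0)
event = (time < 1825.0).astype(int)                    # administrative censoring at 5 years
df = pd.DataFrame({"time": time, "event": event, "disease": disease, "trial": trial})

# cph-S: trial as a stratum, so each trial keeps its own baseline hazard.
cph_s = CoxPHFitter()
cph_s.fit(df, duration_col="time", event_col="event", strata=["trial"])
cph_s.print_summary()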

The cph-F model adjusts for the effect of interest and trial enrollment while assuming patients follow equal hazards through time. Mathematically,

$$h_{p}(t \mid x, \beta) = h_{0}(t) \times g(x_{\tau,p}, \beta),$$

where x includes effects for trial and covariates for the patient. The cph-F model estimates hazard ratios from all available data but assumes proportional hazards between trials. Compared to stratified models, which exclude trials with a single level of a binary covariate but allow differing baseline hazards per trial, the fixed model assumes a single baseline hazard across trials and includes all trial data (less selection bias).
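A matching sketch of the fixed effect model, reusing the toy DataFrame df from the stratified sketch above: trial enters as indicator (dummy) covariates alongside disease, and a single baseline hazard is shared by all trials.

import pandas as pd
from lifelines import CoxPHFitter

# cph-F: one shared baseline hazard; trial enters as indicator covariates.
df_fixed = pd.get_dummies(df, columns=["trial"], prefix="trial",
                          drop_first=True, dtype=float)
cph_f = CoxPHFitter()
cph_f.fit(df_fixed, duration_col="time", event_col="event")
cph_f.print_summary()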

Page 5: Thomas McAndrew, Bjorn Redfors, Aaron Crowley, Yiran … Cox...2 T.MCANDREWETAL. asafixedorrandomeffect.Stratifyingallowstheper-trialbaselinehazardratestotake anyform,whilebothfixedandrandomeffectsmodelsstiffenassumptionsabouthowthe

4 T. MCANDREW ET AL.

The cph-G and cph-L models suppose differences in hazard rates across trials follow a distribution. Let

$$h_{s,p}(t \mid x, \beta) = h_{0}(t) \times \phi_{s} \times g(x, \beta),$$

where the trial-specific effect φ_s (randomly drawn from the Gamma or Log-Normal distribution) multiplies each patient's hazard rate. We multiply patient hazards by random draws of φ to govern patient differences within trials.
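A sketch of this multiplicative frailty structure (our illustration; theta and h0 are placeholders): one φ per trial, normalized to mean 1, scales every patient hazard in that trial.

import numpy as np

rng = np.random.default_rng(3)
n_trials, theta = 10, 0.5        # theta: frailty variance (illustrative)

# cph-G: Gamma frailty with mean 1 and variance theta.
phi_gamma = rng.gamma(shape=1.0 / theta, scale=theta, size=n_trials)
# cph-L: Log-Normal frailty, also normalized to mean 1.
phi_lognormal = rng.lognormal(mean=-theta / 2.0, sigma=np.sqrt(theta), size=n_trials)

h0 = 8.9e-5                      # illustrative baseline hazard per day
trial_hazard = h0 * phi_gamma    # one shared multiplier per trial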

Stratifying, adjusting as a fixed effect, and introducing a random effect for trial represent the three most common paradigms for handling pooled trial data in clinical statistics.

2.4. Statistical inference

From our 17-trial pooled database, we estimated the baseline hazard rate's (h) posterior probability given follow-up times (d) and events (e) as

$$p(h \mid e, d) \propto h^{\sum e + \alpha}\, e^{-h\left(\sum d + \gamma\right)},$$

considering an exponential model with a Gamma prior generating the time-to-event data, and uninformative Gamma prior parameters α = 10⁻⁵ and γ = 10⁻⁵.
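Because the exponential likelihood and Gamma prior are conjugate, the posterior above is itself a Gamma distribution; a minimal sketch (the event and follow-up vectors are hypothetical):

import numpy as np
from scipy import stats

# Conjugate exponential-gamma update: with a Gamma(alpha, gamma) prior on the
# hazard h, the posterior is Gamma(alpha + sum(events), gamma + sum(days at risk)).
alpha, gamma = 1e-5, 1e-5                            # near-flat prior, as in the paper
events = np.array([0, 1, 0, 0, 1])                   # hypothetical event indicators
follow_up = np.array([1825, 912, 1825, 400, 1500])   # hypothetical days at risk

posterior = stats.gamma(a=alpha + events.sum(),
                        scale=1.0 / (gamma + follow_up.sum()))
print(posterior.mean(), posterior.interval(0.95))    # point estimate and 95% interval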

We related any two variables using linear regression with non-informative priors for the intercept (b), slope (m), and variance (σ²). Mathematically, we relate two variables V and H by

$$p(V = v \mid H = h) \sim N\!\left(b + \sum_{i=1}^{J} m_i \cdot h^{i},\; \sigma^2\right),$$

where J = 1 for fitting a linear model and J = 2 for fitting a quadratic model, and compute posterior probabilities for b and m_i assuming a Normally distributed N(0, 10⁻⁵) prior probability and a Gamma distributed G(4 × 10⁻², 4 × 10⁻²) prior probability for σ².

We compared statistics between two models by fitting a polynomial and averaging over the number of trials (T), unevenness (p), or baseline hazard multiplier (ν) as

$$\bar{S} = \frac{1}{r - q} \int_{v=q}^{v=r} f(v)\, dv,$$

where f(v) represents a linear or quadratic model, and reported the probability (ρ) that any two models' difference is within a relative ε% with respect to R, or mathematically,

$$\rho_{R,\epsilon} = p\left( \left| \frac{X - Y}{R} \right| < \epsilon \right).$$

We also considered absolute differences between two groups (X and Y) as

$$\delta_{\epsilon} = p\left( |X - Y| < \epsilon \right).$$

We designated ρ_R at ε = 5% and δ at ε = 1 unless stated otherwise, and considered ρ or δ values < 5% significant.
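Given paired samples of two model summaries X and Y (e.g. bootstrap or posterior draws), ρ and δ reduce to simple proportions; a minimal numpy sketch:

import numpy as np

def rho(x, y, r, eps=0.05):
    """Probability that the difference X - Y is within a relative eps of R."""
    return np.mean(np.abs((x - y) / r) < eps)

def delta(x, y, eps=1.0):
    """Probability that the absolute difference |X - Y| is within eps."""
    return np.mean(np.abs(x - y) < eps)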


Figure 1. After collecting 5-year event rates from 17 cardiovascular stent trials (A), we found a strong imbalance in clinical presentation, variable baseline hazard rates for 10 different clinical endpoints at 5 years (B), and differences between fixed and random effects cph models (right). (A) Among 17 clinical trials, only 1/17 or 5.88% of trials studied STEMI patients. (B) We estimated baseline hazard rates using an exponential-gamma model and reported the relative difference in per-trial baseline hazards compared to the average (5.15 × 10⁻⁵) over all 17 trials and 10 events. The majority of events' baseline hazards noticeably fluctuated by trial. (Right) Fixed effect and random effects cph models reported different point and interval estimates, likely related to the imbalance in clinical presentation and variable baseline hazards by trial.

3. Results

Basing simulation parameters on real data that pooled trials enrolling patients of only one disease type (Figure 1), but maintaining similar baseline hazard rates, random effects models showed less bias, smaller standard error, variable type I error, and increased power over fixed effects models (Figures 2-4). Exploring assorted baseline hazard rates across trials, we found both fixed and random effects models suffered (Figure 5). Assigning a single disease type (all A or all B) per trial prevents us from using any model stratified by trial [11] and decreases power and biases fixed models; random effects models likely traded inflated type I error for stronger hazard ratios and statistical confidence.

3.1. Real data: 17 pooled cardiovascular clinical trials

We found a heavy imbalance between STEMI/NSTEMI patients (Figure 1(A)) and variable baseline hazard rates (Figure 1(B)) among 10 different clinical endpoints across 17 STEMI/NSTEMI stent trials. This imbalance between disease types (STEMI/NSTEMI) and variable baseline hazards likely caused disagreement between the fixed and random effects models (Figure 1, right). Our pooled dataset contained one trial studying STEMI patients (HORIZONS-AMI) and 16 trials studying NSTEMI patients. After estimating baseline hazard rates for all 10 endpoints among the 17 trials, we found an overall mean equal to 5.15 × 10⁻⁵ and relative trial differences from a minimum −1.00 to a maximum 7.95 times the mean. Studying an imbalanced STEMI/NSTEMI covariate with variable baseline hazards led to large disagreement between fixed and random effects models for death, cardiac death, non-cardiac death, myocardial infarction, target vessel revascularization, and stent thrombosis at 1825 days. Our simulated data aimed to tease apart why we found these disagreements between models.

Figure 2. For T = 10 trials and an even number of trials suffering disease type A versus type B (p = 1/2), random effects models had lower bias (A), smaller standard errors (B), variable type I error (C), and superior power (D). Random effects models balanced bias and standard error better than fixed models, and this balance resulted in more powerful inference.

3.2. Simulations

For T = 10 trials (Figure 2), the fixed effects model biased hazard ratio estimates more (ρ_Ran. < 0.001, Figure 2(A)), had inflated standard errors (ρ_Ran. < 0.001, Figure 2(B)), comparable type I error (ρ_Ran. = 0.069, Figure 2(C)), and diminished power (ρ_Ran. < 0.001, Figure 2(D)) compared to random effects models. Compared to the normal random effects model, the gamma random effects model had comparable bias (ρ_Normal,0.5 = 1.0), smaller standard error (ρ_Normal = 0.044), elevated type I error (ρ_Normal = 0.016), and improved power (ρ_Normal < 0.001). The bias did not significantly differ between random effects models, but compared to the normal model, the gamma model's smaller standard error magnified type I error and strengthened power. We uncovered each model's strengths and weaknesses by studying a static T = 10 trials. Varying the number of trials and studying the same model properties helped determine how random effects models compare to fixed effects models with fewer pooled trials (and so a smaller number of different baseline hazard rates) and with larger pooling studies (and so a larger number of different baseline hazard rates).

Figure 3. Varying the number of pooled trials from 3 to 10, random effects models maintained lower bias, standard error, and type I error compared to fixed models. Random effects models also had higher power than fixed models. An increasing number of trials raised fixed model bias and standard error while doing the opposite for random effects models; it shrank type I error for both fixed and random models. Pooling more trials also raised the power of random effects models while decreasing fixed models' power. Combining more trials intensified uneven patient assignment, breaking the fixed model and strengthening random effects models.

When varying the number of trials (Figure 3) and averaging over simulations, random effects models maintained a lower bias than fixed effect models (ρ_Fix. = 0.026, Figure 3(A)), reduced standard error (ρ_Fix. = 0.039, Figure 3(B)), variable type I error (ρ_Fix. < 0.001, Figure 3(C)), and improved power (ρ_Fix. < 0.001, Figure 3(D)). The fixed effect model biased hazard ratios more than random effects models, which resulted in elevated type I errors, but inflated standard errors robbed the fixed model of power. A similar power/type I error tradeoff persisted between random effects models when varying the number of trials. The gamma model had smaller standard error than the normal model (ρ_Normal = 0.001), more type I error than the normal model (ρ_Normal = 0.027), and similar power to the normal model (ρ_Normal = 1.0).


Figure 4. When each trial has a single disease type (A or B), all models performed best with an even proportion of type A and B patients across all trials. Random effects models had the smallest relative bias (A), lower standard error (B), variable type I error (C), and stronger power (D). We observed no difference in relative bias between normal and gamma models, but the gamma model had smaller standard error, inflated type I error, and higher power than the normal model. Uneven (p) pooling biased hazard ratio estimates and decreased power in a pooled study.

Pooling studies with an unequal ratio of disease types A to B, we compared an intercept-only versus quadratic model of relative bias, standard error, type I error, and power, averaged over unevenness from 10% to 90%, and found random effects and fixed effects models increased bias (Figure 4(A), δ = 0.048), inflated standard error (Figure 4(B), δ = 0.039), reduced type I error (Figure 4(C), δ = 0.047), and reduced power (Figure 4(D), δ = 0.049). Fixed models biased hazard ratio estimates 2.8 times more than random effects models (ρ_Normal = 0.026), inflated standard errors 1.14 times more than random effects models (ρ_Normal = 0.11), and lowered power 1.96 times more than random effects models (ρ_Normal < 0.001). Compared to fixed models, the normal model shrank type I error 1.56 times (ρ_Normal = 0.22) but the gamma model raised type I error 1.76 times (ρ_Normal = 0.97). Random effects models better managed bias, standard error, and power, but inflated type I error compared to fixed models.

Comparing a quadratic versus intercept-only fit to relative bias, standard error, type I error, and power, we found that separating baseline hazards between trials inflated random effects models' type I error (δ < 0.001, Figure 5(C)) and weakened their power (δ < 0.001, Figure 5(D)). Fixed models sustained increasing bias (δ < 0.001, Figure 5(A)) and type I error (δ < 0.001, Figure 5(C)). Fixed models also suffered from large standard error (δ < 0.001, Figure 5(B)) and low power (δ < 0.001), but increasing between-trial baseline hazard rates escalated the normal model's standard error (m ± std.err. = 0.062 ± 0.225) and decreased its power (m ± std.err. = −12.595 ± 1.512). Diverse baseline hazards also inflated the gamma model's type I error (m ± std.err. = 19.337 ± 0.773). Contrasting baseline hazard rates between trials damaged random effects models.

Figure 5. Separating between-trial baseline hazard rates inflated random effects models' standard error (B) and decreased power (D). While differing baseline hazard rates damaged random effects model power, these models maintained a small bias (A) compared to fixed models. Heightened biases translated to inflated type I errors in both fixed and gamma models (C). Dissimilar baseline hazard rates damaged random effects models.

Although unevenly distributing patient disease types across trials damaged hazard ratio estimates, random effects models stayed unbiased and maintained power compared to fixed effect models. Disparate baseline hazards damaged random effects models. We saw our simulated results replicated in the 17-trial dataset; random effects models performed best when confronted with an unevenly distributed covariate.

4. Discussion

Our simulated experiment showed both fixed and random effects models fail when studying a covariate unevenly distributed across trials that have varying baseline hazard rates. We applied fixed and random effects models to a real 17-trial dataset of STEMI and NSTEMI patients and found worse performance for fixed effect models than random effects models when trials had similar baseline hazard rates. Unlike previous studies that scrutinize cluster variability, this work tests model robustness under a fixed effect's imbalance across clusters.


Clinical science strives to better understand scarce disease types, but these scarce disease types and trial diversity will intensify survival rate variability, and this increased variability may manifest as contrasting baseline hazard rates. When we pool clinical data to study rare events, we need to collect comprehensive outcomes data from many trials or resort to more robust models.

After comparing fixed and random effects models, we found random effects models performed better in bias, standard error, and power, but failed when pooling trials with dissimilar baseline hazards. We speculate the difficulty of estimating a single set of parameters for dissimilar baseline hazards causes random effects models to fail. When pooling trials with contrasting baseline hazards, including a covariate that groups trials by baseline hazards may help.

Compared to fixed models, random effects models take better advantage of trial information to reduce hazard ratio estimates' standard errors. Stratified models estimate covariate effects within each trial and, with only one covariate level per trial, are of no use. Fixed effect models do take advantage of covariate effects across all trials, but estimating a parameter for each trial creates a more uncertain hazard ratio for the covariate effect. Random effects models isolate disease type (as a fixed effect) from trial effects (as a random effect) and apply the additional trial data toward better estimating the effect of disease.

We limited ourselves to studying a binary covariate unevenly distributed across trials, simulating data with constant hazards, and modeling time to event with cph models. A continuous covariate could behave differently than a binary variable under the same conditions. Constant hazards do not occur in typical trials, but cph models ignore baseline hazard rates. Many other models of time to event exist, such as accelerated failure time models, and may prove more useful than cph. We also did not model more complicated clinical characteristics that could further confound the relationship between disease and time-to-event data. These limitations may inspire more realistic simulated datasets and studies of their effect on the cph model.

We plan to study whether cph models perform better with a smaller set of covariate-balanced trials compared to a larger set of unbalanced trials. We also plan to study continuous variables split across pooled trials and alternative models of time-to-event data within the context of unevenly distributed variables across a pooled database.

In most cases, random effects models stand up to an unevenly distributed binary covariate so long as each trial has similar baseline hazard rates.

Acknowledgements

The authors thank Karl Lherisson and the Cardiovascular Research Foundation's Information Technology department for computational resources.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Thomas McAndrew http://orcid.org/0000-0002-6362-9231


References

[1] P.K. Andersen, J.P. Klein and M.J. Zhang, Testing for centre effects in multi-centre survival studies: a Monte Carlo comparison of fixed and random effects tests, Stat. Med. 18 (1999), pp. 1489–1500.
[2] N.G. Berman and R.A. Parker, Meta-analysis: neither quick nor easy, BMC Med. Res. Methodol. 2 (2002), pp. 10.
[3] A. Caixeta, M.B. Leon, A.J. Lansky, E. Nikolsky, J. Aoki, J.W. Moses, J. Schofer, M.C. Morice, E. Schampaert and A.J. Kirtane, 5-year clinical outcomes after sirolimus-eluting stent implantation: insights from a patient-level pooled analysis of 4 randomized trials comparing sirolimus-eluting stents with bare-metal stents, J. Am. Coll. Cardiol. 54 (2009), pp. 894–902.
[4] C.P. Cannon, B.A. Steinberg, S.A. Murphy, J.L. Mega and E. Braunwald, Meta-analysis of cardiovascular outcomes trials comparing intensive versus moderate statin therapy, J. Am. Coll. Cardiol. 48 (2006), pp. 438–445.
[5] T.C. Chalmers, Problems induced by meta-analyses, Stat. Med. 10 (1991), pp. 971–980.
[6] Prospective Studies Collaboration, S. Lewington, R. Clarke, N. Qizilbash, R. Peto and R. Collins, Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies, The Lancet 360 (2002), pp. 1903–1913.
[7] D.R. Cox and D. Oakes, Analysis of Survival Data, Vol. 21, CRC Press, 1984.
[8] M.D. Flather, M.E. Farkouh, J.M. Pogue and S. Yusuf, Strengths and limitations of meta-analysis: larger studies may be more reliable, Control. Clin. Trials 18 (1997), pp. 568–579.
[9] D.V. Glidden and E. Vittinghoff, Modelling clustered survival data from multicentre clinical trials, Stat. Med. 23 (2004), pp. 369–388.
[10] R.G. Hart, L.A. Pearce and M.I. Aguilar, Meta-analysis: antithrombotic therapy to prevent stroke in patients who have nonvalvular atrial fibrillation, Ann. Intern. Med. 146 (2007), pp. 857–867.
[11] E.T. Lee and J. Wang, Statistical Methods for Survival Data Analysis, Vol. 476, John Wiley & Sons, 2003.
[12] B. Lo, Sharing clinical trial data: maximizing benefits, minimizing risk, JAMA 313 (2015), pp. 793–794.
[13] A.R. Localio, J.A. Berlin, T.R. Ten Have and S.E. Kimmel, Adjustments for center in multicenter studies: an overview, Ann. Intern. Med. 135 (2001), pp. 112–123.
[14] M.A. Mamas, K. Ratib, H. Routledge, F. Fath-Ordoubadi, L. Neyses, Y. Louvard, D.G. Fraser and J. Nolan, Influence of access site selection on PCI-related adverse events in patients with STEMI: meta-analysis of randomised controlled trials, Heart 98 (2012), pp. 303–311.
[15] M.M. Mello, J.K. Francer, M. Wilenzick, P. Teden, B.E. Bierer and M. Barnes, Preparing for responsible sharing of clinical trial data (2013).
[16] J.S. Ross and H.M. Krumholz, Ushering in a new era of open science through data sharing: the wall must come down, JAMA 309 (2013), pp. 1355–1356.
[17] K.D. Sjauw, A.E. Engström, M.M. Vis, R.J. van der Schaaf, J. Baan Jr, K.T. Koch, R.J. de Winter, J.J. Piek, J.G. Tijssen and J.P. Henriques, A systematic review and meta-analysis of intra-aortic balloon pump therapy in ST-elevation myocardial infarction: should we change the guidelines?, Eur. Heart J. 30 (2009), pp. 459–468.
[18] C. Spaulding, J. Daemen, E. Boersma, D.E. Cutlip and P.W. Serruys, A pooled analysis of data comparing sirolimus-eluting stents with bare-metal stents, N. Engl. J. Med. 356 (2007), pp. 989–997.
[19] P.G. Steg, D.L. Bhatt, C.W. Hamm, G.W. Stone, C.M. Gibson, K.W. Mahaffey, S. Leonardi, T. Liu, S. Skerjanec, J.R. Day, R.S. Iwaoka, T.D. Stuckey, H.S. Gogia, L. Gruberg, W.J. French, H.D. White and R.A. Harrington, Effect of cangrelor on periprocedural outcomes in percutaneous coronary interventions: a pooled analysis of patient-level data, Lancet 382 (2013), pp. 1981–1992.

Appendix. Trial data simulation

Our trial data simulation (Algorithm 1) generated a one-patient-per-row dataset with columns containing: trial, treatment assignment, survival time, and whether or not the patient experienced (1) or did not experience (0) an event. We cannot capture all the intricacies of randomized clinical trials, but we meant to capture trial variability and unevenness among a set of trials following patients up to 5 years. In particular, this simulation mimics a covariate either completely present or completely absent per trial, a situation common to cardiovascular trial pooling. The pseudo-code below can be implemented in any programming language capable of generating random numbers and simple algebraic operations.

Algorithm 1 simulateTrialData: Simulate pooled trial data
Input: 5YrEventRate, 5YrCensorRate, HazardRatio (HR), uneveness, studyVar, numberOfPooledPts, numberOfTrials, btwUneveness, baseHazFac
Output: allData

baseHzRate = log(100 / (100 − 5YrEventRate)) / 1825
baseCensorRate = log(100 / (100 − 5YrCensorRate)) / 1825

perTrialN = patientPerTrial(numberOfTrials, numberOfPooledPts)
flips = assignTreatment(numberOfTrials, btwUneveness)

for trial in numberOfTrials do
    if random.uniform(0, 1) < 0.5 then
        studyEffect = random.Normal(log(baseHazFac), studyVar)
    else
        studyEffect = random.Normal(0, studyVar)
    end if
    N = perTrialN[trial]
    NA = N
    NB = 0

    flip = flips[trial]
    NA, NB = flipTreatment(flip, NA, NB)

    hazardB = baseHzRate * exp(log(HR) + studyEffect)
    hazardA = baseHzRate * exp(studyEffect)

    survTimeB = random.exponential(hazardB, NB)        ▷ Sample NB times
    survTimeA = random.exponential(hazardA, NA)        ▷ Sample NA times

    censB = random.exponential(baseCensorRate, NB)     ▷ Sample NB times
    censA = random.exponential(baseCensorRate, NA)     ▷ Sample NA times

    censB = min(censB, 1825)                           ▷ Follow patients up to 5 years
    censA = min(censA, 1825)                           ▷ Follow patients up to 5 years

    dta = generateEventData(survTimeB, censB, survTimeA, censA)
    allData.append(dta)
end for
return allData

Page 14: Thomas McAndrew, Bjorn Redfors, Aaron Crowley, Yiran … Cox...2 T.MCANDREWETAL. asafixedorrandomeffect.Stratifyingallowstheper-trialbaselinehazardratestotake anyform,whilebothfixedandrandomeffectsmodelsstiffenassumptionsabouthowthe

JOURNAL OF APPLIED STATISTICS 13

Algorithm 2 patientPerTrial: Cut the total number of patients into trials
Input: numberOfTrials, numberOfPooledPts
Output: perTrialN

cuts = random.uniform(0, 1, numberOfTrials)
cuts = cuts / sum(cuts)
for p in cuts do
    perTrialN.append(int(numberOfPooledPts * p))
end for
return perTrialN

Algorithm 3 assignTreatment: Determine whether a trial studies disease type A or B
Input: numberOfTrials, btwUneveness
Output: flips

flips = []
for trial in numberOfTrials do
    if random.uniform(0, 1) < btwUneveness then
        flips.append(1)
    else
        flips.append(0)
    end if
end for
return flips

Algorithm 4 flipTreatment: Assign all patients in a trial to disease type B rather than A
Input: flip, NA, NB
Output: NA, NB

if flip == 1 then
    temp = NA
    NA = NB
    NB = temp
end if
return NA, NB


Algorithm 5 generateEventData: Assign event indicators and survival times for all patients
Input: survTimeB, censB, survTimeA, censA
Output: dta

eventB = 1
eventA = 1
if survTimeB > censB then
    survTimeB = censB
    eventB = 0
end if
dta.append(['B', eventB, survTimeB])
if survTimeA > censA then
    survTimeA = censA
    eventA = 0
end if
dta.append(['A', eventA, survTimeA])
return dta
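For convenience, a direct Python translation of Algorithms 1-5 (a sketch assuming numpy; note that numpy's exponential sampler is parameterized by the scale, 1/rate, whereas the pseudo-code passes the rate, and the parameter defaults below are illustrative):

import numpy as np

rng = np.random.default_rng(2019)

def patient_per_trial(number_of_trials, number_of_pooled_pts):
    # Algorithm 2: cut the pooled sample into trials of random size.
    cuts = rng.uniform(0, 1, number_of_trials)
    cuts = cuts / cuts.sum()
    return [int(number_of_pooled_pts * p) for p in cuts]

def assign_treatment(number_of_trials, btw_unevenness):
    # Algorithm 3: decide whether each trial studies disease type B (1) or A (0).
    return [int(rng.uniform() < btw_unevenness) for _ in range(number_of_trials)]

def simulate_trial_data(event_rate5=15.0, censor_rate5=25.0, hr=2.0,
                        study_var=0.5, n_pts=2000, n_trials=10,
                        btw_unevenness=0.5, base_haz_fac=1.0):
    # Algorithm 1: per-trial hazards, exponential times, 5-year follow-up.
    base_hz = np.log(100.0 / (100.0 - event_rate5)) / 1825.0
    base_cens = np.log(100.0 / (100.0 - censor_rate5)) / 1825.0
    per_trial_n = patient_per_trial(n_trials, n_pts)
    flips = assign_treatment(n_trials, btw_unevenness)
    rows = []
    for trial in range(n_trials):
        # Half of the trials (in expectation) get a shifted baseline hazard.
        mean = np.log(base_haz_fac) if rng.uniform() < 0.5 else 0.0
        study_effect = rng.normal(mean, study_var)
        n = per_trial_n[trial]
        n_b = n if flips[trial] else 0        # Algorithm 4: whole trial is one type
        n_a = n - n_b
        hazard_b = base_hz * np.exp(np.log(hr) + study_effect)
        hazard_a = base_hz * np.exp(study_effect)
        for disease, hz, m in (("B", hazard_b, n_b), ("A", hazard_a, n_a)):
            surv = rng.exponential(1.0 / hz, m)
            cens = np.minimum(rng.exponential(1.0 / base_cens, m), 1825.0)
            # Algorithm 5: censor events occurring after the censoring time.
            time = np.minimum(surv, cens)
            event = (surv <= cens).astype(int)
            rows += [(trial, disease, e, t) for e, t in zip(event, time)]
    return rows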