
MEDICAL RESEARCH COUNCIL CONFERENCE ON BIOSTATISTICS
In celebration of the MRC Biostatistics Unit's Centenary Year
24th - 26th March 2014 | Queens' College Cambridge, UK

Contributed Talks: Tuesday 25 March 2014


A modelling framework for exposure-lag-response associations
Authors: Gasparrini, Antonio (presenting), London School of Hygiene and Tropical Medicine, [email protected]
Session: Complex Observational and Longitudinal Data, 09.00-09.15, Fitzpatrick Hall
Keywords: exposure-lag-response; distributed lag models; delayed effects; latency; splines
Abstract: In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity, with the risk depending on the specific exposure pattern sustained in the past. This type of dependency is defined here as an exposure-lag-response association. The issue is common to different types of exposures, such as environmental stressors, drugs or carcinogenic substances, among others, with lag times spanning from seconds to decades. The main complexity of modelling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both the intensity and the timing of past exposures. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. The method is based on the definition of a cross-basis, obtained by combining two functions that flexibly model the (linear or non-linear) exposure-response and the lag structure of the relationship. The framework is fully implemented in the R package dlnm. The methodology and software are illustrated with examples of applications in different settings, using time series, case-control, cohort and other longitudinal data, and validated through a simulation study. Advantages and limitations compared to standard approaches, together with assumptions and interpretational issues, will be discussed. This modelling framework generalises to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others.
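A minimal sketch of the cross-basis idea using the dlnm package named above is given below, assuming a hypothetical daily time-series data frame `df` with columns `deaths`, `pm10` and `temp`; the exposure, spline choices and 30-day lag range are illustrative assumptions, not those used in the talk.

```r
# Illustrative sketch only: a distributed lag non-linear model with the dlnm package.
# The data frame `df` with columns `deaths`, `pm10` and `temp` is hypothetical.
library(dlnm)
library(splines)

# Cross-basis: natural splines for the exposure-response and for the lag structure (0-30 days)
cb.pm <- crossbasis(df$pm10, lag = 30,
                    argvar = list(fun = "ns", df = 4),
                    arglag = list(fun = "ns", df = 4))

# Include the cross-basis in an outcome regression (here a quasi-Poisson time-series model)
fit <- glm(deaths ~ cb.pm + ns(temp, 3), family = quasipoisson(), data = df)

# Predicted exposure-lag-response surface and overall cumulative association
pred <- crosspred(cb.pm, fit, at = 0:100, cumul = TRUE)
plot(pred, xlab = "PM10", zlab = "RR")                                  # 3-D surface
plot(pred, "overall", xlab = "PM10", ylab = "RR (cumulative over lag 0-30)")
```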


Covariance modelling for bivariate longitudinal processes
Authors: MacKenzie, Gilbert (presenting), Centre of Biostatistics, University of Limerick, Ireland, [email protected]; Xu, Jing, Birkbeck College, London University, UK
Session: Complex Observational and Longitudinal Data, 09.15-09.30, Fitzpatrick Hall
Keywords: Covariance Models; Longitudinal Data; Bivariate Process; Matrix Logarithm; Data-Driven
Abstract: In many studies subjects are measured on several occasions with regard to multivariate response variables. Consider, as an example, a longitudinal randomized controlled trial of teletherapy for age-related macular degeneration (Hart et al., 2002). Patients were randomly assigned to either radiotherapy or observation, and distance visual acuity, y1(t), near visual acuity, y2(t), and contrast sensitivity, y3(t), were measured throughout the study. Modelling the covariance structures for such multivariate longitudinal data is usually more complicated than for the univariate case, due to correlation between the responses at each time point, correlation within separate responses over time, and cross-correlation between different responses at different times. Two approaches are commonly adopted: models with a Kronecker product covariance structure and multivariate mixed models with random coefficients. These approaches select the covariance structures from a limited set of candidate structures, including compound symmetry, AR(1) and unstructured covariance, and very often assume that the data are sampled from multiple stationary stochastic processes. In this talk, we develop a method to model covariance structures for bivariate longitudinal data by extending the ideas of the modified Cholesky decomposition (Pourahmadi, 1999) and matrix-logarithmic covariance modelling (Chiu et al., 1996). Finally, we model the parameters in these matrices parsimoniously using regression models and use our new methods to analyse the bivariate response in the MRC's longitudinal trial of teletherapy in ARMD referenced above.
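As a hedged illustration of the two building blocks cited above (and only of those building blocks, not of the authors' bivariate extension), the sketch below applies the modified Cholesky decomposition and the matrix logarithm to a made-up single-response covariance matrix over four time points.

```r
# Minimal sketch of the modified Cholesky decomposition (Pourahmadi, 1999) and the
# matrix-logarithmic parameterisation (Chiu et al., 1996) for one response measured
# at four time points; the covariance matrix itself is a made-up example.
Sigma <- 0.8 ^ abs(outer(1:4, 1:4, "-"))   # hypothetical AR(1)-type covariance matrix

L <- t(chol(Sigma))                 # lower-triangular factor: Sigma = L %*% t(L)
D <- diag(diag(L)^2)                # innovation variances
B <- L %*% diag(1 / diag(L))        # unit lower-triangular: Sigma = B %*% D %*% t(B)
Tmat <- solve(B)                    # Tmat %*% Sigma %*% t(Tmat) = D
phi  <- -Tmat[lower.tri(Tmat)]      # generalised autoregressive parameters

max(abs(B %*% D %*% t(B) - Sigma))  # check: reconstruction error is numerically zero

# Matrix-logarithmic parameterisation: the elements of log(Sigma) are unconstrained
# and can themselves be modelled parsimoniously by regression
A <- expm::logm(Sigma)
```

Both parameterisations map the constrained covariance matrix to unconstrained quantities (phi and log-innovation variances, or the elements of A), which is what makes regression modelling of the covariance structure possible.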


Chain event graphs and missing data
Authors: Hutton, Jane L (presenting), University of Warwick, [email protected]; Barclay, Lorna M, University of Warwick; Smith, Jim Q, University of Warwick
Session: Complex Observational and Longitudinal Data, 09.30-09.45, Fitzpatrick Hall
Keywords: Chain Event Graphs; Missing Not at Random; Longitudinal data; Bayesian Model Selection
Abstract: Chain event graphs (CEGs) extend graphical models to address situations in which, after one variable takes a particular value, the possible values of future variables differ from those following alternative values (Smith and Anderson, 2008; Thwaites et al., 2010). These graphs are a useful framework for modelling discrete processes which exhibit strong asymmetric dependence structures. They are derived from probability trees by merging together those vertices whose associated conditional probabilities are the same. We exploit this framework to develop new classes of models where missingness is influential and data are unlikely to be missing at random. Context-specific symmetries are captured by the CEG. As models can be scored efficiently and in closed form, standard Bayesian selection methods can be used to search over a range of models. The selected maximum a posteriori model can be easily read back to the client in a graphically transparent way. The efficacy of our methods is illustrated using a longitudinal study from birth to age 25 of children in New Zealand, analysing their hospital admissions at ages 18-25 years with respect to family functioning, education, and substance abuse at ages 16-18 years. Of the initial 1265 people, 25% had missing data at age 16, and 20% had missing data on hospital admissions at ages 18-25 years. More outcome data were missing for poorer scores on social factors: for example, 21% for mothers with no formal education compared with 13% for mothers with tertiary qualifications.


Mechanistic versus marginal structural models for estimating the effect of HAART on CD4 counts
Authors: Commenges, Daniel (presenting), Institut national de la santé et de la recherche médicale, [email protected]; Prague, Mélanie, Institut national de la santé et de la recherche médicale; Thiebaut, Rodolphe, Institut national de la santé et de la recherche médicale
Session: Complex Observational and Longitudinal Data, 10.00-10.15, Fitzpatrick Hall
Keywords: Mechanistic; marginal structural models; treatment effect; observational studies; dynamical models
Abstract: The problem of assessing the effect of a treatment on a marker in observational studies raises the difficulty that attribution of the treatment may depend on the observed marker values. This problem has been treated using marginal structural models, relying on the counterfactual/potential response formalism. Another approach to causality is based on dynamical models. This approach allows biological knowledge to be incorporated naturally, and a continuum can be established between descriptive and mechanistic modelling. Mechanistic models distinguish the model for the system from the model for the observations. Indeed, biological systems live in continuous time, and mechanisms can be expressed in the form of a system of differential equations. Inference in mechanistic models is challenging, particularly from a numerical point of view, but these models can yield much richer and more reliable results. Because of the difficulty of inference in models based on stochastic differential equations, most models that have been developed are based on ordinary differential equations. The different approaches are illustrated by estimating the effect of highly active antiretroviral treatment (HAART) on CD4+ lymphocyte counts in an observational study of HIV-infected subjects. In this context, mechanistic models are available: they are much more challenging numerically than marginal structural models, but they give much more precise results.
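To make the dynamical-model idea concrete, the sketch below solves a deliberately simplified target-cell/virus ODE system in which starting HAART reduces the infection rate. This is not the authors' model; the structure, parameter values and treatment start time are illustrative assumptions.

```r
# Toy mechanistic (ODE) model of a treatment effect on CD4 counts; purely illustrative.
library(deSolve)

on_haart <- function(t) as.numeric(t > 100)      # HAART initiated at day 100 (assumption)

cd4_model <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    infectivity <- beta * (1 - eff * on_haart(t))   # HAART reduces the infection rate
    dTc <- lambda - mu * Tc - infectivity * Tc * V  # uninfected CD4+ cells
    dIc <- infectivity * Tc * V - delta * Ic        # productively infected cells
    dV  <- p * Ic - clr * V                         # free virus
    list(c(dTc, dIc, dV))
  })
}

parms <- c(lambda = 100, mu = 0.1, beta = 2e-5, eff = 0.9,
           delta = 0.5, p = 500, clr = 5)
state <- c(Tc = 600, Ic = 10, V = 5e4)

out <- ode(y = state, times = seq(0, 400, by = 1), func = cd4_model, parms = parms)
plot(out[, "time"], out[, "Tc"] + out[, "Ic"], type = "l",
     xlab = "Days since enrolment", ylab = "Total CD4+ count")
```

The observation model (noisy CD4 measurements at clinic visits) would sit on top of a system model of this kind, which is the separation between system and observations referred to in the abstract.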


Dynamic modelling of kidney function with interventions at acute kidney injury occurrences
Authors: Asar, Özgür (presenting), Lancaster Medical School, Lancaster University, [email protected]; Diggle, Peter, Lancaster Medical School, Lancaster University; Ritchie, James, Institute of Population Health, The University of Manchester
Session: Complex Observational and Longitudinal Data, 10.15-10.30, Fitzpatrick Hall
Keywords: Renal medicine; Observational studies; Stochastic processes; Epidemiology; Statistical Modelling
Abstract: Kidney health is monitored by serial measurements of serum creatinine in blood samples. An increase in creatinine level is indicative of worsening kidney function, but measured values are subject to substantial noise due to imprecision of the assay and short-term fluctuations unrelated to kidney function. Acute kidney injury (AKI) is defined as a large and sudden (within 48 hours) rise in serum creatinine level; for example, a stage 1 AKI is defined as a 1.5-fold increase in serum creatinine. The influence of AKI occurrence on long-term kidney health is still an open research area. In this study, our main objective is to compare serum creatinine profiles for the pre- and post-AKI periods. Our data are taken from the Chronic Renal Insufficiency Standards Implementation Study conducted by the Salford Royal Hospital, Greater Manchester, and consist of a total of 37,716 serum creatinine measurements from 1,651 subjects. We present a model for the data in which the stochastic variation in serum creatinine is decomposed into three components: between-subject random effects; a within-subject latent non-stationary process; and uncorrelated measurement error.


Analysing recurrent events: a review of statistical methodology and future directions, with application to major trials in heart failure
Authors: Rogers, Jennifer (presenting), London School of Hygiene and Tropical Medicine, [email protected]; Pocock, Stuart, London School of Hygiene and Tropical Medicine; McMurray, John, University of Glasgow
Session: Design and Analysis of Randomised Trials, 09.00-09.15, Old Kitchens
Keywords: Recurrent events; Clinical Trial; Heart Failure; Hospitalisation; Informative Censoring
Abstract: Composite outcomes are frequently adopted as primary endpoints in clinical trials as they take account of both the fatal and non-fatal consequences of the disease under study and lead to higher event rates. Such analyses of time to first event are suboptimal for a chronic disease such as heart failure, characterised by recurrent hospitalisations, as relevant information on repeat events is ignored. We illustrate and compare various methods of analysing data on repeat hospitalisations, using data from major trials in heart failure. In addition to describing each method and its estimated treatment effect and statistical significance, we investigate statistical power using bootstrapping techniques. A simple measure of the number of admissions to hospital for worsening heart failure is the event rate. The Poisson distribution is commonly used to determine whether rates of an event differ between treatment groups, but it ignores the heterogeneity amongst patients within treatment groups. The Andersen-Gill model is a survival-based approach to analysing recurrent events that examines the inter-event times and is a generalisation of the Cox proportional-hazards model. A more appropriate alternative approach, which allows for different individual tendencies for repeat heart failure hospitalisations, uses the Negative Binomial distribution. Because an increase in heart failure hospitalisations is associated with an increased risk of subsequent death, any analysis of recurrent admissions should also allow for cardiovascular death as a competing risk. Death was incorporated into the analyses by treating it as an additional event in the recurrent event process, and by considering methods that jointly model hospitalisations and mortality. We used a parametric joint frailty model to analyse the recurrent heart failure hospitalisations and time to cardiovascular death simultaneously. Our analyses show that methods taking account of repeat hospital admissions demonstrate a larger treatment benefit than the conventional time to first event analysis, even when accounting for death. Inclusion of recurrent events also leads to a considerable gain in statistical power compared with the time to first event approach. It seems plausible that in future heart failure trials treatment benefit will not be confined to first hospitalisations only, and so recurrent events should be routinely incorporated.
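A hedged sketch of three of the recurrent-event analyses mentioned above (Poisson rates, negative binomial rates, Andersen-Gill) is shown below; the data frames `hf` (one row per patient) and `hf_long` (counting-process format) and their column names are hypothetical, and the joint frailty model is not included.

```r
# Illustrative comparison of recurrent-event analyses for heart failure hospitalisations.
library(MASS)       # glm.nb
library(survival)   # coxph, Surv

# 1. Poisson model for hospitalisation rates (ignores between-patient heterogeneity)
fit_pois <- glm(n_hosp ~ treat + offset(log(followup_yrs)),
                family = poisson, data = hf)

# 2. Negative binomial model: allows different individual tendencies for repeat admissions
fit_nb <- glm.nb(n_hosp ~ treat + offset(log(followup_yrs)), data = hf)

# 3. Andersen-Gill model: Cox model on counting-process (tstart, tstop] intervals,
#    with a robust variance clustered on patient
fit_ag <- coxph(Surv(tstart, tstop, event) ~ treat + cluster(id), data = hf_long)

# Treatment effect as a rate/hazard ratio under each approach
exp(c(Poisson = coef(fit_pois)["treat"],
      NegBin  = coef(fit_nb)["treat"],
      AndersenGill = coef(fit_ag)["treat"]))
```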


Bayesian inference under model uncertainty for dose-response experiments
Authors: Otava, Martin (presenting), Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Belgium, [email protected]; Shkedy, Ziv, Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Belgium; Kasim, Adetayo, Wolfson Research Institute for Health and Wellbeing, Durham University
Session: Design and Analysis of Randomised Trials, 09.15-09.30, Old Kitchens
Keywords: Bayesian modelling; Model uncertainty; Order restricted inference; Permutation test
Abstract: Bayesian modelling of dose-response data offers the possibility of efficiently estimating the nature of the dose-response relationship between a continuous response and increasing doses of a therapeutic compound. We focus on an order-restricted Bayesian variable selection (BVS) model that addresses estimation, inference and model selection while taking into account model uncertainty. The model is based on an order-restricted one-way ANOVA model. Model uncertainty is taken into account by fitting all possible models for a given number of dose levels. The proposed modelling framework provides posterior probabilities of candidate models by translating the inequality constraints for a monotone relationship into a BVS problem. The model allows us to test the null hypothesis of no dose effect, i.e. a flat dose-response profile, against an ordered alternative. The methodology is based on a combination of BVS model posterior probabilities and a permutation test framework. To overcome dependency on the choice of priors and the necessity of subjectivity in hypothesis testing, a permutation test is used to produce an objective quantity for inference. A large simulation study was conducted to explore the properties and behaviour of the proposed method. The dependency on the prior probabilities of particular models almost disappeared, and the results show close correspondence with frequentist p-values obtained by the likelihood ratio test or multiple contrast tests. Therefore, the permutation test can be used to support the inference when there is no strong scientific knowledge to specify an informative choice of prior probabilities. Using the BVS posterior probability implies accounting for model uncertainty; hence, the BVS model allows us to avoid post-selection inference and estimation. In addition to the simulation study, the proposed method was applied to two dose-response studies with 4 and 5 dose levels and to a dose-response microarray experiment with 4 dose levels and 16,998 genes.


A method for IPD meta-analysis of treatment-covariate interaction with a continuous predictor in randomised trials
Authors: Sauerbrei, Willi (presenting), Institute of Medical Biometry and Medical Informatics, University Medical Centre Freiburg, Germany, [email protected]; Royston, Patrick, Hub for Trials Methodology Research, MRC Clinical Trials Unit and University College London; Kasenda, Benjamin, Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland
Session: Design and Analysis of Randomised Trials, 09.30-09.45, Old Kitchens
Keywords: meta-analysis; continuous variable; interaction
Abstract: Aims: In clinical trials, there is considerable interest in investigating whether a treatment effect is similar in all patients, or whether some prognostic variable indicates a differential response to treatment. To examine this, a continuous predictor is usually categorised into groups according to one or more cutpoints. Several weaknesses of categorisation are well known. To avoid the disadvantages of cutpoints and to retain full information, it is preferable to keep continuous variables continuous in the analysis. The aim is to derive a statistical procedure to handle this situation when individual patient data (IPD) are available from several studies. Methods: For continuous variables, the multivariable fractional polynomial interaction (MFPI) method provides a treatment effect function, that is, a measure of the treatment effect on the continuous scale of the covariate (Royston and Sauerbrei, Stat Med 2004, 2509-25). MFPI is applicable to most of the popular regression models, including Cox and logistic regression. A meta-analysis approach for averaging risk functions across several studies has recently been proposed (Sauerbrei and Royston, Stat Med 2011, 3341-60). Here we combine the two techniques to produce a method of IPD meta-analysis in which treatment-effect functions are averaged across studies. Results: We used the new approach to investigate four potential treatment effect modifiers in a meta-analysis of IPD from three randomised trials in acute lung injury, where the main outcome of interest was 60-day in-hospital mortality (Briel et al., JAMA 2010, 865-73). In contrast to cutpoint-based analyses, the results give more detailed insight into whether treatment effects are influenced by any of the four factors considered. Conclusions: The present method appears to be the first to address the problem of retaining full information when performing IPD meta-analyses to examine continuous effect modifiers in randomised trials. Early experience suggests it is a promising approach.


A multi-arm multi-stage clinical trial design for binary outcomes with application to tuberculosis
Authors: Bratton, Daniel, MRC Clinical Trials Unit at University College London, [email protected]; Phillips, Patrick, MRC Clinical Trials Unit at University College London; Parmar, Max, MRC Clinical Trials Unit at University College London
Session: Design and Analysis of Randomised Trials, 09.45-10.00, Old Kitchens
Keywords: tuberculosis; multi-arm multi-stage; clinical trial; binary outcomes
Abstract: Multi-arm multi-stage (MAMS) designs can greatly increase the speed and efficiency of treatment evaluation compared to separate trials of each new therapy. A class of MAMS designs with stopping guidelines only for lack of benefit has been developed for time-to-event outcomes and has been used to design several trials in oncology. In these designs, interim assessments can be made on an intermediate outcome which is on the causal pathway to the definitive primary outcome of the trial. Such designs can therefore incorporate both phase II and phase III in a single trial, further increasing the efficiency of treatment evaluation.

Tuberculosis (TB) is treated with combination regimens and, with many new or repurposed drugs currently in clinical development, better designs are needed to facilitate simultaneous testing of several new regimens in a single trial. To allow use of MAMS designs, we extend the methodology to binary outcomes, which are often used in TB trials.

We present methods for finding MAMS designs which have the desired pairwise type I error rate and power. Since there may be many such designs, choosing which to use can be difficult, as designs which minimise only the maximum or only the expected sample size are likely to perform poorly in practice. Designs which balance the two measures, known as admissible designs, may therefore be more appealing.

We give examples of admissible MAMS TB designs and demonstrate the savings in time and resources that could be made over conducting separate trials, particularly when using them to design seamless phase II/III trials.
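To give a flavour of pairwise operating characteristics for binary outcomes, the sketch below computes per-arm sample sizes for one new regimen versus control at an interim (lack-of-benefit) look and at the final stage. The proportions, error rates and power values are illustrative assumptions, and this single-comparison calculation is not the authors' algorithm for constructing admissible MAMS designs.

```r
# Minimal per-stage pairwise calculation for a binary outcome (illustrative values only).
p_control <- 0.85    # assumed favourable-outcome proportion on control
p_new     <- 0.93    # assumed proportion on an effective new regimen

# Final-stage pairwise comparison: one-sided alpha 0.025, 90% power
final <- power.prop.test(p1 = p_control, p2 = p_new,
                         sig.level = 0.025, power = 0.90,
                         alternative = "one.sided")

# Interim lack-of-benefit look: relaxed significance level and high power to pass
# a truly effective regimen, so fewer patients per arm are needed before dropping
# poorly performing regimens
interim <- power.prop.test(p1 = p_control, p2 = p_new,
                           sig.level = 0.20, power = 0.95,
                           alternative = "one.sided")

ceiling(c(interim = interim$n, final = final$n))   # patients per arm at each stage
```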


Advantages of a wholly Bayesian approach to assessing efficacy in early drug development: a case study Authors: Woodward, Phil (presenting) Pfizer Ltd [email protected] Walley, Ros UCB Celltech Design and Analysis of Randomised Trials 10.00-10.15 Old Kitchens Keywords: Bayesian inference Design operating characteristics Informative priors pre-posterior distributions early clinical development Abstract: This presentation illustrates how the design and statistical analysis of the primary endpoint of a proof of concept study can be formulated within a Bayesian framework and illustrates this with a Pfizer case study in chronic kidney disease. It is shown how decision criteria for success can be formulated and how the study design can be assessed in relation to these, both using the traditional approach of probability of success conditional on the true treatment difference and also using Bayesian assurance and pre-posterior probabilities, explaining why assurance is rarely a good measure of design quality in early clinical development. It is shown how complexities due to potential outliers, interim analyses and concerns with priors that may be discordant with the data can be elegantly handled via appropriate pre-specified Bayesian models. The case study illustrates how an informative prior on placebo response can have a dramatic effect in reducing sample size, saving a significant amount of time and resource and argue that in such cases it can be considered unethical not to include relevant literature data in this way. We conclude with a brief but important discussion on the more controversial use of informative priors for treatment effects, particularly for the purpose of internal decision making.
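The sketch below illustrates, by simulation, the distinction drawn above between probability of success conditional on a fixed true treatment difference and Bayesian assurance (the prior-averaged, pre-posterior probability of success). The prior, sample size, endpoint SD and dual decision criteria are illustrative assumptions, not those of the Pfizer case study.

```r
# Hedged sketch of Bayesian 'assurance' for a two-arm proof-of-concept study.
set.seed(1)
n_sim <- 100000
n_arm <- 30                                   # patients per arm (assumption)
sigma <- 10                                   # assumed known SD of the endpoint
se    <- sigma * sqrt(2 / n_arm)              # SE of the estimated treatment difference

delta_true <- rnorm(n_sim, mean = 5, sd = 4)  # prior for the true treatment difference
delta_hat  <- rnorm(n_sim, mean = delta_true, sd = se)  # pre-posterior draws of the estimate

# Dual success criteria (illustrative): 'significant' at the one-sided 10% level
# AND an estimated effect of at least 3 units
success <- (delta_hat > qnorm(0.90) * se) & (delta_hat > 3)
mean(success)                                 # assurance

# For comparison: probability of success conditional on a fixed true difference of 5
cond <- rnorm(n_sim, mean = 5, sd = se)
mean((cond > qnorm(0.90) * se) & (cond > 3))
```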


Bias adjusted estimation for binary outcomes in Bayesian adaptive trials
Authors: Bowden, Jack (presenting), MRC Biostatistics Unit, Cambridge, [email protected]; Trippa, Lorenzo, Dana Farber Cancer Institute
Session: Design and Analysis of Randomised Trials, 10.15-10.30, Old Kitchens
Keywords: Clinical trial; Adaptive randomisation; Bias adjustment; Inverse probability weighting
Abstract: Bayesian adaptive trials have the defining feature that the probability of randomisation to a particular arm can change as information becomes available as to its true worth. Despite some clear advantages over the standard (fixed randomisation) approach, there is still a general reluctance to implement such designs in many clinical settings. One area of concern is that their frequentist operating characteristics are often poorly understood. We investigate the bias induced in the maximum likelihood estimate of a response probability parameter for a binary outcome by the process of adaptive randomisation. We discover that it can only be negative (towards the null), and therefore inferences are made more conservative by this design facet. Inverse probability weighting is shown to offer a simple means of obtaining bias-adjusted estimates, and it works well in many situations. In scenarios where it fails, a more computationally intensive Monte Carlo procedure is instead proposed. We illustrate these estimation strategies using several well-known Bayesian adaptive designs from the literature.
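A toy simulation in the spirit of the problem above is sketched below: a two-arm trial with a simple (made-up) response-adaptive allocation rule, comparing the raw response-rate estimate with a simple inverse-probability-weighted alternative. The adaptive rule, sample sizes and the particular weighted estimator are illustrative assumptions, not the designs or the exact estimator discussed in the talk.

```r
# Toy simulation: bias of the raw response-rate estimate under adaptive randomisation,
# and a simple inverse-probability-weighted (IPW) alternative. Illustrative only.
set.seed(42)
p_true  <- c(A = 0.3, B = 0.5)
n_total <- 200; n_burn <- 40

one_trial <- function() {
  arm <- rep(c("A", "B"), length.out = n_burn)          # equal-allocation burn-in
  y   <- rbinom(n_burn, 1, p_true[arm])
  pr  <- numeric(n_total); pr[1:n_burn] <- 0.5          # P(assign to B) at each step
  for (i in (n_burn + 1):n_total) {
    pA <- mean(y[arm == "A"]); pB <- mean(y[arm == "B"])
    pr[i]  <- (pB + 0.01) / (pA + pB + 0.02)            # favour the better-looking arm
    arm[i] <- if (runif(1) < pr[i]) "B" else "A"
    y[i]   <- rbinom(1, 1, p_true[arm[i]])
  }
  w <- ifelse(arm == "B", 1 / pr, 1 / (1 - pr))         # inverse allocation probabilities
  c(mle = mean(y[arm == "B"]),
    ipw = sum(w * y * (arm == "B")) / sum(w * (arm == "B")))
}

est <- replicate(2000, one_trial())
rowMeans(est) - p_true["B"]      # empirical bias: raw MLE (negative) vs weighted estimate
```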


The use of quantile regression to define reference ranges in echocardiography
Authors: Vazquez-Montes, Maria (presenting), University of Oxford, [email protected]; Perera, Rafael, University of Oxford; Poppe, Katrina, University of Auckland, on behalf of the EchoNoRMAL collaboration
Session: Policy or Translational Impact, 09.00-09.18, Armitage Room
Keywords: Reference ranges; quantile regression; echocardiography; covariates
Abstract: In medicine, healthy-population reference ranges are used by clinicians as guidelines to interpret patients' test results during diagnosis and screening. The aim is to characterise a measure's normal variability. Examples range from the use of the lung clearance index for managing cystic fibrosis to common laboratory tests for monitoring pregnancy. The simplest method estimates reference ranges from a central 95% interval of the response variable distribution, assuming normality. However, many biomedical measures are not normally distributed, so another approach is to model reference ranges using the 5th and 95th percentiles of the sample distribution. We present an example of a robust and efficient method based on quantile regression for the estimation of reference ranges in echocardiographic measurements. Quantile regression is a natural extension of linear regression. In contrast with linear regression, which focuses on the mean, quantile regression gives a more complete description of the distribution of the outcome conditional on the covariates by estimating their relationship at each percentile. It assumes independent errors but not a specific distribution. This allows modelling of the effect covariates have on the shape of the response variable as well as its location, so inferences based on these methods are robust to the presence of outliers and heteroscedasticity. However, the method is sensitive to areas with sparse data and can yield non-unique solutions. Its availability in Stata and R facilitates its use. We present an application in echocardiography using data extracted from a cohort of 22,404 healthy subjects from the EchoNoRMAL collaboration, an individual-person meta-analysis of echocardiographic data. The study aims to calculate reference ranges and evaluate their association with age, gender and ethnicity. This is a scenario where quantile regression is recommended. We concentrate on left atrial size parameters, stratifying by measurement method to control for the external variability it introduces.
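A hedged sketch of age-specific reference limits via quantile regression in R (the quantreg package mentioned above as one available implementation) follows; the data frame `echo` with columns `la_size` and `age`, and the spline specification, are hypothetical.

```r
# Age-specific 5th, 50th and 95th percentile curves by quantile regression (illustrative).
library(quantreg)
library(splines)

fit <- rq(la_size ~ ns(age, df = 3), tau = c(0.05, 0.50, 0.95), data = echo)

# Predicted reference limits across the age range
newdat <- data.frame(age = 20:90)
limits <- predict(fit, newdata = newdat)
matplot(newdat$age, limits, type = "l", lty = c(2, 1, 2), col = 1,
        xlab = "Age (years)", ylab = "Left atrial size")
```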


The potential benefit of allocating treatments for CVD based on lifetime risks instead of 10-year risk
Authors: Rapsomaniki, Eleni (presenting), University College London, [email protected]; Shah, Anoop, University College London; Hemingway, Harry, University College London
Session: Policy or Translational Impact, 09.18-09.36, Armitage Room
Keywords: Lifetime risk; cardiovascular disease; gain in life-years; net benefit; treatment rules
Abstract: Background: We compared the potential gain in cardiovascular disease (CVD)-free life-years between two treatment rules, one based on high lifetime CVD risk and one based on >20% 10-year CVD risk (as per current clinical guidelines). Methods: The analysis was based on an electronic health records cohort of 1.25 million patients aged ≥30 and free of diagnosed CVD. We estimated 10-year and lifetime risks (from age 30 to 95) of total CVD based on Cox models adjusted for age, sex, blood pressure, smoking, diabetes and lipids. Lifetime risks were adjusted for the competing risk of non-CVD mortality. We assumed a treatment with hazard ratio 0.8 throughout the treatment period, costing 0.02 CVD-free life-years, offered to people with >20% 10-year risk or to an equal number of people with the highest lifetime risks. For each treatment rule we estimated the number of CVD-free life-years gained from the difference in the area under the lifetime risk curve of treated vs. untreated, and compared the two estimates. Results: Over 5.2 years mean follow-up, 95,859 CVD events were recorded. The mean lifetime risk of total CVD was 37.5%. A lifetime risk treatment threshold of approximately 46% resulted in the same treatment rate as the 20% 10-year risk threshold. Treatment based on lifetime risks resulted in 6.2 (bootstrap 95% CI 5.7 to 6.7) years free of CVD from age 30 to 95. Of these, 45% were attributed to angina, 22% to myocardial infarction, 18% to heart failure, and 15% to prevention of stroke and other CVDs. Lower gains were observed for lifetime risks from older index ages, and no gains in those older than 80. Conclusions: Treatment rules based on lifetime risks vs. 10-year risks may prevent more cardiovascular events if applied from a young age, but either rule leads to similar treatment benefits in the elderly.
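The calculation of CVD-free life-years gained as a difference in areas under survival curves, described in the Methods above, can be illustrated with a small sketch; the baseline cumulative hazard below is made up and only the hazard ratio of 0.8 and the 0.02 life-year treatment cost are taken from the abstract.

```r
# Illustrative sketch: CVD-free life-years gained as the difference in the area under
# the CVD-free survival curve with vs without treatment. The baseline hazard is invented.
ages <- seq(30, 95, by = 0.1)
H0   <- 0.0005 * (exp(0.09 * (ages - 30)) - 1) / 0.09   # hypothetical cumulative hazard
S_untreated <- exp(-H0)
S_treated   <- exp(-0.8 * H0)                           # treatment hazard ratio 0.8

auc <- function(s) sum(diff(ages) * (head(s, -1) + tail(s, -1)) / 2)  # trapezoidal rule
gain <- auc(S_treated) - auc(S_untreated)
gain - 0.02          # net of the assumed treatment cost of 0.02 CVD-free life-years
```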


How much evidence is needed to change practice? Experiences from the IVAN Trial
Authors: Rogers, Chris (presenting), University of Bristol, [email protected]; Chakravarthy, Usha, Queen's University Belfast; Harding, Simon, University of Liverpool; Wordsworth, Sarah, University of Oxford
Session: Policy or Translational Impact, 09.36-09.54, Armitage Room
Keywords: RCT; Meta-analysis; Neovascular age-related macular degeneration; Lucentis; Avastin
Abstract: Background: Neovascular age-related macular degeneration (nAMD) is a major cause of blindness. Treatment involves repeated injections and costs the NHS >£300m/year. When IVAN was conceived, two drugs were used to treat nAMD; Lucentis® was licensed for nAMD but Avastin® was unlicensed. Lucentis® is a fragment of the Avastin® molecule, and differences between the molecules led to hypotheses about their relative efficacy and persistence in the eye. Lucentis® cost £11,000/patient/year, while Avastin® was significantly cheaper. Study design: IVAN was a factorial non-inferiority trial comparing Lucentis® vs. Avastin®, and monthly vs. as-needed treatment, for two years. The primary outcome was best corrected distance visual acuity (BCVA) at two years. Avastin® was hypothesised to be non-inferior to Lucentis®, and as-needed treatment non-inferior to monthly treatment. A USA trial, CATT, was developed in parallel to IVAN. Results: 1795 patients were recruited (IVAN n=610, CATT n=1185), 87% of whom were followed to 2 years. CATT reported two-year results in April 2012 and IVAN in May 2013. A meta-analysis of the results confirmed that Avastin® is non-inferior to Lucentis® with respect to BCVA (mean difference -1.37 letters, 95% CI -3.75 to +1.01, non-inferiority margin 3.5 letters). Lesion morphology was also similar for the two drugs. CATT, but not IVAN, suggested systemic adverse events occurred more frequently with Avastin®. Cost analyses suggested the NHS could save >£85m/year by switching from Lucentis® to Avastin®. Impact: NICE continues to recommend using Lucentis® to treat nAMD in the NHS. The cost of Lucentis® has been discounted (by ≈30%) since the IVAN results were published. However, we are unaware of any UK nAMD service commissioned using Avastin®. In contrast, >60% of patients in the USA are treated with Avastin®, and some European countries only provide Avastin®. It is unclear why a substantial saving has not been realised by the NHS. Funded by the NIHR HTA programme; the views and opinions expressed are those of the authors and do not necessarily reflect those of the HTA programme, NIHR, NHS or the Department of Health.


Methods for inferring short- and long-term effects of exposures on outcomes, using longitudinal data on both measures
Authors: Keogh, Ruth (presenting), Department of Medical Statistics, London School of Hygiene and Tropical Medicine, [email protected]; Vansteelandt, Stijn, Department of Applied Mathematics and Computer Science, University of Ghent; Daniel, Rhian, Department of Medical Statistics, London School of Hygiene and Tropical Medicine
Session: Policy or Translational Impact, 09.54-10.12, Armitage Room
Keywords: Longitudinal studies; Short- and long-term effects; Causal inference; Generalised estimating equations; Inverse probability weighting
Abstract: In this work we consider methods for studying the effects of time-dependent exposures (e.g. medication use) on time-dependent outcomes (e.g. a health status measure) using longitudinal data on both. Data of this type arise when study participants are observed at a number of visits at which various measures are obtained. It is of interest to study both short-term and long-term effects of the time-dependent exposure on the outcome. A challenge is time-dependent confounding by past outcome measures, which occurs when a current outcome is affected by past exposure and in turn affects future exposure. Much of the causal inference literature has focused on a single outcome, e.g. survival time, and one popular approach to analysis using longitudinal exposure data is to use marginal structural models fitted using inverse probability weighting (IPW). The setting which interests us, where outcomes are measured longitudinally, has received much less attention. In this work we emphasise that short-term effects of a current exposure on a current outcome can be estimated using standard regression methods; we consider generalised estimating equations (GEEs) for this. Care is needed to include past outcome measures appropriately in the model to account for time-dependent confounding, and to use an appropriate working correlation structure. We will present results from simulation studies comparing GEEs with IPW for estimating short-term effects. These show that IPW can be seriously unstable and inefficient, and may even result in bias. We will also show that adjustment for propensity scores in GEEs can be advantageous. We will present a new test for evidence of long-term exposure effects. It will also be noted that total effects of past exposures on future outcomes can be estimated in this setting using standard methods, but that what is being estimated is not the same as when using IPW.
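A hedged sketch of the GEE approach to the short-term effect, with adjustment for the previous outcome and exposure to address time-dependent confounding, is given below; the long-format data frame `dat` and its variable names are hypothetical, and the working-independence choice is one reasonable option rather than the authors' prescription.

```r
# Short-term effect of current exposure on current outcome via GEE, adjusting for
# lagged outcome and exposure. `dat` has one row per visit, sorted by `id`.
library(geepack)

fit <- geeglm(outcome ~ exposure + lag_outcome + lag_exposure + baseline_cov,
              id = id, data = dat, family = gaussian,
              corstr = "independence")   # working independence: a cautious choice when
                                         # covariates include past outcomes
summary(fit)$coefficients["exposure", ]  # estimated short-term effect and its SE
```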


Helping to Drive the Robustness of Preclinical Research
Authors: Gore, Katrina (presenting), Pfizer Neusentis, Granta Park, Cambridge, [email protected]; Kadyszewski, Edward, Pfizer WRD, Groton, Connecticut, US
Session: Policy or Translational Impact, 10.12-10.30, Armitage Room
Keywords: Reproducibility; Scientific Rigour; Experimental Design; Assay Capability
Abstract: In an Editorial titled “Reducing our irreproducibility”, dated 24 April 2013, Nature announced publication guidelines intended to improve the consistency and quality of reporting in life-sciences articles. The editorial acknowledged that the root causes of the irreproducibility resided in the laboratories conducting the experiments, but that these were compounded by journals “when they fail to exert sufficient scrutiny over the results that they publish and when they do not publish enough information for other researchers to assess results properly”. Many other articles published in numerous journals over the past decade have also highlighted the need for improved pre-clinical research.

This presentation outlines steps Pfizer is already taking to address the root causes of irreproducibility in preclinical research in the pharmaceutical environment. We will describe our efforts to improve the scientific rigour of our experiments using something we’ve developed and call the Assay Capability Tool. The Tool is a thirteen-question checklist that guides the collaboration of scientists and statisticians in developing fit-for-purpose in vivo and in vitro assays. By promoting surprisingly basic but absolutely essential experimental design strategies; documenting the strengths, weaknesses and precision of an assay; and encouraging the quantitative expression of what a successful outcome will look like prior to running the assay, the Tool serves as a warranty for the level of confidence decision makers can place in assay results. The emphasis on prior consideration of how big an effect matters, and of what constitutes adequate precision for the question at hand, is the cornerstone of crisp drug development decision making.

We believe the Assay Capability Tool is a practical step forward in improving the reproducibility of preclinical research and is central to Pfizer’s continued drive to embed excellent statistical design and analysis into all of our research.


Multi-state modelling of the hospitalisation process and time to death in patients affected by heart failure: how to assess the burden of an extensive pathology from administrative data
Authors: Ieva, Francesca (presenting), Modelling and Scientific Computing, Department of Mathematics, Politecnico di Milano, Italy, [email protected]; Sharples, Linda, Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds; Jackson, Christopher, MRC Biostatistics Unit, Cambridge
Session: General Biostatistics, 09.00-10.30, Bowett Room
Keywords: Multi-State Models; Heart Failure; Administrative Data; Hospital Admissions
Abstract: In chronic diseases like heart failure (HF), the disease course and associated clinical event histories vary widely across the patient population, and precise modelling of complex clinical pathways is problematic. Episodes of hospitalisation for HF-related events can act as a surrogate for disease burden when assessing the quality of healthcare processes. Accounting jointly for mortality and hospital admissions for clinical events enables a more precise understanding of patients' prognosis and enables those in charge of healthcare planning to assess the economic burden of disease. We propose a multi-state model for serial hospitalisations and death in HF patients and fit it to data from the administrative databank of the Lombardia region in Italy. The aim is to highlight a flexible approach that is able to capture important features of disease progression, such as multiple ordered events, and to accommodate competing risks. The eligible cohort consisted of 15,298 patients whose first hospitalisation ended in 2006, with 4 years of follow-up thereafter. Apart from transitions to the absorbing state (death), serial admissions to, and discharges from, hospital were allowed. Transient states consisted of in-hospital and out-of-hospital stays for the 1st up to the 5th hospitalisation, with a further state representing all subsequent admissions. The transition intensities were estimated conditional on covariates, allowing analysis of the dynamic nature of case-mix effects. The model was fitted using the R package msm. The estimation of transition probabilities enables customised predictions: in- and out-of-hospital mortality can be computed for each state, together with mean sojourn times, the total time spent in each state, and the expected time of first transition into each state. Modelling the HF-related hospitalisations in their natural form captured detailed information on the disease progression and recovery processes and allowed precise prediction of patient prognosis, giving new insight into the burden that HF places on patients and hospital services.
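A hedged sketch of a (much simplified) three-state version of such a model in the msm package named above follows; the data frame `hf_ms`, its column names, the collapsed state space and the covariate choice are illustrative assumptions.

```r
# Simplified multi-state model: 1 = out of hospital, 2 = in hospital, 3 = dead.
# `hf_ms` has columns id, years, state and age; values in Q only initialise the fit.
library(msm)

Q <- rbind(c(0,   0.1, 0.01),    # admission 1 -> 2, death 1 -> 3
           c(0.1, 0,   0.02),    # discharge 2 -> 1, in-hospital death 2 -> 3
           c(0,   0,   0))       # death is absorbing

fit <- msm(state ~ years, subject = id, data = hf_ms,
           qmatrix = Q, deathexact = 3, covariates = ~ age)

pmatrix.msm(fit, t = 1)     # 1-year transition probabilities
sojourn.msm(fit)            # mean sojourn times in the transient states
totlos.msm(fit, tot = 4)    # expected total time spent in each state over 4 years
```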


The utility of the Dirichlet process clustering model for the exploration of high-order interactions: advantages and limitations
Author: Papathomas, Michail (presenting), University of St Andrews, School of Mathematics and Statistics, St Andrews, Scotland, [email protected]
Session: General Biostatistics, 09.00-10.30, Bowett Room
Keywords: Bayesian model selection; Bayesian variable selection; Sparse data; Efficient model search
Abstract: The investigation of highly complex dependence structures is becoming increasingly important in biostatistics. Doing so within a linear modelling framework is not straightforward, owing to the difficulty of investigating an unwieldy, large space of competing models. One approach to reducing the dimensionality of the problem is to create homogeneous clusters using the Dirichlet process, a flexible Bayesian clustering algorithm. We investigate the relation between clustering with the Dirichlet process and linear modelling, and discuss the utility of the Dirichlet process for the exploration of high-order interactions, especially when sparse data are analysed. We consider log-linear and logistic regression models for the analysis of data from large cohort studies. Relevant theoretical and computational results are discussed.


Estimating the contribution of human-to-human transmission to Lassa fever
Authors: Lo Iacono, Giovanni (presenting), Disease Dynamics Unit, Department of Veterinary Medicine, University of Cambridge, [email protected]; Wood, James, Disease Dynamics Unit, Department of Veterinary Medicine, University of Cambridge
Session: General Biostatistics, 09.00-10.30, Bowett Room
Keywords: Zoonosis; haemorrhagic fever; super-spreader; emerging diseases; spillover
Abstract: Zoonotic diseases, i.e. those transmitted from animals to humans, are the origin of the majority of emerging human pathogens. If the pathogen acquires the ability to transmit from human to human, then both sources of transmission may coexist, and separating their relative contributions is challenging. This is the case for Lassa fever, an acute, severe, viral haemorrhagic fever common in West Africa. Mastomys natalensis, the multimammate rat, is recognised as the reservoir of the virus. The disease is also transmitted from human to human, as demonstrated by a nosocomial outbreak that occurred in Nigeria in 1972. We assume that the patterns observed in the hospital outbreak are general and can be applied to non-nosocomial situations too. We use these data in conjunction with the number of hospitalised patients at the Lassa ward in Kenema Government Hospital (KGH) to infer the contribution of human-to-human transmission compared with pure zoonotic spillover. More precisely, (i) we calculate the distribution of the generation times (the interval between onset of symptoms in a primary and in a secondary case) based on the data from the hospital outbreak; (ii) we then use this to estimate the effective reproductive numbers associated with the nosocomial epidemic, R_NOS, and with the epidemic curve given by a fraction Q of the number of hospitalised patients at the Lassa ward, R_KGH(Q); (iii) Q represents the contribution of human-to-human transmission, which is estimated by imposing equality of the two reproductive numbers: R_NOS = R_KGH(Q). The analysis shows that almost 20% of the cases at KGH are likely to be secondary cases arising from human-to-human transmission. These are probably caused by only a few ‘super-spreaders’, as 94% of human cases are not able to cause sustained chains of transmission (R_KGH < 1); in contrast, the effective reproductive number for the remaining 5% can be up to 12.
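A generic, hedged sketch of the link between a case series, a generation-time distribution and an effective reproductive number (via the renewal equation) is shown below; the case counts and the discretised gamma generation-time distribution are made up, and this is not the authors' inference procedure for R_NOS or R_KGH(Q).

```r
# Effective reproductive number from a case series and a generation-time distribution,
# using R = sum(I_t) / sum(Lambda_t) with Lambda_t = sum_s w_s * I_{t-s}. Illustrative only.
set.seed(7)
cases <- rpois(30, lambda = 5 + 3 * sin(seq(0, 2 * pi, length.out = 30)))

w <- diff(pgamma(0:10, shape = 3, rate = 0.5))   # discretised generation-time weights
w <- w / sum(w)

lambda_t <- sapply(seq_along(cases), function(t) {
  s <- seq_len(min(t - 1, length(w)))
  if (length(s) == 0) return(NA_real_)
  sum(w[s] * cases[t - s])
})

ok <- !is.na(lambda_t) & lambda_t > 0
sum(cases[ok]) / sum(lambda_t[ok])               # overall effective reproductive number
```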


Extrapolation of trial-based survival curves: constraints based on external information
Authors: Welton, Nicky, University of Bristol, [email protected]; Guyot, Patricia, University of Bristol; Ades, AE, University of Bristol
Session: General Biostatistics, 09.00-10.30, Bowett Room
Keywords: Extrapolation; Survival; External evidence; Head-and-neck Cancer; Bayesian
Abstract: Background: In the UK, policy decisions on the adoption of new treatments for cancer are informed by a cost-effectiveness analysis over a lifetime time horizon. However, evidence on relative treatment efficacy comes from randomised controlled trials (RCTs) that typically have a follow-up period of only a few years. Mean survival time is a key input to cost-effectiveness analysis, but is very sensitive to assumptions made on the extrapolation from the short-term RCT evidence into the long term, and very different estimates can be obtained from different models that each fit the RCT data equally well. There is therefore a need for methodology to help inform this extrapolation. The objective of this work was to explore the use of other sources of information, external to the RCT data, that can be used to inform model choice and estimation in order to provide long-term extrapolation of survival curves for use in cost-effectiveness analysis. Methods: We describe methods that allow external information on various different survival metrics to be synthesised with RCT evidence to put constraints on the resulting estimated survival model. We indicate some external sources that may potentially provide relevant evidence, including expert opinion. We illustrate the methods using an RCT of cetuximab+radiotherapy vs radiotherapy for patients with head and neck cancer, reporting a 5-year follow-up. A US cancer database (SEER), a meta-analysis, and mortality statistics from the general population were identified to impose constraints on overall survival, conditional survival, and the hazard ratio. The models were estimated using Bayesian Markov Chain Monte Carlo simulation in WinBUGS. Conclusions: The external data allowed us to rule out certain parametric models, fit flexible spline models beyond the length of the trial follow-up, and obtain credible estimates of mean survival that reflect both the RCT and external data.
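As a simplified frequentist analogue of the idea above (not the Bayesian constraint synthesis described in the talk), the sketch below fits alternative parametric models to short-term trial data and checks their extrapolated 10-year survival against an external registry-based estimate; the data frame `trial`, the candidate distributions and the external value are all assumptions.

```r
# Rule-out check of extrapolated parametric survival models against external evidence.
library(survival)
library(flexsurv)

dists <- c("weibull", "lognormal", "gompertz")
fits  <- lapply(dists, function(d)
  flexsurvreg(Surv(years, died) ~ 1, data = trial, dist = d))

# Extrapolated survival at 10 years under each candidate model
S10 <- sapply(fits, function(f) summary(f, t = 10, type = "survival")[[1]]$est)
names(S10) <- dists

external_S10 <- 0.20    # external long-term survival estimate (assumption)
S10                     # models far from external_S10 would be ruled out
```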


Multiple single nucleotide polymorphism analysis methods in nonlinear mixed-effect pharmacokinetic models
Authors: Bertrand, Julie, University College London Genetics Institute, [email protected]; Balding, David J, University College London Genetics Institute; De Iorio, Maria, University College London Department of Statistical Science
Session: General Biostatistics, 09.00-10.30, Bowett Room
Keywords: Nonlinear mixed effect model; Pharmacokinetics; Genetics; Penalised regression; Model selection
Abstract: Context: In 2009, regulatory authorities provided guidelines on the conduct of pharmacogenetic studies in the pharmacokinetic (PK) evaluation of medicinal products, promoting the exploration of large sets of single nucleotide polymorphisms (SNPs) and the use of nonlinear mixed effects models, which enable analysis of the entire time-concentration curve, even for sparse data. However, until recently, there has been no systematic method to examine the effects of multiple SNPs on the model parameters. Objective: The aim of this study is to assess different penalised regression methods for including SNPs in PK analyses, in comparison with a classical stepwise approach. Methods: A total of 200 data sets were simulated under both the null and an alternative hypothesis. In each data set, for each of the 300 participants, a PK profile at six sampling times was simulated and 1227 genotypes were generated through haplotypes. Genetic association was investigated using: (i) a classical stepwise approach; (ii) a two-stage approach, in which, after modelling the PK profiles, a penalised regression is applied to the individual parameter Empirical Bayes Estimates (EBEs); and (iii) an integrated approach simultaneously estimating the PK model parameters and the genetic effect sizes, in which a penalised regression is applied at each iteration of the Expectation Maximisation algorithm to update the vector of fixed effects. Under the alternative, H1, we randomly picked 6 SNPs per simulated data set which together explain 30% of the variance in the logarithm of the apparent clearance of elimination. Results: The stepwise procedure, the penalised regression on EBEs and the integrated approach obtained FWER estimates not significantly different from the target value of 0.2. All three approaches obtained similar power estimates to detect each of the 6 causal SNPs, with the integrated approach detecting fewer false positives.
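A hedged sketch of the second stage of approach (ii) above, a lasso-penalised regression of individual Empirical Bayes Estimates of log-clearance on a SNP genotype matrix, is given below; the objects `ebe_logCL` and `geno`, and the use of cross-validated lambda, are illustrative assumptions rather than the exact specification used in the study.

```r
# Lasso regression of EBEs of log apparent clearance on SNP genotypes (illustrative).
# `ebe_logCL` is a length-n numeric vector; `geno` is an n x p matrix of 0/1/2 genotypes.
library(glmnet)

cvfit <- cv.glmnet(x = geno, y = ebe_logCL, alpha = 1)   # alpha = 1: lasso penalty
sel   <- as.matrix(coef(cvfit, s = "lambda.min"))
setdiff(rownames(sel)[sel[, 1] != 0], "(Intercept)")     # SNPs retained by the penalty
```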


Bayesian meta-analytical techniques to evaluate multiple surrogate endpoints
Authors: Bujkiewicz, Sylwia (presenting), University of Leicester, Department of Health Sciences, Biostatistics Group, [email protected]; Thompson, John, University of Leicester, Department of Health Sciences, Genetic Epidemiology Group; Abrams, Keith, University of Leicester, Department of Health Sciences, Biostatistics Group
Session: Evidence Synthesis to Inform Health, 13.30-13.45, Armitage Room
Keywords: surrogate endpoints; multivariate meta-analysis; Bayesian analysis; health technology assessment
Abstract: Biomarkers and surrogate endpoints are increasingly being investigated as candidate endpoints in randomised controlled trials where measuring a primary outcome of interest may be too costly, too difficult, or require long follow-up. Such outcomes are also increasingly important in health technology assessment (HTA), especially in the early stages of drug development. A number of meta-analytical methods have been proposed that aim to evaluate surrogate endpoints, which are then used to predict the target outcomes. Different meta-analytical approaches make different modelling assumptions and take into account different levels of uncertainty, which may impact on the accuracy of the predictions. In our recent paper on Bayesian multivariate meta-analysis of mixed outcomes, we model the between-study covariance in a formulation of a product of normal univariate distributions (Stat Med 2013; 32:3926-3943). This formulation is particularly convenient for including multiple surrogate outcomes and for combining diverse sources of evidence by, for example, placing external-evidence-based informative prior distributions directly on the parameters of the model. In this model, however, two outcomes (which can be surrogate endpoints to the target outcome) are conditionally independent given the target outcome. This assumption may be considered strong when, for example, the two surrogate outcomes are strongly correlated and one can act as a surrogate for the other. In this talk, the model's sensitivity to this assumption, in terms of its ability to predict the target outcome, is discussed alongside model extensions. The modelling techniques are investigated using examples from rheumatoid arthritis (where the Health Assessment Questionnaire is of interest in HTA, but multiple alternative instruments are also used to measure response to treatment) and multiple sclerosis (where disability worsening is the target outcome, while other outcomes such as relapse rate and MRI lesions have been shown to be good surrogates for disability progression).


Some statistical issues related to vaccination policies
Authors: Demiris, Nikos (presenting), Athens University of Economics and Business, Greece, [email protected]; Baguelin, Marc, London School of Hygiene and Tropical Medicine; Edmunds, John, London School of Hygiene and Tropical Medicine
Session: Evidence Synthesis to Inform Health, 13.45-14.00, Armitage Room
Keywords: Matrix Prior; Vaccination Policy; Social Contact; MCMC; Epidemic Model
Abstract: This presentation is concerned with vaccination policies that maximise health benefit through the efficient use of limited resources. We developed methods to synthesise virological, clinical, epidemiological, and behavioural evidence in order to evaluate how changing the target populations in the seasonal influenza vaccination program would affect infection and mortality. The key statistical issues relate to (i) the efficient updating of the epidemic parameters and (ii) the precise decomposition of the transmission matrix into its sociological and biological components via estimating the matrix of social contacts and the probability of infection conditional upon a contact. In this work we consider efficient adaptive MCMC algorithms for the former issue and a number of distinct priors for estimating the contact matrix. The approach is illustrated via a number of scenarios for the UK influenza vaccination program.
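The decomposition of transmission into a social contact matrix and a per-contact infection probability can be illustrated with a small next-generation-matrix sketch; the two-group contact matrix, infection probability, infectious period, coverages and vaccine efficacy below are all illustrative assumptions, not estimates from the work described.

```r
# Next-generation matrix = contacts x per-contact infection probability x infectious
# duration; its dominant eigenvalue is the reproduction number. Illustrative values only.
contacts <- matrix(c(8, 3,
                     3, 5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("child", "adult"), c("child", "adult")))
p_inf <- 0.03    # probability of infection per contact (assumption)
d_inf <- 4       # mean infectious period in days (assumption)

ngm <- contacts * p_inf * d_inf
R0  <- max(Re(eigen(ngm)$values))

# Reproduction number after vaccinating a fraction of each group
cov <- c(child = 0.5, adult = 0.1)            # illustrative coverages
ve  <- 0.6                                    # assumed vaccine efficacy
ngm_v <- diag(1 - ve * cov) %*% ngm           # scale the susceptibility of each group
c(before = R0, after = max(Re(eigen(ngm_v)$values)))
```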


Fast efficient computation of Value of Information from a Probabilistic Sensitivity Analysis sample: a non-parametric regression approach
Authors: Strong, Mark (presenting), University of Sheffield, School of Health and Related Research, [email protected]; Oakley, Jeremy, University of Sheffield, School of Maths and Statistics; Brennan, Alan, University of Sheffield, School of Health and Related Research
Session: Evidence Synthesis to Inform Health, 14.00-14.15, Armitage Room
Keywords: Health Economics; Value of Information; Non-parametric Regression; Decision Theory; Computer Models
Abstract: Health economic models are used to estimate the expected net benefits of competing decision options. The true values of the input parameters of such models are rarely known with certainty, and it is often useful to quantify the value of undertaking further data collection in order to reduce uncertainty. An upper bound on the value of learning a subset of input parameters is quantified by the partial Expected Value of Perfect Information (EVPI). The value of a future trial or study is quantified by its Expected Value of Sample Information (EVSI). The standard approach to computing both partial EVPI and EVSI is via a nested two-level Monte Carlo scheme that includes, at each inner-loop step, both parameter sampling and model evaluation. This scheme can be prohibitively slow for complex models. Additional problems arise if the two-level Monte Carlo scheme results in an inner-loop conditional distribution that is difficult to sample from. This most commonly occurs when computing EVSI for a problem in which the parameter distribution is not conjugate to the data likelihood, but it can also occur when computing partial EVPI where parameters are correlated. In either case we typically need to resort to MCMC methods, implemented for example in WinBUGS. In practice, these difficulties have restricted Value of Information analyses to only a small subset of health economic evaluation studies. To overcome the problems above we present fast and efficient non-parametric regression based methods for computing partial EVPI and EVSI. The methods require only the "probabilistic sensitivity analysis" (PSA) sample: a single set of samples from the model parameters, along with the corresponding model evaluations. The new methods allow Value of Information measures to be computed for models of any complexity, and hence to be made more widely available to modellers and decision makers.
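A hedged sketch of the regression-based idea, using a generalized additive model as one possible non-parametric regression, is shown below; the PSA objects `nb` (an S x D matrix of net benefits) and `theta1` (the PSA draws of the parameter of interest) are assumed names, and the single-parameter smooth is a simplification.

```r
# Partial EVPI from a PSA sample by regressing net benefit on the parameter of interest.
library(mgcv)

partial_evpi <- function(nb, theta1) {
  # Fitted conditional expected net benefit of each decision option given theta1
  ghat <- sapply(seq_len(ncol(nb)), function(d)
    fitted(gam(nb[, d] ~ s(theta1))))
  mean(apply(ghat, 1, max)) - max(colMeans(ghat))
}

# Overall EVPI from the same PSA sample, for comparison
evpi <- function(nb) mean(apply(nb, 1, max)) - max(colMeans(nb))
```

The only inputs are the PSA sample itself and the model evaluations, which is the feature emphasised in the abstract; no further model runs or inner-loop sampling are required.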

Competing Risks meta-analysis
Authors: Siannis, Fotios (presenting), Department of Mathematics, University of Athens, Greece, [email protected]; Copas, Andrew, MRC Clinical Trials Unit, London; Tierney, Jayne, MRC Clinical Trials Unit, London
Evidence Synthesis to Inform Health | 14.15-14.30 | Armitage Room
Keywords: Meta analysis Competing Risks Sensitivity Analysis Multivariate meta analysis
Abstract: There is great interest in exploring how studies with competing risks outcomes can be brought together in order to provide summary estimates for various quantities of interest. Such quantities may be the hazard or the cumulative incidence of all risks investigated in the studies, or just of a specific one, estimated under certain assumptions and based on various modelling approaches. Meta-analysis can be quite complex if the studies included vary in the risks considered, the focus of the research, and the chosen analytical approach. Therefore, as a starting point we consider a simple scenario for the meta-analysis of competing risks data, as a first step towards the development of a more comprehensive approach to this statistical problem. We assume that we have individual participant data (IPD) from several studies with competing risks outcomes, all considering the same set of k risks. It is straightforward to estimate the observable hazard for each of the k risks, although additional modelling assumptions would be required for the estimation of the marginal hazard functions. Under the assumption of independence between the risks, the observable hazard is equivalent to the marginal one, though this assumption is probably unrealistic. Under Cox's proportional hazards model, pooling the sub-hazard ratios can be performed using multivariate meta-analysis techniques. Clearly, the associations between the risks are unknown. Hence we perform a sensitivity-type analysis to explore how the outcome of the meta-analysis can be affected by small levels of association between the risks studied. The extension to an alternative model for bivariate meta-analysis, where the within-study correlations are unknown, is also of interest. The proposed methodology is applied to data generated under several simulation scenarios, in order to explore its validity. Implementation on a real data set held by the meta-analysis group at MRC-CTU was also performed.
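
The abstract mentions pooling sub-hazard ratios with multivariate meta-analysis and a sensitivity analysis over the unknown associations between risks. The base-R sketch below illustrates that idea for a fixed-effect bivariate pooling of two log sub-hazard ratios per study under an assumed common within-study correlation rho; the estimates, standard errors and correlation values are invented for illustration, and this is not the authors' analysis.

    # Illustrative sensitivity analysis for a bivariate fixed-effect meta-analysis of
    # two log sub-hazard ratios per study, varying the assumed within-study correlation.
    y  <- rbind(c(0.25, 0.10), c(0.40, 0.05), c(0.15, 0.20))   # log sub-HRs for risks 1 and 2
    se <- rbind(c(0.10, 0.12), c(0.15, 0.14), c(0.12, 0.18))   # standard errors

    pool_bivariate <- function(y, se, rho) {
      # generalised least squares pooling with a common within-study correlation rho
      wsum <- matrix(0, 2, 2); wy <- c(0, 0)
      for (i in seq_len(nrow(y))) {
        S <- diag(se[i, ]) %*% matrix(c(1, rho, rho, 1), 2) %*% diag(se[i, ])
        W <- solve(S)
        wsum <- wsum + W
        wy   <- wy + W %*% y[i, ]
      }
      drop(solve(wsum) %*% wy)      # pooled log sub-hazard ratios
    }

    sapply(c(0, 0.2, 0.5), function(r) pool_bivariate(y, se, rho = r))

Varying rho over a plausible range and inspecting how the pooled estimates move is the essence of the sensitivity analysis described.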

How much of socioeconomic differences in breast cancer patient survival can be explained by differences in stage at diagnosis and treatment? – Applying causal mediation analysis to routine data
Authors: Li, Ruoran (presenting), London School of Hygiene & Tropical Medicine, [email protected]; Daniel, Rhian, London School of Hygiene & Tropical Medicine; Rachet, Bernard, London School of Hygiene & Tropical Medicine
Evidence Synthesis to Inform Health | 14.30-14.45 | Armitage Room
Keywords: Epidemiology Cancer Routine data Mediation Survival
Abstract: Substantial socioeconomic inequalities in breast cancer survival persist in England. The main contributing factors to these survival differences could be late presentation at more advanced stages and differential access to treatment. Information on 36,793 women diagnosed with breast cancer during 2000-2007 was routinely collected by an English population-based cancer registry. Surgical treatment information was retrieved from the Hospital Episode Statistics and dichotomised as major versus minor or no procedure. A deprivation category was allocated to each patient according to their area of residence at the time of diagnosis. G-computation procedures were used to estimate the proportion of the effect of deprivation on short-term survival mediated by stage and by treatment. Single stochastic imputation was incorporated in the g-computation procedures to handle missing stage (8%). Net survival from breast cancer differed between the most affluent and the most deprived patients by 3% at one year (97% versus 94%) and by 10% at five years (86% versus 76%) after diagnosis. More deprived patients had a more adverse stage distribution (p<0.001), and the more advanced the stage at diagnosis, the less likely the patient was to receive major surgical treatment (p<0.001). The most deprived patients were almost three times more likely to die within six months of diagnosis than the most affluent patients (OR: 2.77 [2.17-3.53]). One third of this short-term excess mortality was mediated by the adverse stage distribution, whilst none was mediated through differential surgical treatment. Based on virtually all breast cancer patients, our results show how much of the differential short-term cancer survival between deprivation groups is attributable to different contributory factors. Efforts to advance diagnosis are important, but would reduce the socioeconomic inequalities in cancer survival by only a third. We did not have reliable information with which to account for comorbidity, another potential mediator on the causal pathway.
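
As a hedged illustration of the g-computation approach to mediation described above, the base-R sketch below estimates the proportion of a binary exposure effect mediated by a single binary mediator by fitting a mediator model and an outcome model and then simulating counterfactual outcomes. The variables (deprived, stage, death6m) and the simulated data are hypothetical; the actual analysis used registry survival data, a multi-category stage variable and stochastic imputation for missing stage.

    # Illustrative g-computation sketch for the proportion of an exposure effect
    # mediated by a single binary mediator; variable names and data are hypothetical,
    # not the registry data used in the abstract.
    set.seed(42)
    n <- 10000
    deprived <- rbinom(n, 1, 0.3)
    stage    <- rbinom(n, 1, plogis(-1 + 0.8 * deprived))            # advanced stage
    death6m  <- rbinom(n, 1, plogis(-2 + 0.5 * deprived + 1.0 * stage))
    dat <- data.frame(deprived, stage, death6m)

    m_med <- glm(stage   ~ deprived,         family = binomial, data = dat)
    m_out <- glm(death6m ~ deprived + stage, family = binomial, data = dat)

    # Monte Carlo g-computation: simulate the mediator under a chosen exposure level,
    # then average the predicted outcome under another exposure level
    gcomp <- function(exposure, mediator_exposure) {
      p_stage   <- predict(m_med, newdata = data.frame(deprived = mediator_exposure),
                           type = "response")
      stage_sim <- rbinom(n, 1, p_stage)
      mean(predict(m_out, newdata = data.frame(deprived = exposure, stage = stage_sim),
                   type = "response"))
    }

    total    <- gcomp(1, 1) - gcomp(0, 0)   # total effect (risk difference)
    indirect <- gcomp(1, 1) - gcomp(1, 0)   # effect acting through stage
    indirect / total                         # proportion mediated by stage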

(Abstract title to be confirmed)
Authors: Birrell, Paul (presenting), MRC Biostatistics Unit, Cambridge, [email protected]; Wernisch, Lorenz, MRC Biostatistics Unit, Cambridge; De Angelis, Daniela, MRC Biostatistics Unit, Cambridge
Evidence Synthesis to Inform Health | 14.45-15.00 | Armitage Room
Keywords: real-time MCMC Sequential Monte Carlo Transmission modelling Influenza
Abstract: During the 2009 A/H1N1pdm outbreak, much attention was devoted to capturing the dynamics of the epidemic through real-time modelling. The goal was to provide up-to-the-moment assessments of the state of the epidemic at any time, as well as predictions of its future course, based upon streams of information updated at regular intervals. In the UK, existing and expanding surveillance consists of a multiplicity of data sources, typically noisy and providing only indirect evidence on the epidemic characteristics of interest. Models capable of reconstructing a pandemic on the basis of this type of information are therefore necessarily complex, as they need to link the unobserved transmission process to the intricate mechanisms generating the observed data (e.g. healthcare-seeking behaviour and reporting). As the volume and type of data expand, so does the model complexity and the attendant computational burden, limiting the capacity for real-time inference. This problem is exacerbated when multiple model runs are required to adapt the model to sudden changes in data patterns due, for example, to modifications in population behaviour following an intervention. Here we extend the modelling of Birrell et al. (2011), focussing on the capability to perform real-time inference. Originally, the model was implemented within the Bayesian statistical framework using Markov chain Monte Carlo (MCMC). The real-time utility of MCMC is limited by its requirement to consider all relevant data in their entirety each time the analysis is iterated. We investigate sequential methods for Bayesian analysis that form a hybrid of MCMC and particle filtering and that prove capable of drastically reducing the required computation time without losing the intrinsic accuracy of the "gold-standard" MCMC techniques. We illustrate this using both simulated data and data from the 2009 pandemic in England, highlighting
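
The abstract contrasts full MCMC re-runs with sequential methods that update inference as new data arrive. As a hedged toy illustration of the particle-filtering ingredient only (not the Birrell et al. transmission model, nor the proposed MCMC/SMC hybrid), the base-R sketch below reweights and resamples particles for a growth-rate parameter each time a new surveillance count becomes available.

    # Minimal bootstrap particle filter sketch for sequentially updating inference as
    # new surveillance counts arrive; the toy growth-rate model below is purely
    # illustrative and is not the Birrell et al. transmission model.
    set.seed(7)
    T_obs    <- 30
    true_inc <- 10 * exp(0.15 * (1:T_obs))                 # "true" incidence
    y        <- rpois(T_obs, true_inc * 0.1)               # noisily reported counts

    n_part <- 2000
    r      <- rnorm(n_part, 0.1, 0.05)                     # particles for the growth rate
    w      <- rep(1 / n_part, n_part)

    for (t in 1:T_obs) {
      inc_t <- 10 * exp(r * t)
      w <- w * dpois(y[t], lambda = inc_t * 0.1)           # reweight by the new observation
      w <- w / sum(w)
      if (1 / sum(w^2) < n_part / 2) {                     # resample when ESS drops
        idx <- sample.int(n_part, n_part, replace = TRUE, prob = w)
        r   <- r[idx] + rnorm(n_part, 0, 0.005)            # jitter to restore diversity
        w   <- rep(1 / n_part, n_part)
      }
    }
    c(posterior_mean_growth_rate = sum(w * r))

Because each update touches only the newest observation, the cost per time point stays roughly constant, which is the property that makes such schemes attractive for real-time use.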

Machine learning, genomic data and cytotoxicity prediction: The DREAM Toxicogenetics Challenge
Authors: Savage, Richard (presenting), University of Warwick, [email protected]; Moore, Jonathon, University of Warwick
Evidence Synthesis to Inform Health | 13.30-13.48 | Armitage Room
Keywords: Toxicogenetics Genomics Machine learning Data science Cytotoxicity
Abstract: The cytotoxicity of a given chemical compound depends on the nature of the cells being exposed, with differences in the genomic make-up of a cell influencing the level of toxic response. If we can predict this toxic response on the basis of genomic and other information, goals such as stratified medicine come one step closer. The DREAM Toxicogenetics Challenge was an open competition to predict cytotoxic response to a range of unknown chemical compounds. These compounds were applied to cell lines from the 1000 Genomes Project, and the challenge was to accurately predict the EC10 toxic response in a subset of these cell lines using SNP data and a small number of other covariates. I will talk about what we learned from this challenge, and how our team (WarwickDataScience) topped the challenge leaderboard.
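
The abstract does not describe the team's winning method, so the sketch below is only a generic penalised-regression baseline for the stated task of predicting a continuous toxicity response from SNP data; the simulated genotypes and response are hypothetical, and this is not the WarwickDataScience entry.

    # Generic penalised-regression baseline for predicting a continuous toxicity
    # response (e.g. an EC10-like value) from a SNP matrix; hypothetical sketch only.
    library(glmnet)

    set.seed(3)
    n_cells <- 200; n_snps <- 1000
    X <- matrix(rbinom(n_cells * n_snps, 2, 0.3), n_cells, n_snps)   # 0/1/2 genotype dosages
    beta <- c(rnorm(20, 0, 0.3), rep(0, n_snps - 20))                # a few causal SNPs
    y <- drop(X %*% beta) + rnorm(n_cells)                           # simulated response

    # Cross-validated elastic net; alpha mixes ridge (0) and lasso (1) penalties
    fit  <- cv.glmnet(X, y, alpha = 0.5)
    pred <- predict(fit, newx = X, s = "lambda.min")
    cor(pred, y)^2                                                   # in-sample R^2 (optimistic)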

Random-effects models incorporating genomic relationships between individuals
Authors: Zhao, Jing Hua (presenting), MRC Epidemiology Unit, University of Cambridge, [email protected]; Luan, Jian’an, MRC Epidemiology Unit, University of Cambridge; Sharp, Stephen, MRC Epidemiology Unit, University of Cambridge
Statistical Genomics | 13.48-14.06 | Bowett Room
Keywords: random effects model genomic relationship meta-analysis longitudinal data analysis
Abstract: In contrast to the analysis of family data, where relationships among related individuals are usually taken into account, the use of relationships among individuals derived from genomic data in the analysis of population-based studies is relatively recent (Yang et al. Nat Genet 2010;42:565-9; Zhao & Luan. J Prob Stat 2012; http://dx.doi.org/10.1155/2012/485174). Through a series of analyses of body size and type 2 diabetes, we investigated the impact of incorporating a relationship matrix derived from large-scale genomic data on the results from random-effects models fitted to data from both family and population-based studies. The population-based studies were the EPIC-Norfolk case-cohort study, comprising 2,417 subcohort and 1,305 obese individuals with data from the Affymetrix 500K genechip; the British National Survey of Health and Development study, comprising 2,452 individuals with repeated measures at eleven time points and genetic data from Metabochip; and the EPIC-InterAct study, with 4,624 type 2 diabetes cases and 4,666 non-cases from seven European countries and genetic data from the Illumina 660W-Quad genechip. Family-based data came from the Framingham Heart Study, comprising 6,848 individuals with genetic data from an Affymetrix 500K genechip. We estimated genotype-phenotype associations (using SNPs for body mass index identified by the GIANT consortium; Locke et al., in preparation) and heritabilities, incorporating relationships based on family structure (in the family-based study) and derived from genomic data (in the population-based studies and, where appropriate, the family-based study). We also assessed potential secular trends in heritabilities and variation in heritability estimation using both individual participant data and summary data from meta-analysis. We have developed R functions to facilitate these analyses. We discuss the implications of our findings for future analyses, in particular with respect to meta-analysis.
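
The abstract describes incorporating a relationship matrix derived from genomic data into random-effects models. As a hedged sketch, the base-R code below constructs a genomic relationship matrix (GRM) from 0/1/2 genotype dosages using the standard standardisation A = WW'/m of Yang et al. (2010), which the abstract cites; the simulated genotypes are purely illustrative.

    # Illustrative construction of a genomic relationship matrix (GRM) from 0/1/2
    # genotype dosages, following the usual standardisation A = W W' / m
    # (cf. Yang et al. 2010); the genotype matrix here is simulated.
    set.seed(11)
    n_ind <- 100; n_snp <- 5000
    p <- runif(n_snp, 0.05, 0.5)                           # allele frequencies
    G <- sapply(p, function(pj) rbinom(n_ind, 2, pj))      # n_ind x n_snp dosage matrix

    # standardise each SNP: w_ij = (g_ij - 2 p_j) / sqrt(2 p_j (1 - p_j))
    phat <- colMeans(G) / 2
    keep <- phat > 0 & phat < 1                            # drop monomorphic SNPs
    G <- G[, keep]; phat <- phat[keep]
    W <- sweep(G, 2, 2 * phat, "-")
    W <- sweep(W, 2, sqrt(2 * phat * (1 - phat)), "/")
    A <- tcrossprod(W) / ncol(W)                           # GRM: diagonal ~ 1, off-diagonal ~ 0

    summary(diag(A))
    summary(A[upper.tri(A)])
    # A could then enter a random-effects model as the covariance of a polygenic effect,
    # via any variance-components software that accepts an arbitrary relationship matrix.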

Genetic Reconstruction of Large Pedigrees
Authors: Sheehan, Nuala (presenting), University of Leicester, [email protected]; Bartlett, Mark, University of York; Cussens, James, University of York
Statistical Genomics | 14.06-14.24 | Bowett Room
Keywords: Pedigrees Constraints Integer Programming Maximum Likelihood
Abstract: The problem of estimating relationships amongst a group of individuals from genetic marker data (‘pedigree reconstruction’) is of interest in many diverse areas. In epidemiological research, large population biobanks of unrelated individuals have been highly successful in detecting common genetic variants affecting diseases of public health concern but lack the statistical power to detect more modest gene-gene and gene-environment interaction effects or the effects of rare variants, for which related individuals are ideally required. Most large population studies will undoubtedly contain sets of undeclared relatives, or pedigrees. Relatives are more likely to share longer haplotypes around disease susceptibility loci and are hence biologically more informative for rare variants than unrelated cases and controls. Moreover, the identification of relatives enables appropriate adjustments of statistical analyses that typically assume unrelatedness. In theory, estimating the pedigree for a given set of individuals from genetic marker data requires consideration of all possible relationships amongst them and computing the likelihood for each. For any reasonably sized problem, brute force enumeration is clearly impractical. The reconstruction problem can be formulated as a problem of Bayesian network (BN) learning or, more generally, graphical structure estimation, and is known to be NP-hard. The desired graph has structural constraints, so maximum likelihood pedigree reconstruction can be viewed as a constrained optimisation problem. We propose an integer linear programming (ILP) approach which is adapted to find valid pedigrees by imposing appropriate constraints. Our method, unlike others, is not restricted to small pedigrees and is guaranteed to return a maximum likelihood pedigree. With additional constraints, we can also search for multiple high probability pedigrees and thus account for the inherent uncertainty in any particular pedigree reconstruction. The method performs well in a straightforward situation and extensions to more complex problems seem feasible. (Work funded by: MRC Project Grant G1002312)
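
To show the flavour of casting maximum-likelihood pedigree reconstruction as a constrained optimisation, the toy below uses the lpSolve package to choose one candidate parent assignment per child so that the total log-likelihood is maximised. It deliberately omits the structural constraints (sex consistency, no directed cycles, consistency across children) that the authors' integer linear programming formulation handles, and the log-likelihood values are invented.

    # Toy integer-programming illustration: pick one candidate parent assignment per child
    # so that the total log-likelihood is maximised. Not the authors' ILP formulation;
    # real pedigree reconstruction adds acyclicity and sex-consistency constraints.
    library(lpSolve)

    # Hypothetical log-likelihoods of three candidate parent assignments for each of two
    # children (the third candidate being "no declared parents").
    ll <- c(-2.1, -0.5, -3.0,    # child A: candidates 1-3
            -1.2, -2.7, -0.8)    # child B: candidates 1-3

    # One constraint per child: exactly one candidate assignment is chosen.
    const_mat <- rbind(c(1, 1, 1, 0, 0, 0),
                       c(0, 0, 0, 1, 1, 1))

    sol <- lp(direction = "max", objective.in = ll,
              const.mat = const_mat, const.dir = c("=", "="),
              const.rhs = c(1, 1), all.bin = TRUE)

    matrix(sol$solution, nrow = 2, byrow = TRUE)   # selected candidate per child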

Analyzing gene expression data with matrix-variate distributions: Estimation and Hypothesis Testing
Authors: Touloumis, Anestis (presenting), CRUK CI, University of Cambridge, [email protected]; Tavaré, Simon, CRUK CI, University of Cambridge; Marioni, John, EMBL-European Bioinformatics Institute
Statistical Genomics | 14.24-14.42 | Bowett Room
Keywords: Gene Expression High-Dimensional Estimation Hypothesis Testing
Abstract: In genetics, there are studies in which it is critical to record the data for each subject in a matrix in which both the rows and the columns correspond to features of interest; in this way we avoid losing important aspects of the data's interpretability. For example, suppose that for each subject gene expression levels are measured in different tissue samples. For each subject, we can record the RNA expression levels in a matrix whose row variables correspond to genes and whose column variables correspond to the different tissue samples. Two possible questions of interest are: (i) what, if any, is the dependence structure of the genes and of the different tissue samples, and (ii) how do the gene expression levels vary across the different tissue samples? In this talk, we present shrinkage estimators for the covariance matrix of the genes and of the tissue samples, assuming a matrix-variate normal model. We then relax the normality assumption and propose nonparametric tests for the sphericity and identity hypotheses on the gene or tissue-sample covariance matrix. Further, we relax the Kronecker product assumption for the dependence structure and propose nonparametric tests to assess the mean relationship between the genes and the different tissue samples. We illustrate the above by analysing two empirical studies: a glioblastoma cancer study and the mouse aging atlas project.
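
The talk concerns estimating the gene and tissue covariance matrices of matrix-valued expression data. As a hedged illustration of the matrix-variate normal set-up (using the classical "flip-flop" maximum-likelihood iteration rather than the shrinkage estimators and nonparametric tests the authors propose), the base-R sketch below alternates between updating the row (gene) and column (tissue) covariance estimates from simulated data.

    # Minimal "flip-flop" sketch for estimating row (gene) and column (tissue) covariance
    # matrices under a matrix-variate normal model. Illustration only; the authors' work
    # uses shrinkage estimators and nonparametric tests suited to high dimensions.
    set.seed(5)
    n <- 50; r <- 8; c <- 4                       # subjects, genes, tissues
    X <- replicate(n, matrix(rnorm(r * c), r, c), simplify = FALSE)
    Xbar <- Reduce(`+`, X) / n
    Xc <- lapply(X, function(x) x - Xbar)         # centre each data matrix

    Sigma_row <- diag(r); Sigma_col <- diag(c)
    for (iter in 1:20) {
      Sigma_row <- Reduce(`+`, lapply(Xc, function(x) x %*% solve(Sigma_col) %*% t(x))) / (n * c)
      Sigma_col <- Reduce(`+`, lapply(Xc, function(x) t(x) %*% solve(Sigma_row) %*% x)) / (n * r)
    }
    # Note: the two matrices are identified only up to a common scale factor.
    round(Sigma_row[1:3, 1:3], 2)                 # estimated gene covariance (corner)
    round(Sigma_col, 2)                           # estimated tissue covariance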

Balancing robustness and predictive performance in biomarker selection, in application to proteomics
Authors: Lewin, Alex (presenting), Imperial College London, [email protected]; Kirk, Paul, Imperial College London
Statistical Genomics | 14.42-15.00 | Bowett Room
Keywords: Proteomics variable selection
Abstract: One of the most common uses for high-throughput genomics data is biomarker selection: using classification and prediction models to find a small number of biomarkers associated with a particular disease or other outcome. A major concern in feature selection, in particular for gene expression and proteomics data, is the reproducibility of results. Many studies have found very low reproducibility of selected biomarkers despite good predictive performance (Ein-Dor et al. 2005). Recently, several papers have explored methods which use stability selection to pick a robust set of biomarkers (He and Yu 2010). Stability alone is not enough, however, as predictive ability can suffer. We explore a number of variable selection strategies which combine stability selection with predictive performance to provide robust biomarker sets. A second, very common, issue is how to deal with correlated biomarkers: correlations amongst variables can lead to highly predictive but very unstable selections. We use regression methods with the elastic net penalty (Zou and Hastie 2005), which allows us to control whether single variables or correlated sets are selected. We illustrate the approach on simulated data sets, showing that including stability reduces false positive rates, and apply the method to identify robust biomarkers in proteomics data. Reference: Kirk P, Witkover A, Bangham CR, Richardson S, Lewin AM, Stumpf MP (2013). Balancing the robustness and predictive performance of biomarkers. J Comput Biol, 2013 Aug 2.
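
To illustrate the combination of stability selection with an elastic net penalty described above, the sketch below repeatedly subsamples the data, fits a cross-validated elastic net with glmnet, and records how often each variable is selected; the simulated data, subsampling scheme and 0.6 threshold are illustrative choices, not the authors' tuning.

    # Hedged sketch of stability selection with an elastic net penalty: repeatedly
    # subsample the data, fit glmnet, and record how often each variable is selected.
    # Simulated data; not the authors' code or tuning choices.
    library(glmnet)

    set.seed(9)
    n <- 120; p <- 300
    X <- matrix(rnorm(n * p), n, p)
    y <- rbinom(n, 1, plogis(X[, 1] + X[, 2] - X[, 3]))

    n_sub <- 50
    sel_count <- numeric(p)
    for (b in 1:n_sub) {
      idx <- sample(n, floor(n / 2))                     # subsample half the observations
      fit <- cv.glmnet(X[idx, ], y[idx], family = "binomial", alpha = 0.5)
      beta_hat <- as.numeric(coef(fit, s = "lambda.1se"))[-1]   # drop the intercept
      sel_count <- sel_count + (beta_hat != 0)
    }
    sel_freq <- sel_count / n_sub
    which(sel_freq >= 0.6)                               # "stable" biomarkers

Lowering alpha towards the ridge end of the elastic net encourages correlated variables to enter together, which is one way of trading stability against sparsity in this kind of scheme.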