How Big Data and Causal Inference Work
Together in Health Policy
Lisa Lix, University of Manitoba
Thematic Program on Statistical Inference, Learning, and Models for Big Data
June 12, 2015
Presentation Overview
• Background, purpose and objectives of the Health
Policy Workshop
• Speakers and topics
• Comments from participants: What was most
valuable?
• Conclusions – Research and training opportunities
Major Inferential Challenges
• Assessment of sampling biases
• Inference about tails
• Resampling inference
• Change point detection
• Reproducibility of analyses
• Causal inference for observational data
• Efficient inference for temporal streams
Manitoba Centre for Health Policy Data
Repository
Population- Based Health
Registry
Social Housing
Education
Healthy Child MB
Immunization
Medical Services
Lab
Nursing Home
Clinical
Provider Vital
Statistics
ER
Health Links
Home Care
Pharmaceuticals
Hospital
Family Services
Income Assistance
Census
Data at
DA/EA Level
• Family First
• Healthy Baby
• EDI
• ICU
• FASD
• Pediatric
Diabetes
CancerCare
Registry
Characteristics of Big Data in Health
Policy
• Diversity of linked databases
– Unstructured data
• Physician notes, laboratory test results
– Genomics, imaging, longitudinal data
• Population-based data vs. clinical registry
data (wide/broad versus deep/narrow)
• Privacy-conscious analysis environments
Workshop Rationale
• A massive numbers of variable associations can
be explored in big healthcare databases
• Causality is hard to establish and is a vague and
poorly specified construct
• Traditional approaches for causal inference, such
as regression adjustment and stratification, have
limitations in big data environments
Workshop Context
• Theoretical frameworks, study designs and
analytic methods that allow causality to be
inferred from large, observational data.
• What is challenging and unique about
exploring causality in big data settings?
• Examples of large-scale investigations in
which causal inferences are being explored.
Workshop Objectives
• To provide a forum for health policy
analysts/program staff to engage with
statisticians to discuss research challenges
and opportunities;
• To serve as a catalyst for exploring research
collaborations;
• To expose participants to innovations in study
design techniques and methods for health
policy research
Overview of Topics
• Day 1: Opening panel session; data quality; graphical methods
for causal inference
• Day 2: Pragmatic trials; causality in comparative effectiveness
research; healthcare system applications; microsimulation
modeling
• Day 3: Propensity score models; drug safety and effectiveness;
diagnostic testing
• Day 4: marginal structural models; external validity of
observational studies and clinical trials; models to test for
periodicity and trend effects
• Day 5: medical device safety and effectiveness; cautions when
working in big data environments
Opening Panel
• Speakers
– Michael Schull, Institute for Clinical Evaluative Sciences
(ICES)
– Arlene Ash, University of Massachusetts Medical School
– Mark Smith, Manitoba Centre for Health Policy (MCHP)
– Therese Stukel, University of Toronto & ICES
• Overview of data repositories at ICES, MCHP, and in
other jurisdictions
Opening Panel
• Challenges
– Data privacy & confidentiality
– Differences in data structures (even for common sources)
– Siloes: government, academia, industry
• Opportunities
– Emphasis on evidence to inform decision making
– Skill development amongst researchers & analysts
– Increased emphasis on the value of cross-disciplinary
research
Session on Data Quality
• Speakers
– Mark Smith, Manitoba Centre for Health Policy
– Mahmoud Azimaee, Institute for Clinical Evaluative Sciences
• Topics
– Data quality frameworks
– Automated graphical, and inferential techniques for data
quality evaluation
– Data scale issues
Workshops
• Peter Austin, University of Toronto
– Propensity score methods for estimating treatment
effects using observational data
– The propensity score is the probability of treatment
assignment conditional on observed baseline covariates
– Four different methods of using the propensity score were
discussed: matching, weighting, stratification, and covariate
adjustment
– Emphasized the role of the propensity score as a balancing
score
Workshops
• Erica Moodie, McGill University
– Marginal structural models
– Produce semi-parametric estimates that adjust for time-
dependent confounding in observational studies about the
effect of time-varying treatments on binary outcomes
– Assumptions required for model identification were
discussed
– Three approaches to estimation were presented: inverse
probability weighting, g-computation, and g-estimation
Other Speaker Topics
• Jonas Peters, Max Planck Institute for Intelligent
Systems
– Three Ideas for Causal Inference
– Ideas that aim at solving the problem of causal inference in
observational data:
• additive noise models: assume that the involved
functions are of a particularly simple form
• constraint-based methods: relate conditional
independences in the distribution with a graphical
criterion called d-separation
• invariant prediction: makes use of observing the data
generating process in different "environments"
Other Speaker Topics
• Patrick Heagerty, University of Washington
– Pragmatic Trials and the Learning Health Care System
– Stepped wedge cluster randomised trial
• increasingly being used in the evaluation of service delivery type
interventions
• design involves random and sequential crossover of clusters from
control to intervention until all clusters are exposed.
• use is on the increase: HIV, cancer treatment, healthcare associated
infections, healthcare treatments
• well suited to evaluations that do not rely on individual patient
recruitment
Common Trial Designs
Stepped Wedge Design
Other Speaker Topics
• Elizabeth Stuart, Johns Hopkins University
– Using big data to estimate population treatment effects
– Analysis methods for improved external validity
• Goal: to make statements about the likely effects of a
treatment in the target population
• Assessing and enhancing external validity with respect to
the characteristics of trial and population subjects
• Some approaches: meta-analysis, cross-design
synthesis, reweighting (reweight trial members to full
population using inverse probability of participation
weights)
Applications
• Xiochun Li, Indiana University
– EMR² : Evidence Mining Research in Electronic Medical
Records Towards Better Patient Care
– Indianapolis Network for Patient Care
– Created in 1995
– Houses clinical data from over 80 hospitals, public health
departments, local laboratories and imaging centers, surgical
centers, and a few large-group practices closely tied to
hospital systems, for approximately 13.4 million unique
patients
– Data are being used for comparative effectiveness and
pharmaco-epidemiology research
Applications
• Danica Marinac-Dabic, Center for Devices and
Radiological Health, Food and Drug Administration,
USA
– MDEpiNet: Strengthening Medical Device Ecosystem for
Surveillance and Innovation
– Public-private partnership that provides leadership in
innovative data source development and analytic
methodologies for implementation of medical device
research and surveillance to enhance patient- centered
outcomes
24
National Medical Device Postmarket
Surveillance Plan
Applications
• Dan Chateau, University of Manitoba
• Implementing a research program using data from multiple
jurisdictions: The Canadian Network for Observational Drug
Effect Studies (CNODES)
Canadian Network for Observational Drug
Effect Studies (CNODES)
• Network of over 60 Canadian pharmacoepidemiologists,
biostatisticians, clinicians, clinical pharmacologists,
pharmacists, IT professionals, data analysts, and students
using linked administrative data in 7 provinces plus CPRD
and US data.
• Timely responses to queries from Canadian public
stakeholders regarding drug safety and effectiveness
• Funded by the Canadian Institutes of Health Research (CIHR)
through the government’s Drug Safety and Effectiveness
Network (DSEN) program to create CNODES, in March 2011
for five years.
Typical CNODES Study
• Query (potential safety signal) from Health Canada • Review and feasibility studies • Distributed network
» Up to 9 sites (7 provinces + GPRD + US MarketScan)
• Common protocol • Different data structures/availability
Typically > 500 potential (but non-specific) confounders
• Meta-analysis combines results across studies • Methods team: provides statistical/epidemiological
expertise across projects
1 -
2 -
C. Blais, S. Jean, C. Sirois, L. Rochette, C. Plante, I.
Larocque, M. Doucet, G. Ruel, M. Simard, P. Gamache,
D. Hamel, D. St-Laurent, V. E´mond, Quebec Integrated
Chronic Disease Surveillance System (QICDSS), an
innovative approach. Chronic Diseases and Injuries in
Canada, 2014:34(4):226-235.
Quebec Integrated Chronic
Disease Surveillance System
Participant Comments
• The challenges of working with big data requires an
interdisciplinary and collaborative approach
• Significant investments of time, expertise, and
resources are needed to work with large datasets
• Important to have a grounding in statistical theory:
e.g., structural models
• Uncertainty: in data acquisition, data quality, and
modeling
• Big data can be blended or combined in creative
ways to address policy-relevant problems
Conclusions: An Era of “Data-Centric
Science”
• Requires new paradigms that address how data are
captured, processed, discovered, exchanged,
distributed, analyzed
• Traditional methods have largely focused on analysts
being able to develop and analyze data within their
own computing environment(s)
• The distributed and heterogeneous nature of large
databases provides substantial challenges
Conclusions: Training Opportunities
• Real-world simulations
– Observational Medical Outcomes
Partnership/Observational Health Data
Sciences
• Distributed data models
– Increased emphasis on standardized
protocols (e.g., common data model)
• Model development, selection,
evaluation
Thanks to Planning Committee
• Constantine Gatsonis
• Thérèse Stukel
• Sharon-Lise Normand
• Special thanks to Nancy Reid for advice
and guidance