
How Big Data and Causal Inference Work Together in Health Policy

Lisa Lix, University of Manitoba

Thematic Program on Statistical Inference, Learning, and Models for Big Data

June 12, 2015

Presentation Overview

• Background, purpose and objectives of the Health Policy Workshop

• Speakers and topics

• Comments from participants: What was most valuable?

• Conclusions – Research and training opportunities

Major Inferential Challenges

• Assessment of sampling biases

• Inference about tails

• Resampling inference

• Change point detection

• Reproducibility of analyses

• Causal inference for observational data

• Efficient inference for temporal streams

Manitoba Centre for Health Policy Data Repository

[Figure: data repository diagram. Labels: Population-Based Health Registry; Social Housing; Education; Healthy Child MB; Immunization; Medical Services; Lab; Nursing Home; Clinical; Provider; Vital Statistics; ER; Health Links; Home Care; Pharmaceuticals; Hospital; Family Services; Income Assistance; Census Data at DA/EA Level; CancerCare Registry. Bulleted labels: Family First, Healthy Baby, EDI, ICU, FASD, Pediatric Diabetes.]

Characteristics of Big Data in Health Policy

• Diversity of linked databases

– Unstructured data

• Physician notes, laboratory test results

– Genomics, imaging, longitudinal data

• Population-based data vs. clinical registry data (wide/broad versus deep/narrow)

• Privacy-conscious analysis environments

Workshop Rationale

• A massive number of variable associations can be explored in big healthcare databases

• Causality is hard to establish and is a vague and poorly specified construct

• Traditional approaches for causal inference, such as regression adjustment and stratification, have limitations in big data environments

Workshop Context

• Theoretical frameworks, study designs and analytic methods that allow causality to be inferred from large, observational data.

• What is challenging and unique about exploring causality in big data settings?

• Examples of large-scale investigations in which causal inferences are being explored.

Workshop Objectives

• To provide a forum for health policy analysts/program staff to engage with statisticians to discuss research challenges and opportunities;

• To serve as a catalyst for exploring research collaborations;

• To expose participants to innovations in study design techniques and methods for health policy research

Overview of Topics

• Day 1: Opening panel session; data quality; graphical methods for causal inference

• Day 2: Pragmatic trials; causality in comparative effectiveness research; healthcare system applications; microsimulation modeling

• Day 3: Propensity score models; drug safety and effectiveness; diagnostic testing

• Day 4: Marginal structural models; external validity of observational studies and clinical trials; models to test for periodicity and trend effects

• Day 5: Medical device safety and effectiveness; cautions when working in big data environments

Opening Panel

• Speakers

– Michael Schull, Institute for Clinical Evaluative Sciences (ICES)

– Arlene Ash, University of Massachusetts Medical School

– Mark Smith, Manitoba Centre for Health Policy (MCHP)

– Thérèse Stukel, University of Toronto & ICES

• Overview of data repositories at ICES, MCHP, and in other jurisdictions

Opening Panel

• Challenges

– Data privacy & confidentiality

– Differences in data structures (even for common sources)

– Siloes: government, academia, industry

• Opportunities

– Emphasis on evidence to inform decision making

– Skill development amongst researchers & analysts

– Increased emphasis on the value of cross-disciplinary research

Session on Data Quality

• Speakers

– Mark Smith, Manitoba Centre for Health Policy

– Mahmoud Azimaee, Institute for Clinical Evaluative Sciences

• Topics

– Data quality frameworks

– Automated graphical and inferential techniques for data quality evaluation

– Data scale issues

Workshops

• Peter Austin, University of Toronto

– Propensity score methods for estimating treatment effects using observational data

– The propensity score is the probability of treatment assignment conditional on observed baseline covariates

– Four different methods of using the propensity score were discussed: matching, weighting, stratification, and covariate adjustment

– Emphasized the role of the propensity score as a balancing score
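As an illustration of the weighting approach, the minimal Python sketch below estimates the propensity score e(x) = P(T = 1 | X = x) as the treated fraction within each level of a single discrete covariate, then forms an inverse-probability-weighted estimate of the average treatment effect. The toy data, the function names, and the stratum-frequency estimator are assumptions made for this example and are not taken from the workshop; in practice a fitted model such as logistic regression would usually replace the frequency estimate.

```python
from collections import defaultdict


def propensity_by_stratum(rows):
    """Estimate e(x) = P(T=1 | X=x) as the treated fraction in each stratum of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number total]
    for x, t, _y in rows:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


def ipw_ate(rows):
    """Inverse-probability-weighted estimate of the average treatment effect.

    Requires 0 < e(x) < 1 in every stratum, i.e. both treated and
    untreated units must be present at each level of x.
    """
    e = propensity_by_stratum(rows)
    n = len(rows)
    mean_treated = sum(t * y / e[x] for x, t, y in rows) / n
    mean_control = sum((1 - t) * y / (1 - e[x]) for x, t, y in rows) / n
    return mean_treated - mean_control


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and untreated units."""
    treated = [y for _x, t, y in rows if t == 1]
    control = [y for _x, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical toy data: (covariate x, treatment t, outcome y).
# Within each stratum of x the treated outcome exceeds the control outcome by 2.
ROWS = [
    (0, 1, 3), (0, 0, 1), (0, 0, 1), (0, 0, 1),
    (1, 1, 5), (1, 1, 5), (1, 1, 5), (1, 0, 3),
]
```

In this toy data, naive_difference(ROWS) is 3.0 because treated units are concentrated in the stratum with higher outcomes, whereas ipw_ate(ROWS) is approximately 2, the within-stratum effect.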

Workshops

• Erica Moodie, McGill University

– Marginal structural models

– Produce semi-parametric estimates that adjust for time-dependent confounding in observational studies about the effect of time-varying treatments on binary outcomes

– Assumptions required for model identification were discussed

– Three approaches to estimation were presented: inverse probability weighting, g-computation, and g-estimation
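For the inverse probability weighting approach, one common construction is the stabilized weight: for each subject, the product over time points of the probability of the observed treatment given treatment history alone, divided by the same probability given treatment history and time-varying covariates. The sketch below only shows that product. The function name and the input probabilities are hypothetical; in a real analysis both sets of probabilities would come from fitted treatment models, and the weights would then be used in a weighted outcome regression.

```python
def stabilized_weight(treatments, p_numerator, p_denominator):
    """Stabilized inverse probability of treatment weight for one subject.

    treatments:     observed treatment (0 or 1) at each time point
    p_numerator:    P(A_k = 1 | treatment history) at each time point
    p_denominator:  P(A_k = 1 | treatment history, covariate history) at each time point
    """
    if not (len(treatments) == len(p_numerator) == len(p_denominator)):
        raise ValueError("all inputs must have one entry per time point")
    weight = 1.0
    for a, p_num, p_den in zip(treatments, p_numerator, p_denominator):
        # Probability of the treatment actually received under each model.
        num = p_num if a == 1 else 1.0 - p_num
        den = p_den if a == 1 else 1.0 - p_den
        if den <= 0.0:
            raise ValueError("positivity violated: observed treatment has probability 0")
        weight *= num / den
    return weight


# Hypothetical subject: treated at time 0, untreated at time 1.
example_weight = stabilized_weight([1, 0], [0.5, 0.5], [0.8, 0.4])
```

When covariates carry no information about treatment (numerator and denominator probabilities coincide), every factor equals 1 and the weight is 1, so the weighted analysis reduces to the unweighted one.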

Other Speaker Topics

• Jonas Peters, Max Planck Institute for Intelligent Systems

– Three Ideas for Causal Inference

– Ideas that aim at solving the problem of causal inference in observational data:

• additive noise models: assume that the involved functions are of a particularly simple form

• constraint-based methods: relate conditional independences in the distribution to a graphical criterion called d-separation

• invariant prediction: makes use of observing the data generating process in different "environments"
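To make the link between d-separation and conditional independence concrete, the sketch below simulates a causal chain X → Z → Y, in which Z d-separates X from Y. X and Y are then marginally correlated but have a partial correlation near zero given Z. The simulation, its coefficients, and the function names are assumptions chosen for illustration, not material from the talk; partial correlation is used as the independence measure because it coincides with conditional independence only for jointly Gaussian variables such as these.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Partial correlation of x and y given a single conditioning variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_chain(n=5000, seed=0):
    """Draw n samples from the linear Gaussian chain X -> Z -> Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [xi + rng.gauss(0, 1) for xi in x]  # Z depends on X only
    y = [zi + rng.gauss(0, 1) for zi in z]  # Y depends on Z only
    return x, z, y


x, z, y = simulate_chain()
marginal = pearson(x, y)             # clearly non-zero: X and Y are dependent
conditional = partial_corr(x, y, z)  # close to zero: X and Y independent given Z
```

A constraint-based method runs this logic in reverse: it tests many such conditional independences in the data and keeps only the graphs whose d-separation statements match them.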


• Patrick Heagerty, University of Washington

– Pragmatic Trials and the Learning Health Care System

– Stepped wedge cluster randomised trial

• increasingly being used in the evaluation of service delivery type interventions

• design involves random and sequential crossover of clusters from control to intervention until all clusters are exposed.

• applications are growing in HIV, cancer treatment, healthcare-associated infections, and other healthcare treatments

• well suited to evaluations that do not rely on individual patient recruitment
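The crossover pattern of a stepped wedge design can be written down as a cluster-by-period exposure schedule. The sketch below randomizes the order of clusters and lets them cross from control (0) to intervention (1) in sequence, so that no cluster is exposed in the first period and all are exposed in the last. The function name, the equal-sized steps, and the example sizes are assumptions for illustration only.

```python
import random


def stepped_wedge_schedule(n_clusters, n_steps, seed=0):
    """Return {cluster: [exposure per period]} for a stepped wedge design.

    There are n_steps + 1 periods: a baseline period in which every cluster
    is in the control condition, followed by n_steps crossover steps.
    """
    if n_clusters < 1 or n_steps < 1:
        raise ValueError("need at least one cluster and one step")
    rng = random.Random(seed)
    order = list(range(n_clusters))
    rng.shuffle(order)  # random allocation of clusters to crossover steps
    schedule = {}
    for position, cluster in enumerate(order):
        # Period (1..n_steps) at which this cluster starts the intervention.
        crossover = 1 + (position * n_steps) // n_clusters
        schedule[cluster] = [1 if period >= crossover else 0
                             for period in range(n_steps + 1)]
    return schedule


# Hypothetical example: 6 clusters crossing over in 3 steps (4 periods).
schedule = stepped_wedge_schedule(n_clusters=6, n_steps=3)
for cluster in sorted(schedule):
    print(cluster, schedule[cluster])
```

Each printed row is one cluster's exposure over time; read together, the rows form the characteristic staircase, with two clusters switching at each step in this example.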

Common Trial Designs

Stepped Wedge Design


• Elizabeth Stuart, Johns Hopkins University

– Using big data to estimate population treatment effects

– Analysis methods for improved external validity

• Goal: to make statements about the likely effects of a treatment in the target population

• Assessing and enhancing external validity with respect to the characteristics of trial and population subjects

• Some approaches: meta-analysis, cross-design synthesis, reweighting (reweight trial members to full population using inverse probability of participation weights)
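The reweighting idea can be sketched with a single stratifying characteristic. Each trial member receives a weight equal to the inverse of the probability of participating given their stratum, estimated here as the population count divided by the trial count in that stratum; a weighted difference in means then targets the population effect. The data, the stratum labels, and the function names are hypothetical, and a real analysis would typically model participation on many covariates.

```python
from collections import Counter


def participation_weights(trial, population_counts):
    """Weight per stratum: population size / trial size = 1 / P(in trial | stratum)."""
    trial_counts = Counter(stratum for stratum, _t, _y in trial)
    return {s: population_counts[s] / n for s, n in trial_counts.items()}


def weighted_mean(values_and_weights):
    """Weighted average of (value, weight) pairs."""
    total = sum(w for _v, w in values_and_weights)
    return sum(v * w for v, w in values_and_weights) / total


def population_effect(trial, population_counts):
    """Treatment effect in the trial, reweighted to the target population."""
    w = participation_weights(trial, population_counts)
    treated = [(y, w[s]) for s, t, y in trial if t == 1]
    control = [(y, w[s]) for s, t, y in trial if t == 0]
    return weighted_mean(treated) - weighted_mean(control)


def trial_effect(trial):
    """Unweighted difference in means among trial members."""
    treated = [(y, 1.0) for _s, t, y in trial if t == 1]
    control = [(y, 1.0) for _s, t, y in trial if t == 0]
    return weighted_mean(treated) - weighted_mean(control)


# Hypothetical trial records: (stratum, treatment, outcome).
# The effect is 1 among "young" and 3 among "old"; the trial over-represents "young".
TRIAL = [
    ("young", 1, 2), ("young", 1, 2), ("young", 1, 2),
    ("young", 0, 1), ("young", 0, 1), ("young", 0, 1),
    ("old", 1, 4), ("old", 0, 1),
]
# Hypothetical target population with equal numbers in each stratum.
POPULATION = {"young": 500, "old": 500}
```

With these numbers trial_effect(TRIAL) is 1.5, while population_effect(TRIAL, POPULATION) is approximately 2, the average of the two stratum effects in a population split evenly between them.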

Applications

• Xiaochun Li, Indiana University

– EMR²: Evidence Mining Research in Electronic Medical Records Towards Better Patient Care

– Indianapolis Network for Patient Care

– Created in 1995

– Houses clinical data from over 80 hospitals, public health departments, local laboratories and imaging centers, surgical centers, and a few large-group practices closely tied to hospital systems, for approximately 13.4 million unique patients

– Data are being used for comparative effectiveness and pharmaco-epidemiology research

Applications

• Danica Marinac-Dabic, Center for Devices and Radiological Health, Food and Drug Administration, USA

– MDEpiNet: Strengthening Medical Device Ecosystem for Surveillance and Innovation

– Public-private partnership that provides leadership in innovative data source development and analytic methodologies for implementation of medical device research and surveillance to enhance patient-centered outcomes


National Medical Device Postmarket Surveillance Plan

Applications

• Dan Chateau, University of Manitoba

• Implementing a research program using data from multiple jurisdictions: The Canadian Network for Observational Drug Effect Studies (CNODES)

Canadian Network for Observational Drug Effect Studies (CNODES)

• Network of over 60 Canadian pharmacoepidemiologists, biostatisticians, clinicians, clinical pharmacologists, pharmacists, IT professionals, data analysts, and students using linked administrative data in 7 provinces plus CPRD and US data.

• Timely responses to queries from Canadian public stakeholders regarding drug safety and effectiveness

• Funded in March 2011 for five years by the Canadian Institutes of Health Research (CIHR) through the government’s Drug Safety and Effectiveness Network (DSEN) program

Typical CNODES Study

• Query (potential safety signal) from Health Canada

• Review and feasibility studies

• Distributed network

– Up to 9 sites (7 provinces + CPRD + US MarketScan)

• Common protocol

• Different data structures/availability

– Typically > 500 potential (but non-specific) confounders

• Meta-analysis combines results across studies

• Methods team: provides statistical/epidemiological expertise across projects
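The final pooling step can be sketched as an inverse-variance meta-analysis of site-level estimates. The slides do not say which pooling model CNODES uses, so the fixed-effect model below is an assumption, as are the function name and the example numbers; a random-effects model may be more appropriate when results differ substantially between sites.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of site-level estimates.

    estimates:   per-site effect estimates on an additive scale
                 (for example, log hazard ratios)
    std_errors:  the corresponding standard errors
    Returns (pooled estimate, pooled standard error).
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise sites count more
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical log hazard ratios from three sites.
pooled, pooled_se = pool_fixed_effect([0.20, 0.40, 0.30], [0.10, 0.10, 0.20])
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
```

Because only an estimate and a standard error leave each site, this kind of pooling fits a distributed network in which individual-level data stay within each jurisdiction.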


C. Blais, S. Jean, C. Sirois, L. Rochette, C. Plante, I. Larocque, M. Doucet, G. Ruel, M. Simard, P. Gamache, D. Hamel, D. St-Laurent, V. Émond. Quebec Integrated Chronic Disease Surveillance System (QICDSS), an innovative approach. Chronic Diseases and Injuries in Canada, 2014;34(4):226-235.

Quebec Integrated Chronic Disease Surveillance System

Participant Comments

• The challenges of working with big data require an interdisciplinary and collaborative approach

• Significant investments of time, expertise, and resources are needed to work with large datasets

• Important to have a grounding in statistical theory: e.g., structural models

• Uncertainty: in data acquisition, data quality, and modeling

• Big data can be blended or combined in creative ways to address policy-relevant problems

Conclusions: An Era of “Data-Centric Science”

• Requires new paradigms that address how data are captured, processed, discovered, exchanged, distributed, and analyzed

• Traditional methods have largely focused on analysts being able to develop and analyze data within their own computing environment(s)

• The distributed and heterogeneous nature of large databases provides substantial challenges

Conclusions: Training Opportunities

• Real-world simulations

– Observational Medical Outcomes Partnership (OMOP) / Observational Health Data Sciences and Informatics (OHDSI)

• Distributed data models

– Increased emphasis on standardized protocols (e.g., common data model)

• Model development, selection, evaluation

Thanks to Planning Committee

• Constantine Gatsonis

• Thérèse Stukel

• Sharon-Lise Normand

• Special thanks to Nancy Reid for advice and guidance

Contact Information

Lisa Lix, PhD P.Stat.

e-mail: [email protected]

website: www.chimb.ca

