khoury ashg2014

Separating Signal from Noise in the Age of Genomics & Big Data: A Public Health Approach Muin J. Khoury MD, PhD CDC Office of Public Health Genomics NCI Epidemiology & Genomics Research Program

Post on 01-Jul-2015


Page 1: Khoury ashg2014

Separating Signal from Noise in the Age of

Genomics & Big Data:

A Public Health Approach

Muin J. Khoury MD, PhD

CDC Office of Public Health Genomics

NCI Epidemiology & Genomics Research Program

Page 2: Khoury ashg2014

Outline

Big Data & Causation in the Age of Genomics

Promises of Genomics & Big Data

Challenges of Genomics & Big Data

A Public Health Approach to Realize Potential of

Genomics & Big Data

Page 3: Khoury ashg2014

A Case Study: Searching for Needles in the

Haystack: The CDC HuGE Navigator

http://www.hugenavigator.net/HuGENavigator/home.do

Page 4: Khoury ashg2014

Text Mining Tool To Find HuGE Articles

in Published Literature

PubMed Signal/Noise ratio very

low

Support Vector Machine (SVM)

tool generated in 2008

Based on >3800 words in text,

extensively validated

Sensitivity & specificity >97%

Since 2008, genetic epidemiology

literature has changed

considerably

Performance of SVM model was

significantly reduced (60%)

In 2014, retrained SVM now using

>4500 words pushed sensitivity

and specificity to >90%

Yu W et al. BMC Bioinformatics, 2008
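
A short calculation helps put the sensitivity and specificity figures on this slide in context when the signal-to-noise ratio is low. The following is a minimal sketch, not the method of Yu et al.: it applies Bayes' rule to get the positive predictive value (the share of flagged articles that are truly relevant). The 1% prevalence is a hypothetical value chosen for illustration; the slide does not state the actual proportion of HuGE articles in PubMed, and the function name is likewise an assumption.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of flagged articles that are truly relevant (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalence: 1 relevant article per 100 screened (an assumption).
ppv_2008 = positive_predictive_value(0.97, 0.97, 0.01)
ppv_2014 = positive_predictive_value(0.90, 0.90, 0.01)
print(f"PPV at 97%/97%: {ppv_2008:.3f}")
print(f"PPV at 90%/90%: {ppv_2014:.3f}")
```

Under this assumed prevalence, even a classifier with 97% sensitivity and specificity flags mostly irrelevant articles, which is why a low signal-to-noise ratio makes the screening task hard.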

Page 5: Khoury ashg2014

Application of Data Mining in the Prediction

of Type 2 Diabetes in the United States

1999-2004 National Health and

Nutrition Examination Survey

Developed and validated SVM

models for diabetes, undiagnosed

diabetes & prediabetes using

numerous variables in survey

Discriminative ability: area

under ROC curve of 84% and 73%

Validated known risk factors for

diabetes

Not clear which models are best, which

variables to use, and how

applicable to other populations

Proof of concept only

Yu W et al. BMC Medical Informatics 2010
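
The area under the ROC curve reported on this slide can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. The sketch below computes it directly from that definition. It is an illustration only: the scores and labels are made-up values, not NHANES data, and the function name is an assumption.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outranks a random non-case.

    Ties between a case and a non-case count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical model scores and true status (1 = case), for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in sample size, which is fine for a small example; a rank-based formula gives the same value more efficiently on large surveys.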

Page 6: Khoury ashg2014

The IOM Ecological Model & the Need for Multilevel Analysis of “Causation”

Obesity Example NEJM 2007;357:404-7

IOM Ecological Model

Page 7: Khoury ashg2014

“We will all be surrounded by a personal cloud of billions of data points”

L. Hood (ISB)

Genomics & Big Data

The Genome is Just the Beginning

Page 8: Khoury ashg2014

Big Data: From Association to Prediction

How about Causation?

Association

Replication

Classification

Prediction

?CAUSATION

Does Big Data care about “Causation”?

Intervention is based on cause-effect

relationships
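
The gap between association and causation can be illustrated with a small worked example of confounding. This is a sketch with hypothetical counts, not data from the talk: within each level of a third factor Z the exposure has no effect on risk, yet the pooled data show a strong association because exposure is far more common where baseline risk is high.

```python
def risk(cases, total):
    return cases / total


def risk_ratio(exposed, unexposed):
    """Each argument is a (cases, total) pair."""
    return risk(*exposed) / risk(*unexposed)


# Hypothetical counts, stratified by a confounder Z: (cases, total).
strata = {
    "Z=0": {"exposed": (10, 100), "unexposed": (90, 900)},
    "Z=1": {"exposed": (450, 900), "unexposed": (50, 100)},
}

# Pool the strata to get the crude (unstratified) table.
crude_exposed = tuple(sum(s["exposed"][i] for s in strata.values()) for i in range(2))
crude_unexposed = tuple(sum(s["unexposed"][i] for s in strata.values()) for i in range(2))

crude_rr = risk_ratio(crude_exposed, crude_unexposed)
stratum_rr = {z: risk_ratio(s["exposed"], s["unexposed"]) for z, s in strata.items()}
print(f"crude RR = {crude_rr:.2f}")
print(stratum_rr)
```

In this constructed example the crude risk ratio is well above 1 while both stratum-specific ratios equal 1, so an intervention on the exposure would change nothing even though the exposure predicts disease.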

Page 9: Khoury ashg2014

The Promises of Genomics & Big Data

The Economist

Page 10: Khoury ashg2014

The Promises of Genomics & Big Data

Workup of Rare & Familial Diseases

NEJM June 2014

Page 11: Khoury ashg2014

The Promises of Genomics & Big Data

Improved Disease Classification

Page 12: Khoury ashg2014

The Promises of Genomics & Big Data

Improved Measurement of the “Environment”

http://www.niehs.nih.gov/research/programs/geh/geh_newsletter/2014/4/spotlight/index.cfm

Page 13: Khoury ashg2014

The Promises of Genomics & Big Data

Better Understanding of Natural History

G Ginsburg

Page 14: Khoury ashg2014

The Promises of Genomics & Big Data

Stratified Prevention (One Size Does Not Fit All)

No one is average: “population medicine: let’s get over it” (E. Topol)

Page 15: Khoury ashg2014

The Promises of Genomics & Big Data

Precision Medicine

Page 16: Khoury ashg2014

The Promises of Genomics & Big Data

Pathogen Genomics

Page 17: Khoury ashg2014

The Promises of Genomics & Big Data

Public Health Practice

“As cholera swept through London in the

mid-19th century, a physician named John

Snow painstakingly drew a paper map

indicating clusters of homes where the

deadly waterborne infection had struck. In

an iconic feat in public health history, he

implicated the Broad Street pump as the

source of the scourge—a founding event in

modern epidemiology. Today, Snow might

have crunched GPS information and disease

prevalence data and solved the problem

within hours”

http://www.hsph.harvard.edu/news/magazine/big-datas-big-visionary/?utm_source=SilverpopMailing&utm_medium=email&utm_campaign=Kiosk%2009.25.14_academic%20(1)&utm_content

Page 18: Khoury ashg2014

Some Promises of Genomics & Big Data

Workup of Rare & Familial Diseases

Improved Disease Classification

Improved Measurement of the “Environment”

Better Understanding of Disease Natural History

Stratified Prevention

Precision Medicine

Pathogen Genomics

Public Health Practice

Page 19: Khoury ashg2014

The Challenges of Genomics & Big Data

Problems of Study Designs & Hidden Biases

“…claims are based upon complex

(and we believe flawed)

analyses…there are far simpler

alternative explanations for the

patterns they observed. We believe

that the authors have not excluded

important alternative explanations”

G. Breen

“Schizophrenia is Eight Different Diseases

Not One” USA Today (9/15/2014)

“Eight types of schizophrenia? Not so

fast” Genomes Unzipped (9/30/2014)

Am J Psychiatry Sep 2014

Page 20: Khoury ashg2014
Page 21: Khoury ashg2014

The Challenges of Genomics & Big Data

Analytic Issues: Dealing with Complexity

Prediction of LDL cholesterol response to statin using transcriptomic and

genetic variation. Kyungpil Kim et al. Genome Biology, Sep 2014

Page 22: Khoury ashg2014

The Challenges of Genomics & Big Data

Reproducibility

Lots of Input

Variables

Molecularly defined

Disease subsets & precursors

Millions

of genetic

variants
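
One reason reproducibility suffers with millions of input variables is simple arithmetic on multiple testing. The sketch below assumes, for illustration, one million independent tests in which no variant is truly associated; the variant count and the function names are assumptions, not figures from the slide.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null tests passing the threshold by chance alone."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test threshold that bounds the family-wise error rate."""
    return family_alpha / n_tests


n_variants = 1_000_000  # assumed number of variants tested, for illustration
print(expected_false_positives(n_variants, 0.05))
print(bonferroni_threshold(n_variants))
```

At a nominal 0.05 threshold this setup yields tens of thousands of chance findings, while the Bonferroni-corrected threshold lands at the 5e-8 level commonly used for genome-wide significance.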

Page 23: Khoury ashg2014

Am J Clin Nutrition 2013

Page 24: Khoury ashg2014

The Challenges of Genomics & Big Data

Causation, Ecologic Fallacies & Hubris

Page 25: Khoury ashg2014

‘The Scientific Method Itself is Growing

Obsolete.’ (A. Butte, Sep 2014)

“...implicit

assumption that big

data are a substitute

for, rather than a

supplement to,

traditional data

collection and

analysis.”

http://blogs.kqed.org/science/audio/how-big-data-is-changing-medicine/

Garbage In, Garbage Out (GIGO)

Page 26: Khoury ashg2014

The Challenges of Genomics & Big Data

Beyond Prediction: From Validity to Utility

Page 27: Khoury ashg2014

The Challenges of Genomics & Big Data

Challenges of Population Stratification & Precision

Medicine

Page 28: Khoury ashg2014

Some Challenges of Genomics & Big Data

Problems of Study Designs & Hidden Biases

Analytic Issues: Dealing with Complexity

Reproducibility and Replication

Causation vs Association: Ecologic Fallacies &

Hubris

Translation: from Validity into Utility and

Implementation

Challenges of Population Stratification &

Personalized Medicine

Page 29: Khoury ashg2014

A Public Health Translation Framework

for Genomics & Big Data

[Figure: translational research framework, phases T0 to T4. Recoverable labels: Discovery; Application; Evidence-based Recommendation or Policy; Health Care & Prevention Programs; Population Health; Knowledge Integration; Development; Evaluation; Implementation Science; Effectiveness & Outcomes Research (CER, PCOR); Economics, ELSI; Basic, Clinical & Population Sciences.]

Khoury MJ et al, AJPH, 2012

Page 30: Khoury ashg2014

A Public Health Approach to Realizing

Promises of Genomics & Big Data

1. Use a Strong Epidemiologic Foundation

The study of distribution and determinants of disease occurrence and outcomes in populations, and using resulting knowledge to improve health and prevent disease

Fundamental science of medicine and public health

Human Genome Epidemiology (HuGE)- Beyond Gene Discovery

New Brand of “Big Data Epidemiology” 2010

Page 31: Khoury ashg2014
Page 32: Khoury ashg2014

• Investigators responsible:

– 40+ high-quality cohorts

– 4+ million people

• Coordinated, interdisciplinary approach

• Tackle important scientific questions, economies of scale, and opportunities to quicken the pace of research

• Focused so far mostly on etiology, but adapting to include outcomes

Epidemiologic Cohort Studies: The NCI Cohort Consortium

• Major role in identifying specific carcinogenic environmental agents
▫ Asbestos – Lung
▫ Benzene – Leukemia
▫ Smoking – many diseases

• Exposure/risk factor assessment prior to onset of disease
▫ Overcome recall/selection biases

• Permit absolute measures of risk/incidence rates
▫ Relevant for public health policies

• Valuable resource for studying repeated measures and multiple outcomes

Page 33: Khoury ashg2014

Epidemiology Data Sharing & Harmonization

Nature, August 27, 2014

Page 34: Khoury ashg2014

A Public Health Approach to Realizing

Promises of Genomics & Big Data

2. Develop a Robust Knowledge Integration

Process

Page 35: Khoury ashg2014

A Public Health Approach to Realizing

Promises of Genomics & Big Data

2. Develop a Robust Knowledge Integration

Process

Page 36: Khoury ashg2014

Components of Knowledge Integration

• Knowledge Management: Integration of knowledge from disparate sources & disciplines

• Knowledge Synthesis: Systematic synthesis of scientific findings
▫ Accumulating evidence on a cancer outcome
▫ Minimize waste in repeat funding
▫ Identify scientific gaps
▫ Inform research priorities

• Knowledge Translation
▫ Stakeholder engagement
▫ Evidence-based information
▫ Decision support tools

Page 37: Khoury ashg2014

Interpretation

“The Bottleneck for Realizing Personalized Medicine”

(Good et al. Genome Biology Sep 2014)

Page 38: Khoury ashg2014

The NIH BD2K Initiative Can Help

Page 39: Khoury ashg2014

A Public Health Approach to Realizing

Promises of Genomics & Big Data

3. Use (and not avoid) Principles of Evidence-

based Medicine and Population Screening

Page 40: Khoury ashg2014

Guidelines We Can Trust (IOM, 2011)

Page 41: Khoury ashg2014

Guidelines We Can Trust in Genomic Medicine (Schully S et al. Genetics in Medicine 2014)

Page 42: Khoury ashg2014

CDC-Sponsored

EGAPP Working Group

• Independent, multidisciplinary, non-federal panel established in 2004

• Established a systematic, evidence-based process to assess validity & utility of genomic tests & family health history applications.

• New methods for evidence synthesis and modeling in 2013, including next generation sequencing and stratified cancer screening based on family history

• 10 recommendation statements to date:

• Colorectal cancer, breast cancer, heart disease, clotting disorders, depression, prostate cancer, diabetes, and more

• Clinical Validity vs Clinical Utility

• Uncovered evidence gaps that require additional research

• Principles can be applied to other “Big Data”

Page 43: Khoury ashg2014

Evidence-based Classification of Genomic

Applications in Practice

Tier 1

Tier 2

Tier 3

http://www.cdc.gov/genomics/gtesting/tier.htm

Page 44: Khoury ashg2014

Evidence-based Binning of the Genome

Genetics in Medicine 2011

Page 45: Khoury ashg2014

A Public Health Approach to Realizing

Promises of Genomics & Big Data

4. Develop a Robust T2+ Translational

Research Agenda

Page 46: Khoury ashg2014

Limited Translational Research in Genomics Beyond the Bedside

Khoury MJ, 2007; Schully, 2012; Clyne M, 2014

T0 ↔ T1 ↔ T2 ↔ T3 ↔ T4

T1: Discovery to Application; T2: Application to Guideline; T3: Guideline to Practice; T4: Practice to Population Health Impact

<1% of published genomics research

in T2 – T4

Multiple clinical and population

scientific disciplines involved

Page 47: Khoury ashg2014

Cancer Genomics Research Funding T2+

Public Health Genomics 2010

Page 48: Khoury ashg2014

A MultiDisciplinary T2+ Research Agenda

Comparative Effectiveness Research

Patient-centered Outcomes Research

Behavioral, Social & Communication Sciences

Economic Studies

Surveillance & Population Monitoring

Page 49: Khoury ashg2014

A Public Health Approach to Realizing

Promises of Genomics & Big Data

Use a Strong Epidemiologic Foundation

Develop a Robust Knowledge Integration

Process

Use (and not avoid) Principles of Evidence-

based Medicine and Population Screening

Develop a Robust T2+ Research Agenda

(Learning Health systems, Consumer

Involvement, etc.)

Page 50: Khoury ashg2014

In Summary

“Big Data” is agnostic to disease causation

Numerous promises for health impact of genomics

& Big Data; leading-edge applications of genomics

and Big Data are beginning to be applied

But numerous challenges face genomics & Big

Data. So we should not overpromise &

underdeliver

A “Public Health” translational approach is needed

to realize potential of genomics & Big Data