khoury ashg2014
TRANSCRIPT
Separating Signal from Noise in the Age of
Genomics & Big Data:
A Public Health Approach
Muin J. Khoury MD, PhD
CDC Office of Public Health Genomics
NCI Epidemiology & Genomics Research Program
Outline
Big Data & Causation in the Age of Genomics
Promises of Genomics & Big Data
Challenges of Genomics & Big Data
A Public Health Approach to Realize Potential of
Genomics & Big Data
A Case Study: Searching for Needles in the
Haystack: The CDC HuGE Navigator
http://www.hugenavigator.net/HuGENavigator/home.do
Text Mining Tool To Find HuGE Articles
in Published Literature
PubMed Signal/Noise ratio very
low
Support Vector Machine (SVM)
tool generated in 2008
Based on >3800 words in text,
extensively validated
Sensitivity & specificity >97%
Since 2008, genetic epidemiology
literature has changed
considerably
Performance of SVM model was
significantly reduced (60%)
In 2014, retrained SVM now using
>4500 words pushed sensitivity
and specificity to >90%
Yu W et al. BMC Bioinformatics, 2008
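A minimal sketch of how the sensitivity and specificity figures quoted on this slide are defined for a binary article classifier. The labels below are hypothetical toy data, not output from the HuGE Navigator SVM, and the function name is an illustrative assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Convention assumed here: 1 = relevant (HuGE) article, 0 = not relevant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # irrelevant, rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # irrelevant, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical curated labels vs. classifier output (toy data only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Re-scoring a fixed classifier against freshly curated labels in this way is one plausible way to detect the kind of performance drift the slide describes as the literature changes.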
Application of Data Mining in the Prediction
of Type 2 Diabetes in the United States
1999-2004 National Health and
Nutrition Examination Survey
Developed and validated SVM
models for diabetes, undiagnosed
diabetes & prediabetes using
numerous variables in survey
Discriminative ability: area
under the ROC curve of 84% and 73%
Validated known risk factors for
diabetes
Not clear which models are best, which
variables to use, or how
applicable to other populations
Proof of concept only
Yu W et al. BMC Medical Informatics 2010
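A minimal sketch of what "area under the ROC curve" measures: the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case, with ties counted as one half. The scores below are hypothetical toy values, not NHANES results, and the function name is an illustrative assumption.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equivalent to the Mann-Whitney U statistic divided by
    (number of cases * number of non-cases).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical model scores (toy data): 1 = diabetes, 0 = no diabetes.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, so values such as 84% and 73% indicate useful but imperfect discrimination, and they say nothing about causation or transferability to other populations.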
The IOM Ecological Model & the Need for Multilevel Analysis of “Causation”
Obesity Example NEJM 2007;357:404-7
IOM Ecological Model
“We will all be surrounded by a personal cloud of billions of data points”
L. Hood (ISB)
Genomics & Big Data
The Genome is Just the Beginning
Big Data: From Association to Prediction
How about Causation?
Association
Replication
Classification
Prediction
?CAUSATION
Does Big Data care about “Causation”?
Intervention is based on cause-effect
relationships
The Promises of Genomics & Big Data
The Economist
The Promises of Genomics & Big Data
Workup of Rare & Familial Diseases
NEJM June 2014
The Promises of Genomics & Big Data
Improved Disease Classification
The Promises of Genomics & Big Data
Improved Measurement of the “Environment”
http://www.niehs.nih.gov/research/programs/geh/geh_newsletter/2014/4/spotlight/index.cfm
The Promises of Genomics & Big Data
Better Understanding of Natural History
G Ginsburg
The Promises of Genomics & Big Data
Stratified Prevention (One Size Does Not Fit All)
No one is average: “population medicine: let’s get over it” (E. Topol)
The Promises of Genomics & Big Data
Precision Medicine
The Promises of Genomics & Big Data
Pathogen Genomics
The Promises of Genomics & Big Data
Public Health Practice
“As cholera swept through London in the
mid-19th century, a physician named John
Snow painstakingly drew a paper map
indicating clusters of homes where the
deadly waterborne infection had struck. In
an iconic feat in public health history, he
implicated the Broad Street pump as the
source of the scourge—a founding event in
modern epidemiology. Today, Snow might
have crunched GPS information and disease
prevalence data and solved the problem
within hours”
http://www.hsph.harvard.edu/news/magazine/big-datas-big-visionary/?utm_source=SilverpopMailing&utm_medium=email&utm_campaign=Kiosk%2009.25.14_academic%20(1)&utm_content
Some Promises of Genomics & Big Data
Workup of Rare & Familial Diseases
Improved Disease Classification
Improved Measurement of the “Environment”
Better Understanding of Disease Natural History
Stratified Prevention
Precision Medicine
Pathogen Genomics
Public Health Practice
The Challenges of Genomics & Big Data
Problems of Study Designs & Hidden Biases
“…claims are based upon complex
(and we believe flawed)
analyses…there are far simpler
alternative explanations for the
patterns they observed. We believe
that the authors have not excluded
important alternative explanations”
G. Breen
“Schizophrenia is Eight Different Diseases
Not One” USA Today (9/15/2014)
“Eight types of schizophrenia? Not so
fast” Genomes Unzipped (9/30/2014)
Am J Psychiatry Sep 2014
The Challenges of Genomics & Big Data
Analytic Issues: Dealing with Complexity
Prediction of LDL cholesterol response to statin using transcriptomic and
genetic variation. Kyungpil Kim et al. Genome Biology, Sep 2014
The Challenges of Genomics & Big Data
Reproducibility
Lots of input variables
Molecularly defined disease subsets & precursors
Millions of genetic variants
Am J Clin Nutrition 2013
The Challenges of Genomics & Big Data
Causation, Ecologic Fallacies & Hubris
‘The Scientific Method Itself is Growing
Obsolete.’ (A. Butte, Sep 2014)
“...implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”
http://blogs.kqed.org/science/audio/how-big-data-is-changing-medicine/
Garbage In, Garbage Out (GIGO)
The Challenges of Genomics & Big Data
Beyond Prediction: From Validity to Utility
The Challenges of Genomics & Big Data
Challenges of Population Stratification & Precision
Medicine
Some Challenges of Genomics & Big Data
Problems of Study Designs & Hidden Biases
Analytic Issues: Dealing with Complexity
Reproducibility and Replication
Causation vs Association-Ecologic Fallacies &
Hubris
Translation: from Validity into Utility and
Implementation
Challenges of Population Stratification &
Personalized Medicine
A Public Health Translation Framework
for Genomics & Big Data
[Figure: translation cycle with phases T0 to T4 linking Discovery, Application, Evidence-based Recommendation or Policy, Health Care & Prevention Programs, and Population Health, with Knowledge Integration at the center. Research types labeled on the cycle: Development; Evaluation; Implementation Science; Effectiveness & Outcomes Research (CER, PCOR), Economics, ELSI; Basic, Clinical & Population Sciences.]
Khoury MJ et al, AJPH, 2012
A Public Health Approach to Realizing
Promises of Genomics & Big Data
1. Use a Strong Epidemiologic Foundation
The study of distribution and determinants of disease occurrence and outcomes in populations, and using resulting knowledge to improve health and prevent disease
Fundamental science of medicine and public health
Human Genome Epidemiology (HuGE)- Beyond Gene Discovery
New Brand of “Big Data Epidemiology” 2010
• Investigators responsible:
– 40+ high-quality cohorts
– 4+ million people
• Coordinated, interdisciplinary approach
• Tackle important scientific questions, economies of scale, and opportunities to quicken the pace of research
• Focused so far mostly on etiology, but adapting to include outcomes
Epidemiologic Cohort Studies: The NCI Cohort Consortium
• Major role in identifying specific carcinogenic environmental agents
  ▫ Asbestos – Lung
  ▫ Benzene – Leukemia
  ▫ Smoking – many diseases
• Exposure/risk factor assessment prior to onset of disease
  ▫ Overcome recall/selection biases
• Permit absolute measures of risks/incidence rates
  ▫ Relevant for public health policies
• Valuable resource for studying repeated measures and multiple outcomes
Epidemiology Data Sharing & Harmonization
Nature, August 27, 2014
A Public Health Approach to Realizing
Promises of Genomics & Big Data
2. Develop a Robust Knowledge Integration
Process
Components of Knowledge Integration
• Knowledge Management: Integration of knowledge from disparate sources & disciplines
• Knowledge Synthesis: Systematic synthesis of scientific findings
  ▫ Accumulating evidence on a cancer outcome
    Minimize waste in repeat funding
  ▫ Identify scientific gaps
    Inform research priorities
• Knowledge Translation
  ▫ Stakeholder engagement
  ▫ Evidence-based information
  ▫ Decision support tools
Interpretation
“The Bottleneck for Realizing Personalized Medicine”
(Good et al. Genome Biology Sep 2014)
The NIH BD2K Initiative Can Help
A Public Health Approach to Realizing
Promises of Genomics & Big Data
3. Use (and not avoid) Principles of Evidence-
based Medicine and Population Screening
Guidelines We Can Trust (IOM, 2011)
Guidelines We Can Trust in Genomic Medicine (Schully S et al. Genetics in Medicine 2014)
CDC-Sponsored
EGAPP Working Group
• Independent, multidisciplinary, non-federal panel established in 2004
• Established a systematic, evidence-based process to assess validity & utility of genomic tests & family health history applications.
• New methods for evidence synthesis and modeling in 2013, including next generation sequencing and stratified cancer screening based on family history
• 10 recommendation statements to date:
  • Colorectal cancer, breast cancer, heart disease, clotting disorders, depression, prostate cancer, diabetes, and more
• Clinical Validity vs Clinical Utility
• Uncovered evidence gaps that require additional research
• Principles can be applied to other “Big Data”
Evidence-based Classification of Genomic
Applications in Practice
Tier 1
Tier 2
Tier 3
http://www.cdc.gov/genomics/gtesting/tier.htm
Evidence-based Binning of the Genome
Genetics in Medicine 2011
A Public Health Approach to Realizing
Promises of Genomics & Big Data
4. Develop a Robust T2+ Translational
Research Agenda
Limited Translational Research in Genomics Beyond the Bedside
Khoury MJ, 2007; Schully S, 2012; Clyne M, 2014
T0 ↔ T1 ↔ T2 ↔ T3 ↔ T4
T1: Discovery to Application
T2: Application to Guideline
T3: Guideline to Practice
T4: Practice to Population Health Impact
<1% of published genomics research
in T2 – T4
Multiple clinical and population
scientific disciplines involved
Cancer Genomics Research Funding T2+
Public Health Genomics 2010
A MultiDisciplinary T2+ Research Agenda
Comparative Effectiveness Research
Patient-centered Outcomes Research
Behavioral, Social & Communication Sciences
Economic Studies
Surveillance & Population Monitoring
A Public Health Approach to Realizing
Promises of Genomics & Big Data
Use a Strong Epidemiologic Foundation
Develop a Robust Knowledge Integration
Process
Use (and not avoid) Principles of Evidence-
based Medicine and Population Screening
Develop a Robust T2+ Research Agenda
(Learning Health Systems, Consumer
Involvement, etc.)
In Summary
“Big Data” is agnostic to disease causation
Numerous promises for the health impact of genomics
& Big Data: the leading edge of genomics in Big Data
is beginning to be applied
But numerous challenges face genomics & Big
Data, so we should not overpromise and
underdeliver
A “Public Health” translational approach is needed
to realize the potential of genomics & Big Data