Converged IT Summit - NCI Data Sharing
Cancer Moonshot, Data Sharing, Genomic Data Commons
October 26th, 2016
Warren Kibbe, PhD
@wakibbe
To develop the knowledge base that will lessen the burden of
cancer in the United States and around the world.
NCI Mission
Cancer Statistics
In 2015 there will be an estimated 1,700,000 new cancer cases and 600,000 cancer deaths.
- American Cancer Society 2015
Cancer remains the second most common cause of death in the U.S.
- Centers for Disease Control and Prevention 2015
Understanding Cancer
Precision medicine will lead to a fundamental understanding of the complex interplay between genetics, epigenetics, nutrition, environment, and clinical presentation, and will direct effective, evidence-based prevention and treatment.
Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem
The era of precision oncology is predicated on the integration of research, care, and molecular medicine, and on the availability of data for modeling, risk analysis, and optimal care.
The Genomic Data Commons is a completely new way of making
scientific data available to the cancer research community
Cancer Data Sharing & Data Commons
• Making data available for discovery, validation, new therapies
• Working toward a National Learning Health System for Cancer
• Maximizing the impact, reuse, and reproducibility of cancer research
• We need to change the incentives for data sharing
Reduce the risk, improve early detection, outcomes and survivorship in cancer
FAIR –
Making data Findable, Accessible, Attributable, Interoperable, Reusable, and providing Recognition
Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples
NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
Changing the conversation around data sharing
• How do we find data, software, standards?
• How can we make data, annotations, software, metadata accessible?
• How do we reuse data standards?
• How do we make more data machine readable?
NIH Data Commons
NCI Genomic Data Commons
Data commons co-locate data, storage and computing infrastructure, and commonly used tools for analyzing and sharing data to create an interoperable resource for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, A Case for Data Commons: Towards Data Science as a Service, to appear.
Source of image: Interior of one of Google’s data centers, www.google.com/about/datacenters/.
Cancer Research Data Ecosystem – Cancer Moonshot BRP
[Figure: ecosystem diagram. Labels: Well characterized research data sets; Cancer cohorts; Patient data; EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support; Learning from every cancer patient; Active research participation; Research information donor; Clinical Research; Observational studies; Proteogenomics; Imaging data; Clinical trials; Discovery; Patient engaged Research; Surveillance; Big Data; Implementation research; SEER; GDC]
Cancer Research Data Commons Ecosystem
• Genomic Data Commons
• Imaging Data Commons
• Proteomics Data Commons
• Clinical Data Commons (Cohorts / Indiv.)
• SEER (Populations)
Data Standards, Validation and Harmonization
Data Contributors and Consumers: Researchers, Patients, Clinicians, Institutions
Digital technology is changing rapidly
Cancer therapy is changing
Application of Cancer Genomics is changing
Precision Oncology Trials Launched
2014: MPACT, Lung-MAP, ALCHEMIST, Exceptional Responders
2015: NCI-MATCH
2017: Pediatric-MATCH
NCI-MATCH: Features [Molecular Analysis for Therapy Choice]
• Foundational treatment/discovery trial; assigns therapy based on molecular abnormalities, not site of tumor origin, for patients without available standard therapy
• Regulatory umbrella for phase II drugs/studies from > 20 companies; single agents or combinations
• Available nationwide (2400 sites)
• Accrual began mid-August 2015
• Latest match rate is 25% with 24 open arms
• Biopsy results in 13 days, average from June-Sept 2016
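The assignment idea on this slide (therapy chosen by molecular abnormality rather than tumor site) can be pictured with a small sketch. Everything in it is hypothetical: the arm IDs, gene names, and patient profiles are placeholders, and real NCI-MATCH eligibility rules are far more detailed. The toy profiles are simply chosen so that one in four finds an arm.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# (gene, alteration type); placeholder values only, not real trial targets.
Finding = Tuple[str, str]


@dataclass(frozen=True)
class Arm:
    arm_id: str
    gene: str
    alteration: str


# Hypothetical treatment arms, each keyed to one molecular abnormality.
ARMS = [
    Arm("ARM-1", "GENE_A", "mutation"),
    Arm("ARM-2", "GENE_B", "amplification"),
    Arm("ARM-3", "GENE_C", "fusion"),
]


def match_arms(findings: Set[Finding]) -> List[str]:
    """Return arms whose target alteration is present in the tumor profile.

    Tumor site is deliberately not an input: matching uses molecular
    findings only.
    """
    return [arm.arm_id for arm in ARMS if (arm.gene, arm.alteration) in findings]


def match_rate(profiles: List[Set[Finding]]) -> float:
    """Fraction of profiles that match at least one arm."""
    matched = sum(1 for profile in profiles if match_arms(profile))
    return matched / len(profiles)


# Four made-up tumor profiles; only the first carries a targeted alteration.
profiles = [
    {("GENE_A", "mutation")},
    {("GENE_X", "mutation")},
    {("GENE_Y", "deletion")},
    {("GENE_Z", "mutation"), ("GENE_W", "fusion")},
]
print(match_arms(profiles[0]))
print(match_rate(profiles))
```

Adding arms widens the set of alterations that can be matched, which is one way to read the slide's pairing of the match rate with the number of open arms.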
Vice President’s Cancer Moonshot
How do we enable meaningful, patient-centered and patient-level
data sharing for cancer and promote access to clinical trials for
all Americans?
Cancer Moonshot Outline
• Genomic Data Commons – June 6, 2016
• Vice President’s Cancer Moonshot Summit – June 29, 2016
• Rethinking Clinical Trial Search
  - Development of Application Programming Interface (API) to NCI’s Clinical Trials Reporting Program, for use by:
    - NCI’s Cancer.gov website
    - Third party innovators providing clinical trial content to their communities
• Blue Ribbon Panel recommendations – accepted by the National Cancer Advisory Board on September 7th, 2016
• Cancer Moonshot Task Force and BRP recommendations sent to President on October 17th, 2016
  https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/milestones
• https://cancer.gov/brp
https://cancer.gov/brp
Blue Ribbon Panel Recommendations
• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 4D Human Tumor Atlas
• Development of New Enabling Cancer Technologies
The Cancer Genomic Data Commons (GDC) is an existing effort to standardize and simplify submission of genomic data to NCI and follow the principles of FAIR – Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.
The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons
Genomic Data Commons
Microattribution, nanopublications, tracking the use of data, annotation of data, and use of algorithms support the data/software/metadata life cycle, providing credit and enabling analysis of the impact of data, software, analytics, algorithms, curation, and knowledge sharing
Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples
NCI Genomic Data Commons
• The GDC went live on June 6, 2016 with approximately 4.1 PB of data. This includes 2.6 PB of legacy data and 1.5 PB of “harmonized” data.
• 577,878 files about 14,194 cases (patients), in 42 cancer types, across 29 primary sites.
• 10 major data types, ranging from Raw Sequencing Data and Raw Microarray Data to Copy Number Variation, Simple Nucleotide Variation and Gene Expression.
• Data are derived from 17 different experimental strategies, with the major ones being RNA-Seq, WXS, WGS, miRNA-Seq, Genotyping Array and Expression Array.
• Foundation Medicine announced the release of 18,000 genomic profiles to the GDC at the Cancer Moonshot Summit.
Genomic Data Commons Data Portal
https://gdc.cancer.gov
The NCI Genomic Data Commons User Interface: Home Page
The NCI Genomic Data Commons User Interface: Sample Browser
The NCI Genomic Data Commons User Interface: Data Submission Dashboard
[Screenshot labels: Clinical data; Biospecimen data; Molecular data; Files uploaded]
Development of the NCI Genomic Data Commons (GDC) To Foster the Molecular Diagnosis and Treatment of Cancer
GDC: Bob Grossman, PI, Univ. of Chicago; Ontario Inst. Cancer Res.; Leidos
Institute of Medicine, Towards Precision Medicine, 2011
The Mutational Burden of Human Cancer
Mike Lawrence and Gaddy Getz, Broad Institute: Nature 2014
[Figure: mutational burden across tumor types. Annotations: Increasing genomic complexity; Childhood cancers; Known carcinogens]
Example publication from TCGA
Discovery of Cancer Drivers With 2% Prevalence
Lung adeno.: + 2,900
Colorectal: + 1,200
Ovarian: + 500
Lawrence et al, Nature 2014
Power Calculation for Cancer Driver Discovery
Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence
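To give a feel for why low-prevalence drivers demand large cohorts, here is a deliberately simplified binomial sketch. It is not the power calculation from Lawrence et al., which also models background mutation rates and the number of genes tested. The threshold `MIN_CARRIERS` is a hypothetical stand-in for "enough mutated tumors to stand out from background".

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed via the complement."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


PREVALENCE = 0.02   # driver mutated in 2% of tumors, as on the slide
MIN_CARRIERS = 10   # hypothetical detection threshold, an assumption

# Chance of observing at least MIN_CARRIERS mutated tumors in a cohort of n.
for n in (100, 500, 1000):
    print(n, round(prob_at_least(MIN_CARRIERS, n, PREVALENCE), 3))
```

Even in this toy version the probability climbs steeply with cohort size; a real analysis has to clear a much higher bar per gene, which pushes the required sample counts far beyond these numbers.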
GDC Infrastructure and Functionality
[Figure: GDC architecture diagram]
GDC Users: Data Submitters; Open Access Users; Controlled Access Users
eRA Commons & dbGaP
GDC System Components: Data Submission; Data Security System; Harmonization; Metadata + Data Storage; Reporting System; APIs; Digital ID System
Open Access Data; Controlled Access Data
GDC Data Harmonization: Multiple data types and levels of processing
Data type          1° processing       2° processing                      3° processing
Exome-seq          Genome alignment    Mutations                          Oncogene vs. tumor suppressor
Whole genome-seq   Genome alignment    Mutations + structural variants    Translocations
RNA-seq            Genome alignment    Digital gene expression            Relative RNA levels, alternative splicing
Copy number        Data segmentation   Copy number calls                  Gene amplification/deletion
GDC Data Harmonization: Open Source, Dockerized Pipelines
[Figure: MuTect2 pipeline]
GDC Data Harmonization: Multiple pipelines needed to recover all variants
GDC variant calling pipelines: Wash U, Baylor, Broad
Recovery rate (% true positives)
Caller          A0       F0
SomaticSniper   81.1%    76.5%
VarScan         93.9%    84.3%
MuSE            93.1%    87.3%
All Three       96.4%    91.2%
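The pattern in the recovery-rate figures, where the combined call set recovers more true variants than any single caller, follows from taking the union of the callers' outputs. A minimal sketch with made-up data: the variant IDs, the caller names `caller_a` through `caller_c`, and the resulting rates are illustrative and are not GDC results.

```python
from typing import Dict, Set

# Toy truth set of ten known variants (placeholder IDs).
truth: Set[str] = {f"v{i}" for i in range(1, 11)}

# Hypothetical call sets; each caller misses a different subset.
calls: Dict[str, Set[str]] = {
    "caller_a": {"v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"},
    "caller_b": {"v1", "v2", "v3", "v4", "v5", "v6", "v9"},
    "caller_c": {"v3", "v4", "v5", "v6", "v7", "v8", "v9", "fp1"},  # fp1: false positive
}


def recovery_rate(called: Set[str], known: Set[str]) -> float:
    """Fraction of known true variants present in a call set."""
    return len(called & known) / len(known)


# Union of all callers: a variant counts if any caller reports it.
combined: Set[str] = set().union(*calls.values())

for name, called in calls.items():
    print(name, recovery_rate(called, truth))
print("all three", recovery_rate(combined, truth))
```

The union can never recover fewer true variants than its best member, but it also accumulates every caller's false positives (`fp1` here), so a production pipeline would need filtering or consensus rules on top.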
GDC Content
Current
• TCGA: 11,353 cases
• TARGET: 3,178 cases
Coming soon
• Foundation Medicine: 18,000 cases
• Cancer studies in dbGaP: ~4,000 cases
Planned (1-3 years)
• NCI-MATCH: ~5,000 cases
• Clinical Trial Sequencing Program: ~3,000 cases
• Cancer Driver Discovery Program: ~5,000 cases
• Human Cancer Model Initiative: ~1,000 cases
• APOLLO – VA-DoD: ~8,000 cases
Total: ~58,000 cases
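As a quick check on the slide's total, the per-program case counts can be summed directly. The dictionary below only restates the figures listed above; the approximate counts are taken at face value.

```python
# Case counts as listed on the slide ("~" values treated as exact).
GDC_CASES = {
    "TCGA": 11_353,
    "TARGET": 3_178,
    "Foundation Medicine": 18_000,
    "Cancer studies in dbGaP": 4_000,
    "NCI-MATCH": 5_000,
    "Clinical Trial Sequencing Program": 3_000,
    "Cancer Driver Discovery Program": 5_000,
    "Human Cancer Model Initiative": 1_000,
    "APOLLO - VA-DoD": 8_000,
}

total_cases = sum(GDC_CASES.values())
print(f"{total_cases:,} cases")

# Share of the >100,000-case goal mentioned elsewhere in the talk.
print(f"{total_cases / 100_000:.0%} of 100,000")
```

The sum comes to a little over 58,000, in line with the slide's total and still well short of the 100,000-tumor target from the power calculation.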
Towards a Cancer Knowledge System
Continue genomic investigations of cancer
• Need > 100,000 cases analyzed
• Embrace all genomic platforms
• Relationship of relapse and primary biopsies
Incorporate associated clinical annotations
• Clinical trial data
• Observational, longitudinal standard-of-care data
• N-of-1 clinical data
Promote and curate biological investigations of cancer genetic variants
• Driver vs. passenger mutations
• Multiple phenotypic assays
• Alterations in regulatory pathways – proteomics
• Mechanisms of therapeutic resistance
• Functional genomic investigations
Integrative models for high-dimensional data
Genomic Data Commons (GDC)
Utility of a Cancer Knowledge System
• Identify low-frequency cancer drivers
• Define genomic determinants of response to therapy
• Compose clinical trial cohorts sharing targeted genetic lesions
[Diagram labels: Cancer information donors; Genomic Data Commons]
Support the Precision Medicine Initiative
• Expand data model to include other data (e.g. imaging and proteomics)
• Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows
• Measure usage and impact
• Change incentives for public contributions
The Genomic Data Commons and Cloud Pilots
PMI – Oncology, the GDC and the Cloud Pilots Goals
• Support precision medicine-focused clinical research
• Enable researchers to deposit well-annotated (Interoperable) genomic data sets with the GDC
• Provide a single source (and single dbGaP access request!) to Find and Access these data
• Enable effective analysis and meta-analysis of these data without requiring local downloads – data Reuse
• Understand Contributions, Assess value through usage, and give Attribution to all users
PMI – Oncology, the GDC and the Cloud Pilots Goals
• Provide a data integration platform to allow multiple data types, multi-scalar data, temporal data from cancer models and patients through open APIs
• Work with the Global Alliance for Genomics and Health (GA4GH) to define the next generation of secure, flexible, meaningful, interoperable, lightweight interfaces – open APIs
• Engage the cancer research community in evaluating the open APIs for ease of use and effectiveness
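To make "open APIs" concrete, the sketch below builds (but does not send) a query URL in the style of the GDC REST API. None of these details come from the slides: the base URL `https://api.gdc.cancer.gov`, the `cases` endpoint, the field names, and the filter value "Lung" are assumptions and should be checked against the current GDC API documentation before use.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL; the slides only give the portal address gdc.cancer.gov.
GDC_API_BASE = "https://api.gdc.cancer.gov"


def build_cases_query(primary_site: str, size: int = 5) -> str:
    """Build a URL asking for a few cases from one primary site.

    The JSON filter structure (op/content/field/value) and the field
    names are assumptions about the API, not taken from the talk.
    """
    filters = {
        "op": "in",
        "content": {"field": "primary_site", "value": [primary_site]},
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "submitter_id,project.project_id,primary_site",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_API_BASE}/cases?{urlencode(params)}"


url = build_cases_query("Lung")  # "Lung" is an illustrative value
print(url)

# Decode the query string again to show what a client would transmit.
decoded = parse_qs(urlsplit(url).query)
print(json.loads(decoded["filters"][0]))
```

Keeping the request as a plain URL plus JSON filter is what makes such an interface lightweight: any HTTP client in any language can issue it, and the same query can be shared as a persistent link.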
GDC Acknowledgements
NCI Center for Cancer Genomics: Lou Staudt, Zhining Wang, Martin Ferguson, JC Zenklusen, Daniela Gerhard, Deb Steverson
Univ. of Chicago: Bob Grossman, Allison Heath, Mike Ford, Zhenyu Zhang
Ontario Institute for Cancer Research: Vincent Ferretti, Francois Gerthoffert, JunJun Zhang
Leidos Biomedical Research: Mark Jensen, Sharon Gaheen, Himanso Sahni
NCI CBIIT: Tony Kerlavage, Tanja Davidsen
Cancer Genomics Project Teams
CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D - Seven Bridges - http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D – Project Officer
• Juli Klemm, Ph.D – COR, Broad Institute
• Tanja Davidsen, Ph.D – COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA – COR, Seven Bridges Genomics
GDC Principal Investigator
• Robert Grossman, Ph.D - University of Chicago
• Allison Heath, Ph.D - University of Chicago
• Vincent Ferretti, Ph.D - Ontario Institute for Cancer Research
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
Rethinking Clinical Trials Search
Thanks to: PIFs, various NCI teams (CBIIT, CCCT, OCPL)
Improving Cancer Clinical Trials Search
• Engage Presidential Innovation Fellows
• Create an Application Programming Interface (API) for clinical trials
• Engage with third-party innovators to guide the API development process
• Utilize the API on NCI’s website, cancer.gov
*An API is a set of protocols that provides communication between a software application and a computer operating system or between applications
Cancer Clinical Trials API
• The API (https://clinicaltrialsapi.cancer.gov/v1/) makes trial registration information from NCI’s clinical trial database (Clinical Trials Reporting Program, CTRP) publicly available to third-party innovators
• Enables innovators to build new tools tailored to the clinical trial search needs of their users
• Expands opportunities for patients and doctors to locate cancer trials
• Cancer.gov now utilizes the API (http://trials.cancer.gov), enhancing clinical trial searching on the NCI website
• The API source code is publicly available on GitHub (https://github.com/presidential-innovation-fellows/clinical-trials-search)
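A third-party tool would typically start by composing a search request against the API base URL given on the slide. The sketch below only constructs the URL and sends nothing. The resource path `clinical-trials` and the parameter names `_fulltext` and `size` are assumptions that should be confirmed against the GitHub repository named above, and the v1 endpoint may no longer be live.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL as given on the slide.
API_BASE = "https://clinicaltrialsapi.cancer.gov/v1/"


def build_trial_search_url(keyword: str, size: int = 10) -> str:
    """Compose a keyword search URL for cancer clinical trials.

    'clinical-trials', '_fulltext', and 'size' are assumed names,
    not confirmed by the talk.
    """
    params = {"_fulltext": keyword, "size": size}
    return f"{API_BASE}clinical-trials?{urlencode(params)}"


search_url = build_trial_search_url("non-small cell lung cancer", size=5)
print(search_url)

# Round-trip the query string to show the parameters a server would see.
print(parse_qs(urlsplit(search_url).query))
```

Because the request is an ordinary HTTPS GET, the same search can be embedded in a patient-advocacy site, a hospital portal, or cancer.gov itself, which is the reuse the slide describes.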
www.cancer.gov www.cancer.gov/espanol