Converged IT Summit - NCI Data Sharing

Cancer Moonshot, Data Sharing, Genomic Data Commons. October 26th, 2016. Warren Kibbe, PhD [email protected] @wakibbe

Upload: warren-kibbe

Post on 15-Apr-2017


Page 1: Converged IT Summit - NCI Data Sharing

Cancer Moonshot, Data Sharing, Genomic Data Commons

October 26th, 2016

Warren Kibbe, PhD

[email protected]

@wakibbe

Page 2: Converged IT Summit - NCI Data Sharing

NCI Mission

To develop the knowledge base that will lessen the burden of cancer in the United States and around the world.

Page 3: Converged IT Summit - NCI Data Sharing

Cancer Statistics

In 2015 there will be an estimated 1,700,000 new cancer cases and 600,000 cancer deaths.
- American Cancer Society 2015

Cancer remains the second most common cause of death in the U.S.
- Centers for Disease Control and Prevention 2015

Page 4: Converged IT Summit - NCI Data Sharing

Understanding Cancer

Precision medicine will lead to fundamental understanding of the complex interplay between genetics, epigenetics, nutrition, environment and clinical presentation and direct effective, evidence-based prevention and treatment.

Page 5: Converged IT Summit - NCI Data Sharing

Cancer Moonshot: Genomic Data Commons, Data sharing, Ecosystem

The era of precision oncology is predicated on the integration of research, care, and molecular medicine and the availability of data for modeling, risk analysis, and optimal care.

The Genomic Data Commons is a completely new way of making scientific data available to the cancer research community.

Page 6: Converged IT Summit - NCI Data Sharing

Cancer Data Sharing & Data Commons

• Making data available for discovery, validation, new therapies

• Working toward a National Learning Health System for Cancer

• Maximizing the impact, reuse, and reproducibility of cancer research

• We need to change the incentives for data sharing

Reduce the risk, improve early detection, outcomes and survivorship in cancer

Page 7: Converged IT Summit - NCI Data Sharing

FAIR

Making data Findable, Accessible, Attributable, Interoperable, Reusable, and provide Recognition

Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples

Page 8: Converged IT Summit - NCI Data Sharing

NIH Genomic Data Sharing Policy

https://gds.nih.gov/

Went into effect January 25, 2015

NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data

Requires public sharing of genomic data sets

Page 9: Converged IT Summit - NCI Data Sharing

Changing the conversation around data sharing

• How do we find data, software, standards?
• How can we make data, annotations, software, metadata accessible?
• How do we reuse data standards?
• How do we make more data machine readable?

NIH Data Commons

NCI Genomic Data Commons

Data commons co-locate data, storage and computing infrastructure, and commonly used tools for analyzing and sharing data to create an interoperable resource for the research community.*

*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, A Case for Data Commons Towards Data Science as a Service, to appear.

Source of image: Interior of one of Google’s data centers, www.google.com/about/datacenters/.

Page 10: Converged IT Summit - NCI Data Sharing

Cancer Research Data Ecosystem – Cancer Moonshot BRP

Well characterized research data sets

Cancer cohorts

Patient data

EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support

Learning from every cancer patient

Active research participation

Research information donor

Clinical Research

Observational studies

Proteogenomics

Imaging data

Clinical trials

Discovery

Patient engaged Research

Surveillance

Big Data

Implementation research

SEER

GDC

Page 11: Converged IT Summit - NCI Data Sharing

Cancer Research Data Commons Ecosystem

• Genomic Data Commons
• Imaging Data Commons
• Proteomics Data Commons
• Clinical Data Commons (Cohorts / Indiv.)
• SEER (Populations)

Data Standards, Validation and Harmonization

Data Contributors and Consumers: Researchers, Patients, Clinicians, Institutions

Page 12: Converged IT Summit - NCI Data Sharing

Page 13: Converged IT Summit - NCI Data Sharing

Digital technology is changing rapidly

Page 14: Converged IT Summit - NCI Data Sharing

Cancer therapy is changing

Page 15: Converged IT Summit - NCI Data Sharing

Application of Cancer Genomics is changing

Page 16: Converged IT Summit - NCI Data Sharing

Precision Oncology Trials Launched

2014: MPACT, Lung MAP, ALCHEMIST, Exceptional Responders

2015: NCI-MATCH

2017: Pediatric-MATCH

NCI-MATCH: Features [Molecular Analysis for Therapy Choice]

• Foundational treatment/discovery trial; assigns therapy based on molecular abnormalities, not site of tumor origin, for patients without available standard therapy
• Regulatory umbrella for phase II drugs/studies from > 20 companies; single agents or combinations
• Available nationwide (2400 sites)
• Accrual began mid-August 2015
• Latest match rate is 25% with 24 open arms
• Biopsy results in 13 days, average from June-Sept 2016

Page 17: Converged IT Summit - NCI Data Sharing

Vice President’s Cancer Moonshot

How do we enable meaningful, patient-centered and patient-level data sharing for cancer and promote access to clinical trials for all Americans?

Page 18: Converged IT Summit - NCI Data Sharing

Cancer Moonshot Outline

• Genomic Data Commons – June 6, 2016
• Vice President’s Cancer Moonshot Summit – June 29, 2016
• Rethinking Clinical Trial Search
  - Development of Application Programming Interface (API) to NCI’s Clinical Trials Reporting Program, for use by:
    - NCI’s Cancer.gov website
    - Third party innovators providing clinical trial content to their communities
• Blue Ribbon Panel recommendations – accepted by the National Cancer Advisory Board on September 7th, 2016
• Cancer Moonshot Task Force and BRP recommendations sent to President on October 17th, 2016 https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/milestones
• https://cancer.gov/brp

Page 19: Converged IT Summit - NCI Data Sharing

https://cancer.gov/brp

Page 20: Converged IT Summit - NCI Data Sharing

Blue Ribbon Panel Recommendations

• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 4D Human Tumor Atlas
• Development of New Enabling Cancer Technologies

Page 21: Converged IT Summit - NCI Data Sharing

Genomic Data Commons

The Cancer Genomic Data Commons (GDC) is an existing effort to standardize and simplify submission of genomic data to NCI and follow the principles of FAIR – Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.

The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons.

Microattribution, nanopublications, tracking the use of data, annotation of data, use of algorithms, supports the data/software/metadata life cycle to provide credit and analyze impact of data, software, analytics, algorithm, curation and knowledge sharing

Force11 white paper: https://www.force11.org/group/fairgroup/fairprinciples

Page 22: Converged IT Summit - NCI Data Sharing

NCI Genomic Data Commons

The GDC went live on June 6, 2016 with approximately 4.1 PB of data. This includes:

• 2.6 PB of legacy data and 1.5 PB of “harmonized” data.
• 577,878 files about 14,194 cases (patients), in 42 cancer types, across 29 primary sites.
• 10 major data types, ranging from Raw Sequencing Data and Raw Microarray Data to Copy Number Variation, Simple Nucleotide Variation and Gene Expression.
• Data derived from 17 different experimental strategies, with the major ones being RNA-Seq, WXS, WGS, miRNA-Seq, Genotyping Array and Expression Array.

Foundation Medicine announced the release of 18,000 genomic profiles to the GDC at the Cancer Moonshot Summit.
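As a quick consistency check on the launch figures on this slide, the short Python sketch below adds up the two storage components and derives an average file count per case. The variable names are illustrative only and are not part of any GDC tool.

```python
import math

# Launch figures quoted on the slide (June 6, 2016).
legacy_pb = 2.6        # legacy data, in petabytes
harmonized_pb = 1.5    # harmonized data, in petabytes
total_files = 577_878
total_cases = 14_194

total_pb = legacy_pb + harmonized_pb
files_per_case = total_files / total_cases

# The two components should add up to the quoted "approximately 4.1 PB".
assert math.isclose(total_pb, 4.1)
print(f"total: {total_pb:.1f} PB, average files per case: {files_per_case:.1f}")
```

The per-case average is only a rough indicator, since the number of files varies with the experimental strategies applied to each case.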

Page 23: Converged IT Summit - NCI Data Sharing
Page 24: Converged IT Summit - NCI Data Sharing

Page 25: Converged IT Summit - NCI Data Sharing

Page 26: Converged IT Summit - NCI Data Sharing

Genomic Data Commons Data Portal

https://gdc.cancer.gov

Page 27: Converged IT Summit - NCI Data Sharing

The NCI Genomic Data Commons User Interface: Home Page

Page 28: Converged IT Summit - NCI Data Sharing

The NCI Genomic Data Commons User Interface: Sample Browser

Page 29: Converged IT Summit - NCI Data Sharing

The NCI Genomic Data Commons User Interface: Data Submission Dashboard

Clinical data

Biospecimen data

Molecular data

Files uploaded

Page 30: Converged IT Summit - NCI Data Sharing

Development of the NCI Genomic Data Commons (GDC) To Foster the Molecular Diagnosis and Treatment of Cancer

GDC: Bob Grossman, PI, Univ. of Chicago; Ontario Inst. Cancer Res.; Leidos

Institute of Medicine, Towards Precision Medicine, 2011

Page 31: Converged IT Summit - NCI Data Sharing

The Mutational Burden of Human Cancer

Mike Lawrence and Gaddy Getz, Broad Institute: Nature 2014

Increasing genomic complexity

Childhood cancers

Known Carcinogens

Page 32: Converged IT Summit - NCI Data Sharing

Example publication from TCGA

Page 33: Converged IT Summit - NCI Data Sharing
Page 34: Converged IT Summit - NCI Data Sharing
Page 35: Converged IT Summit - NCI Data Sharing

Discovery of Cancer Drivers With 2% Prevalence

Lung adeno. + 2,900

Colorectal + 1,200

Ovarian + 500

Lawrence et al, Nature 2014

Power Calculation for Cancer Driver Discovery

Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence
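To give a feel for why low-prevalence drivers demand large cohorts, here is a deliberately simplified Python sketch. It is not the power analysis of Lawrence et al., which also models background mutation rates and multiple testing; it only computes the binomial probability of seeing a gene mutated in at least `k` of `n` tumors when its true prevalence is `p`. The function names and the example thresholds are assumptions made for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 minus the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail

def min_samples(k, p, target, limit=10_000):
    """Smallest n (up to limit) with P(X >= k) >= target, or None if not reached."""
    for n in range(k, limit + 1):
        if prob_at_least(k, n, p) >= target:
            return n
    return None

# Hypothetical question: how many tumors are needed to observe a
# 2%-prevalence mutation at least once with 90% probability?
n_needed = min_samples(k=1, p=0.02, target=0.9)
print(n_needed)
```

Requiring many more than one observation, per tumor type and against a noisy mutational background, pushes the required sample size far above this toy figure, which is the direction of the slide's argument.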

Page 36: Converged IT Summit - NCI Data Sharing

GDC Infrastructure and Functionality

GDC Users

• Data Submitters
• Open Access Users
• Controlled Access Users

eRA Commons & dbGaP

GDC System Components

• Data Submission
• Harmonization
• Metadata + Data Storage
• Data Security System
• Digital ID System
• APIs
• Reporting System
• Open Access Data
• Controlled Access Data
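The split between open and controlled access on this slide can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the GDC's actual security system: the `User` and `DataFile` classes, their fields, and the `can_download` rule (an eRA Commons login plus a dbGaP approval for the file's project) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass(frozen=True)
class DataFile:
    file_id: str
    project: str
    access: str  # "open" or "controlled"

@dataclass
class User:
    name: str
    era_commons_id: Optional[str] = None                     # None means anonymous
    dbgap_projects: Set[str] = field(default_factory=set)    # approved projects

def can_download(user: User, data_file: DataFile) -> bool:
    """Open files are available to everyone; controlled files need a login
    and a dbGaP approval covering the file's project (assumed rule)."""
    if data_file.access == "open":
        return True
    return user.era_commons_id is not None and data_file.project in user.dbgap_projects

open_file = DataFile("f1", "PROJECT-A", "open")
controlled_file = DataFile("f2", "PROJECT-A", "controlled")
anonymous = User("anonymous")
approved = User("approved researcher", era_commons_id="ID123", dbgap_projects={"PROJECT-A"})

print(can_download(anonymous, open_file), can_download(anonymous, controlled_file))
print(can_download(approved, controlled_file))
```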

Page 37: Converged IT Summit - NCI Data Sharing

GDC Data Harmonization: Multiple data types and levels of processing

• Exome-seq: Genome alignment (1° processing) → Mutations (2° processing) → Oncogene vs. tumor suppressor (3° processing)
• Whole genome-seq: Genome alignment (1° processing) → Mutations + structural variants (2° processing) → Translocations (3° processing)
• RNA-seq: Genome alignment (1° processing) → Digital gene expression (2° processing) → Relative RNA levels, alternative splicing (3° processing)
• Copy number: Data segmentation (1° processing) → Copy number calls (2° processing) → Gene amplification/deletion (3° processing)

Page 38: Converged IT Summit - NCI Data Sharing

GDC Data Harmonization: Open Source, Dockerized Pipelines

Mutect2 pipeline

Page 39: Converged IT Summit - NCI Data Sharing

GDC Data Harmonization: Multiple pipelines needed to recover all variants

GDC variant calling pipelines: Wash U, Baylor, Broad

Recovery rate (% true positives), A0F0

SomaticSniper    81.1%    76.5%
VarScan          93.9%    84.3%
MuSE             93.1%    87.3%
All Three        96.4%    91.2%
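The point of this slide, that combining callers recovers more true variants than any single caller, can be sketched in Python with set operations. The data below is a toy example with made-up variants and hypothetical caller names; it does not reproduce the percentages on the slide or the behavior of SomaticSniper, VarScan or MuSE.

```python
def recovery_rate(calls, truth):
    """Fraction of true variants that appear in a call set."""
    return len(calls & truth) / len(truth)

# Ten made-up "true" variants, each as (chromosome, position, ref, alt).
truth_list = [("chr1", 100 + i, "A", "T") for i in range(10)]
truth = set(truth_list)

# Each hypothetical caller recovers a different, overlapping subset.
caller_calls = {
    "caller_a": set(truth_list[:7]),
    "caller_b": set(truth_list[3:9]),
    # caller_c also reports one false positive, which does not affect recovery.
    "caller_c": set(truth_list[1:8]) | {("chr2", 500, "G", "C")},
}

# Union of all call sets, analogous to the "All Three" row.
combined = set().union(*caller_calls.values())

for name, calls in caller_calls.items():
    print(f"{name}: {recovery_rate(calls, truth):.0%}")
print(f"combined: {recovery_rate(combined, truth):.0%}")
```

Taking the union raises recovery but also accumulates each caller's false positives, so a real pipeline would pair it with filtering; that trade-off is outside what the slide reports.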

Page 40: Converged IT Summit - NCI Data Sharing

GDC Content

Current
• TCGA: 11,353 cases
• TARGET: 3,178 cases

Coming soon
• Foundation Medicine: 18,000 cases
• Cancer studies in dbGaP: ~4,000 cases

Planned (1-3 years)
• NCI-MATCH: ~5,000 cases
• Clinical Trial Sequencing Program: ~3,000 cases
• Cancer Driver Discovery Program: ~5,000 cases
• Human Cancer Model Initiative: ~1,000 cases
• APOLLO – VA-DoD: ~8,000 cases

Total: ~58,000 cases
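The ~58,000 figure can be checked by summing the per-program case counts on this slide, as in the Python sketch below. The grouping into current, coming soon and planned is an assumption about how the flattened slide should be read, and the approximate counts are treated as exact for the arithmetic.

```python
# Case counts as listed on the slide; "~" values are used as-is.
gdc_content = {
    "current": {"TCGA": 11_353, "TARGET": 3_178},
    "coming_soon": {"Foundation Medicine": 18_000, "Cancer studies in dbGaP": 4_000},
    "planned": {
        "NCI-MATCH": 5_000,
        "Clinical Trial Sequencing Program": 3_000,
        "Cancer Driver Discovery Program": 5_000,
        "Human Cancer Model Initiative": 1_000,
        "APOLLO - VA-DoD": 8_000,
    },
}

subtotals = {stage: sum(programs.values()) for stage, programs in gdc_content.items()}
total_cases = sum(subtotals.values())

for stage, cases in subtotals.items():
    print(f"{stage}: {cases:,}")
print(f"total: {total_cases:,}")
```

The sum lands slightly above 58,000, which matches the rounded total, and it is still well short of the >100,000 tumors called for elsewhere in the deck.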

Page 41: Converged IT Summit - NCI Data Sharing

Towards a Cancer Knowledge System

Continue genomic investigations of cancer
• Need > 100,000 cases analyzed
• Embrace all genomic platforms
• Relationship of relapse and primary biopsies

Incorporate associated clinical annotations
• Clinical trial data
• Observational, longitudinal standard-of-care data
• N-of-1 clinical data

Promote and curate biological investigations of cancer genetic variants
• Driver vs. passenger mutations
• Multiple phenotypic assays
• Alterations in regulatory pathways – proteomics
• Mechanisms of therapeutic resistance
• Functional genomic investigations

Integrative models for high-dimensional data

Genomic Data Commons

Page 42: Converged IT Summit - NCI Data Sharing

Utility of a Cancer Knowledge System

• Identify low-frequency cancer drivers
• Define genomic determinants of response to therapy
• Compose clinical trial cohorts sharing targeted genetic lesions

Cancer information donors

Genomic Data Commons (GDC)

Page 43: Converged IT Summit - NCI Data Sharing

The Genomic Data Commons and Cloud Pilots

Support the Precision Medicine Initiative

• Expand data model to include other data (e.g. imaging and proteomics)
• Allow easy publication of persistent links to data, annotations, algorithms, tools, workflows
• Measure usage and impact
• Change incentives for public contributions

Page 44: Converged IT Summit - NCI Data Sharing

PMI – Oncology, the GDC and the Cloud Pilots Goals

• Support precision medicine-focused clinical research
• Enable researchers to deposit well-annotated (Interoperable) genomic data sets with the GDC
• Provide a single source (and single dbGaP access request!) to Find and Access these data
• Enable effective analysis and meta-analysis of these data without requiring local downloads – data Reuse
• Understand Contributions, Assess value through usage, and give Attribution to all users

Page 45: Converged IT Summit - NCI Data Sharing

PMI – Oncology, the GDC and the Cloud Pilots Goals

• Provide a data integration platform to allow multiple data types, multi-scalar data, temporal data from cancer models and patients through open APIs
• Work with the Global Alliance for Genomics and Health (GA4GH) to define the next generation of secure, flexible, meaningful, interoperable, lightweight interfaces – open APIs
• Engage the cancer research community in evaluating the open APIs for ease of use and effectiveness

Page 46: Converged IT Summit - NCI Data Sharing

Cancer Research Data Ecosystem – Cancer Moonshot BRP

Well characterized research data sets

Cancer cohorts

Patient data

EHR, Lab Data, Imaging, PROs, Smart Devices, Decision Support

Learning from every cancer patient

Active research participation

Research information donor

Clinical Research

Observational studies

Proteogenomics

Imaging data

Clinical trials

Discovery

Patient engaged Research

Surveillance

Big Data

Implementation research

SEER

GDC

Page 47: Converged IT Summit - NCI Data Sharing

GDC Acknowledgements

NCI Center for Cancer Genomics
• Lou Staudt
• Zhining Wang
• Martin Ferguson
• JC Zenklusen
• Daniela Gerhard
• Deb Steverson

Univ. of Chicago
• Bob Grossman
• Allison Heath
• Mike Ford
• Zhenyu Zhang

Ontario Institute for Cancer Research
• Vincent Ferretti
• Francois Gerthoffert
• JunJun Zhang

Leidos Biomedical Research
• Mark Jensen
• Sharon Gaheen
• Himanso Sahni

NCI CBIIT
• Tony Kerlavage
• Tanja Davidsen

Page 48: Converged IT Summit - NCI Data Sharing

Cancer Genomics Project Teams

CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D. - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D. - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D. - Seven Bridges - http://www.cancergenomicscloud.org

NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D. - Project Officer
• Juli Klemm, Ph.D. - COR, Broad Institute
• Tanja Davidsen, Ph.D. - COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA - COR, Seven Bridges Genomics

GDC Principal Investigator
• Robert Grossman, Ph.D. - University of Chicago
• Allison Heath, Ph.D. - University of Chicago
• Vincent Ferretti, Ph.D. - Ontario Institute for Cancer Research

NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.

Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.

Page 49: Converged IT Summit - NCI Data Sharing

Rethinking Clinical Trials Search

Thanks to: PIFs, various NCI teams (CBIIT, CCCT, OCPL)

Page 50: Converged IT Summit - NCI Data Sharing

Improving Cancer Clinical Trials Search

• Engage Presidential Innovation Fellows
• Create an Application Programming Interface (API) for clinical trials
• Engage with third-party innovators to guide the API development process
• Utilize the API on NCI’s website, cancer.gov

*An API is a set of protocols that provides communication between a software application and a computer operating system or between applications

Page 51: Converged IT Summit - NCI Data Sharing

Cancer Clinical Trials API

• The API (https://clinicaltrialsapi.cancer.gov/v1/) makes trial registration information from NCI’s clinical trial database (Clinical Trial Reporting Program, CTRP) publicly available to third-party innovators
  - Enables innovators to build new tools tailored to the clinical trial search needs of their users
  - Expands opportunities for patients and doctors to locate cancer trials
• Cancer.gov now utilizes the API (http://trials.cancer.gov), enhancing clinical trial searching on the NCI website
• The API source code is publicly available on GitHub (https://github.com/presidential-innovation-fellows/clinical-trials-search)
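For readers who want to try the API, a request URL might be assembled along the lines of the Python sketch below. Only the base URL comes from the slide; the `clinical-trials` endpoint name and the `size` query parameter are assumptions for illustration and should be checked against the API's own documentation. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode, urljoin

API_BASE = "https://clinicaltrialsapi.cancer.gov/v1/"  # base URL from the slide

def build_search_url(endpoint, **params):
    """Join the base URL with an endpoint and URL-encode any query parameters."""
    url = urljoin(API_BASE, endpoint)
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url

# Assumed endpoint and parameter names, shown for illustration only.
search_url = build_search_url("clinical-trials", size=5)
print(search_url)
```

A third-party tool would pass the resulting URL to an HTTP client of its choice and parse the response, which is the kind of reuse the slide has in mind.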

Page 52: Converged IT Summit - NCI Data Sharing

Page 53: Converged IT Summit - NCI Data Sharing
Page 54: Converged IT Summit - NCI Data Sharing

Questions?

Warren Kibbe, Ph.D.

[email protected]

@wakibbe

Page 55: Converged IT Summit - NCI Data Sharing

www.cancer.gov www.cancer.gov/espanol