big data in biomedicine – an nih perspective

36
Big Data in Biomedicine – An NIH Perspective Philip E. Bourne Ph.D., FACMI Associate Director for Data Science National Institutes of Health [email protected] IEEE BIBM Nov 10 2015, Washington DC http://www.slideshare.net/pebourne

Upload: philip-bourne

Post on 23-Jan-2017

911 views

Category:

Education


1 download

TRANSCRIPT

Page 1: Big Data in Biomedicine – An NIH Perspective

Big Data in Biomedicine – An NIH Perspective

Philip E. Bourne Ph.D., FACMIAssociate Director for Data Science

National Institutes of [email protected]

IEEE BIBM Nov 10 2015, Washington DC

http://www.slideshare.net/pebourne

Page 2: Big Data in Biomedicine – An NIH Perspective

Perspective

Structural bioinformatics researcher

Former custodian of the PDB

Obsessive about open science e.g., PLOS

NIH-wide responsibility for developments in data science – responding to the disruption

Page 3: Big Data in Biomedicine – An NIH Perspective

Bioinformatics 2015 31(1):146-50

Page 4: Big Data in Biomedicine – An NIH Perspective

Big Data in Biomedicine…

This speaks to something more fundamental that more data …

It speaks to new methodologies, new skills, new emphasis, new cultures,

new modes of discovery …

Page 5: Big Data in Biomedicine – An NIH Perspective

Consider this change from my own career experience ….

Page 6: Big Data in Biomedicine – An NIH Perspective

The History of Computational Biomedicine According to Bourne

1980s 1990s 2000s 2010s 2020

Discipline:

Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver

The Raw Material:

Non-existent Limited /Poor More/Ontologies Big Data/Siloed Open/Integrated

The People:

No name Technicians Industry recognition data scientists Academics

Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol

Page 7: Big Data in Biomedicine – An NIH Perspective

Premise:

We are entering a period of disruption in biomedical research and we should all be thinking about what this means

to bioinformatics & biomedicine

http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg

Page 8: Big Data in Biomedicine – An NIH Perspective

We are at a Point of Deception …

Evidence:– Google car– 3D printers– Waze– Robotics– Sensors

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 9: Big Data in Biomedicine – An NIH Perspective

Disruption: Example - Photography

DigitizationDeception

Disruption

Demonetization

Dematerialization

Democratization

Time

Vol

ume,

Vel

ocity

, Var

iety

Digital camera invented byKodak but shelved

Megapixels & quality improve slowly; Kodak slow to react

Film market collapses;Kodak goes bankrupt

Phones replacecameras

Instagram,Flickr become thevalue proposition

Digital media becomes bona fide form of communication

Page 10: Big Data in Biomedicine – An NIH Perspective

Disruption: Biomedical Research

Digitization of Basic & Clinical Research & EHR’s

Deception

We Are Here

Disruption

Demonetization

Dematerialization

Democratization

Open science

Patient centered health care

Page 11: Big Data in Biomedicine – An NIH Perspective

Disruptive Features: Sustainability

Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

Page 12: Big Data in Biomedicine – An NIH Perspective

Disruptive Features:Reproducibility

Changing Value of Scholarship (?)

Page 13: Big Data in Biomedicine – An NIH Perspective

“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”

President Barack ObamaJanuary 30, 2015

Disruptive Features – New Science

Page 14: Big Data in Biomedicine – An NIH Perspective

An Example of That Promise:Comorbidity Network for 6.2M Danes

Over 14.9 Years

Jensen et al 2014 Nat Comm 5:4022

Page 15: Big Data in Biomedicine – An NIH Perspective

What Are Some General Implications of Such a Future?

Open collaborative science becomes of increasing importance

The value of data and associated analytics becomes of increasing value to scholarship

Opportunities exist to improve the efficiency of the research enterprise and hence fund more research

Cooperation between funders will be needed to sustain the emergent digital enterprise

Current training content and modalities will not match supply to demand

Balancing accessibility vs security becomes more important yet more complex

Page 16: Big Data in Biomedicine – An NIH Perspective

How Should We Respond?

Funders: Encourage change and facilitate an orderly transition

Academic Leaders: Respond and facilitate a cultural shift

Developers: Develop working environments that are more adaptive and capable of answers questions in a more efficient and hopefully accurate way

Users: Use the above environments Publishers: Move beyond papers

Page 17: Big Data in Biomedicine – An NIH Perspective

Take an Example That is Central to What We Do

Molecular Graphics

Is It Optimal for Today’s Science?

http://upload.wikimedia.org/wikipedia/commons/2/2e/Molecular-Graphics-GRIP-75-Console.jpg

Page 18: Big Data in Biomedicine – An NIH Perspective

Good News/Bad News

Good News:– It is harder to think of a

more powerful way to comprehend complex data

– It has excited generations to the promise of science

– It has adapted to changing technologies

Bad News:– It is not an

adaptive/extensible environment

– It is not a collaborative environment

– It is not an integrative environment

– It is the curse of the ribbon

BMC Bioinformatics 2005, 6:21

Page 19: Big Data in Biomedicine – An NIH Perspective

1. A link brings up figures from the paper

0. Full text of PLoS papers stored in a database

2. Clicking the paper figure retrievesdata from the PDB which is

analyzed

3. A composite view ofjournal and database

content results

Is a database really different than a biological journal?

PloS Comp Biol 2005 1(3) e34

4. The composite view haslinks to pertinent blocks

of literature text and back to the PDB

1.

2.

3.

4.

The Knowledge and Data Cycle

Page 20: Big Data in Biomedicine – An NIH Perspective

Take Another Example:The Raw Material of Structural

Bioinformatics

Is this the optimal starting point anymore?

Page 21: Big Data in Biomedicine – An NIH Perspective

Do data resources including the PDB best serve the needs of the user at

this point?

Page 22: Big Data in Biomedicine – An NIH Perspective

Good News/Bad News for the PDB in this Changing Landscape

Bad News:

– Interface complex and uni-data oriented

– Data accessible; methods accessible (sort of); but not together

– Significant redundancy in services offered

Good News:

– Annotation!

– Demand is increasing

– Integrated with other data types

– Restful services

Page 23: Big Data in Biomedicine – An NIH Perspective

General Problem Statement:

How to insure a high quality annotated data source that provides

the optimal environment for accessibility, integration and analysis

by a broad community of diverse users?

Page 24: Big Data in Biomedicine – An NIH Perspective

The Commons is a shared virtual space which is FAIR:

– Find

– Access (use effectively)

– Interoperate

– Reuse

An environment to find and catalyze the use of shared digital research objects

The CommonsConcept

Page 25: Big Data in Biomedicine – An NIH Perspective

The Developer or User Defines the Environment from the Appropriate

Building Blocks

Page 26: Big Data in Biomedicine – An NIH Perspective

Public Beacons

Host Content

AMPLab 1000 Genomes Project

Broad Institute ExAC

Curoverse PGP, GA4GH Example Data

EBI 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel

Google 1000 Genomes Project, Phase III, Illumina Platinum Genomes

ISB Known VARiants

NCBI NHLBI Exome Sequence Project

OICR 55 cancer datasets

SolveBio 56 public datasets

UCSC ClinVar, LOVD, UniProt

University of Leicester Cafe CardioKit, Cafe Variome Central

WTSI IBD, Native American, Egyptian, UK10K

Over ?? public datasets beaconized across 21 institutions

10s thousands of individuals

Page 27: Big Data in Biomedicine – An NIH Perspective
Page 28: Big Data in Biomedicine – An NIH Perspective

The CommonsComponents

Computing environment – cloud or HPC (High Performance Computing)

– supports access, utilization, sharing and storage of digital objects.

Methods for Interoperability– enables connectivity, shareability and interoperability

between digital objects.

Digital object compliance model – describes the properties of digital objects that

enables them to be discoverable and shareable.

Page 29: Big Data in Biomedicine – An NIH Perspective

The CommonsComponents

Page 30: Big Data in Biomedicine – An NIH Perspective

BD2KCenter

BD2KCenter

BD2KCenter

BD2KCenter

BD2KCenter

BD2KCenter

DDICC

Software

Standards

Infrastructure - The Commons

Labs

Labs

Labs

Labs

Page 31: Big Data in Biomedicine – An NIH Perspective

The ability to store and share and compute on digital research objects Especially useful for large data sets that

are not easily computed locally

Scalable and Elastic

Pay per use - Cost effective

An environment that fosters collaboration

The CommonsComputing Environment: Cloud

Page 32: Big Data in Biomedicine – An NIH Perspective

Commons - Pilots

The Cloud Credits - business model

BD2K Centers

MODs (Model Organism Databases)

HMP Data and tools available in the cloud

NCI Cloud Pilots & Genomic Data Commons

Page 33: Big Data in Biomedicine – An NIH Perspective

The PDB in the Commons

Components:– Annotated collection of data files– API’s to access these data files– Example methods using these APIs

Potential outcomes– Nothing happens?– A new breed of developer starts to use PDB data in new

ways ?– The casual user has a broader set of services that

previously?– Quality declines/increases?

Page 34: Big Data in Biomedicine – An NIH Perspective

I not only use all the brains I have, but all I can borrow.

– Woodrow Wilson

Page 35: Big Data in Biomedicine – An NIH Perspective

ADDS Team

BD2K Representatives

Page 36: Big Data in Biomedicine – An NIH Perspective

NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health

[email protected]://datascience.nih.gov/

http://www.ncbi.nlm.nih.gov/research/staff/bourne/