big data in biomedicine – an nih perspective
TRANSCRIPT
Big Data in Biomedicine – An NIH Perspective
Philip E. Bourne Ph.D., FACMIAssociate Director for Data Science
National Institutes of [email protected]
IEEE BIBM Nov 10 2015, Washington DC
http://www.slideshare.net/pebourne
Perspective
Structural bioinformatics researcher
Former custodian of the PDB
Obsessive about open science e.g., PLOS
NIH-wide responsibility for developments in data science – responding to the disruption
Bioinformatics 2015 31(1):146-50
Big Data in Biomedicine…
This speaks to something more fundamental that more data …
It speaks to new methodologies, new skills, new emphasis, new cultures,
new modes of discovery …
Consider this change from my own career experience ….
The History of Computational Biomedicine According to Bourne
1980s 1990s 2000s 2010s 2020
Discipline:
Unknown Expt. Driven Emergent Over-sold A Service A Partner A Driver
The Raw Material:
Non-existent Limited /Poor More/Ontologies Big Data/Siloed Open/Integrated
The People:
No name Technicians Industry recognition data scientists Academics
Searls (ed) The Roots in Bioinformatics Series PLOS Comp Biol
Premise:
We are entering a period of disruption in biomedical research and we should all be thinking about what this means
to bioinformatics & biomedicine
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
We are at a Point of Deception …
Evidence:– Google car– 3D printers– Waze– Robotics– Sensors
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
Disruption: Example - Photography
DigitizationDeception
Disruption
Demonetization
Dematerialization
Democratization
Time
Vol
ume,
Vel
ocity
, Var
iety
Digital camera invented byKodak but shelved
Megapixels & quality improve slowly; Kodak slow to react
Film market collapses;Kodak goes bankrupt
Phones replacecameras
Instagram,Flickr become thevalue proposition
Digital media becomes bona fide form of communication
Disruption: Biomedical Research
Digitization of Basic & Clinical Research & EHR’s
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
Disruptive Features: Sustainability
Source Michael Bell http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Disruptive Features:Reproducibility
Changing Value of Scholarship (?)
“And that’s why we’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.”
President Barack ObamaJanuary 30, 2015
Disruptive Features – New Science
An Example of That Promise:Comorbidity Network for 6.2M Danes
Over 14.9 Years
Jensen et al 2014 Nat Comm 5:4022
What Are Some General Implications of Such a Future?
Open collaborative science becomes of increasing importance
The value of data and associated analytics becomes of increasing value to scholarship
Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
Cooperation between funders will be needed to sustain the emergent digital enterprise
Current training content and modalities will not match supply to demand
Balancing accessibility vs security becomes more important yet more complex
How Should We Respond?
Funders: Encourage change and facilitate an orderly transition
Academic Leaders: Respond and facilitate a cultural shift
Developers: Develop working environments that are more adaptive and capable of answers questions in a more efficient and hopefully accurate way
Users: Use the above environments Publishers: Move beyond papers
Take an Example That is Central to What We Do
Molecular Graphics
Is It Optimal for Today’s Science?
http://upload.wikimedia.org/wikipedia/commons/2/2e/Molecular-Graphics-GRIP-75-Console.jpg
Good News/Bad News
Good News:– It is harder to think of a
more powerful way to comprehend complex data
– It has excited generations to the promise of science
– It has adapted to changing technologies
Bad News:– It is not an
adaptive/extensible environment
– It is not a collaborative environment
– It is not an integrative environment
– It is the curse of the ribbon
BMC Bioinformatics 2005, 6:21
1. A link brings up figures from the paper
0. Full text of PLoS papers stored in a database
2. Clicking the paper figure retrievesdata from the PDB which is
analyzed
3. A composite view ofjournal and database
content results
Is a database really different than a biological journal?
PloS Comp Biol 2005 1(3) e34
4. The composite view haslinks to pertinent blocks
of literature text and back to the PDB
1.
2.
3.
4.
The Knowledge and Data Cycle
Take Another Example:The Raw Material of Structural
Bioinformatics
Is this the optimal starting point anymore?
Do data resources including the PDB best serve the needs of the user at
this point?
Good News/Bad News for the PDB in this Changing Landscape
Bad News:
– Interface complex and uni-data oriented
– Data accessible; methods accessible (sort of); but not together
– Significant redundancy in services offered
Good News:
– Annotation!
– Demand is increasing
– Integrated with other data types
– Restful services
General Problem Statement:
How to insure a high quality annotated data source that provides
the optimal environment for accessibility, integration and analysis
by a broad community of diverse users?
The Commons is a shared virtual space which is FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
An environment to find and catalyze the use of shared digital research objects
The CommonsConcept
The Developer or User Defines the Environment from the Appropriate
Building Blocks
Public Beacons
Host Content
AMPLab 1000 Genomes Project
Broad Institute ExAC
Curoverse PGP, GA4GH Example Data
EBI 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel
Google 1000 Genomes Project, Phase III, Illumina Platinum Genomes
ISB Known VARiants
NCBI NHLBI Exome Sequence Project
OICR 55 cancer datasets
SolveBio 56 public datasets
UCSC ClinVar, LOVD, UniProt
University of Leicester Cafe CardioKit, Cafe Variome Central
WTSI IBD, Native American, Egyptian, UK10K
Over ?? public datasets beaconized across 21 institutions
10s thousands of individuals
The CommonsComponents
Computing environment – cloud or HPC (High Performance Computing)
– supports access, utilization, sharing and storage of digital objects.
Methods for Interoperability– enables connectivity, shareability and interoperability
between digital objects.
Digital object compliance model – describes the properties of digital objects that
enables them to be discoverable and shareable.
The CommonsComponents
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
BD2KCenter
DDICC
Software
Standards
Infrastructure - The Commons
Labs
Labs
Labs
Labs
The ability to store and share and compute on digital research objects Especially useful for large data sets that
are not easily computed locally
Scalable and Elastic
Pay per use - Cost effective
An environment that fosters collaboration
The CommonsComputing Environment: Cloud
Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data Commons
The PDB in the Commons
Components:– Annotated collection of data files– API’s to access these data files– Example methods using these APIs
Potential outcomes– Nothing happens?– A new breed of developer starts to use PDB data in new
ways ?– The casual user has a broader set of services that
previously?– Quality declines/increases?
I not only use all the brains I have, but all I can borrow.
– Woodrow Wilson
ADDS Team
BD2K Representatives
NIHNIH……Turning Discovery Into HealthTurning Discovery Into Health
[email protected]://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/