Opportunities and Challenges for International Cooperation Around Big Data
Presented at the Open Session of the Big Data in Biomedicine conference, Barcelona, November 12, 2014.
Opportunities and Challenges for International Collaboration Around Big Data
Philip E. Bourne, PhD
Associate Director for Data Science
National Institutes of Health
November 12, 2014
A Bottom Up Exemplar
Top Down
Protein sequence and functional annotation
Gene Ontology annotation
Pathway and reaction annotation
Protein interaction annotation
Evidence-based proteomics annotation
Cellular models
Variants annotation
ClinVar / OMIM
MedThesaurus
[adapted from Ioannis Xenarios]
What Else Can We Do from the Top Down?
The NIH Data Science Mission Statement
To foster an ecosystem that enables biomedical* research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability
* Includes biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.
Elements of The Ecosystem
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
Virtuous Research Cycle
Policies – Now & Forthcoming
Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
Policies - Forthcoming
Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
• Machine readable standard for data citation (done)
• Endorsement of data citation for inclusion in NIH biosketch, grants, reports, etc.
• Example formats for human readable data citations
• Slowly work into NLM/NCBI workflow
Infrastructure - The Commons
[Diagram: BD2K Centers, DDICC, Software, Standards, Labs]
What is the Commons?
A Conceptual Framework for:
Sharing, finding, integrating, reusing and attributing digital research objects
– “Each digital object has a UID that must allow it to be found, shared and attributed” – The Commons Document
The Commons is agnostic of computing platform
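The idea on this slide can be illustrated with a small Python sketch. It is not the Commons implementation; it only shows, under stated assumptions, how a digital object carrying a UID can be found (search over indexed metadata), shared (resolved by UID, wherever it is stored), and attributed (a citation string built from its metadata). The names `DigitalObject` and `CommonsIndex`, the `hypothetical:` UID prefix, and the example storage locations are all invented for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DigitalObject:
    """A dataset, tool, or workflow plus the metadata needed to attribute it."""
    title: str
    creators: List[str]
    location: str  # where the bytes live; any platform (assumption: a URI string)
    keywords: List[str] = field(default_factory=list)
    # Hypothetical UID scheme; a real system would use a registered identifier.
    uid: str = field(default_factory=lambda: f"hypothetical:{uuid.uuid4()}")


class CommonsIndex:
    """In-memory stand-in for the 'Search (indexed metadata)' layer."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}
        self._by_term: Dict[str, Set[str]] = {}

    def register(self, obj: DigitalObject) -> str:
        """Store the object and index every word of its title and keywords."""
        self._objects[obj.uid] = obj
        text = " ".join([obj.title, *obj.keywords]).lower()
        for term in text.split():
            self._by_term.setdefault(term, set()).add(obj.uid)
        return obj.uid

    def resolve(self, uid: str) -> DigitalObject:
        """Shared: anyone holding the UID can look the object up."""
        return self._objects[uid]

    def search(self, query: str) -> List[DigitalObject]:
        """Found: return objects whose metadata contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._by_term.get(t, set()) for t in terms))
        return sorted((self._objects[u] for u in hits), key=lambda o: o.title)

    def attribution(self, uid: str) -> str:
        """Attributed: build a simple human-readable credit line."""
        obj = self.resolve(uid)
        return f"{', '.join(obj.creators)}. {obj.title}. {obj.uid}"


if __name__ == "__main__":
    index = CommonsIndex()
    uid = index.register(DigitalObject(
        title="Protein interaction dataset",
        creators=["A. Researcher"],
        location="s3://example-bucket/interactions.tsv",  # made-up location
        keywords=["proteomics"],
    ))
    print(index.attribution(uid))
    print([o.title for o in index.search("protein proteomics")])
```

The `location` field is an opaque string on purpose: the index never inspects it, which is one simple way to keep the lookup layer agnostic of the computing platform that holds the object.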
The Commons: Framework Implementation
Digital Objects (with UIDs)
Search (indexed metadata)
Computing Platform
The Commons
The Commons: Framework Draft Implementations
The Commons Conceptual Framework
Public Cloud Platforms
– Google, AWS (Amazon), Microsoft (Azure), IBM, other?
– Most easily accessed by NIH PIs
Other Platforms?
– In-house compute solutions
– Private clouds, HPC: Pharma, The Broad, Bionimbus
– Low access by NIH PIs
Super Computing (HPC) Platforms
– Super Computing 2014 ADDS coordinating meeting with SC centers
– NERSC “Commons Pilot”
The Commons: Framework Draft Implementation
Digital Objects to populate and test the Commons:
– BD2K centers, NCI Cloud pilots (Google & AWS supported)
– Large Public Data Sets, MODs
Search
– BD2K Data and Software Discovery Indices
– Google Search functions
Use cases
The Commons Conceptual Framework
Public Cloud Platforms
The Commons: Framework Draft Implementation
Next Steps
– Determine which BD2K centers are most appropriate for a cloud Commons pilot
– Develop a plan of action with NCI Cloud pilots
– Work with DDIC/SW Discovery Indices (UIDs, Search)
– Work with Google and AWS (Amazon) to determine what is needed computationally
• In-kind support (short term pilot)
• Conformant clouds (long term sustainable model)
– Develop use cases!
The Commons Conceptual Framework
Public Cloud Platforms
A Business Model for The Commons
The Commons: Framework Draft Implementation
Community – BD2K Awards
Community: BD2K Awards Governance
November 3 Kick-off PI Meeting
– Emphasis on working groups that span centers and begin the work of building the ecosystem
• Common API development (with GA4GH)
• Mobile
• Metadata
• Grand challenges
– Emphasize sharing from day 1
– Incentivized to work in the Commons
Community Short Term Interactions
NSF Workshops and Dear Colleague letter
Workshop with NOAA on public – private partnerships
ELIXIR Workshop
– Standards
– Training
Workshop Inspiring the Game Developer Community to Engage in and Enhance Biomedical Research, Dec 2014
Sustainability of Data Resources 2015
Community: Training
Data Science Training Goals
1) Build a digital framework for data science training: NIH Data Science Workforce Development Center
2) Develop short-term training opportunities: courses, educational resources, etc.
3) Develop the discipline of biomedical data science and support cross-training
Goals expanded from recommendations in the June 2012 DIWG and Aug 2013 Training workshop reports.
Heads Up on What is Coming in FY15
Calls for using the Commons
Calls for standards framework development
Calls for software development
Calls to stimulate interactions between communities (diversity, rotations, library)
Calls for high risk, high return projects
Your ideas here…..
NIH… Turning Discovery Into Health