opportunities and challenges for international cooperation around big data

Post on 25-Jun-2015

DESCRIPTION

Presented at the Open Session of the Big Data in Biomedicine conference, Barcelona, November 12, 2014.

TRANSCRIPT

Opportunities and Challenges for International Collaboration Around Big Data

Philip E. Bourne, PhD
Associate Director for Data Science
National Institutes of Health
philip.bourne@nih.gov

November 12, 2014

A Bottom Up Exemplar

Top Down

• Protein sequence and functional annotation
• Gene Ontology annotation
• Pathway and reaction annotation
• Protein interaction annotation
• Evidence-based proteomics annotation
• Cellular models
• Variants annotation
• ClinVar / OMIM
• MedThesaurus

[adapted from Ioannis Xenarios]

What Else Can we Do from the Top Down?

The NIH Data Science Mission Statement

To foster an ecosystem that enables biomedical* research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability

* Includes biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.

Elements of The Ecosystem

• Community
• Policy
• Infrastructure
• Sustainability
• Collaboration
• Training

Virtuous Research Cycle

The Virtuous Cycle

September 3, 2014 Workshop: http://goo.gl/fkWjhS

Policies – Now & Forthcoming

Data Sharing

– Genomic data sharing announced

– Data sharing plans on all research awards

– Data sharing plan enforcement

• Machine readable plan

• Repository requirements to include grant numbers

http://www.nih.gov/news/health/aug2014/od-27.htm

Policies - Forthcoming

Data Citation

– Goal: legitimize data as a form of scholarship

– Process:

• Machine readable standard for data citation (done)

• Endorsement of data citation for inclusion in NIH biosketch, grants, reports, etc.

• Example formats for human readable data citations

• Slowly work into NLM/NCBI workflow

[Diagram labels: BD2K Center (six instances), DDICC, Software, Standards]

Infrastructure - The Commons

[Diagram labels: Labs (four instances)]

What is the Commons?

A conceptual framework for:

Sharing, finding, integrating, reusing and attributing digital research objects

– “Each digital object has a UID that must allow it to be found, shared and attributed” – The Commons Document

The Commons is agnostic of computing platform
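
The requirement quoted above (every digital object carries a UID that lets it be found, shared and attributed) can be illustrated with a small sketch. This is not the NIH Commons implementation: the class names, the `urn:uuid:` identifier scheme, the keyword index and the citation format are all assumptions made for illustration, and the registry lives in memory only.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical research object (dataset, software, workflow, ...)."""
    title: str
    creators: list[str]
    keywords: list[str]
    # Assumed UID scheme; a real system might use DOIs or another identifier.
    uid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


class CommonsRegistry:
    """Toy in-memory registry, independent of any computing platform."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}
        self._index: dict[str, set[str]] = {}  # lowercased keyword -> UIDs

    def register(self, obj: DigitalObject) -> str:
        """Store the object and index its metadata; return the UID to share."""
        self._objects[obj.uid] = obj
        for keyword in obj.keywords:
            self._index.setdefault(keyword.lower(), set()).add(obj.uid)
        return obj.uid

    def find(self, keyword: str) -> list[DigitalObject]:
        """Find objects through the indexed metadata (case-insensitive)."""
        uids = self._index.get(keyword.lower(), set())
        return [self._objects[uid] for uid in sorted(uids)]

    def resolve(self, uid: str) -> DigitalObject:
        """Resolve a shared UID back to its object."""
        return self._objects[uid]

    def attribute(self, uid: str) -> str:
        """Build a human-readable credit line (format is an assumption)."""
        obj = self._objects[uid]
        return f"{', '.join(obj.creators)}. {obj.title}. {obj.uid}"
```

A caller would register an object once, pass the returned UID to collaborators, and use `find`, `resolve` and `attribute` to cover the three verbs in the quotation. Keeping the registry separate from where the bytes are stored mirrors the point that the framework is agnostic of computing platform.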

The Commons: Framework Implementation

Digital Objects (with UIDs)

Search (indexed metadata)

Computing Platform

The Commons: Framework Draft Implementations

The Commons Conceptual Framework

Public Cloud Platforms

– Google, AWS (Amazon), Microsoft (Azure), IBM, other?

– Most easily accessed by NIH PIs

Super Computing (HPC) Platforms

– Super Computing 2014: ADDS coordinating meeting with SC centers

– NERSC “Commons Pilot”

Other Platforms?

– In house compute solutions

– Private clouds, HPC: Pharma, The Broad, Bionimbus

– Low access by NIH PIs

The Commons: Framework Draft Implementation

Digital Objects to populate and test the Commons:

– BD2K centers, NCI Cloud pilots (Google & AWS supported)

– Large Public Data Sets, MODs

Search

– BD2K Data and Software Discovery Indices

– Google Search functions

Use cases

The Commons: Framework Draft Implementation

Next Steps

– Determine which BD2K centers are most appropriate for a cloud Commons pilot

– Develop a plan of action with NCI Cloud pilots

– Working with DDIC/SW Discovery Indices (UIDs, Search)

– Working with Google and AWS (Amazon) to determine what is needed computationally

• In kind support (short term pilot)

• Conformant clouds (long term sustainable model)

– Developing Use cases!

A Business Model for The Commons

The Commons: Framework Draft Implementation

Community – BD2K Awards

Community: BD2K Awards Governance

November 3 Kick-off PI Meeting

– Emphasis on working groups that span centers and begin the work of building the ecosystem

• Common API development (with GA4GH)

• Mobile

• Metadata

• Grand challenges

– Emphasize sharing from day 1

– Incentivized to work in the Commons

Community Short Term Interactions

NSF Workshops and Dear Colleague letter

Workshop with NOAA on public – private partnerships

ELIXIR Workshop

– Standards

– Training

Workshop Inspiring the Game Developer Community to Engage in and Enhance Biomedical Research, Dec 2014

Sustainability of Data Resources 2015

Community: Training

Data Science Training Goals

1) Build a digital framework for data science training: NIH Data Science Workforce Development Center

2) Develop short-term training opportunities: courses, educational resources, etc.

3) Develop the discipline of biomedical data science and support cross-training

Goals expanded from recommendations in the June 2012 DIWG and Aug 2013 Training workshop reports.

Heads Up on What is Coming in FY15

Calls for using the Commons

Calls for a standards framework development

Calls for software development

Calls to stimulate interactions between communities (diversity, rotations, library)

Calls for high risk, high return projects

Your ideas here…..

NIH… Turning Discovery Into Health

philip.bourne@nih.gov
