
Page 1: NDS Relevant Update from the NIH Data Science (ADDS) Office

AWS Government, Education, and Nonprofit Symposium
Washington, DC | June 25-26, 2015

NDS Relevant Update from the NIH Data Science (ADDS) Office

Phil Bourne, Ph.D., FACMI
Associate Director for Data Science (ADDS)

Page 2: NDS Relevant Update from the NIH Data Science (ADDS) Office

How Can NDS Succeed?

• Be at the right place at the right time
• Bring together all the right stakeholders – there are groups missing now, e.g. application scientists, publishers
• Define very well the problem(s) you are trying to solve
• Start with pilots, but proceed to a soup-to-nuts application that has value and can be sustained

Page 3: NDS Relevant Update from the NIH Data Science (ADDS) Office

How can NDS Interface with the NIH ….

Page 4: NDS Relevant Update from the NIH Data Science (ADDS) Office

ADDS Mission Statement

To use data science to foster an open digital ecosystem that will accelerate efficient, cost-effective biomedical research to enhance health, lengthen life, and reduce illness and disability

Page 5: NDS Relevant Update from the NIH Data Science (ADDS) Office

A couple of announcements …

Page 6: NDS Relevant Update from the NIH Data Science (ADDS) Office

http://www.nih.gov/news/health/oct2015/od-20.htm

Page 7: NDS Relevant Update from the NIH Data Science (ADDS) Office
Page 8: NDS Relevant Update from the NIH Data Science (ADDS) Office

ADDS Strategy

• Discovery and Innovation: Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development: Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process: Contribute to policies & processes involving data that further the NIH mission
• Leadership: Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability: To foster a sustainable, efficient, and productive data science ecosystem

[Diagram labels: Sustainability; Workforce Development; Discovery & Innovation; Policy & Process; Leadership]

Page 9: NDS Relevant Update from the NIH Data Science (ADDS) Office

ADDS Strategy

• Discovery and Innovation: Enabling major scientific discovery and innovation through the BD2K Initiative
• Workforce development: Strengthen the ability of a diverse biomedical workforce to develop and benefit from data science
• Policy and process: Contribute to policies & processes involving data that further the NIH mission
• Leadership: Further visibility of NIH leadership in data science by the public, DHHS, USG at large, and international funders
• Sustainability: To foster a sustainable, efficient, and productive data science ecosystem: The Commons


Page 10: NDS Relevant Update from the NIH Data Science (ADDS) Office

Some Developments…

• Centers, standards, training coordination centers off and running
• Looking at funding reference datasets
• Hackathons and more…
• NLM 2.0

Page 11: NDS Relevant Update from the NIH Data Science (ADDS) Office

Commons Update

Enabling the digital enterprise

Page 12: NDS Relevant Update from the NIH Data Science (ADDS) Office

What is The Commons?

• Treats products of research – data, methods, papers etc. as digital objects

• These digital objects exist in a shared virtual space

• Digital objects conform to FAIR principles:
  – Findable
  – Accessible (and usable)
  – Interoperable
  – Reusable

Page 13: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: Components

• Computing environment
  – cloud and/or HPC
  – supports access, utilization, sharing and storage of digital objects
• Methods for Interoperability
  – enables connectivity, shareability and interoperability between digital objects
  – APIs, Containers (Docker etc.)
• Digital object compliance model
  – describes the properties of digital objects that enable them to be discoverable and shareable
  – Metadata, UIDs, clear access controls (human subject data)
• Indexing
  – Means to find and catalog digital objects
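To make the indexing component concrete, here is a minimal sketch of a keyword index over digital-object metadata. Everything in it is an assumption made for illustration: the class name `DiscoveryIndex`, the identifiers `obj-001` and `obj-002`, and the metadata fields are hypothetical and do not describe how the Commons index is actually built.

```python
from collections import defaultdict


class DiscoveryIndex:
    """Toy keyword index over digital-object metadata (hypothetical, illustration only)."""

    def __init__(self):
        self._objects = {}                # uid -> metadata dict
        self._by_term = defaultdict(set)  # lowercase term -> set of uids

    def register(self, uid, metadata):
        """Catalog one digital object under every word in its metadata values."""
        self._objects[uid] = metadata
        for value in metadata.values():
            for term in str(value).lower().split():
                self._by_term[term].add(uid)

    def find(self, query):
        """Return the sorted uids whose metadata contains every query word."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._by_term.get(t, set()) for t in terms))
        return sorted(hits)


index = DiscoveryIndex()
index.register("obj-001", {"title": "Gut microbiome reads", "type": "dataset"})
index.register("obj-002", {"title": "Microbiome assembly pipeline", "type": "software"})

print(index.find("microbiome"))          # both objects match
print(index.find("microbiome dataset"))  # only the dataset matches
```

The same structure covers both halves of the bullet: `register` is the cataloging step and `find` is the search step.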

Page 14: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: Components

Page 15: NDS Relevant Update from the NIH Data Science (ADDS) Office

Computing Environment: Cloud

The ability to store, share and compute on digital research objects

Especially useful for large data sets that are not easily computed locally

Scalable and Elastic

Pay per use - Cost effective

An environment that fosters collaboration

Page 16: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: Cloud

Commercial
• AWS, Google, Microsoft, IBM
• Others

Academic
• OSC (Open Science Cloud)
• iDASH (HIPAA compliant)
• The Broad
• Others

Page 17: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: HPC

• Supercomputing Centers in the US
  – Supported by DOE and NSF
    • NERSC (San Francisco)
    • ORNL (Oak Ridge)
    • TACC (Texas)
    • SDSC (San Diego)
    • Argonne (Urbana-Champaign)
• Optimized, high performance systems with IT support

Page 18: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: Interoperability

Page 19: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: Interoperability

• Software that supports connectivity and interoperability between digital (data) objects
  – API (Application Programming Interfaces)
    • Expose and provide direct access to data
    • Enable data to be passed to analysis tools or pipelines
  – Containers
    • Package and deploy software tools and pipelines to the cloud
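The API bullets can be sketched as follows: one function exposes an object as JSON, and an analysis step works only from that payload. This is an in-memory stand-in, not a real service. The store `_STORE`, the functions `api_get` and `mean_of_values`, and the record layout are all hypothetical names chosen for the example.

```python
import json
from statistics import mean

# Hypothetical in-memory store standing in for a Commons data repository.
_STORE = {
    "obj-001": {"uid": "obj-001", "values": [2.0, 4.0, 6.0]},
}


def api_get(uid):
    """Stand-in for an HTTP GET: return the object as a JSON string, or None."""
    record = _STORE.get(uid)
    return json.dumps(record) if record is not None else None


def mean_of_values(payload):
    """Analysis step: consumes only the JSON payload, never the store itself."""
    record = json.loads(payload)
    return mean(record["values"])


payload = api_get("obj-001")      # data exposed through the API
result = mean_of_values(payload)  # data passed on to an analysis tool
print(result)
```

Because the analysis step depends only on the JSON contract, the store behind `api_get` could be swapped for another provider without changing the analysis code, which is the interoperability point the slide makes.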

Page 20: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons: Digital Object Compliance

Page 21: NDS Relevant Update from the NIH Data Science (ADDS) Office

The Commons
Digital Object Compliance: FAIR

• Attributes of digital objects in the Commons
  – Initial Phase
    • Unique digital object identifiers of some type
    • A minimal set of searchable metadata
    • Physically available in a cloud based Commons provider
    • Clear access rules (especially important for human subjects data)
    • An entry (with metadata) in one or more indices
  – Future Phases
    • Standard, community based unique digital object identifiers
    • Conform to community approved standard metadata for enhanced searching
    • Digital objects accessible via open standard APIs
    • Are physically and logically available to the commons
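The five initial-phase attributes can be read as a checklist. The sketch below encodes them as one, under stated assumptions: the `DigitalObject` fields, the function `initial_phase_gaps`, and all sample values are hypothetical, and the checks only test that each attribute is present, not that it is valid.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record for one digital object; field names are assumptions."""
    uid: str = ""
    metadata: dict = field(default_factory=dict)
    provider: str = ""     # cloud based Commons provider holding the object
    access_rule: str = ""  # e.g. "open" or "controlled"
    indexed_in: list = field(default_factory=list)


def initial_phase_gaps(obj):
    """Return the initial-phase attributes the object is still missing."""
    checks = {
        "unique identifier": bool(obj.uid),
        "searchable metadata": bool(obj.metadata),
        "cloud Commons provider": bool(obj.provider),
        "clear access rule": bool(obj.access_rule),
        "index entry": bool(obj.indexed_in),
    }
    return [name for name, ok in checks.items() if not ok]


complete = DigitalObject(
    uid="obj-001",
    metadata={"title": "Gut microbiome reads"},
    provider="provider-a",
    access_rule="controlled",
    indexed_in=["discovery-index"],
)
partial = DigitalObject(uid="obj-002", metadata={"title": "Assembly pipeline"})

print(initial_phase_gaps(complete))  # no gaps
print(initial_phase_gaps(partial))   # provider, access rule and index entry missing
```

The future-phase attributes would tighten these checks, for example by validating the identifier and metadata against community standards rather than testing for presence.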

Page 22: NDS Relevant Update from the NIH Data Science (ADDS) Office

Commons Pilot Projects

Page 23: NDS Relevant Update from the NIH Data Science (ADDS) Office

Commons Pilot Projects

• Evaluating Commons Framework & Populating the Commons
  – NIH funded Large Resource groups BD2K groups (cloud)
  – HMP Data and tools available in the cloud (AWS)
    • https://aws.amazon.com/datasets/1903160021374413
  – NCI Cloud Pilots & Genomic Data Commons (AWS, Google)
• The Cloud Credits - business model for using cloud resources

Page 24: NDS Relevant Update from the NIH Data Science (ADDS) Office

Commons Credits (business model)

[Diagram labels: The Commons (infrastructure); Cloud Provider A; Cloud Provider B; Cloud Provider C; Investigator; NIH; Provides credits; Enables Search; Discovery Index; Uses credits in the Commons; Indexes; Option: Direct Funding]

Page 25: NDS Relevant Update from the NIH Data Science (ADDS) Office

Cloud Credits: Pros and Cons

Pros
• Cost effective - Only pay for IT support used
• Drives competition – Better services at lower cost
• Supports data access and sharing by driving science into the Commons
• Can help determine metrics of data object usage
• Facilitates public-private partnership

Cons
• Never been tried, so we don’t have data about likelihood of success
• Cost Models: Predicated prices among providers
• Service Providers: Predicated on service providers willing to make the investment to become conformant
• Persistence: The model is ‘Pay As You Go’ which means if you stop paying it stops going
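A small ledger illustrates the mechanics of the credits model: a funder grants credits, the investigator spends them at a provider of their choice, and service stops when the balance runs out. This is a sketch under stated assumptions, not a description of any NIH system. The class `CreditLedger`, the names `investigator-1`, `provider-a`, and `provider-b`, and all amounts are hypothetical.

```python
class CreditLedger:
    """Toy ledger for the credits model (hypothetical; not an NIH system)."""

    def __init__(self):
        self._balances = {}  # investigator -> remaining credits
        self.usage = {}      # provider -> total credits spent there

    def grant(self, investigator, credits):
        """The funder provides credits to an investigator."""
        self._balances[investigator] = self._balances.get(investigator, 0) + credits

    def spend(self, investigator, provider, credits):
        """Spend credits at a provider; refuse if the balance is too low."""
        balance = self._balances.get(investigator, 0)
        if credits > balance:
            return False  # pay as you go: no credits, no service
        self._balances[investigator] = balance - credits
        self.usage[provider] = self.usage.get(provider, 0) + credits
        return True

    def balance(self, investigator):
        """Remaining credits for an investigator (0 if none were granted)."""
        return self._balances.get(investigator, 0)


ledger = CreditLedger()
ledger.grant("investigator-1", 100)
ledger.spend("investigator-1", "provider-a", 60)
ledger.spend("investigator-1", "provider-b", 30)

print(ledger.balance("investigator-1"))                  # 10 credits left
print(ledger.spend("investigator-1", "provider-a", 20))  # False: balance too low
print(ledger.usage)                                      # spend per provider
```

The `usage` totals correspond to the "metrics of data object usage" point, and the refused `spend` call shows the persistence concern: once credits stop, so does the service.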

Page 26: NDS Relevant Update from the NIH Data Science (ADDS) Office

NIH… Turning Discovery Into Health

https://datascience.nih.gov/
@pebourne