komatsoulis internet2 global forum 2015

George A. Komatsoulis, Ph.D., National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services. NIH Data Initiatives: Harnessing Big (and small) Data to Improve Health

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Komatsoulis internet2 global forum 2015

George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services

NIH Data Initiatives: Harnessing Big (and small) Data to Improve Health

Page 2: Komatsoulis internet2 global forum 2015

NIH: The Nation's Medical Research Agency

Mission: “To seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability.”

Composed of 27 Institutes and Centers

Annual Budget = $30.3B

80% of NIH budget goes to about 50,000 grants
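As a rough back-of-the-envelope reading of these budget figures, the sketch below computes the grant pool and the average funding per grant they imply. The three input numbers come from the slide; the constant and function names are illustrative only, and the result is an average over an approximate grant count, not a statement about any actual award.

```python
# Figures taken from the slide; "about 50,000 grants" is approximate.
ANNUAL_BUDGET_USD = 30.3e9   # $30.3B annual budget
GRANT_SHARE = 0.80           # 80% of the budget goes to grants
GRANT_COUNT = 50_000         # approximate number of grants


def average_grant_usd(budget, share, count):
    """Average dollars per grant implied by the budget figures."""
    return budget * share / count


grant_pool = ANNUAL_BUDGET_USD * GRANT_SHARE
avg = average_grant_usd(ANNUAL_BUDGET_USD, GRANT_SHARE, GRANT_COUNT)

print(f"Grant pool: ${grant_pool / 1e9:.2f}B")
print(f"Average per grant: ${avg:,.0f}")
```

Under these inputs the pool is about $24.2B and the average works out to a little under $500,000 per grant.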

Page 3: Komatsoulis internet2 global forum 2015

The Current “Big Data” problem in biology

Page 4: Komatsoulis internet2 global forum 2015

All of this has happened before…

[Figure: timeline, 1960 to 2020]

Page 5: Komatsoulis internet2 global forum 2015

… and other scientific fields have similar problems

Sensor Stream = 500 EB/day; Stores 69 TB/day

Collection = 14 EB/day; Store 1 PB/day

Total Data = 14 PB; Store an average of 3.3 TB/day for 10 years!
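To make the scale of the first two examples concrete, the sketch below computes what fraction of each day's raw data is actually kept. The rates are the ones quoted on the slide; the use of decimal (SI) byte units is an assumption, since the slide does not say which convention it uses, and the function name is illustrative.

```python
# Decimal (SI) byte units; an assumption, the slide does not specify.
TB = 10**12
PB = 10**15
EB = 10**18


def retained_fraction(stored_per_day, produced_per_day):
    """Fraction of the daily raw data volume that ends up being stored."""
    return stored_per_day / produced_per_day


# Rates quoted on the slide for the first two examples.
sensor = retained_fraction(69 * TB, 500 * EB)
collection = retained_fraction(1 * PB, 14 * EB)

print(f"Sensor stream example: {sensor:.2e} of raw data stored")
print(f"Collection example:    {collection:.2e} of raw data stored")
```

In both cases only a tiny fraction of what is generated is retained, which is the sense in which these fields face a similar problem to biology.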

Page 6: Komatsoulis internet2 global forum 2015

Although biomedicine has some unique challenges

Page 7: Komatsoulis internet2 global forum 2015

NIH BD2K: Big Data to Knowledge

Launched to support biomedical data science research

Support for multiple facets of data science:
  BD2K Centers
  Data and Software Discovery
  Standards and Interoperability
  Training and Workforce Development
  The Commons

Led by Dr. Phil Bourne, NIH Associate Director for Data Science

Page 8: Komatsoulis internet2 global forum 2015

Standard Model of Biomedical Computation

Public Data Repositories
Local Data
University
Locally Developed Software
Publicly Available Software
Local storage and compute resources

Page 9: Komatsoulis internet2 global forum 2015

The Commons: A shared virtual space

Is scalable and exploits new computing models
Is more cost effective given digital growth
Simplifies sharing digital research objects such as data, software, metadata and workflows
Makes digital research objects more FAIR: Findable, Accessible, Interoperable and Reusable

DOES NOT replace existing, well-curated databases

Phil Bourne, 2014

Page 10: Komatsoulis internet2 global forum 2015

The Commons: Conceptual Framework

The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
Open APIs
Software Encapsulation
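The two middle layers of this framework, digital objects with identifiers and a search layer built on indexed metadata, can be illustrated with a minimal sketch. Everything here is hypothetical: the class names, the "obj:0001" identifier style and the metadata keys are invented for illustration and do not describe any actual Commons API.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    identifier: str   # hypothetical identifier scheme
    kind: str         # e.g. "data", "software", "metadata", "workflow"
    metadata: dict = field(default_factory=dict)  # string keys and values


class CommonsIndex:
    """Toy registry: resolves identifiers and searches indexed metadata."""

    def __init__(self):
        self._objects = {}   # identifier -> DigitalObject
        self._index = {}     # (key, value) -> set of identifiers

    def register(self, obj):
        if obj.identifier in self._objects:
            raise ValueError(f"duplicate identifier: {obj.identifier}")
        self._objects[obj.identifier] = obj
        for item in obj.metadata.items():
            self._index.setdefault(item, set()).add(obj.identifier)

    def resolve(self, identifier):
        """Look an object up directly by its identifier."""
        return self._objects[identifier]

    def search(self, **criteria):
        """Return objects whose metadata matches every given criterion."""
        if not criteria:
            return []
        id_sets = [self._index.get(item, set()) for item in criteria.items()]
        matches = set.intersection(*id_sets)
        return [self._objects[i] for i in sorted(matches)]


index = CommonsIndex()
index.register(DigitalObject("obj:0001", "data",
                             {"organism": "human", "assay": "rna-seq"}))
index.register(DigitalObject("obj:0002", "software", {"organism": "human"}))

found = index.search(organism="human", assay="rna-seq")
print([o.identifier for o in found])
```

The point of the sketch is the separation of concerns: identifiers make objects resolvable, while the metadata index makes them findable without knowing the identifier in advance.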

Page 11: Komatsoulis internet2 global forum 2015

The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform

Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Indexes Methods
Content

Page 12: Komatsoulis internet2 global forum 2015

Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Indexes Methods
Content

Investigator
Works In
Searches

Page 13: Komatsoulis internet2 global forum 2015

Commons Federation (Infrastructure)
Conformant Provider A
Conformant Provider B
Conformant Provider C

Page 14: Komatsoulis internet2 global forum 2015

The Commons: Business Model

Researcher
Discovery Index
The Commons
Cloud Provider C
Cloud Provider B
Cloud Provider A
NIH

Provides Digital Objects
Retrieves/Uses Digital Objects
Option: Fund Providers to Support NIH Directed Resources
Indexes Commons
Provide Credits
Uses Credits
Finds Objects

Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments

Funded primarily by providing credits to investigators

Page 15: Komatsoulis internet2 global forum 2015

Potential Advantages of this Model

Cost effective - Only pay for IT support used
Drives competition - Better services at lower cost
Supports data sharing by driving science into the Commons
Facilitates public-private partnership
Scalable to most categories of data expected in the next 5 years

Page 16: Komatsoulis internet2 global forum 2015

Potential Disadvantages of this Model

Novelty: Never been tried, so we don’t have data about likelihood of success

Cost Models: Predicated on stable or declining prices among providers
  True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in industry

Service Providers: Predicated on service providers willing to make the investment to become conformant
  Market research suggests 3-5 providers within 2-3 months of program launch

Persistence: The model is ‘Pay As You Go’, which means if you stop paying it stops going
  Giving investigators an unprecedented level of control over what lives (or dies) in the Commons

Page 17: Komatsoulis internet2 global forum 2015

Business Model in Practice

[Diagram: credit workflow, steps 1 to 7]

Investigator
Investigator Institution
NIH
Reseller of Cloud Services
The Commons
Cloud Provider C
Cloud Provider B
Cloud Provider A

Directs reseller to distribute credits
Instructs provider to put credits on investigator account
Review
Approves Credit Request
Requests Credits
Uses credits
Distributes Credits To Investigator
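The credit workflow in this diagram can be sketched as a small ledger. This is a hypothetical illustration, not an NIH system: the transcript does not preserve which label belongs to which numbered step, so the ordering used here (request, approval, credits placed on the investigator's account at a provider, use) is an assumption, and the class, method and account names are invented.

```python
class CreditLedger:
    """Toy ledger for Commons credits (hypothetical, for illustration)."""

    def __init__(self):
        self.pending = {}    # request_id -> (investigator, provider, amount)
        self.balances = {}   # (investigator, provider) -> remaining credits
        self._next_id = 1

    def request_credits(self, investigator, provider, amount):
        """Investigator asks for credits at a chosen provider."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        request_id = self._next_id
        self._next_id += 1
        self.pending[request_id] = (investigator, provider, amount)
        return request_id

    def approve(self, request_id):
        """Approval step; credits then land on the investigator's account."""
        investigator, provider, amount = self.pending.pop(request_id)
        key = (investigator, provider)
        self.balances[key] = self.balances.get(key, 0) + amount

    def use_credits(self, investigator, provider, amount):
        """Spend credits; pay-as-you-go means no balance, no service."""
        key = (investigator, provider)
        balance = self.balances.get(key, 0)
        if amount > balance:
            raise ValueError("insufficient credits")
        self.balances[key] = balance - amount
        return self.balances[key]


ledger = CreditLedger()
rid = ledger.request_credits("investigator-1", "Cloud Provider A", 1000)
ledger.approve(rid)
remaining = ledger.use_credits("investigator-1", "Cloud Provider A", 250)
print(remaining)
```

The sketch collapses the reseller and the provider into a single balance per investigator and provider; in the model on the slide these are separate parties.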

Page 18: Komatsoulis internet2 global forum 2015

What does it mean to be conformant?

Minimum set of requirements for
  Business relationships (reseller, investigators)
  Interfaces (upload, download, manage, compute)
  Capacity (storage, compute)
  Networking and Connectivity
  Information Assurance
  Authentication and authorization

Still need to work out details of how to manage approval of conformance

A conformant cloud ≠ an IaaS provider

Draft specification out for comment among vendors

Page 19: Komatsoulis internet2 global forum 2015

Pilot of the Commons Business Model

Phase 0: Build the plumbing

Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards

Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers

Phase 3: Open the process to all NIH grantees

Page 20: Komatsoulis internet2 global forum 2015

NCI GDC and Cloud Pilots

QA/QC
Validation
Aggregation
Authoritative NCI Reference Data Set
Data Coordinating Center

NCI Genomic Data Commons (under development)

NCI Clouds
High Performance Computing
Search/Retrieve
Download
Analysis

Page 21: Komatsoulis internet2 global forum 2015

Secure Computational Capacity; Pre-loaded Data

Secure Computational Capacity; Pre-loaded Data

Secure Computational Capacity; Pre-loaded Data

NCI Genomics Consortium

NCI Genomic Data Repositories

Page 22: Komatsoulis internet2 global forum 2015

Acknowledgements

NIH Office of ADDS
Vivien Bonazzi, Ph.D.
Philip Bourne, Ph.D.
Michelle Dunn, Ph.D.
Mark Guyer, Ph.D.
Jennie Larkin, Ph.D.
Leigh Finnegan
Beth Russell

NCBI
Dennis Benson, Ph.D.
Alan Graeff
David Lipman, MD
Jim Ostell, Ph.D.
Don Preuss
Steve Sherry