George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
NIH Perspective
Executive Track
internet2 Global Forum 2015
The Commons Business Model
The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
The Commons: Conceptual Framework
Open APIs
Software Encapsulation
The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Indexes
Methods
Content
Investigator
Works In
Searches
Commons Federation (Infrastructure)
Conformant Provider A
Conformant Provider B
Conformant Provider C
The Commons: Business Model
Researcher
Discovery Index
The Commons
Cloud Provider C
Cloud Provider B
Cloud Provider A
NIH
Provides Digital Objects
Retrieves/Uses Digital Objects
Option: Fund Providers to Support NIH Directed Resources
Indexes Commons
Provide Credits
Uses Credits
Finds Objects
Commons Implemented as a federation of ‘conformant’ cloud providers and HPC environments
Funded primarily by providing credits to investigators
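The credit flow in the business model diagram (NIH provides credits, the researcher uses them with a cloud provider of their choice) can be sketched roughly as follows. This is only an illustration: the class names, the `spend` and `provide_credits` functions, and the credit amounts are hypothetical and not part of any NIH system.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical 'conformant' cloud provider that accepts Commons credits."""
    name: str
    conformant: bool = True
    credits_received: float = 0.0


@dataclass
class Investigator:
    """Hypothetical investigator holding credits issued by the funder."""
    name: str
    credits: float = 0.0
    spending: dict = field(default_factory=dict)  # provider name -> credits spent

    def spend(self, provider: Provider, amount: float) -> None:
        """Redeem credits with a conformant provider chosen by the investigator."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if not provider.conformant:
            raise ValueError(f"{provider.name} is not a conformant provider")
        if amount > self.credits:
            raise ValueError("insufficient credits")
        self.credits -= amount
        provider.credits_received += amount
        self.spending[provider.name] = self.spending.get(provider.name, 0.0) + amount


def provide_credits(investigator: Investigator, amount: float) -> None:
    """The funder issues credits to the investigator instead of paying providers directly."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    investigator.credits += amount


# Illustrative amounts only (assumed, not from the slides).
provider_a = Provider("Cloud Provider A")
provider_b = Provider("Cloud Provider B")
researcher = Investigator("Researcher")

provide_credits(researcher, 100.0)
researcher.spend(provider_a, 30.0)
researcher.spend(provider_b, 20.0)
print(researcher.credits)           # 50.0
print(provider_a.credits_received)  # 30.0
```

Because the investigator decides where credits are redeemed, providers only receive funds for services actually used.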
Potential Advantages of this Model
Cost effective: Only pay for IT support used
Drives competition: Better services at lower cost
Supports data sharing by driving science into the Commons
Facilitates public-private partnership
Scalable to most categories of data expected in the next 5 years
Potential Disadvantages of this Model
Novelty: Never been tried, so we don’t have data about likelihood of success
Cost Models: Predicated on stable or declining prices among providers
  True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in industry
Service Providers: Predicated on service providers willing to make the investment to become conformant
  Market research suggests 3-5 providers within 2-3 months of program launch
Persistence: The model is ‘Pay As You Go’, which means if you stop paying it stops going
  Giving investigators an unprecedented level of control over what lives (or dies) in the Commons
What does it mean for a vendor to be conformant?
Minimum set of requirements for
  Business relationships (reseller, investigators)
  Interfaces (upload, download, manage, compute)
  Capacity (storage, compute)
  Networking and Connectivity
  Information Assurance
  Authentication and authorization
Likely to be reviewed self-certification in pilot phase
A conformant cloud ≠ an IaaS provider
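A reviewed self-certification could be represented as a simple checklist over the requirement areas listed above. The sketch below is an assumption about how that might look; the key names and the functions `missing_areas` and `is_conformant` are hypothetical, and the actual requirements within each area are not defined in these slides.

```python
# Requirement areas taken from the slide; the dictionary-based
# representation and the key names are illustrative assumptions.
REQUIREMENT_AREAS = (
    "business_relationships",            # reseller, investigators
    "interfaces",                        # upload, download, manage, compute
    "capacity",                          # storage, compute
    "networking_and_connectivity",
    "information_assurance",
    "authentication_and_authorization",
)


def missing_areas(self_certification: dict) -> list:
    """Return the requirement areas the provider has not attested to."""
    return [area for area in REQUIREMENT_AREAS
            if not self_certification.get(area, False)]


def is_conformant(self_certification: dict) -> bool:
    """A provider is treated as conformant only if every area is attested."""
    return not missing_areas(self_certification)


# Hypothetical provider that has not yet attested to information assurance.
example = {area: True for area in REQUIREMENT_AREAS}
example["information_assurance"] = False

print(is_conformant(example))   # False
print(missing_areas(example))   # ['information_assurance']
```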
What does it mean for a scientist to be compliant?
Likely to evolve into multiple ‘Levels of Compliance’ corresponding to increasing degrees of making data/software meet ‘FAIR’ criteria.
Some of our current thinking for basic compliance
  Objects are physically or logically available in the Commons
  Objects are indexed with a usable identifier
  Objects have basic search metadata attached to index entries
  Objects have clear access rules
  Objects have basic semantic metadata available
Higher levels could include
  Objects indexed with standards-based identifiers (ORCID, DOI, etc.)
  Objects are open to the public (or as open as reasonable given data type)
  Objects conform to agreed-upon standards (CDISC, DICOM, etc.)
  Data objects are accessible via standard APIs
  Software is encapsulated (containers, other technology) for easier usage
We want and need your feedback on these matters!
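The basic and higher-level criteria for digital objects could be checked along these lines. This is a sketch under stated assumptions: the `DigitalObject` fields and both functions are hypothetical names, the example identifier is a placeholder, and the slides do not say how the higher criteria map onto numbered levels, so the code only reports which ones are met.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalObject:
    """Hypothetical record of one object's compliance-related properties."""
    # Basic compliance criteria
    available_in_commons: bool = False
    identifier: Optional[str] = None
    has_search_metadata: bool = False
    has_access_rules: bool = False
    has_semantic_metadata: bool = False
    # Criteria that higher levels could include
    standards_based_identifier: bool = False
    publicly_open: bool = False
    conforms_to_standards: bool = False
    standard_api_access: bool = False
    software_encapsulated: bool = False


HIGHER_CRITERIA = (
    "standards_based_identifier",
    "publicly_open",
    "conforms_to_standards",
    "standard_api_access",
    "software_encapsulated",
)


def meets_basic_compliance(obj: DigitalObject) -> bool:
    """True only if all five basic criteria hold."""
    return (obj.available_in_commons
            and bool(obj.identifier)
            and obj.has_search_metadata
            and obj.has_access_rules
            and obj.has_semantic_metadata)


def higher_criteria_met(obj: DigitalObject) -> list:
    """List the higher-level criteria this object already satisfies."""
    return [name for name in HIGHER_CRITERIA if getattr(obj, name)]


# Placeholder identifier, for illustration only.
dataset = DigitalObject(
    available_in_commons=True,
    identifier="doi:10.0000/example",
    has_search_metadata=True,
    has_access_rules=True,
    has_semantic_metadata=True,
    standards_based_identifier=True,
)

print(meets_basic_compliance(dataset))  # True
print(higher_criteria_met(dataset))     # ['standards_based_identifier']
```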
Pilot of the Commons Business Model
Phase 0: Build the plumbing
Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
Phase 3: Open the process to all NIH grantees
dbGaP Cloud Policy
dbGaP: A Database of Genotypes and Phenotypes
NIH Position Statement on the use of cloud computing services
Approved March 23, 2015
“In light of the advances made in security protocols for cloud computing in the past several years and given the expansion in the volume and complexity of genomic data generated by the research community, the National Institutes of Health (NIH) is now allowing investigators to request permission to transfer controlled-access genomic and associated phenotypic data obtained from NIH-designated data repositories under the auspices of the NIH Genomic Data Sharing (GDS) Policy to public or private cloud systems for data storage and analysis.”
Responsibility for ensuring the security and integrity of the data remains with the institution.
What can a CIO do to support biomedical research on their campus?
Help maintain perspective
[Figure: timeline axis, 1960 to 2020]
Connect us with our colleagues in other disciplines
Sensor Stream = 500 EB/day
Stores 69 TB/day
Collection = 14 EB/day
Store 1 PB/day
Total Data = 14 PB
Store an average of 3.3 TB/day for 10 years!
But don’t lose sight of the differences associated with biological research
Acknowledgements
NIH Office of ADDS
Vivien Bonazzi, Ph.D.
Philip Bourne, Ph.D.
Michelle Dunn, Ph.D.
Mark Guyer, Ph.D.
Jennie Larkin, Ph.D.
Leigh Finnegan
Beth Russell
NCBI
Dennis Benson, Ph.D.
Alan Graeff
David Lipman, MD
Jim Ostell, Ph.D.
Don Preuss
Steve Sherry