NATIONAL INSTITUTES OF HEALTH: Data Management, Open Access and Data Sharing Today and Beyond Presentation to the ASEE March 7, 2017 Patricia Flatley Brennan, RN, PhD Director National Library of Medicine


NATIONAL INSTITUTES OF HEALTH: Data Management,

Open Access and Data Sharing

Today and Beyond

Presentation to the ASEE

March 7, 2017

Patricia Flatley Brennan, RN, PhD

Director

National Library of Medicine


DataScience@NIH

• Opportunities

• Challenges

• Directions

Discovery!


Reusable

Findable

Accessible

Interoperable

Data


FINDABLE

• Persistent identifier

• Metadata

• Curation

• Indexing

• Catalogs

• Source on the fly

ACCESSIBLE

• Security

• Authentication

• Authorization

• Metadata persists

• Public-private solutions

INTEROPERABLE

• Formal, accessible, shared and broadly applicable language

• Vocabularies follow FAIR principles

REUSABLE

• Meet domain-relevant community standards

• Provenance

• Sustainable

• Shift to Discovery
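
The four FAIR facets listed above can be made concrete as a small check over a dataset's metadata record. The sketch below is illustrative only: the `DatasetRecord` fields and the `fair_gaps` helper are hypothetical names chosen for this example, not an NIH or NLM schema, and the identifier is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; every field name is an assumption."""
    identifier: str = ""        # Findable: persistent identifier (e.g. a DOI)
    keywords: list = field(default_factory=list)  # Findable: indexing terms
    access_protocol: str = ""   # Accessible: how the data is retrieved
    vocabulary: str = ""        # Interoperable: shared vocabulary in use
    license: str = ""           # Reusable: terms of reuse
    provenance: str = ""        # Reusable: where the data came from


def fair_gaps(record: DatasetRecord) -> dict:
    """Return, for each FAIR facet, the fields that are still empty."""
    checks = {
        "findable": ["identifier", "keywords"],
        "accessible": ["access_protocol"],
        "interoperable": ["vocabulary"],
        "reusable": ["license", "provenance"],
    }
    return {
        facet: [name for name in names if not getattr(record, name)]
        for facet, names in checks.items()
    }


record = DatasetRecord(
    identifier="doi:10.0000/example",  # placeholder, not a real DOI
    keywords=["genomics"],
    access_protocol="https",
    vocabulary="MeSH",
)
gaps = fair_gaps(record)
print(gaps)
```

In this toy record the license and provenance fields are empty, so only the "reusable" facet reports gaps. A real assessment would need far richer criteria than whether a field is filled in.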


Computation

• Analytics

– Biostatistics

– Statistics

– Distributed analytics

– Machine learning

– Optimization

• Visualization

– Visual Analytics

– Depicting results

• Management

– Business process

– Preserving provenance of analytical strategies

– Maintaining version control


Infrastructure

• Commons

• Identity management and authentication

• Planning and forecasting tools

• Business analytics

Courtesy Warren Kibbe


National Library of Medicine (NLM)

• Mission: To acquire, organize, disseminate, and preserve the biomedical knowledge of the world for the benefit of the public’s health

• Open science is key to NLM’s mission of supporting scientific discovery

• NLM products highlighted:

– PubMed Central

– ClinicalTrials.gov

– PubChem


Our Future Builds on Our Past

• Data: 2016 - ∞

• Network: 1984-2015

• Digitize: ~1970-1983

• Index: 1838-~1960


Create mechanisms to:

– Determine high value data sets

– Forecast their costs

– Set preservation strategies

– Anticipate utilization

Assess Value

Mine high-value datasets


Create mechanisms to:

– Determine high value data sets

– Forecast their costs

– Set preservation strategies

– Anticipate utilization

Preserve with a purpose


Promote Standards


Tools for Discovery & Analysis


Promote Training


Engage with Others


PubMed

• 26.5 M records

• Only bibliographic records

MEDLINE

• ~5600 selected journal titles

• Only bibliographic records

• 93% of PubMed

National Library of Medicine Collection

• ~17,000 serial titles

• All Serials – journals, annuals, statistics, etc.

• Discoverable in NLM Catalog & LocatorPlus

• NLM provides ILL and ensures archiving

PMC = PubMed Central Archive

• ~2,000 full participation journals

• 4 million full text articles

• ~1M federally funded public access articles

• Bibliographic records display in PubMed

2008: NIH Public Access Policy & PMC

International collaboration: Europe PMC & PMC Canada


Enhancing Access

• Research results more readily accessible to the public, providers, educators, and the scientific community

• Permanent preservation of research findings

• Common format used to store data from diverse sources

• Collections available for bulk download and text mining
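
As a rough illustration of programmatic access for text mining, the sketch below builds a search URL for the PMC database through NCBI's E-utilities ESearch service and reads record IDs out of a JSON response. The helper names `pmc_search_url` and `parse_ids` are hypothetical, and the sample payload is made up for the example. No network request is made here, and the slides do not say which access route the speaker had in mind.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pmc_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that queries the PMC database for a term."""
    params = {"db": "pmc", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def parse_ids(payload: str) -> list:
    """Pull the list of record IDs out of an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


url = pmc_search_url("data sharing", retmax=5)
print(url)

# Made-up response body in the ESearch JSON shape; not real search results.
sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
ids = parse_ids(sample)
print(ids)
```

Anyone fetching at scale should consult NCBI's usage guidelines first; for large corpora, the bulk download route mentioned on the slide is likely the better fit than repeated searches.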


PubMed Central®


ClinicalTrials.gov

• A registry and results database of publicly and privately supported clinical studies of human participants conducted around the world

• Sponsor or researcher registers study when it begins; updates information throughout; reports summary results upon completion

• Over 233,000 studies in all 50 States and 195 countries; nearly 24,000 studies with posted summary results


ClinicalTrials.gov: API & Data Use

• API allows third parties to submit queries for specified trials and display records: e.g., BreastCancerTrials.org, Foundation Fighting Blindness, Colon Cancer Alliance
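
A third-party site's query might look roughly like the sketch below. It assumes the later v2 REST endpoint rather than the API that existed at the time of the 2017 talk. The parameter names (`query.cond`, `pageSize`) and the response nesting are assumptions to verify against the current ClinicalTrials.gov API documentation. The helper names are hypothetical and the sample payload is invented, not registry data.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the v2 REST API, which postdates the 2017 presentation.
CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def trials_query_url(condition: str, page_size: int = 10) -> str:
    """Build a search URL for studies matching a condition."""
    params = {"query.cond": condition, "pageSize": page_size}
    return f"{CTGOV_STUDIES}?{urlencode(params)}"


def brief_titles(payload: str) -> list:
    """Extract (NCT ID, brief title) pairs from a response body.

    The nesting used here is an assumption about the response layout.
    """
    pairs = []
    for study in json.loads(payload).get("studies", []):
        ident = study["protocolSection"]["identificationModule"]
        pairs.append((ident["nctId"], ident["briefTitle"]))
    return pairs


url = trials_query_url("breast cancer", page_size=3)
print(url)

# Made-up payload in the assumed shape; not real registry data.
sample = json.dumps({
    "studies": [
        {"protocolSection": {"identificationModule": {
            "nctId": "NCT00000000", "briefTitle": "Example study"}}}
    ]
})
titles = brief_titles(sample)
print(titles)
```

The sketch deliberately stops at building the URL and parsing a local payload, so it runs without network access. A site displaying trial records would add the HTTP request, paging, and error handling.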


PubChem

• Comprehensive source of chemical structures of small organic molecules and their biological activities

• Organized as three linked databases

– PubChem Substance, PubChem Compound, PubChem BioAssay

• Records link to other databases, including scientific literature in PubMed
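
The linkage among the three databases can be pictured with a toy model: depositor-supplied substance records (SIDs) point to a standardized compound (CID), and bioassay records (AIDs) list the substances they tested. The sketch below is a simplified illustration with invented IDs. The class and function names are hypothetical and do not reflect PubChem's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    sid: int  # depositor-supplied record
    cid: int  # compound this substance standardizes to


@dataclass(frozen=True)
class BioAssay:
    aid: int
    tested_sids: tuple  # substances screened in this assay


def assays_for_compound(cid, substances, assays):
    """Follow compound -> substances -> assays and return matching AIDs."""
    sids = {s.sid for s in substances if s.cid == cid}
    return sorted(a.aid for a in assays if sids & set(a.tested_sids))


# Invented IDs for illustration only.
substances = [
    Substance(sid=1, cid=100),
    Substance(sid=2, cid=100),
    Substance(sid=3, cid=200),
]
assays = [
    BioAssay(aid=10, tested_sids=(1,)),
    BioAssay(aid=11, tested_sids=(3,)),
    BioAssay(aid=12, tested_sids=(2, 3)),
]
linked = assays_for_compound(100, substances, assays)
print(linked)  # [10, 12]
```

Compound 100 is reached through substances 1 and 2, so assays 10 and 12 are returned while assay 11, which tested only substance 3, is not.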


• Provides guidance on citing the original generators of the data sets

– Corresponding scientists get credit

– Readers can locate data source

• Includes approximately 93 million compounds, 225 million substances


NLM & the Future of Data Science

• Evolving data storage, communications, and computer security technologies

• Methods for generation, formalization, management, and sharing of knowledge resources

• Training for data scientists, data-informed investigators, data librarians

• Partnership with other NIH components and agencies promoting best practices for data storage, access, discovery, and analysis

• Strategic planning underway


DataScience@NIH pivots to NLM

• Ensure integration of lessons learned from BD2K and Cloud Pilot

• Create mechanisms to determine high value data sets, locate them, forecast their cost and utilization

• Implement efficient, secure preservation strategies that facilitate access and reuse

• Re-engage and stimulate intramural and extramural efforts in standards

• Develop new methods for data management and data-driven discovery

• Grow a talented workforce

• Foster open science

• Engage with government, national, and global collaboratives


Reaching me

http://nlmdirector.nlm.nih.gov

[email protected]

@NLMdirector

emey87 / Icon Archive / CC BY-NC-ND-4.0