The NIH as a Digital Enterprise: Implications for PAG Philip E. Bourne, PhD Associate Director for Data Science National Institutes of Health PAG San Diego January 11, 2015

Post on 13-Jul-2015

TRANSCRIPT

The NIH as a Digital Enterprise: Implications for PAG

Philip E. Bourne, PhD

Associate Director for Data Science

National Institutes of Health

PAG San Diego

January 11, 2015

What do we mean by the notion of a Digital Enterprise?

Start by considering how far we have come in just one researcher’s career…

Biomedical Research is Becoming More Digital and FAIR

Finding

Accessing

Integrating

Reusing

digital research objects

This move from an observational science to a more analytical science is being driven by ever-increasing amounts of digital data

The NIH Fire Hose Slide

And This May Just be the Beginning

Evidence:

– Google car

– 3D printers

– Waze

– Robotics

From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Further Perturbation: The Story of Meredith

http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne

Stephen Friend

47/53 “landmark” publications could not be replicated

[Begley, Ellis Nature, 483, 2012] [Carole Goble]

ADDS Mission Statement

To foster an open ecosystem that enables biomedical* research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability

* Includes biological, biomedical, behavioral, social, environmental, and clinical studies that relate to understanding health and disease.

Some Goals of the Digital Enterprise

Cost savings through sharing of best practices

Sustainability of digital assets

Collaboration through identification of collaborators at the point of data collection, not publication

Improved reproducibility through data and methods sharing

Integration of data types and data and literature to accelerate discovery

Some of Today’s Observations

Bad News

– We do not yet have a data sustainability plan

– Global policies define the why but not the how

– We do not know how all the data we currently have are used

– We can’t estimate future supply and demand

– We need to ramp up training programs in data science

Good News

– Genuine willingness to address the problem

– Global communities are emerging

– Efficiencies can be achieved

– BD2K is the beginning of a plan

– We are beginning to quantify the issues

Sustainability 101

Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830

What is the NIH Doing to Fulfill That Promise?

Elements of The Digital Enterprise

Community Policy

Infrastructure

• Sustainability

• Collaboration

• Training

Elements of The Digital Enterprise

Community Policy

Infrastructure

• Sustainability

• Collaboration

• Training

Virtuous Research Cycle

Policies – Now & Forthcoming

Data Sharing

– Genomic data sharing announced

– Data sharing plans on all research awards

– Data sharing plan enforcement

• Machine readable plan

• Repository requirements to include grant numbers

http://www.nih.gov/news/health/aug2014/od-27.htm

Policies - Forthcoming

Data Citation

– Goal: legitimize data as a form of scholarship

– Process:

• Machine readable standard for data citation (done)

• Endorsement of data citation for inclusion in NIH biosketches, grants, reports, etc.

• Example formats for human readable data citations

• Slowly work into NLM/NCBI workflow
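The slides do not show what a machine readable data citation looks like, so the following is only a minimal sketch of the idea: one structured record that can be emitted both as JSON for machines and as a human readable string. The class name, field names, output format, and the example values (including the placeholder identifier) are all assumptions made for illustration, not the standard referred to above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical data citation record; the field names are illustrative only."""
    creators: tuple
    year: int
    title: str
    repository: str
    identifier: str  # persistent identifier for the dataset (placeholder below)

    def to_json(self) -> str:
        # Machine readable form: a flat JSON object with sorted keys.
        return json.dumps(asdict(self), sort_keys=True)

    def to_text(self) -> str:
        # Human readable form: one possible layout, not an endorsed format.
        names = "; ".join(self.creators)
        return f"{names} ({self.year}). {self.title}. {self.repository}. {self.identifier}"


# Placeholder values, not a real dataset.
example = DataCitation(
    creators=("Doe J", "Roe R"),
    year=2015,
    title="Example expression dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example",
)
```

Calling `example.to_text()` yields a single citation line suitable for a reference list, while `example.to_json()` gives the same record in a form a repository or indexing workflow could parse.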

[Diagram: six BD2K Centers, DDICC, Software, Standards]

Infrastructure - The Commons

[Diagram: Labs; The Commons: Digital Objects (with UIDs), Search (indexed metadata), Computing Platform]

Vivien Bonazzi, George Komatsoulis
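The Commons slide names two pieces that are easy to picture in code: digital objects that carry unique identifiers, and search over indexed metadata. The sketch below is a toy in-memory illustration of that pairing only. The class `CommonsIndex`, its methods, and the metadata fields are hypothetical and say nothing about how the NIH Commons is actually implemented.

```python
import uuid
from collections import defaultdict


class CommonsIndex:
    """Toy registry: objects get a UID, and their metadata is indexed by token."""

    def __init__(self):
        self._objects = {}              # uid -> {"kind": ..., "metadata": ...}
        self._index = defaultdict(set)  # lowercase token -> set of uids

    def register(self, kind, metadata):
        """Store a digital object (data, software, ...) and return its new UID."""
        uid = str(uuid.uuid4())
        self._objects[uid] = {"kind": kind, "metadata": dict(metadata)}
        # Index every whitespace-separated token of every metadata value.
        for value in metadata.values():
            for token in str(value).lower().split():
                self._index[token].add(uid)
        return uid

    def search(self, query):
        """Return the UIDs whose metadata contains every token in the query."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        hits = set(self._index.get(tokens[0], set()))
        for token in tokens[1:]:
            hits &= self._index.get(token, set())
        return hits

    def get(self, uid):
        """Resolve a UID back to the stored object."""
        return self._objects[uid]
```

A lab would call `register("data", {"title": "..."})` once per object, keep the returned UID for citation, and anyone could later find the object through `search(...)` without knowing where it is stored. A real system would need persistent identifiers and a proper search engine; random UUIDs and a token set are stand-ins.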

The Commons: Compute Platforms

The Commons Conceptual Framework

Public Cloud Platforms

Super Computing (HPC) Platforms

Other Platforms?

Google, AWS (Amazon), Microsoft (Azure), IBM, other?

In-house compute solutions

Private clouds, HPC

– Pharma

– The Broad

– Bionimbus

Traditionally low access by NIH

The Commons: Business Model

[George Komatsoulis]

How Might PAG Participate?

Consider contributing digital research objects into the Commons – data, software, standards, narrative, course materials …

Initiate your own moves from cylinders of excellence to more integrated and multi-functional data sources

Work to define new business models for the scientific enterprise

Accelerate This Kind of Study

Pfenning et al 2014 Science 346 1333

Generic Needs

Homogenization of disparate large unstructured datasets

Deriving structure from unstructured data

Feature mapping and comparison from image data

Visualization and analysis of multi-dimensional phenotypic datasets

Causal modeling of large scale dynamic networks and subsequent discovery

Utilize data that are sparsely and irregularly sampled and noisy

BD2K will offer reference datasets and points of domain expertise to explore these questions

Community: Training

Data Science Training Goals

1) Build an OPEN digital framework for data science training: NIH Data Science Workforce Development Center

2) Develop short-term training opportunities: courses, educational resources, etc.

3) Develop the discipline of biomedical data science and support cross-training – OPEN courseware

All goals have a diversity component and mandate

The Biomedical Research Digital Enterprise

Associate Director for Data Science

Scientific Data Council; External Advisory Board

Programmatic Theme: Sustainability, Education, Innovation, Process, Collaboration

Deliverable: Commons, BD2K, Efficiency, Partnerships, Training

Example Features:

• Cloud – Data & Compute
• Search
• Security
• Reproducibility Standards
• App Store

• Coordinate
• Hands-on
• Syllabus
• MOOCs

• Community
• Centers
• Training Grants
• Catalogs
• Standards
• Analysis

• Data Resource Support
• Metrics
• Best Practices
• Evaluation
• Portfolio Analysis

• IC’s
• Researchers
• Federal Agencies
• International Partners
• Computer Scientists

NIH… Turning Discovery Into Health

[email protected]

Potential Outcomes

Mobility: improve the outcomes of surgeries in children with cerebral palsy and gait pathology

Wellness: markers derived from constantly monitored eHealth/mobile health devices – apply to smoking cessation, weight loss

Cancer: further personalization of treatment

Mental Health: better identify factors that resist and promote brain disease e.g., schizophrenia, bipolar disorder, major depression, attention deficit hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), autism

Addiction: utilizing social media to track and treat drug use and addiction