Where is Open Going?

Where is Open Going? Philip E. Bourne [email protected] http://www.slideshare.net/pebourne/ 3/01/14 2014 SPARC Annual Meeting 1

Upload: philip-bourne

Post on 06-May-2015






Keynote presentation for the SPARC 2014 conference in Kansas City on March 3, 2014


Page 1: Where is Open Going?


Where is Open Going?

Philip E. Bourne

[email protected]
http://www.slideshare.net/pebourne/


Page 2: Where is Open Going?


Where is Open Going?

The answer depends on who you ask

Here is my biased viewpoint


Page 3: Where is Open Going?


My Background/Bias

• Mostly Biomedical
• RCSB PDB/IEDB Database Developer – Views on community, quality, sustainability …
• PLOS Journal Co-founder – Open Science Advocate
• Associate Vice Chancellor for Innovation – Business models, interaction with the private sector, sustainability
• Professor – Mentoring, reward system, value (or not) of
• NIH Strategist/Transformer – ??


Page 4: Where is Open Going?


Perhaps the first question to ask is:

What is the endpoint?


Page 5: Where is Open Going?


Where Is Open Going?


Page 6: Where is Open Going?


What Does The Democratization of Science Imply?

• The obvious – participation by all
• Not so obvious
– More scrutiny
– New types of rewards
– More equal value placed on all participants
– The removal of artificial boundaries that corral knowledge (through power and resources) within silos that do not make sense as complexity increases


Page 7: Where is Open Going?


Consider some personal examples that illustrate these implications


Page 8: Where is Open Going?

More Scrutiny – Highlights Lack of Reproducibility

• I can’t immediately reproduce the research in my own laboratory:

• It took an estimated 280 hours for an average user to approximately reproduce the paper

• Workflows are maturing and becoming helpful

• Data and software versions and accessibility prevent exact reproducibility

Daniel Garijo et al. 2013. Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE 8(11): e80278.


Page 9: Where is Open Going?


Why New Types of Rewards?

• I have a paper with 16,000 citations that no one has ever read

• I have papers in PLOS ONE that have more citations than ones in PNAS

• I have data sets I am proud of but few places to put them

• I edited a journal but it did not count for much


Page 10: Where is Open Going?


Equal Value Placed on Participants

• The UC System has Research Scientists (RS) & Project Scientists (PS) as well as tenured faculty
– RS/PS have no senate rights, yet:
– RS/PS frequently teach
– RS/PS frequently have more grant money
– RS/PS typically perform more service
– RS/PS are most of the data scientists you know


Page 11: Where is Open Going?


Are Increasingly Found on the Google Bus


Page 12: Where is Open Going?


Institutional Boundaries

• Academia – Departments of physics, math, biology, chemistry etc. persist but scholars rarely confine themselves to these disciplines

• NIH – 27 institutes and centers, many dedicated to specific diseases & conditions – yet a specific gene may transcend ICs


Page 13: Where is Open Going?


I have argued that the democratization of science is compelling

I have not argued for the value of open access to this picture because you know that already


Page 14: Where is Open Going?


I Would Also Argue That This Process is About to Accelerate

• Others provide a more compelling argument:
– Google car
– 3D printers
– Waze
– Robotics


Page 15: Where is Open Going?


From the Second Machine Age


From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee

Page 16: Where is Open Going?


So what will this look like for an institution?


Institutions will become digital enterprises

Page 17: Where is Open Going?


Components of The Academic Digital Enterprise

• Consists of digital assets
– E.g. datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
– E.g. publishing simply involves changing the access control
• Digital assets are interoperable across the
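The asset model described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `DigitalAsset`, the `Access` levels, and the method names are hypothetical and do not come from the talk or from any existing institutional system. The sketch shows an asset with a unique identifier and a provenance log, where "publishing" is nothing more than a change of access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access levels; a real system would likely be finer grained."""
    PRIVATE = "private"
    GROUP = "group"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    kind: str    # e.g. "dataset", "paper", "software", "lab-note"
    owner: str
    access: Access = Access.PRIVATE
    # Each asset gets its own unique identifier (a random UUID here, as an assumption).
    asset_id: str = field(default_factory=lambda: str(uuid4()))
    # Provenance is an append-only list of (timestamp, actor, action) tuples.
    provenance: list = field(default_factory=list)

    def __post_init__(self):
        self._record(self.owner, "created")

    def _record(self, actor, action):
        stamp = datetime.now(timezone.utc).isoformat()
        self.provenance.append((stamp, actor, action))

    def set_access(self, actor, level):
        # Log the change before applying it, so the old level is captured.
        self._record(actor, f"access {self.access.value} -> {level.value}")
        self.access = level

    def publish(self, actor):
        # "Publishing" is just a change of access control.
        self.set_access(actor, Access.PUBLIC)


notes = DigitalAsset(kind="lab-note", owner="jane")
notes.publish("jane")
print(notes.asset_id, notes.access.value, len(notes.provenance))
```

Keeping the provenance log on the asset itself means that the history of who changed what travels with the asset, whatever repository ends up holding it.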



Page 18: Where is Open Going?


Life in the Academic Digital Enterprise

• Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors, whose research profiles are on-line and well described, are automatically notified of Jane’s potential based on a computer analysis of her scores against the background interests of the neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research rotation. During the rotation she enters details of her experiments related to understanding a widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line research space – an institutional resource where stakeholders provide metadata, including access rights and provenance beyond that available in a commercial offering. According to Jane’s preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a graduate student in the chemistry department whose notebook reveals he is working on using bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a number of times in their notes, which is of interest to two very different disciplines – neurology and environmental sciences. In the analog academic health center they would never have discovered each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage. The collaboration results in the discovery of a homologous human gene product as a putative target in treating the neurodegenerative disorder. A new chemical entity is developed and patented. Accordingly, by automatically matching details of the innovation with biotech companies worldwide that might have potential interest, a licensee is found. The licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the license. The research continues and leads to a federal grant award. 
The students are employed, further research is supported and in time societal benefit arises from the technology.

From What Big Data Means to Me JAMIA 2014 21:194


Page 19: Where is Open Going?


Let us now turn to the biomedical sciences and look at what might happen if the NIH were to become a digital enterprise


Page 20: Where is Open Going?


As of Today

• Assumed the role of Associate Director for Data Science (ADDS): NIH Data Science Point Person
– Reports to NIH Director
– Leads the BD2K initiative
– Trans-NIH responsibilities for data

Eric Green, Acting


[Modified slide from Eric Green]

Page 21: Where is Open Going?


The focus is on data, but I do not think that can be separated from the research life cycle as you will see…


Page 22: Where is Open Going?


I Want To Engage With This Community To:

• Help me understand the most pressing problems
• Begin a dialog
• Inform you of what I am currently thinking
• Inform you of relevant NIH initiatives that are underway or planned
• Have you change my thinking appropriately


Page 23: Where is Open Going?


The NIH process thus far …

An external advisory group provided a valuable blueprint for what should be done



Page 24: Where is Open Going?


Blueprint Recommendations

• Promote central and federated catalogs
– Establish minimal metadata framework
– Tools to facilitate data sharing
– Elaborate on existing data sharing policies
• Support methods and applications
– Fund all phases of software development
– Leverage lessons from National Centers
• Training
– More funding
– Enhance review of training apps
– Quantitative component to all awards
• On campus IT strategic plan
– Catalog of existing tools
– Informatics laboratory
– Ditto big data
• Sustainable funding commitment



Page 25: Where is Open Going?


Let me outline in general terms where I see my effort being spent going forward



Page 26: Where is Open Going?


ADDS Initial Thrusts

• How data are currently being used
• Lightweight metadata standards
• Data & software registries
• Expanded policies on data sharing, open source software
• Training programs & reward systems
• Institutional incentives
• Private sector incentives
• Data centers serving community needs


Page 28: Where is Open Going?


We need to start by asking, how are we using the data now?

Only then can we make rational decisions about data – large or small


Page 29: Where is Open Going?


How Data Are Used

[Figure: Structure Summary page activity for H1N1 Influenza related structures, Jan. 2008 to Jul. 2010. Structures shown: 1RUZ: 1918 H1 Hemagglutinin; 3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir. Credit: Andreas Prlic]

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

Page 30: Where is Open Going?


We Need to Learn from Industries Whose Livelihood Addresses the Question of Use


Page 31: Where is Open Going?


ADDS Initial Thrusts – More Detail

• Now:
– Data centers (under review)
– Data science training grants (call out)
– Pilot data catalog consortium (call out)
– Genomic Data Sharing Policy (being finalized)
– Piloting “NIH-drive”
• What Is Planned:
– Extended public-private programs specifically for data science activities
– Interagency activities
– International exchange programs
– Cold Spring Harbor-like training facilities – bi-coastal?
– Programs for better data descriptions
– Reward institutions/communities
– Policies to get clinical trial data into the public domain




Page 33: Where is Open Going?


Pilot NIH-Drive

• Investigator A from the NCI makes frequent reference to the over expression of genes x and y.

• Investigator B from the NHLBI makes frequent reference to the under expression of genes x and y

• Automatic notification of a potential common interest before publication or database deposition
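The notification scenario above can be sketched as a simple co-mention search. This is a rough illustration, not a description of how any NIH pilot works: the function names, the `min_mentions` threshold, and the sample notes are all hypothetical, and a real system would need proper gene-name recognition plus the investigators' consent and access controls.

```python
import re
from collections import Counter
from itertools import combinations


def gene_counts(text, vocabulary):
    """Count how often each known gene symbol appears in free text."""
    tokens = re.findall(r"[A-Za-z0-9_-]+", text)
    return Counter(t for t in tokens if t in vocabulary)


def common_interests(notes, vocabulary, min_mentions=2):
    """Return (investigator, investigator, shared genes) for every pair whose
    notes each mention the same gene at least `min_mentions` times."""
    counts = {who: gene_counts(text, vocabulary) for who, text in notes.items()}
    matches = []
    for a, b in combinations(sorted(counts), 2):
        shared = sorted(
            g for g in counts[a]
            if counts[a][g] >= min_mentions and counts[b][g] >= min_mentions
        )
        if shared:
            matches.append((a, b, shared))
    return matches


# Hypothetical notes; "geneX" and "geneY" stand in for the slide's genes x and y.
notes = {
    "A (NCI)": "geneX overexpressed; geneX and geneY up. geneY again.",
    "B (NHLBI)": "geneX under-expressed, geneY down; geneX geneY",
    "C": "geneZ geneZ",
}
for a, b, genes in common_interests(notes, {"geneX", "geneY", "geneZ"}):
    print(f"Notify {a} and {b}: shared interest in {', '.join(genes)}")
```

The match is on the gene only, not on the direction of expression, which is the point of the slide: two investigators seeing opposite effects on the same genes are exactly the pair worth introducing.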


Page 34: Where is Open Going?


Let me come back to the big picture …


Page 35: Where is Open Going?


First consider what we do (or wish we could do) every day:

We take actions on digital data increasingly across boundaries


Page 36: Where is Open Going?


Actions on Biomedical Data Implies:

• Ensuring data quality and hence trust
• Making data sustainable
• Making data open and accessible
• Making data findable
• Providing suitable metadata and annotation
• Making data queryable
• Making data analyzable
• Presenting data so as to maximize its value
• Rewarding good data practices


Page 38: Where is Open Going?


Boundaries on Biomedical Data Implies:

• Working across biological scales
• Working across biomedical disciplines
• Working across basic and clinical research and practice
• Working across institutional boundaries
• Working across public and private sectors
• Working across national and international borders
• Working across funding agencies


Page 40: Where is Open Going?


These issues have been around a long time

The good news is that “Big Data” has brought more attention to the problem


Page 41: Where is Open Going?


What Are Big Data?

• Large datasets from high throughput experiments
• Large numbers of small datasets
• Data which are “ill-formed”
• The why (causality) is replaced by the what
• A signal that a fundamental change is taking place – a tipping point?


Page 42: Where is Open Going?


The NIH is Starting to Think About the Digital Enterprise, Witness…



Page 43: Where is Open Going?


What Will Define the NIH Digital Enterprise?

• NCBI/NLM
• Trans-NIH collaboration – a culture change
• Long-term NIH strategic planning
• The BD2K Initiative
• A “hub” of data science activities
• International cooperation
• Interagency cooperation
• Data sharing policies
• External forces …

Page 44: Where is Open Going?


This is great, but what will it look like to the end user and to those interested in scholarly communication?


Page 45: Where is Open Going?


One Possible End Point

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers

PLoS Comp. Biol. 2005 1(3) e34

Page 46: Where is Open Going?


To get to that end point we have to consider the complete research lifecycle


Page 47: Where is Open Going?


The Research Life Cycle will Persist



Page 48: Where is Open Going?


Tools and Resources Will Continue To Be Developed



Lab Notebooks



Analysis Tools




Page 49: Where is Open Going?


Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework



Lab Notebooks



Analysis Tools




Page 50: Where is Open Going?

New/Extended Support Structures Will Emerge



Lab Notebooks



Analysis Tools



Commercial & Public Tools


By Discipline

Data Journals

Discipline-Based Metadata Standards

Community Portals

Institutional Repositories

New Reward Systems

Commercial Repositories



Page 51: Where is Open Going?


We Have a Ways to Go



Lab Notebooks



Analysis Tools



Commercial & Public Tools


By Discipline

Data Journals

Discipline-Based Metadata Standards

Community Portals

Institutional Repositories

New Reward Systems

Commercial Repositories



Page 52: Where is Open Going?


Where is Open Going?

• Slowly towards the democratization of science
• Which changes how institutions think and operate – they become digital enterprises
• This in turn impacts the scholarly research lifecycle and hence scholarly communication
• I will be working to help the NIH be a leading institution in this change


Page 53: Where is Open Going?

Thank You! Questions?

[email protected]