FAIRer Research

FAIRer Research, Professor Carole Goble CBE FREng FBCS, The University of Manchester, UK, [email protected], STM Conference, London, 3rd Dec 2014

Page 1: FAIRer Research

FAIRer Research

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

[email protected]

STM Conference, London, 3rd Dec 2014

Page 2: FAIRer Research

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

datasets, data collections, standard operating procedures, software, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware

Page 3: FAIRer Research

Systems Biology

Page 4: FAIRer Research

Systems Biology

Modelling Cycle

45 organisations 112 organisations 37 organisations

Page 5: FAIRer Research

http://www.seek4science.org

Aggregated Commons

Sharing and interlinking methods, models, data, samples…

Multi-stewardship, multi-disciplinary, mixed

Standards

DCAT, FOAF

Data

Models

Articles

External Databases

Metadata

Yellow Pages

Page 6: FAIRer Research

Investigations

Assays

Studies

Towards Interoperable Bioscience Data, Nature Genetics, 2012

Standards, Structure, Interlink

Just Enough Results Model for things produced and used in experiments

Page 7: FAIRer Research

http://www.fair-dom.org

Findable, Accessible, Interoperable, Reusable

Data, SOPs, Models, Methods

Multi-tenant Commons Platform

http://datafairport.org

Page 8: FAIRer Research

Data discovery

Data assembly, cleaning, and refinement

Modeling

Statistical analysis

Data collection

Insights

Scholarly Communication & Reporting

Material & Methods

Page 9: FAIRer Research

BioSTIF

instruments and laboratory

Data discovery

Data assembly, cleaning, and refinement

Modeling

Statistical analysis

Data collection

Insights

Scholarly Communication & Reporting

Material & Methods

Page 10: FAIRer Research

Workflow Commons

Page 11: FAIRer Research

"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al

Page 12: FAIRer Research

• 35 kinds of annotations
• 5 Main Workflows
• 14 Nested Workflows
• 11 Configuration files
• 25 Scripts
• 10 Software dependencies
• 1 Workflow management system
• 1 Web Service
• Dataset: 90 galaxies observed in 3 bands

José Enrique Ruiz (IAA-CSIC)

Galaxy Luminosity Profiling

Dependencies

Components

Page 13: FAIRer Research

Rinse and Repeat Research

• Sweep Datasets

• Sweep Variables

• Sweep Steps
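One way to read the three sweeps above is as a grid over datasets, variable settings, and alternative pipeline steps. The sketch below is illustrative only: `run_analysis`, the dataset names, and the parameter values are hypothetical stand-ins, not part of any workflow system named in the talk.

```python
from itertools import product


def run_analysis(dataset, threshold, step):
    """Hypothetical stand-in for one workflow run; records the inputs it was given."""
    return {"dataset": dataset, "threshold": threshold, "step": step}


def sweep(datasets, thresholds, steps):
    """Re-run the same analysis over every combination of inputs."""
    return [run_analysis(d, t, s) for d, t, s in product(datasets, thresholds, steps)]


# Hypothetical inputs, chosen only to show the shape of a sweep.
runs = sweep(
    datasets=["survey_a", "survey_b"],
    thresholds=[0.1, 0.5],
    steps=["normalise", "skip_normalise"],
)
print(len(runs))  # 2 datasets x 2 thresholds x 2 steps = 8 runs
```

Keeping each run as a plain record of its inputs is what makes the sweep repeatable: the same grid can be replayed later and compared run by run.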

Page 14: FAIRer Research

SHARING SENSITIVITY

SENSITIVE SHARING

IMPLICATIONS FOR METRICS

BEING FAIR

Page 15: FAIRer Research

scientific ego-system for open science

trust, reciprocity, competition

fame, competitive advantage

productivity, credit

adoption, kudos

for love

blame, scooped, uncredited, misinterpretation, scrutiny, shame, insecurity, cost/time/skills, distraction, responsibility, disruption, staff churn, inertia

Page 16: FAIRer Research

Howard Ratner, STM Innovations Seminar 2012

was: Chair STM Future Labs Committee, CEO EVP Nature Publishing Group

now: Director of Development for CHORUS (Clearinghouse for the Open Research of US)

http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5

http://www.myexperiment.org/packs/196.html

Page 17: FAIRer Research

http://www.researchobject.org/

Outputs are first class citizens to be managed, credited and tracked: data, software

A Framework to Bundle and Relate multi-hosted (digital) resources of a scientific experiment or investigation using standard mechanisms & uniform access protocols. Carriers of Research Context

Research Objects

Page 18: FAIRer Research

What is the RO Framework?

• A framework of models and conventions

• Representations

• API specifications

• Implementations mapped into legacy / commodity platforms
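As a rough illustration of "bundle and relate multi-hosted resources", a Research Object can be thought of as a manifest that lists resources by URI and records what each one is. The sketch below is a minimal Python model under that assumption; the class names, field names, JSON layout, and `example.org` URIs are hypothetical and are not the Research Object specification.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One aggregated resource; it may live on any host."""
    uri: str
    kind: str  # e.g. "data", "workflow", "software"


@dataclass
class ResearchObject:
    """Hypothetical bundle that relates resources hosted in different places."""
    identifier: str
    creators: list = field(default_factory=list)
    aggregates: list = field(default_factory=list)

    def add(self, uri, kind):
        """Aggregate a resource by reference rather than by copying it."""
        self.aggregates.append(Resource(uri, kind))

    def manifest(self):
        """Serialise the bundle description as JSON (illustrative layout only)."""
        return json.dumps(
            {
                "id": self.identifier,
                "creators": self.creators,
                "aggregates": [{"uri": r.uri, "kind": r.kind} for r in self.aggregates],
            },
            indent=2,
        )


ro = ResearchObject("https://example.org/ro/1", creators=["A. Researcher"])
ro.add("https://example.org/data/table.csv", "data")
ro.add("https://example.org/workflows/analysis", "workflow")
print(ro.manifest())
```

The resources are referenced by URI, so the bundle itself stays small and the data, workflow, and software can remain on their own hosts.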

Page 19: FAIRer Research

Knowledge Turns

Unit of Scholarly Currency, RO Commons

Circulate in the Scholarly Ecosystem

Citation? Credit? Link to Publishers?

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

Page 20: FAIRer Research

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

Collaboration to support safe use of patient and research data for medical research

Farr Commons

Research Object packages codes, study, and metadata to exchange coded descriptions of clinical study cohorts

Knowledge Turns

Unit of Scholarly Currency, RO Commons

Circulate in the Scholarly Ecosystem

Citation? Credit? Link to Publishers?

arch, Discover, Index, Harvest, Port

Page 21: FAIRer Research

Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

Profile Focus

Body of knowledge around methods, workflows, software, data, person, rather than publication.

Citation, credit

Page 22: FAIRer Research

Release Research

Evolution, Emergence, Discourse, Threaded

Comparison, Historical review, Anti-Salami

Forks, Merges, Fixivity, Citation? Credit?

Flow across groups, projects and articles

Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Page 23: FAIRer Research

Reproduce Research

Repeat, Replicate, Recompute, Reuse…

Entropy, Citation? Credit?

icanhascheezburger.com

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012

Can I repeat & defend my results?

Can I review, reproduce and compare my results/method with your results/method?

Can I review, replicate and certify your results?

Can I transfer your results into my research and reuse this method?

Hettne et al Structuring research methods and data with the research object model: genomics workflows as a case study 2014 http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf

Page 24: FAIRer Research

Reproduce

Repeat, Replicate, Recompute, Reuse…

Entropy, Citation? Credit?

icanhascheezburger.com

Page 26: FAIRer Research

Checklists aka Minimum Information Models, Reporting Guidelines

Minim Checklist Ontology, http://purl.org/net/mim/ns

Zhao et al., A Checklist-Based Approach for Quality Assessment of Scientific Information, 3rd Intl. Workshop on Linked Science, 2013

Hettne et al Structuring research methods and data with the research object model: genomics workflows as a case study 2014 http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
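A checklist-based assessment can be sketched as a set of named requirements tested against a research object's description. The example below is a minimal Python illustration, not the Minim ontology itself: the field names, the "must"/"should" levels, and the example file names are assumptions made for the sketch.

```python
# Hypothetical checklist: each requirement names a field and how strongly it is required.
CHECKLIST = [
    ("workflow_definition", "must"),
    ("input_data", "must"),
    ("provenance_trace", "should"),
]


def assess(research_object, checklist):
    """Return the missing fields, grouped by requirement level."""
    missing = {"must": [], "should": []}
    for name, level in checklist:
        if not research_object.get(name):
            missing[level].append(name)
    return missing


def is_complete(research_object, checklist):
    """A research object passes when no 'must' requirement is missing."""
    return not assess(research_object, checklist)["must"]


# Hypothetical description of a research object with no provenance trace attached.
example = {"workflow_definition": "analysis.t2flow", "input_data": "reads.fastq"}
print(assess(example, CHECKLIST))
print(is_complete(example, CHECKLIST))
```

Separating hard requirements from softer ones lets the same checklist report both whether an object is usable and what would make it better described.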

Page 27: FAIRer Research

Checklists

Versioning

Provenance

Dependencies

Progressive

Metadata Profiles

Depth: how deeply described

Coverage: how much is covered.

More specialised detail, fewer services

More Stakeholders & Services

Citation minimum

Library

Publishers

Experiments

Science

PROV, PAV, VoID, GIT

PAV, NISO-JATS

Docker

DC

EXPO, ISA, JERM, OBI

MIAME, SBML, SED-ML

wfdesc

MIM Ontology

wfprov

VIVO-ISF

PID

Page 28: FAIRer Research

Standards

Machine-processable

Technology Independent

Multi-platform

Incremental

Page 29: FAIRer Research

W3C OADM

DOIs

URIs

Handles

ORCID

OAI-ORE

RRIDs

Page 30: FAIRer Research

host

service

Open Source/Store

Sci as a Service

Integrative fws

Virtual Machines

Portable Packaging

ReproZip

Workflows, makefiles

ProvStore

Page 31: FAIRer Research

OMEX archive

bundle

Page 32: FAIRer Research
Page 33: FAIRer Research

Nanopub: represents structured data along with its provenance in a single publishable and citable entry

Galaxy workflows: re-enact the analysis

Research Object: aggregates the (digital) resources contributing to findings of (computational) research (results, data and software) as citable compound digital objects

http://isa-tools.github.io/soapdenovo2/

http://sandbox.wf4ever-project.org/portal/ro?ro=http://sandbox.wf4ever-project.org/rodl/ROs/SOAP2denovo2-Aureus/

[Alejandra Gonzalez-Beltran, Philippe Rocca-Serra]

Page 34: FAIRer Research

• Id & Cite fluid things

• Uniform handling 1st class citizens

• Compound, multi-authored

• Mixed, leaky containers

• Span outcomes, evolve outputs, emergence

• Profiles

• Bridge researchers, platforms, resources

Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004

Page 35: FAIRer Research

[Norman Morrison]

Page 36: FAIRer Research

Focus on Personal Productivity not Public Good

Auto-magical

Stealthy not Sneaky

reduce the friction

instrumentation

Training Time

Page 37: FAIRer Research

From made RO to born RO

Page 38: FAIRer Research

• Open research is like Open software
• Multi-part, multi-contributor, updating
• Tardis & Commons
• Implications for metrics? publishing?
• Learning from open software development

Page 39: FAIRer Research

http://www.force11.org

Page 40: FAIRer Research

• Barend Mons
• Sean Bechhofer
• Philip Bourne
• Matthew Gamble
• Raul Palma
• Jun Zhao
• Alan Williams
• Stian Soiland-Reyes
• Paul Groth
• Tim Clark
• Juliana Freire
• Alejandra Gonzalez-Beltran
• Philippe Rocca-Serra
• Ian Cottam
• Susanna Sansone
• James Howison
• James Herbsleb
• Kristian Garza

All the members of the Wf4Ever team

iSOCO: Intelligent Software Components S.A., Spain

University of Manchester, School of Computer Science, Manchester, United Kingdom

University of Oxford, Department of Zoology, Oxford, UK

Poznan Supercomputing and Networking Center, Poznan, Poland

IAA: Instituto de Astrofísica de Andalucía, Granada, Spain

Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands

Colleagues in Manchester’s Information Management Group

RO Advisory Board Members

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://www.datafairport.org