FAIRer Research
Professor Carole Goble CBE FREng FBCS
The University of Manchester, [email protected]
STM Conference, London, 3rd Dec 2014
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
datasets, data collections, standard operating procedures, software, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware
Systems Biology
Modelling Cycle
45 organisations 112 organisations 37 organisations
http://www.seek4science.org
Aggregated Commons: share and interlink methods, models, data, samples…
multi-stewardship, multi-disciplinary, mixed
Standards
DCAT, FOAF
Data
Models
Articles
External Databases
Metadata
Yellow Pages
Investigations
Assays, Studies
Towards Interoperable Bioscience Data, Nature Genetics, 2012
Standards, Structure, Interlink
Just Enough Results Model (JERM) for things produced and used in experiments
http://www.fair-dom.org
Findable, Accessible, Interoperable, Reusable
Data, SOPs, Models, Methods
Multi-tenant Commons Platform
http://datafairport.org
Data discovery
Data assembly, cleaning, and refinement
Modeling
Statistical analysis
Data collection
Insights
Scholarly Communication & Reporting
Material & Methods
BioSTIF
instruments and laboratory
Workflow Commons
"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al
• 35 kinds of annotations
• 5 main workflows
• 14 nested workflows
• 11 configuration files
• 25 scripts
• 10 software dependencies
• 1 workflow management system
• 1 web service
• Dataset: 90 galaxies observed in 3 bands
José Enrique Ruiz (IAA-CSIC)
Galaxy Luminosity Profiling
Dependencies
Components
Rinse and Repeat Research
• Sweep Datasets
• Sweep Variables
• Sweep Steps
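The three kinds of sweep can be combined as a Cartesian product over inputs, variable values and alternative step implementations. The following is a minimal Python sketch under that assumption; the dataset names, the `threshold` variable and the two step variants are hypothetical stand-ins, not taken from any of the projects above.

```python
from itertools import product

# Hypothetical toy inputs standing in for real datasets.
DATA = {
    "survey_a": [0.2, 0.4, 0.9],
    "survey_b": [0.1, 0.7, 0.8],
}
# Hypothetical variable values to sweep.
THRESHOLDS = [0.1, 0.5]
# Hypothetical alternative implementations of one pipeline step.
STEPS = {
    "mean": lambda xs: sum(xs) / len(xs),
    "max": max,
}


def run(dataset, threshold, step_name):
    """Run one configuration and keep the configuration alongside the result."""
    values = [v for v in DATA[dataset] if v >= threshold]
    result = STEPS[step_name](values) if values else None
    return {"dataset": dataset, "threshold": threshold,
            "step": step_name, "result": result}


def sweep():
    """Sweep datasets, variables and steps: one run per combination."""
    return [run(d, t, s) for d, t, s in product(DATA, THRESHOLDS, STEPS)]


if __name__ == "__main__":
    for record in sweep():
        print(record)
```

With two datasets, two thresholds and two step variants this yields eight runs. Storing the configuration next to each result is what lets any single run be repeated later.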
SHARING SENSITIVITY
SENSITIVE SHARING
IMPLICATIONS FOR METRICS
BEING FAIR
Scientific ego-system for open science: trust, reciprocity, competition
fame, competitive advantage
productivity, credit
adoption, kudos
for love
blame, scooped, uncredited, misinterpretation, scrutiny, shame, insecurity, cost/time/skills, distraction, responsibility, disruption, staff churn, inertia
Howard Ratner, STM Innovations Seminar 2012
was: Chair, STM Future Labs Committee; CEO EVP, Nature Publishing Group
now: Director of Development for CHORUS (Clearinghouse for the Open Research of the United States)
http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5
http://www.myexperiment.org/packs/196.html
http://www.researchobject.org/
Outputs are first class citizens to be managed, credited and tracked: data, software
A framework to bundle and relate multi-hosted (digital) resources of a scientific experiment or investigation using standard mechanisms & uniform access protocols.
Carriers of Research Context
Research Objects
What is the RO Framework?
• A framework of models and conventions
• Representations
• API specifications
• Implementations mapped into legacy / commodity platforms
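As a rough illustration of bundling and relating resources, the sketch below writes a ZIP archive containing some local files plus a manifest that lists everything aggregated, including resources hosted elsewhere, and attaches annotations to them. It is loosely inspired by the Research Object Bundle idea of a `.ro/manifest.json` inside a ZIP; the manifest field names used here (`aggregates`, `annotations`, `about`, `content`, `createdBy`) and the example file names are assumptions of this sketch, so the RO specifications remain the authority on the actual model.

```python
import io
import json
import zipfile


def build_manifest(creator, local_files, remote_uris, annotations):
    """Describe an aggregation of local and remote resources (sketch only)."""
    aggregates = [{"uri": "/" + name} for name in local_files]
    # Multi-hosted: remote resources are referenced by URI, not copied in.
    aggregates += [{"uri": uri} for uri in remote_uris]
    return {
        "id": "/",
        "createdBy": {"name": creator},
        "aggregates": aggregates,
        "annotations": [{"about": about, "content": note}
                        for about, note in annotations],
    }


def write_bundle(files, remote_uris, annotations, creator):
    """Write files plus .ro/manifest.json into an in-memory ZIP; return its bytes."""
    manifest = build_manifest(creator, sorted(files), remote_uris, annotations)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()


if __name__ == "__main__":
    data = write_bundle(
        files={"workflow.xml": "<workflow/>", "results/table.csv": "id,value\n1,0.5\n"},
        remote_uris=["http://www.myexperiment.org/packs/196.html"],
        annotations=[("/workflow.xml", "Workflow that produced results/table.csv")],
        creator="A. Researcher",
    )
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        print(json.loads(zf.read(".ro/manifest.json"))["aggregates"])
```

Keeping remote resources as URIs in the manifest, rather than copying them, reflects the "multi-hosted" aspect: the bundle relates resources without having to own them.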
Knowledge Turns
Unit of Scholarly Currency, RO Commons
Circulate in the Scholarly Ecosystem
Citation? Credit? Link to Publishers?
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013
Collaboration to support safe use of patient and research data for medical research
Farr Commons
Research Object packages codes, study, and metadata to exchange coded descriptions of clinical study cohorts
Search, Discover, Index, Harvest, Port
Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Profile Focus
Body of knowledge around methods, workflows, software, data and person, rather than publication.
Citation, credit
Release Research
Evolution, Emergence, Discourse, Threaded
Comparison, Historical review, Anti-Salami
Forks, Merges, Fixivity, Citation? Credit?
Flow across groups, projects and articles
Reproduce Research
Repeat, Replicate, Recompute, Reuse…
Entropy, Citation? Credit?
icanhascheezburger.com
Zhao et al., Why workflows break: Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012
Can I repeat & defend my results?
Can I review, reproduce and compare my results/method with your results/method?
Can I review, replicate and certify your results?
Can I transfer your results into my research and reuse this method?
Hettne et al Structuring research methods and data with the research object model: genomics workflows as a case study 2014 http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
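Each of these questions assumes a computation can still be re-run after its environment has moved on; changes in the execution environment are one way a workflow can decay. A minimal sketch of one such check, assuming the dependencies of the original run were recorded as Python package names and versions (the recorded dictionary and package names in the usage note are hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_drift(recorded: Dict[str, str],
                lookup: Callable[[str], Optional[str]] = installed_version
                ) -> Dict[str, str]:
    """Compare versions recorded at run time with what is available now."""
    report = {}
    for package, wanted in recorded.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found != wanted:
            report[package] = f"changed: {wanted} -> {found}"
        else:
            report[package] = "ok"
    return report


if __name__ == "__main__":
    # Hypothetical versions captured alongside an earlier result.
    print(check_drift({"numpy": "1.8.2", "some-retired-package": "0.1"}))
```

The `lookup` parameter is there so the check can be pointed at something other than the local Python environment, for example a recorded manifest from another machine.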
Checklists, aka Minimum Information Models, Reporting Guidelines
Minim Checklist Ontology, http://purl.org/net/mim/ns
Zhao et al., A Checklist-Based Approach for Quality Assessment of Scientific Information, 3rd Intl. Workshop on Linked Science, 2013
Checklists
Versioning
Provenance
Dependencies
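A checklist can be treated as a set of graded requirements evaluated against a research object's metadata. The sketch below is in the spirit of the Minim checklist approach, using MUST, SHOULD and MAY levels; the specific requirements, metadata keys and the four result grades are assumptions of this sketch rather than the Minim vocabulary itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, object]


@dataclass
class Requirement:
    level: str  # "MUST", "SHOULD" or "MAY"
    description: str
    check: Callable[[Metadata], bool]


# Hypothetical checklist covering versioning, provenance and dependencies.
CHECKLIST: List[Requirement] = [
    Requirement("MUST", "has a version", lambda m: bool(m.get("version"))),
    Requirement("MUST", "records provenance", lambda m: bool(m.get("provenance"))),
    Requirement("SHOULD", "declares dependencies", lambda m: bool(m.get("dependencies"))),
    Requirement("MAY", "links to a publication", lambda m: bool(m.get("doi"))),
]


def assess(metadata: Metadata,
           checklist: List[Requirement] = CHECKLIST) -> Dict[str, object]:
    """Evaluate every requirement; grade by the strongest level that failed."""
    failed = [r for r in checklist if not r.check(metadata)]
    levels = {r.level for r in failed}
    if "MUST" in levels:
        grade = "does not satisfy"
    elif "SHOULD" in levels:
        grade = "minimally satisfies"
    elif "MAY" in levels:
        grade = "nominally satisfies"
    else:
        grade = "fully satisfies"
    return {"grade": grade,
            "failed": [f"{r.level}: {r.description}" for r in failed]}


if __name__ == "__main__":
    print(assess({"version": "1.2", "provenance": "run-42.prov", "dependencies": []}))
```

Reporting the failed requirements, not just the grade, tells the author what to add next, which fits a progressive approach to describing research.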
Progressive Metadata Profiles
Depth: how deeply described
Coverage: how much is covered
More specialised detail, fewer services
More Stakeholders & Services
Citation minimum
Library, Publishers
Experiments
Science
PROV, PAV, VoID, Git
PAV, NISO-JATS
Docker
DC
EXPO, ISA, JERM, OBI
MIAME, SBML, SED-ML
wfdesc
MIM Ontology
wfprov
VIVO-ISF
PID
Standards
Machine-processable
Technology Independent
Multi-platform
Incremental
W3C OADM
DOIs
URIs, Handles
ORCID
OAI-ORE
RRIDs
host
service
Open Source/Store
Sci as a Service
Integrative frameworks
Virtual Machines
Portable Packaging
ReproZip
Workflows, makefiles
ProvStore
OMEX archive
bundle
Nanopub: represents structured data along with its provenance in a single publishable and citable entry
Galaxy workflows: re-enact the analysis
Research Object: aggregates the (digital) resources contributing to findings of (computational) research (results, data and software) as citable compound digital objects
http://isa-tools.github.io/soapdenovo2/
http://sandbox.wf4ever-project.org/portal/ro?ro=http://sandbox.wf4ever-project.org/rodl/ROs/SOAP2denovo2-Aureus/
[Alejandra Gonzalez-Beltran, Philippe Rocca-Serra]
• Id & Cite fluid things
• Uniform handling 1st class citizens
• Compound, multi-authored
• Mixed, leaky containers
• Span outcomes, evolve outputs, emergence
• Profiles
• Bridge researchers, platforms, resources
Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004
[Norman Morrison]
Focus on Personal Productivity not Public Good
Auto-magical
Stealthy not Sneaky
Reduce the friction
Instrumentation
Training Time
From made RO to born RO
• Open research is like open software
• Multi-part, multi-contributor, updating
• Tardis & Commons
• Implications for metrics? Publishing?
• Learning from open software development
http://www.force11.org
• Barend Mons
• Sean Bechhofer
• Philip Bourne
• Matthew Gamble
• Raul Palma
• Jun Zhao
• Alan Williams
• Stian Soiland-Reyes
• Paul Groth
• Tim Clark
• Juliana Freire
• Alejandra Gonzalez-Beltran
• Philippe Rocca-Serra
• Ian Cottam
• Susanna Sansone
• James Howison
• James Herbsleb
• Kristian Garza
All the members of the Wf4Ever team
iSOCO: Intelligent Software Components S.A., Spain
University of Manchester, School of Computer Science, Manchester, United Kingdom
University of Oxford, Department of Zoology, Oxford, UK
Poznan Supercomputing and Networking Center, Poznan, Poland
IAA: Instituto de Astrofísica de Andalucía, Granada, Spain
Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands
Colleagues in Manchester’s Information Management Group
RO Advisory Board Members
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://www.datafairport.org