From peer-reviewed to peer-reproduced: a role for research objects in scholarly publishing in the life sciences
Alejandra González-Beltrán, Oxford e-Research Centre, University of Oxford
-ontology.org
Bioinformatics Open Source Conference (BOSC), Dublin, Ireland
July 10-11, 2015
"AGBell Notebook" by Alexander Graham Bell. (d. 1922) - pages 40-41 of Alexander Graham Bell Family Papers in the Library of Congress' Manuscript Division.
Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:AGBell_Notebook.jpg#/media/File:AGBell_Notebook.jpg
http://petcaretips.net/bonding-rabbit-to-pets.html
Much has been said about the challenges of scientific reproducibility and how it can go wrong…
Difficulties arise when the experimental steps are described only in lab notebooks and scientific articles, and when the data or the software tools required for the analysis are not available.
Can data models and computational workflows help capture experimental processes and reproduce findings?
How?
experimental description (design & steps)
conclusions
computational workflows
aggregation & workflow preservation
• open peer review
• availability of:
  • data
  • analysis scripts
  • documentation
Evaluation of SOAPdenovo2, a tool for the de novo assembly of genomes from short reads produced by next-generation sequencing, which implements improvements over the SOAPdenovo1 assembler.
pre-publication history
https://github.com/aquaskyline/SOAPdenovo2
http://sourceforge.net/projects/soapdenovo2/
Experimental Description
EXCELERATE interoperability component
http://www.ncbi.nlm.nih.gov/books/NBK279831/
http://elixir-uk.org/interoperability-infrastructure
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type):
• genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALL-PATHS-LG
• genome size: bacterial genome (S. aureus, R. sphaeroides), insect genome (B. impatiens), human genome (Chinese Han genome, or YH genome)
3x3 factorial design: 9 study groups
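The 3x3 factorial design can be made explicit by crossing the levels of the two factors. A minimal sketch, assuming plain Python dictionaries rather than any ISA-Tab API; the names `FACTORS` and `study_groups` are hypothetical:

```python
from itertools import product

# Factor names and levels from the experimental plan above.
# The dictionary layout is an illustrative assumption, not the ISA-Tab format.
FACTORS = {
    "genome assembly algorithm": ["SOAPdenovo2", "SOAPdenovo1", "ALL-PATHS-LG"],
    "genome size": ["bacterial genome", "insect genome", "human genome"],
}


def study_groups(factors):
    """Return one dict per study group: the full cross product of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


if __name__ == "__main__":
    groups = study_groups(FACTORS)
    print(len(groups))  # 9: 3 algorithms x 3 genome sizes
    for group in groups:
        print(group)
```

Adding a third factor to `FACTORS` would multiply the number of study groups accordingly, which is why stating the design explicitly makes missing groups easy to spot.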
Response Variables (with units)
genome coverage (%)
computation run time (h)
peak memory consumption (GB)
contig N50 (kb or bp)
scaffold N50 (kb or bp)
number of errors
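Contig N50 and scaffold N50 are the same statistic applied to different sequence sets: the length L such that sequences of length L or longer together cover at least half of the total assembly length. A minimal sketch, assuming the sequence lengths are already available as a list of integers (the function name `n50` is ours):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (contigs or scaffolds).

    Lengths are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the overall assembly length.
    """
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length


# Worked example: total length is 54, half is 27, and 10 + 9 + 8 = 27 reaches it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

The function is unit-agnostic, so the same code serves whether N50 is reported in kb or bp.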
The experimental steps
• Unambiguous identification of resources (e.g. records from public repositories), with persistent identifiers where available (ORCIDs, DOIs); we suggest a dedicated article section
• Experimental workflows: identification of processes, their inputs and outputs
• Experimental design: identify the experimental goal and the independent and response variables
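Persistent identifiers can be checked mechanically before they reach the article. For ORCID iDs, the final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The sketch below validates the format and that digit only; it does not confirm that an iD is actually registered, and the function names are our own:

```python
import re

# Four hyphen-separated blocks of four characters; the last character may be "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the format and check digit of an iD such as 0000-0002-1825-0097."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

DOIs carry no check digit, so their equivalent check is resolving them through the DOI system rather than a local computation.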
Reproducing SOAPdenovo2 results with Galaxy workflows
S. aureus pipeline
[Table: S. aureus pipeline results]
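A Galaxy workflow is a directed acyclic graph of tool steps, each consuming outputs of earlier steps. The sketch below captures that idea with hypothetical step names (not the actual steps of the S. aureus pipeline) and uses Python's standard `graphlib` module (Python 3.9+) to derive a valid execution order; it does not call the Galaxy API:

```python
from graphlib import TopologicalSorter

# Hypothetical steps, each with its declared inputs and outputs.
# Names are illustrative only and are not taken from the published workflow.
STEPS = {
    "upload_reads": {"inputs": [], "outputs": ["reads"]},
    "assemble": {"inputs": ["reads"], "outputs": ["contigs", "scaffolds"]},
    "evaluate_contigs": {"inputs": ["contigs"], "outputs": ["contig_stats"]},
    "evaluate_scaffolds": {"inputs": ["scaffolds"], "outputs": ["scaffold_stats"]},
}


def execution_order(steps):
    """Order steps so that each runs after the steps producing its inputs."""
    # Map every output to the step that produces it.
    producer = {out: name for name, s in steps.items() for out in s["outputs"]}
    # TopologicalSorter expects node -> set of predecessor nodes.
    graph = {name: {producer[i] for i in s["inputs"]} for name, s in steps.items()}
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STEPS))
```

Recording inputs and outputs per step, rather than a fixed list of commands, is what lets the order be derived and checked; an input with no producer raises a `KeyError`, and a cycle raises `graphlib.CycleError`.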
Publishing findings as nanopublications
A nanopublication (NP) represents structured data along with its provenance in a single publishable and citable entity, made up of:
• assertion
• provenance
• publication info
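The three parts map onto named RDF graphs, tied together by a head graph. The sketch below is a hand-rolled TriG serialization using the nanopublication schema namespace and W3C PROV-O terms; the `example.org` IRIs and the predicate `hasHigherGenomeCoverageThan` are hypothetical, and a real pipeline would use an RDF library and validate against the nanopublication guidelines:

```python
# Namespace IRIs from the nanopublication schema and W3C PROV-O.
NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap a string as an absolute IRI term."""
    return f"<{value}>"


def named_graph(name, triples):
    """Render one TriG named graph from (subject, predicate, object) IRIs."""
    body = "\n".join(f"  {iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)
    return f"{iri(name)} {{\n{body}\n}}"


def make_nanopub(base, assertion, derived_from, attributed_to):
    """Build head, assertion, provenance and publication-info graphs as TriG."""
    head, a, prov, info = (base + s for s in ("#Head", "#assertion", "#provenance", "#pubinfo"))
    graphs = [
        named_graph(head, [
            (base, RDF_TYPE, NP + "Nanopublication"),
            (base, NP + "hasAssertion", a),
            (base, NP + "hasProvenance", prov),
            (base, NP + "hasPublicationInfo", info),
        ]),
        named_graph(a, assertion),
        # Provenance describes the assertion graph, not the whole nanopublication.
        named_graph(prov, [(a, PROV + "wasDerivedFrom", derived_from)]),
        named_graph(info, [(base, PROV + "wasAttributedTo", attributed_to)]),
    ]
    return "\n\n".join(graphs)


EX = "http://example.org/"  # hypothetical namespace for illustration
print(make_nanopub(
    EX + "np1",
    [(EX + "SOAPdenovo2", EX + "hasHigherGenomeCoverageThan", EX + "SOAPdenovo1")],
    derived_from=EX + "human-assembly-results",
    attributed_to=EX + "curator",
))
```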
Abstract & Conclusions
• Generation of nanopublications (assertion and provenance) for all the results of the response variables
• NanoMaton: templates for nanopublications
• Prevent priming: report all findings corresponding to the identified response variables
• Remain neutral: report all findings of similar importance with the same weight
  “genome coverage increased over the human data when comparing SOAPdenovo2 against SOAPdenovo1”
• Link conclusions to the experimental description
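The two reporting rules (state a finding for every response variable, and give each the same weight) can be enforced by generating the statements from one template rather than writing them by hand. A minimal sketch, assuming results arrive as (baseline, candidate) pairs per response variable; it is not NanoMaton's implementation, and `neutral_statements` is a hypothetical name. The template follows the wording of the example sentence above:

```python
def neutral_statements(results, dataset, candidate, baseline):
    """Produce one conclusion per response variable, in the order given.

    results maps each response variable to a (baseline_value, candidate_value)
    pair. Every variable gets a statement from the same template, so no
    finding is dropped or emphasised over another.
    """
    statements = []
    for variable, (base_value, cand_value) in results.items():
        if cand_value > base_value:
            change = "increased"
        elif cand_value < base_value:
            change = "decreased"
        else:
            change = "did not change"
        statements.append(
            f"{variable} {change} over the {dataset} data "
            f"when comparing {candidate} against {baseline}"
        )
    return statements
```

The direction word reports only the sign of the difference; whether a change is an improvement (fewer errors versus higher coverage) is left to the reader, which keeps the wording neutral.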
http://www.researchobject.org/
Aggregation and workflow preservation as Research Objects
Research Object: enables the aggregation of the digital resources contributing to the findings of computational research, including results, data and software, as citable compound digital objects
http://isa-tools.github.io/soapdenovo2
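Aggregation can be made concrete by packaging the resources together with a manifest that lists them. The sketch below is loosely modelled on the RO Bundle convention (a zip archive with a `mimetype` entry and a JSON-LD manifest at `.ro/manifest.json`); the manifest keys shown are an assumption to be checked against the specification, and the file paths in the usage example are hypothetical:

```python
import io
import json
import zipfile


def build_ro_bundle(resources, created_by):
    """Return the bytes of a zip archive holding the resources and a manifest.

    resources maps a path inside the bundle (e.g. "results/summary.tsv") to
    bytes. The manifest lists every aggregated resource so the whole set can
    be cited as one compound object.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "createdBy": {"name": created_by},
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The mimetype entry goes first and uncompressed, as in container formats.
        bundle.writestr("mimetype", "application/vnd.wf4ever.robundle+zip",
                        compress_type=zipfile.ZIP_STORED)
        for path, content in resources.items():
            bundle.writestr(path, content)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical contents: a Galaxy workflow file and a results table.
    data = build_ro_bundle({"workflows/s_aureus.ga": b"{}", "results/summary.tsv": b""}, "ISA team")
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        print(z.namelist())
```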
From narrative to self-described structured data
Model- and workflow-assisted experimental description and review process
Depth and breadth of semantic resources; clear meaning of experimental elements
Ruibang Luo, University of Hong Kong
Tin-Lap Lee, Chinese University of Hong Kong
Tak-wah Lam, University of Hong Kong
SOAPdenovo2
Scott Edmunds, GigaScience
Peter Li, GigaScience
Marco Roos, Leiden University
Mark Thompson, Leiden University
Rajaram Kaliyaperumal, Leiden University
Eelke van der Horst, Leiden University
Jun Zhao, Lancaster University
María Susana Avila García, Oxford University
Philippe Rocca-Serra, Oxford University
Susanna-Assunta Sansone, Oxford University
Alejandra Gonzalez-Beltran, Oxford University
Team
Questions? You can email us...
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools
View our websites
View our Git repo & contribute: http://github.com/ISA-tools
Thanks for your attention!