ISMB Workshop 2014
DESCRIPTION
This talk explores how principles derived from experimental design practice, data and computational models can greatly enhance data quality, data generation, data reporting, data publication and data review.
TRANSCRIPT
![Page 1: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/1.jpg)
What was the plan? A role for data standards, models and computational workflows in scholarly data publishing
Alejandra González-Beltrán, PhD
Philippe Rocca-Serra, PhD
Oxford e-Research Centre, University of Oxford
{alejandra.gonzalezbeltran,philippe.rocca-serra}@oerc.ox.ac.uk
ISMB Workshop: What Bioinformaticians need to know about digital publishing beyond the PDF2
July 15th, 2014, Boston, USA
![Page 2: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/2.jpg)
Data Scientist
Visualization
Analysis
Planning
Data Management
Data Collection
Publication
Use existing data
Perform new experiment
The experimental workflow
![Page 3: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/3.jpg)
The experimental workflow
metadata
![Page 5: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/5.jpg)
Data Interoperability
The experimental workflow
Reproducibility
Data Review
![Page 6: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/6.jpg)
The experimental workflow
Data Reusability
![Page 7: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/7.jpg)
The experimental plan - life sciences case
experimental design
sample characteristic(s)
experimental variable(s)
2-week systemic rat study using male Wistar rats (N=15 per dose group)
14 proprietary drug candidates from participating companies and 2 reference toxic compounds
InnoMed PredTox Project
![Page 8: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/8.jpg)
The experimental plan - life sciences case
experimental design
sample characteristic(s)
experimental variable(s)
technology(s)
measurement(s)
protocol(s)
data file(s)
…
![Page 9: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/9.jpg)
The experimental plan - computational case
• open peer-review
• availability of
  • data
  • analysis scripts
  • documentation
Evaluation of the SOAPdenovo2 tool for the de novo assembly of genomes from short DNA reads produced by next-generation sequencing. SOAPdenovo2 implements improvements over the SOAPdenovo1 assembler.
![Page 10: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/10.jpg)
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
genome assembly algorithm
genome size
![Page 11: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/11.jpg)
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALLPATHS-LG
genome size
![Page 12: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/12.jpg)
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALLPATHS-LG
genome size: bacterial genome, insect genome, human genome
![Page 13: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/13.jpg)
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALLPATHS-LG
genome size: bacterial genome, insect genome, human genome
3x3 factorial design: 9 study groups (each assembler applied to each genome size)
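The 3x3 factorial design can be enumerated mechanically by crossing the levels of the two predictor variables. The following is a minimal sketch, assuming the factor levels named on the slides; the function name `study_groups` and the dictionary layout are illustrative choices, not part of the talk.

```python
from itertools import product

# Factor levels taken from the slides.
ASSEMBLERS = ["SOAPdenovo2", "SOAPdenovo1", "ALLPATHS-LG"]
GENOME_SIZES = ["bacterial genome", "insect genome", "human genome"]


def study_groups(algorithms, genome_sizes):
    """Return one dict per combination of factor levels (full factorial)."""
    return [
        {"genome assembly algorithm": algo, "genome size": size}
        for algo, size in product(algorithms, genome_sizes)
    ]


groups = study_groups(ASSEMBLERS, GENOME_SIZES)
for number, group in enumerate(groups, start=1):
    print(number, group["genome assembly algorithm"], "x", group["genome size"])
```

Declaring the factors first and deriving the groups from them keeps the plan explicit: adding a level to either factor changes the number of study groups without any other edits.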
![Page 14: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/14.jpg)
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALLPATHS-LG
genome size:
bacterial genome: S. aureus, R. sphaeroides
insect genome: B. impatiens
human genome: Chinese Han genome (or YH genome)
![Page 15: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/15.jpg)
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALLPATHS-LG
genome size: bacterial genome, insect genome, human genome
Response Variables
genome coverage
computation run time
memory consumption
![Page 16: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/16.jpg)
http://www.ama-rochester.org/WP/wp-content/uploads/2013/01/three-pillars.png
![Page 17: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/17.jpg)
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or tools) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also by communities working to build a library of cellular signatures
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
![Page 18: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/18.jpg)
General-purpose, configurable format designed to support:
• description of the experimental metadata, making the annotation explicit and discoverable
• provenance tracking
• use of community standards, such as minimal reporting guidelines and terminologies
• conversion to a growing number of other metadata formats, e.g. those used by the European Bioinformatics Institute (EBI) repositories
![Page 19: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/19.jpg)
[Figure: example experimental metadata. Sources H1 (H. sapiens, 35 years) and H2 (H. sapiens, 33 years); samples H1.sample1, H1.sample2 and H2.sample1; Labeling steps producing H1.sample1.labeled and H2.sample1.labeled; Scanning steps producing the data files h1-s1.cel, h1-s2.cel and h2-s1.cel.]
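The example above can be written down as a tab-delimited table, which is the general shape ISA-Tab uses. This is a minimal sketch under stated assumptions: the column headers are modelled on ISA-Tab conventions and should be treated as illustrative, everything is merged into one table (ISA-Tab itself splits study and assay information across files), and the pairing of samples to `.cel` files is inferred from the file names.

```python
import csv
import io

# Illustrative headers modelled on ISA-Tab conventions (assumption).
HEADER = [
    "Source Name",
    "Characteristics[organism]",
    "Characteristics[age]",
    "Unit",
    "Sample Name",
    "Raw Data File",
]

# Values from the slide; the sample-to-file pairing is inferred from the names.
ROWS = [
    ["H1", "H. sapiens", "35", "Years", "H1.sample1", "h1-s1.cel"],
    ["H1", "H. sapiens", "35", "Years", "H1.sample2", "h1-s2.cel"],
    ["H2", "H. sapiens", "33", "Years", "H2.sample1", "h2-s1.cel"],
]


def to_tsv(header, rows):
    """Serialise the table as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def samples_by_source(tsv_text):
    """Group sample names under the source they were taken from."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = {}
    for row in reader:
        grouped.setdefault(row["Source Name"], []).append(row["Sample Name"])
    return grouped


table_text = to_tsv(HEADER, ROWS)
print(table_text)
print(samples_by_source(table_text))
```

Grouping rows by source is what turns the repeated table cells into a graph: one node per source, with an edge to each sample derived from it.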
![Page 20: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/20.jpg)
[Figure: the H1/H2 example annotated with ontology terms: obi:material entity, obi:material sample, obi:material processing, obi:processed material, obi:planned process, isa:raw data file, bfo:derives from.]
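The annotated example can be sketched as subject-predicate-object triples. This is a minimal sketch under assumptions: which term annotates which node is an assumed reading of the figure and is not stated in the transcript, `rdf:type` is added here for the typing statements, and the derivation links are limited to the chain of materials. Plain tuples are used instead of an RDF library to keep the example self-contained.

```python
# Assumed assignment of ontology terms to nodes of the example (hypothetical).
TYPES = {
    "H1": "obi:material entity",
    "H1.sample1": "obi:material sample",
    "Labeling": "obi:material processing",
    "H1.sample1.labeled": "obi:processed material",
    "Scanning": "obi:planned process",
    "h1-s1.cel": "isa:raw data file",
}

# (derived thing, thing it derives from), restricted to materials.
DERIVATIONS = [
    ("H1.sample1", "H1"),
    ("H1.sample1.labeled", "H1.sample1"),
]


def build_triples(types, derivations):
    """Combine typing statements and derivation links into one triple list."""
    triples = [(node, "rdf:type", term) for node, term in types.items()]
    triples += [(child, "bfo:derives from", parent) for child, parent in derivations]
    return triples


def lineage(node, triples):
    """Follow 'bfo:derives from' links from a node back to its origin."""
    parents = {s: o for s, p, o in triples if p == "bfo:derives from"}
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


triples = build_triples(TYPES, DERIVATIONS)
print(lineage("H1.sample1.labeled", triples))
```

Walking the derivation links is the provenance question in miniature: given a processed material, which sample and which source does it come from.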
![Page 21: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/21.jpg)
![Page 22: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/22.jpg)
![Page 23: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/23.jpg)
http://gigasciencejournal.com
http://gigadb.org/dataset/100035
![Page 25: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/25.jpg)
Experimental metadata or structured component (in-house curated, machine-readable formats)
Article or narrative component (PDF and HTML)
A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these
Credit for sharing your data
Focused on reuse and reproducibility
Peer reviewed, curated
Promoting Community Data Repositories
Open Access
![Page 28: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/28.jpg)
SOAPdenovo2
http://isa-tools.github.io/soapdenovo2
Galaxy workflows to re-enact the data analysis
![Page 29: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/29.jpg)
http://isa-tools.github.io/soapdenovo2
SOAPdenovo2
Nanopub: represents structured data along with its
provenance in a single publishable and citable entity
![Page 30: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/30.jpg)
http://isa-tools.github.io/soapdenovo2
SOAPdenovo2
ResearchObject: enables the aggregation of the digital
resources contributing to findings of computational
research, including results, data and software, as citable
compound digital objects
![Page 31: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/31.jpg)
Reproducing SOAPdenovo2 results Galaxy workflows
S. aureus pipeline
![Page 32: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/32.jpg)
Reproducing SOAPdenovo2 results Galaxy workflows
![Page 33: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/33.jpg)
Reproducing SOAPdenovo2 results Galaxy workflows
![Page 34: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/34.jpg)
Reproducing SOAPdenovo2 results Galaxy workflows
![Page 35: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/35.jpg)
“genome coverage increased over the human data when comparing SOAPdenovo2 against SOAPdenovo1”
Response Variables
genome coverage
computation run time
memory consumption
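A statement like the one quoted above is the kind of claim a nanopublication packages together with its provenance. The following is a minimal sketch, assuming the three-part layout commonly used for nanopublications (assertion, provenance, publication information). The class and field names are hypothetical, and the publication details are placeholders; only the quoted statement and the SOAPdenovo2 project URL come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """The claim itself, broken into its experimental-design parts."""
    response_variable: str
    direction: str
    dataset: str
    comparison: tuple  # (tool under evaluation, baseline tool)


@dataclass(frozen=True)
class Nanopub:
    """Hypothetical container: a claim plus where it came from and who published it."""
    assertion: Assertion
    provenance: dict
    publication_info: dict

    def summary(self) -> str:
        tool, baseline = self.assertion.comparison
        a = self.assertion
        return f"{a.response_variable} {a.direction} on {a.dataset}: {tool} vs {baseline}"


nanopub = Nanopub(
    assertion=Assertion(
        response_variable="genome coverage",
        direction="increased",
        dataset="human data",
        comparison=("SOAPdenovo2", "SOAPdenovo1"),
    ),
    # URL shown on the slides for the SOAPdenovo2 case study.
    provenance={"source": "http://isa-tools.github.io/soapdenovo2"},
    # Placeholder value, not taken from the talk.
    publication_info={"created_by": "example curator (hypothetical)"},
)

print(nanopub.summary())
```

Keeping the response variable and the compared factor levels as separate fields ties the published claim back to the experimental plan, rather than leaving it as free text.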
![Page 36: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/36.jpg)
OntoMaton: a BioPortal powered ontology widget for Google Spreadsheets. Maguire et al., 2013, Bioinformatics
widget for ontology annotation and tagging on Google spreadsheets, relying on BioPortal and Linked Open Vocabularies services
![Page 37: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/37.jpg)
NanoMaton https://github.com/ISA-tools/NanoMaton
Ontology for Biomedical Investigations
Semanticscience Integrated Ontology
![Page 38: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/38.jpg)
Data Scientist
Visualization
Analysis
Planning
Data Management
Data Collection
Publication
Use existing data
Perform new experiment
FAIR data: Findable, Accessible, Interoperable, Reusable
![Page 39: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/39.jpg)
Contributing to MetaboLights and ISA
• BBSRC UK-China Award & BGI funded Hackathon
• Venue: BGI Hong Kong
• Participants: MetaboLights / BGI / ISA / Birmingham / Hong Kong University
• Outcome:
  • ISA-Tab web viewer code
  • Functional specifications & code for DoE Wizard API
  • 4 datasets coded in ISA format
  • Conversion of MetaboLights datasets to RDF
![Page 40: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/40.jpg)
![Page 41: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/41.jpg)
funders
acknowledgements
Scott Edmunds, GigaScience
Peter Li, GigaScience
Jun Zhao, Lancaster University
María Susana Avila García, Oxford University
Marco Roos, Leiden University
Mark Thompson, Leiden University
Ruibang Luo, University of Hong Kong
Tin-Lap Lee, Chinese University of Hong Kong
Tak-wah Lam, University of Hong Kong
![Page 42: ISMB Workshop 2014](https://reader034.vdocuments.site/reader034/viewer/2022042623/54c63cf14a795920538b4696/html5/thumbnails/42.jpg)
Questions?
You can email us...
View our blog http://isatools.wordpress.com
Follow us on Twitter @isatools
View our websites
View our Git repo & contribute http://github.com/ISA-tools
Thanks for your attention!