Managing Experimental Metadata using ISA data structures
TranSMART-ISA Teleconference, June 19th, 2012
Philippe Rocca-Serra, Ph.D.
on behalf of the ISA Team, University of Oxford
http://www.isa-tools.org; http://github.com/ISA-tools; http://isacommons.org/; [email protected]
Tuesday, 19 June 2012
Why ISA format and Tools?
• Capture all salient features of the experimental workflow
• Make annotation explicit and discoverable
• Structure the descriptions for consistency, tracking independent and dependent variables
• Use cross-references and resolvable identifiers
Why ISA format and Tools?
– Supporting data provenance tracking
– Node/Edge underlying concept
– Tabular as a compromise: a presentation layer inspired by object models (FuGE, MAGE-OM)
– A generic representation, applied to:
  • microarray-based experiments (MAGE)
  • sequencing-based experiments (SRA)
  • flow cytometry-based experiments (FuGE-Flow Cyt)
  • mass spectrometry and NMR spectroscopy experiments
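The Node/Edge concept can be illustrated as a small directed graph: materials and data files are nodes, and each edge records the protocol that turned one into the other, so provenance tracking becomes a walk backwards from a data file. This is a minimal sketch, not code from the ISA tools; the node names echo the deck's example, while the protocol names "sample collection" and "hybridization" are hypothetical, and the walk follows only the first input of each node.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Materials and data files are nodes; each edge records the protocol
    that turned an input node into an output node."""

    def __init__(self):
        # output node -> list of (input node, protocol name)
        self._inputs = defaultdict(list)

    def add_edge(self, source, protocol, target):
        self._inputs[target].append((source, protocol))

    def trace(self, node):
        """Return the (input, protocol, output) steps leading to `node`,
        ordered from the original source; follows the first input only."""
        steps = []
        while self._inputs.get(node):
            source, protocol = self._inputs[node][0]
            steps.append((source, protocol, node))
            node = source
        return list(reversed(steps))


g = ProvenanceGraph()
g.add_edge("H1", "sample collection", "H1.sample1")
g.add_edge("H1.sample1", "labeling", "H1.sample1.labeled")
g.add_edge("H1.sample1.labeled", "hybridization", "h1-s1.cel")

for step in g.trace("h1-s1.cel"):
    print(step)
# ('H1', 'sample collection', 'H1.sample1')
# ('H1.sample1', 'labeling', 'H1.sample1.labeled')
# ('H1.sample1.labeled', 'hybridization', 'h1-s1.cel')
```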
Why ISA format and Tools?
Investigation → Study → Assay(s) → Data (external files in native or other formats, referenced by pointers to data file names/locations)
• Investigation: high-level concept to link related studies
• Study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
• Assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
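The three levels can be sketched with Python dataclasses. The class and field names are illustrative assumptions, not the ISA-Tab specification or the object model of any ISA library; the sketch only shows that an investigation groups studies, each study describes its subjects and owns its assays, and assays point to data files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # A test on material from the subject; produces measurements (data files).
    measurement_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    # Central unit: subjects, their characteristics and treatments, plus assays.
    identifier: str
    subjects: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # High-level container linking related studies.
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        """Collect every data file referenced by any assay of any study."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


inv = Investigation("INV-1", [
    Study("S-1", subjects=["H1", "H2"],
          assays=[Assay("transcription profiling",
                        ["h1-s1.cel", "h1-s2.cel", "h2-s1.cel"])]),
])
print(inv.all_data_files())  # ['h1-s1.cel', 'h1-s2.cel', 'h2-s1.cel']
```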
[Figure: worked example. Two human subjects (H1, H2; H. sapiens; ages 33 and 35 years) yield samples H1.sample1, H1.sample2 and H2.sample1; Labeling produces H1.sample1.labeled and H2.sample1.labeled; raw data files are h1-s1.cel, h1-s2.cel and h2-s1.cel. The same ISA description can be exported as MAGE-Tab, PRIDE-XML or SRA-XML.]
ISA metadata specifications:
• workflow- and process-oriented
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas
Currently finalizing conversion to RDF to explore the growing Linked Data universe (in collaboration with the W3C HCLSIG and the ToxBank consortium).
ISA syntax and Table definition
• Material Transformations: inputs and outputs of protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).
Material Node → Protocol REF → Material Node
• Material Node attributes: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
• Protocol REF attributes: Parameter Value[…], Date (day effect), Performer (operator effect)
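These column conventions let a parser split a table header into node columns, Protocol REF columns and the attribute columns that qualify them. A minimal sketch, assuming a simplified header list and a hand-picked set of node column names; real ISA-Tab parsers handle more column types (for example Unit and Term Source REF), and the parameter name used here is hypothetical.

```python
# Column headers that start a new node in a simplified ISA-Tab table.
NODE_COLUMNS = {"Source Name", "Sample Name", "Extract Name",
                "Labeled Extract Name", "Raw Data File", "Derived Data File"}


def group_columns(headers):
    """Attach each attribute column to the node or Protocol REF column
    that precedes it. Returns a list of (anchor column, [attributes])."""
    groups = []
    for header in headers:
        if header in NODE_COLUMNS or header == "Protocol REF":
            groups.append((header, []))
        elif groups:
            groups[-1][1].append(header)
        else:
            raise ValueError(f"attribute column {header!r} has no preceding node")
    return groups


headers = ["Source Name", "Characteristics[organism]",
           "Protocol REF", "Parameter Value[storage temperature]", "Date", "Performer",
           "Sample Name", "Factor Value[dose]"]
for anchor, attributes in group_columns(headers):
    print(anchor, attributes)
```

Keeping Date and Performer grouped under the Protocol REF is what makes them usable later as day and operator covariates.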
ISA syntax and Table definition
• Data Acquisition & Data Transformations: inputs are Materials or Data, and outputs are Data Nodes (Raw Data File, Derived Data File, Derived Array Data Matrix File).
Material Node → Protocol REF → Data File Node
• Node attributes: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
• Protocol REF attributes: Parameter Value[…], Date (day effect), Performer (operator effect)
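Because Date and Performer are recorded on the protocol that produces each data file, they can be used to look for day or operator effects before analysis. A sketch, assuming assay rows have already been parsed into dictionaries keyed by column header; the performer names and dates are hypothetical.

```python
from collections import defaultdict


def files_by(rows, attribute):
    """Group raw data files by a protocol attribute such as 'Performer' or 'Date'."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row["Raw Data File"])
    return dict(groups)


rows = [
    {"Raw Data File": "h1-s1.cel", "Performer": "alice", "Date": "2012-06-01"},
    {"Raw Data File": "h1-s2.cel", "Performer": "bob", "Date": "2012-06-01"},
    {"Raw Data File": "h2-s1.cel", "Performer": "alice", "Date": "2012-06-02"},
]
print(files_by(rows, "Performer"))
# {'alice': ['h1-s1.cel', 'h2-s1.cel'], 'bob': ['h1-s2.cel']}
print(files_by(rows, "Date"))
```

If every group turned out to contain a single experimental condition, a day or operator effect would be confounded with the factor of interest; recording these columns is what makes that check possible.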
Who uses ISA format and Tools?
A growing ecosystem of over 30 public and internal resources uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
It is also used by communities working to build a library of cellular signatures.
Some of the internal projects and public groups/resources include the Nanotechnology Informatics Working Group.
www.biosharing.org www.isacommons.org
Towards interoperable bioscience data
Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios I, Hide W. Nature Genetics, Feb 2012. doi:10.1038/ng.1054 (www.isacommons.org)
Development timeline (2007 to 2012)
• Core developments: Strawman ISA-Tab spec; Final ISA-Tab spec; ISA software v1; Database instance at EBI; RDF format starts; Conversions to PRIDE-XML/SRA-XML/MAGE-Tab and more; Links to analysis tools start
• Community involvement and uptake: 1st, 2nd and 3rd ISA-Tab workshops; 1st public instance: Harvard Stem Cell Discovery Engine; User workshops/visits start; Other tools implement ISA-Tab; Growing number of systems starts to adopt ISA-Tab
• Publications: ‘Omics data sharing (Science); ISA-Tab and ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); Workshop reports; ISA Commons (Nature Genetics)
The ISA tools: modular, with a suite of supporting tools
• Configure: curator creates a template.
• Create (isacreator): experimentalist uses the editor to report an investigation.
• Validate: check adherence to the template.
• Convert to ISA (converter): convert from MAGE-Tab to ISA-Tab. More formats coming soon...
• Convert from ISA (converter): convert to MAGE-TAB, PRIDE-XML, SRA-XML for submission to international public repositories.
• Load: curator stores metadata in a database using the BII data management tool.
• Browse: users browse investigations, query and view experimental metadata, and access associated data files.
• Analyze: perform analysis of data in context with the metadata using the Galaxy or R analysis engines.
Requires configuration XML.
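The Validate step checks that a table adheres to the template the curator defined. A sketch, assuming the template is reduced to a set of required column headers, each flagged as to whether every row must have a value; the real ISA validator works from the configuration XML and checks far more.

```python
def validate(headers, rows, template):
    """Return a list of problems; an empty list means the table adheres to
    the (simplified) template. `template` maps each required column header
    to True if every row must have a value in that column."""
    problems = []
    for column, value_required in template.items():
        if column not in headers:
            problems.append(f"missing column: {column}")
            continue
        idx = headers.index(column)
        if value_required:
            for n, row in enumerate(rows, start=1):
                if idx >= len(row) or not row[idx].strip():
                    problems.append(f"row {n}: empty value in {column}")
    return problems


template = {"Sample Name": True, "Characteristics[organism]": True, "Protocol REF": False}
headers = ["Source Name", "Characteristics[organism]", "Sample Name"]
rows = [["H1", "Homo sapiens", "H1.sample1"], ["H2", "", "H2.sample1"]]
print(validate(headers, rows, template))
# ['row 2: empty value in Characteristics[organism]', 'missing column: Protocol REF']
```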
Create configuration XML files
The ISAconfigurator...
Use of the configuration XML
In technical terms, the configuration XML schema (XSD) is consumed by an XMLBeans goal in Maven, which generates Java stubs; these are then used to load the configuration XML files into memory.
The configuration is also used to define the form view, using a similar mechanism:
<xml><field>sample</field><field>protocol ref</field><field>extract name</field><field>label</field>...</xml>
1. XML definition(s): import into the Java object model (Java Object, TableReferenceObject) using classes created by XMLBeans.
2. Construct the spreadsheet model: columns, rows, etc.
3. Assign cell editors: ontology terms are given the ontology selection tool as a cell editor, file fields are given a file chooser, etc.
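The same idea, configuration driving the form, can be sketched in Python with the standard-library XML parser standing in for XMLBeans-generated Java classes. The element and attribute names (`config`, `field`, `header`, `data-type`) describe a simplified, hypothetical format, not the real ISA configuration XSD; the sketch only shows field definitions becoming spreadsheet columns with an assigned cell editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified configuration: one <field> element per column.
CONFIG = """
<config>
  <field header="Sample Name" data-type="String"/>
  <field header="Characteristics[organism]" data-type="Ontology term"/>
  <field header="Raw Data File" data-type="File"/>
</config>
""".strip()

# Cell editor per data type, mirroring the slide: ontology terms get an
# ontology selector, file fields get a file chooser, anything else plain text.
EDITORS = {"Ontology term": "ontology selector", "File": "file chooser"}


def build_columns(xml_text):
    """Turn <field> definitions into (column header, cell editor) pairs."""
    root = ET.fromstring(xml_text)
    return [(f.get("header"), EDITORS.get(f.get("data-type"), "text field"))
            for f in root.iter("field")]


for column in build_columns(CONFIG):
    print(column)
```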
ISAcreator: Create & Edit ISA-Tab
Data Reporting Scenarios
1. Starting from scratch: spreadsheet function
2. Mapping from 3rd-party tab data: mapping/ETL tool
3. Templating based on study design information: wizard (*)
(*) “early intervention is best”
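Scenario 2, mapping third-party tabular data, amounts to renaming and reordering columns so they match ISA-Tab headers. A sketch with the standard `csv` module, assuming a hypothetical lab spreadsheet with columns `patient`, `species` and `tissue` and a hand-written mapping; ISAcreator's mapping tool is interactive and more general.

```python
import csv
import io

# Hypothetical mapping from a lab's own column names to ISA-Tab headers.
MAPPING = {
    "patient": "Source Name",
    "species": "Characteristics[organism]",
    "tissue": "Characteristics[organism part]",
}


def to_isatab(third_party_tsv, mapping):
    """Rename and reorder the columns of a tab-separated table so they use
    ISA-Tab headers; columns without a mapping (e.g. 'notes') are dropped."""
    reader = csv.DictReader(io.StringIO(third_party_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(list(mapping.values()))
    for row in reader:
        writer.writerow([row[column] for column in mapping])
    return out.getvalue()


lab_table = "patient\tspecies\ttissue\tnotes\nH1\tHomo sapiens\tliver\tok\n"
print(to_isatab(lab_table, MAPPING), end="")
```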
The ISAcreator...
Developed to be a user-friendly way to enter standards-compliant metadata, it has lots of features, and these are just some of them: we also have a data entry wizard and an import utility.
Ontologies in ISAcreator
We use the NCBO BioPortal and the EBI’s OLS to search and browse ontologies.
Ontology Resource Manager: the resource manager provides seamless searching of ontology resources, regardless of their origins, their underlying data schema or the mechanism (REST, SOAP or local file store) through which they are accessed.
• Sources: NCBO BioPortal (Search, Hierarchy and Annotator services), Ontology Lookup Service (OLS), Plugin
• Features: ontology browsing & searching, ontology tagging, ontology field restriction
ISAcreator manages ontology metadata such as version information, as well as individual term accessions, source, URI and so forth.
Ontology search code is usable outside of ISAcreator. In fact, the ISAconfigurator imports ISAcreator as a Maven dependency and reuses its components to do ontology restriction. Plugins can also make use of our ontology search and browse functionality.
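The resource manager's role, one search call over several ontology resources regardless of how each is accessed, can be sketched as a common interface with interchangeable backends. Everything here is hypothetical: the in-memory backend stands in for BioPortal, OLS or a local file, no web service is called, and the source names and accessions are placeholders. Each hit keeps its accession and source, and an optional restriction limits which resources a field may search.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class Term:
    label: str
    accession: str
    source: str  # which ontology resource the term came from


class OntologyBackend(Protocol):
    """Anything with a search(text) method can act as an ontology source."""
    def search(self, text: str) -> List[Term]: ...


class InMemoryBackend:
    """Hypothetical stand-in for a REST, SOAP or local-file ontology source."""

    def __init__(self, source: str, terms: Dict[str, str]):
        self.source = source
        self.terms = terms  # label -> accession

    def search(self, text: str) -> List[Term]:
        needle = text.lower()
        return [Term(label, accession, self.source)
                for label, accession in self.terms.items()
                if needle in label.lower()]


class ResourceManager:
    """One search entry point over several backends, whatever their origin."""

    def __init__(self, backends: Dict[str, OntologyBackend]):
        self.backends = backends

    def search(self, text: str,
               restrict_to: Optional[List[str]] = None) -> List[Term]:
        # A field restriction limits the search to the named resources.
        names = restrict_to if restrict_to is not None else list(self.backends)
        return [term for name in names
                for term in self.backends[name].search(text)]


manager = ResourceManager({
    "SRC-A": InMemoryBackend("SRC-A", {"liver": "A:0001", "liver lobe": "A:0002"}),
    "SRC-B": InMemoryBackend("SRC-B", {"Liver": "B:0100"}),
})
print([t.accession for t in manager.search("liver")])
# ['A:0001', 'A:0002', 'B:0100']
print([t.accession for t in manager.search("liver", restrict_to=["SRC-B"])])
# ['B:0100']
```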
Plugins in ISAcreator
In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it’s really good.
• Plugins can be developed for 3 different purposes:
  – Search (adds extra search space for the ontology tool)
  – Custom cell editors (for the spreadsheet)
  – Extra general functionality (which appears in a plugin menu)
• 2 examples of ISA plugins:
  – Access to local metadata stores: Novartis plugin to the Ontology Widget
  – Annotation of findings: Metabolite Identification Plugin (a MetaboLights repository contribution to the ISA project)
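ISAcreator loads plugins through OSGi (Apache Felix), which is Java-specific. As a language-neutral illustration only, the three plugin purposes can be modelled as a registry keyed by plugin kind; the registry, kinds and plugin functions below are hypothetical and do not reflect the OSGi service API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# The three plugin purposes listed above.
KINDS = ("search", "cell_editor", "menu")


class PluginRegistry:
    """Hypothetical registry; ISAcreator itself uses OSGi services instead."""

    def __init__(self) -> None:
        self._plugins: Dict[str, List[Callable]] = defaultdict(list)

    def register(self, kind: str, plugin: Callable) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown plugin kind: {kind}")
        self._plugins[kind].append(plugin)

    def search(self, text: str) -> List[str]:
        """Collect hits from every plugin that adds an extra search space."""
        return [hit for plugin in self._plugins["search"] for hit in plugin(text)]

    def menu_entries(self) -> List[str]:
        """Names of plugins that add general functionality to a plugin menu."""
        return [plugin.__name__ for plugin in self._plugins["menu"]]


def local_metastore_search(text: str) -> List[str]:
    # Stand-in for a search over a local metadata store.
    return [f"local:{text}"]


def identify_metabolites() -> None:
    # Stand-in for a menu-driven annotation tool.
    pass


registry = PluginRegistry()
registry.register("search", local_metastore_search)
registry.register("menu", identify_metabolites)
print(registry.search("aspirin"))  # ['local:aspirin']
print(registry.menu_entries())     # ['identify_metabolites']
```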
Plugin example: Novartis Metastore Search
Search function on the Novartis Metastore: integrates search results from the metastore into the Ontology search tool.
So, with the Novartis plugin in your Plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording term source, etc.
ISAcreator - Metabolite Identification plugin
Credits: Kenneth Haug, MetaboLights
Summary
• All Open Source, Open Access project (https://github.com/ISA-tools)
• OSGi plugin architecture: Apache Felix
• Ontology support: select, browse and tag terms from public or private metadata stores
• Annotation of molecular findings: Metabolite Identification Plugin for ISAcreator
• Several libraries (Java, Python, Perl, R) for parsing ISA files
• Integration with R: R-ISATAB package
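ISA-Tab investigation files are tab-delimited: section headers in capitals (such as `INVESTIGATION` or `STUDY`) are followed by rows whose first cell is a field name and whose remaining cells are values. A minimal reader with Python's `csv` module, assuming a tiny hand-written excerpt rather than a complete file and ignoring details that the libraries listed above handle; it is not one of those libraries.

```python
import csv
import io


def read_investigation(text):
    """Return [(SECTION, {field name: [values]})] in file order.

    Simplified: a row whose first cell is all upper case and has no values
    starts a new section; any other row is a field of the current section."""
    sections = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        key, values = row[0], row[1:]
        if key.isupper() and not any(v.strip() for v in values):
            sections.append((key, {}))
        elif sections:
            sections[-1][1][key] = values
    return sections


# Hypothetical excerpt; a real investigation file has many more sections.
sample = (
    "INVESTIGATION\n"
    "Investigation Identifier\tINV-1\n"
    "STUDY\n"
    "Study Identifier\tS-1\n"
    'Study File Name\t"s_study.txt"\n'
)
for section, fields in read_investigation(sample):
    print(section, fields)
```

Sections are kept as a list rather than a dictionary because a file describing several studies repeats the `STUDY` block.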
Summary: TranSMART - ISA
• ISA Study maps to TranSMART:
  – Samples and Timepoint
  – Study Groups
  – Subject Demographics
• ISA assays map to TranSMART Biomarkers
• ISA already has configurations supporting OMICS data:
  – microarray
  – NGS: RNA-Seq, ChIP-Seq, MeDIP-Seq
  – microbial diversity
  – protein/metabolite profiling using mass spectrometry
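One plausible way to derive TranSMART study groups from ISA is to treat each distinct combination of Factor Values as a group. A sketch under that assumption, with sample rows already parsed into dictionaries; the factor names and values are hypothetical, and how such groups would actually be loaded into TranSMART is not shown.

```python
from collections import defaultdict


def study_groups(samples):
    """Map each combination of Factor Value columns to its sample names."""
    groups = defaultdict(list)
    for sample in samples:
        key = tuple(sorted((k, v) for k, v in sample.items()
                           if k.startswith("Factor Value[")))
        groups[key].append(sample["Sample Name"])
    return dict(groups)


samples = [
    {"Sample Name": "H1.sample1", "Factor Value[dose]": "low", "Factor Value[time]": "0 h"},
    {"Sample Name": "H1.sample2", "Factor Value[dose]": "low", "Factor Value[time]": "24 h"},
    {"Sample Name": "H2.sample1", "Factor Value[dose]": "low", "Factor Value[time]": "0 h"},
]
for group, members in study_groups(samples).items():
    print(dict(group), members)
```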
Why integrate ISA with TranSMART?
• Susie Stephens (J&J): "A use case: someone was viewing results of analyses in TranSMART, and then wanted to go back to the raw or processed data and the experimental information in the ISA system. Or where results make a scientist curious to know whether a different/similar data set exists."
• Michael R. Barnes (Director of Bioinformatics, Queen Mary University of London): "We are now quite bought in to TranSMART as we will be running it for a large funded MRC collaboration. The benefit of interoperability between TranSMART and ISA tools would be self-evident. The fewer different standards used in a workflow the better; although TranSMART might be able to integrate diverse data sources, if the sources don't all contain the same fields then combined analysis is reduced to the common denominator fields between data sets. ISA-Tab could be a 'standard of choice' for TranSMART, although it could not be an exclusive standard."
Preparing for Linked Open Data
✴ ISA2RDF (ToxBank collaboration): a contribution to the ecosystem of software tools supporting the ISA syntax
✴ Reliance on Internet-resolvable identifiers
✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)
✴ TODO:
  ✴ Specify comparator groups + analysis methods and resulting measurements and statistical measures
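The Linked Data direction rests on giving every ISA node an Internet-resolvable identifier, so that provenance links become RDF statements. A minimal N-Triples writer, assuming a hypothetical `http://example.org/isa/` namespace and a hypothetical `derivesFrom` predicate; this is not the ISA2RDF vocabulary or mapping.

```python
from urllib.parse import quote

BASE = "http://example.org/isa/"     # hypothetical namespace
DERIVES_FROM = BASE + "derivesFrom"  # hypothetical predicate


def node_uri(name):
    """Mint a URI for an ISA node name, percent-encoding unsafe characters."""
    return BASE + quote(name)


def to_ntriples(edges):
    """edges: (input node, output node) pairs; each output derives from its input."""
    return "\n".join(f"<{node_uri(out)}> <{DERIVES_FROM}> <{node_uri(inp)}> ."
                     for inp, out in edges)


print(to_ntriples([("H1", "H1.sample1"), ("H1.sample1", "h1-s1.cel")]))
```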
Our next steps... as a community
• RDF export & Visualisation
• Further adoption
[Figure: visual comparison of experimental workflows drawn as chains of protocol steps (SAMP, EX, LABEL, HYB, SCAN, TRANS, then Analysis) for samples from liver, kidney, blood serum and blood plasma (low dose aspirin; x5). A poorly reported workflow shows missing protocols and no information about what was being measured; a well-reported one shows a well-described process from sample to data file.]
Making visual comparisons is straightforward using this approach. The longest path is constructed based on all other known datasets in the pool of workflows being compared.
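Such a comparison can be approximated by merging every workflow in the pool into one reference path and then aligning each workflow against it, marking absent steps. A sketch, assuming the workflows never disagree on step order; the step labels come from the figure, but the merge procedure is an illustration and may differ from the one used to draw it.

```python
def reference_path(workflows):
    """Build one path containing every step seen in the pool, keeping each
    workflow's order (assumes the workflows never disagree on order)."""
    path = []
    for steps in workflows:
        insert_at = 0
        for step in steps:
            if step in path:
                insert_at = path.index(step) + 1
            else:
                path.insert(insert_at, step)
                insert_at += 1
    return path


def align(steps, path):
    """Show each reference step, or '-' where the workflow lacks it."""
    return [s if s in steps else "-" for s in path]


pool = [
    ["SAMP", "EX", "LABEL", "HYB", "SCAN", "TRANS"],  # well-described process
    ["SAMP", "SCAN", "TRANS"],                        # missing protocols
]
ref = reference_path(pool)
for workflow in pool:
    print(" ".join(align(workflow, ref)))
# SAMP EX LABEL HYB SCAN TRANS
# SAMP - - - SCAN TRANS
```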
Questions?
You can email [email protected]
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools
View our website: http://www.isa-tools.org
View our Git repo & contribute: http://github.com/ISA-tools
Thanks for listening...