Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

The ISA Commons: experiences from the field Susanna-Assunta Sansone, PhD Principal Investigator, Team Leader, University of Oxford e-Research Centre, Oxford, UK http://uk.linkedin.com/in/sasansone #biosharing DataCite Summer Meeting DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use June 14, 2012 Copenhagen

Upload: gigascience-bgi-hong-kong

Post on 27-Jan-2015


DESCRIPTION

Susanna-Assunta Sansone's talk at the DataCite Summer meeting in Copenhagen on "The ISA-Commons - experiences from the field", 14th June 2012

TRANSCRIPT

Page 1: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

The ISA Commons: experiences from the field

Susanna-Assunta Sansone, PhD

Principal Investigator, Team Leader, University of Oxford e-Research Centre,

Oxford, UK

http://uk.linkedin.com/in/sasansone #biosharing

DataCite Summer Meeting DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use

June 14, 2012 Copenhagen


Page 2: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

•  Reproducible research
•  annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work;
•  improved data sharing underpins science of the future;
•  but... shared data have little or no value if they are not interpretable and, consequently, reusable

Image from datacite.org

Page 3: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field


Reproducibility

Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295

Page 4: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field


Reproducibility

Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295

Page 5: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field


Reproducibility

Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295

Page 6: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Reproducibility

Page 7: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Across studies and groups

Page 8: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Reproducibility

Page 9: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

NO to ‘data blobs’

YES to verifiable, complete and structured information

Image from datacite.org

Page 10: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Structured description of datasets

•  Capture all salient features of the experimental workflow
•  Make annotation explicit and discoverable
•  Structure the descriptions for consistency, tracking independent variables and dependent variables, using cross references and resolvable identifiers
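The idea of a structured description can be pictured with a small sketch. This is only an illustration under stated assumptions, not the ISA framework's data model: the class names, the `missing_identifiers` helper, the sample values and the `example.org` URIs are all hypothetical. It records, per sample, the independent variables (what was varied) and the dependent variables (what was measured), and flags annotations that lack a resolvable identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Annotation:
    """A human-readable label paired with an identifier for the same concept."""
    label: str
    identifier: str  # hypothetical placeholder URI, not a real ontology term


@dataclass
class SampleDescription:
    """Hypothetical record for one sample in an experimental workflow."""
    name: str
    # Independent variables: what the experimenter varied.
    independent: Dict[str, Annotation] = field(default_factory=dict)
    # Dependent variables: what was measured, mapped to the data file holding it.
    dependent: Dict[str, str] = field(default_factory=dict)


def missing_identifiers(samples: List[SampleDescription]) -> List[Tuple[str, str]]:
    """Return (sample name, variable) pairs whose annotation has no http(s) identifier."""
    return [
        (sample.name, variable)
        for sample in samples
        for variable, annotation in sample.independent.items()
        if not annotation.identifier.startswith(("http://", "https://"))
    ]


samples = [
    SampleDescription(
        name="sample-1",
        independent={"dose": Annotation("low", "https://example.org/term/low-dose")},
        dependent={"gene expression": "sample-1.txt"},
    ),
    SampleDescription(
        name="sample-2",
        independent={"dose": Annotation("high", "")},  # annotation left implicit
        dependent={"gene expression": "sample-2.txt"},
    ),
]

print(missing_identifiers(samples))
```

A check of this kind is one way to make the "explicit and discoverable" requirement testable: an annotation that is only free text is reported rather than silently accepted.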

Page 11: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Not too much, not too little, just ‘right’

•  We must strike a balance between
   •  depth and breadth of information; and
   •  sufficient information required to reuse the data

Page 12: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project


Example of experiments by InnoMed PredTox, an FP6 public-private consortium

Page 13: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Challenges: lack of coordination, fragmentation and uneven coverage

Different communities, different norms and standards, e.g.:
•  report the same core, essential information
•  use the same word and refer to the same ‘thing’
•  allow data to flow from one system to another

Page 14: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Growing number of reporting standards

•  + 130 (Source: MIBBI, EQUATOR): MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …
•  + 150 (Estimated): MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …
•  + 303 (Source: BioPortal): AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …

Page 15: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field


A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation)

Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598

Page 16: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field


A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation)

Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598

Page 17: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Source of the figure: EBI website

Bioscience is not one domain

•  Bioscience is interdisciplinary and integrative in character
   •  need to deal with new and existing datasets
   •  deal with a variety of data types


Page 18: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Is it possible to achieve a common, structured representation of diverse bioscience experiments that:
•  transcends individual bioscience domains, but also
•  follows the appropriate community norms and standards?

Page 19: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

We aim to achieve a common representation of experimental content that transcends individual bioscience domains

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
•  environmental health
•  environmental genomics
•  metabolomics
•  metagenomics
•  nanotechnology
•  proteomics
•  stem cell discovery
•  systems biology
•  transcriptomics
•  toxicogenomics
•  also by communities working to build a library of cellular signatures

Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054

Page 20: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
•  environmental health
•  environmental genomics
•  metabolomics
•  metagenomics
•  nanotechnology
•  proteomics
•  stem cell discovery
•  systems biology
•  transcriptomics
•  toxicogenomics
•  also by communities working to build a library of cellular signatures

Some of the internal projects and some of the public groups/resources [logos; legible names: Nanotechnology Informatics Working Group, Stem Cell Commons]

Page 21: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
•  environmental health
•  environmental genomics
•  metabolomics
•  metagenomics
•  nanotechnology
•  proteomics
•  stem cell discovery
•  systems biology
•  transcriptomics
•  toxicogenomics
•  also by communities working to build a library of cellular signatures

Some of the internal projects and some of the public groups/resources [logos; legible names: Nanotechnology Informatics Working Group, Stem Cell Commons]

Page 22: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Metadata tracking framework, designed to support:
•  the use of several standards checklists and terminologies
•  conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT

(Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG)
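As a rough picture of what converting tabular metadata towards an RDF-style representation involves, here is a minimal sketch. It is not the ISA software's converter and does not read a complete ISA-Tab file: the table contents, column names, the `table_to_triples` function and the `example.org` namespace are assumptions made for illustration. Each non-key cell of a tab-delimited table becomes one (subject, predicate, object) triple.

```python
import csv
import io

# Hypothetical, simplified tab-delimited study table (not a complete ISA-Tab file).
STUDY_TABLE = (
    "Sample Name\tOrganism\tDose\n"
    "sample-1\tMus musculus\tlow\n"
    "sample-2\tMus musculus\thigh\n"
)

BASE = "https://example.org/study/"  # placeholder namespace, assumed for illustration


def table_to_triples(text, key_column="Sample Name"):
    """Turn each non-empty, non-key cell into a (subject, predicate, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        subject = BASE + row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                predicate = BASE + "property/" + column.lower().replace(" ", "_")
                triples.append((subject, predicate, value))
    return triples


for triple in table_to_triples(STUDY_TABLE):
    print(triple)
```

In a real conversion the predicates and, where possible, the values would come from shared terminologies rather than from a local namespace, which is what lets the converted data link to other resources.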

Page 23: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field


empowering researchers to use standards

To mint DOIs

Page 24: Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

www.biosharing.org

www.isacommons.org

TOWARDS INTEROPERABLE BIOSCIENCE DATA

Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.

Feb 2012 www.isacommons.org

doi:10.1038/ng.1054

Development timeline (2007 to 2012)

Timeline tracks: community involvement and uptake; core developments; publications.

Events shown:
•  Strawman ISA-Tab spec
•  1st, 2nd and 3rd ISA-Tab workshops
•  Final ISA-Tab spec
•  Database instance at EBI
•  ISA software v1
•  1st public instance: Harvard Stem Cell Discovery Engine
•  RDF format starts
•  Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
•  User workshops/visits start
•  Growing number of systems starts to adopt ISA-Tab
•  Other tools implement ISA-Tab
•  Links to analysis tools starts

Publications:
•  Workshop reports
•  ‘Omics data sharing (Science)
•  ISA-Tab and ISA software suite (Bioinformatics)
•  Stem Cell Discovery Engine (NAR)
•  ISA Commons (Nature Genetics)