isa-tab proposalisatab.sourceforge.net/docs/isa-tab-specifications_v0… · web viewthis document,...


ISA-TAB Proposal – draft v0.1


This document, part of the <<ISA-TAB_package_v0.1_zip>>, is a working draft authored by an initial group (1). Many issues still need to be resolved. Comments and suggestions should be sent to Philippe Rocca-Serra: [email protected]. A project page and mailing list are being created to support further development.

This work builds on the paradigm developed for MAGE-TAB, a tab-delimited format for exchanging microarray data (2). An in-depth knowledge of the MAGE-TAB specification (3) is required before reading this document.

1. Background
    1.1 Purpose
    1.2 Rationale
    1.3 Definitions
    1.4 ISA-TAB Structure – overview and examples
        1.4.1 Reference file
        1.4.2 Investigation file
        1.4.3 Study file
        1.4.4 Assay file
    1.5 Relations with MAGE-TAB and biomedical tabular formats
    1.6 Minimal content and terminology
2 References
3 ISA-TAB Structure - details
    3.1 Reference file
        3.1.1 Contact Section
        3.1.2 Protocol Section
        3.1.3 Factor Section
        3.1.4 Measurements/Endpoints Section
        3.1.5 Publication Section
        3.1.6 Ontology Source Section
        3.1.7 ISA-TAB Files Section
    3.2 Investigation file
    3.3 Study file
        3.3.1 First Section
        3.3.2 Second Section
    3.4 Assay file
        3.4.1 First Section
        3.4.1 Second Section
            3.4.1.1 Technology Type: DNA microarray
            3.4.1.2 Technology Type: Gel Electrophoresis
            3.4.1.3 Technology Type: Mass Spectrometry
            3.4.1.4 Technology Type: NMR Spectroscopy
4 Annex


1. Background

1.1 Purpose

This document presents the first working draft of the Investigation / Study / Assay (ISA) tab-delimited (TAB) format proposal: a general framework with which to communicate both the metadata (contact details, sample characteristics, technologies used, etc.) and the associated data files from an experiment. ISA-TAB is a superset of MAGE-TAB; see section 1.5. It builds on this existing paradigm and shares the same motivation for the use of spreadsheets.

With this proposal we intend to address the pressing need of a group of collaborative repositories for a common framework for transcriptomics-, proteomics- and metabol/nomics-based experiments (hereafter referred to as ‘omics-based’ experiments). However, it is not our intention to ‘compete’ against XML-based formats, whether existing or under development, such as the Functional Genomics Experiment Markup Language (FuGE-ML, 4, 5). In its final form, ISA-TAB could be seen as a framework to communicate omics-based experiments while the suite of FuGE-ML based modules required to fully describe such experiments is under development. When these modules become available, ISA-TAB could continue to serve those with little or no bioinformatics support, and could also act as a user-friendly presentation layer for the XML-based formats (via an XSL transformation), as demonstrated by the HTML rendering of MAGE-ML documents (6).

1.2 Rationale

The rationale behind this initial work stems from the current requirement for a common framework for omics-based experiments. Such a framework will fulfill the needs of:

The BioMAP project at EBI (7) which will create a common submission framework for ArrayExpress (8), PRIDE (9), and in the near future, a metabolomics repository;

A group of collaborative repositories (10,11); some committed to pipelining omics-based experimental data into EBI public repositories; others willing to exchange data or to enable their user base to import data from public repositories into their local systems.

It is envisaged that the ISA-TAB will also be tested for genomics- and metagenomics-based experiments (12), along with various conventional assays (often associated with omics-based experiments).

1.3 Definitions

Investigation, Study and Assay are the three key entities (10) around which the ISA-TAB framework is built. They assist in structuring and classifying information relevant to both the sample and the different technologies employed. Study is the central unit, containing information on samples, their characteristics and any treatments applied. A Study has associated Assays, which are tests performed either on material taken from the sample or on the whole initial sample, and which produce qualitative or quantitative measurements (data). An Assay can be characterized as the smallest complete unit of experimentation. For example: one hybridization equals one assay; each technical replicate represents an additional assay; one LC-MS run equals one assay; a single clinical chemistry assay is (of course) one assay; a multiplexed (n-plex) microarray equals n assays; and a MALDI MS chip with n spots could perform up to n assays (i.e. when all spots are analyzed). Investigation is a higher-order object that helps to group related Studies.
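The relationship between the three entities can be illustrated with a small data model. This is only a sketch under stated assumptions: the class names mirror the entities defined above, but the attribute names and the helper `multiplexed_assays` are hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """Smallest complete unit of experimentation (e.g. one hybridization)."""
    name: str
    technology: str  # hypothetical attribute, for illustration only


@dataclass
class Study:
    """Central unit: samples, their characteristics and treatments."""
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Optional higher-order object that groups related Studies."""
    title: str
    studies: List[Study] = field(default_factory=list)


def multiplexed_assays(array_name: str, n: int) -> List[Assay]:
    """A multiplexed microarray with n sub-arrays counts as n assays."""
    return [Assay(f"{array_name}.{i}", "DNA microarray") for i in range(1, n + 1)]


study = Study("Fatty liver study")
study.assays += multiplexed_assays("array1", 4)                # 4 assays
study.assays.append(Assay("lcms-run-1", "Mass Spectrometry"))  # 1 more assay
investigation = Investigation("Example investigation", [study])
print(len(study.assays))
```

Counting assays per unit of experimentation, rather than per physical device, is what makes a four-plex array contribute four entries here.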

It should be noted that the word ‘experiment’ has been deliberately avoided. A comparison of ArrayExpress and PRIDE revealed that ‘experiment’ is used to refer to objects at different levels of granularity in each; i.e. to refer to a set of multiple related hybridizations in ArrayExpress, but only a single gel-based separation run in PRIDE. Following the abstractions proposed here, an experiment in ArrayExpress would be equivalent to a Study.

The choice of Study as the central unit of the ISA-TAB proposal is supported by its use in existing biomedical formats, such as the Study Data Tabulation Model (SDTM), which encompasses both the Standard for Exchange of Nonclinical Data (SEND, 13) and the Clinical Data Interchange Standards Consortium (CDISC, 14). SDTM has been endorsed by the US Food and Drug Administration (FDA) as the preferred way to organize, structure and format both clinical and nonclinical (toxicological) data submissions (15, 16). See also section 1.5.


1.4 ISA-TAB Structure – overview and examples

The ISA-TAB uses a number of files to capture the information:

Reference file; Investigation file; Study file; Assay file (with associated data files and other relevant files).

These files are described briefly in the subsections below and more fully in section 3. For submission or transfer, files can be packaged into an ISArchive as shown in Figure 1. Work is ongoing to define a set of rules to regulate the creation of such files.

Currently, example files are provided in <<ISA-TAB_package_v0.1_zip>>, including:

Two examples of a Study with Assays:

o <<ISA_TAB_Gauguier-PMID17618414.xls>>
o <<ISA_TAB_Griffin-PMID17203948.xls>>

One example where the two Studies above are grouped under one Investigation (for example, when they are part of the same collaborative project or when data from each Study are compared):

o <<ISA_TAB_fatty-liver-Investigation.xls>>

1.4.1 Reference file

The Reference File records all declarative information referenced from the other ISA-TAB files. This file covers information not only about contacts, protocols and equipment, but also terminologies (controlled vocabularies or ontologies) and other annotation resources.

1.4.2 Investigation file

The Investigation file is intended to contain only a small amount of information, because its role is simply to group related Studies as appropriate. For this reason, it is optional and only becomes necessary when two or more Study files are created. The need for this flexible solution was clear in several use cases. In the toxicogenomics domain, for example, acute toxicity studies are followed by long-term toxicity studies and in vitro toxicity studies. For clarity, these would be linked to the same Investigation. Another example comes from the environmental genomics domain, where several studies carried out in the same area can be usefully related under the same Investigation. The limitations that stem from the lack of such an object can be seen in ArrayExpress, where related experiments cannot be explicitly linked to one another. MAGE-TAB does have an Investigation Description Format (IDF) file, but there 'Investigation' is used as a synonym for experiment and is therefore equivalent to a single Study.

1.4.3 Study file

The Study file is the central file, containing information on the samples studied, their source(s), the sampling methodology, sample characteristics and any treatments or manipulations performed. We acknowledge that in some cases it can be hard to define which information should belong to the Study and which to the Assay. More explanation is offered in the next section; and a set of rules to regulate the creation of such files is under development, as stated above.

1.4.4 Assay file

The Assay file contains information about a protocol and further information generated through its execution (using material from the sample, or the whole initial sample), including references to data files (whether raw, processed or normalized). In the case of microarray assays, the Array Design Format (ADF) file and the Final Gene Expression Data Matrix (FGDM) are also referenced. As stated previously, it can be hard to determine whether particular sample treatments and manipulations belong in the Study or the Assay file. In general, treatments or manipulations performed immediately prior to executing the assay protocol itself, such as protein or nucleic acid extraction or labeling, should be described in the Assay file.

1.5 Relations with MAGE-TAB and biomedical tabular formats

ISA-TAB is a superset of MAGE-TAB. The paradigm and syntax have been maintained as far as possible, although improvements have been suggested in a few places. However, while the ADF and the FGDM files have been left intact, the others have been refactored, resulting in a more general framework suitable for use with different technologies:

The Reference file contains many of the fields from the IDF component of MAGE-TAB; however, it is not equivalent to that file.

The Investigation in the IDF is used as a synonym for ‘Experiment’ (sensu ArrayExpress).

The content of the Sample and Data Relationship Format (SDRF) file has been divided between the Study and Assay files, with the Study file containing contextualizing information for Assays as described above.

Where omics-based technologies are used in clinical or nonclinical studies, ISA-TAB will complement existing biomedical formats such as the SDTM. It is inevitable that some information will be duplicated between those two frameworks, but this is not generally seen as a major problem. Ultimately a reference system will be needed to link the two files, but we have deliberately left that feature out of this first proposal.

1.6 Minimal content and terminology

Other important issues include deciding on the ‘minimum information’ that the ISA-TAB files should require and the terminology to be used in each field. These issues are beyond the scope of this proposal but are the focus of related efforts. ‘Minimal information’ checklists are under development both by individual communities for their particular domains of interest and collaboratively through the Minimum Information for Biological and Biomedical Investigations (MIBBI, 17) project. The Ontology for Biomedical Investigations (OBI, 18) will provide both ‘universal’ terms that are applicable across various biological and technological domains and ‘domain-specific’ terms that are relevant only to a particular domain.

The terms used here to describe each ISA-TAB field are not necessarily final, having been created primarily for the purpose of explaining the proposal. These requirements will be submitted to the larger OBI community, to be refined and then explicitly defined, and will be presented in future versions.

2 References

1. NET project members: http://www.ebi.ac.uk/net-project
2. Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert CJ Jr, White J, Whetzel PL, Wymore F, Parkinson H, Sarkans U, Ball CA, Brazma A. A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics. 2006 Nov 6;7:489.
3. MAGE-TAB specification: http://www.mged.org/mage-tab
4. Jones AR, Miller M, Aebersold R, Apweiler R, Ball CA, Brazma A, Degreef J, Hardy N, Hermjakob H, Hubbard SJ, Hussey P, Igra M, Jenkins H, Julian RK Jr, Laursen K, Oliver SG, Paton NW, Sansone SA, Sarkans U, Stoeckert CJ Jr, Taylor CF, Whetzel PL, White JA, Spellman P, Pizarro A. The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol. 2007 Oct;25(10):1127-1133.
5. FuGE working group: http://fuge.sourceforge.net
6. HTML rendering of MAGE-ML documents: http://www.ebi.ac.uk/~rocca/MAGE-XSLT/HTML%20rendering%20of%20MAGE.htm
7. BioMAP project: http://www.ebi.ac.uk/net-project/projects.html#biomap
8. ArrayExpress: www.ebi.ac.uk/arrayexpress
9. PRIDE: www.ebi.ac.uk/pride/
10. Reporting Structure for Biological Investigation (RSBI) working group: http://www.mged.org/Workgroups/rsbi/index.html
11. Sansone SA, Rocca-Serra P, Tong W, Fostel J, Morrison N, Jones AR; RSBI Members. A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS. 2006 Summer;10(2):164-71.
12. Genomic Standards Consortium (GSC): http://gensc.org/gsc/gcat/xtr/gsc
13. SEND: http://www.cdisc.org/models/send/v2.3
14. CDISC: http://www.cdisc.org/standards
15. FDA Data Standard Council: http://www.fda.gov/oc/datacouncil
16. Pharmacogenomic Data Submissions - Companion Guidance: http://www.fda.gov/cder/guidance/7735dft.pdf
17. MIBBI: http://mibbi.sourceforge.net
18. OBI: http://obi.sourceforge.net


Figure 1. For submission or transfer, files can be packaged into an ISArchive, as shown in this figure.

[Figure: an ISArchive containing the Reference file, the Investigation file, the Study files (Study 1 and Study 2), the Assay file(s) for each Study, all data files resulting from Assays (raw, processed and normalized data files) and the ADF (for microarray applications only).]

The Investigation file is optional and only required to group two or more related Studies, as in this example.


3 ISA-TAB Structure - details

Each file has a predefined structure, with fields organized on a per-column or per-row basis. These files are described in detail in the subsections below.

3.1 Reference file

In this file the fields are organized on a per-column basis and divided into sections, as described in detail below. Example:

Contacts
Person Last Name | Griffin | Fleckenstein
Person First Name | Julian | Scott
Person Mid Initial
Person Email | [email protected]
Person Phone | 44*(0)*1223766002
Person Fax
Person Address | University of Cambridge | Imperial College London
Person Affiliation | Department of Molecular Biology | Genetics and Genomics Research Institute
Person Role | submitter;investigator | investigator
Person Role Source REF | OBI | OBI

Protocols
Protocol Name | standard procedure 1 | grifin procedure 2
Protocol Type | animal procedure | nucleic_acid_extraction
Protocol Type Term Source REF | OBI
Protocol Description | All animal procedures conformed to Home Office, UK, guidelines for animal welfare. Male Wistar rats (n = 3 for each time point; control animals fed control diet for the same time period; Charles River UK Ltd.) were fed either standard laboratory chow, or chow supplemented with 1% orotic acid (Sigma Aldrich, UK) ad libitum.5-6 Rats were killed by cervical dislocation at days 0, 1, 3 and 14, and the left lateral lobe of the liver excised. Tissues were snap frozen and stored at -80 °C. | Total RNA was extracted by RNA Isolation Kit (Stratagene) from the livers of Wistar rats at day 0 (n = 3), day 1 and 3 (n = 2), and day 14 (n = 3).
Protocol Contact
Protocol Parameter | diet;population density | Extracted Product; Amplification
Protocol Parameter Type Term Source REF

Protocol Instruments
Protocol Instrument Name
Protocol Instrument Component
Protocol Instrument Component Term Source REF
Protocol Instrument Parameter
Protocol Instrument Parameter Term Source REF

Protocol Software
Protocol Software Name
Protocol Software Version
Processing Method Parameter
Processing Method Parameter Term Source REF

Factors
Factor Name | Time | Treatment
Factor Type
Factor Type Term Source REF | OBI | OBI

Measurements/Endpoints
Measurements/Endpoints Name | Gene Expression | Metabolite Characterization
Measurements/Endpoints Term Source REF | OBI | OBI

Publication
PubMed ID | 17203948
Publication DOI | 10.1021/pr0601640
Publication Author list | Griffin JL, Scott J, Nicholson JK.
Publication Title | The influence of pharmacogenetics on fatty liver disease in the wistar and kyoto rats: a combined transcriptomic and metabonomic study.
Publication Status | indexed for MEDLINE

Ontology Source Reference
Term Source Name | CTO | MO
Term Source File | http://obo.sourceforge.net/cgi-bin/detail.cgi?cell | http://mged.sourceforge.net/ontologies/MGEDontology.php
Term Source Version | 1.3.0.1
Term Source Description | The Cell Type Ontology | The Microarray Ontology

ISA-TAB files | study-1.txt;study-1-assay-1-Mx.txt;study-1-assay-2-Tx.txt;study-1-assay-3-ClinChem.txt
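As an illustration of how a tool might read a file laid out like the example above, the following sketch groups tab-delimited rows under their section heading. It rests on assumptions the draft does not state: that each section heading sits alone on its own row, that the heading strings are exactly those listed in `SECTION_HEADINGS`, and that every other row is a field name followed by one value per column. The function name `parse_reference_file` is hypothetical.

```python
import csv
import io

# Assumed section headings, copied from the example; not a normative list.
SECTION_HEADINGS = {
    "Contacts", "Protocols", "Factors", "Measurements/Endpoints",
    "Publication", "Ontology Source Reference",
}


def parse_reference_file(text):
    """Return {section: {field name: [value per column, ...]}}."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank rows
        label = row[0].strip()
        if label in SECTION_HEADINGS:
            current = sections.setdefault(label, {})
        elif current is not None:
            current[label] = [cell.strip() for cell in row[1:]]
    return sections


sample = (
    "Contacts\n"
    "Person Last Name\tGriffin\tFleckenstein\n"
    "Person Role\tsubmitter;investigator\tinvestigator\n"
    "Factors\n"
    "Factor Name\tTime\tTreatment\n"
)
parsed = parse_reference_file(sample)
print(parsed["Contacts"]["Person Last Name"])
```

Keeping one list entry per column preserves the positional link between, say, a person's last name and that person's role.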

3.1.1 Contact Section

Person Last Name
The last name of each person associated with the investigation or study.

Person First Name
The first name of each person associated with the investigation or study.

Person Mid Initials
The middle initials of each person associated with the investigation or study.

Person Email
The email address of each person associated with the investigation or study. Email could be used as an internal identifier to reference persons within the TABs. However, although this would be effective and practical, relying on email to track persons might raise privacy issues.

Person Phone
The telephone number of each person associated with the investigation or study.

Person Fax
The fax number of each person associated with the investigation or study.

Person Address
The street address of each person associated with the investigation or study.

Person Affiliation
The organization affiliation for each person associated with the investigation or study.

Person Roles
The role(s) performed by each person. Terms for this field should come from OBI. Multiple annotations or values attached to one person may be provided by using a semicolon (";") as a separator, for example: "submitter;funder;sponsor".

Person Roles Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.
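The semicolon convention for multi-valued cells such as Person Roles can be handled with a small helper. This is a minimal sketch; the function name `split_multi_value` is hypothetical, and trimming whitespace and dropping empty entries are assumptions rather than rules stated in the draft.

```python
def split_multi_value(cell):
    """Split a semicolon-separated cell into its individual values."""
    return [part.strip() for part in cell.split(";") if part.strip()]


roles = split_multi_value("submitter;funder;sponsor")
print(roles)
```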

3.1.2 Protocol Section

Protocol Name
The names of the protocols used within the ISA-TAB document. These will be referenced in the Study and Assay files in the "Protocol REF" columns. Used as an identifier within the ISA-TAB document, these can also be accession values. In such cases, decisions about how to handle the protocol information are left to data curators and tool implementers. For instance, an importer tool could be designed so that, when an existing protocol is referenced by a public accession, only those fields that are non-empty in the ISA-TAB are updated in the target repository.

Protocol Type
The type of the protocol. Terms for this field should come from OBI.

Protocol Type Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

Protocol Description
A free-text description of the protocol. This text is included in a single tab-delimited field. If you wish to include tab or newline characters as part of this text, you must enclose the whole text within double quotes (").

Protocol Contact
If used, the contact should be declared in the Contact section and referenced here.

Protocol Parameters
A semicolon-delimited list of parameter names; these names are used in the Study and Assay files (as "Parameter Value [<parameter name>]" headings) to list the values used for each protocol parameter. If more than one parameter was used for a given protocol, they should be separated with semicolons (";"). Used as an identifier within the ISA-TAB document. Terms for this field should come from OBI.

Protocol Parameter Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

Protocol Instrument Name
The instrument used by the protocol.

Protocol Instrument Component
Indicates a key part of an instrument set-up. Terms for this field should come from OBI (or PSI-MS or MSI_NMR).

Protocol Instrument Component Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

Protocol Instrument Parameter
Indicates an important parameter attached to the instrument. Terms for this field should come from OBI.

Protocol Instrument Parameter Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

Protocol Software Name
The name of the software used in a protocol.

Protocol Software Version
The version of the software used in a protocol.
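Two rules in this section lend themselves to a short illustration: the double-quote rule for Protocol Description and the derivation of "Parameter Value [<parameter name>]" headings from Protocol Parameters. The sketch below uses Python's standard `csv` module with a tab delimiter, which quotes a field only when it contains a tab, a newline or a quote character. The sample description text and the helper name `parameter_headings` are illustrative assumptions.

```python
import csv
import io

# A description containing both a newline and a tab (illustrative text).
description = "Tissues were snap frozen.\nStored at -80 °C.\tSee notes."

# Write one row; the description is enclosed in double quotes automatically.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter="\t", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["Protocol Description", description])
serialized = buffer.getvalue()

# Read it back; the quoted field is restored as a single cell.
rows = list(csv.reader(io.StringIO(serialized, newline=""),
                       delimiter="\t", quotechar='"'))


def parameter_headings(protocol_parameters):
    """Build Study/Assay column headings from a Protocol Parameters cell."""
    names = [n.strip() for n in protocol_parameters.split(";") if n.strip()]
    return [f"Parameter Value [{name}]" for name in names]


print(rows[0][1] == description)
print(parameter_headings("diet;population density"))
```

Relying on a CSV library rather than splitting on tab characters by hand is what keeps an embedded tab or newline from breaking the row structure.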

3.1.3 Factor Section

Factor Name
The names of the Factors used in the Study and/or Assay files. A Factor corresponds to an independent variable that experimentalists manipulate in order to perturb a biological system; an assay is then devised to measure the system's response to the perturbation by following a response variable (also known as a dependent variable).

Factor Type
The factor type, supplied so that Factors can be classified into categories. Terms for this field should come from OBI.

Factor Type Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

3.1.4 Measurements/Endpoints Section

Measurements/Endpoint Name
Declares and lists all the response variables (also known as dependent variables) that will be assessed or quantified, e.g. gene expression, protein expression and clinical chemistry endpoints. Terms for this field should come from OBI.

Measurements/Endpoint Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

Note that the declaration of measurement variables could be made in this Section OR at the level of the Assay Section. This is open to discussion.

3.1.5 Publication Section

PubMed ID
The PubMed IDs of the publication(s) associated with this investigation (where available).

Publication DOI
A Digital Object Identifier (DOI) for each publication (where available).

Publication Author List
The list of authors associated with each publication.

Publication Title
The title of each publication associated with the investigation.

Publication Status
A term describing the status of each publication (e.g. "submitted", "in preparation", "published"). Terms for this field should come from OBI.

Publication Status Term Source REF
The Source REF must match one of the Term Source Names declared in the annotation section, described below.

3.1.6 Ontology Source Section

This section is also taken from the MAGE-TAB specification. Note that not all sources of terms are ontologies; they also include controlled vocabularies. We recommend using the ontologies posted under the OBO Foundry to maximize interoperability of the resources.

Term Source Name
The names of the Term Sources (ontologies or databases) used within the ISA-TAB document. This name will be used in all corresponding "Term Source REF" fields. Examples: OBI, GO, DO. Used as an identifier within the ISA-TAB document.

Term Source Namespace Location
A file name or valid pointer to an official resource, to allow cross-validation and version tracking of the terms used in a submission.

Term Source Version
The version of the Term Source used throughout the ISA-TAB document.
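The requirement that every "Term Source REF" value match a declared Term Source Name is straightforward to check mechanically. The following is a minimal sketch, assuming the declared names and the referenced values have already been collected into lists; the function name `undeclared_term_sources` is hypothetical, and treating empty cells as "no reference" is an assumption.

```python
def undeclared_term_sources(declared_names, term_source_refs):
    """Return the referenced term sources that were never declared, sorted."""
    declared = {name.strip() for name in declared_names if name.strip()}
    referenced = {ref.strip() for ref in term_source_refs if ref.strip()}
    return sorted(referenced - declared)


# Values taken from the Reference file example: CTO and MO are declared,
# while several Term Source REF cells use OBI.
problems = undeclared_term_sources(["CTO", "MO"], ["OBI", "OBI", "MO", ""])
print(problems)
```

Run against the Reference file example in section 3.1, a check of this kind would report OBI, since the example declares only CTO and MO.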

3.1.7 ISA-TAB Files Section

A field listing all tab-separated files that make up the ISA-TAB. The purpose of this listing is to allow validation of the archive. For each study, provide a semicolon (;) separated list of file names.
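One way such a listing could support validation is to compare it with the files actually present in the archive. The sketch below assumes the archive contents are available as a plain list of file names; how an ISArchive is physically packaged is not defined in this draft, and the function name `missing_files` is hypothetical.

```python
def missing_files(isa_tab_files_field, archive_members):
    """Return listed file names that are absent from the archive."""
    listed = [name.strip() for name in isa_tab_files_field.split(";") if name.strip()]
    present = set(archive_members)
    return [name for name in listed if name not in present]


field_value = ("study-1.txt;study-1-assay-1-Mx.txt;"
               "study-1-assay-2-Tx.txt;study-1-assay-3-ClinChem.txt")
members = ["study-1.txt", "study-1-assay-1-Mx.txt", "study-1-assay-2-Tx.txt"]
absent = missing_files(field_value, members)
print(absent)
```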


3.2 Investigation file

This file has fields organized to report information on a per-column basis. Since it is an optional component, the Investigation File has a very simple, lightweight structure, as detailed below.

Investigation Title
A concise name given to the investigation.

Investigation Description
A textual description of the investigation.

Date of Investigation Submission
The date on which the investigation was reported to the repository.

Investigation Public Release Date
The date on which the investigation should be released publicly.

Investigation Contact
The contact should be declared in the Contact section of the Reference file and referenced here.

PubMed ID REF
The PubMed IDs of the publication(s) associated with this investigation (where available).

Publication DOI REF
A Digital Object Identifier (DOI) for each publication (where available).

Study File Names
The name of the Study file component.

Assay File Names
The names of the Assay file component(s).


3.3 Study file

In this file the fields are grouped into two sections. In the First Section the fields are organized on a per-column basis, and in the Second Section on a per-row basis. Example:

Study Identifier | Generated by database only, or temporarily supplied by users
Study Title | The Influence of Pharmacogenetics on Fatty Liver Disease in the Wistar and Kyoto Rats: A Combined Transcriptomic and Metabonomic Study
Study Description | Analysis of liver tissue from rats exposed to orotic acid for 1, 3, and 14 days was performed by DNA microarrays and high resolution 1H NMR spectroscopy based metabonomics of both tissue extracts and intact tissue (n = 3).
Study Design | time course design
Study Design Term Source REF | OBI
Contact | Jules Grifin
PubMed ID REF | 123434
Publication DOI REF | 10.1021/pr0601640
Date of Study Submission | DD/MM/YYYY
Study Public Release Date | 13/12/2006

Source Name | Characteristics[Material Type] | Term Source REF | Characteristics[Organism]
Study1.animal1 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal2 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal3 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal4 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal5 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal6 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal7 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal8 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal9 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal10 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal11 | whole_organism | MO | {Term Name} Rattus norvegicus
Study1.animal12 | whole_organism | MO | {Term Name} Rattus norvegicus

3.3.1 First Section

Study Identifier
A unique identifier: either a temporary identifier supplied by users or one generated by the repository or database. It could be (but is not necessarily) an identifier complying with the LSID specifications.

Study Title
A concise phrase used to encapsulate the purpose and goal of the study.

Study Description
A textual description of the study, including sections such as objectives or goals.

Study Design
A controlled term allowing classification of the study. Terms for this field should come from OBI.

Study Design Term Source REF
The value has to match one of the Term Source Names declared in the annotation section of the Reference file.


Contact
The contact should be declared in the Contact section of the Reference file and referenced here.

PubMed ID REF
The PubMed IDs of the publication(s) associated with this study (where available).

Publication DOI REF
A Digital Object Identifier (DOI) for each publication (where available).

Study Public Release Date
The date on which the study should be released publicly.

3.3.2 Second Section

Source Name
Sources are considered as the starting biological material used in a study. Source items can be qualified using the following headers: Characteristics [], Term Source REF, Unit, Provider, Description, and Comment [].

Sample Name
Samples represent major outputs resulting from a protocol application but which cannot be treated as an Extract or a Labeled Extract. Sample items can be qualified using the following headers: Characteristics [], Term Source REF, Unit, and Comment [].

Characteristics [<category term>]
Used as a qualifying field following Source Name or Sample Name. This column contains terms describing each material according to the characteristics category indicated in the column header. For example, a column headed "Characteristics [OrganismPart]" would contain individual OrganismPart terms. These terms may be user-defined (the default), from an external ontology source (indicated using a Term Source REF column), or a measurement (indicated using a Unit [] column).

Protocol REF
This column contains references to Protocol Names defined in the Reference file, or accession numbers of protocols already present in public repositories, such as ArrayExpress or Pride. Valid qualifying headers for Protocol REF item are Parameter [Value], Performer, Date, and Comment [] (which is optional).

Factor Value [<factor name>]
The Factor Value column is key to providing the actual values of the independent variables manipulated by the experimentalists. Those study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be repeated between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. Finally, a Factor Value should match either a biomaterial Characteristics value or a Protocol Parameter.
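The bracketed column headers described in this section lend themselves to a simple consistency check. The sketch below is illustrative only: it assumes headers follow the "Kind [term]" pattern used in the text, and the factor names "dose" and "time" as well as the helper names `parse_header` and `undeclared_factors` are hypothetical.

```python
import re

# Header kinds that carry a bracketed term, as listed in this section.
BRACKETED = re.compile(
    r"^(Characteristics|Factor Value|Parameter|Comment|Unit)\s*\[(.*)\]$"
)


def parse_header(header):
    """Return (kind, term) for a bracketed header, else (header, None)."""
    match = BRACKETED.match(header.strip())
    if match:
        return match.group(1), match.group(2).strip()
    return header.strip(), None


def undeclared_factors(headers, declared_factor_names):
    """List Factor Value terms that were not declared in the Reference file."""
    declared = set(declared_factor_names)
    return [
        term
        for kind, term in map(parse_header, headers)
        if kind == "Factor Value" and term not in declared
    ]


# Hypothetical Second Section header row.
headers = [
    "Source Name",
    "Characteristics [OrganismPart]",
    "Term Source REF",
    "Protocol REF",
    "Factor Value [dose]",
    "Unit",
    "Factor Value [time]",
]

print(parse_header(headers[1]))               # ('Characteristics', 'OrganismPart')
print(undeclared_factors(headers, ["dose"]))  # ['time']
```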


3.4 Assay file

In this file the fields are grouped in two sections. In the First Section the fields are organized on a per-column basis, and in the Second Section on a per-row basis.

[Example: embedded Assay file spreadsheet]

3.4.1 First Section

Study Identifier
This allows cross-referencing the Study file to ensure efficient information tracking.

Assay Measurement/Endpoint Type
This field qualifies the endpoint, i.e. what is being measured, e.g. gene expression, protein expression, methylation status, hepatic function, or DNA damage. Terms for this field should come from OBI.

Assay Measurement/Endpoint Term Source REF
The value has to match one of the Term Source Names declared in the annotation section of the Reference file.

Technology Type
Describes the kind of technology used to perform an assay. Examples are DNA microarray hybridization, which can be used to monitor gene expression or genotype, and Mass Spectrometry, which can be used to perform protein identification or metabolite profiling. Terms for this field should come from OBI.

Technology Type Term Source REF
The value has to match one of the Term Source Names declared in the annotation section of the Reference file.

Contacts
The contact should be declared in the Contact section of the Reference file and referenced here.
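The requirement that each Term Source REF value matches a declared Term Source Name can be checked mechanically. The following Python sketch assumes the First Section has already been read into a mapping of field name to value; the function name `check_term_source_refs`, the sample values, and the undeclared source "EFO" are hypothetical.

```python
def check_term_source_refs(fields, declared_sources):
    """Return (field, value) pairs whose Term Source REF is not declared."""
    declared = set(declared_sources)
    problems = []
    for name, value in fields.items():
        # Only fields named "... Term Source REF" are subject to the rule.
        if name.endswith("Term Source REF") and value not in declared:
            problems.append((name, value))
    return problems


# Term Source Names as they might be declared in the Reference file.
declared_sources = ["OBI", "MO"]

# Hypothetical Assay file First Section.
assay_first_section = {
    "Study Identifier": "Study1",
    "Assay Measurement/Endpoint Type": "gene expression",
    "Assay Measurement/Endpoint Term Source REF": "OBI",
    "Technology Type": "DNA microarray",
    "Technology Type Term Source REF": "EFO",
}

print(check_term_source_refs(assay_first_section, declared_sources))
# [('Technology Type Term Source REF', 'EFO')]
```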

3.4.1 Second Section

This section depends on the Assay Measurement/Endpoint Type and Technology Type fields. The following subsections provide a list of fields focusing on microarray, gel electrophoresis and mass spectrometry. However, additional work is required to complete this section and to define the format of the resulting data matrices for the different technologies, as has been done for microarrays (FGDM in MAGE-TAB) and for peptide and protein identification.

3.4.1.1 Technology Type: DNA microarray

When dealing with DNA microarray technology, the allowed fields essentially correspond to those defined by the MAGE-TAB format (3).

Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, and Comment [].

Labeled Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract material. Valid qualifying headers for Labeled Extract item are Label (which is mandatory) and Characteristics [], Material Type, Description, and Comment [] (which are optional).

Material Type
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, total_RNA. A valid qualifying header for Material Type item is Term Source REF. Terms for this field should come from OBI (for ArrayExpress submissions this term should be an instance of LabelCompound from the MGED Ontology).

Label
Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: Cy3, Cy5, biotin, alexa_546. A valid qualifying header for Label item is Term Source REF. Terms for this field should come from OBI (for ArrayExpress submissions this term should be an instance of LabelCompound from the MGED Ontology).

Hybridization Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Hybridization. Valid qualifying headers for Hybridization Name item are Array Design REF (which is mandatory) and Comment [] (which is optional).

Array Design REF
This column contains references to the array design used for individual hybridizations. For ArrayExpress submissions this should be a valid accession number, e.g. "A-AFFY-33", but for the purpose of data exchange it should be an unambiguous name, such as the commercial name HG-U133A-2 in the case of an Affymetrix array. A valid qualifying header for Array Design REF item is Comment [] (which is optional). The values in this field are used as identifiers. They must match the references provided in the array description file (ADF). Submitters, curators and software tools should consider the use of public accessions for this value.

Scan Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Scan event. Valid qualifying headers for Scan Name item are Description and Comment [] (which are optional).

Image File
This optional column contains a list of image files, one for each row of the Assay file, linking these image files to their respective hybridizations. Note that ArrayExpress does not store image data due to size constraints on the database. However, in the context of an infrastructure intending to use the format for data exchange purposes, this column is valuable for including links to image files stored on a local web server. A valid qualifying header for Image File item is Comment [] (which is optional).

Normalization Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Normalization event. Valid qualifying headers for Normalization Name item are Description and Comment [] (which are optional).

Array Data File
This column contains a list of raw data files, one for each row of the Assay file, linking these data files to their respective hybridizations. A valid qualifying header for Array Data File item is Comment [] (which is optional).

Derived Array Data File
This column contains a list of processed data files, one for each row of the Assay file, linking these data files to their respective hybridizations. A valid qualifying header for Derived Array Data File item is Comment [] (which is optional).

Array Data Matrix File
This column contains a list of raw data matrix files, where data from multiple hybridizations is stored in a single file, and the data is mapped to individual hybridizations via the Data Matrix format itself. A valid qualifying header for Array Data Matrix File item is Comment [] (which is optional).

Derived Array Data Matrix File
This column contains a list of processed data matrix files, where data from multiple hybridizations is stored in a single file, and the data is mapped to each hybridization (or scan, or normalization) via the Data Matrix format itself. A valid qualifying header for Derived Array Data Matrix File item is Comment [] (which is optional).

Factor Value [<factor name>]
The Factor Value column is key to providing the actual values of the independent variables manipulated by the experimentalists. Those study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be repeated between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.
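The mandatory qualifying headers named for the DNA microarray fields (Label for Labeled Extract Name, Array Design REF for Hybridization Name) can be verified from the header row alone. The sketch below rests on an assumption the draft does not spell out: qualifying headers are taken to be the columns that follow an item column, up to the next item column. The names `ITEM_COLUMNS`, `MANDATORY`, `qualifier_blocks` and `missing_mandatory` are hypothetical.

```python
# Item columns described for the DNA microarray Assay file in this section.
ITEM_COLUMNS = {
    "Extract Name",
    "Labeled Extract Name",
    "Hybridization Name",
    "Scan Name",
    "Normalization Name",
    "Image File",
    "Array Data File",
    "Derived Array Data File",
    "Array Data Matrix File",
    "Derived Array Data Matrix File",
}

# Qualifying headers that the text marks as mandatory.
MANDATORY = {
    "Labeled Extract Name": ["Label"],
    "Hybridization Name": ["Array Design REF"],
}


def qualifier_blocks(headers):
    """Group headers as (item column, [qualifying headers that follow it])."""
    blocks = []
    for header in headers:
        if header in ITEM_COLUMNS:
            blocks.append((header, []))
        elif blocks:
            blocks[-1][1].append(header)
    return blocks


def missing_mandatory(headers):
    """Return (item, qualifier) pairs for absent mandatory qualifiers."""
    return [
        (item, qualifier)
        for item, qualifiers in qualifier_blocks(headers)
        for qualifier in MANDATORY.get(item, [])
        if qualifier not in qualifiers
    ]


# Hypothetical header row in which Array Design REF was forgotten.
headers = [
    "Extract Name", "Material Type",
    "Labeled Extract Name", "Label",
    "Hybridization Name", "Comment [batch]",
    "Array Data File",
]

print(missing_mandatory(headers))  # [('Hybridization Name', 'Array Design REF')]
```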

3.4.1.2 Technology Type: Gel Electrophoresis

Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, and Comment [].

Labeled Extract Name (where relevant)
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract. Valid qualifying headers for Labeled Extract item are Label (which is mandatory) and Characteristics [], Material Type, Description, and Comment [].

Material Type
Used as an attribute column following Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, fraction.

Label
Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: Cy2, Cy3, Cy5. A valid qualifying header for Label item is Comment [].

Electrophoresis Gel Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each electrophoresis gel. Valid qualifying headers for Electrophoresis Gel Name item are Protocol, Parameter [], and Comment [] (which is optional). For specific 2D applications, the following headers can be used instead:
[First Dimension Gel=Isoelectrofocusing]
[Second Dimension Gel=SDS-PAGE]

Scan Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Scan event. Valid qualifying headers for Scan Name item are Protocol, Parameter [], Performer, Date, and Comment [] (which is optional).

Normalization Name
Used as an identifier within the ISA-TAB document. This optional column contains user-defined names for each Normalization event. Valid qualifying headers for Normalization Name item are Protocol, Parameter [], Performer, Date, and Comment [] (which is optional).

Image File
This optional column contains a list of image files, one for each row of the Assay file, linking these image files to their respective electrophoresis events.

Raw Data File
This column contains a list of raw data files, one for each row of the Assay file, linking these data files to their respective gels. A valid qualifying header for Raw Data File item is Comment [] (which is optional).

Processed Data File
This column contains a list of processed data files, one for each row of the Assay file, linking these data files to their respective gel runs. A valid qualifying header for Processed Data File item is Comment [] (which is optional).

Spot Picking File
This column contains the names of files holding protein spot coordinates and metadata for use by spot picking instruments, typically for downstream analysis by mass spectrometry.

Factor Value [<factor name>]
The Factor Value column is key to providing the actual values of the independent variables manipulated by the experimentalists. Those study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be repeated between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.


3.4.1.3 Technology Type: Mass Spectrometry

Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, and Comment [].

Labeled Extract Name (where relevant)
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Labeled Extract material. Valid qualifying headers for Labeled Extract item are Label (which is mandatory) and Characteristics [], Material Type, Description, and Comment [] (which are optional).

Material Type
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: column fraction, gel excised spot.

Label
Used as an attribute column following Labeled Extract Name to indicate the compound linked to an Extract to create the Labeled Extract. Examples: P33, C14. Valid qualifying headers for Label item include Term Source REF.

Mass Spectrometry Run Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Assay. Valid qualifying headers are Protocol, Parameter [], Performer, Date, and Comment [] (which is optional). The following columns can be used to annotate Mass Spectrometry Run Name columns:

Analyzer type
A field to report additional information about the analyzer. Terms should come from a controlled terminology. A valid qualifying header is Term Source REF.

Detector
A field to report additional information about the detector. Terms should come from a controlled terminology. A valid qualifying header is Term Source REF.

Raw Spectral Data File
This column contains a list of raw data files, one for each row of the Assay file.

Processed Spectral Data File
This column contains a list of processed data files, one for each row of the Assay file.

Normalized Spectral Data File
This column contains a list of normalized data files, one for each row of the Assay file.

When Mass Spectrometry is used in proteomics, the following information will be required.

Peptides File
This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8).

Protein File
This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8).

PTMs File
This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8).

PTM Codes File
This data file should be formatted following the PSI-MS specifications and according to Pride submission requirements (8).

Factor Value [<factor name>]
The Factor Value column is key to providing the actual values of the independent variables manipulated by the experimentalists. Those study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be repeated between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.

When Mass Spectrometry is used in metabol/nomics a list of requirements is under development.


3.4.1.4 Technology Type: NMR Spectroscopy

Extract Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each Extract material. Valid qualifying headers for Extract item are Characteristics [], Material Type, Description, and Comment [].

Material Type
Used as an attribute column following Source Name, Sample Name, Extract Name, or Labeled Extract Name. This column contains terms describing the type of each material. Examples: whole_organism, organism_part, cell, protein_extract. A valid qualifying header for Material Type item is Term Source REF.

NMR Run Name
Used as an identifier within the ISA-TAB document. This column contains user-defined names for each NMR run. The following columns can be used to annotate NMR Run Name columns:

Instrument
This column contains a reference to instruments declared in the protocol section.

Free Induction Decay Data File
This column contains a list of raw data files, one for each row of the Assay file. A valid qualifying header for Free Induction Decay Data File item is Comment [] (which is optional). Refer to the Annex for a list of data formats generated by the most commonly used NMR instruments.

Acquisition Parameter Data File [NMR pulse sequence]
This column contains a list of files detailing acquisition parameters. In particular, a file containing the NMR pulse sequence must be provided. Refer to the Annex for a list of data formats generated by the most commonly used NMR instruments.

Processed Spectral Data File
This column contains a list of processed spectral data files, one for each row of the Assay file. A valid qualifying header for Processed Spectral Data File item is Comment [] (which is optional).

Normalized Spectral Data File
This column contains a list of normalized spectral data files, one for each row of the Assay file. A valid qualifying header for Normalized Spectral Data File item is Comment [] (which is optional).

Factor Value [<factor name>]
The Factor Value column is key to providing the actual values of the independent variables manipulated by the experimentalists. Those study variables, also known as Factors, should have been declared in the Reference file, and the Factor Name should be repeated between the square brackets of the column header. To fully qualify the Factor Values, the following headers can be used to refine annotation and description: Term Source REF, Unit, Unit Term Source REF. The latter two fields enable full description of numerical values. For usability purposes, Factor Values declared in the Study file could be cascaded to the Assay file to facilitate association between data files and Factor Values.

When NMR Spectroscopy is used in metabol/nomics, the requirements for the final list of metabolites (identification and quantification) are under development.


4 Annex

Partial list of NMR instrument output formats that would be valid for reporting NMR spectroscopy raw data (FID) and acquisition metadata.

Vendor Data Format      Application  Spectrum File                    Required Parameter Files      Optional Parameter Files
Bruker UXNMR/XWIN-NMR   1D NMR       fid, 1r                          acqus, procs                  title, intrng
Bruker UXNMR/XWIN-NMR   2D NMR       ser, 2rr                         acqus, acqu2s, procs, proc2s  title
JCAMP-DX                1D NMR       *.dx; *.jdx                      -                             -
JCAMP-DX                2D NMR       *.dx; *.jdx                      -                             -
JEOL EX/GX              1D NMR       *.gxd                            *.gxp                         -
JEOL AL95               2D NMR       *.als                            -                             -
JEOL Alpha              2D NMR       *.nmfid, *.nmf, *.nmdata, *.nmd  -                             -
Varian FDF              1D NMR       *.fdf                            procpar                       -
Varian FDF              2D NMR       fid0001.fdf                      procpar                       -
Varian VNMR peaks       2D NMR       *.txt                            -                             -
Varian VNMR             1D NMR       fid, data, phasefile             procpar                       text
Varian VNMR             2D NMR       fid, phasefile                   procpar                       text
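The Annex table can serve as a lookup when checking a submission for missing acquisition metadata. The Python sketch below transcribes only the rows whose required parameter files are literal file names, and it assumes the submitted files are available as a flat list of names (real instrument output may nest them in subdirectories). The names `REQUIRED_PARAMETER_FILES` and `missing_parameter_files` are hypothetical.

```python
# Required parameter files transcribed from the Annex table
# (only rows with literal file names are included).
REQUIRED_PARAMETER_FILES = {
    ("Bruker UXNMR/XWIN-NMR", "1D NMR"): ["acqus", "procs"],
    ("Bruker UXNMR/XWIN-NMR", "2D NMR"): ["acqus", "acqu2s", "procs", "proc2s"],
    ("Varian FDF", "1D NMR"): ["procpar"],
    ("Varian FDF", "2D NMR"): ["procpar"],
    ("Varian VNMR", "1D NMR"): ["procpar"],
    ("Varian VNMR", "2D NMR"): ["procpar"],
}


def missing_parameter_files(data_format, application, file_names):
    """Return required parameter files absent from `file_names`.

    Formats not listed in the lookup are treated as having no required
    parameter files, which is an assumption of this sketch.
    """
    required = REQUIRED_PARAMETER_FILES.get((data_format, application), [])
    present = set(file_names)
    return [name for name in required if name not in present]


# Hypothetical file listing for a Bruker 2D NMR data set.
submitted = ["ser", "acqus", "procs", "title"]
print(missing_parameter_files("Bruker UXNMR/XWIN-NMR", "2D NMR", submitted))
# ['acqu2s', 'proc2s']
```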
