Summary of Data Dissemination Working Group
22 Jan 2016
Overview
• Review of Data Dissemination Working Group
– Strategy for data dissemination
• Testing of model and submission process
– Systems Biology Centers test submissions
– Current work converting IRD/ViPR to SysBio v2.0
DDWG Background and objective
• DDWG started in fall 2014
• Tasked with developing a data dissemination strategy for all five systems biology centers
• Key issues:
– What types of data should be disseminated?
– Where should the data go?
– How should the metadata be represented?
Projected data types for dissemination (original list order)
# | Experiment type | Analyte | Methodology | Center responses (FluDyNeMo, Flu-OMICS, MaHPIC, Omics-4TB, OMICS-LHV; blank cells omitted) | # of SysBio centers | Currently supported? | Data archives | Dissemination priority
1 | OMICS Type | mRNA (transcriptome) | microarray | No, No, No, Yes, Yes | 2 | Y | GEO & BRC | 1
2 | OMICS Type | miRNA | microarray | No, No, No, Yes, Yes | 2 | Y | GEO & BRC | 1
3 | OMICS Type | mRNA (transcriptome) | RNA-seq | Yes, Yes, Yes, Yes, Yes | 5 | N | | 1
4 | OMICS Type | miRNA | RNA-seq | Yes, No, Yes | 2 | N | | 1
5 | OMICS Type | microbial RNA (metatranscriptome) | RNA-seq | Yes, No, No, No | 1 | N | | 3
6 | OMICS Type | influenza metagenome | RNA-seq | Yes, No, No, No | 1 | N | | 3
7 | OMICS Type | bacterial 16S profiling | targeted sequencing | Yes, No, No, No | 1 | N | | 3
8 | OMICS Type | mRNA (transcriptome) | Microfluidic multiplex qRT-PCR | No, Yes, Yes, No | 2 | N | | 2
9 | OMICS Type | protein-DNA interactions | ChIP-seq | No, Yes, No, Yes, Yes | 3 | N | | 2
10 | OMICS Type | open chromatin | FAIRE-seq | No, No, No, No, Yes | 1 | N | | 2
11 | OMICS Type | DNA methylation | | No, Yes, No, No, Yes | 2 | N | | 2
12 | OMICS Type | protein (proteome) | mass spectrometry | No, Yes, Yes, Yes, Yes | 4 | Y | PeptideAtlas & BRC | 1
13 | OMICS Type | phosphoproteins (phosphoproteome) | mass spectrometry | No, Yes, Yes, Yes, Yes | 4 | N | | 1
14 | OMICS Type | metabolites (metabolome) | mass spectrometry | No, Yes, Yes, Yes, Yes | 4 | Y | Metabolites & BRC | 1
15 | OMICS Type | lipids (lipidome) | mass spectrometry | No, Yes, Yes, Yes, Yes | 4 | Y | BRC | 1
16 | OMICS Type | protein-protein interactions | yeast two hybrid | No, No, No, No, No | 0 | N | | 4
17 | OMICS Type | protein-protein interactions | co-immunoprecipitation | No, Yes, No, No, Yes | 2 | N | | 2
18 | Phenotypic | Weight | | Yes, Yes, Yes, No, Yes | 4 | N | | 1
19 | Phenotypic | Body Temperature | | No, No, Yes, No, No | 1 | N | | 3
20 | Phenotypic | Virus Titers | plaque assay | Yes, Yes, No, No, Yes | 3 | N | | 2
21 | Phenotypic | Virus genomic RNA levels | qPCR | No, No, No, No, Yes | 1 | N | | 3
22 | Phenotypic | Virus mRNA levels | qPCR | No, No, No, No, Yes | 1 | N | | 3
23 | Phenotypic | Hematology | (??) CBC (manual & automated) | No, No, Yes, No, No | 1 | N | | 3
24 | Phenotypic | Lung Function | (??) | No, No, No?, No, No | 0 | N | | 4
25 | Phenotypic | Clinical Score | Direct Observation | Yes, No, No?, No, Yes | 2 | N | | 2
26 | Phenotypic | tissue architecture | histology with H&E stain | Yes, Yes?, Yes?, Yes, Yes | 5 | N | | 1
27 | Phenotypic | protein tissue expression | immunohistochemistry | Yes, No, Yes?, No, Yes | 3 | N | | 2
28 | Phenotypic | serum antibody | ELISA | Yes, No, No, Yes, No | 2 | N | | 2
29 | Phenotypic | cellular cytotoxicity | CellTiter-Glo (Promega) | No, No, No, No, Yes | 1 | N | | 3
30 | Phenotypic | cytokine protein levels | cytokine bead arrays | Yes, No, Yes, Yes, Yes | 4 | N | | 1
31 | Phenotypic | cytokine protein levels | ELISA | Yes, No, Yes?, No, Yes | 3 | N | | 2
32 | Phenotypic | cytokine protein levels | Bioplex assay | Yes, No, No, No, Yes | 2 | N | | 2
33 | Phenotypic | cytokine protein secretion | ELISPOT | Yes, No, No, No | 1 | N | | 3
34 | Phenotypic | parasitemia | thin and thick smear slides | No, No, Yes, No, No | 1 | N | | 3
35 | Phenotypic | (MPSS) Macaque Physiological Scoring System | [numeric value 0-16] | No, No, Yes, No, No | 1 | N | | 3
36 | Phenotypic | serum chemical levels | iSTAT chem profile | No, No, Yes, No, No | 1 | N | | 3
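The "# of SysBio centers" value in each row appears to be a tally of the "Yes" responses across the five centers, with uncertain entries such as "Yes?" counted as yes (row 26 lists 5) and "No?" counted as no (row 24 lists 0). A minimal Python sketch of that tally, assuming this reading of the table; the function name is hypothetical and not part of any DDWG tooling:

```python
def count_supporting_centers(responses):
    """Count responses that begin with "yes", so "Yes?" counts and "No?" does not."""
    return sum(1 for r in responses if r.strip().lower().startswith("yes"))


# Row 1 (mRNA microarray): two centers answered Yes.
row_1 = ["No", "No", "No", "Yes", "Yes"]
# Row 26 (histology with H&E stain): uncertain "Yes?" entries still count.
row_26 = ["Yes", "Yes?", "Yes?", "Yes", "Yes"]
# Row 24 (lung function): "No?" is treated as No.
row_24 = ["No", "No", "No?", "No", "No"]

print(count_supporting_centers(row_1))   # 2
print(count_supporting_centers(row_26))  # 5
print(count_supporting_centers(row_24))  # 0
```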
Leveraging public archives to store raw and processed data
• Primary “omics” type data and unstructured metadata to public archives
– GEO / SRA / ArrayExpress
– PeptideAtlas / Metabolites / MassIVE
• Derived “omics” data and structured metadata to BRCs
• Phenotypic data
– If no archive exists, BRC will accept data
• Where possible, SysBio metadata standards should be used
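The routing rules on this slide can be sketched as a small decision function. This is an illustrative Python sketch only: the enum, the function name, and the destination strings are hypothetical, and sending phenotypic data to a public archive when one exists is inferred from the slide rather than stated on it.

```python
from enum import Enum


class DataKind(Enum):
    PRIMARY_OMICS = "primary omics"   # raw data plus unstructured metadata
    DERIVED_OMICS = "derived omics"   # derived data plus structured metadata
    PHENOTYPIC = "phenotypic"


def destination(kind, archive_exists=False):
    """Return where a dataset goes under the slide's strategy (assumed reading)."""
    if kind is DataKind.PRIMARY_OMICS:
        return "public archive"
    if kind is DataKind.DERIVED_OMICS:
        return "BRC"
    # Phenotypic data: the BRC accepts it when no archive exists.
    return "public archive" if archive_exists else "BRC"


print(destination(DataKind.PRIMARY_OMICS))                   # public archive
print(destination(DataKind.DERIVED_OMICS))                   # BRC
print(destination(DataKind.PHENOTYPIC, archive_exists=False))  # BRC
```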
Derived data from SBCs to respective Bioinformatics Resource Centers (BRCs)
Flu-Omics
Derived data in the form of biosets
– Biosets are interesting interpreted results from an experiment
– Biosets can be directly provided by the SBCs to BRCs or BRCs may choose to generate from processed data
– Bioset example: genes/proteins that are differentially expressed in a:
• comparison of human mock-infected and influenza-infected cells after 7 HPI
• comparison of influenza-infected wild-type mice and CXCR3 KO mice after 2 days of infection
• comparison of H5N1-infected wild-type mice to H1N1-infected wild-type mice
• comparison of H5N1 at 5 MOI to H5N1 at 1 MOI in human cells
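One way to picture a bioset is as a named gene/protein list tied to the two conditions being compared. The Python sketch below is an assumption-laden illustration, not the BRC or SysBio format: the `Bioset` fields, the helper name, the placeholder gene identifiers, and the log2 fold-change cutoff of 1.0 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bioset:
    """A named list of genes/proteins from one comparison (hypothetical layout)."""
    name: str
    condition_a: str
    condition_b: str
    members: list = field(default_factory=list)


def bioset_from_fold_changes(name, condition_a, condition_b,
                             log2_fold_changes, min_abs_log2fc=1.0):
    """Keep entities whose absolute log2 fold change meets an illustrative cutoff."""
    members = sorted(
        gene for gene, fc in log2_fold_changes.items()
        if abs(fc) >= min_abs_log2fc
    )
    return Bioset(name, condition_a, condition_b, members)


# Placeholder identifiers and values, not real results.
example = bioset_from_fold_changes(
    "H5N1 5 MOI vs 1 MOI, human cells",
    "H5N1 at 5 MOI",
    "H5N1 at 1 MOI",
    {"GENE_A": 2.1, "GENE_B": -0.3, "GENE_C": -1.8},
)
print(example.members)  # ['GENE_A', 'GENE_C']
```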
Metadata representation
• Enhancements of SysBio v1.0 in SysBio v2.0
– Added experimental timeline using a “Reference Time Zero (T0)” to support multiple treatments, multiple sampling and complex study designs
– Added “Analysis Workflows” and “Data Processing Events” to capture data transformation and relationships between data
– Added “Disease” and “Disease Course Stage” objects to explicitly capture disease manifestation (previously associated with viral agent)
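The "Reference Time Zero (T0)" idea can be illustrated by recording every treatment and sampling event as an offset from a single reference point. This is a minimal Python sketch under stated assumptions: the class, its field names, and the use of hours as the unit are hypothetical and do not reproduce the SysBio v2.0 schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyEvent:
    kind: str             # "treatment" or "sampling" (assumed categories)
    label: str
    hours_from_t0: float  # offset from Reference Time Zero


def ordered_timeline(events):
    """Order events by their offset from T0."""
    return sorted(events, key=lambda e: e.hours_from_t0)


def hours_between(earlier, later):
    """Elapsed time between two events, derived from their T0 offsets."""
    return later.hours_from_t0 - earlier.hours_from_t0


# A study with two treatments and two sampling points, entered out of order.
events = [
    StudyEvent("sampling", "sample 2", 48.0),
    StudyEvent("treatment", "infection (defines T0)", 0.0),
    StudyEvent("sampling", "sample 1", 7.0),
    StudyEvent("treatment", "second treatment", 24.0),
]
for event in ordered_timeline(events):
    print(f"T0+{event.hours_from_t0:g}h  {event.kind}: {event.label}")
```

Because every event carries its own offset, the interval between any two events (for example, sampling time after the second treatment) follows from subtraction rather than from a separate field.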
Data model and submission process testing
Getting started
• One-on-one calls between Systems Biology Centers and BRCs identified use cases for an initial test of the metadata standard and submission process
• Testing results and potential issues to be presented later by individual centers
• Conversion of IRD/ViPR data from previous contracts from SysBio v1.0 to SysBio v2.0 is underway
IRD/ViPR update
• Have begun implementing data model based on SysBio v2.0 at IRD/ViPR
• Converting data from previous SBC contracts
• Preparing loading and validation submission infrastructure
• Updates to UI pending
IRD/ViPR data model
Study/Experiment Assay Data Analysis
Conclusion
• SysBio v2.0 adopted in summer 2015
– Testing of new data types may require revisions
• Submissions to begin in 2016
• Areas still under consideration
• Controlled vocabulary
• Data formatting
• Data archive selection
– Unified approach?
• Stable & unique entity identifiers (post-translational modifications, metabolites, etc.)
Acknowledgement
Data Dissemination Working Group
EuPathDB: Brian Brunk, Omar Harb, Jessica Kissinger
MaHPIC: Jessica Kissinger, Mary Galinski, Suman Pakala, Mustafa Veysi Nural, Regina C Joice
Omics-LHV: Michelle Craft, Kelly Stratton, Katrina Waters, Amie Eisfeld, Miron Livny, Allison Thompson
NIAID: Vivian Dugan, Alison Yao, Megan Hoffmann, Eric Choi
ViPR/IRD: Richard Scheuermann, Brian Aevermann
Flu-Omics: Sumit Chanda, Lars Pache, Crystal Herndon, Andre Gatarano
Flu-DyNeMo: Elodie Ghedin, Lauren Lashua, Alan Twaddle, Abhishek Pratap
Omics-4TB: Serdar Turkarslan, Micheleen Harris
PATRIC: Rebecca Will, Tom Brettin, Rebecca Wattam, Maulik Shukla