PCBC Bioinformatics Core & Committee
PCBC Steering Committee Call

Nathan Salomonis, Division of Biomedical Informatics, Cincinnati Children’s (CCHMC)
Larsson Omberg, Sage Bionetworks


Post on 28-Dec-2015



Bioinformatics Working Group

• Bruce Aronow
• Nathan Salomonis
• Phillip Dexheimer
• Carolyn Lutzko
• Alex Pico
• Larsson Omberg
• Kenny Daily
• Antonis Hatzopoulos
• Winston Hide
• Shanan Sui
• Joseph Huo
• Elias Zambidis
• Michael Kyba
• Jennifer Larkin
• Lynn Schriml
• Michael Terrin
• Ling Tang


• C4
• SAGE BIONETWORKS
• VANDERBILT U.
• HARVARD U.
• JOHNS HOPKINS U.
• UCSF
• STANFORD U.
• U. MINNESOTA
• NHLBI
• ADMINISTRATIVE CORE U. MARYLAND

PCBC Bioinformatics Committee & Core

• Create structured annotations for iPSC generation and derived products (metadata).

• Provide new tools and resources for access to and analysis of C4 data.

• Provide education to the PCBC and beyond.

• Spearhead informatics efforts in the consortium.

Prior Progress on Primary Aims

• Developed metadata standards for cell lines.

• Developed an advanced online portal in Synapse for direct access to:
  – PCBC omics datasets
  – Integrated analysis results
  – Metadata
  – Protocols (experimental, software)
  – Other datasets

• Created specialized tools for PCBC data access:
  – ToppGene progenitor signatures (pathway analysis)
  – Cytoscape tool for Synapse data visualization
  – AltAnalyze for integrated omics analyses and progenitor cell-type prediction

• Collaboratively wrote papers describing these resources and datasets.

Major Updates

• PCBC Omics Portal is live (stealth release).

• Resubmitting manuscript 1 following encouraging reviews from Cell Stem Cell.

• Multitude of new interfaces, automated workflows and result sets in Synapse.

• Significant progress on differentiation manuscript analyses.

• New software to help experimentalists analyze their own omics data.

• Recent and future bioinformatics workshops.

Synapse: Online repository for PCBC data access, annotations, sharing and analysis


Key Features of Synapse

• Download PCBC Omics data from the web or programmatically (R/Python/Java).

• Easily post new datasets, images or presentation.

• “Time Machine” of Data Files and Analyses.

• Access Control – Private Work Areas.

• DOI Annotation for Direct Data Access from Publications.

• Wiki Content – Editing

• Help Desk
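As a rough illustration of the programmatic download and "Time Machine" features listed above, the sketch below wraps the `get` call of the Python `synapseclient` package. It is a minimal sketch under stated assumptions: the Synapse ID `syn0000000` is a placeholder rather than a real PCBC entity, `download_file` and `main` are hypothetical helper names, and login is assumed to rely on credentials already configured on the local machine.

```python
def download_file(syn, synapse_id, download_dir=".", version=None):
    """Fetch one Synapse file entity and return the path of the local copy.

    `syn` is a logged-in synapseclient.Synapse object (or anything with a
    compatible `get` method). Passing `version` requests an earlier revision
    of the file instead of the latest one.
    """
    kwargs = {"downloadLocation": download_dir}
    if version is not None:
        kwargs["version"] = version
    entity = syn.get(synapse_id, **kwargs)
    return entity.path


def main():
    # Third-party package, imported lazily: pip install synapseclient
    import synapseclient

    syn = synapseclient.Synapse()
    syn.login()  # assumes credentials are already configured locally
    # "syn0000000" is a placeholder ID, not a real PCBC dataset.
    print(download_file(syn, "syn0000000"))
```

Calling `main()` after replacing the placeholder ID with a real one would download the latest version of that file; passing `version=1` to `download_file` would request the first stored revision instead.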

Target Audiences of PCBC Database in Synapse

Investigators

• Explore Genes/Pathways
• Explore Processing Pipelines
• Genes/Pathways Search Engines
• Review Results of Previous Analyses
• Communicate Early Results
• Share Results

Target Audiences of PCBC Database in Synapse

Bioinformaticians

• Process Own Data Using Defined Pipelines
• Download & Query Raw Data
• Access Directly from R/Python/Java
• Download Analysis Results
• Share Analysis
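To make the "download and query" step concrete, here is a minimal sketch of filtering a downloaded sample-annotation table with the Python standard library. The column names (`sample_id`, `data_type`, `cell_type`) and the three rows are invented for illustration and are not taken from the PCBC portal.

```python
import csv
import io


def filter_samples(tsv_text, **criteria):
    """Return the rows of a tab-delimited table whose columns match all criteria."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Made-up annotation table; real column names and values may differ.
EXAMPLE_TSV = (
    "sample_id\tdata_type\tcell_type\n"
    "S1\tmRNA-Seq\tiPSC\n"
    "S2\tmicroRNA-Seq\tiPSC\n"
    "S3\tmRNA-Seq\tfibroblast\n"
)

hits = filter_samples(EXAMPLE_TSV, data_type="mRNA-Seq", cell_type="iPSC")
print([row["sample_id"] for row in hits])  # prints ['S1']
```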

Target Audience: Navigate PCBC Data (Download, Query, Perform Pathway Analysis)

Target Audience: Follow Data Pipelines with Own Experimental Results

• Private Work Areas

Target Audience: Collaborate with Other Groups

• Collaborative Private Work Areas

Usage

• Over 300 users outside of the bioinformatics core.
  – 111 registered PCBC users accessing the site.
  – 200 folks outside of the PCBC.


Brand New Features in Synapse

• PCBC Portal is open-access to anyone with a free Synapse account (March 2015 – stealth release prior to publication).

• New and improved heatmap viewer integrating RNA-Seq, DNA-methylation and microRNA data.

• Simple interactive metadata navigator for cell lines (to be updated with WiCell).

• Amazon hosted virtual computing environment with tools for sequence analysis of any data.

• Expanded bioinformatics best practices and protocol comparison (tutorial videos, algorithm comparisons, etc.).

• Improved attribution pages.

• New analysis methods and results associated with bioinformatics core papers being (re)submitted

PCBC Metadata Developed According to Global Vocabularies of Terms

Creating Metadata Standards for Sharing, Exchanging and Analyzing PCBC Data

How is the PCBC Metadata Standard Organized?
- Categories of metadata
- Describing cell line, host and classification methods
- Including investigator, cell of origin, method of reprogramming, reprogramming gene combinations, donor gender, age, ethnicity and disease status

Metadata Collection Standards
- Developed for the PCBC consortium
- Defined through an iterative process
- Relevant terms mapped to established community ontologies
- Metadata collected for each cell line submitted to C4
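The fields named above can be pictured as one record per cell line. The following is only a sketch of such a record: the class name, attribute names and example values are hypothetical and do not reproduce the actual PCBC metadata schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class CellLineMetadata:
    """Illustrative record holding the metadata fields listed on the slide."""
    cell_line: str
    investigator: str
    cell_of_origin: str
    reprogramming_method: str
    reprogramming_genes: List[str] = field(default_factory=list)
    donor_gender: Optional[str] = None
    donor_age: Optional[str] = None
    donor_ethnicity: Optional[str] = None
    disease_status: Optional[str] = None

    def missing_fields(self):
        """Names of fields that are still empty, in declaration order."""
        return [
            name for name, value in asdict(self).items()
            if value in (None, "", [])
        ]


# Placeholder values for illustration only.
record = CellLineMetadata(
    cell_line="LINE-001",
    investigator="Example Lab",
    cell_of_origin="fibroblast",
    reprogramming_method="example method",
    reprogramming_genes=["OCT4", "SOX2", "KLF4", "MYC"],
)
print(record.missing_fields())
```

A completeness check like `missing_fields` is one way a submission step could flag cell lines whose donor information has not yet been filled in.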

Metadata-Associated Data

• mRNA-Seq: 301 samples

• microRNA-Seq: 252 samples

• DNA-methylation: 131 samples

PCBC Metadata Developed According to Global Vocabularies of Terms

Ontologies: Disease Ontology, NCBI Taxonomy vocabulary, Cell Ontology, Cell Line Ontology, HsapDv (human developmental stage ontology), NCI Thesaurus (race, ethnicity), PATO (gender), Human Phenotype Ontology
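One simple way to picture "terms mapped to ontologies" is a check that each annotated value carries an identifier from the ontology expected for its field. In the sketch below the field names are hypothetical, and the identifier prefixes (`DOID`, `NCBITaxon`, `CL`, `CLO`, `HsapDv`, `NCIT`, `PATO`, `HP`) are assumed to be the prefixes conventionally used by the ontologies listed above; this is not the consortium's validation code.

```python
# Hypothetical field names mapped to the assumed identifier prefix of each ontology.
EXPECTED_PREFIX = {
    "disease": "DOID",                # Disease Ontology
    "species": "NCBITaxon",           # NCBI Taxonomy
    "cell_type": "CL",                # Cell Ontology
    "cell_line": "CLO",               # Cell Line Ontology
    "developmental_stage": "HsapDv",  # human developmental stage ontology
    "ethnicity": "NCIT",              # NCI Thesaurus
    "gender": "PATO",                 # PATO
    "phenotype": "HP",                # Human Phenotype Ontology
}


def check_term(field_name, curie):
    """True if `curie` looks like PREFIX:ID and uses the prefix expected for the field."""
    prefix, separator, local_id = curie.partition(":")
    return (
        bool(separator)
        and bool(local_id)
        and prefix == EXPECTED_PREFIX.get(field_name)
    )


print(check_term("species", "NCBITaxon:9606"))  # prints True (9606 is Homo sapiens)
print(check_term("gender", "NCBITaxon:9606"))   # prints False (wrong ontology for the field)
```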


PCBC Metadata Annotations as Exchange Format (ISA-Tab)

• Allow Global Data Sharing/Reuse

• Document Provenance/History of Data

Investigation-Study-Assay (ISA-Tab)
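ISA-Tab stores investigation, study and assay descriptions as tab-delimited tables. The sketch below writes a heavily simplified study-level table to show the general shape of the format. The helper name, dictionary keys and sample values are hypothetical, only a few column headers are shown, and the output is not claimed to be a complete or validated ISA-Tab file.

```python
import csv
import io

# A small subset of study-table column headers in ISA-Tab style.
STUDY_COLUMNS = [
    "Source Name",
    "Characteristics[organism]",
    "Characteristics[cell type]",
    "Sample Name",
]


def write_study_table(samples):
    """Render sample dictionaries as tab-delimited text with an ISA-Tab style header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(STUDY_COLUMNS)
    for sample in samples:
        writer.writerow([
            sample["donor"],
            sample["organism"],
            sample["cell_type"],
            sample["sample"],
        ])
    return buffer.getvalue()


# Placeholder rows for illustration only.
table = write_study_table([
    {"donor": "donor-1", "organism": "Homo sapiens",
     "cell_type": "iPSC", "sample": "sample-1"},
])
print(table)
```

Keeping each source-to-sample relationship as a row in such a table is what lets the format record where a dataset came from alongside the data itself.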

New Software for PCBC Researchers

• We are in the final stages of releasing tools that allow researchers new to bioinformatics to analyze their own bulk and single-cell RNA-Seq datasets (AltAnalyze version 2.10).

• New tools for cell-type prediction automated within this tool-kit.

• Used by over a dozen PCBC researchers at the Stanford Bioinformatics training course.

Manuscripts

1. Resubmission of the first C4 manuscript (Cell Reports):
   – Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cell Lines Identifies Novel Molecular Determinants of Pluripotency

2. Data Descriptor manuscript (following manuscript 1 acceptance):
   – Comprehensive Characterization of Diverse Pluripotent Stem Cells from the Progenitor Cell Biology Consortium

3. Expected submission in September:
   – Multi-Lineage Characterization of Diverse Induced Pluripotent Stem Cells and their Derivatives
   – (collaborative multicenter effort led by Sage)