


Page 1: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

A Software Tool for Analyzing Genome-Scale Data in the Context of Biological Pathways and the Gene Ontology

J. David Gladstone Institute of Cardiovascular Disease, UCSF

GenMAPP

Page 2: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Overview

• Intro to GenMAPP

- GenMAPP analysis example

• Advanced features

Page 3: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Analyzing Large-Scale Data in the Context of Biological Pathways

• Which genes are expressed in my dataset?

• What biological processes are important in my data model?

• New insight into underlying biology

Page 4: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Analyzing Large-Scale Data in the Context of Biological Pathways

• View data in the context of known biology

• Rather than seeing which individual genes are changed, pathway analysis emphasizes processes that are changed

• Biologists are familiar with pathways, so it is a natural way of sharing data

Page 5: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Cardiomyopathy: Downregulated genes

Page 6: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Cardiomyopathy: Downregulated genes

Page 7: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Fatty Acid Degradation Pathway

Page 8: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Cardiomyopathy Data on Fatty Acid Degradation Pathway

Page 9: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP: Gene Map Annotator and Pathway Profiler

• Visualize gene expression and other genomic data on biological pathways and other groupings of genes

• Global analysis identifies significantly changed processes and functional groups

www.GenMAPP.org

Page 10: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP

• Developed in the Conklin lab at Gladstone as an internal tool for dealing with microarray data

• Approximately 12,000 registered users to date

• 100% Free!!

• Used in 150 - 200 publications

• Open source, code available at SourceForge.net

• Current version for Windows only (Coded in VB)

Page 11: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Time Course Data on Cell Cycle Pathway

Page 12: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

SNPs with Predicted Effects

http://alto.compbio.ucsf.edu/LS-SNP/

Page 13: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

SNPs that Predispose to Myocardial Infarction

Tobin et al, European Heart Journal 2004

• 547 acute MI cases; 505 controls

• 58 SNPs in 35 genes

=> SNPs in 5 different genes showed statistical association with MI

Study spans 19 pathways

=> 4 of 5 hits are on a single pathway

Page 14: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

SNPs and Myocardial Infarction

Tobin et al, European Heart Journal 2004

Page 15: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

SNP Data in GenMAPP

• Visualization: Distribution of SNPs per gene

• Prioritization: Mapping SNP annotations onto pathways

• Analysis: Interpreting SNP data in the context of biological pathways

Future directions: High-resolution visualization of individual SNPs with the ability to overlay data

Page 16: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

MAPPFinder

Global comparison of changes in dataset to changes expected by chance

Inputs: Experimental Data, Gene Ontology terms, GenMAPP Pathways

Output: Pathways and GO terms with significant changes

Originally developed as a separate application by Scott Doniger*

* Doniger et al. Genome Biology 4(1):R7
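The comparison "to changes expected by chance" can be illustrated with a z-score of the kind reported for each pathway or GO term. The sketch below is a minimal illustration, assuming the hypergeometric-style formula as commonly described for MAPPFinder; the function name and the example numbers are hypothetical and are not taken from the slides.

```python
import math


def mappfinder_z(N, R, n, r):
    """Sketch of a pathway/GO-term z-score (assumed formula, not GenMAPP code).

    N: total genes measured in the experiment
    R: genes in the experiment meeting the user's criterion
    n: measured genes in this pathway or GO term
    r: genes in this pathway or GO term meeting the criterion
    """
    if N < 2 or not (0 <= R <= N) or not (0 <= r <= n <= N):
        raise ValueError("counts are inconsistent")
    p = R / N                      # overall fraction of changed genes
    expected = n * p               # changed genes expected by chance
    # Variance includes a finite-population correction (sampling without replacement).
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance == 0:
        raise ValueError("z-score undefined when variance is zero")
    return (r - expected) / math.sqrt(variance)


# Hypothetical numbers: 1000 genes measured, 100 changed, a 50-gene pathway with 15 changed.
z = mappfinder_z(N=1000, R=100, n=50, r=15)
print(round(z, 2))
```

A large positive z-score means more changed genes fall on the pathway than chance would predict. This only works if N counts every gene measured, which is why the input data should not be pre-filtered.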

Page 17: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

MAPPFinder Browser

Page 18: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

MAPPFinder Browser

Page 19: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Relationship Schema

Pathway MAPP:

Gene Name | Gene ID | Gene ID System
Lpl | 16956 | EntrezGene

User Dataset (GEX):

Criterion | Gene ID | Gene ID System
Blue | 1415904_at | Affymetrix
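The schema above can be read as a lookup: a dataset row keyed by one ID system is linked to a pathway gene keyed by another. The sketch below is a minimal illustration of that idea, not GenMAPP's implementation. The relation dictionary is a hypothetical stand-in for the Gene Database, and the link between 1415904_at and 16956 is assumed from the slide's example.

```python
# Hypothetical stand-in for a Gene Database relation table (Affymetrix -> EntrezGene).
AFFY_TO_ENTREZ = {"1415904_at": "16956"}

# User dataset (GEX): one row per probe set, with the criterion color it met.
dataset = {"1415904_at": {"system": "Affymetrix", "criterion": "Blue"}}

# Pathway MAPP: gene boxes identified by EntrezGene IDs.
mapp_genes = [{"name": "Lpl", "gene_id": "16956", "system": "EntrezGene"}]


def color_mapp(mapp_genes, dataset, relation):
    """Return {gene name: color} by translating dataset IDs to the MAPP's ID system."""
    entrez_to_color = {}
    for probe_id, row in dataset.items():
        entrez_id = relation.get(probe_id)
        if entrez_id is not None:
            # Keep the first color seen if several probes map to one gene.
            entrez_to_color.setdefault(entrez_id, row["criterion"])
    return {g["name"]: entrez_to_color.get(g["gene_id"], "no data") for g in mapp_genes}


print(color_mapp(mapp_genes, dataset, AFFY_TO_ENTREZ))
```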

Page 20: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Supported Species

Fruit fly, Human, Mouse, Rat, Worm, Yeast, Zebrafish, Chicken, Dog, Cow

By request: Chimp, Frog, Fugu (F. rubripes), Honey bee, Mosquito, Pufferfish (T. nigroviridis)

Page 21: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Supported Gene IDs

Annotations: InterPro, EMBL, OMIM, Pfam, Gene Ontology

Species-specific: MGI, RGD, SGD, WormBase, ZFIN, HUGO, FlyBase

Gene IDs: Affymetrix, Entrez Gene, RefSeq (protein only), UniGene, UniProt, Ensembl, PDB

Page 22: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Available MAPP Archives

Download all MAPPs through the Downloader in GenMAPP

• Contributed MAPPs: Hand-curated pathways created at GenMAPP.org or submitted by GenMAPP users. >70 MAPPs for human, mouse and rat.

• Inferred MAPPs: Inferred from human contributed MAPPs, using homology information from Homologene and Ensembl

• Tissue-Specific MAPPs (human and mouse only): Based on the analysis of two microarray datasets generated by the Genomics Institute of the Novartis Research Foundation

• GO Sample MAPPs: A partial collection of GO terms formatted as GenMAPP MAPP files, each containing between 100 and 300 genes. GO MAPPs are formatted as lists of genes and do not contain any graphics other than the gene object and the label

• SGD metabolic MAPPs (yeast only): Derived from the yeast pathways at SGD

• KEGG converted MAPPs: Converted from the Pathway Resource at the Kyoto Encyclopedia of Genes and Genomes

Page 23: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

http://www.genmapp.org/featured_mapps.html

Page 24: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Input Data

• Data in spreadsheet summary format

• NO raw data

• Data should include metrics that you want to use as cutoffs: avg signal, ratio, fold, signal quality, p-value, cluster ID, other statistics

• Include ALL genes measured in experiment, DO NOT pre-filter

• Choose optimal primary gene ID

• Custom annotation can be useful (Database includes standard annotation)

Example: Group Comparison Experiment

• Fold changes between groups

• p-value associated with fold

• Average signal per group

Page 25: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Workflow

Input: Pre-Processed Formatted Data (with statistics, metrics)

Steps:

• Import Data

• Set Color Criteria

• Display Data on Pathways

• Gene Ontology analysis

• Export Pathways to the Web

• Create/Edit/Convert Pathways

Modules: Expression Dataset Manager, Drafting Board, MAPPBuilder, Converter, MAPPFinder, MAPPSets

Page 26: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Example: Analysis of Complex Time-Course Data

Challenges:

• How to represent your data in an intuitive manner

• How to analyze patterns rather than specific comparisons

Approach:

• Set up hypotheses to test

• Attach global statistics (e.g. ANOVA) and pattern recognition

• Efficiently import data into GenMAPP

• Visualize cluster and time-point data (GenMAPP 2.1-NEW)

• Global analysis of pathways/ontologies (MAPPFinder)

• Export results to the web/for publication

Page 27: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Set Up Hypotheses to Test

Build a MAPP to Test a Hypothesis

• Use literature and previous knowledge about the model you are studying to build a list of candidate genes or a pathway.

Step 1:

• Collect a list of gene IDs

• Import them using the MAPPBuilder function

• Organize into a biological pathway along with predictions of expected changes.

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16

Page 28: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Import List of Genes in MAPPBuilder

Page 29: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Gene Layout on the Drafting Board

Page 30: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Example: Analysis of Complex Time-Course Data

Challenges:

• How to represent your data in an intuitive manner

• How to analyze patterns rather than specific comparisons

Approach:

• Set up hypotheses to test

• Attach global statistics (e.g. ANOVA) and pattern recognition

• Efficiently import data into GenMAPP

• Visualize cluster and time-point data (GenMAPP 2.1-NEW)

• Global analysis of pathways/ontologies (MAPPFinder)

• Export results to the web/for publication

Page 31: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Dataset: Mouse Uterine Pregnancy Time-Course

Experiment Design:

• Analyzed 7 time-points (3-8 replicates):

  - Non-pregnant mice

  - 14.5, 16.5 and 17.5 days post fertilization

  - 18.5 days (term pregnancy)

  - 6 hours and 24 hours postpartum

• Hybridized to mouse 11k Affymetrix arrays

Analysis:

• Normalized and adjusted expression (gcrma, R)

• Performed a global F-test (multtest, R)

• Hierarchical and partitioned clustering (hopach, R)

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16

Page 32: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

HOPACH Clustering: Hierarchical Ordered Partitioning and Collapsing Hybrid

1. Use global f-test to filter probeset list down to 3500 entries.

2. Cluster fold changes for each replicate compared to non-pregnant baseline mean.

3. Take the top level cluster (left) and re-associate with expression data.
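Step 2 above can be sketched in a few lines. This is a minimal illustration, assuming the expression values are already on a log2 scale, so that a fold change is a difference from the baseline mean. The function name, time-point labels, and numbers are hypothetical, and the clustering itself (HOPACH, run in R) is not reproduced here.

```python
from statistics import mean


def fold_vs_baseline(log2_values, baseline_label):
    """Per-replicate log2 fold change relative to the mean of the baseline group.

    log2_values: {time-point label: [log2 expression per replicate]} for one probe set.
    Returns the same structure without the baseline group.
    """
    baseline = mean(log2_values[baseline_label])
    return {
        label: [value - baseline for value in replicates]
        for label, replicates in log2_values.items()
        if label != baseline_label
    }


# Hypothetical values for one probe set (NP = non-pregnant baseline).
probe = {"NP": [8.0, 8.2, 7.8], "18.5d": [10.0, 9.0], "6h_pp": [7.0]}
print(fold_vs_baseline(probe, "NP"))
```

Each probe set then contributes one vector of fold changes to the clustering input.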

Page 33: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Example: Analysis of Complex Time-Course Data

Challenges:

• How to represent your data in an intuitive manner

• How to analyze patterns rather than specific comparisons

Approach:

• Set up hypotheses to test

• Attach global statistics (e.g. ANOVA) and pattern recognition

• Efficiently import data into GenMAPP

• Visualize cluster and time-point data (GenMAPP 2.1-NEW)

• Global analysis of pathways/ontologies (MAPPFinder)

• Export results to the web/for publication

Page 34: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Input

Import File Design:

• Include all probe data (not just filtered)

• Include the following columns of data:

  - Multtest p-values

  - HOPACH clusters

  - Average group expression values

  - Fold changes (all relevant pairwise comparisons)

  - Gene Database system code

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
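An import file following this design might be assembled as below. This is a minimal sketch under stated assumptions: the column names, the second probe ID, and all values are hypothetical, and the system code "X" for Affymetrix is an assumption that should be checked against the GenMAPP documentation.

```python
import csv
import io

# Hypothetical column layout; only the kinds of columns come from the slide.
COLUMNS = ["Gene ID", "SystemCode", "F-test p-value", "HOPACH cluster",
           "Avg NP", "Avg 18.5d", "Fold 18.5d vs NP"]


def write_import_file(rows, handle):
    """Write every probe (no pre-filtering) as tab-delimited text."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError("row %r lacks columns: %s" % (row.get("Gene ID"), missing))
        writer.writerow(row)


# Hypothetical rows; "X" as the Affymetrix system code is an assumption.
rows = [
    {"Gene ID": "1415904_at", "SystemCode": "X", "F-test p-value": 0.001,
     "HOPACH cluster": 2, "Avg NP": 8.0, "Avg 18.5d": 9.5, "Fold 18.5d vs NP": 1.5},
    {"Gene ID": "hypothetical_probe_at", "SystemCode": "X", "F-test p-value": 0.8,
     "HOPACH cluster": 0, "Avg NP": 6.1, "Avg 18.5d": 6.0, "Fold 18.5d vs NP": -0.1},
]

buffer = io.StringIO()
write_import_file(rows, buffer)
print(buffer.getvalue())
```

The second row fails any reasonable cutoff but is still written, in keeping with the instruction to include all probe data.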

Page 35: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Input

Page 36: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Expression Dataset Manager

Import Text File into GenMAPP

• Tell GenMAPP which columns have non-numeric data.

Establishing Rules for Coloring Gene Boxes:

• Design criteria that capture any patterns you want to see.

• Here we want:

  - Fold change gradients for up- and down-regulated genes in time-point comparisons (Color Sets)

  - Different colors assigned to each HOPACH cluster

Salomonis N, et al. Genome Biol. 2005 6:R12–R12.16
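The two kinds of Color Set described on this slide can be sketched as ordered rule lists. This is a minimal illustration, not GenMAPP's criterion syntax. It assumes criteria are evaluated in order with the first match winning, and every threshold, label, and color is hypothetical.

```python
def pick_color(row, color_set, default=("No criteria met", "gray")):
    """Return (label, color) of the first criterion the row satisfies."""
    for label, color, test in color_set:
        if test(row):
            return label, color
    return default


# Fold-change gradient for one time-point comparison (hypothetical cutoffs).
FOLD_COLOR_SET = [
    ("Up 2-fold or more", "red", lambda r: r["p"] < 0.05 and r["log2_fold"] >= 1.0),
    ("Up", "pink", lambda r: r["p"] < 0.05 and r["log2_fold"] > 0.0),
    ("Down 2-fold or more", "blue", lambda r: r["p"] < 0.05 and r["log2_fold"] <= -1.0),
    ("Down", "light blue", lambda r: r["p"] < 0.05 and r["log2_fold"] < 0.0),
]

# One color per HOPACH cluster (cluster numbers and colors are hypothetical).
CLUSTER_COLOR_SET = [
    ("Cluster %d" % k, color, lambda r, k=k: r["cluster"] == k)
    for k, color in enumerate(["orange", "green", "purple"], start=1)
]

row = {"p": 0.01, "log2_fold": 1.5, "cluster": 2}
print(pick_color(row, FOLD_COLOR_SET))
print(pick_color(row, CLUSTER_COLOR_SET))
```

Because the first match wins, stricter criteria are listed before looser ones so the gradient resolves to the strongest applicable color.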

Page 37: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Expression Dataset Manager

Page 38: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP Expression Dataset Manager

Page 39: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Example: Analysis of Complex Time-Course Data

Challenges:

• How to represent your data in an intuitive manner

• How to analyze patterns rather than specific comparisons

Approach:

• Set up hypotheses to test

• Attach global statistics (e.g. ANOVA) and pattern recognition

• Efficiently import data into GenMAPP

• Visualize cluster and time-point data (GenMAPP 2.1-NEW)

• Global analysis of pathways/ontologies (MAPPFinder)

• Export results to the web/for publication

Page 40: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Viewing Time-Course Data on MAPPs

Method 1)

• View criteria one at a time on pathways of interest.

Page 41: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Single Color Set View

Page 42: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Single Color Set View

Page 43: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Viewing Time-Course Data on MAPPs

Method 1)

• View criteria one at a time on pathways of interest.

Method 2)

• View clusters directly on pathway.

Page 44: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Single Color Set View

Page 45: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Single Color Set View

Page 46: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Viewing Time-Course Data on MAPPs

Method 1)

• View criteria one at a time on pathways of interest.

Method 2)

• View clusters directly on pathway.

Method 3)

• View all criteria of interest simultaneously.

Page 47: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Single Color Set View

Page 48: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Multiple Color Set View

Page 49: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Example: Analysis of Complex Time-Course Data

Challenges:

• How to represent your data in an intuitive manner

• How to analyze patterns rather than specific comparisons

Approach:

• Set up hypotheses to test

• Attach global statistics (e.g. ANOVA) and pattern recognition

• Efficiently import data into GenMAPP

• Visualize cluster and time-point data (GenMAPP 2.1-NEW)

• Global analysis of pathways/ontologies (MAPPFinder)

• Export results to the web/for publication

Page 50: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

Advanced Features

• Customizing a Gene Database / creating a Gene Database for a non-supported species

  => Implement GenMAPP for a novel model species

• Create your own pathway MAPPs

  => Implement GenMAPP for a novel model species

  => Author novel pathways based on your discoveries

• High-throughput export of browsable HTML pathway archive

  => For interactive web display of data on pathway archive

International Gene Trap Consortium

Page 51: A Software Tool for Analyzing Genome-Scale  Data in the Context of Biological Pathways and

GenMAPP team

The GenMAPP program can be downloaded at www.GenMAPP.org

Questions?

[email protected]@gladstone.ucsf.edu

Bruce Conklin, Alex Pico, Alex Zambon, Karen Vranizan, Nathan Salomonis, Kam Dahlquist

http://groups.google.com/group/GenMAPP