Divining Systems Biology Knowledge from High-throughput Experiments Using EGAN

Jesse Paquette, ISMB 2010

Biostatistics and Computational Biology Core, Helen Diller Family Comprehensive Cancer Center

University of California, San Francisco (AKA BCBC HDFCCC UCSF)



High-throughput experiments

• This talk applies to:
  – Expression microarrays
  – aCGH
  – SNP/CNV arrays
  – MS/MS proteomics
  – DNA methylation
  – ChIP-Seq
  – RNA-Seq
  – In-silico experiments

• If parts of the output can be mapped to gene IDs, you can use EGAN

What do you hope to accomplish?

Collect data

Process data

Differential analysis

Publish!

Clusters and/or gene lists

New testable hypotheses

Produce insight about the underlying biology

New grants! New papers!

Drug targets!

Leverage organic intelligence

Clusters and/or gene lists

New testable hypotheses

Produce insight about the underlying biology

Summarize

Visualize

Contextualize

Producing insight from clusters and gene lists

• Summarize: find enriched pathways (and other gene sets)
  – Hypergeometric over-representation
    • DAVID
  – Global trends
    • GSEA
• Visualize: gene relationships in a graph
  – Protein-protein interactions
    • Cytoscape
  – Network module discovery
    • Ingenuity IPA
  – Literature co-occurrence
    • PubGene
• Contextualize: pertinent literature
  • PubMed
  • Google
  • iHOP
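The "Hypergeometric over-representation" step can be sketched in a few lines. Assuming a background of N genes, K of which belong to a gene set, and a submitted list of n genes (all in the background) of which k fall in the set, the enrichment p-value is the upper tail P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K,i)·C(N−K,n−i) / C(N,n). This is a minimal illustration, not EGAN's or DAVID's code; the function names are hypothetical.

```python
from math import comb
from typing import Iterable


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (the "universe")
    K: background genes that belong to the gene set
    n: genes in the submitted list (restricted to the background)
    k: genes in the list that also belong to the gene set
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    # Sum the probability of seeing k or more set members among n draws.
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible overlaps contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def enrichment_from_sets(gene_list: Iterable[str], gene_set: Iterable[str],
                         background: Iterable[str]) -> float:
    """Hypothetical helper: count overlaps, then compute the tail p-value."""
    bg = set(background)
    listed = set(gene_list) & bg      # genes outside the background are ignored
    members = set(gene_set) & bg
    return hypergeom_enrichment_p(len(listed & members), len(listed), len(members), len(bg))


if __name__ == "__main__":
    background = [chr(c) for c in range(ord("A"), ord("K"))]  # 10 toy gene IDs, A..J
    p = enrichment_from_sets(["A", "B", "D"], ["A", "B", "C"], background)
    print(f"{p:.3f}")  # prints 0.183 (= 22/120)
```

In the toy call, a 3-gene list shares 2 genes with a 3-gene set out of a 10-gene background, giving (3·7 + 1)/120 = 22/120. Real tools also correct for testing many gene sets at once, which this sketch leaves out.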

EGAN: Exploratory Gene Association Networks

• Methods: state-of-the-art analysis of clusters and gene lists
  – Hypergeometric enrichment of gene sets
  – Global statistical trends of gene sets
  – Hypergraph visualization (via Cytoscape libraries)
  – Literature identification
  – Network module discovery

• User interface: responds quickly to new queries from the biologist
  – Sandbox-style functionality
  – Dynamic adjustment of p-value cutoffs
  – Point-and-click interface
  – All data in-memory for immediate access
  – Links to external websites

• Modular: integrates as a flexible plug-and-play cog
  – All data is customizable
  – Proprietary data can be restricted to the client location
  – Java runs on almost every OS (PC, Mac, Linux)
  – Can be configured and launched from a different application (e.g. GenePattern)
  – Analyses can be scripted for automation
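The slides list "Global statistical trends of gene sets" as a method (with GSEA as the reference tool) but do not say which statistic EGAN uses. As an illustrative assumption, the sketch below computes an unweighted running-sum enrichment score in the spirit of GSEA: walk down a ranked gene list, step up by 1/N_H at each gene-set member and down by 1/(N − N_H) otherwise, and report the signed maximum deviation from zero. The published GSEA method by default weights member steps by the ranking metric and assesses significance by permutation; neither is shown here.

```python
from typing import Sequence, Set


def running_sum_es(ranked_genes: Sequence[str], gene_set: Set[str]) -> float:
    """Unweighted running-sum enrichment score (GSEA-style sketch).

    ranked_genes: every measured gene, ordered from most up- to most
    down-regulated (each gene assumed to appear once).
    gene_set: genes of one gene set; members absent from the ranking are ignored.
    """
    n_total = len(ranked_genes)
    n_hits = sum(1 for g in ranked_genes if g in gene_set)
    n_miss = n_total - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    es = 0.0
    for g in ranked_genes:
        # Step up for a set member, step down for a non-member; the step sizes
        # make the walk end at (approximately) zero after the last gene.
        running += 1.0 / n_hits if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running  # keep the signed maximum deviation from zero
    return es


if __name__ == "__main__":
    ranking = ["A", "B", "C", "D"]          # toy ranking, most up-regulated first
    print(running_sum_es(ranking, {"A", "B"}))  # prints 1.0
```

A positive score means the set's members cluster near the top of the ranking, a negative score near the bottom; unlike the hypergeometric test, no arbitrary cutoff on the gene list is needed.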

Gene sets

• A gene set is a set of semantically related genes
  – e.g. the Wnt signaling pathway

• EGAN contains a database of gene sets
  – > 100k gene sets by default
    • KEGG, Reactome, NCI-Nature, Gene Ontology, MeSH, Conserved Domain, Cytoband, miRNA targets
  – You can easily add your own
    • Simple file format
    • Download from MSigDB (Broad Institute)
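The slide mentions a "simple file format" for custom gene sets without specifying it. MSigDB distributes gene sets as GMT files (one set per line: name, description, then member genes, all tab-separated); whether EGAN reads GMT directly is an assumption here. A minimal parser sketch, with a hypothetical function name:

```python
from typing import Dict, Iterable, Set


def parse_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse MSigDB-style GMT lines: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"malformed GMT line: {line!r}")
        name, _description, *genes = fields
        # Empty fields (e.g. from a trailing tab) are dropped; a repeated set
        # name overwrites the earlier entry.
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


if __name__ == "__main__":
    demo = ["SET_A\tdemo set\tTP53\tEGFR\n", "SET_B\tna\tESR1\t\n"]
    print(sorted(parse_gmt(demo)["SET_A"]))  # prints ['EGFR', 'TP53']
```

Because `parse_gmt` accepts any iterable of lines, an open text-file handle can be passed directly; storing each set as a Python `set` makes the overlap counts used in enrichment testing cheap.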

Gene-gene relationships

• EGAN also contains
  – Protein-protein interactions (PPI)
  – Literature co-occurrence
  – Chromosomal adjacency
  – Kinase-target relationships

• Other possibilities
  – Sequence homology
  – Expression correlation
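The slides name "network module discovery" but do not describe EGAN's algorithm. As a deliberately simple stand-in, the sketch below pools gene-gene relationships (any of the edge types listed above) into one undirected graph, keeps only the subgraph induced by the user's gene list, and reports its connected components as candidate modules. Published module-finding methods typically score and search subnetworks instead; the function name and toy edges are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def induced_modules(edges: Iterable[Tuple[str, str]], gene_list: Iterable[str],
                    min_size: int = 2) -> List[Set[str]]:
    """Connected components of the gene-list-induced subgraph, largest first."""
    keep = set(gene_list)
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        # Keep an edge only if both endpoints are in the gene list (no self-loops).
        if a in keep and b in keep and a != b:
            adj[a].add(b)
            adj[b].add(a)
    seen: Set[str] = set()
    modules: List[Set[str]] = []
    for start in sorted(adj):  # genes with no induced edges never appear
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over the induced subgraph
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) >= min_size:
            modules.append(component)
    modules.sort(key=len, reverse=True)
    return modules


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "X")]  # hypothetical pairs
    for module in induced_modules(toy_edges, ["A", "B", "C", "D", "E", "F"]):
        print(sorted(module))  # prints ['A', 'B', 'C'] then ['D', 'E']
```

Gene "X" is dropped because it is not in the list, and "F" has no induced edges, so neither appears in a module.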

Example with microarray and aCGH results

• Mirzoeva et al. (2009), Cancer Research
  – UCSF-LBL collaboration
  – Analysis of breast cancer cell lines
    • Basal vs. luminal

• Discoveries in this presentation
  – miRNA regulator of subtype (miR-200)
  – Annexin (ANXA1) as a potential regulator of ER, glucocorticoid and EGFR signaling

Gene list - higher expression in basal cell lines

Gene set/pathway enrichment

Importing gene lists from publications

Combining expression with aCGH

Finding network modules

Where to find EGAN

• Website
  – http://akt.ucsf.edu/EGAN/

• 2010 paper in Bioinformatics
  – http://www.ncbi.nlm.nih.gov/pubmed/19933825

Acknowledgements

• BCBC HDFCCC UCSF
  – Taku Tokuyasu
  – Adam Olshen
  – Ritu Roy
  – Ajay Jain

• LBNL
  – Debopriya Das
  – Joe Gray

• Funding
  – UCSF Cancer Center Support Grant

• UCSF
  – Early adopters
    • Ingrid Revet
    • Antoine Snijders
    • Stephan Gysin
    • Sook Wah Yee
    • Joachim Silber
  – Cytoscape gurus
    • David Quigley
    • Scooter Morris
  – OTM
    • David Eramian
    • Ha Nguyen
  – Laura van ’t Veer
  – Donna Albertson
  – Graeme Hodgson