m reich - genomespace
DESCRIPTION
Presentation at BOSC2012 by M Reich - GenomeSpaceTRANSCRIPT
Michael ReichBroad Institute of Harvard and MIT
July 14, 2012
Gene mutation causing Leigh Syndrome French Canadian Type (LFSC) and 8 other mitochondrial diseases
Integrate: candidate genomic region, mitochondrial proteomic data, and cancer expression compendium
Authors: Mootha et al. 2003, Calvo et al. 2006Subtle repression of oxphos genes in diabetic muscle: role for mitochondrial dysregulation in diabetes pathology; new computational approach Gene Set Enrichment Analysis
Integrate: Gene sets/pathways & processes with expression profiles
Authors: Mootha et al. 2003, Subramanian & Tamayo et al. 2005
IKBKE as a new breast cancer oncogene
Integrate: RNAi screens, transformation of activated kinases, and copy number from SNP arrays of cell linesAuthors: Boehm et al. 2007
~3000 novel, large non-coding RNAs with functions in development, the immune response and cancer
Integrate: Genome sequences from 21 mammals, epigenomic maps, and expression profiles Authors: Guttman, Rinn et al. 2009
Discovery of 3 new genes involved in Glioblastoma Multiforme (NF1, ERRB2, PIK3R1); Confirmation of TP53, PTEN, EGFR, RB1, PIK3CA
Integrate: DNA sequence, copy number, methylation aberrations and expression profiles in 206 glioblastomasAuthors: TCGA Research Network 2008
Characterization of disease subtypes and improved risk stratification for medulloblastoma patients
Integrate: Copy number, expression, clinical data for 96 medulloblastoma patientsAuthors: Tamayo et al., Cho et al. 2011
Need: Insights through integrative studies
Translational Research ExampleGenePattern Cytoscape IGV/UCSC Genomica
Network
Compendium
Expression
Alterations
atcgcgtttattcgataaggatcgcgttttttcgataagg
CMAP
Add Transcription Factor track from UCSC
6
Looks close to p53 site
7 Test for similarity of
p53 and gene location
8
Extractmodule
ii
Learn p53 site/score on
promoteriv
Load compendium Show module
mapi
Show Chromosome
5
Expand +1 (include
neighbors)
4
Show network
3
Differentially Expressed
Genes
1
Idea
GSEA test enrichment
2
iii
Arrests G2/M
Conclusion
vi
Pathwayactivation
Added to GenePattern
v
ODF GMT
HTML
SIF
GCT GXP
NA gene list
GXC GRP
GFF GXA
GenePattern
Cytoscape
IGV
Genomica
CMAP
UCSC Browser
Analysis step
Analysis conclusion
Within tool
Across tools
12 steps, 6 tools, 7 transitions
6 8
2 3 ii iii
4 5 iv v
1 i
• A lightweight “connection layer” for a wide variety of integrative genomic analyseso Support for all types of resource: Web-based,
desktop, etc.o Automatic conversion of data formats between toolso Easy access to data from any locationo Any tool that joins is automatically connected to the
community of toolso Ease of entry into the environment
The Need
Cloud-based storage
API connectivity layer
3 Driving BiologicalProjects
lincRNAs
Cancer stem cells
Patient Stratification
6 Seed Tools
CytoscapeGalaxy
GenePatternGenomica
IGVUCSC Browser
New tools New Biological Projects
Online community to share diverse computational tools
www.genomespace.org
• Aimed towards non-programming users
• Support interoperability through automatic cross-tool data transfer
• Requires minimal changes to tools
www.genomespace.org
GenomeSpace Principles
GS Enabled Tools
Integrative Genomics ViewerCytoscapeGalaxyGenePattern
GenomeSpace Components
Authentication and Authorization
Genome Space Server
Data ManagerAnalysis and Tool Manager
GenomeSpace Project Data
1
2 3
geWorkbench
External Data Sources & Tools
Seed Tools
Cytoscape Galaxy GenePattern
Genomica IGV UCSC Genome Browser
New Tools
InSilicoDB(University of Brussels)
Cistrome(Dana-Farber Cancer Institute)
ArrayExpress(EBI)
geWorkbench(Columbia University)
Reactome(Ontario Institute of Cancer Research)
Recently added
In development
Using GenomeSpace
Cloud-based
filesystem
Tools and Data
SourcesActions
GenomeSpace Actions
GenomeSpace Tool Enablement: IGV
GenomeSpace Tool Enablement: GenePattern
GenomeSpace Tool Enablement: GenePattern
GenomeSpace Data Source Integration: InSilico DB
Other collaborating projects
• Taverna/MyExperiment (University of Manchester)
• National Center for Biomedical Ontology (Stanford University)
DBP3: Studying the regulatory control of human hematopoiesis
DBP3: Studying the regulatory control of human hematopoiesis – Overview
42
2
1 1 1 1 1
2
1
Part 1: Data pre-processingand quality control
Part 3: Studying thetranscriptional program
Part 2: Basic analysis
2
To part 2
From part 11
1
3
2
Genomica
GenePattern
IGV
Cytoscape
4Analysis step (# steps)
Analysis conclusion
Currently integrated
Not yet integrated
Analysis section
Manual step
Galaxy
Optional choices
12
To part 3
2
2
2
3
1
4
1
1
1
2
3 2
1
1
3
Part 4: cis-regulatorysite analysis
2
1
To part 4
From part 2
Frompart 2
1
2
From part 3
To part 5
Part 5: Finding newtranscription factor
regulators
1
2
From part 4
2
2
23
1
1 3+
Part 5: Finding newtranscription factor
regulators
2
From part 4
2
2
3Step 1
Step 2
Step 3
Step 4
Step 1: Create transcription factor dataset in Genomica and save to GenomeSpace
Step 2: Send transcription factor datasets into GenePattern
Step 2: Perform differential expression analysis in GenePattern
Step 3: Send differentially expressed genes to Genomica
Step 3: perform module network analysis in Genomica
Step 4: Visualize regulators with known SNPs and linkage regions
Step 4: Visualize regulators with known SNPs and linkage regions
Clustered
30
Deployment ArchitectureAmazon
ExternalData Sources(e.g., Arrary Express)
Data Manager (DM)
Analysis Task Manager (ATM)
SimpleDB
S3 File transfers
GS ClientsClustered
IdentityService (OpenID)
RE
ST
RE
ST
Pro
ven
an
ce
Deployment Architecture and APIs
GS UI
IGV
RE
ST
RE
ST
Genomica
CD
K
Gene-Pattern
CD
K
Galaxy
Cytoscape
RE
ST
CD
K
UCSC
RE
ST
Join the GenomeSpace community
• Researchers with biological projects• Developers
– Add your tools– Contribute format converters– Build new infrastructure
• Data portals and repositories– Link your resources
Acknowledgements
Broad InstituteTed LiefeldHelga ThorvaldsdottirJim RobinsonMarco OcanaEliot PolkJill Mesirov, PI
GenomeSpace CollaboratorsCytoscape: Trey Ideker Lab, UCSDGalaxy: Anton Nekrutenko Lab, Penn State University Genomica: Eran Segal Lab, Weizmann InstituteUCSC Browser TeamGenePattern TeamIGV Team
Driving Biological ProjectsHoward Chang Lab – Stanford UniversityAviv Regev Lab – Broad Institute Funding