oncomine: a bioinformatics infrastructure for cancer genomics dan rhodes chinnaiyan laboratory...
Post on 20-Dec-2015
ONCOMINE: A Bioinformatics Infrastructure for Cancer Genomics
Dan Rhodes
Chinnaiyan Laboratory
Bioinformatics Program
Cancer Biology Training Program
Medical Scientist Training Program
University of Michigan Medical School
Outline
Background
– DNA Microarrays and the Cancer Transcriptome
ONCOMINE
– Data collection, normalization & storage
– Statistical Analysis
– Visualization of Data and Analysis
ONCOMINE Data Integration
– Therapeutic Targets / Biomarkers
– Metabolic and Signaling Pathways
– Known protein-protein Interactions
ONCOMINE tutorial
The Cancer Transcriptome
180+ studies profiling human cancer
Each profiling 5 – 100+ samples
We estimate > 10,000 microarrays
10k chips measuring 20k genes = 200+ million data points
Oncomine
oncology + data-mining = oncomine
105 independent datasets (90 analyzed)
7,292 cancer microarrays
79 million gene expression measurements
382 distinct cancer signatures
> 5 million tests of differential expression
> 5 million tests of gene set enrichment
> 5 billion pairwise correlations
Database – relational, Oracle 9.2
Statistical computing – R, Perl, Java
Front End – Java Server Pages
Server – Apache/Tomcat
Graphics – Scalable Vector Graphics (SVG)
Oncomine
Data Collection
Monthly Pubmed searches (cancer + microarray + transcriptome + tumor + gene expression profiling)
Gene Expression Repositories
– Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/)
– ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
– Stanford Microarray Database (http://genome-www5.stanford.edu/)
– Whitehead Cancer Genomics (http://www.broad.mit.edu/cancer/)
Data Normalization
Global normalization – same scaling factors applied to all microarray features – mean and variance normalization
Affymetrix - Quantile normalization
Spotted cDNA - Loess normalization
– normalize an M vs. A plot
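The normalization steps named above can be sketched in a few lines of Python. This is a minimal illustration, not the Oncomine implementation: the function names are made up for this example, ties in quantile normalization are broken arbitrarily, and the loess fit itself is left out because it needs a smoothing library. Only the M and A values that such a fit would be applied to are computed.

```python
from math import log2
from statistics import mean, pstdev


def mean_variance_normalize(values):
    """Global normalization: shift and scale one array to mean 0, SD 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def quantile_normalize(arrays):
    """Give every array (equal-length lists) the same value distribution."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [mean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        # Feature indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized


def ma_values(red, green):
    """Two-channel spots as (M, A): M = log2(R/G), A = mean log2 intensity."""
    return [(log2(r / g), 0.5 * log2(r * g)) for r, g in zip(red, green)]
```

For spotted cDNA arrays, a loess curve fitted to M as a function of A would then be subtracted from M; that step is omitted here.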
Data Storage
Generic data structures to accommodate a variety of data
Samples
Microarray Features / Genes
Normalized Data
Statistical Tests
Gene Sets
[Figure-only slides: Samples; Microarray Features / Genes; Normalized Data; Gene Sets; Statistical Tests]
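One way to picture generic structures for these five kinds of data is a small relational schema. The sketch below is an assumption-laden illustration: the slides name Oracle 9.2 as the database, whereas this uses Python's built-in sqlite3 so it runs anywhere, and every table and column name is hypothetical rather than taken from the actual Oracle schema.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    study       TEXT NOT NULL
);
-- Free-form key/value annotations keep the sample table generic.
CREATE TABLE sample_attribute (
    sample_id   INTEGER REFERENCES sample(sample_id),
    name        TEXT NOT NULL,
    value       TEXT
);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    platform    TEXT NOT NULL,
    gene_symbol TEXT
);
CREATE TABLE normalized_data (
    sample_id   INTEGER REFERENCES sample(sample_id),
    feature_id  INTEGER REFERENCES feature(feature_id),
    value       REAL,
    PRIMARY KEY (sample_id, feature_id)
);
CREATE TABLE statistical_test (
    test_id     INTEGER PRIMARY KEY,
    study       TEXT NOT NULL,
    description TEXT
);
CREATE TABLE test_result (
    test_id     INTEGER REFERENCES statistical_test(test_id),
    feature_id  INTEGER REFERENCES feature(feature_id),
    statistic   REAL,
    p_value     REAL,
    q_value     REAL
);
CREATE TABLE gene_set (
    gene_set_id INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,
    name        TEXT NOT NULL
);
CREATE TABLE gene_set_member (
    gene_set_id INTEGER REFERENCES gene_set(gene_set_id),
    gene_symbol TEXT NOT NULL
);
"""


def create_schema(conn):
    """Create all tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)
```

Calling create_schema(sqlite3.connect(":memory:")) builds the tables in a throwaway in-memory database. Keeping sample annotations as key/value rows is one design choice that lets studies with different clinical fields share a table.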
Differential Expression Analysis
Two-sided t-test for each gene:
False discovery rate correction for multiple hypothesis testing
R, Oracle, RODBC
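The two steps above, a per-gene t statistic followed by false discovery rate correction, can be sketched as follows. The slides name R as the computing environment and do not say which t-test variant or FDR procedure was used, so this Python sketch makes two assumptions: a Welch (unequal variance) t statistic and the Benjamini-Hochberg procedure. Converting t to a two-sided p-value needs the t distribution and is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for one gene measured in two sample groups.

    Assumes each group has at least two values and non-zero total variance.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

With thousands of genes tested per study, a q-value cutoff limits the expected fraction of false positives among the genes reported as differentially expressed.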
Oncomine Tutorial part I
WWW.ONCOMINE.ORG
EMAIL: SHORTCOURSE
PASSWORD: MCBI
• Gene Differential Expression
• Gene Co-Expression
• Study Differential Expression
Therapeutic Targets / Biomarkers
Gene Ontology Consortium
– Biological Process (apoptosis, cell cycle)
– Cellular Component (cytoplasmic membrane, extracellular)
– Molecular Function (kinase, phosphatase, protease, etc.)
Known Therapeutic Targets
– NCI Clinical Trials Database
– Therapeutic Target Database
Therapeutic Target Database
http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp
338 proteins with literature-documented inhibitor, antagonist, blocker, etc.
Known Drug Targets Expressed in Bladder Cancer
Secreted proteins highly expressed in Ovarian Cancer
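Queries such as "known drug targets expressed in bladder cancer" amount to intersecting a cancer signature with an annotated gene set. The sketch below illustrates the idea under stated assumptions: the function name, the signature layout (gene mapped to a t statistic and q-value), and the 0.05 cutoff are all hypothetical, and the gene names in the usage note are placeholders, not real results.

```python
def filter_signature(signature, gene_set, q_cutoff=0.05):
    """Return over-expressed genes from a signature that belong to gene_set.

    signature: dict mapping gene symbol -> (t_statistic, q_value)
    gene_set:  set of gene symbols, e.g. known drug targets
    """
    hits = [
        (gene, t)
        for gene, (t, q) in signature.items()
        if gene in gene_set and t > 0 and q <= q_cutoff
    ]
    # Strongest over-expression first.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [gene for gene, _ in hits]
```

For example, with a made-up signature {"GENE_A": (5.2, 0.001), "GENE_B": (-3.0, 0.01)} and the gene set {"GENE_A", "GENE_B"}, only GENE_A is returned, because GENE_B is under-expressed.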
Metabolic & Signaling Pathways
KEGG
– Kyoto Encyclopedia of Genes & Genomes
– 87 metabolic pathways, 1700 gene assignments
Biocarta
– Signaling pathways reviewed and entered by ‘expert’ biologists
– 215 signaling pathways, 3700 gene assignments
Pathway enrichment analysis
Identify pathways and functional groups of genes deregulated in particular cancer types
Enrichment Analysis using Kolmogorov-Smirnov Scanning (Lamb et al)
Kolmogorov-Smirnov Scanning (Lamb et al)
Ranked genes (* marks genes in the set): 1 2* 3 4* 5 6* 7* 8 9 10 11 12 13 14 15 16 17 18* 19 20
(1,2,3,4…,19,20) vs. (2,4,6,7,18)
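The toy example above, a ranked list of 20 genes against a set sitting at ranks 2, 4, 6, 7 and 18, can be scanned with a two-sample Kolmogorov-Smirnov statistic. The sketch below is one plain way to compute it and may differ in detail from the procedure of Lamb et al; the function name is made up, and no p-value is computed.

```python
def ks_scan(ranked_genes, gene_set):
    """Scan a ranked gene list and return (max deviation, rank where it occurs).

    The deviation at each rank is the fraction of set members seen so far
    minus the fraction of non-members seen so far.
    """
    members = set(gene_set) & set(ranked_genes)
    n, k = len(ranked_genes), len(members)
    if k == 0 or k == n:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hits = misses = 0
    best_stat, best_pos = 0.0, 0
    for pos, gene in enumerate(ranked_genes, start=1):
        if gene in members:
            hits += 1
        else:
            misses += 1
        deviation = hits / k - misses / (n - k)
        if abs(deviation) > abs(best_stat):
            best_stat, best_pos = deviation, pos
    return best_stat, best_pos


# The slide's example: ranks 1..20 versus the set at ranks 2, 4, 6, 7, 18.
stat, pos = ks_scan(list(range(1, 21)), {2, 4, 6, 7, 18})
```

Here the deviation peaks at rank 7, where four of the five set members but only three of the fifteen other genes have been seen (4/5 - 3/15 = 0.6), which reflects the set clustering near the top of the ranking.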
Pathway Enrichment
Liver vs. other normal tissues
Pathway Enrichment cont.
Pathway enrichment analysis
A search for the Biocarta pathways most enriched in a medulloblastoma signature (C2) uncovered involvement of the Ras/Rho pathway
Pathway enrichment analysis cont.
A direct link to the Biocarta pathway provides the details (medulloblastoma genes with red boxes)
Known Protein-Protein Interactions
HPRD
– Human Protein Reference Database
– Manually curated
– 20,000+ papers, 15,000+ distinct interactions
PKDB
– Protein Kinase Database
– Natural Language Processing
– 60,000+ abstracts suggest interaction, 16,000 distinct interactions
– Error prone
Co-RIF
– Locus Link Reference into Function
– 12,000+ co-RIFs
Human Interactome Map (www.himap.org)
INTERACT
Oncomine Tutorial Part II
Gene set filtering to identify therapeutic targets and biomarkers
Enrichment Analysis to identify pathways and processes deregulated in cancer
Pathway and protein interaction networks deregulated in cancer
Acknowledgements
Chinnaiyan Lab
– Radhika, Terry, Vasu, Jianjun, Scott, Soory
Pandey Lab
IOB
– Shanker, Nandan