Gene set analyses of genomic datasets
Andreas Schlicker, Jelle ten Hoeve, Lodewyk Wessels
Scenario
You have a gene expression dataset containing data from normal colon and adenoma samples.
- Which pathways are differentially regulated between normal and CRC samples?
- Do products of significantly differentially expressed genes have specific functions (Gene Ontology)?
- Is there a significant overlap with published expression signatures (mutations, response to treatment, ...)?
Overview
• Mapping probe sets to functional annotation
• Hypergeometric test (Fisher’s exact test)
• Gene Set Enrichment Analysis
• Global test
Mapping probe sets to functional annotation
Examples of functional annotation
• Pathway databases (e.g. KEGG, Pathway Interaction Database, ConsensusPathDB, www.pathguide.org/)
• Functional categories (e.g. Gene Ontology, FunCat)
• Enzyme Commission numbers, disease associations, protein domains, …
• Published gene signatures
Example KEGG pathway
http://www.genome.jp/kegg/kegg2.html
Gene Ontology
• Collection of three separate ontologies: biological process, molecular function, cellular component
• Organized in a graph structure, i.e. each term (concept, category) can have several parents
Gene Ontology (II)
Gene Ontology (III)
• Annotations with GO terms are assigned an evidence code:
G protein alpha subunit; GO:0060158 activation of phospholipase C …; ISS
• Different categories of evidence codes: experimental, computational, Author/Curator statement, fully automatic (IEA)
Details at http://www.geneontology.org/GO.evidence.shtml
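Evidence codes are often used to filter annotations before an analysis, for example to drop fully automatic (IEA) annotations. The following is a minimal sketch of such a filter, not a prescribed workflow. The grouping of codes is abbreviated, and the gene names and annotation tuples are hypothetical.

```python
# Abbreviated grouping of GO evidence codes into the categories named above.
EVIDENCE_CATEGORIES = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author/curator statement": {"TAS", "NAS", "IC", "ND"},
    "fully automatic": {"IEA"},
}


def evidence_category(code):
    """Return the category of an evidence code, or 'unknown' if not listed."""
    for category, codes in EVIDENCE_CATEGORIES.items():
        if code in codes:
            return category
    return "unknown"


def filter_annotations(annotations, excluded=("fully automatic",)):
    """Keep annotations whose evidence category is not in `excluded`."""
    return [a for a in annotations if evidence_category(a[2]) not in excluded]


# Hypothetical annotations: (gene product, GO term, evidence code).
annotations = [
    ("geneA", "GO:0060158", "ISS"),
    ("geneB", "GO:0060158", "IEA"),
    ("geneC", "GO:0060158", "IDA"),
]
kept = filter_annotations(annotations)
print(kept)
```

Whether IEA annotations should be excluded depends on the analysis; they usually make up a large share of all annotations.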
The true path rule
If a gene product is annotated with term A, all annotations with ancestors of A must also be valid.
• A gene product annotated with a given term can also be annotated with that term’s ancestors
• Different gene products are usually not annotated on the same level of the hierarchy
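The true path rule can be applied programmatically by adding all ancestors of each directly annotated term. Below is a minimal sketch on a toy ontology; the term names, the `PARENTS` mapping and the gene name are made up for illustration and do not correspond to real GO terms.

```python
# Toy ontology: term -> list of parent terms (a term may have several parents).
PARENTS = {
    "root": [],
    "B": ["root"],
    "C": ["root"],
    "D": ["B"],
    "A": ["D", "C"],
}


def ancestors(term, parents=PARENTS):
    """Return all ancestors of a term by following every parent link."""
    found = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def propagate(direct_annotations, parents=PARENTS):
    """Apply the true path rule: add the ancestors of every annotated term."""
    full = {}
    for gene, terms in direct_annotations.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        full[gene] = closed
    return full


# Hypothetical gene annotated directly with term A only.
full_annotations = propagate({"gene1": {"A"}})
print(sorted(full_annotations["gene1"]))
```

After propagation, gene sets built from terms higher in the hierarchy contain the genes annotated to any of their descendants.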
Hands on Time
The hypergeometric test / Fisher’s exact test
Basics
• Enrichment test
• Analysis steps:
  1. Single gene test (e.g. t-test for finding differentially expressed genes)
  2. Do the gene list from step 1 and the gene set overlap significantly?

| | diff. expressed | not diff. expressed |
|---|---|---|
| in gene set | | |
| not in gene set | | |
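The overlap question in step 2 is commonly formalized with the hypergeometric distribution. As a sketch, using symbols that do not appear in the slides: let $N$ be the number of genes measured, $K$ the number of genes in the gene set, $n$ the number of differentially expressed genes, and $k$ the number of genes in both. The one-sided enrichment p-value is then the probability of observing an overlap of at least $k$ by chance:

$$
p = P(X \ge k) = \sum_{i=k}^{\min(K,\,n)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
$$

This corresponds to the one-sided Fisher's exact test on the 2×2 table above.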
Example
• Microarray: 20000, MAPK: 100, diff. expressed: 200
• Fisher’s exact test: p = 0.26

| | diff. expressed | not diff. expressed | total |
|---|---|---|---|
| MAPK | 2 | 98 | 100 |
| not MAPK | 198 | 19702 | 19900 |
| total | 200 | 19800 | 20000 |
Example
• Microarray: 20000, MAPK: 100, diff. expressed: 200
• Fisher’s exact test: p = 0.0005

| | diff. expressed | not diff. expressed | total |
|---|---|---|---|
| MAPK | 6 | 94 | 100 |
| not MAPK | 194 | 19706 | 19900 |
| total | 200 | 19800 | 20000 |
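The two MAPK examples can be recomputed with a few lines of Python. This is a minimal sketch that assumes the reported values are one-sided (enrichment) p-values; the function name `enrichment_p_value` is made up for illustration. It sums the hypergeometric probabilities of observing at least the given overlap.

```python
from math import comb


def enrichment_p_value(total, in_set, selected, overlap):
    """One-sided hypergeometric p-value: P(overlap or more genes in common).

    total    -- number of genes on the array
    in_set   -- number of genes in the gene set
    selected -- number of differentially expressed genes
    overlap  -- number of differentially expressed genes in the gene set
    """
    upper = min(in_set, selected)
    favourable = sum(
        comb(in_set, i) * comb(total - in_set, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total, selected)


p_two = enrichment_p_value(20000, 100, 200, 2)
p_six = enrichment_p_value(20000, 100, 200, 6)
print(f"overlap of 2 genes: p = {p_two:.2f}")
print(f"overlap of 6 genes: p = {p_six:.4f}")
```

With an expected overlap of 100 × 200 / 20000 = 1 gene, two shared genes are unremarkable, while six shared genes are unlikely to occur by chance.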
Another Example
• Consider having data on treatment response and gene mutation for samples in a dataset
• Note: a threshold for resistance/sensitivity has to be chosen

| | Resistant | Sensitive | total |
|---|---|---|---|
| Mutated | | | |
| WT | | | |
| total | | | |
Problem with this approach
• Null hypothesis: genes in the gene set are randomly drawn → a significant result means that genes in the gene set are more alike than random genes
• Problem: the gene set has been selected such that the genes have something in common → false positives
Hands on Time
PAGE: Parametric Analysis of Gene Set Enrichment
Basics
• For each gene set and each sample:
  – How different is the mean expression of all genes in a gene set from the overall mean expression?
• Applied to full expression matrix
  – No need for selecting interesting genes (based on e.g. t-test)
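The comparison of the gene set mean with the overall mean can be expressed as a z-score. The sketch below assumes the usual PAGE form, (gene set mean − overall mean) × √m / overall standard deviation, where m is the gene set size. The use of the population standard deviation, the function names and the example values are assumptions for illustration.

```python
from math import erfc, sqrt
from statistics import mean, pstdev


def page_z_score(values, gene_set):
    """Z-score of the gene set mean against the mean of all genes in one sample.

    values   -- dict mapping gene name to its value in the sample
    gene_set -- collection of gene names belonging to the gene set
    """
    all_values = list(values.values())
    set_values = [values[g] for g in gene_set if g in values]
    mu = mean(all_values)
    sigma = pstdev(all_values)
    return (mean(set_values) - mu) * sqrt(len(set_values)) / sigma


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


# Hypothetical values for six genes in one sample.
sample = {"g1": 3.0, "g2": 1.0, "g3": 0.0, "g4": 0.0, "g5": -1.0, "g6": -3.0}
z = page_z_score(sample, {"g1", "g2"})
print(z, two_sided_p(z))
```

Applying `page_z_score` to every column of the expression matrix and every gene set gives one score per gene set and sample. The normal approximation is only reasonable for gene sets that are not too small.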
Basics
Problem with this approach
• What happens if one part of the pathway is up-regulated and another part is down-regulated?
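A small numeric illustration of this problem, using made-up log fold changes: when half of a pathway goes up and the other half goes down, the mean change is zero and a mean-based score sees nothing. The mean absolute change is shown only as a contrast; it is not part of the method described in these slides.

```python
from statistics import mean

# Hypothetical log fold changes for one pathway: two genes up, two genes down.
pathway = {"up1": 2.0, "up2": 1.5, "down1": -2.0, "down2": -1.5}

# Opposite changes cancel in the mean, so a mean-based score is zero.
mean_change = mean(pathway.values())

# The mean absolute change still reflects that the genes are changing.
mean_abs_change = mean(abs(v) for v in pathway.values())

print(mean_change, mean_abs_change)
```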
Hands on Time
The global test
Basics
• Group test
• Can the genes in the gene set predict the response?
• What is needed?
  – Clinical variable, e.g. normal vs. CRC
  – Gene expression, e.g. GSE8671
  – Gene sets, e.g. KEGG pathways
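The idea of testing a whole gene set against a clinical variable can be sketched as follows. This is a simplified illustration and not the implementation of the global test: it uses an unscaled score-type statistic (the mean over genes of the squared association between gene expression and the centered response) and obtains a p-value by permuting the sample labels. All function names and data are hypothetical.

```python
import random


def association_statistic(expression, response):
    """Mean over genes of the squared association with the centered response.

    expression -- list of genes, each a list of values (one per sample)
    response   -- list with one clinical value per sample (e.g. 0/1)
    """
    y_mean = sum(response) / len(response)
    centered = [y - y_mean for y in response]
    scores = [
        sum(x * c for x, c in zip(gene_values, centered)) ** 2
        for gene_values in expression
    ]
    return sum(scores) / len(scores)


def permutation_p_value(expression, response, n_perm=1000, seed=0):
    """Share of label permutations with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = association_statistic(expression, response)
    shuffled = list(response)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if association_statistic(expression, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene set with three genes measured in 10 samples (5 vs. 5).
response = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
expression = [
    [1.0, 1.2, 0.9, 1.1, 0.8, 3.0, 3.2, 2.9, 3.1, 2.8],   # higher in group 1
    [5.0, 5.1, 4.9, 5.2, 4.8, 3.0, 3.1, 2.9, 3.2, 2.8],   # lower in group 1
    [0.1, -0.2, 0.0, 0.2, -0.1, 0.0, 0.1, -0.1, 0.2, -0.2],  # unrelated
]
p = permutation_p_value(expression, response)
print(p)
```

Because the per-gene scores are squared, genes changing in opposite directions both contribute, and the unrelated third gene does not prevent the set as a whole from being significant.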
Interpretation
• Interpretation of significant test result (w.r.t. genes):
– Gene set is associated with clinical variable
– “On average” the genes in the set are associated with the clinical variable
– Not every gene needs to be associated
Interpretation
Interpretation
• Interpretation of significant test result (w.r.t. samples):
– Expression profile in the gene set differs for different values of the clinical variable
– Samples with similar value (clinical variable) have relatively similar expression profiles
Interpretation