Functional enrichment analysis
Enrichment analysis
Does my gene list (e.g. up-regulated genes between two conditions) contain more genes involved in a particular pathway or biological process (e.g. cell cycle) than expected?
• Hypergeometric test
• Fisher's exact test
• Binomial test (Bernoulli)
• Chi-square test
Enrichment analysis - For lists of genes
[Figure: the tested genes, split by "in list" and "with annotation" into four groups A, B, C, D]
Contingency table
Observed
                     in list        not in list    totals
with annotation      A              B              A+B
without annotation   C              D              C+D
totals               A+C            B+D            A+B+C+D=N
Expected
                     in list        not in list    totals
with annotation      (A+B)(A+C)/N   (A+B)(B+D)/N   A+B
without annotation   (C+D)(A+C)/N   (C+D)(B+D)/N   C+D
totals               A+C            B+D            N
Contingency table - example
[Figure: the tested genes (183), split by "in list" and "with annotation" into groups of 12, 35, 6 and 130]
Observed
                     in list   not in list   totals
with annotation      12        35            47
without annotation   6         130           136
totals               18        165           183
Expected
                     in list   not in list   totals
with annotation      4.6       42.4          47
without annotation   13.4      122.6         136
totals               18        165           183
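The expected counts in the example follow directly from the margins of the observed table (row total times column total, divided by N). A minimal Python sketch of that arithmetic; the function name `expected_counts` is only an illustrative choice, not something from the slides:

```python
# Observed counts from the example.
# Rows: with / without annotation. Columns: in list / not in list.
observed = [[12, 35],
            [6, 130]]


def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(x, 1) for x in row])
# [4.6, 42.4]
# [13.4, 122.6]
```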
Contingency table - example - Chi square test
O = Observed / E = Expected
Observed
                     in list   not in list   totals
with annotation      12        35            47
without annotation   6         130           136
totals               18        165           183
Expected
                     in list   not in list   totals
with annotation      4.6       42.4          47
without annotation   13.4      122.6         136
totals               18        165           183
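With O and E as defined on this slide, the usual Pearson statistic is the sum of (O - E)^2 / E over the four cells, compared against a chi-square distribution with 1 degree of freedom for a 2x2 table. The sketch below works this through for the example using only the standard library. It assumes no continuity correction is applied, and it relies on the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math

# Observed counts from the example.
observed = [[12, 35],
            [6, 130]]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Assumption: no continuity (Yates) correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n                      # expected count for this cell
            stat += (table[i][j] - e) ** 2 / e
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

For these counts the statistic comes out at about 17.57. One expected count (4.6) is below 5, a range where the chi-square approximation is commonly considered less reliable and an exact test is often preferred.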
Contingency table - example - Fisher’s exact test
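The slide content for this test did not survive, so the following is only a sketch of how a one-sided Fisher's exact test could be computed for the example counts (12, 35, 6, 130). It sums hypergeometric probabilities for tables with at least as many annotated genes in the list as observed, which is the same calculation as a one-sided hypergeometric test. The function names are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(k, n_total, n_annotated, n_list):
    """P(exactly k annotated genes in a list of n_list genes drawn from n_total)."""
    return (comb(n_annotated, k)
            * comb(n_total - n_annotated, n_list - k)
            / comb(n_total, n_list))


def fisher_one_sided(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test for the table [[a, b], [c, d]].

    a: in list, with annotation      b: not in list, with annotation
    c: in list, without annotation   d: not in list, without annotation
    """
    n_total = a + b + c + d
    n_annotated = a + b
    n_list = a + c
    upper = min(n_annotated, n_list)
    # Sum the probability of every table at least as extreme as the observed one.
    return sum(hypergeom_pmf(k, n_total, n_annotated, n_list)
               for k in range(a, upper + 1))


p_enrichment = fisher_one_sided(12, 35, 6, 130)
print(f"one-sided p = {p_enrichment:.2g}")
```

The one-sided form matches the question asked at the start (more annotated genes than expected); a two-sided variant would also count equally unlikely tables in the other direction.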
Enrichment analysis - For continuous values
Blue ticks = genes with an annotation
Are the values (e.g. expression values) of the genes involved in a particular pathway or biological process (e.g. cell cycle) biased towards high or low values?
Enrichment analysis - Gene Set Enrichment Analysis
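To illustrate the idea of testing whether annotated genes cluster at one end of a ranked list, here is a deliberately simplified sketch: an unweighted running-sum score (similar to a Kolmogorov-Smirnov statistic) with a gene-set permutation p-value. It is not the published GSEA procedure, which weights hits by their ranking metric and typically permutes sample labels. The gene names, the gene set, and both function names are made up for the example.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up at annotated genes, down otherwise.

    Assumes the gene set contains at least one, but not all, of the ranked genes.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running          # keep the largest deviation from zero
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random gene sets of equal size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(gene in gene_set for gene in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical data: 100 genes ranked from highest to lowest expression value,
# with the annotated genes concentrated near the top of the list.
ranked_genes = [f"g{i}" for i in range(1, 101)]
gene_set = {"g1", "g2", "g3", "g5", "g8", "g10", "g12", "g15"}

es, p_value = permutation_p_value(ranked_genes, gene_set)
print(f"ES = {es:.3f}, permutation p = {p_value:.3g}")
```

A positive score indicates that the annotated genes sit towards the top of the ranking, a negative score towards the bottom.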
Annotation sources (modules)
• Gene Ontology
– biological process, molecular function, cellular component
– Terms may have >1 “parent” (more general term)
– GO Slim: includes only general categories
• KEGG; REACTOME pathways
• Genes sharing a motif or regulated by the same protein/miRNA (experimental or predicted)
• Genes found on the same chromosome
• Broad’s Molecular Signatures Database (MSigDB)
• [any grouping that is biologically sensible]
Gene Ontology
www.geneontology.org
Multiple Test Problem!
Enrichment analysis on multiple annotations (e.g. all Gene Ontology terms / all KEGG pathways) involves multiple statistical tests
Multiple testing correction is needed
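The slides do not name a specific correction method. As an illustration, the sketch below implements two widely used ones: Bonferroni (controls the family-wise error rate) and Benjamini-Hochberg (controls the false discovery rate). The p-values are invented for the example, one per tested annotation term.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five enrichment tests.
raw = [0.001, 0.01, 0.03, 0.04, 0.5]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, b, q in zip(raw, bonf, bh):
    print(f"raw={p:<6} bonferroni={b:.3f} bh={q:.3f}")
```

Bonferroni is the more conservative of the two. When thousands of annotation terms are tested, a false-discovery-rate method such as Benjamini-Hochberg usually retains more terms.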
Limitations of functional enrichment analysis
• Annotation databases are incomplete
• Annotation bias in databases: some genes are more studied and consequently better annotated than others
• Terminology problem poses a challenge to data integration
• Some pieces of information may be imprecise or incorrect: some annotations in GO are inferred from electronic annotation without any expert human involvement (mainly annotations at very high levels of the ontology).