We got differentially expressed genes, now what?

Upload: athena-zimmerman

Post on 15-Jan-2016


TRANSCRIPT

Page 1: Http://  We got differentially expressed genes, now what ?

http://www.aitbiotech.com/images/microarray.jpg
http://www.pnas.org/content/104/51/20374/F4.large.jpg

We got differentially expressed genes, now what?
Find their functions, find what is enriched, and reduce false positives.

From gene lists to functional annotations

Page 2: Http://  We got differentially expressed genes, now what ?

The 3 Gene Ontologies

• Molecular Function = elemental activity/task: the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

• Biological Process = biological goal or objective: broad biological goals, such as DNA repair or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component = location or complex: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Modified from: http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt#287,33,Slide 33

Page 3: Http://  We got differentially expressed genes, now what ?

Example: Gene = hammer

Function (what)                  Process (why)
Drive a nail into wood           Carpentry
Drive a stake into soil          Gardening
Smash a bug                      Pest control
A performer's juggling object    Entertainment

http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt#284,34,Slide 34

Page 5: Http://  We got differentially expressed genes, now what ?

Mining human interactome

[Figure: known disease genes, direct interactions of disease genes, and indirect interactions of disease genes; the counts shown in the figure are 7, 66 and 778]

Which of these interactants are potential new candidates?

Prioritize candidate genes in the interacting partners of the disease-related genes

• Training sets: disease-related genes

• Test sets: interacting partners of the training genes

http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt#337,47,Slide 47

Page 6: Http://  We got differentially expressed genes, now what ?

A small sample of post-microarray analysis tools:

Database

• Panther (http://www.pantherdb.org)
• ToppGene
• STRING
• GOTM
• Onto-Tools
• TF networks (P.A.I.N.T)

Page 7: Http://  We got differentially expressed genes, now what ?

PANTHER™ Protein Classification System


http://www.pantherdb.org

Page 8: Http://  We got differentially expressed genes, now what ?

WHAT CAN I DO ON THE PANTHER SITE?

Protein ANalysis Through Evolutionary Relationships

Goal: The PANTHER site was designed to facilitate functional analysis of large numbers of genes, proteins or transcripts.

Tools:

• Explore protein families, molecular functions, biological processes and pathways.

• Generate lists of genes, proteins or transcripts that belong to a given protein family or subfamily, have a given molecular function, or participate in a given biological process or pathway, e.g. generate a candidate gene list for a disease.

• Analyze lists of genes, proteins or transcripts in batch mode according to categories based on family, molecular function, biological process or pathway, e.g. analyze mRNA microarray data.

Page 9: Http://  We got differentially expressed genes, now what ?

http://nar.oxfordjournals.org/cgi/content/full/31/1/334
http://genome.cshlp.org/content/13/9/2129.full
http://nar.oxfordjournals.org/cgi/content/full/33/suppl_1/D284
http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D247

Page 10: Http://  We got differentially expressed genes, now what ?

http://www.pantherdb.org/sitemap.jsp

Single gene search

Batch gene search


Page 11: Http://  We got differentially expressed genes, now what ?

Convert gene list IDs: Affy ID to gene symbol

http://david.abcc.ncifcrf.gov/tools.jsp

Affy IDs: 1788_S_AT, 36651_AT, 41788_I_AT, 35595_AT, 36285_AT, 39586_AT, 35160_AT, 39424_AT

Gene symbols: USP1, DDR1, WNT10B, PRKAR1B, MLL, CD44, GNA13, MMP15, IER3

Page 12: Http://  We got differentially expressed genes, now what ?

http://david.abcc.ncifcrf.gov/tools.jsp

• Paste the Affy ID list
• Select AFFY_ID as ID type
• Select List type: Gene List
• Submit list
• Select HOMO SAPIENS as species, press the select button
• Choose the Gene ID Conversion Tool
• Select: GENE_SYMBOL, submit and download the results

Page 13: Http://  We got differentially expressed genes, now what ?

Perform Panther Batch Search:

• Copy the gene symbol list and paste it into the Batch Search in Panther: http://www.pantherdb.org/ => Batch Search
• Select upload ID type: Gene Symbol
• Select File Type: ID list
• Result page: Genes
• Select 1 dataset: NCBI: H. sapiens
• Press the Search button
• Press the [icon not reproduced] and select: Biological process

Page 14: Http://  We got differentially expressed genes, now what ?

Panther Export Options

Click on either pie slices or bars to get sub-functions.
Click on links to get gene lists for the chosen function.

Page 15: Http://  We got differentially expressed genes, now what ?

http://www.pantherdb.org/genes/


Other Panther Options

Page 16: Http://  We got differentially expressed genes, now what ?

http://www.pantherdb.org/panther/ontologies.jsp

Task: find genes in a specific ontology (or in a few ontologies)

Panther vs GO molecular function and biological process

Browse for genes in ontologies


Other Panther Options

Page 17: Http://  We got differentially expressed genes, now what ?

Search PANTHER Pathway

http://www.pantherdb.org/pathway/

Add legend to pathway


Other Panther Options

Page 18: Http://  We got differentially expressed genes, now what ?

Compare classifications of multiple clusters of lists to a reference list to statistically determine over- or under- representation of PANTHER classification categories. Each list is compared to the reference list using the binomial test (Cho & Campbell, TIGs 2000) for each molecular function, biological process, or pathway term in PANTHER.
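The binomial comparison described above can be sketched in a few lines. This is an illustration of the statistical idea only, not PANTHER's own code: the function names and all counts are hypothetical, and the expected fraction is taken to be the share of reference-list genes that fall in the category.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def over_representation_p(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k): chance of seeing k or more category genes."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def under_representation_p(k: int, n: int, p: float) -> float:
    """Lower tail P(X <= k): chance of seeing k or fewer category genes."""
    return sum(binomial_pmf(i, n, p) for i in range(0, k + 1))


# Hypothetical counts, for illustration only.
reference_total = 20000   # genes in the reference list
reference_in_cat = 400    # reference genes annotated to the category
list_total = 100          # genes in the uploaded list
list_in_cat = 8           # uploaded genes annotated to the category

# Fraction of the reference list in the category = expected fraction in our list.
expected_fraction = reference_in_cat / reference_total
p_over = over_representation_p(list_in_cat, list_total, expected_fraction)
print(f"expected {list_total * expected_fraction:.1f} genes, "
      f"observed {list_in_cat}, p(over) = {p_over:.2e}")
```

A small upper-tail p-value suggests the category is over-represented in the list; the lower tail plays the same role for under-representation. Each category tested this way adds one more comparison, so the resulting p-values still need a multiple-testing correction.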

Map the genes in a gene expression data file to a PANTHER ontology. For pathways, you can then view the gene expression values overlaid on top of a pathway diagram, where genes are colored according to the expression value.

http://www.pantherdb.org/tools/

Gene expression tools


Other Panther Options

Page 19: Http://  We got differentially expressed genes, now what ?

Other Panther Options - GRAPHIC RESULTS

Play with graphics

[Screenshot labels: optional, default]

Page 20: Http://  We got differentially expressed genes, now what ?

http://toppgene.cchmc.org/
http://toppgene.cchmc.org/help/help.jsp

Portal for:
(i) gene list functional enrichment
(ii) candidate gene prioritization using either functional annotations or network analysis
(iii) identification and prioritization of novel disease candidate genes in the interactome.

Page 21: Http://  We got differentially expressed genes, now what ?

http://nar.oxfordjournals.org/cgi/reprint/gkp427v1

Hypergeometric distribution with Bonferroni correction

Page 22: Http://  We got differentially expressed genes, now what ?

http://stattrek.com/Tables/Hypergeometric.aspx

What is a hypergeometric experiment?

A hypergeometric experiment has the following characteristics:
• The population has size N, of which M items are successes.
• The researcher randomly selects a subset of n items from the population, without replacement.
Question: what is the probability that exactly k of the selected items are successes?

What is a hypergeometric distribution?

A hypergeometric distribution is a probability distribution. It refers to the probabilities associated with the number of successes in a hypergeometric experiment.

Example: We have a pack of 52 cards, 26 of which are black (success). We randomly select 12 cards out of the 52. What is the probability of having exactly 7 successes (black cards)? (0.21)
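The card example can be checked directly from the definition, P(X = k) = C(M, k) * C(N - M, n - k) / C(N, n). The sketch below is a minimal illustration using only the Python standard library; the function names are my own, and the upper-tail helper reflects the usual way such a probability is turned into an enrichment p-value (an assumption, since the slides do not spell out which tail the tools use).

```python
from math import comb


def hypergeom_pmf(k: int, N: int, M: int, n: int) -> float:
    """P(exactly k successes) when drawing n items without replacement
    from a population of N items, M of which are successes."""
    # Outside this range the outcome is impossible.
    if k < max(0, n - (N - M)) or k > min(n, M):
        return 0.0
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def hypergeom_upper_tail(k: int, N: int, M: int, n: int) -> float:
    """P(k or more successes): the tail typically used for enrichment."""
    return sum(hypergeom_pmf(i, N, M, n) for i in range(k, min(n, M) + 1))


# Card example from the text: 52 cards, 26 black, draw 12, exactly 7 black.
p_cards = hypergeom_pmf(7, N=52, M=26, n=12)
print(round(p_cards, 2))  # 0.21
```

For a gene list, N would be the number of genes in the background, M the background genes carrying an annotation, n the size of the submitted list, and k the list genes carrying that annotation.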

[Screenshots: hypergeometric calculator and hypergeometric calculator results]

Just 2 clarification slides…

Page 23: Http://  We got differentially expressed genes, now what ?

Statistical Corrections

http://cbi.labri.fr/outils/BlastSets/BlastSets_web_manual/principles.html

In many analyses of biological experiments, a great number of false positives are found among the results. When making multiple comparisons, we need to apply a statistical correction to our threshold to remove as many false positives as possible.

Commonly available statistical corrections:

Method: Bonferroni correction
  Complexity: simplest
  Time: fastest
  Approach: most conservative
  Results: keeps only the most significant results, removing all possible noise and putative results
  Drawback: a lot of significant information is removed along with the noise

Method: False Discovery Rate (FDR)
  Approach: less conservative
  Results: a good compromise between keeping only really significant hits and having too many false positives
  Drawback: some false positives remain

When detecting differentially expressed genes, we want to detect ONLY the differentially expressed genes, with no false positives!
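To make the difference between the two corrections concrete, here is a minimal sketch in plain Python. The slides only say "FDR"; the Benjamini-Hochberg procedure used below is one common way to control it and is my assumption, as are the function names and the raw p-values.

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (one common FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never
    # increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four tested terms.
raw = [0.01, 0.04, 0.03, 0.005]
alpha = 0.05
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
print("Bonferroni keeps:", [p for p, a in zip(raw, bonf) if a <= alpha])
print("FDR (BH) keeps:  ", [p for p, a in zip(raw, bh) if a <= alpha])
```

With these made-up values, Bonferroni retains only the two smallest p-values at a 0.05 threshold, while the FDR adjustment retains all four, which matches the "most conservative" versus "good compromise" contrast above.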

Page 24: Http://  We got differentially expressed genes, now what ?


Page 25: Http://  We got differentially expressed genes, now what ?


Example:

• Go to the ToppGene web page: http://toppgene.cchmc.org/
• Choose the ToppFun link.
• Copy the gene symbol list and paste it into the provided box, make sure that the entry type is HGNC symbol, and press the Submit Query button.
• Go to the bottom of the page, choose the FDR correction method for all features, and submit.
• Examine the details of the results, one at a time.

Page 26: Http://  We got differentially expressed genes, now what ?

Example:

a. Using ToppFun for gene list enrichment analysis: perform a gene list enrichment analysis on obesity-associated genes

Page 27: Http://  We got differentially expressed genes, now what ?


Page 28: Http://  We got differentially expressed genes, now what ?


Page 29: Http://  We got differentially expressed genes, now what ?

b. Using ToppGene for disease gene prioritization based on functional similarity to training set genes

Query: rank or prioritize a list of genes (test set) by functional annotation similarity to the training set.

ToppGene calculates a score and a p-value for the genes and functions.
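The idea of ranking a test set by annotation similarity to a training set can be illustrated with a toy scoring scheme. This is not ToppGene's actual scoring (the slides do not describe it) and it produces no p-values; the weighting rule, the function names, and the gene names and annotations below are all hypothetical.

```python
from collections import Counter


def training_profile(training: dict[str, set[str]]) -> dict[str, float]:
    """Weight each annotation term by the fraction of training genes carrying it."""
    counts = Counter(term for terms in training.values() for term in terms)
    return {term: c / len(training) for term, c in counts.items()}


def similarity_score(terms: set[str], profile: dict[str, float]) -> float:
    """Average profile weight over a gene's terms (0 if it has no terms)."""
    if not terms:
        return 0.0
    return sum(profile.get(t, 0.0) for t in terms) / len(terms)


def rank_test_set(training: dict[str, set[str]],
                  test: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return test genes sorted from most to least similar to the training set."""
    profile = training_profile(training)
    scored = [(gene, similarity_score(terms, profile)) for gene, terms in test.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: placeholder gene names with made-up annotation sets.
training = {
    "train1": {"lipid metabolism", "nucleus"},
    "train2": {"lipid metabolism", "ATPase activity"},
}
test = {
    "cand1": {"lipid metabolism"},
    "cand2": {"nucleus", "telomere"},
    "cand3": {"telomere"},
}
for gene, score in rank_test_set(training, test):
    print(gene, score)
```

Candidates whose annotations overlap most with terms that are common in the training set rise to the top, which is the intuition behind functional-similarity prioritization.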

Page 30: Http://  We got differentially expressed genes, now what ?

c. Using ToppNet for disease gene prioritization based on topological features in the protein-protein interaction network (PPIN)

Query: rank or prioritize a list of genes (test set) based on topological features in the PPIN.

Page 31: Http://  We got differentially expressed genes, now what ?


Page 32: Http://  We got differentially expressed genes, now what ?

d. Using ToppGenet to identify and prioritize the neighboring genes of the "seeds" (training set) in the protein-protein interaction network (PPIN)

Query: rank or prioritize a list of genes in the interactome of the training set genes using either functional similarity (ToppGene) or PPIN analysis (ToppNet).

Create the network by functional similarity (ToppGene) or network analysis (ToppNet). With distance to seeds set to 1, the test set comprises all genes that are immediate interactants of the training set genes.

• Purple nodes are the training set or seed genes.
• Grey nodes are the interactants from the test set.
• Green nodes (a subset of the grey ones) are the top-ranked genes from the test set.
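The distance-1 test set described above is simply the set of direct neighbours of the seed genes in the interaction network. The sketch below builds that set from an edge list and then orders it by the number of seed genes each candidate touches. That ordering is a deliberately simple stand-in, not the ToppGene or ToppNet scoring, and the node names and edges are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (gene_a, gene_b) interaction pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def immediate_interactants(adj, seeds):
    """Test set at distance 1: neighbours of any seed that are not seeds themselves."""
    seeds = set(seeds)
    test = set()
    for s in seeds:
        test |= adj.get(s, set())
    return test - seeds


def rank_by_seed_links(adj, seeds, test):
    """Order test genes by how many seed genes they interact with (ties by name)."""
    seeds = set(seeds)
    scored = [(g, len(adj[g] & seeds)) for g in test]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy network with placeholder node names.
edges = [
    ("seed1", "n1"), ("seed2", "n1"),
    ("seed2", "n2"), ("n2", "n3"),
    ("seed1", "seed2"),
]
seeds = {"seed1", "seed2"}
adj = build_adjacency(edges)
test_set = immediate_interactants(adj, seeds)
for gene, links in rank_by_seed_links(adj, seeds, test_set):
    print(gene, links)
```

In this toy network `n3` is two steps away from any seed, so it is excluded from the test set, while `n1` ranks above `n2` because it interacts with both seeds.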


Page 33: Http://  We got differentially expressed genes, now what ?

STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) (functional connectivity within a proteome)

http://string-db.org/

STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins.

Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms

Databases: MINT, HPRD, BIND, DIP, BioGRID, KEGG and Reactome, IntAct, EcoCyc, NCI-Nature Pathway Interaction Database and Gene Ontology (GO) protein complexes, SGD, OMIM, The Interactive Fly, and all abstracts from PubMed

A shift of focus to systems biology in the “post-genomic” era

Page 34: Http://  We got differentially expressed genes, now what ?


Page 35: Http://  We got differentially expressed genes, now what ?

http://bioinfo.vanderbilt.edu/gotm/


http://bioinfo.vanderbilt.edu/gotm/GOTM_Manual.pdf

Page 36: Http://  We got differentially expressed genes, now what ?

[Screenshot panels: bar graph; pathway details; input details; pathway gene details (all genes in pathway)]

Page 37: Http://  We got differentially expressed genes, now what ?

The apoptosis pathway as described by KEGG

[Figure legend: underexpressed genes, overexpressed genes]

Page 38: Http://  We got differentially expressed genes, now what ?

TF networks (P.A.I.N.T)

http://www.dbi.tju.edu/dbi/tools/paint/

Page 39: Http://  We got differentially expressed genes, now what ?

SUSPECTS is a server designed to automate the first steps of the candidate gene approach.

http://www.genetics.med.ed.ac.uk/suspects/search.shtml

BRCA1

The 3D boxes represent genes. Higher, brighter boxes represent better (higher scoring) candidates. The width of a box corresponds to the number of different types of evidence that contribute to its score. If a box is blue then a potentially relevant PubMed abstract has been found.

Page 40: Http://  We got differentially expressed genes, now what ?

http://www.genetics.med.ed.ac.uk/prospectr/

BRCA1:

PROSPECTR uses sequence features to rank genes in order of their likelihood of involvement in disease.