8. gene expression analysis microarrays - 2 · 2013-10-30 · 7 . computational tasks “gene...


8. GENE EXPRESSION ANALYSIS MICROARRAYS - 2

BIOINFORMATICS COURSE MTAT.03.239

30.10.2012


GENE EXPRESSION ANALYSIS MICROARRAYS

Slides adapted from Konstantin Tretyakov’s (2011/2012) and Kaur Alasoo’s (2012/2013) course slides


3 “Gene expression analysis - microarrays" Bioinformatics Course

FLOW OF GENETIC INFORMATION

http://www.nature.com/scitable/topicpage/gene-expression-14121669


GENE EXPRESSION is the presence of the gene’s product in the cell in the form of a protein or mRNA


FLOW OF GENETIC INFORMATION

gene expression http://www.nature.com/scitable/topicpage/gene-expression-14121669


QUESTIONS FOR GENE EXPRESSION

• How does gene expression differ between cell types?
• How does gene expression differ between normal and diseased cells (e.g. cancer)?
• How does gene expression change during an organism's life span?
• How is gene expression regulated: which genes regulate which, and how?
• How does gene expression change when a cell is treated with a drug?

http://www.cs.helsinki.fi/bioinformatiikka/mbi/courses/06-07/pcmda/slides/Microarrays_Brazma_lecture1.pdf


COMPUTATIONAL TASKS

Differential expression: which genes have different expression levels across two groups?

Clustering: which genes seem to be regulated together? Which treatments/individuals have similar expression profiles?

Classification: to which functional class does a given gene belong? To which class does a given sample belong (e.g. determining the cancer type)?

Visualization: how can these results be shown visually?

http://pages.cs.wisc.edu/~bsettles/ibs08/lectures/04-expressionanalysis.pdf

MANY WAYS OF LOOKING AT DATA

There is no single right way to view the data; use your imagination and invent new approaches.


EXAMPLE DATA


EXAMPLE DATASET


> library(ArrayExpress)

> library(affy)

# Download the experiment files

> affydata = ArrayExpress("E-GEOD-31215")

# Normalize the data

> normdata = rma( affydata )

> expdata = exprs( normdata )

# Set CEL file groups (the same order as in the expression matrix)

> k <- c( "ewsfli1", "empty", "ewsfli1", "empty", "ewsfli1", "empty", "ewsfli1", "empty" )


PREPROCESSING (SINGLE CHANNEL)

Background correction: using PM/MM (perfect match/mismatch) probes, or adjusting for probe GC content

Normalization (key assumption: most probes are not differentially expressed, so the distribution of intensities should be approximately the same across arrays)

Summarization: from probes to probesets (roughly, genes)
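RMA's normalization step is quantile normalization, which enforces the assumption above by giving every array exactly the same intensity distribution. A minimal base-R sketch for illustration only: the function name quantile_normalize and the simulated data are hypothetical, ties are broken by order of appearance (production code averages them), and in practice rma() or limma's normalizeBetweenArrays(x, method = "quantile") does this for you.

```r
# Minimal quantile normalization sketch (base R only).
# x: numeric matrix, rows = probes, columns = arrays; assumes no NAs.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its array
  sorted <- apply(x, 2, sort)                          # each array's intensities, ascending
  ref    <- rowMeans(sorted)                           # reference: mean of the i-th smallest values
  out    <- apply(ranks, 2, function(r) ref[r])        # replace each value by the reference value of its rank
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
x <- matrix(rexp(40), nrow = 10)   # 10 simulated probes on 4 arrays
x[, 2] <- x[, 2] * 3               # array 2 is systematically brighter
xn <- quantile_normalize(x)
apply(xn, 2, sort)                 # every column now has the same sorted values
```

The within-array ordering of probes is unchanged; only the intensity values are replaced, which is why the method relies on most probes not being differentially expressed.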

http://www.bioconductor.org/help/course-materials/2010/SeattleJan10/day2/PreProcessing.pdf



DIFFERENTIAL EXPRESSION

To understand the effect of a drug, we might want to know which genes are up-regulated (increased expression) or down-regulated (decreased expression) in the treatment group compared with the control group.

Find genes with different expression between conditions


DIFFERENTIAL EXPRESSION METHODS

• a t-test or one of its variants

• Limma R package > library(limma)

• RankProd R package > library(RankProd)

• Fold change
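A plain per-gene t-test together with the log2 fold change can be sketched in base R as below. This is an illustration under stated assumptions: the helper diff_expr and the simulated matrix are hypothetical, the data are assumed to be on the log2 scale (as returned by rma()), and the group labels mirror the vector k. limma's moderated t-test borrows variance information across genes and is usually preferred with so few replicates.

```r
# Per-gene Welch t-test and log2 fold change (base R only).
# expr: log2 expression matrix, rows = genes, columns = samples.
# groups: one label per column, e.g. the vector k from the example dataset.
diff_expr <- function(expr, groups, case = "ewsfli1", control = "empty") {
  is_case <- groups == case
  is_ctrl <- groups == control
  # On the log2 scale a difference of means is the log2 fold change
  logFC <- rowMeans(expr[, is_case, drop = FALSE]) -
           rowMeans(expr[, is_ctrl, drop = FALSE])
  p <- apply(expr, 1, function(g) t.test(g[is_case], g[is_ctrl])$p.value)
  data.frame(logFC = logFC, P.Value = p, adj.P.Val = p.adjust(p, method = "BH"))
}

set.seed(42)
k <- rep(c("ewsfli1", "empty"), 4)                 # same alternating layout as the example
expr <- matrix(rnorm(200 * 8, mean = 6), nrow = 200)
expr[1:10, k == "ewsfli1"] <- expr[1:10, k == "ewsfli1"] + 3  # 10 planted up-regulated genes
res <- diff_expr(expr, k)
head(res[order(res$P.Value), ])
```

The column names follow limma's topTable() output (logFC, P.Value, adj.P.Val) only to make the two easy to compare.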


DIFFERENTIAL EXPRESSION METHODS

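> library(limma)

# Design matrix with one column per group (factor levels are sorted alphabetically)

> mm = model.matrix(~ as.factor( k ) - 1)

> colnames(mm) = c( "empty", "ewsfli1" )

# Fit a linear model for every probeset

> fit = lmFit(expdata, mm)

# Compare the ewsfli1 group against the empty group

> contr = makeContrasts( ewsfli1 - empty, levels = colnames(mm) )

> fit = contrasts.fit(fit, contr)

# Moderated t-statistics (empirical Bayes)

> fit = eBayes(fit)

# -1 / 0 / 1 = down / not significant / up at FDR <= 0.05

> dT = decideTests(fit, adjust.method="fdr", p.value=0.05)

# All probesets, ranked by significance (probeset IDs are the row names)

> tT = topTable( fit, number = Inf )

> up = rownames( tT )[ tT$logFC > 0 & tT$adj.P.Val <= 0.05 ]

> down = rownames( tT )[ tT$logFC < 0 & tT$adj.P.Val <= 0.05 ]

> table( dT )

dT
   -1     0     1
   17 54495   163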



CO-EXPRESSION

Find similarly behaving genes using correlation or distance metrics:

• use dist() for distance measures in R
• use cor() for correlation measures in R

Unsupervised data exploration – clustering:

• use hclust() for hierarchical clustering in R
• use kmeans() for k-means clustering in R
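For example, the genes most co-expressed with a query gene can be ranked by their Pearson correlation with it. A sketch on simulated data; the gene names and the planted shared pattern are hypothetical.

```r
# Rank genes by Pearson correlation with a query gene (simulated data).
set.seed(7)
n_samples <- 20
pattern <- rnorm(n_samples)                                   # shared expression pattern
expr <- rbind(
  t(replicate(5, pattern + rnorm(n_samples, sd = 0.2))),      # genes 1-5 follow the pattern
  matrix(rnorm(15 * n_samples), nrow = 15)                    # genes 6-20 are unrelated noise
)
rownames(expr) <- paste0("gene", 1:20)

r <- cor(t(expr))                                    # cor() correlates columns, so transpose to compare genes
partners <- sort(r["gene1", -1], decreasing = TRUE)  # drop gene1 itself, strongest first
head(partners, 4)
```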


HOW TO SEE MANY DIMENSIONS

> rsamp = sample( 1:nrow( expdata ), 25 )

> expdata.sample = expdata[rsamp,]

> image( t( expdata.sample ) )


HOW TO SEE MANY DIMENSIONS

> rsamp = sample( 1:nrow( expdata ), 100 )

> expdata.sample = expdata[rsamp,]

> image( t( expdata.sample ) )

> heatmap( expdata.sample )


HOW TO SEE MANY DIMENSIONS

> plot( expdata.sample[1,], type = "l", xlab = "Experiments", ylab = "intensity", ylim = c( 2, 11 ) )

> for( i in 2:50 ) lines( expdata.sample[i, ], col=i )


PROJECT INTO 2-DIMENSIONS

> plot( expdata.sample[,1], expdata.sample[,2], xlab = colnames( expdata.sample)[1], ylab = colnames( expdata.sample)[2] )


PRINCIPAL COMPONENT ANALYSIS (PCA)

> pc = prcomp( expdata, retx = TRUE )

> plot( pc )

> plot( pc$x[,c(1,2)])


PRINCIPAL COMPONENT ANALYSIS (PCA)

> pc = prcomp( t(expdata), retx = TRUE )

> plot( pc )

> plot( pc$x[,c(1,2)])
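The screeplot from plot( pc ) shows each component's variance; the proportions can also be computed directly from pc$sdev. A sketch on simulated data with a hypothetical 4 vs 4 design in which one group has a shifted block of genes.

```r
# Proportion of variance explained by each principal component (simulated data).
set.seed(3)
group <- rep(c("A", "B"), each = 4)                          # hypothetical 4 vs 4 design
expr <- matrix(rnorm(500 * 8), nrow = 500)                   # 500 genes x 8 samples
expr[1:100, group == "B"] <- expr[1:100, group == "B"] + 4   # 100 genes shifted in group B

pc <- prcomp(t(expr))                                        # samples are the observations
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
round(var_explained[1:3], 3)
```

With samples as observations, samples from the same group are expected to separate along the leading component; plot( pc$x[, 1:2], col = ifelse( group == "A", "black", "red" ) ) draws the score plot as on the slide.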


PRINCIPAL COMPONENT ANALYSIS (PCA) [5372 chips] http://www.nature.com/nbt/journal/v28/n4/fig_tab/nbt0410-322_F1.html


KEGGANIM http://biit.cs.ut.ee/kegganim/


DISTANCE BETWEEN GENES Euclidean distance

> dist( expdata.sample[c(1:2),], method = "euclidean" )

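Correlation distance

> x = expdata.sample[1, ] # expression profile of the first gene

> y = expdata.sample[2, ] # expression profile of the second gene

> covariance = ( sum( ( x - mean( x ) ) * ( y - mean( y ) ) ) ) / ( length( x ) - 1 )

> pearson = covariance / ( sd( x ) * sd( y ) )

> 1 - pearson

# The same with built-in functions

> covariance = cov( x, y )

> pearson = cor.test( x, y )

> 1 - pearson$estimate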
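The manual formula can be checked against cor(), and the whole gene-by-gene correlation-distance matrix can be built at once for use with hclust(). A sketch on a simulated stand-in for expdata.sample; the random values are an assumption, not the course data.

```r
# Correlation distance (1 - Pearson r) between genes; simulated stand-in data.
set.seed(11)
expdata.sample <- matrix(rnorm(5 * 8), nrow = 5,
                         dimnames = list(paste0("gene", 1:5), NULL))
x <- expdata.sample[1, ]
y <- expdata.sample[2, ]

# Manual sample covariance and Pearson correlation, as on the slide
covariance <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
pearson <- covariance / (sd(x) * sd(y))

# All pairwise distances at once; cor() correlates columns, hence the transpose
d <- as.dist(1 - cor(t(expdata.sample)))
hc <- hclust(d, method = "average")   # the distance object can be passed straight to hclust()
```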


CORRELATION


CLUSTERING is grouping genes so that similar genes are in the same group and genes different from each other are in separate groups


HIERARCHICAL CLUSTERING

> c = hclust( dist( expdata.sample, "euclidean" ), "complete" )
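A tree from hclust() can be cut into a chosen number of gene clusters with cutree(). A sketch on simulated data with two planted groups of genes; the data are hypothetical, and the tree is stored as hc rather than c so that it does not mask the name of the c() function.

```r
# Cut a hierarchical clustering tree into k groups of genes (simulated data).
set.seed(2)
expdata.sample <- rbind(matrix(rnorm(10 * 6, mean = 2), nrow = 10),  # 10 "low" genes
                        matrix(rnorm(10 * 6, mean = 9), nrow = 10))  # 10 "high" genes
hc <- hclust(dist(expdata.sample, "euclidean"), "complete")
groups <- cutree(hc, k = 2)   # one cluster label per gene (row)
table(groups)
# plot(hc) draws the dendrogram
```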


K-MEANS CLUSTERING

> pc = prcomp( expdata , retx = TRUE ) # PCA

> c = kmeans( expdata, 20) # kmeans clustering

> plot( pc$x[,c(1,2)], col = c$cluster )


K-MEANS CLUSTERING

> c$size

[1] 1288 5279 807 330 6332 1742 2338 600 2484 2556 3587 7102 357 173 6387 2592 375 2754 3114 4478


K-MEANS CLUSTERING

> par( mfrow=c(3,4))

> for( i in 9:20 ) plot( c$centers[i,], type = "l", col = i )


K-MEANS CLUSTERING

> par( mfrow=c(1,1))

> my.expdata = expdata[ which( c$cluster == 14 ), ]

# Center each gene's profile on its own mean

> for ( i in 1:nrow( my.expdata )) my.expdata[i,] = my.expdata[i,] - mean( my.expdata[i,] )

> plot( my.expdata[1,], type = "l", ylim = range( my.expdata ) )

> for( i in 2:nrow( my.expdata ) ) lines( my.expdata[i, ], col=i )
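The row-centering loop above can also be written as a single vectorized call to sweep(). A sketch that checks both give the same result on a small simulated matrix (the data are hypothetical).

```r
# Subtract each gene's (row's) mean in one vectorized step.
set.seed(5)
my.expdata <- matrix(rnorm(30, mean = 8), nrow = 5)   # 5 simulated genes x 6 samples

centered_loop <- my.expdata
for (i in 1:nrow(centered_loop))
  centered_loop[i, ] <- centered_loop[i, ] - mean(centered_loop[i, ])

centered <- sweep(my.expdata, 1, rowMeans(my.expdata))  # default FUN is "-"
all.equal(centered, centered_loop)                      # TRUE
```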


CLUSTERING

http://www.bioconductor.org/packages/release/BiocViews.html#___Clustering


MEM http://biit.cs.ut.ee/mem/index.cgi


FUNCTIONAL ANALYSIS maps genes or genomic regions to biological annotations such as ontology categories, pathways and disease states, giving insight into the genes' functions in biological processes and physiological states.

GENE ONTOLOGY tries to unify the representation of gene and gene product attributes across all species.

It aims to:

• maintain and develop its controlled vocabulary of gene and gene product attributes
• annotate genes and gene products, and assimilate and disseminate annotation data
• provide tools for easy access to all aspects of the data

GENE ONTOLOGY

> up

"205440_s_at" "1554384_at" "203936_s_at" "1554385_a_at" "235874_at" "217561_at" "223644_s_at" "204818_at" "202746_at" "205306_x_at" "209616_s_at" "211138_s_at" "203929_s_at" "208072_s_at" "205826_at" "222919_at" "230650_at" "223809_at" "206645_s_at" "235944_at" "220948_s_at" "229850_at" "241436_at" "206401_s_at" "219837_s_at" "228863_at" "202419_at" "205535_s_at" "201648_at" "208433_s_at" "228715_at" "204201_s_at" "219427_at" "1569256_a_at" "209685_s_at" "213201_s_at" "240950_s_at" "238417_at" "202747_s_at" "1558279_a_at" "206190_at" "205656_at" "204087_s_at" "1552508_at" "209791_at" "207957_s_at" "206326_at" "210941_at" "227289_at" "227115_at" "205307_s_at" "243856_at" "204916_at" "206191_at" "219551_at" "239297_at" "204229_at" "217783_s_at" "204364_s_at" "218976_at" "228224_at" "216080_s_at" "229139_at" "228737_at" "205534_at" "227610_at" "213933_at" "229485_x_at" "210964_s_at" "235079_at" "230593_at" "202709_at" "207178_s_at" "224963_at" "209541_at" "202023_at" "204223_at" "214455_at" "202421_at" "242817_at" "224959_at" "215695_s_at" "225379_at" "235924_at" "218182_s_at" "205818_at" "219908_at" "229040_at" "227875_at" "217495_x_at" "205227_at" "39966_at" "225564_at" "219806_s_at" "225864_at" "45288_at" "227405_s_at" "206595_at" "224178_s_at" "204365_s_at" "222379_at" "229383_at" "226865_at" "209652_s_at" "1553878_at" "209757_s_at" "205932_s_at" "205899_at" "220108_at" "204105_s_at" "213368_x_at" "225619_at" "201976_s_at" "206002_at" "228262_at" "205097_at" "228214_at" "227498_at" "37986_at" "229242_at" "227750_at" "203928_x_at" "206915_at" "230839_at" "221011_s_at" "238455_at" "57588_at" "227933_at" "201562_s_at" "212397_at" "214807_at" "221552_at" "232136_s_at" "210904_s_at" "228640_at" "228981_at" "205637_s_at" "202637_s_at" "204140_at" "236193_at" "228955_at" "218162_at" "239537_at" "218831_s_at" "213353_at" "223366_at" "215043_s_at" "201418_s_at" "219343_at" "219892_at" "205051_s_at" "227497_at" "227995_at" "213644_at" "221530_s_at" "226106_at" "229041_s_at" "227647_at" "227536_at" "220094_s_at" "222760_at" "229580_at" "231887_s_at"

> down

"210119_at" "226722_at" "219308_s_at" "227565_at" "217997_at" "201810_s_at" "226873_at" "207604_s_at" "201983_s_at" "212298_at" "202795_x_at" "1556308_at" "220260_at" "212642_s_at" "217996_at" "1555216_a_at" "210296_s_at"


http://biit.cs.ut.ee/gprofiler



http://biit.cs.ut.ee/gprofiler

UP-REGULATED GENES


http://biit.cs.ut.ee/gprofiler

DOWN-REGULATED GENES

MEASURE OF “INTERESTINGNESS”

P[a randomly chosen cluster will have at least as many group representatives as our cluster]

HYPERGEOMETRIC DISTRIBUTION is a discrete probability distribution that describes the probability of k successes in n draws, without replacement, from a finite population of size N that contains m successes.
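In the notation of the definition above, the probability of exactly k successes, and the upper tail that serves as the enrichment p-value ("at least as many"), are:

```latex
P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}},
\qquad
P(X \ge k) = \sum_{i=k}^{\min(m,\,n)} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}
```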


HYPERGEOMETRIC DISTRIBUTION

HYPERGEOMETRIC DISTRIBUTION: example for the GO term “GO:0045596”

• N – total number of genes: 14611
• m – number of genes annotated to the GO term: 433
• n – number of up-regulated genes (the draws): 88
• k – number of up-regulated genes annotated to the GO term: 10

# P(X >= 10); note that 1 - phyper( 10, ... ) would give P(X > 10)

> phyper( 9, 433, 14611 - 433, 88, lower.tail = FALSE )

[1] 0.0002697698

# P(X = 10)

> dhyper( 10, 433, 14611 - 433, 88 )

[1] 0.0002137613
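The upper tail P(X ≥ k) can be computed in several equivalent ways in base R: directly with phyper( k - 1, ..., lower.tail = FALSE ), by summing dhyper() point probabilities, or as a one-sided Fisher's exact test on the corresponding 2 × 2 table. A sketch using the numbers from the GO:0045596 example.

```r
# Upper-tail hypergeometric p-value for the GO:0045596 example
N <- 14611  # total number of genes
m <- 433    # genes annotated to the GO term
n <- 88     # up-regulated genes (the draws)
k <- 10     # up-regulated genes annotated to the GO term

# lower.tail = FALSE returns P(X > q), so q = k - 1 gives P(X >= k)
p_upper <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# Same value as a sum of point probabilities P(X = k) + ... + P(X = min(m, n))
p_sum <- sum(dhyper(k:min(m, n), m, N - m, n))

# Same value as a one-sided Fisher's exact test on the 2 x 2 table
tab <- matrix(c(k, m - k, n - k, N - m - n + k), nrow = 2,
              dimnames = list(c("up", "not up"), c("in GO", "not in GO")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
c(p_upper, p_sum, p_fisher)
```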

ANNOTATING CLUSTERS

For each functional category:

• Count how many of the cluster's genes belong to the category
• Count how many genes belong to the category in total
• Estimate the probability of getting the same (or a more extreme) result by chance (p-value)

Keep those categories whose p-value is smaller than 0.05


DO NOT FORGET MULTIPLE TESTING CORRECTION!
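The annotation procedure, including the multiple testing correction, can be sketched as a small function that tests one cluster against a list of categories and adjusts the p-values with Benjamini-Hochberg via p.adjust(). The function annotate_cluster, the gene universe and the categories are all hypothetical.

```r
# Enrichment of one cluster against several functional categories, with BH correction.
# All gene identifiers and categories here are hypothetical.
annotate_cluster <- function(cluster, categories, universe) {
  N <- length(universe)          # total number of genes
  n <- length(cluster)           # genes in the cluster (the draws)
  stats <- t(sapply(categories, function(cat_genes) {
    m <- length(cat_genes)                        # genes in the category
    k <- length(intersect(cluster, cat_genes))    # cluster genes in the category
    c(k = k, m = m, p = phyper(k - 1, m, N - m, n, lower.tail = FALSE))
  }))
  res <- as.data.frame(stats)
  res$p.adj <- p.adjust(res$p, method = "BH")     # multiple testing correction
  res[order(res$p), ]
}

universe <- paste0("g", 1:1000)
categories <- list(catA = paste0("g", 1:50),
                   catB = paste0("g", 501:600),
                   catC = paste0("g", 901:1000))
cluster <- paste0("g", c(1:20, 601:610))
annotate_cluster(cluster, categories, universe)
```

Categories would then be assigned by comparing p.adj, not p, with 0.05.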

MULTIPLE TESTING PROBLEM

The problem of multiplicity arises because, as we increase the number of hypotheses tested, we also increase the chance of witnessing a rare event, and therefore the chance of rejecting a null hypothesis that is true. At a p-value cut-off of 0.05, we expect to assign a cluster to about 5 out of 100 categories purely by chance; with 100 000 categories, that becomes about 5000 chance assignments.

> dT = decideTests(fit, adjust.method="none", p.value=0.05)

> table( dT )

dT
   -1     0     1
 2102 50459  2114

Without correction, 2102 + 2114 = 4216 probesets are called significant, compared with 17 + 163 = 180 after FDR correction.
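The effect is easy to reproduce by simulation: when no gene is truly differentially expressed, about 5% of tests still fall below 0.05. A sketch with 10 000 simulated null genes and the same 4 vs 4 layout as the example (the data are hypothetical).

```r
# 10 000 simulated genes with NO true difference between two groups of 4 samples
set.seed(123)
groups <- rep(c("ewsfli1", "empty"), 4)    # same layout as the example vector k
null_expr <- matrix(rnorm(10000 * 8), nrow = 10000)
# Student's t-test: with normal data and equal variances, null p-values are uniform
p_null <- apply(null_expr, 1, function(g)
  t.test(g[groups == "ewsfli1"], g[groups == "empty"], var.equal = TRUE)$p.value)

sum(p_null < 0.05)                           # roughly 5% of 10 000, i.e. about 500
sum(p.adjust(p_null, method = "BH") < 0.05)  # far fewer after FDR correction
```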

BONFERRONI CORRECTION: the p-value cut-off that determines significant assignments is divided by the number of tests.

For 100 000 GO categories, keep only those categories whose p-value <= 0.05 / 100 000 = 5e-07.

> pvalues = tT[, "P.Value"]

> p.adjust( pvalues, method="bonferroni", n = length( pvalues ) )
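Bonferroni adjustment is equivalent to multiplying each p-value by the number of tests and capping the result at 1, which is what p.adjust() returns. A sketch on hypothetical p-values.

```r
set.seed(8)
pvalues <- runif(20)^3          # hypothetical p-values, skewed towards 0
m <- length(pvalues)

bonf_manual <- pmin(1, pvalues * m)                # multiply by the number of tests, cap at 1
bonf <- p.adjust(pvalues, method = "bonferroni")

# Same decisions: adjusted p <= 0.05 exactly when raw p <= 0.05 / m
cbind(raw = pvalues, adjusted = bonf)[order(pvalues)[1:5], ]
```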

FALSE DISCOVERY RATE (FDR) methods are designed to control the expected proportion of incorrectly rejected null hypotheses (“false discoveries”) among all rejected hypotheses.

To control the FDR at level δ (Benjamini-Hochberg procedure):

• Order the unadjusted p-values: p1 ≤ p2 ≤ … ≤ pm
• Find the test with the highest rank, j, for which the p-value pj is less than or equal to (j / m) · δ
• Declare the tests of rank 1, 2, …, j significant (reject their null hypotheses); do not reject the others

> pvalues = tT[, "P.Value"]

> p.adjust( pvalues, method="BH", n = length( pvalues ) )

FALSE DISCOVERY RATE

> pvalues = sort( tT[ , "P.Value"] ) # p-values in ascending order

> m = length( pvalues )

> threshold = ( 1:m / m ) * 0.05 # BH critical values (j / m) * δ, with δ = 0.05

> table( pvalues <= threshold )

FALSE  TRUE
54495   180

> max( which( pvalues <= threshold ) ) # j: the highest rank that meets its critical value

[1] 180
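The Benjamini-Hochberg adjusted p-values returned by p.adjust( ..., method = "BH" ) can be reproduced by hand: each adjusted value is the smallest (m / j) · p(j) over all ranks j at or above its own, capped at 1. A sketch (bh_adjust is a hypothetical helper name) that checks the result against p.adjust() and against the step-up rule on hypothetical p-values.

```r
# Benjamini-Hochberg adjusted p-values by hand (bh_adjust is a hypothetical helper)
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  ranks <- m:1                              # ascending rank j of each of those p-values
  adj <- cummin(pmin(1, p[o] * m / ranks))  # running minimum of (m / j) * p(j) from the top
  adj[order(o)]                             # restore the original order
}

set.seed(9)
p <- runif(50)^4                            # hypothetical p-values
adj <- bh_adjust(p)

# Step-up rule: j = highest rank with p(j) <= (j / m) * 0.05 (0 if there is none)
j <- max(c(0, which(sort(p) <= (seq_along(p) / length(p)) * 0.05)))
c(rejected_by_adjusted = sum(adj <= 0.05), step_up_j = j)
```

Both routes reject the same j hypotheses; the adjusted values have the advantage that any FDR level can be applied afterwards without recomputing.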