course on microarray gene expression analysis ::: differential...

Daniel Rico [email protected] ::: Differential Expression Analysis Course on Microarray Gene Expression Analysis Bioinformatics Unit CNIO


Page 1: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,

Daniel Rico [email protected]

::: Differential Expression Analysis

Course on Microarray Gene Expression Analysis

Bioinformatics Unit CNIO

Page 2

[Figure: analysis pipeline. Stages: image analysis; comparison (normalization and filtering); data analysis. Ratios (or not…) are log2 transformed, and each gene is then classed as no change, upregulation, or downregulation.]

Differential expression analysis

Page 3

“To consult a statistician after an experiment is finished is often merely to ask him to conduct a post-mortem examination. He can perhaps say what the experiment died of.”

Ronald A. Fisher: Indian Statistical Congress, 1938, vol. 4, p. 17

Differential expression analysis

::: Ask a statistician… or us, if you can’t find one!

ASK BEFORE DOING THE EXPERIMENTS!!!!!

Page 4

“Block what you can, randomize what you cannot”

Differential expression analysis

::: Principles of Experimental Design

1.  Replication. It allows the experimenter to obtain an estimate of the experimental error.

2.  Randomization. It requires the experimenter to use a random choice for every factor that is not of interest but might influence the outcome of the experiment. Such factors are called nuisance factors. Ex.: printing of replicate spots on the array.

3.  Blocking. A method of creating homogeneous blocks of data in which the nuisance factor is kept constant and the factor of interest is allowed to vary. It is used to increase the accuracy with which the influence of the various factors is assessed in a given experiment. Ex.: the microarray slide itself.

Page 5

At least 5 replicates per class (biological!!!!!)

a) Biological replicates: Tumor 1A, Tumor 1B, Tumor 1C, Tumor 1D, each on its own array (Array 1, Array 2, Array 3, Array 4)

b) Technical replicates: Tumor 1 on several arrays (Array 1, Array 2, Array 3)

Differential expression analysis

::: Replication

Page 6

Differential expression analysis

::: Randomization

[Figure: spot layout on the slide, not randomized vs. randomized.]

Each gene is spotted in quadruplicate: randomize the position on the slide.

Page 7

Differential expression analysis

::: Blocking

[Figure: three experiments (Exp. 1, Exp. 2, Exp. 3), each with Control, T1 and T2 samples.]

RNA extracts: Control on Day 1, T1 on Day 2, T2 on Day 3

Treatment and RNA extraction days are confounded!!!

Page 8

Differential expression analysis

::: Blocking

Make coherent blocks:

[Figure: the Control, T1 and T2 RNA extracts of each experiment are processed together: Exp. 1 on Day 1, Exp. 2 on Day 2, Exp. 3 on Day 3.]
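A minimal sketch of such a blocked layout, assuming the three conditions (Control, T1, T2) and three RNA extraction days shown on the slide. The function name `blocked_design` and the seeded shuffle are illustrative assumptions, not part of the course material.

```python
import random

def blocked_design(conditions, blocks, seed=0):
    """Return {block: [conditions in random processing order]}.

    Every block (e.g. an RNA extraction day) contains every condition once,
    so the block effect is not confounded with the condition effect.
    """
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = list(conditions)
        rng.shuffle(order)  # randomize what is not blocked: order within the day
        design[block] = order
    return design

design = blocked_design(["Control", "T1", "T2"], ["Day 1", "Day 2", "Day 3"])
for day, order in design.items():
    print(day, order)
```

Each day now holds one sample of every condition, so a day effect shifts all conditions equally instead of masquerading as a treatment effect.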

Page 9

- Fold change: expression ratio between 2 groups (e.g. tumor/control). Differentially expressed genes (DEG) are selected if they pass a cut-off.

E.g. 2.5 (Schena et al.), 3 (DeRisi)

The statistical significance of a change depends on the variability within groups and between groups, and this variability (variance) differs greatly for each gene.

[Figure: three panels comparing Class A (control) and Class B (tumor): medium variability, high variability, low variability.]

The fold change approach simply ignores this information (that you have!!!)

To test for significant changes, we must perform a statistical test for each gene to obtain a p-value.

Differential expression analysis

::: Fold change is NOT the way!
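To make the point about variability concrete, here is a small sketch with two invented genes that share the same log2 fold change but differ in within-group spread. The data values and the helper name `pooled_t` are hypothetical; the statistic is the usual equal-variance two-sample t.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance t test)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical log2 expression values; both genes have the same mean difference.
low_var = {"control": [7.9, 8.0, 8.1, 8.0], "tumor": [8.9, 9.0, 9.1, 9.0]}
high_var = {"control": [6.0, 10.0, 7.0, 9.0], "tumor": [7.0, 11.0, 8.0, 10.0]}

for name, gene in (("low variability", low_var), ("high variability", high_var)):
    diff = mean(gene["tumor"]) - mean(gene["control"])
    t = pooled_t(gene["control"], gene["tumor"])
    print(f"{name}: log2 fold change = {diff:.2f}, t = {t:.2f}")
```

A fold-change cut-off treats both genes identically, while the t statistic is more than twenty times larger for the low-variability gene.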

Page 10

Differential expression analysis

::: Nine steps for hypothesis testing

1.  State the problem.

2.  State the null and alternative hypothesis.

3.  Choose the level of significance.

4.  Find the appropriate statistical model and test statistic.

5.  Calculate the appropriate test statistic.

6.  Determine the p-value of the test statistic (the prob. of it occurring by chance).

7.  Compare the p-value with the chosen significance level.

8.  Reject or do not reject H0 based on the comparison above.

9.  Answer the question in step 1.

Page 11

Non-parametric methods

- Appropriate when normality cannot be assumed.
- More robust (less sensitive to outliers).
- Less sensitive than parametric methods for detecting significant changes.
- They order the data by expression, and use the rank to test. Ex.: Gene 63; 4 treatments and 5 controls; ranks 1, 2, 3, 4, 5, 6, 7, 8, 9

Mann-Whitney test. Test for differences in medians between two independent populations.

Wilcoxon Signed Rank test. Non-parametric test equivalent to the paired T test for paired samples (tests if the median of paired differences is zero).

Kruskal-Wallis. Non-parametric test equivalent to ANOVA for more than 2 populations.

::: Parametric and non parametric methods

Parametric methods

- Assume that the data follow a normal distribution.

T test. Tests the difference in means between 2 independent populations with equal variances. Welch T test for unequal variances.

Paired T test. T test for paired data (blocks of 2 elements). Example: treatment in right arm, left arm as control.

ANOVA. Analysis of variance, for more than 2 populations.

N(µ=12, σ=3)
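The rank-based idea behind the non-parametric tests on this slide can be sketched for the 4-treatments, 5-controls layout it mentions. The expression values below are invented, the function name `rank_sum_exact_p` is hypothetical, and the exact p-value is obtained by enumerating every possible assignment of ranks, which is only practical for small samples without ties.

```python
from itertools import combinations

def rank_sum_exact_p(group_a, group_b):
    """Exact two-sided Wilcoxon rank-sum (Mann-Whitney) p-value, assuming no ties."""
    pooled = sorted(group_a + group_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # rank 1 = lowest
    observed = sum(rank[v] for v in group_a)
    n_a, n = len(group_a), len(pooled)
    expected = n_a * (n + 1) / 2  # mean rank sum under H0
    # Under H0 every choice of n_a ranks out of n is equally likely.
    sums = [sum(c) for c in combinations(range(1, n + 1), n_a)]
    extreme = sum(abs(s - expected) >= abs(observed - expected) for s in sums)
    return observed, extreme / len(sums)

treated = [9.1, 9.4, 8.8, 9.9]        # hypothetical values, 4 treatments
controls = [7.2, 7.9, 8.1, 7.5, 8.3]  # hypothetical values, 5 controls
w, p = rank_sum_exact_p(treated, controls)
print(f"rank sum = {w}, exact p = {p:.4f}")
```

Only the ordering of the values matters here, which is why an outlier in either group changes the result far less than it would for a t test.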

Page 12

Differential expression analysis

::: T test

http://en.wikipedia.org/wiki/Student%27s_t-test

A t-test is any statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample sizes are small enough that the statistic on which inference is based is not normally distributed because it relies on an uncertain estimate of standard deviation rather than on a precisely known value.

The overall shape of the probability density function of the t-distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of freedom grows, the t-distribution approaches the normal distribution with mean 0 and variance 1.

The following images show the density of the t-distribution for increasing values of ν. The normal distribution is shown as a blue line for comparison. Note that the t-distribution (red line) becomes closer to the normal distribution as ν increases. For ν = 30 the t-distribution is almost the same as the normal distribution.
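The convergence described above can be checked numerically. This sketch evaluates the standard density formulas of Student's t and of N(0, 1) at the peak; the function names `t_pdf` and `normal_pdf` are illustrative.

```python
from math import gamma, sqrt, pi, exp

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution N(0, 1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

for df in (1, 2, 5, 30):
    print(f"df={df:>2}: t density at 0 = {t_pdf(0, df):.4f}, "
          f"normal density at 0 = {normal_pdf(0):.4f}")
```

The t density at zero stays below the normal density for every ν and climbs toward it as ν grows, which is the "lower and wider" shape in the text.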

Page 13

Differential expression analysis

::: T test

http://www.socialresearchmethods.net/kb/stat_t.php

http://en.wikipedia.org/wiki/Student%27s_t-test

Difference between group means

Test statistic

Pooled standard deviation
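In standard notation, the three labels above correspond to the equal-variance two-sample t statistic. The symbols used here (group means $\bar{x}_1, \bar{x}_2$, group standard deviations $s_1, s_2$, group sizes $n_1, n_2$, pooled standard deviation $s_p$) are an assumed notation, not copied from the slide:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

The numerator is the difference between group means, and the statistic has $n_1 + n_2 - 2$ degrees of freedom.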

Page 14

Differential expression analysis

::: Exercise 1: T test with Excel

1.  Open the file T_test_with_Excel.xls

2.  Observe the expression data for the gene AC002378 in controls (C) and tumors (T).

3.  See the formula for the “pooled SD” (Standard Deviation).

4.  Calculate the t value for the difference between C and T averages (use formula above). Hints: n1 is 6, n2 is 6, square root in Excel is: SQRT().

5.  Use the function TDIST() to calculate the p-value (probability of observing this value of t by chance). Hint: the degrees of freedom for a T test are n1 + n2 - 2.

Pooled Standard Deviation: http://en.wikipedia.org/wiki/Pooled_standard_deviation
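For readers without Excel, the same steps can be sketched in Python. The expression values are invented and are NOT the contents of T_test_with_Excel.xls; `two_tailed_p` is a hypothetical helper that plays the role of a two-tailed TDIST() by numerically integrating the t density.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the t-density area between -|t| and |t|.

    The area is computed with Simpson's rule, which is adequate for a sketch.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1 - 2 * area)

# Hypothetical expression values for one gene.
controls = [8.1, 7.9, 8.4, 8.0, 8.2, 7.8]  # n1 = 6
tumors = [9.0, 9.3, 8.7, 9.1, 8.9, 9.4]    # n2 = 6

n1, n2 = len(controls), len(tumors)
pooled_sd = sqrt(((n1 - 1) * variance(controls) + (n2 - 1) * variance(tumors))
                 / (n1 + n2 - 2))
t = (mean(tumors) - mean(controls)) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(f"pooled SD = {pooled_sd:.3f}, t = {t:.2f}, df = {df}, "
      f"p = {two_tailed_p(t, df):.2g}")
```

A statistics library would normally supply the p-value; the hand-rolled integration is used here only to keep the example dependency-free.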

Page 15

[Figure: shape of the data matrix. Classic statistical analysis: many samples (n), few variables. Statistical analysis in the microarray scenario: few samples (n), many variables.]

Differential expression analysis

::: Problems in identifying DEGs with microarrays

Page 16

- Adequate for small sample sizes (n).
- Better estimation of variance, borrowing information from other genes.
- Gives fewer false positives than the standard t test.
- Allows paired analysis, co-variates and ANOVA (R and Asterias-Pomelo II).

“Assumes normality but performs well generally” (Kim 2006)

Differential expression analysis

::: Problems in identifying DEGs with microarrays

SAM (Significance Analysis of Microarrays, Tusher 2001): another good alternative based on permutations, but it needs more replicates.

Page 17

Example

20 normalized arrays, 1000 genes, 2 classes (healthy and tumor)

Differentially expressed genes between classes

METHOD: T test, Wilcoxon's test, SAM, Limma, etc.

p-value

Differential expression analysis

::: Differential expression analysis

Page 18

Differential expression analysis

::: Multiple testing: is a monkey able to write a sentence of “El Quijote”?

Page 19

We run into the multiple testing problem: we are not testing one hypothesis, but many hypotheses, one for each gene.

Suppose:

1)  10 independent genes. So, we have 10 null hypotheses, one for each gene.

2)  No significant differences in gene expression between 2 classes (H0 is true). Thus, the probability that a particular test (say, for gene 3) is declared significant at level 0.05 is exactly 0.05... Good (Prob. of rejecting H0 in 1 test if H0 is true = 0.05)

3)  However, the probability of declaring at least one of the 10 hypotheses false (i.e. rejecting at least one, or finding at least one result significant) is:

Source: R. Díaz Uriarte

Page 20

$\Pr(\text{at least one null rejected}) = 1 - \Pr(\text{all } p > 0.05) = 1 - (1 - 0.05)^{10} = 1 - 0.95^{10} = 0.401$

In summary, without control for multiple testing we would end up rejecting the null much more often than we should.

The more genes, the more serious the problem.

In our example... 1000 genes... imagine the number of false positives that we would get without p-value adjustment...

Source: R. Díaz Uriarte
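The calculation on this slide generalizes to any number of independent tests. The sketch below assumes every null hypothesis is true and the tests are independent, as in the slide's example; the function name is illustrative.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """P(at least one null rejected) for independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 10, 100, 1000):
    fwer = prob_at_least_one_false_positive(m)
    expected_fp = m * 0.05
    print(f"{m:>5} tests: P(>=1 false positive) = {fwer:.3f}, "
          f"expected false positives = {expected_fp:.1f}")
```

For 1000 genes the expected number of false positives at level 0.05 is 1000 × 0.05 = 50, even though no gene is truly changed.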

Page 21

Differential expression analysis

::: Exercise 2: Multiple testing with random data

1.  Open a new spreadsheet in Excel.

2.  Use the function rand() to generate random numbers between 0 and 1.

3.  Generate a random matrix of 6 columns and 100 rows. Select the matrix and “Paste special” the values in another sheet.

4.  Considering that the first 3 columns are controls and the other 3 are treatments, calculate a p-value with ttest(). Assume equal variances and select two tails. We will choose the level of significance to be 0.05.

5.  Order the data by p-value. How many “genes” would be declared significantly differentially expressed?

6.  And if you extend the random matrix to 10,000 rows?
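The same experiment can be sketched outside Excel. This version assumes 3 controls and 3 treatments per row, as in the exercise, and compares |t| with 2.776, the two-tailed 0.05 critical value of the t distribution with 4 degrees of freedom, instead of computing each p-value. The function name and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, variance

T_CRIT_DF4 = 2.776  # two-tailed 0.05 critical value of t with 3 + 3 - 2 = 4 df

def count_false_positives(n_genes, seed=1):
    """Count rows of pure noise whose equal-variance t test has p < 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        row = [rng.random() for _ in range(6)]      # like Excel's RAND()
        ctrl, trt = row[:3], row[3:]
        sp2 = (variance(ctrl) + variance(trt)) / 2  # pooled variance, n1 = n2 = 3
        t = (mean(trt) - mean(ctrl)) / sqrt(sp2 * (1 / 3 + 1 / 3))
        if abs(t) > T_CRIT_DF4:
            hits += 1
    return hits

for n in (100, 10_000):
    print(n, "random genes ->", count_false_positives(n), "significant at 0.05")
```

Roughly 5% of the rows should come out "significant" although the data are pure noise, so the count grows in proportion to the number of rows.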

Page 22

Control of FWER (prob. of at least 1 false positive; conservative methods)

Bonferroni, Holm's, Bonferroni Step-Down, Westfall & Young permutation

Control of FDR (rate of false positives in the results; liberal methods)

Benjamini & Hochberg, Benjamini & Yekutieli

FWER: Type I Family-Wise Error Rate. FDR: False Discovery Rate.

We want to calculate the number of H0 that we have wrongly declared false (false positives).

We must adjust p-values for multiple testing… How??
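As a rough illustration of how such adjustments work, here are plain-Python versions of three of the methods named on this slide, written from their textbook definitions. The function names and the five raw p-values are invented for the example; permutation-based methods such as Westfall & Young are not covered.

```python
def bonferroni(pvals):
    """Bonferroni adjusted p-values (controls the FWER)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted p-values (controls the FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for k, i in enumerate(order):     # k = 0 is the largest p-value
        rank = m - k                  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.20]  # hypothetical raw p-values
for name, fn in (("Bonferroni", bonferroni), ("Holm", holm),
                 ("BH (FDR)", benjamini_hochberg)):
    print(f"{name:<10}", [round(p, 3) for p in fn(pvals)])
```

With these numbers, four raw p-values are below 0.05, but only two remain below 0.05 after any of the three adjustments; the FDR-adjusted values are never larger than the FWER-adjusted ones.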

Page 23

20 normalized arrays, 1000 genes, 2 classes (healthy and tumor)

Differentially expressed genes between classes

METHOD: T test, SAM, etc.

p-value

P-value adjustment: control of FWER or control of FDR

OK!

FWER: Type I Family-Wise Error Rate. FDR: False Discovery Rate.

Differential expression analysis

::: Differential expression analysis

Page 24
Page 25

EXAMPLE: multiple-testing results.

We must use the FDR-adjusted p-values!

Public tools: Asterias – POMELO II, GEPAS – T-Rex

Page 26

Statistical analysis - DEG

...testing genes independently...

[Figure: genes ranked by T statistic (+ to -) across Class 1 and Class 2, with t test cut-offs at FDR < 0.05 marking the up-regulated and down-regulated genes; labeled FatiGO.]

Biological meaning?

Page 27

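[Figure: genes ranked by T statistic (- to +) across Class A and Class B, with the t test cut-off marked and the positions of Gene Set 1, Gene Set 2 and Gene Set 3 shown along the ranking.]

Gene set 2 enriched in Class A

Gene set 3 enriched in Class B

Gene Set Enrichment Analysis - GSEA -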

::: Fatiscan and GSEA approach
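The ranked-list idea behind this slide can be illustrated with a simplified, unweighted running-sum score. This is a teaching sketch only: it is not the published GSEA or FatiScan implementation, it omits the permutation step used to assess significance, and the gene names are invented.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list.

    Walk down the list: step up at genes in the set, step down otherwise,
    with step sizes chosen so the walk ends at zero. Return the largest
    deviation from zero (positive = set concentrated near the top).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1 / n_hits if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical genes ranked from most positive to most negative t statistic.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
top_score = enrichment_score(ranked, {"g1", "g2", "g4"})      # set near the top
bottom_score = enrichment_score(ranked, {"g8", "g9", "g10"})  # set near the bottom
print(top_score, bottom_score)
```

A set clustered at one end of the ranking gets a score close to +1 or -1, while a set scattered evenly along the list stays near zero. Unlike a per-gene cut-off, this uses the whole ranking and needs no list of individually significant genes.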