![Page 1: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/1.jpg)
Daniel Rico
::: Differential Expression Analysis
Course on Microarray Gene Expression Analysis
Bioinformatics Unit CNIO
![Page 2: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/2.jpg)
[Figure: analysis workflow: image analysis; comparison (normalization and filtering); data analysis. Ratios (or not…), log2 transform. Possible outcomes per gene: no change, upregulation, downregulation.]
Differential expression analysis
![Page 3: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/3.jpg)
“To consult a statistician after an experiment is finished is often merely to ask him to conduct a post‐mortem examination. He can perhaps say what the experiment died of.”
Ronald A. Fisher: Indian Statistical Congress, 1938, vol. 4, p. 17
Differential expression analysis
::: Ask a statistician… or us, if you can’t find one!
ASK BEFORE DOING THE EXPERIMENTS!!!!!
![Page 4: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/4.jpg)
“Block what you can, randomize what you cannot”
Differential expression analysis
::: Principles of Experimental Design
1. Replication. It allows the experimenter to obtain an estimate of the experimental error.
2. Randomization. It requires the experimenter to use a random choice for every factor that is not of interest but might influence the outcome of the experiment. Such factors are called nuisance factors. Ex.: printing of replicate spots on the array.
3. Blocking. A method of creating homogeneous blocks of data in which the nuisance factor is kept constant and the factor of interest is allowed to vary. It is used to increase the accuracy with which the influence of the various factors is assessed in a given experiment. Ex.: the microarray slide itself.
![Page 5: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/5.jpg)
At least 5 replicates per class (biological!!!!!)
a) Biological replicates: Tumor 1A, Tumor 1B, Tumor 1C, Tumor 1D → Array 1, Array 2, Array 3, Array 4
b) Technical replicates: Tumor 1 → Array 1, Array 2, Array 3
Differential expression analysis
::: Replication
![Page 6: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/6.jpg)
Differential expression analysis
::: Randomization
Not randomized Randomized
Each gene is spotted in quadruplicate: randomize position in the slide
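A minimal sketch of what randomizing spot positions could look like. The gene names, the slide size, and the function name `randomized_layout` are invented for illustration and do not come from the course material.

```python
import random

def randomized_layout(genes, replicates=4, seed=None):
    """Return the spot labels of a slide in random order."""
    rng = random.Random(seed)
    # One entry per physical spot: each gene appears `replicates` times.
    spots = [gene for gene in genes for _ in range(replicates)]
    rng.shuffle(spots)  # every replicate spot gets a random position
    return spots

# Hypothetical slide: 3 genes, each spotted in quadruplicate.
layout = randomized_layout(["geneA", "geneB", "geneC"], replicates=4, seed=1)
print(layout)
```

With a non-randomized layout the four replicates of a gene would sit next to each other, so a local artifact on the slide would hit all of them at once.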
![Page 7: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/7.jpg)
Differential expression analysis
::: Blocking
Exp. 1, Exp. 2, Exp. 3: Control, T1, T2
RNA extracts: Control on Day 1, T1 on Day 2, T2 on Day 3
Treatment and RNA extraction days are confounded!!!
![Page 8: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/8.jpg)
Differential expression analysis
::: Blocking
Make coherent blocks:
Exp. 1, Exp. 2, Exp. 3: Control, T1, T2
RNA extracts: Exp. 1 on Day 1, Exp. 2 on Day 2, Exp. 3 on Day 3
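The difference between the confounded and the blocked design can be checked mechanically. The sketch below assumes the 3 experiments × 3 treatments layout of the slides. The helper `is_confounded` and its deliberately simple criterion (a day on which only one treatment was extracted) are illustrative assumptions, not a general test for confounding.

```python
from collections import defaultdict

def is_confounded(design):
    """True if some extraction day contains only one treatment, so the
    day effect cannot be separated from that treatment's effect."""
    treatments_per_day = defaultdict(set)
    for treatment, day in design:
        treatments_per_day[day].add(treatment)
    return any(len(t) == 1 for t in treatments_per_day.values())

treatments = ["Control", "T1", "T2"]
experiments = [1, 2, 3]

# Confounded design: each treatment is extracted on its own day.
confounded = [(t, day) for day, t in enumerate(treatments, start=1)
              for _ in experiments]

# Blocked design: all treatments of one experiment are extracted the same day.
blocked = [(t, exp) for exp in experiments for t in treatments]

print(is_confounded(confounded))  # True
print(is_confounded(blocked))     # False
```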
![Page 9: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/9.jpg)
‐ Fold change: expression ratio between 2 groups (i.e. tumor/control). Differentially expressed genes (DEG) are selected if they pass a cut‐off.
E.g. 2.5 (Schena et al.), 3 (DeRisi)
The statistical significance of a change depends on the variability within groups and between groups, and this variability (variance) differs greatly for each gene.
The fold change approach simply ignores this information (that you have!!!)
[Figure: three panels comparing Class A (control) and Class B (tumor): medium variability, high variability, low variability.]
To test for significant changes, we must perform a statistical test for each gene to obtain a p‐value.
Differential expression analysis
::: Fold change is NOT the way!
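To make the point concrete, here is a small sketch with two made-up genes that have exactly the same log2 fold change but very different within-group variability. All expression values are hypothetical. The statistic is the equal-variance t, and 2.447 is the two-tailed 0.05 critical value for 6 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(control, tumor):
    """Equal-variance two-sample t statistic (tumor minus control)."""
    n1, n2 = len(control), len(tumor)
    sp = sqrt(((n1 - 1) * variance(control) + (n2 - 1) * variance(tumor))
              / (n1 + n2 - 2))
    return (mean(tumor) - mean(control)) / (sp * sqrt(1 / n1 + 1 / n2))

# Hypothetical log2 expression values: (control, tumor)
genes = {
    "low_variability": ([7.9, 8.0, 8.1, 8.0], [9.9, 10.0, 10.1, 10.0]),
    "high_variability": ([6.0, 10.0, 7.0, 9.0], [8.0, 12.0, 9.0, 11.0]),
}

results = {}
for name, (control, tumor) in genes.items():
    log2_fc = mean(tumor) - mean(control)  # difference of log2 means
    results[name] = (log2_fc, pooled_t(control, tumor))
    print(f"{name}: log2 fold change = {log2_fc:.2f}, t = {results[name][1]:.2f}")
```

Both genes pass any fold-change cut-off up to 4-fold, yet only the low-variability gene has a t statistic beyond the critical value.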
![Page 10: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/10.jpg)
Differential expression analysis
::: Nine steps for hypothesis testing
1. State the problem.
2. State the null and alternative hypothesis.
3. Choose the level of significance.
4. Find the appropriate statistical model and test statistic.
5. Calculate the appropriate test statistic.
6. Determine the p-value of the test statistic (the prob. of it occurring by chance).
7. Compare the p-value with the chosen significance level.
8. Reject or do not reject H0 based on the comparison above.
9. Answer the question in step 1.
![Page 11: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/11.jpg)
Non‐parametric methods
‐ Appropriate when normality cannot be assumed.
‐ More robust (less sensitive to outliers).
‐ Less sensitive than parametric methods for detecting significant changes.
‐ They order the data by expression, and use the rank to test. Ex. Gene 63; 4 treatments and 5 controls; ranks 1, 2, 3, 4, 5, 6, 7, 8, 9
Mann‐Whitney test. Test for differences in medians between two independent populations.
Wilcoxon Signed Rank test. Non‐parametric test equivalent to the paired T test for paired samples (tests whether the median of the paired differences is zero).
Kruskal‐Wallis. Non‐parametric test equivalent to ANOVA for more than 2 populations.
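The rank-based idea can be sketched for the 4 treatments and 5 controls example. The assignment of ranks 6 to 9 to the treatment group is a hypothetical choice, and the function name is invented. The code enumerates every possible way of giving 4 of the 9 ranks to the treatment group, which yields an exact two-sided rank-sum p-value when there are no ties.

```python
from itertools import combinations

def exact_rank_sum_test(treatment_ranks, n_total):
    """Exact two-sided rank-sum test, assuming no tied ranks."""
    n1 = len(treatment_ranks)
    observed = sum(treatment_ranks)
    expected = n1 * (n_total + 1) / 2  # mean rank sum under H0
    deviation = abs(observed - expected)
    # Rank sums of all equally likely assignments under H0.
    sums = [sum(c) for c in combinations(range(1, n_total + 1), n1)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation)
    return observed, extreme / len(sums)

# Hypothetical outcome: the 4 treated samples hold the 4 highest ranks.
w, p = exact_rank_sum_test([6, 7, 8, 9], n_total=9)
print(w, round(p, 4))  # 30 0.0159
```

Only 2 of the 126 possible assignments are this extreme, so with samples this small the p-value cannot go below 2/126, which illustrates the lower sensitivity of rank tests.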
::: Parametric and non‐parametric methods
Parametric methods
‐ Assume that the data follow a normal distribution.
T test. Tests the difference in means between 2 independent populations with equal variances. Welch T‐test for unequal variances.
Paired T test. T test for paired data (blocks of 2 elements). Example: treatment in the right arm, left arm as control.
ANOVA. Analysis of variance, for more than 2 populations.
N(µ = 12, σ = 3)
![Page 12: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/12.jpg)
Differential expression analysis
::: T test
http://en.wikipedia.org/wiki/Student%27s_t-test
A t-test is any statistical hypothesis test in which the test statistic has a Student's t distribution if the null hypothesis is true. It is applied when the population is assumed to be normally distributed but the sample sizes are small enough that the statistic on which inference is based is not normally distributed because it relies on an uncertain estimate of standard deviation rather than on a precisely known value.
The overall shape of the probability density function of the t-distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider. As the number of degrees of freedom grows, the t-distribution approaches the normal distribution with mean 0 and variance 1.
The following images show the density of the t-distribution for increasing values of ν. The normal distribution is shown as a blue line for comparison. Note that the t-distribution (red line) becomes closer to the normal distribution as ν increases. For ν = 30 the t-distribution is almost the same as the normal distribution.
![Page 13: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/13.jpg)
Differential expression analysis
::: T test
http://www.socialresearchmethods.net/kb/stat_t.php
http://en.wikipedia.org/wiki/Student%27s_t-test
Difference between group means
Test statistic
Pooled standard deviation
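The formula images for this slide did not survive extraction. The labels match the standard equal-variance two-sample t test as given on the cited Wikipedia page, so the formulas were presumably of the following form (the symbols $\bar{x}_i$, $s_i$, $n_i$ for the mean, standard deviation and size of group $i$ are notation assumed here):

```latex
% Test statistic: difference between group means over its standard error
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}

% Pooled standard deviation
s_p = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}

% Degrees of freedom
\nu = n_1 + n_2 - 2
```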
![Page 14: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/14.jpg)
Differential expression analysis
::: Exercise 1: T test with Excel
1. Open the file T_test_with_Excel.xls
2. Observe the expression data for the gene AC002378 in controls (C) and tumors (T).
3. See the formula for the “pooled SD” (Standard Deviation).
4. Calculate the t value for the difference between C and T averages (use formula above). Hints: n1 is 6, n2 is 6, square root in Excel is: SQRT().
5. Use the function TDIST() to calculate the p-value (probability of observing this value of t by chance). Hint: degrees of freedom for a T test are: n1 + n2 - 2.
where:
http://en.wikipedia.org/wiki/Pooled_standard_deviation
Pooled Standard Deviation
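For readers without Excel, the same calculation can be sketched in plain Python. The expression values below are hypothetical and are NOT the data in T_test_with_Excel.xls. The function `two_tailed_p` is an illustrative stand-in for a two-tailed TDIST() call: it integrates the t density numerically with Simpson's rule.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the density from 0 to |t|."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def pooled_t_test(c, t):
    """Equal-variance two-sample t test: returns (t, df, p)."""
    n1, n2 = len(c), len(t)
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * variance(c) + (n2 - 1) * variance(t)) / df)  # pooled SD
    t_value = (mean(t) - mean(c)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t_value, df, two_tailed_p(t_value, df)

# Hypothetical log2 expression values for 6 controls and 6 tumors.
controls = [7.1, 6.8, 7.4, 7.0, 6.9, 7.2]
tumors = [8.0, 7.6, 8.3, 7.9, 8.4, 7.8]
t_value, df, p_value = pooled_t_test(controls, tumors)
print(f"t = {t_value:.3f}, df = {df}, p = {p_value:.2g}")
```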
![Page 15: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/15.jpg)
[Figure: data matrix shapes (n × variables) in classic statistical analysis versus statistical analysis in the microarray scenario.]
Differential expression analysis
::: Problems in identifying DEGs with microarrays
![Page 16: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/16.jpg)
‐ Adequate for small sample sizes (n).
‐ Better estimation of variance, borrowing information from other genes.
‐ Gives fewer false positives than the standard t test.
‐ Allows paired analysis, co‐variates and ANOVA (R and Asterias‐Pomelo II).
“Assumes normality but performs well generally” (Kim 2006)
Differential expression analysis
::: Problems in identifying DEGs with microarrays
SAM (Statistical Analysis of Microarrays, Tusher 2001): another good alternative based on permutations, but it needs more replicates
![Page 17: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/17.jpg)
Example: 20 normalized arrays, 1000 genes, 2 classes (healthy and tumor)
Differentially expressed genes between classes
METHOD: T test, Wilcoxon's test, SAM, Limma, etc. → p-value
Differential expression analysis
::: Differential expression analysis
![Page 18: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/18.jpg)
≠
= ?
Differential expression analysis
::: Multiple testing: is a monkey able to write a sentence of “El Quijote”?
![Page 19: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/19.jpg)
We run into the multiple testing problem: we are not testing one hypothesis, but many hypotheses, one for each gene.
Suppose:
1) 10 independent genes. So, we have 10 null hypotheses, one for each gene.
2) No significant differences in gene expression between 2 classes (H0 is true). Thus, the probability that a particular test (say, for gene 3) is declared significant at level 0.05 is exactly 0.05... Good (prob. of rejecting H0 in 1 test if H0 is true = 0.05)
3) However, the probability of declaring at least one of the 10 hypotheses false (i.e. rejecting at least one, or finding at least one result significant) is:
Source: R. Díaz Uriarte
![Page 20: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/20.jpg)
$\Pr(\text{at least one null rejected}) = 1 - \Pr(\text{all } p > 0.05) = 1 - (1 - 0.05)^{10} = 1 - 0.95^{10} = 0.401$
The more genes, the more serious the problem.
In summary, without control for multiple testing we would end up rejecting the null much more often than we should.
In our example.... 1000 genes... imagine the number of false positives that we would get without p-value adjustment...
Source: R. Díaz Uriarte
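The calculation for 10 genes extends directly to larger numbers of tests. A short sketch, assuming independent tests and that every null hypothesis is true (the function name is invented for illustration):

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """P(at least one rejection) for independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 10, 100, 1000):
    fwer = prob_at_least_one_false_positive(n)
    expected_fp = n * 0.05  # expected number of false positives
    print(f"{n:>5} tests: P(>=1 false positive) = {fwer:.3f}, "
          f"expected false positives = {expected_fp:g}")
```

For the 1000-gene example this gives about 50 expected false positives at the 0.05 level, with a near-certain chance of at least one.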
![Page 21: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/21.jpg)
Differential expression analysis
::: Exercise 2: Multiple testing with random data
1. Open a new spreadsheet in Excel.
2. Use the function rand() to generate random numbers between 0 and 1.
3. Generate a random matrix of 6 columns and 100 rows. Select the matrix and “Paste special” the values in another sheet.
4. Considering that the first 3 columns are controls and the other 3 are treatments, calculate a p-value with ttest(). Assume equal variances and select two tails. We will choose the level of significance to be 0.05.
5. Order the data by p-value. How many “genes” would be called differentially expressed?
6. And if you extend the random matrix to 10,000 rows?
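The same experiment can be sketched in Python instead of Excel. To stay within the standard library, the sketch does not compute p-values. It compares |t| with 2.776, the two-tailed 0.05 critical value for 3 + 3 - 2 = 4 degrees of freedom, which is equivalent to asking whether p < 0.05. The seed and function name are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance

T_CRIT_DF4 = 2.776  # two-tailed 0.05 critical value of t with 4 df

def count_significant(n_rows, seed=0):
    """Count rows of pure noise that a 3-vs-3 t test calls significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rows):
        row = [rng.random() for _ in range(6)]  # like Excel's RAND()
        controls, treatments = row[:3], row[3:]
        # Pooled SD; with equal group sizes it is the root mean of the variances.
        sp = sqrt((variance(controls) + variance(treatments)) / 2)
        t = (mean(treatments) - mean(controls)) / (sp * sqrt(2 / 3))
        if abs(t) > T_CRIT_DF4:
            hits += 1
    return hits

for n_rows in (100, 10_000):
    print(n_rows, "rows:", count_significant(n_rows), "significant at 0.05")
```

Every "gene" here is random noise, so each one counted is a false positive, and the count grows roughly in proportion to the number of rows.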
![Page 22: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/22.jpg)
We want to calculate the number of H0 that we have wrongly declared false (false positives)
We must adjust p‐values for multiple testing… How??
Control of FWER (prob. of at least 1 false positive; conservative methods)
Bonferroni, Holm's Bonferroni Step‐Down, Westfall & Young permutation
Control of FDR (rate of false positives in the results; liberal methods)
Benjamini & Hochberg, Benjamini & Yekutieli
FWER: Type I Family‐Wise Error Rate. FDR: False Discovery Rate
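As an illustration of one method from each family, here is a minimal sketch of the Bonferroni (FWER) and Benjamini & Hochberg (FDR) adjustments. The raw p-values are hypothetical. In practice a tested implementation from a statistics package would be preferable to hand-written code.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini & Hochberg FDR-adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

At a 0.05 threshold, Bonferroni keeps two of these four hypothetical genes while Benjamini & Hochberg keeps all four, which is the conservative versus liberal contrast described above.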
![Page 23: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/23.jpg)
Differentially expressed genes between classes
METHOD: T test, SAM, etc. → p-value
P-value adjustment: control of FWER, control of FDR → OK!
FWER: Type I Family‐Wise Error Rate. FDR: False Discovery Rate
Differential expression analysis
::: Differential expression analysis
20 normalized arrays, 1000 genes, 2 classes (healthy and tumor)
![Page 24: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/24.jpg)
![Page 25: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/25.jpg)
EXAMPLE: multiple‐testing results.
We must use the FDR-adjusted p-values! Public tools:
Asterias – POMELO II, GEPAS – T‐Rex
![Page 26: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/26.jpg)
Statistical analysis ‐ DEG
... testing genes independently...
[Figure: genes ranked by T statistic (+ to -) between Class 1 and Class 2; t test cut‐offs at FDR < 0.05 mark the up-regulated and down-regulated genes, which go to FatiGO.]
Biological meaning?
![Page 27: Course on Microarray Gene Expression Analysis ::: Differential …bioinfo.cnio.es/files/training/Microarray_Course/4_UBio... · 2011-01-17 · Non parametric methods ... freedom grows,](https://reader034.vdocuments.site/reader034/viewer/2022043020/5f3c01c3ad5e8c2c515d07a9/html5/thumbnails/27.jpg)
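[Figure: genes ranked by T statistic (- to +) between Class A and Class B, with the t test cut-off and the positions of Gene Set 1, Gene Set 2 and Gene Set 3 along the ranking.]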
Gene set 3 enriched in Class B
Gene set 2 enriched in Class A
Gene Set Enrichment Analysis - GSEA -
::: Fatiscan and GSEA approach