Tauno Metsalu BIIT Research Group
12.03.2014
Introduction
• One of the most common tasks with gene expression data is to find differentially expressed (DE) genes between two conditions
• Various methods for RNA-seq data have been proposed
• This article compares the methods both methodologically and in practice
Methods compared
• Cuffdiff
• edgeR
• DESeq
• PoissonSeq
• baySeq
• limmaQN (quantile normalization)
• limmaVoom (voom)
Methodological background
Starting point
• All methods except Cuffdiff start from read counts assigned to each gene (HTSeq)
• Cuffdiff – starts from the transcript level to account for different isoforms (Cufflinks)
• Some normalization is needed to take different sequencing depths into account
Normalization (1)
• DESeq calculates a scaling factor for each sample: each gene's read count is divided by the geometric mean of that gene's counts across all samples, and the median of these ratios is taken
• Cuffdiff – similar, but performs intra-condition scaling first and inter-condition scaling second; it additionally uses transcript-specific normalization
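The median-of-ratios idea described for DESeq can be sketched in a few lines. This is a minimal illustration, not the DESeq implementation: the function name `size_factors`, the genes-by-samples list layout, and the toy counts are all assumptions made for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios scaling factors, one per sample.

    counts: list of genes, each a list of raw read counts per sample
    (hypothetical layout chosen for this sketch).
    """
    n_samples = len(counts[0])
    # A gene with a zero count in any sample has a geometric mean of zero,
    # so it is skipped here.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical toy data: sample 2 is sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(toy_counts)
```

Dividing each sample's counts by its factor puts the samples on a common scale; in the toy data the second factor is twice the first, reflecting the doubled depth.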
Normalization (2)
• edgeR uses the trimmed mean of M values (TMM) – a weighted average of log expression ratios over the subset of genes that remains after excluding genes with high average read counts and/or large differences in expression between two experiments
• baySeq – uses the upper quartile (75% quantile) to normalize library sizes
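Upper-quartile scaling is simple enough to sketch directly. The example below is only an illustration of the idea: the function name, the choice to drop zero counts before taking the quantile, the rescaling so that the factors average to 1, and the toy counts are assumptions, and the packages themselves may differ in these details.

```python
from statistics import quantiles

def upper_quartile_factors(counts):
    """Upper-quartile scaling factors, one per sample.

    counts: list of genes, each a list of raw read counts per sample
    (hypothetical layout chosen for this sketch).
    """
    n_samples = len(counts[0])
    upper_quartiles = []
    for j in range(n_samples):
        # Assumption: zero counts are dropped before taking the quantile.
        nonzero = [row[j] for row in counts if row[j] > 0]
        # quantiles(..., n=4) returns the three quartile cut points;
        # index 2 is the 75% quantile.
        upper_quartiles.append(quantiles(nonzero, n=4, method="inclusive")[2])
    # Assumption: rescale so that the factors average to 1.
    mean_uq = sum(upper_quartiles) / n_samples
    return [q / mean_uq for q in upper_quartiles]

# Hypothetical toy data: sample 2 is sequenced twice as deeply as sample 1.
toy_counts = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]]
uq_factors = upper_quartile_factors(toy_counts)
```

Using a quantile of the count distribution rather than the total count makes the factor less sensitive to a handful of very highly expressed genes.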
Normalization (3)
• PoissonSeq – the set of genes that differ least between the two conditions is used to compute normalization factors
• limmaQN – quantile normalization makes the counts in all samples follow the same empirical distribution
• limmaVoom – uses locally weighted scatterplot smoothing (LOWESS) to estimate the mean-variance relation and transform the read counts
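Quantile normalization, as used in limmaQN, can be illustrated with a small sketch. The function below is a plain textbook version written for this transcript, not limma's code: the name `quantile_normalize`, the genes-by-samples layout, the tie handling (ties are broken by gene order), and the toy values are assumptions.

```python
def quantile_normalize(values):
    """Force every sample to share the same empirical distribution.

    values: list of genes, each a list of expression values per sample
    (hypothetical layout chosen for this sketch).
    """
    n_genes = len(values)
    n_samples = len(values[0])
    # Sort the values of each sample independently.
    sorted_cols = [sorted(row[j] for row in values) for j in range(n_samples)]
    # Reference distribution: mean across samples at each rank.
    reference = [sum(col[r] for col in sorted_cols) / n_samples
                 for r in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        # Gene indices ordered by their value in sample j.
        order = sorted(range(n_genes), key=lambda i: values[i][j])
        for rank, i in enumerate(order):
            # Replace each value by the reference value of its rank.
            normalized[i][j] = reference[rank]
    return normalized

# Hypothetical toy data: 4 genes in 3 samples.
toy_values = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
normalized = quantile_normalize(toy_values)
```

After this step every sample contains exactly the same set of values; only the assignment of values to genes differs between samples.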
Statistical modeling (1)
• edgeR – uses the negative binomial distribution as a model for read counts; the overdispersion factor is estimated using both gene-specific and common dispersion effects
• DESeq – similar to edgeR, but the overdispersion factor is estimated using the mean expression of a gene and the biological expression variability
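To see what the overdispersion factor does, it helps to compare the variance of the two count models. The sketch below assumes the commonly used parameterization in which the negative binomial variance is the mean plus the dispersion times the squared mean; the function names and the example numbers are made up for illustration.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean

def negative_binomial_variance(mean, dispersion):
    """Variance under the assumed parameterization: mean + dispersion * mean^2.

    With dispersion = 0 this reduces to the Poisson variance.
    """
    return mean + dispersion * mean ** 2

# Hypothetical gene with a mean of 100 reads and a dispersion of 0.1.
example_mean = 100
example_dispersion = 0.1
pois_var = poisson_variance(example_mean)
nb_var = negative_binomial_variance(example_mean, example_dispersion)
```

For this hypothetical gene the negative binomial variance is about eleven times the Poisson variance, which is why the way each method estimates the dispersion matters for the resulting tests.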
Statistical modeling (2)
• Cuffdiff – separate variance models for single-isoform genes (similar to DESeq) and multi-isoform genes (mixture model of negative binomials)
• baySeq – Bayesian model of negative binomial distributions
Statistical modeling (3)
• PoissonSeq – gene counts are modeled as a Poisson variable whose mean depends on the normalized library size, the expression of the gene, and the correlation of the gene with the respective condition
• limmaQN and limmaVoom assume that the transformed values are ready for linear modeling (i.e. suitable for standard statistical methods)
Test for differential expression (1)
• edgeR and DESeq – exact test using the negative binomial distribution
• Cuffdiff – t-test for a mean-variance ratio test statistic
• limmaQN and limmaVoom – moderated t-statistic
• baySeq – posterior likelihood of DE
Test for differential expression (2)
• PoissonSeq – tests the significance of the correlation between gene and condition using the chi-square distribution
• All methods except PoissonSeq use the standard FDR correction (Benjamini-Hochberg), whereas PoissonSeq implements a novel way of estimating the FDR
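The Benjamini-Hochberg procedure mentioned above is short enough to write out. This is a minimal sketch of the standard step-up adjustment, not code from any of the compared packages; the function name and the toy p-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Gene indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four genes.
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adjusted = benjamini_hochberg(toy_p)
```

Genes whose adjusted value falls below the chosen FDR threshold (for example 0.05) are called DE. In the toy data only the first gene passes at 0.05, since the second and third both adjust to roughly 0.053.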
Performance in practice
Reference datasets used
• Sequencing Quality Control (SEQC)
– Replicated samples of the human whole body reference RNA and human brain reference RNA
– Spike-in synthetic oligonucleotides with different mixing ratios
– Roughly 1000 genes validated by TaqMan qPCR
• Biological replicates from three cell lines (part of the ENCODE project)
Features compared
• Normalization of count data
• Sensitivity and specificity of DE detection
• Performance on the subset of genes that are expressed in one condition only
• Effect of reduced sequencing depth and number of replicates
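Sensitivity and specificity of DE detection can be computed once a reference set of truly DE genes is available, for example from qPCR validation. The sketch below shows the arithmetic only; the function name, the gene identifiers, and the toy gene sets are hypothetical, and it assumes that both the DE and the non-DE reference classes are non-empty.

```python
def sensitivity_specificity(called_de, truly_de, all_genes):
    """Sensitivity and specificity of a set of DE calls against a reference."""
    called, truth, universe = set(called_de), set(truly_de), set(all_genes)
    true_pos = len(called & truth)              # DE genes that were called
    false_neg = len(truth - called)             # DE genes that were missed
    false_pos = len(called - truth)             # non-DE genes that were called
    true_neg = len(universe - called - truth)   # non-DE genes left uncalled
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical example: 10 genes, 4 of them truly DE, 4 called by a method.
all_genes = [f"g{i}" for i in range(1, 11)]
truly_de = ["g1", "g2", "g3", "g4"]
called_de = ["g1", "g2", "g3", "g5"]
sens, spec = sensitivity_specificity(called_de, truly_de, all_genes)
```

Here three of the four truly DE genes are recovered (sensitivity 0.75) and one of the six non-DE genes is called by mistake (specificity 5/6).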
Correlation with qPCR
AUC with different DE cutoffs
Distribution of p-values under null model
Signal-to-noise vs significance for genes expressed in one condition
False positives
Sensitivity
Overview
Summary
• No single method was best in all comparisons
• Cuffdiff performed the worst, possibly due to its normalization, which accounts for isoforms
• limma, which was developed for expression microarray data, performed comparably to the methods designed for RNA-seq
• Including more replicate samples should be preferred over increasing the number of sequencing reads