Annotation-agnostic differential expression analysis
Leonardo Collado-Torres @fellgernon #ENAR2016
www.slideshare.net/lcolladotor
Motivating problem: identify and validate regions of the genome that change expression when analyzing tissues with potentially incomplete transcriptome annotation
Measuring gene expression: RNA-seq
[Figure: genome (DNA); RNA transcripts (many possible variants); RNA-seq reads. Adapted from @jtleek]
[Figure: mapped reads on the genome (DNA). Adapted from @jtleek]
Common analysis pipelines:
• Feature counting (gene or exon level)
• Transcript assembly
Challenges in counting
http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
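The ambiguity described in the HTSeq documentation linked above can be sketched in Python. This is a simplified re-implementation of the idea behind htseq-count's default "union" mode, not HTSeq's own code: a read is counted for a feature only if it overlaps exactly one feature; reads overlapping several features are "ambiguous" and reads overlapping none are "no_feature". The gene names and coordinates are hypothetical.

```python
from collections import Counter

# Hypothetical half-open [start, end) intervals for two overlapping genes.
FEATURES = {
    "geneA": (100, 200),
    "geneB": (180, 300),
}


def overlapping_features(read_start, read_end, features):
    """Return the names of all features sharing at least one base with the read."""
    return {
        name
        for name, (start, end) in features.items()
        if read_start < end and start < read_end
    }


def count_union_mode(reads, features):
    """Simplified 'union' counting: count a read only if it hits exactly one feature."""
    counts = Counter()
    for read_start, read_end in reads:
        hits = overlapping_features(read_start, read_end, features)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


reads = [(110, 160), (185, 195), (250, 290), (400, 450)]
print(dict(count_union_mode(reads, FEATURES)))
```

The read spanning bases 185 to 195 lies where geneA and geneB overlap, so it is set aside as ambiguous instead of being counted for either gene; with incomplete or overlapping annotation, reads like this are lost to feature-level counts.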
Annotation variation
Frazee et al., Biostatistics, 2014
DER finder approach
• Find contiguous base pairs with Differential Expression signal → DE Regions or DERs
• Find nearest annotated feature
Frazee et al., Biostatistics, 2014
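The two DER finder steps above can be sketched in Python, assuming a per-base differential expression statistic (for example an F-statistic) has already been computed for every base. The cutoff, the statistic values and the exon coordinates are hypothetical, and this is an illustration of the idea, not the derfinder implementation.

```python
def find_regions(stats, cutoff):
    """Return half-open [start, end) runs of consecutive bases whose statistic exceeds cutoff."""
    regions = []
    start = None
    for pos, value in enumerate(stats):
        if value > cutoff and start is None:
            start = pos  # a new candidate region begins here
        elif value <= cutoff and start is not None:
            regions.append((start, pos))  # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(stats)))  # run extends to the last base
    return regions


def distance(region, feature):
    """Number of bases separating two half-open intervals (0 if they overlap or touch)."""
    (r_start, r_end), (f_start, f_end) = region, feature
    if r_end <= f_start:
        return f_start - r_end
    if f_end <= r_start:
        return r_start - f_end
    return 0


def nearest_feature(region, features):
    """Name of the annotated feature closest to the region."""
    return min(features, key=lambda name: distance(region, features[name]))


# Hypothetical per-base statistics along a 12-bp stretch of a chromosome.
stats = [0.1, 0.3, 5.2, 6.8, 7.1, 0.2, 0.0, 4.9, 5.5, 0.4, 0.1, 0.2]
features = {"exon1": (0, 2), "exon2": (9, 12)}

for region in find_regions(stats, cutoff=4.0):
    print(region, nearest_feature(region, features))
```

Because regions are defined by the data rather than by the annotation, a DER can fall entirely outside known exons; the nearest-feature step only labels it afterwards.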
[Figure: read coverage along the genome (DNA); coverage vector: 2 6 0 11 6. Adapted from @jtleek]
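A coverage vector records how many aligned reads cover each base. A minimal Python sketch, assuming reads are given as half-open [start, end) intervals relative to the start of the region (the read coordinates are hypothetical), uses a difference array so that each read costs constant time, followed by one cumulative sum:

```python
from itertools import accumulate


def coverage_vector(reads, length):
    """Per-base read coverage over a region of `length` bases."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1  # coverage rises where a read begins
        diff[end] -= 1    # and drops right after the read's last base
    return list(accumulate(diff[:length]))


reads = [(0, 4), (1, 3), (2, 6), (5, 8)]
print(coverage_vector(reads, 8))
```

For these four reads the result is `[1, 2, 3, 2, 1, 2, 1, 1]`: base 2, for example, is covered by the first three reads.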
Jaffe et al., Nat. Neuroscience, 2015
Finding DERs by expressed-regions
BrainSpan data
[Figure: brain regions with sample counts: CBC: 28, MD: 24, STR: 28, AMY: 31, HIP: 32, DFC: 34, VFC: 30, MFC: 32, OFC: 30, M1C: 25, S1C: 26, IPC: 33, A1C: 30, STC: 35, ITC: 33, V1C: 33]
Total N samples: 487
Coverage data from BrainSpan: http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/
• Data: 3 tissues (liver, testis, heart), 8 samples each
• Align with
• Identify expressed regions with derfinder
  – Adjust coverage (40 million)
  – Find expressed regions (cutoff 5)
  – Discard ERs < 9 bp
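The expressed-region steps listed above can be sketched in Python under simplifying assumptions: each sample's coverage is scaled to a common library size of 40 million reads (coverage × 40e6 / total mapped reads), the scaled coverage is averaged across samples, consecutive bases whose mean is above the cutoff of 5 are joined into regions, and regions shorter than 9 bp are discarded. The exact normalization derfinder applies and whether its cutoff comparison is strict are assumptions here; the function name `expressed_regions` and the toy data are hypothetical, and this is not derfinder's R code.

```python
def expressed_regions(coverages, mapped_reads, target=40e6, cutoff=5.0, min_length=9):
    """Find expressed regions (ERs) from per-sample, per-base coverage.

    coverages: list of equal-length per-base coverage lists, one per sample.
    mapped_reads: total mapped reads for each sample, in the same order.
    Returns half-open [start, end) regions at least `min_length` bp long.
    """
    n_bases = len(coverages[0])
    # Scale each sample to the target library size, then average across samples.
    scaled = [
        [value * target / total for value in cov]
        for cov, total in zip(coverages, mapped_reads)
    ]
    mean_cov = [sum(sample[i] for sample in scaled) / len(scaled) for i in range(n_bases)]

    # Join consecutive bases above the cutoff into candidate regions.
    regions, start = [], None
    for pos, value in enumerate(mean_cov):
        if value > cutoff and start is None:
            start = pos
        elif value <= cutoff and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, n_bases))

    # Discard regions shorter than the minimum length.
    return [(s, e) for s, e in regions if e - s >= min_length]


sample1 = [3] * 12 + [0] * 3 + [4] * 5  # sample with 20 million mapped reads
sample2 = [6] * 12 + [0] * 3 + [8] * 5  # sample with 40 million mapped reads
print(expressed_regions([sample1, sample2], [20e6, 40e6]))
```

After scaling, both samples have a mean coverage of 6 over the first 12 bases, giving the region `(0, 12)`. The last 5 bases also pass the cutoff (mean 8) but form a region shorter than 9 bp, so they are dropped.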
GTEx: DERs via expressed-regions
Presence of intronic ERs
Can strictly intronic ERs differentiate tissues?
PCs differentiate tissues
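One way to ask whether strictly intronic ERs separate tissues is a principal component analysis of a samples × ERs matrix of log2-transformed coverage, followed by a check of whether samples cluster by tissue on the first components. A minimal NumPy sketch via singular value decomposition; the log2(x + 1) transform, the matrix layout and the toy values are assumptions, not the exact analysis behind the slide:

```python
import numpy as np


def principal_components(counts, n_components=2):
    """Sample scores and variance explained for the top PCs of log2(counts + 1).

    counts: array of shape (n_samples, n_regions).
    """
    logged = np.log2(np.asarray(counts, dtype=float) + 1)
    centered = logged - logged.mean(axis=0)  # center each region (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_explained = s**2 / np.sum(s**2)
    return scores, var_explained[:n_components]


# Hypothetical coverage for 3 intronic ERs in 2 liver and 2 heart samples.
counts = np.array([
    [100, 5, 30],
    [120, 6, 28],
    [8, 90, 31],
    [10, 110, 29],
])
tissues = ["liver", "liver", "heart", "heart"]
scores, var = principal_components(counts)
for tissue, pc1 in zip(tissues, scores[:, 0]):
    print(tissue, round(pc1, 2))
```

The sign of a principal component is arbitrary, so the meaningful signal is that samples from the same tissue share a sign on PC1 while the two tissues fall on opposite sides.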
Differential intronic ERs | exonic ERs
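One way to ask whether an intronic ER differs between tissues given its nearest exonic ER is a nested linear-model comparison. Regress the intronic ER's log2 coverage on the exonic ER's log2 coverage alone (null model) and on exonic coverage plus tissue indicators (alternative model), then compute the F-statistic for the added tissue terms, F = ((RSS0 − RSS1) / df_num) / (RSS1 / df_den). This formulation is an assumption for illustration, not necessarily the model behind these slides; the function names and data are hypothetical.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def tissue_f_statistic(intronic, exonic, tissue):
    """F-statistic for tissue terms, adjusting for nearest exonic ER coverage."""
    y = np.asarray(intronic, dtype=float)
    exonic = np.asarray(exonic, dtype=float)
    levels = sorted(set(tissue))
    n = len(y)
    # Null model: intercept + exonic coverage.
    X0 = np.column_stack([np.ones(n), exonic])
    # Alternative model: add one indicator column per non-reference tissue.
    dummies = [np.array([1.0 if t == level else 0.0 for t in tissue]) for level in levels[1:]]
    X1 = np.column_stack([X0] + dummies)
    rss0, rss1 = rss(y, X0), rss(y, X1)
    df_num = X1.shape[1] - X0.shape[1]
    df_den = n - X1.shape[1]
    return ((rss0 - rss1) / df_num) / (rss1 / df_den)


tissue = ["liver"] * 3 + ["heart"] * 3
exonic = np.array([5.0, 6.0, 7.0, 5.5, 6.5, 7.5])
noise = np.array([0.05, -0.03, 0.01, -0.02, 0.04, -0.05])
heart = np.array([t == "heart" for t in tissue], dtype=float)
tracking = 0.5 * exonic + noise              # follows its exon only
shifted = 0.5 * exonic + 2 * heart + noise   # extra heart-specific signal
print(tissue_f_statistic(tracking, exonic, tissue))
print(tissue_f_statistic(shifted, exonic, tissue))
```

An intronic ER that merely tracks its exon gains little from the tissue term, whereas one with tissue-specific signal beyond the exon yields a much larger F-statistic.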
Simulation setup
3 replicates:
• 2 groups, each with 5 samples
• ~2 million paired-end reads for chr17
• 1/6 high, 1/6 low in group 2 vs group 1
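The "1/6 high, 1/6 low" design above can be sketched in Python: shuffle the transcripts, mark one sixth as higher and one sixth as lower in group 2, and leave the rest unchanged. The fold change of 2, the transcript names and the seed are hypothetical; the slide only states the 1/6 high, 1/6 low split.

```python
import random


def assign_fold_changes(transcripts, fold_change=2.0, seed=0):
    """Map each transcript to a hypothetical group 2 vs group 1 fold change."""
    rng = random.Random(seed)
    shuffled = list(transcripts)
    rng.shuffle(shuffled)
    sixth = len(shuffled) // 6
    changes = {}
    for i, tx in enumerate(shuffled):
        if i < sixth:
            changes[tx] = fold_change      # higher in group 2
        elif i < 2 * sixth:
            changes[tx] = 1 / fold_change  # lower in group 2
        else:
            changes[tx] = 1.0              # not differentially expressed
    return changes


transcripts = [f"tx{i}" for i in range(60)]
changes = assign_fold_changes(transcripts)
print(sum(v > 1 for v in changes.values()), sum(v < 1 for v in changes.values()))
```

With 60 transcripts, 10 are set higher, 10 lower and 40 unchanged; the seed makes the assignment reproducible across replicates if desired.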
Annotation:
• complete
• missing 20% of transcripts (8.28% of exons)
Reference set:
3,868 exons that overlap only 1 transcript
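A reference set like the one above can be selected by counting, for each exon, how many transcripts it overlaps and keeping the exons that overlap exactly one. In this sketch "overlap" means sharing at least one base with any exon of the transcript, which is an assumed definition; the half-open coordinates are hypothetical, and real annotation would come from a GTF file.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def single_transcript_exons(exons, transcripts):
    """Exons overlapping exactly one transcript, mapped to that transcript.

    exons: dict exon_id -> (start, end)
    transcripts: dict transcript_id -> list of (start, end) exon intervals
    """
    selected = {}
    for exon_id, exon in exons.items():
        hits = [
            tx for tx, tx_exons in transcripts.items()
            if any(overlaps(exon, other) for other in tx_exons)
        ]
        if len(hits) == 1:
            selected[exon_id] = hits[0]
    return selected


exons = {"e1": (100, 200), "e2": (300, 400), "e3": (500, 600)}
transcripts = {
    "txA": [(100, 200), (300, 400)],
    "txB": [(300, 400), (500, 600)],
}
print(single_transcript_exons(exons, transcripts))
```

Exon `e2` is shared by both transcripts, so its true expression change would depend on two simulated transcripts; restricting to single-transcript exons (`e1`, `e3`) gives each reference exon an unambiguous true status.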
Simulation results
• Similar power to methods that have complete annotation
• Methods with incorrect annotation lose a lot of power
• Higher empirical FDR/FPR
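With a reference set whose true status is known from the simulation, the empirical rates above follow from a confusion table: FDR = FP / (FP + TP) among features called differentially expressed, FPR = FP / (FP + TN) among features that are truly not differentially expressed, and power = TP / (TP + FN). A minimal sketch; the call and truth vectors are hypothetical:

```python
def empirical_rates(called, truth):
    """Return (FDR, FPR, power) from parallel boolean lists.

    called: True if the method reports the feature as differentially expressed.
    truth: True if the feature was simulated as differentially expressed.
    Rates with an empty denominator are reported as 0.0 by convention.
    """
    tp = sum(c and t for c, t in zip(called, truth))
    fp = sum(c and not t for c, t in zip(called, truth))
    tn = sum(not c and not t for c, t in zip(called, truth))
    fn = sum(not c and t for c, t in zip(called, truth))
    fdr = fp / (fp + tp) if fp + tp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    power = tp / (tp + fn) if tp + fn else 0.0
    return fdr, fpr, power


truth = [True, True, True, True, False, False, False, False, False, False]
called = [True, True, True, False, True, False, False, False, False, False]
print(empirical_rates(called, truth))
```

Here 3 of the 4 true changes are found (power 0.75), 1 of the 4 calls is false (FDR 0.25), and 1 of the 6 null features is called (FPR 1/6).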
Collado-Torres et al., F1000Research, 2015
regionReport
Motivating problem: identify and validate regions of the genome that change expression when analyzing tissues with potentially incomplete transcriptome annotation
derfinder permits discovery of novel expressed regions
1. We identified expressed intronic regions that differentiate tissues independently of the nearest exonic region
2. We have developed tools for reproducible/shareable reporting
Acknowledgements
Hopkins: Jeffrey Leek, Alyssa Frazee, Abhinav Nellore, Chris Wilks, Ben Langmead
LIBD: Andrew Jaffe, Jooheon Shin, Nikolay Ivanov, Amy Deep, Ran Tao, Yankai Jia, Thomas Hyde, Joel Kleinman, Daniel Weinberger
Harvard: Rafael Irizarry, Michael Love
Funding: NIH, LIBD, CONACyT México
References + software + code
• Collado-Torres, et al. bioRxiv (2015) doi:10.1101/015370
  – http://bioconductor.org/packages/derfinder
  – http://lcolladotor.github.io/derSupplement/
• Collado-Torres, et al. F1000Research (2015) doi:10.12688/f1000research.6379.1
  – http://www.bioconductor.org/packages/regionReport
  – http://lcolladotor.github.io/regionReportSupp/
• Nellore, Collado-Torres, et al. bioRxiv (2015) doi:10.1101/019067
  – rail.bio
• Nellore, …, Collado-Torres, et al. bioRxiv (2016) doi:10.1101/038224
  – intropolis.rail.bio
• Jaffe, Shin, Collado-Torres, et al. Nat. Neurosci. (2015) doi:10.1038/nn.3898
  – https://github.com/lcolladotor/libd_n36
  – https://github.com/lcolladotor/enrichedRanges
• Frazee, et al. Biostatistics (2014) doi:10.1093/biostatistics/kxt053
  – https://github.com/leekgroup/derfinder