Introduction to RNA-Seq
David Wood
Winter School in Mathematics and Computational Biology
July 1, 2013
RNA is...
Central: DNA, RNA, Protein, Epigenetics
Diverse: tRNA, mRNA, rRNA
Dynamic: Time, Abundance
Quantitative, Qualitative, Integrative
Understand the molecular basis of gene function. Classify and transform cellular states.
RNA studies involve...
Biological System
Technology
Available Resources
Questions
~/bin
Project
DB
This talk focuses on reference-based mammalian RNA-seq analysis.
Transcriptional Complexity
[Figure: one genomic DNA locus giving rise to many transcripts, with alternative transcription start sites (TSS), translation start sites (ATG), polyadenylation signals (pA) and polyadenylation (AAA). Legend: protein coding regions, non-coding regions, microRNAs, spliced introns. Also marked: PASR, miRNA, tiRNA, Alu, mutations, allelic expression, RNA editing.]
RNA-seq
[Figure: the same locus with RNA-seq reads overlaid: non-spliced reads, junction reads, strand-specific coverage, and mutations.]
Cloonan et al. Nat Methods 2008; 5:613-619
Advantages of RNA-seq
Discovery (Present and Future): genes, exons, junctions, UTRs, fusions
[Figure: number of transcripts (y-axis, 0 to 250,000) by Ensembl version (x-axis, 58 to 71); series: protein coding, degradation, ncRNA, pseudogene.]
Dynamic Range
Mortazavi et al. Nat. Methods 2008; 5:621–628
Nucleotide Specific
Typical experiment workflow
Settings: Field / Clinic, Wet Lab, Dry Lab
Design Experiment
Sample Acquisition (Field / Clinic / Lab)
Run Experiment
Obtain RNA
Make Library
Sequencing
Base Calling
Mapping
Library QC
Analysis
Interpretation
Validation
Verification (Sample Acquisition)
Publish
Stages are marked 1°, 2° and 3° on the diagram.
Library Construction
Cellular RNA: rRNA (80%), tRNA (15%), remaining 5%
Target RNA: Deplete rRNA, Enrich polyA RNA, Profile (ribosomes), Capture (tiling arrays)
Fragment
ds-cDNA synthesis
Ligate adaptors + Amplify
Sequencing
RNA-seq Mapping
Challenge #1: Introns
Align to database of junctions or transcriptome (Wood et al. Bioinformatics 2011; 27:580–581)
Split Read Alignments (Trapnell et al. Bioinformatics 2009; 25:1105-11)
Challenge #2: Correctness
Sufficient Overlap
Sufficient Evidence
Align to the transcriptome
Challenge #3: Multi-mappers
Sequence Similarity
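The "sufficient overlap" and "sufficient evidence" checks for junction reads can be sketched as a simple filter. This is a minimal illustration and not the method of any tool cited on the slide: the function name, the input layout and the thresholds `min_anchor=8` and `min_reads=2` are all assumptions chosen for the example.

```python
from collections import Counter

def supported_junctions(junction_reads, min_anchor=8, min_reads=2):
    """Return junctions backed by enough well-anchored reads.

    junction_reads: iterable of (junction_id, left_overlap, right_overlap),
    where each overlap is the number of read bases aligned on that side
    of the splice junction. Both thresholds are illustrative assumptions.
    """
    support = Counter()
    for junction_id, left_overlap, right_overlap in junction_reads:
        # Sufficient overlap: the shorter anchor must be long enough
        # that its placement is unlikely to be a chance match.
        if min(left_overlap, right_overlap) >= min_anchor:
            support[junction_id] += 1
    # Sufficient evidence: require several reads per junction.
    return {j for j, n in support.items() if n >= min_reads}

# Hypothetical junction reads for illustration only.
reads = [
    ("exon1-exon2", 30, 20),
    ("exon1-exon2", 12, 38),
    ("exon2-exon3", 47, 3),   # 3-base anchor: too short to trust
    ("exon2-exon3", 25, 25),
]
print(supported_junctions(reads))
```

With these inputs only `exon1-exon2` passes, because `exon2-exon3` is left with a single well-anchored read.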
RNA-seq Mapping
Data QC (clipping)
Align to Filter Set
Align to ‘genome’
Align to ‘junctions’
Split read Alignment
Choose Alignments, Disambiguate
Exclude; Flag and Exclude
BAM files
Alignment Filtering
Library QC, Analysis
Open choices: reference? diploid? gene model? ESTs? Algorithm? rRNA, tRNA?
Tophat: Trapnell et al. Bioinformatics 2009; 25:1105-11
Library Quality Control (QC)
[Figure: the library construction diagram (target RNA selection, fragmentation, ds-cDNA synthesis, adaptor ligation and amplification, sequencing), annotated with what each step affects.]
Affects RNA content (Expression quantification)
Affects Insert Size (transcript identification)
Affects Strand Specificity
Affects Library Complexity (Tag uniqueness)
Affects Mapping Rate
Paired-end?
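One rough way to put a number on library complexity (tag uniqueness) is the fraction of alignments that are distinct. The sketch below assumes each alignment is summarised as a hypothetical `(chromosome, strand, start)` tuple. It is only an indicator: in RNA-seq, highly expressed genes legitimately produce identical tags, so a low value does not by itself prove over-amplification.

```python
def library_complexity(alignments):
    """Fraction of alignments with a distinct (chromosome, strand, start)."""
    alignments = list(alignments)
    if not alignments:
        return 0.0
    return len(set(alignments)) / len(alignments)

# Hypothetical alignments for illustration only.
tags = [
    ("chr1", "+", 1000),
    ("chr1", "+", 1000),  # same position and strand: a repeated tag
    ("chr1", "-", 1000),
    ("chr2", "+", 5000),
]
complexity = library_complexity(tags)
print(f"{complexity:.2f}")
```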
Calculate Gene Expression
Gene A: 3500 nt (700 reads), RPKM = 2.0
Gene B: 400 nt (160 reads), RPKM = 4.0
Reads Per Kilobase per Million (RPKM):
$$\mathrm{RPKM} = R \times \frac{10^3}{L} \times \frac{10^6}{N}$$
R = Gene Read Count
L = Length of gene
N = Library Size
Mortazavi et al. Nat. Methods 2008; 5:621–628
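The RPKM calculation for the two example genes can be checked with a few lines of Python. The slide does not state the library size; the value of 100 million reads used here is an assumption, chosen because it is the size that reproduces the quoted RPKM values of 2.0 and 4.0.

```python
def rpkm(read_count, gene_length_nt, library_size):
    """Reads Per Kilobase per Million: R * (10^3 / L) * (10^6 / N)."""
    return read_count * 1e9 / (gene_length_nt * library_size)

# Assumed library size (not given on the slide): 100 million reads.
N = 100_000_000
gene_a = rpkm(700, 3500, N)   # Gene A: 3500 nt, 700 reads
gene_b = rpkm(160, 400, N)    # Gene B: 400 nt, 160 reads
print(gene_a, gene_b)
```

Gene A has more reads but the lower RPKM, because its reads are spread over a much longer gene.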
Further Normalisation
Normalise to “mappable” gene length (Koehler et al. Bioinformatics 2010)
[Figure: gene model containing a repeat region.]
Scale Expression Values by TMM (Robinson et al. Genome Biology 2010; 11:R25)
[Figure: Cellular RNA in Cond. 1 and Cond. 2, compared with RPKM in Cond. 1 and Cond. 2.]
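The idea behind TMM scaling can be sketched in plain Python. This is a deliberately simplified version and not the published procedure: it trims on log ratios (M-values) only and takes an unweighted mean, whereas the method of Robinson et al. also trims on absolute expression and uses precision weights. The function name and the 30% trim default are assumptions for the example.

```python
import math

def tmm_factor(sample, reference, trim=0.3):
    """Simplified trimmed mean of M-values (TMM) scaling factor.

    sample, reference: lists of raw read counts for the same genes.
    Simplifications (assumptions for illustration): genes are trimmed
    on M-values only and averaged without precision weights.
    """
    n_sample, n_reference = sum(sample), sum(reference)
    m_values = []
    for s, r in zip(sample, reference):
        if s > 0 and r > 0:  # log ratios are undefined for zero counts
            m_values.append(math.log2((s / n_sample) / (r / n_reference)))
    if not m_values:
        return 1.0
    m_values.sort()
    k = int(len(m_values) * trim)  # genes dropped from each tail
    kept = m_values[k:len(m_values) - k] if k else m_values
    return 2 ** (sum(kept) / len(kept))

# Hypothetical counts: nine genes unchanged, one gene strongly induced.
reference = [100] * 10
sample = [100] * 9 + [1100]
factor = tmm_factor(sample, reference)
effective_library_size = sum(sample) * factor
print(factor, effective_library_size)
```

In this example the induced gene doubles the library size, so library-size scaling alone would make the nine unchanged genes look two-fold down. The trimmed factor comes out at about 0.5, and the effective library size falls back to about 1000, which restores them.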
Normalise to GC content of region
Benjamini et al. NAR; 2012
Calculate ‘Feature’ Expression
Features: Exonic Region, Exon Junction, Intronic Region, Exon Boundary, Intergenic Region
Calculate RPKM for any feature
Examples: Extended 3’ UTR, Retained Intron
Calculate Transcript Expression
Approach #1: Expression calculated using diagnostic features
[Figure: transcript isoforms of one gene with a diagnostic feature highlighted.]
Pros: Strong Evidence; Easy to calculate
Cons: Excludes Transcripts; Sampling Variability; Lacks statistical robustness; Dependent on gene model
ALEXA-seq: Griffith et al. Nat. Methods 2010; 7:843–847
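A diagnostic feature is one that occurs in exactly one transcript of the gene model, so reads on it can be assigned without ambiguity. The sketch below finds such features for a hypothetical three-isoform gene. The data layout and all names are assumptions for illustration, not the ALEXA-seq implementation.

```python
from collections import Counter

def diagnostic_features(transcripts):
    """Map each transcript to the features found in no other transcript.

    transcripts: dict of transcript name -> set of feature names
    (exons or exon junctions). All names here are hypothetical.
    """
    usage = Counter(f for features in transcripts.values() for f in features)
    return {
        name: {f for f in features if usage[f] == 1}
        for name, features in transcripts.items()
    }

gene_model = {
    "isoform_1": {"exon1", "exon2", "exon3", "junction_1_2", "junction_2_3"},
    "isoform_2": {"exon1", "exon3", "junction_1_3"},
    "isoform_3": {"exon1", "exon2", "junction_1_2"},
}
result = diagnostic_features(gene_model)
for name in sorted(result):
    print(name, sorted(result[name]))
```

Here `isoform_3` has no diagnostic feature at all, which illustrates the "Excludes Transcripts" limitation: its expression cannot be measured with this approach.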
Calculate Transcript Expression
Approach #2: Expression estimated
Construct bipartite graph, then find minimum path cover
Pros: Estimates expression for all transcripts; More statistically robust; Incorporates ambiguous reads
Cons: Model can fail in complex / highly expressed regions; Error rate largely unknown
Cufflinks: Trapnell et al. Nat. Biotech. 2010, 28:511-515
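The graph step can be illustrated with the textbook reduction from minimum path cover in a directed acyclic graph to bipartite matching. This is a sketch of the general technique only, not Cufflinks' implementation. Nodes stand for hypothetical fragments, and an edge `(u, v)` is assumed to mean that fragment `v` can follow fragment `u` in the same transcript.

```python
def min_path_cover(n_nodes, edges):
    """Size of a minimum path cover of a DAG via bipartite matching.

    n_nodes: number of nodes, labelled 0..n_nodes-1.
    edges: list of (u, v) pairs meaning node v can follow node u.
    Uses the identity: minimum path cover = nodes - maximum matching.
    """
    successors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        successors[u].append(v)
    matched_to = [None] * n_nodes  # right-side node v -> left-side node u

    def try_augment(u, visited):
        # Search for an augmenting path starting at left-side node u.
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            if matched_to[v] is None or try_augment(matched_to[v], visited):
                matched_to[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n_nodes))
    return n_nodes - matching

# Hypothetical fragments: 0 -> 1 -> 2 and 0 -> 3 -> 4 share a first fragment.
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
print(min_path_cover(5, edges))
```

In this example the five fragments cannot all lie on one path, so at least two paths (transcripts) are needed to explain them.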
Expressed or not?
[Figure: frequency histograms of log2 (expression) for Cond. 1, Cond. 2 and Cond. 3, divided into not “expressed” and “expressed”.]
Need to determine ‘expression’ cut-off value
![Page 62: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/62.jpg)
Expressed or not?
1. Expressed if > 1 RPKM
Has literature support
Arbitrary
Lacks sensitivity
2. Expressed if above intergenic background
Cut-off based on empirical evidence
Still somewhat arbitrary
[Figure: frequency distribution of log2 expression, with the 95th percentile marked]
3. Incorporate replicate information
Based on observed reproducibility
Requires replicates
[Figure: np-IDR against −log2 (expression) bins; series: Rep 1 vs Rep 2, Rep 2 vs Rep 1, Mean, Cut-off]
Choose what is reasonable for your experiment, and be consistent!
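The first two cut-offs on this slide (> 1 RPKM, and above intergenic background) can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the gene names, read counts, intergenic values and helper names (`rpkm`, `percentile`, `call_expressed`) are all hypothetical, and the percentile uses simple linear interpolation.

```python
import math


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def percentile(values, pct):
    """Percentile by linear interpolation between the two closest ranks."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def call_expressed(gene_rpkm, cutoff):
    """Map each gene to True when its RPKM is strictly above the cut-off."""
    return {gene: value > cutoff for gene, value in gene_rpkm.items()}


# Hypothetical toy data: (read count, gene length in bp) per gene.
TOTAL_MAPPED_READS = 20_000_000
GENE_COUNTS = {"geneA": (400, 2000), "geneB": (30, 1500), "geneC": (5, 1000)}
# Hypothetical RPKM values measured over intergenic windows.
INTERGENIC_RPKM = [0.0, 0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]

gene_rpkm = {
    gene: rpkm(count, length, TOTAL_MAPPED_READS)
    for gene, (count, length) in GENE_COUNTS.items()
}

# Approach 1: fixed literature cut-off of 1 RPKM.
fixed_calls = call_expressed(gene_rpkm, 1.0)

# Approach 2: cut-off at the 95th percentile of the intergenic background.
background_cutoff = percentile(INTERGENIC_RPKM, 95.0)
background_calls = call_expressed(gene_rpkm, background_cutoff)

for gene in gene_rpkm:
    print(gene, round(gene_rpkm[gene], 2), fixed_calls[gene], background_calls[gene])
```

In this toy data geneB sits at exactly 1 RPKM, so the fixed cut-off rejects it while the background-derived cut-off (about 0.7 RPKM here) accepts it, which is one way the fixed threshold can lack sensitivity.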
![Page 63: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/63.jpg)
Nucleotide-Resolution Analysis
Imprinting
[Figure: two copies of a transcript, each drawn from ATG to AAA, with an ICR marked]
![Page 68: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/68.jpg)
Nucleotide-Resolution Analysis
Imprinting
eQTL
sQTL
Complex Traits
Allelic Fraction
[Figure: two copies of a transcript, each drawn from ATG to AAA, with SNPs A, B and C marked]
Reference bias
[Figure: density of the fraction of RNA-seq reads matching the reference allele (x-axis 0.0 to 1.0), with the expected mean and observed mean marked. Degner et al. Bioinformatics 2009]
Map to a diploid genome
AlleleSeq: Rozowsky et al. Mol. Sys. Bio 2011
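As a rough illustration of the allelic fraction and of how reference bias shows up, the sketch below computes the fraction of reads matching the reference allele at a few heterozygous SNPs and tests each against the expected 0.5 with an exact binomial test. The SNP names and read counts are hypothetical, the helper names are assumptions, and the plain binomial test is a stand-in chosen for simplicity; it is not the procedure of Degner et al. or of AlleleSeq.

```python
from math import comb


def reference_fraction(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous SNP that match the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, expected=0.5):
    """Exact two-sided binomial p-value against the expected reference fraction."""
    n = ref_reads + alt_reads
    probs = [
        comb(n, k) * expected**k * (1 - expected) ** (n - k) for k in range(n + 1)
    ]
    observed = probs[ref_reads]
    # Sum every outcome that is no more likely than the observed one.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail)


# Hypothetical (reference reads, alternate reads) at three heterozygous SNPs.
SNP_COUNTS = {"snpA": (12, 8), "snpB": (30, 10), "snpC": (9, 11)}

fractions = {snp: reference_fraction(r, a) for snp, (r, a) in SNP_COUNTS.items()}
p_values = {snp: binomial_two_sided_p(r, a) for snp, (r, a) in SNP_COUNTS.items()}

# Without mapping bias the mean should sit near 0.5; a mean above 0.5
# across many SNPs is the signature of reference bias.
mean_fraction = sum(fractions.values()) / len(fractions)

for snp in SNP_COUNTS:
    print(snp, round(fractions[snp], 2), round(p_values[snp], 4))
print("mean reference fraction:", round(mean_fraction, 2))
```

The code only measures the skew. Removing it is a mapping problem, which is where aligning to a diploid genome comes in.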
![Page 69: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/69.jpg)
Typical experiment workflow
Design Experiment
Sample Acquisition
Field / Clinic / Lab
Validation
Verification
Sample Acquisition
Run Experiment
Obtain RNA
Make Library
Sequencing
Base Calling
Mapping
Library QC
Publish
Analysis
Interpretation
[Diagram: steps are grouped into Field / Clinic, Wet Lab and Dry Lab stages, with 1°, 2° and 3° labels]
![Page 71: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/71.jpg)
The future of RNA-seq (now)
Single Cell
Shalek, et al. Nature 2013
Huge Cohort
Genotype-Tissue Expression project (GTEx)
900 donors, 30,000 RNA-seq data sets!
Lonsdale, et al. Nature Genetics 2013
![Page 72: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/72.jpg)
Summary
1. Choose an alignment approach suitable for your experiment, available resources and tools
2. Assess library quality, specifically rRNA contamination, insert size, strand specificity and library complexity
3. Gene and ‘Feature’ expression can be calculated using count data, and normalised by length, library size and GC content
4. Transcript expression calculation requires alternative approaches and algorithms which, although common, are largely unproven
5. RNA-seq can interrogate nucleotide-specific questions, but be careful of alignment biases (diploid mapping can help here)
![Page 73: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation](https://reader033.vdocuments.site/reader033/viewer/2022053022/60505096aca56443607b4f71/html5/thumbnails/73.jpg)
Questions and References
Cloonan et al. Nat Methods 2008; Stem cell transcriptome profiling via massive-scale mRNA sequencing
Mortazavi et al. Nat. Methods 2008; Mapping and quantifying mammalian transcriptomes by RNA-Seq
Wood et al. Bioinformatics 2011; X-MATE: A flexible system for mapping short read data
Trapnell et al. Bioinformatics 2009; TopHat: discovering splice junctions with RNA-Seq
Koehler et al. Bioinformatics 2010; The Uniqueome: A mappability resource for short-tag sequencing
Robinson et al. Genome Biology 2010; A scaling normalization method for differential expression analysis of RNA-seq data
Benjamini et al. NAR 2012; Summarizing and correcting the GC content bias in high-throughput sequencing
Griffith et al. Nat. Methods 2010; Alternative expression analysis by RNA sequencing
Trapnell et al. Nat. Biotech. 2010; Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform
Degner et al. Bioinformatics 2009; Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing
Rozowsky et al. Mol. Sys. Bio 2011; AlleleSeq: analysis of allele-specific expression and binding in a
Shalek, et al. Nature 2013; Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells
Lonsdale, et al. Nature Genetics 2013; The Genotype-Tissue Expression (GTEx) project