Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013


Page 1: Introduction to RNA-Seq - Bioinformaticsbioinformatics.org.au/ws13/wp-content/uploads/ws13/sites/3/Full... · Design Experiment Sample Acquisition Field / Clinic / Lab Validation

Introduction to RNA-Seq

David Wood

Winter School in Mathematics and Computational Biology

July 1, 2013


RNA is...

Central: DNA, RNA, Protein, Epigenetics

Diverse: tRNA, mRNA, rRNA

Dynamic: Time, Abundance

Quantitative, Qualitative, Integrative

Understand the molecular basis of gene function. Classify and transform cellular states.


RNA studies involve...

Biological System

Technology

Available Resources

Questions

~/bin, Project, DB

This talk focuses on reference-based mammalian RNA-seq analysis.


Transcriptional Complexity

[Figure: schematic of a gene locus on genomic DNA with several transcripts, showing protein coding regions, non-coding regions, spliced introns, microRNAs and an Alu element.]

Legend: TSS = transcription start site; ATG = translation start site; pA = polyadenylation signal; AAA = polyadenylation

Small RNAs: PASR, miRNA, tiRNA

Mutations, Allelic Expression, RNA Editing

RNA-seq

[Figure: the transcriptional complexity schematic (TSS, ATG, pA, AAA, PASR, miRNA, tiRNA, Alu, mutations) with RNA-seq reads overlaid: non-spliced reads, junction reads, strand specific.]

Cloonan et al. Nat Methods 2008; 5:613-619


Advantages of RNA-seq

Discovery: genes, exons, junctions, UTRs, fusions (Present and Future)

[Figure: chart, labels not recoverable]

Dynamic Range (Mortazavi et al. Nat. Methods 2008; 5:621–628)

Nucleotide Specific

Typical experiment workflow

Design Experiment

Sample Acquisition (Field / Clinic / Lab)

Validation

Verification

Sample Acquisition

Run Experiment

Obtain RNA

Make Library

Sequencing

Base Calling

Mapping

Library QC

Publish

Analysis

Interpretation

1° 2°

Stages: Field / Clinic, Wet Lab, Dry Lab


Library Construction

Cellular RNA: rRNA (80%), tRNA (15%), remaining 5%

Target RNA: Deplete rRNA, Enrich polyA RNA, Profile (ribosomes), Capture (tiling arrays)

Fragment

ds-cDNA synthesis

Ligate adaptors + Amplify

Sequencing


RNA-seq Mapping

[Figure: reads aligned against a spliced transcript, ATG to AAA]

Challenge #1: Introns

Align to database of junctions or transcriptome (Wood et al. Bioinformatics 2011; 27:580–581)

Split Read Alignments (Trapnell et al. Bioinformatics 2009; 25:1105-11)

Challenge #2: Correctness

Sufficient Overlap, Sufficient Evidence

Align to the transcriptome

Challenge #3: Multi-mappers

Sequence Similarity
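The "sufficient overlap" requirement for junction reads can be sketched as below. This is a minimal illustration only, not the rule used by TopHat or X-MATE: the function name `has_sufficient_overlap`, the coordinate convention and the default `min_anchor` of 8 bases are all assumptions made for the example.

```python
def has_sufficient_overlap(read_start, read_length, junction_offset, min_anchor=8):
    """Return True if a read covers a junction with enough bases on each side.

    read_start      -- 0-based start of the read in spliced-transcript coordinates
    read_length     -- read length in bases
    junction_offset -- position of the first base after the exon-exon junction
    min_anchor      -- hypothetical minimum number of bases required on each side
    """
    read_end = read_start + read_length           # exclusive end of the read
    left_anchor = junction_offset - read_start    # bases before the junction
    right_anchor = read_end - junction_offset     # bases after the junction
    return left_anchor >= min_anchor and right_anchor >= min_anchor


if __name__ == "__main__":
    # A 50 nt read starting 10 bases upstream of a junction at position 100.
    print(has_sufficient_overlap(90, 50, 100))
    # The same read starting only 3 bases upstream of the junction.
    print(has_sufficient_overlap(97, 50, 100))
```

A read with only a few bases on one side of the junction could equally be placed at many other locations, which is why a minimum anchor is a common guard against false junction calls.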


RNA-seq Mapping

Choices: reference? diploid? gene model? ESTs? Algorithm? rRNA, tRNA?

Data QC (clipping)

Align to Filter Set

Align to ‘genome’

Align to ‘junctions’

Split read Alignment

Choose Alignments, Disambiguate

Exclude

Flag and Exclude

BAM files

Alignment Filtering

Library QC

Analysis

TopHat: Trapnell et al. Bioinformatics 2009; 25:1105-11
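The "choose alignments, disambiguate, flag and exclude" step might look like the sketch below. It is an assumption-laden illustration rather than TopHat's logic: the input structure (a dict of read id to `(locus, score)` tuples), the "higher score is better" convention and the name `choose_alignments` are all hypothetical.

```python
def choose_alignments(alignments):
    """Split reads into uniquely placed reads and flagged multi-mappers.

    alignments -- dict mapping read id to a list of (locus, score) tuples,
                  where a higher score is assumed to mean a better alignment.
    Returns (chosen, flagged): chosen maps read id to its single best locus,
    flagged lists reads whose best score is shared by several loci.
    """
    chosen = {}
    flagged = []
    for read_id, hits in alignments.items():
        if not hits:
            continue  # unaligned read: nothing to choose
        best_score = max(score for _, score in hits)
        best_loci = [locus for locus, score in hits if score == best_score]
        if len(best_loci) == 1:
            chosen[read_id] = best_loci[0]
        else:
            flagged.append(read_id)  # equally good placements: flag and exclude
    return chosen, flagged


if __name__ == "__main__":
    example = {
        "r1": [("chr1:100", 50)],
        "r2": [("chr1:200", 40), ("chr5:10", 40)],
        "r3": [("chr2:5", 30), ("chr3:7", 45)],
    }
    print(choose_alignments(example))
```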


Library Quality Control (QC)

Cellular RNA: rRNA (80%), tRNA (15%), remaining 5%

Target RNA: Deplete rRNA, Enrich polyA RNA, Profile (ribosomes), Capture (tiling arrays)

Fragment, ds-cDNA synthesis, Ligate adaptors + Amplify, Sequencing

Affects RNA content (expression quantification)

Affects insert size (transcript identification)

Affects strand specificity

Affects library complexity (tag uniqueness)

Affects mapping rate

Paired-end?
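One simple way to put a number on library complexity (tag uniqueness) is the fraction of aligned reads that occupy a distinct position. The sketch below assumes each read is summarised as a `(chromosome, strand, start)` tuple; the function name `tag_uniqueness` and that representation are illustrative choices, not something prescribed by the talk.

```python
def tag_uniqueness(read_positions):
    """Fraction of reads whose (chromosome, strand, start) position is distinct.

    A value near 1.0 suggests a complex library; a low value suggests that
    many reads are duplicates of the same original fragment.
    """
    if not read_positions:
        return 0.0
    return len(set(read_positions)) / len(read_positions)


if __name__ == "__main__":
    reads = [
        ("chr1", "+", 100),
        ("chr1", "+", 100),  # duplicate start position on the same strand
        ("chr1", "-", 100),
        ("chr2", "+", 5),
    ]
    print(tag_uniqueness(reads))
```

Highly expressed genes legitimately produce repeated start positions, so a threshold for this value would need to be judged per experiment.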


Calculate Gene Expression

Gene A: 3500 nt (700 reads), RPKM = 2.0

Gene B: 400 nt (160 reads), RPKM = 4.0

RPKM: Reads Per Kilobase per Million

$$\mathrm{RPKM} = R \times \frac{10^{3}}{L} \times \frac{10^{6}}{N}$$

R = Gene Read Count

L = Length of gene

N = Library Size

Mortazavi et al. Nat. Methods 2008; 5:621–628
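The RPKM calculation can be written out directly. In the sketch below the read counts and gene lengths come from the slide; the library size of 100 million reads is an assumption, chosen because it reproduces the RPKM values shown (2.0 and 4.0). The function name `rpkm` is illustrative.

```python
def rpkm(read_count, gene_length_nt, library_size):
    """Reads Per Kilobase per Million: R * (10^3 / L) * (10^6 / N)."""
    return read_count * (1e3 / gene_length_nt) * (1e6 / library_size)


if __name__ == "__main__":
    library_size = 100_000_000  # assumed; not stated on the slide
    print(round(rpkm(700, 3500, library_size), 6))  # Gene A
    print(round(rpkm(160, 400, library_size), 6))   # Gene B
```

Gene A has more reads than Gene B, but once length is accounted for Gene B has twice the expression.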


Further Normalisation

Normalise to “mappable” gene length (Koehler et al. Bioinformatics 2010)

[Figure: gene model, ATG to AAA, containing a Repeat]

Scale Expression Values by TMM (Robinson et al. Genome Biology 2010; 11:R25)

[Figure: Cellular RNA compared with RPKM, each for Cond. 1 and Cond. 2]

Normalise to GC content of region (Benjamini et al. NAR 2012)
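Normalising to "mappable" length means replacing the annotated gene length with the number of bases to which reads can actually be assigned. The sketch below assumes exons and unmappable regions (for example repeats) are given as half-open `(start, end)` intervals, with no overlaps within either list; the function name `mappable_length` and this representation are assumptions, and it does not use the Uniqueome resource itself.

```python
def mappable_length(exons, unmappable):
    """Count exonic bases that are not covered by unmappable intervals.

    exons, unmappable -- lists of half-open (start, end) intervals; the
    intervals within each list are assumed not to overlap one another.
    """
    total = 0
    for exon_start, exon_end in exons:
        masked = 0
        for rep_start, rep_end in unmappable:
            overlap = min(exon_end, rep_end) - max(exon_start, rep_start)
            if overlap > 0:
                masked += overlap
        total += (exon_end - exon_start) - masked
    return total


if __name__ == "__main__":
    exons = [(0, 1000), (2000, 2500)]
    repeats = [(900, 1100), (2100, 2200)]
    print(mappable_length(exons, repeats))
```

The returned value would then stand in for L in the RPKM formula.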


Calculate ‘Feature’ Expression

Feature types: Exonic Region, Exon Junction, Intronic Region, Exon Boundary, Intergenic Region

Calculate RPKM for any feature

Examples: Extended 3’ UTR, Retained Intron
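Counting reads per feature is the step that precedes a per-feature RPKM. The sketch below assumes features and reads on a single chromosome and strand, each given as half-open `(start, end)` intervals; `count_reads_per_feature` and the feature names are hypothetical.

```python
def count_reads_per_feature(features, reads):
    """Count reads overlapping each feature.

    features -- dict mapping feature name to a half-open (start, end) interval
    reads    -- list of half-open (start, end) read intervals
    A read that overlaps two features is counted once for each of them.
    """
    counts = {name: 0 for name in features}
    for read_start, read_end in reads:
        for name, (feat_start, feat_end) in features.items():
            if read_start < feat_end and feat_start < read_end:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    features = {"exon1": (0, 100), "intron1": (100, 300), "exon2": (300, 400)}
    reads = [(10, 60), (90, 140), (310, 360), (150, 200)]
    print(count_reads_per_feature(features, reads))
```

Reads in `intron1` here are the kind of signal that would point to a retained intron.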


Calculate Transcript Expression

[Figure: alternative transcripts of one gene, with a diagnostic feature marked]

Approach #1: Expression calculated using diagnostic features

Advantages: Strong Evidence, Easy to calculate

Limitations: Excludes Transcripts, Sampling Variability, Lacks statistical robustness, Dependent on gene model

ALEXA-seq: Griffith et al. Nat. Methods 2010; 7:843–847
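A diagnostic feature is one that occurs in exactly one transcript of a gene, so reads on it can be attributed to that transcript. The sketch below finds such features from a gene model; it is an illustration of the idea, not the ALEXA-seq implementation, and the names `diagnostic_features`, `T1`, `e1`, `j1-2` and so on are hypothetical.

```python
from collections import Counter


def diagnostic_features(transcripts):
    """Return, for each transcript, the features found in no other transcript.

    transcripts -- dict mapping transcript id to a set of feature ids
                   (exons, junctions, and so on) taken from the gene model.
    """
    usage = Counter(f for feats in transcripts.values() for f in feats)
    return {
        tid: {f for f in feats if usage[f] == 1}
        for tid, feats in transcripts.items()
    }


if __name__ == "__main__":
    gene_model = {
        "T1": {"e1", "e2", "e3", "j1-2", "j2-3"},
        "T2": {"e1", "e3", "j1-3"},
        "T3": {"e1", "e3"},
    }
    print(diagnostic_features(gene_model))
```

`T3` ends up with no diagnostic features at all, which is the "excludes transcripts" limitation: its expression cannot be measured this way.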


Calculate Transcript Expression

Approach #2: Expression estimated

Construct a bipartite graph, then find the minimum path

Advantages: Estimates expression for all transcripts, More statistically robust, Incorporates ambiguous reads

Limitations: Model can fail in complex / highly expressed regions, Error rate largely unknown

Cufflinks: Trapnell et al. Nat. Biotech. 2010, 28:511-515

Expressed or not?

[Figure: transcripts under Cond. 1, Cond. 2 and Cond. 3; histogram of Frequency against log2 (expression), divided into not “expressed” and “expressed”]

Need to determine ‘expression’ cut-off value


Expressed or not?

1. Expressed if > 1 RPKM

Has literature support, Lacks sensitivity, Arbitrary

2. Expressed if above intergenic background

Cut-off based on empirical evidence, Still somewhat arbitrary

[Figure: Frequency against log2 Expression, with the 95th percentile marked]

3. Incorporate replicate information

Based on observed reproducibility, Requires replicates

[Figure: np-IDR against log2 (expression) bins; series: Rep 1 vs Rep 2, Rep 2 vs Rep 1, Mean, Cut-off]

Choose what is reasonable for your experiment, be consistent!
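The second option, a cut-off at the 95th percentile of intergenic background, can be sketched as follows. The nearest-rank percentile definition, the function names and the example values are assumptions for illustration; the talk does not say how the percentile was computed.

```python
import math


def percentile_cutoff(background_values, percentile=95):
    """Nearest-rank percentile of background (e.g. intergenic) expression."""
    ordered = sorted(background_values)
    rank = math.ceil(percentile * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def expressed_genes(gene_expression, background_values, percentile=95):
    """Return the genes whose expression exceeds the background cut-off."""
    cutoff = percentile_cutoff(background_values, percentile)
    return {gene for gene, value in gene_expression.items() if value > cutoff}


if __name__ == "__main__":
    intergenic = list(range(1, 21))        # 20 made-up background values
    genes = {"A": 25, "B": 19, "C": 3}     # made-up gene expression values
    print(percentile_cutoff(intergenic))
    print(expressed_genes(genes, intergenic))
```

Whichever definition is used, applying the same one to every sample is what keeps the comparison consistent.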

Nucleotide-Resolution Analysis

Imprinting (ICR)

eQTL, sQTL, Complex Traits

SNPs A, B, C: Allelic Fraction

Reference bias (Degner et al. Bioinformatics 2009)

[Figure: Density against Fraction of RNA-seq Reads Matching Reference Allele (0.0 to 1.0), with Expected Mean and Observed Mean marked]

Map to a diploid genome (AlleleSeq: Rozowsky et al. Mol. Sys. Bio 2011)
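The allelic fraction behind the reference-bias plot is the share of reads at a SNP that carry the reference allele. The sketch below computes it per SNP and averages across SNPs; the function names and the read counts are made up for illustration, and the expectation of 0.5 applies only to heterozygous sites without true allele-specific expression.

```python
def reference_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a SNP matching the reference allele, or None."""
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no coverage: fraction is undefined
    return ref_reads / total


def mean_reference_fraction(snp_counts):
    """Mean reference fraction over SNPs given as (ref_reads, alt_reads)."""
    fractions = [reference_allele_fraction(r, a) for r, a in snp_counts]
    fractions = [f for f in fractions if f is not None]
    return sum(fractions) / len(fractions) if fractions else None


if __name__ == "__main__":
    counts = [(6, 4), (11, 9), (0, 0), (5, 5)]  # made-up heterozygous SNPs
    print(mean_reference_fraction(counts))
```

An observed mean noticeably above 0.5 across many heterozygous SNPs would be consistent with reads carrying the non-reference allele aligning less often.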


The future of RNA-seq (now)

Single Cell

Shalek et al. Nature 2013

Huge Cohort

Genotype-Tissue Expression project (GTEx): 900 donors, 30,000 RNA-seq data sets!

Lonsdale et al. Nature Genetics 2013

Summary

1. Choose an alignment approach suitable for your experiment, available resources and tools.

2. Assess library quality, specifically rRNA contamination, insert size, strand specificity and library complexity.

3. Gene and ‘Feature’ expression can be calculated using count data, and normalised by length, library size and GC content.

4. Transcript expression calculation requires alternative approaches and algorithms which, although common, are largely unproven.

5. RNA-seq can interrogate nucleotide-specific questions, but be careful of alignment biases (diploid mapping can help here).

Questions and References

Cloonan et al. Nat. Methods 2008; Stem cell transcriptome profiling via massive-scale mRNA sequencing

Mortazavi et al. Nat. Methods 2008; Mapping and quantifying mammalian transcriptomes by RNA-Seq

Wood et al. Bioinformatics 2011; X-MATE: A flexible system for mapping short read data

Trapnell et al. Bioinformatics 2009; TopHat: discovering splice junctions with RNA-Seq

Koehler et al. Bioinformatics 2010; The Uniqueome: A mappability resource for short-tag sequencing

Robinson et al. Genome Biology 2010; A scaling normalization method for differential expression analysis of RNA-seq data

Benjamini et al. NAR 2012; Summarizing and correcting the GC content bias in high-throughput sequencing

Griffith et al. Nat. Methods 2010; Alternative expression analysis by RNA sequencing

Trapnell et al. Nat. Biotech. 2010; Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform

Degner et al. Bioinformatics 2009; Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing

Rozowsky et al. Mol. Sys. Bio 2011; AlleleSeq: analysis of allele-specific expression and binding in a

Shalek et al. Nature 2013; Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells

Lonsdale et al. Nature Genetics 2013; The Genotype-Tissue Expression (GTEx) project