rna-seq tutorial 1 - home | minnesota … isolation sequence fragment end(s) map reads sample 1...
TRANSCRIPT
![Page 1: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/1.jpg)
RNA-Seq Tutorial 1
John Garbe Research Informatics Support Systems, MSI
March 19, 2012
![Page 2: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/2.jpg)
RNA-Seq Tutorials • Tutorial 1
– RNA-Seq experiment design and analysis – Instruction on individual software will be
provided in other tutorials • Tutorial 2
– Hands-on using TopHat and Cufflinks in Galaxy
• Tutorial 3 – Advanced RNA-Seq Analysis topics
![Page 3: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/3.jpg)
Galaxy.msi.umn.edu
Web-base platform for bioinformatic analysis
![Page 4: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/4.jpg)
Outline Experimental Design
Sequencing
Data Quality Control
Read mapping
Differential Expression
Analysis
RNA
fastq
fastq
SAM/BAM
GFF/GTF
Reference Genome
fasta
Reference Transcriptome
Transcriptome Assembly
Introduction
![Page 5: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/5.jpg)
Transcriptome Assembly
Experimental Design
Sequencing
Data Quality Control
Read mapping
Differential expression
analysis
RNA
fastq
fastq
SAM/BAM
GFF/GTF
Reference Genome
fasta
Reference Transcriptome
Introduction • Gene expression • RNA-Seq • Platform characteristics • Microarray comparison
Introduction
![Page 6: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/6.jpg)
Central dogma of molecular biology
Genome
A A G G C A Gene
Transcription
pre-mRNA A A G G C A
Splicing
mRNA G C A
Translation
ProteinK
K
V IV R Q
K
QR
Gene GeneGene
T G G A
A C C T
Alternative Splicing
G C A
Translation
KK
V QK
QR
Gen
e E
xpre
ssio
n
Nucleotide Base Pair
Exon
Intron
exon 1 exon 2 exon 3
gene
transcription &splicing
alternatively spliced mRNA
isoform 1 isoform 2
Alternative splicing
92–94% of human genes undergo alternative splicing, 86% with a minor isoform frequency of 15% or moreE.T. Wang, et al, Nature 456, 470-476 (2008)
![Page 7: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/7.jpg)
Introduction • RNA-Seq
– High-throughput sequencing of RNA – Transcriptome assembly
• Qualitative identification of expressed sequence
– Differential expression analysis • Quantitative measurement of transcript
expression
![Page 8: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/8.jpg)
mRNA isolation
Sequence fragment end(s)
Map reads
Calculate transcript abundance Sample 1
Fragmentation RNA -> cDNA
Genome Reference
Transcriptome A B
Gene A Gene B Sample 1 4 4
Gene A Gene B
Sample 1 4 2
# of Reads
Reads per kilobase of exon
Gene A Gene B Total
Sample 1 4 2 6
Sample 2 7 5 12 Reads per kilobase of exon
Gene A Gene B Total
Sample 1 .7 .3 6
Sample 2 .6 .3 12 Reads per kilobase of exon per million mapped reads
RPKM
![Page 9: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/9.jpg)
Ali Mortazavi et al., Nature Methods - 5, 621 - 628 (2008)
![Page 10: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/10.jpg)
Introduction • RNA-Seq (vs Microarray)
– Strong concordance between platforms – Higher sensitivity and dynamic range – Lower technical variation – Available for all species – Novel transcribed regions – Alternative splicing – Allele-specific expression – Fusion genes – Higher informatics cost
![Page 11: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/11.jpg)
Transcriptome Assembly
Sequencing
Data Quality Control
Read mapping
Differential expression
analysis
fastq
fastq
SAM/BAM
GFF/GTF
Reference Genome
fasta
Reference Transcriptome
Introduction
Experimental Design • Biological comparison(s) • Paired-end vs single end reads • Read length • Read depth • Replicates • Pooling
Experimental Design RNA
![Page 12: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/12.jpg)
Experimental design • Simple designs (Pairwise comparisons)
• Complex designs Consult a statistician
Two group Drug effect
Control Experimental (drug applied)
Two factor Cancer type X drug
Cancer sub-type 1
Cancer sub-type 1 With drug
Cancer sub-type 2
Cancer sub-type 2 With drug
Matched-pair
Normal Cancer
![Page 13: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/13.jpg)
Experimental design • What are my goals?
– Transcriptome assembly? – Differential expression analysis? – Identify rare transcripts?
• What are the characteristics of my system? – Large, complex genome? – Introns and high degree of alternative splicing? – No reference genome or transcriptome?
![Page 14: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/14.jpg)
Experimental design
BMGC RNA-Seq Price list (Jan 2012)
![Page 15: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/15.jpg)
Experimental design
10 million reads per sample, 50bp single-end reads • Small genomes with no alternative splicing
![Page 16: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/16.jpg)
Experimental design
20 million reads per sample, 50bp paired-end reads • Mammalian genomes (large transcriptome, alternative splicing, gene duplication)
![Page 17: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/17.jpg)
Experimental design
50-200 million reads per sample, 100bp paired-end reads • Transcriptome Assembly (100X coverage of transcriptome)
50bp Paired-end >> 100bp Single-end
![Page 18: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/18.jpg)
Experimental design • Technical replicates
– Not needed: low technical variation • Minimize batch effects • Randomize sample order
• Biological replicates – Not needed for transcriptome assembly – Essential for differential expression analysis – Difficult to estimate
• 3+ for cell lines • 5+ for inbred lines • 20+ for human samples
![Page 19: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/19.jpg)
Experimental design
• Pooling samples – Limited RNA obtainable
• Multiple pools per group required
– Transcriptome assembly
![Page 20: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/20.jpg)
Experimental design RNA-seq: technical variability and sampling Lauren M McIntyre, Kenneth K Lopiano, Alison M Morse, Victor Amin, Ann L Oberg, Linda J Young and Sergey V Nuzhdin BMC Genomics 2011, 12:293
Statistical Design and Analysis of RNA Sequencing Data Paul L. Auer and R. W. Doerge Genetics. 2010 June; 185(2): 405–416.
Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries Daniel Aird, Michael G Ross, Wei-Sheng Chen, Maxwell Danielsson, Timothy Fennell, Carsten Russ, David B Jaffe, Chad Nusbaum and Andreas Gnirke Genome Biology 2011, 12:R18
ENCODE RNA-Seq guidelines http://www.encodeproject.org/ENCODE/experiment_guidelines.html
![Page 21: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/21.jpg)
Transcriptome Assembly
Data Quality Control
Read mapping
Differential expression
analysis
fastq
SAM/BAM
GFF/GTF
Reference Genome
fasta
Reference Transcriptome
Experimental Design
Introduction
Sequencing • Platforms • Library preparation • Multiplexing • Sequence reads
Sequencing
RNA
fastq
![Page 22: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/22.jpg)
Sequencing • Illumina sequencing by synthesis
– GAIIx • replaced by HiSeq
– HiSeq2000 – MiSeq
• low throughput, fast turnaround
• SOLiD (not available at BMGC)
– “Color-space” reads (require special mapping software)
– Low error rate
• 454 pyrosequencing – Longer reads, lower throughput
![Page 23: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/23.jpg)
Sequencing • Library preparation (Illumina TruSeq protocol for HiSeq)
– RNA isolation – Ploy-A purification – Fragmentation – cDNA synthesis using random primers – Adapter ligation – Size selection – PCR amplification
![Page 24: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/24.jpg)
Sequencing • Flowcell
– 8 lanes – 200 Million reads per lane – Multiplex up to 24 samples on
one lane using barcodes
![Page 25: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/25.jpg)
Sequencing Adapter Adaptor cDNA insert
Barcode
Read 1 “left” read
Index read
Read 2 “right” read (optional)
Adaptor contamination
![Page 26: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/26.jpg)
Sequencing • Library types
– Polyadenylated RNA > 200bp (standard method) – Total RNA – Small RNA – Strand-specific
• Gene-dense genomes (bacteria, archaea, lower eukaryotes) • Antisense transcription (higher eukaryotes)
– Low input – Library capture
![Page 27: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/27.jpg)
Sequence Data Format • Data delivery
– /project/PI-groupname/120318_SN261_0348_A81JUMABXX • fastq_flt/ Bad reads removed by Illumina software, for use in data analysis • fastq/ Raw sequence output for submission to public archives, contains bad reads
– Upload to Galaxy
• File names – L1_R1_CCAAT_cancer1.fastq – L1_R2_CCAAT_cancer1.fastq
• Fastq format (Illumina Casava 1.8.0) – 4 lines per read
@HWI-M00262:4:000000000-A0ABC:1:1:18376:2027 1:N:0:AGATC TTCAGAGAGAATGAATTGTACGTGCTTTTTTTGT + =1:?7A7+?77+<<@AC<3<,33@A;<A?A=:4=
Read ID Sequence
+ Quality score
Phred+33
Machine ID barcode
QC Filter flag Y=bad N=good
Read pair #
Formats vary
Don’t use in analysis
![Page 28: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/28.jpg)
Reference Genome
Read mapping
Differential expression
analysis
SAM/BAM
GFF/GTF
fasta
Reference Transcriptome
Experimental Design
Sequencing
RNA
Introduction
Data Quality Control • Quality assessment • Trimming and filtering
Data Quality Control
fastq
fastq
Transcriptome Assembly
![Page 29: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/29.jpg)
Data Quality Assessment • Evaluate read library quality
– Identify contaminants – Identify poor/bad samples
• Software – FastQC (recommended)
• Command-line, Java GUI, or Galaxy
– SolexaQC • Command-line • Supports quality-based read trimming and filtering
– SAMStat • Command-line • Also works with bam alignment files
![Page 30: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/30.jpg)
Data Quality Assessment • Trimming: remove bad bases from (end of) read
– Adaptor sequence – Low quality bases
• Filtering: remove bad reads from library – Low quality reads – Contaminating sequence – Low complexity reads (repeats) – Short reads
• Short (< 20bp) reads slow down mapping software • Only needed if trimming was performed
• Software – Galaxy, many options (NGS: QC and manipulation) – Tagdust – Many others: http://seqanswers.com/wiki/Software/list
![Page 31: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/31.jpg)
Data Quality Assessment - FastQC
Good
Bad Trimming needed
Phred 30 = 1 error / 1000 bases Phred 20 = 1 error / 100 bases
Quality scores across bases
![Page 32: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/32.jpg)
Data Quality Assessment - FastQC Good
Bad Filtering needed
Quality scores across reads
![Page 33: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/33.jpg)
Data Quality Assessment - FastQC GC Distribution Good
Bad
![Page 34: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/34.jpg)
Data Quality Assessment - FastQC
High level of sequencing adapter contamination, trimming needed
![Page 35: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/35.jpg)
Data Quality Assessment - FastQC Normal level of sequence duplication in 20 million read mammalian sample
![Page 36: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/36.jpg)
Data Quality Assessment - FastQC Normal sequence bias at beginning of reads due to non-random hybridization of random primers
![Page 37: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/37.jpg)
Data Quality Assessment
• Recommendations – Generate quality plots for all read libraries – Trim and/or filter data if needed
• Always trim and filter for de novo transcriptome assembly
– Regenerate quality plots after trimming and filtering to determine effectiveness
![Page 38: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/38.jpg)
Transcriptome Assembly
Differential expression
analysis
GFF/GTF
Reference Transcriptome
Experimental Design
Sequencing
Data Quality Control
RNA
fastq
Introduction
Read Mapping • Pipeline • Software • Input • Output Read mapping
fastq
SAM/BAM
Reference Genome
fasta
![Page 39: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/39.jpg)
Mapping – with reference genome Millions of short reads
Spliced aligner
Reads aligned to genome
Reference Genome
Abundance estimation
Differential expression analysis
Fastq
SAM/BAM
![Page 40: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/40.jpg)
Mapping – with reference genome Millions of short reads
Spliced aligner
Reads aligned to genome
Reference Genome
Abundance estimation
Differential expression analysis
Fastq
SAM/BAM
Unmapped reads
aligner
Reference splice Junction library
De novo splice junction library
or Fasta/GTF
Fasta/GTF
aligner
3 exon gene junction library
![Page 41: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/41.jpg)
Mapping • Alignment algorithm must be
– Fast – Able to handle SNPs, indels, and sequencing errors – Allow for introns for reference genome alignment (spliced alignment)
• Burrows Wheeler Transform (BWT) mappers – Faster – Few mismatches allowed (< 3) – Limited indel detection – Spliced: Tophat, MapSplice – Unspliced: BWA, Bowtie
• Hash table mappers – Slower – More mismatches allowed – Indel detection – Spliced: GSNAP, MapSplice – Unspliced: SHRiMP, Stampy
![Page 42: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/42.jpg)
Mapping • Input
– Fastq read libraries
– Reference genome index (software-specific: /project/db/genomes) – Insert size mean and stddev (for paired-end libraries)
• Map library (or a subset) using estimated mean and stddev • Calculate empirical mean and stddev
– Galaxy: NGS Picard: insertion size metrics – Cufflinks standard error
• Re-map library using empirical mean and stddev
![Page 43: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/43.jpg)
Mapping • Output
– SAM (text) / BAM (binary) alignment files • SAMtools – SAM/BAM file manipulation
– Summary statistics (per read library) • % reads with unique alignment • % reads with multiple alignments • % reads with no alignment • % reads properly paired (for paired-end libraries)
![Page 44: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/44.jpg)
Transcriptome Assembly
Experimental Design
Sequencing
Data Quality Control
Read mapping
RNA
fastq
fastq Reference Genome
fasta
Introduction
Differential Expression • Discrete vs continuous data • Cuffdiff and EdgeR
Differential Expression
Analysis
SAM/BAM
GFF/GTF
Reference Transcriptome
![Page 45: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/45.jpg)
Differential Expression • Discrete vs Continuous data
– Microarray florescence intensity data: continuous • Modeled using normal distribution
– RNA-Seq read count data: discrete • Modeled using negative binomial distribution
Microarray software cannot be used to analyze RNA-Seq data
![Page 46: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/46.jpg)
Differential Expression • Cuffdiff (Cufflinks package)
– Pairwise comparisons – Differential gene, transcript, and primary transcript expression; differential
splicing and promoter use – Easy to use, well documented – Input: transcriptome, SAM/BAM read alignments (abundance estimation built-in)
• EdgeR – Complex experimental designs using generalized linear model – Information sharing among genes (Bayesian gene-wise dispersion estimation)
– Difficult to use R package – Input: raw gene/transcript read counts (calculate abundance using separate software)
Consult a statistician
![Page 47: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/47.jpg)
Introduction
Experimental Design
Sequencing
RNA
Data Quality Control
fastq
Read mapping
fastq
Transcript abundance estimation
Differential expression analysis
SAM/BAM Suggested Implementation (reference genome
available)
Introduction
Two group design
Sequencing
RNA
FastQC
fastq
Tophat
fastq
Cuffdiff (Cufflinks package)
SAM/BAM
Reference Transcriptome
![Page 48: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/48.jpg)
Experimental Design
Sequencing
Data Quality Control
Read mapping
Differential expression
analysis
RNA
fastq
fastq
SAM/BAM
GFF/GTF
Reference Genome
fasta
Reference Transcriptome
Introduction
Transcriptome Assembly • Pipeline • Software • Input • Output
Transcriptome Assembly
![Page 49: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/49.jpg)
Data Quality Control
Read mapping
Differential Expression Analysis
fastq
SAM/BAM
GFF/GTF
Reference Genome
fasta
Data Quality Control
Read mapping
Differential Expression Analysis
fastq
SAM/BAM
Transcriptome assembly
fasta
Data Quality Control
Read mapping
Transcriptome assembly
Differential Expression Analysis
fastq
SAM/BAM
GFF/GTF
fasta
Reference Transcriptome
Reference Genome
RNA-Seq • Reference genome • Reference transcriptome
RNA-Seq • Reference genome • No reference transcriptome
RNA-Seq • No reference genome • No reference transcriptome
Experimental Design
Sequencing
RNA
fastq
Experimental Design
Sequencing
RNA
fastq
Experimental Design
Sequencing
RNA
fastq
![Page 50: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/50.jpg)
Transcriptome Assembly -with reference genome
Reads aligned to genome
With de novo junction library
Reads aligned to genome
Abundance estimation
Reference transcriptome
De novo transcriptome
Assembler
GFF/GTF
GFF/GTF
SAM/BAM
Abundance estimation
Reference transcriptome available No/poor reference transcriptome available
Novel transcript identification
Annotate
SAM/BAM
![Page 51: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/51.jpg)
Transcriptome Assembly -with reference genome • Reference genome based assembly
– Cufflinks, Scripture
• Reference annotation based assembly – Cufflinks
• Transcriptome comparison – Cuffcompare
• Transcriptome Annotation – Generate cDNA fasta from annotation (Cufflinks’ gffread program)
– Align to library of known cDNA (RefSeq, GenBank)
![Page 52: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/52.jpg)
Transcriptome Assembly – no reference genome
Millions of short reads
Unspliced Aligner
Reads aligned to transcriptome
De novo Transcriptome
Assembler
Abundance estimation
Differential expression analysis
Fastq
SAM/BAM
Trinity Trans-ABySS Velvet/oasis
Computationally intensive
![Page 53: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/53.jpg)
Further Reading Bioinformatics for High Throughput Sequencing Rodríguez-Ezpeleta, Naiara.; Hackenberg, Michael.; Aransay, Ana M.; SpringerLink New York, NY : Springer c2012
RNA sequencing: advances, challenges and opportunities Fatih Ozsolak1 & Patrice M. Milos1 Nature Reviews Genetics 12, 87-98 (February 2011)
Computational methods for transcriptome annotation and quantification using RNA-seq Manuel Garber, Manfred G Grabherr, Mitchell Guttman & Cole Trapnell Nature Methods 8, 469–477 (2011)
Next-generation transcriptome assembly Jeffrey A. Martin & Zhong Wang Nature Reviews Genetics 12, 671-682 (October 2011)
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks Cole Trapnell, Adam Roberts, Loyal Goff, Geo Pertea, Daehwan Kim, David Kelley, Harold Pimentel, Steven Salzberg, John L Rinn & Lior Pachter Nature Protocols 7, 562–578 (2012)
SEQanswers.com biostar.stackexchange.com
Online access through U library
Table of RNA-Seq software
Popular bioinformatics forums
![Page 54: RNA-Seq Tutorial 1 - Home | Minnesota … isolation Sequence fragment end(s) Map reads Sample 1 Calculate transcript abundance Fragmentation RNA -> cDNA Genome Reference Transcriptome](https://reader031.vdocuments.site/reader031/viewer/2022011723/5aa55b6f7f8b9a2f048d4d6c/html5/thumbnails/54.jpg)
Questions / Discussion