DTU Aqua, 13 January 2020
SUMMARY
1. Definitions
2. Sample collections and RNA integrity
3. Library preparation
4. Data analyses
– Read mapping/assembly
– Normalization
– Read count
– Differential expression
– Functional enrichment
TRANSCRIPTOME
The complete set of transcripts in a cell, and their
quantities, at a specific developmental stage or under a
specific physiological condition.
RNA classification
• Ribosomal RNA (rRNA): catalytic component of ribosomes (about 80-85% of total RNA)
• Transfer RNA (tRNA): delivers amino acids to the growing polypeptide chain at the ribosomal site of protein synthesis (about 15%)
• Coding RNA (mRNA): carries the information for a protein sequence to the ribosomes (about 5%)
• Other non-coding regulatory RNAs
Other non-coding regulatory RNAs
(Delpu et al. 2016, Drug Discovery in Cancer Epigenetics)
3. RNA classification
Long RNAs: splicing
[Figure labels: DNA, RNA, mRNA, lncRNA]
1. Definitions
RNA-seq
High-throughput sequencing technology used for probing the transcriptome of a sample
• Abundance estimation/differential expression
• Alternative splicing
• RNA editing
• Novel transcripts
• Allele-specific expression
• Fusion transcripts
• Single-cell sequencing
Before RNA extraction
RNA is less stable than DNA, so extra precautions are
needed to avoid degradation
TISSUE COLLECTION:
• Liquid nitrogen
• RNAlater (for solid tissues)
• Tempus/PAXgene tubes (for blood)
After RNA extraction
RIN (RNA integrity number): an algorithm that assigns an
integrity value to an RNA measurement.
10: maximum (intact RNA)
1: minimum (fully degraded RNA)
RIN > 7: integrity OK
RNA quality (RIN) and quantification: Bioanalyzer
Different steps for different RNAs
• Total RNA-seq: DNase treatment, ribosomal RNA depletion, fragmentation, library preparation
• mRNA + lncRNA (polyA+) RNA-seq: DNase treatment, polyA enrichment, fragmentation, library preparation
• Short RNA-seq: DNase treatment, size selection, library preparation
LIBRARY PREPARATION
LIBRARY PREPARATION…
with 3rd gen. sequencing
Transcriptome assembly strategies
• Reference-based
• De novo
• Pseudoalignment
Martin and Wang 2011, Nature Reviews Genetics
Reference-based: Most common tools
• Unspliced read aligners: BWA, Bowtie2, Novoalign
– Splice junctions not considered
– Ideal for mapping against cDNA databases
• Spliced read aligners: Tophat2/Hisat2, STAR
– Novel splice junctions detected
– Better performance for polymorphic regions and pseudogenes
REFERENCE-GUIDED ASSEMBLY (Cufflinks/StringTie)
1) First, map all the reads from your experiment to the
reference sequence.
2) Then, in a second step, use the mapped reads to
assemble potential transcripts and identify the genomic
locations of introns and exons.
Splice junctions viewed in IGV (Integrative Genomics Viewer)
De novo assembly: Most common tools
• Velvet: genomics and transcriptomics
• Trinity: transcriptomics
KALLISTO: PSEUDOALIGNMENT
• Most RNA-seq tools perform the analysis in two steps:
– Alignment
– Quantification
• Kallisto fuses the two steps
N. Bray et al., Nature Biotechnology (2016)
Target de Bruijn Graph (T-DBG)
[Figure: T-DBG diagram, labels a to e]
• Create every k-mer in the transcriptome, build the de Bruijn
graph and mark each k-mer with the transcripts that contain it
• Preprocess the transcriptome to create the T-DBG
• Indexing is faster
Target de Bruijn Graph (T-DBG)
[Figure: T-DBG diagram, labels a to e]
• Use the k-mers in a read to find which transcripts it could
have come from
• Pseudoalignment: the set of transcripts with which the read
(pair) is compatible
Target de Bruijn Graph (T-DBG)
[Figure: T-DBG diagram, labels a to e]
• Each k-mer appears in a set of transcripts
• The intersection of all these sets is the pseudoalignment
http://arxiv.org/pdf/1505.02710v2.pdf
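The k-mer intersection idea can be sketched in a few lines of Python. This is a toy illustration only, not kallisto's implementation: it uses a plain dictionary instead of a real de Bruijn graph, the sequences and the names `build_index` and `pseudoalign` are made up for the example, and a read containing a k-mer absent from the index is simply treated as incompatible with every transcript.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the transcript sets of all k-mers found in the read."""
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer, set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript is compatible with every k-mer seen so far
    return compatible or set()


# Hypothetical toy transcriptome (made-up sequences).
transcripts = {
    "t1": "ACGTACGGTCA",
    "t2": "ACGTACGCTTA",
    "t3": "GGGTACGGTCA",
}
index = build_index(transcripts, k=5)
print(sorted(pseudoalign("GTACGGTC", index, k=5)))  # ['t1', 't3']
```

The read is compatible with t1 and t3 because all four of its 5-mers occur in both; t2 drops out at the second k-mer.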
NORMALIZATION
• Longer genes will have more reads mapping to
them (affects comparisons within a sample)
• A sequencing run with greater depth will have more
reads mapping to each gene (affects comparisons
between samples)
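As a rough illustration of correcting for gene length and sequencing depth, here is a minimal Python sketch of RPKM and TPM, two of the methods listed in the table of common normalization methods. The gene names, counts and lengths are invented, and the total of the supplied counts stands in for "reads mapped"; real pipelines have more details to handle (effective lengths, fragments vs. reads).

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    return {
        g: counts[g] / (lengths_bp[g] / 1e3) / (total / 1e6)
        for g in counts
    }


def tpm(counts, lengths_bp):
    """Length-normalize first, then rescale so the values sum to one million."""
    rate = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(rate.values()) / 1e6
    return {g: r / scale for g, r in rate.items()}


# Made-up example: read counts grow in proportion to gene length,
# so all three genes have the same abundance after normalization.
counts = {"geneA": 100, "geneB": 200, "geneC": 700}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 7000}

tpm_values = tpm(counts, lengths_bp)
rpkm_values = rpkm(counts, lengths_bp)
print(tpm_values)
print(rpkm_values)
```

The only difference between the two is the order of the operations: TPM divides by length before scaling by depth, which is why TPM values always sum to one million within a sample.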
MAIN FACTORS DURING
NORMALIZATION
Sequencing depth
MAIN FACTORS DURING
NORMALIZATION
Gene length
MAIN FACTORS DURING
NORMALIZATION
RNA composition
(Anders and Huber, 2010, Genome Biol.)
NORMALIZATION
Normalization method: TPM (transcripts per kilobase million)
  Description: counts per length of transcript (kb) per million reads mapped
  Accounted factors: sequencing depth and gene length
  Recommendations for use: gene count comparisons within a sample or between samples of the same sample group; NOT for DE analysis

Normalization method: RPKM/FPKM (reads/fragments per kilobase of exon per million reads/fragments mapped)
  Description: similar to TPM
  Accounted factors: sequencing depth and gene length
  Recommendations for use: gene count comparisons between genes within a sample; NOT for between-sample comparisons or DE analysis

Normalization method: DESeq2's median of ratios
  Description: counts divided by sample-specific size factors determined by the median ratio of gene counts relative to the geometric mean per gene
  Accounted factors: sequencing depth and RNA composition
  Recommendations for use: gene count comparisons between samples and for DE analysis; NOT for within-sample comparisons
Common normalization methods
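The median-of-ratios description can be made concrete with a small Python sketch. This is an illustration of the idea only, not DESeq2's code: the function names and the count data are invented, and genes with a zero count in any sample are skipped when computing the size factors (an assumption made here to keep the geometric mean defined).

```python
import math
from statistics import median


def size_factors(counts):
    """counts: {sample: {gene: raw count}} -> {sample: size factor}."""
    samples = list(counts)
    genes = list(counts[samples[0]])
    # Log of the per-gene geometric mean across samples.
    log_geo_mean = {}
    for g in genes:
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_mean[g] = sum(math.log(v) for v in values) / len(values)
    # Size factor = median over genes of (count / geometric mean).
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - log_geo_mean[g] for g in log_geo_mean]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(counts)
    return {
        s: {g: c / factors[s] for g, c in per_gene.items()}
        for s, per_gene in counts.items()
    }


# Made-up counts: sample B was sequenced twice as deeply for g1-g3,
# while g4 behaves differently from the other genes.
counts = {
    "A": {"g1": 10, "g2": 20, "g3": 30, "g4": 1000},
    "B": {"g1": 20, "g2": 40, "g3": 60, "g4": 1000},
}
factors = size_factors(counts)
normalized = normalize(counts)
print(factors)
print(normalized)
```

Because the median is used, the single outlying gene g4 does not distort the size factors, which is how this approach accounts for RNA composition as well as depth.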
READ COUNT
Count the number of reads aligned to each known
transcript/isoform
E.g. HTSeq-count
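To show what "counting reads per feature" means in practice, here is a toy Python sketch. It is not HTSeq-count's implementation: the function name, the gene coordinates and the reads are invented, genes are single intervals rather than sets of exons, and the rule for reads overlapping more than one gene (count them as ambiguous) is an assumption chosen for simplicity.

```python
from collections import Counter


def count_reads(genes, reads):
    """Count aligned reads per gene.

    genes: {name: (chrom, start, end)}, reads: [(chrom, start, end)],
    all coordinates half-open (start included, end excluded).
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["__ambiguous"] += 1   # overlaps more than one gene
        else:
            counts["__no_feature"] += 1  # overlaps no annotated gene
    return counts


# Made-up annotation and alignments.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),
    ("chr1", 300, 350),
    ("chr1", 460, 510),
    ("chr1", 600, 650),
    ("chr2", 10, 60),
]
read_counts = count_reads(genes, reads)
print(dict(read_counts))
```

The resulting table of raw counts per gene is the input expected by count-based normalization and differential expression methods.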
DIFFERENTIAL EXPRESSION
FUNCTIONAL ENRICHMENT ANALYSIS
Identification of classes of genes that are over-represented
among the differentially expressed genes, and that may be
associated with the disease/phenotype investigated
The Gene Ontology project provides an ontology of defined terms representing
gene product properties. The ontology covers three domains:
• Molecular function: molecular activities of gene products
• Cellular component: where gene products are active
• Biological process: pathways and larger processes made up of the
activities of multiple gene products.
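One common way to test whether a gene class is over-represented is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The slides do not name a specific test, so this choice is an assumption, as are the function name and the numbers in the example below; the enrichment tools listed in this deck may use different statistics and multiple-testing corrections.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_de, n_de_in_term):
    """Probability of seeing at least n_de_in_term annotated genes among
    n_de differentially expressed genes drawn from n_background genes,
    of which n_in_term carry the annotation (hypergeometric upper tail)."""
    upper = min(n_in_term, n_de)
    total = comb(n_background, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_de - k)
        for k in range(n_de_in_term, upper + 1)
    )
    return tail / total


# Made-up numbers: 20 genes measured, 5 annotated with a GO term,
# 4 differentially expressed, 3 of which carry the term.
p = enrichment_p_value(n_background=20, n_in_term=5, n_de=4, n_de_in_term=3)
print(round(p, 4))  # 0.032
```

A small p-value suggests that the term appears among the differentially expressed genes more often than expected by chance; with many terms tested, the p-values would still need correction for multiple testing.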
Biological databases available
• Gene Ontology (GO)
• KEGG (Kyoto Encyclopedia of Genes and Genomes)
• Reactome
• Ingenuity Pathway Analysis (IPA)
• MSigDB (Molecular Signatures Database)
• DAVID (Database for Annotation, Visualization and Integrated Discovery)
• PANTHER
• GOrilla
Some GO and pathway analysis websites
http://amp.pharm.mssm.edu/Enrichr/
http://cbl-gorilla.cs.technion.ac.il/
https://david.ncifcrf.gov/
https://cytoscape.org/
ARE YOU LOOKING FOR A THESIS/PROJECT?
You can learn more about RNA-seq and its
applications in fish:
• ecology
• health
• aquaculture
You can learn more about NGS and its applications in fish in the course
25334 Genomic methods in breeding and management of aquatic
living resources (fall 2020)