Download - RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
![Page 1: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/1.jpg)
RNA-Seq and RNA Structure Prediction
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520
![Page 2: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/2.jpg)
Outline
• RNA-seq– Experiments– Analysis: read mapping, expression index,
isoform inference, differential expression
• RNA structure prediction– Covariance model
– Base-pair maximization
– Free energy method
2
![Page 3: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/3.jpg)
RNA-seq
Mortazavi et al, Nat Meth 20083
![Page 4: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/4.jpg)
RNA-Frag has Less 3’ Biase
Wang et al. 20094
![Page 5: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/5.jpg)
RNA-Seq: Alternative to Microarrays
• General expression profiling
• Novel genes
• Alternative splicing
• Detect gene fusion
• Can use on any sequenced genome
• Better dynamic range
• Cleaner and more informative data
• Data analysis challenges5
![Page 6: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/6.jpg)
Mapping
• Bowtie or Maq mapping identify transcribed known or novel exons
• Longer (e,g. 100bp)
paired-end libraries are
better
6
![Page 7: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/7.jpg)
Transcript Abundances
• More reads mapped to longer genes
• More reads mapped if sequencing is deep
• RPKM: reads per kb transcript per million reads: 1 RPKM ~ 0.3 -1 transcript / cell
• Low technical noise
(Poisson distribution)
but high biological noise
(over dispersion, neg
binomial)7
![Page 8: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/8.jpg)
Different Alternative Splicing
8
![Page 9: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/9.jpg)
Isoform Inference
• If given known set of isoforms
• Estimate x to maximize the likelihood of observing n
9
![Page 10: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/10.jpg)
Known Isoform Abundance Inference
10
![Page 11: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/11.jpg)
De novo isoform inference
11
![Page 12: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/12.jpg)
Isoform Inference
• With known isoform set, sometimes the gene-level expression level inference is great, although isoform abundances have big uncertainty
(e.g. known set is not complete)
• De novo isoform inference is a non-identifiable problem with current RNA-seq protocol and (short) read length
(e.g. exon and isoform numbers are big)12
![Page 13: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/13.jpg)
Gene Fusion
• Down regulation of tumor suppressor or up regulation of oncogenes
Maher et al, Nat 200913
![Page 14: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/14.jpg)
A Few Algorithms
• Expression index and isoform inference– Cufflinks from Steve Salzburg– Rseq from Wing Wong– Scripture from Aviv Regev
• Differential expression– Cufflinks– DESeq from Wolfgang Huber– EdgeR from Gordon Smyth– Replicates are still preferred!
• Still need systematic evaluation 14
![Page 15: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/15.jpg)
15
Why do we Care?
• RNA (tRNA, rRNA) structure determines function
• Many non-coding RNA genes have special structure, which leads to special functions– ncRNA genes later
Mostly RNA 2nd structure: G-C and A-U;G-U
![Page 16: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/16.jpg)
16
Simple RNA Structures
![Page 17: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/17.jpg)
17
More Complex Interactions
• Kissing hairpins
• Pseudoknots
• Hairpin-bulge contact
![Page 18: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/18.jpg)
18
RNA Structure Representations
![Page 19: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/19.jpg)
19
Covariance Models
• Get related RNA sequences, obtain multiple sequence alignment– E.g. orthologous RNA from many species or family of
RNA believed to have similar structure and function
– Require sequences be similar enough so that they can be initially aligned
• Look at every pair of columns and check for covarying substitutions– Sequences should be dissimilar enough for covarying
substitutions to be detected
![Page 20: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/20.jpg)
20
Base-Pair Maximization
• Find structure with the max # of base pairs
• Efficient dynamic programming solution introduced by Nussinov (1970s)
• Compare a sequence against itself in a dynamic programming matrix
• Since structure folds upon itself, only necessary to calculate half the matrix
• Four rules for scoring the structure at a particular point
![Page 21: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/21.jpg)
21
Nussinov Algorithm
• Initialization: score for complementary matches along main diagonal and diagonal just below it are set to zero
![Page 22: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/22.jpg)
22
Nussinov Algorithm
• Fill matrix: M[i][j] = max of the following– M[i+1][j-1] + S(xi, xj)
– M[i+1][j]– M[i][j-1]
– M[i][j] = MAXi<k<j (M[i][k] + M[k+1][j])
![Page 23: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/23.jpg)
23
Nussinov Algorithm
• Fill diagonal by diagonal (assume no bulge penalty, similar to SW gap penalty)
i
j
![Page 24: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/24.jpg)
24
Nussinov Algorithm
• Trace back from upper right corner to get the structure
![Page 25: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/25.jpg)
25
Free Energy Method
• Mfold: Mathews, JMB 1999• Predict the correct secondary structure by
minimizing the free energy (G)• Energy: Base pairing and base stacking
![Page 26: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/26.jpg)
26
Energy Factors
• Consecutive basepairing,
good
• Internal bulge, bad
• Terminal basepairing,
not stable
• Hairpin loop, interior
and bulge loop destabilize energies
![Page 27: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/27.jpg)
27
Energy Minimization
• Assume: the most likely structure is the most stable structure energetically
• Energy associated with any position is only influenced by local sequence and structure
• Does not consider pseudoknot formation• Dynamic program
![Page 28: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/28.jpg)
28
Energy Minimization
![Page 29: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/29.jpg)
29
Vienna RNA Package
• Vienna RNA web
![Page 30: RNA-Seq and RNA Structure Prediction Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520](https://reader035.vdocuments.site/reader035/viewer/2022081514/56649dc75503460f94abbb44/html5/thumbnails/30.jpg)
30
Summary• RNA-seq
– Cutflinks: read mapping, expression index, isoform inference, differential expression
– Different technique, analysis, and output for different tasks
– Awaiting RMA of RNA-seq
– 3rd generation sequencing might read whole transcript
• RNA structure prediction methods– Covariance model: mutual information
– Base-pair maximization: Nussinov
– Free energy method: Mfold, Vienna RNA
– Caution: best is the enemy of the good