predicting rna structure and function. non coding dna (98.5% human genome) intergenic repetitive...
Post on 19-Dec-2015
217 views
TRANSCRIPT
Non coding DNA (98.5% human genome)
• Intergenic
• Repetitive elements
• Promoters
• Introns
• mRNA untranslated region (UTR)
RNA Molecules
• mRNA• tRNA• rRNA• Other types of RNA
-RNaseP –trimming 5’ end of pre tRNA
-telomerase RNA- maintaining the chromosome ends
-Xist RNA- inactivation of the extra copy of the x chromosome
Some biological functions of ncRNA
• Nuclear export
• mRNA cellular localization
• Control of mRNA stability
• Control of translation
The function of the RNA molecule depends on its folded structure
Control of Iron levels by mRNA secondary structure
G U A GC N N N’ N N’ N N’ N N’C N N’ N N’ N N’ N N’ N N’ 5’ 3’
conserved
Iron Responsive ElementIRE
Recognized byIRP1, IRP2
IRP1/2
5’ 3’F mRNA
5’ 3’TR mRNA
IRP1/2
F: Ferritin = iron storageTR: Transferin receptor = iron uptake
IRE
Low Iron IRE-IRP inhibits translation of ferritinIRE-IRP Inhibition of degradation of TR
High IronIRE-IRP off -> ferritin translated
Transferin receptor degradated
RNA Secondary Structure
U U
C G U A A UG C
5’ 3’
5’G A U C U U G A U C
3’
STEM
LOOP• The RNA molecule folds on itself. • The base pairing is as follows: G C A U G U hydrogen bond.
RNA Secondary structure
G G A U
U GC C GG A U A A U G CA G C U U
INTERNAL LOOP
HAIRPIN LOOP
BULGE
STEM
DANGLING ENDS5’ 3’
Examples of known interactions of RNA secondary structural
elementsPseudo-knot
Kissing hairpins
Hairpin-bulge contact
These patterns are excluded from the prediction schemes as their computation is too intensive.
Predicting RNA secondary Structure
• According to base pairing rules only
(Watson Crick A-T G-C and wobble pairs G-T)
sequences may form different structures
• An energy value is associated with each possible structure
• Predict the structure with the minimal free energy (MFE)
Simplifying Assumptions for Structure Prediction
• RNA folds into one minimum free-energy structure.
• There are no knots (base pairs never cross).
• The energy of a particular base pair in a double stranded regions is sequence independent– Neighbors do not influence the energy.
Was solved by dynamic programmingZucker and Steigler 1981
Sequence dependent free-energy values of the base pairs
(nearest neighbor model) U U
C G G C A UG CA UCGAC 3’5’
U U
C G U A A UG CA UCGAC 3’5’
Example values:GC GC GC GCAU GC CG UA -2.3 -2.9 -3.4 -2.1
Free energy computation
U UA A G C G C A G C U A A U C G A U A 3’A5’
-0.3
-0.3
-1.1 mismatch of hairpin-2.9 stacking
+3.3 1nt bulge -2.9 stacking
-1.8 stacking
5’ dangling
-0.9 stacking-1.8 stacking
-2.1 stacking
G= -4.6 KCAL/MOL
+5.9 4 nt loop
Adding Complexity to Energy Calculations
• Stacking energy - We assign negative energies to these between base pair regions.– Energy is influenced by the previous base pair (not by
the base pairs further down).
– These energies are estimated experimentally from small synthetic RNAs.
• Positive energy - added for destabilizing regions such as bulges, loops, etc.
• More than one structure can be predicted
Prediction Tools based on Energy Calculation
Fold, Mfold Zucker & Stiegler (1981) Nuc. Acids Res.
9:133-148Zucker (1989) Science 244:48-52
RNAfoldVienna RNA secondary structure serverHofacker (2003) Nuc. Acids Res. 31:3429-3431
Tools’ Features• Sub-optimal structures
-Provide solutions within a specific energy range.
• Constraints- Regions known experimentally to be single/double
stranded can be defined.
• Statistical significance- Currently lacking in energy based methods- Recently was suggested to estimate a significant stable
and conserved fold in aligned sequences (Washietl ad Hofacker 2004)
• Support by compensatory mutations.
Evolutionary conservation of RNA molecules can be revealed by
identification of compensatory mutations
G C C U U C G G G CG A C U U C G G U CG G C U U C G G C C
U CU GC GN N’G C
Insight from Multiple Alignment
information from multiple alignment about theprobability of positions i,j to be base-paired.
• Conservation – no additional information• Consistent mutations (GC GU) – support
stem• Inconsistent mutations – does not support stem.• Compensatory mutations – support stem.
RNAalifold (Hofacker 2002)From the vienna RNA package
Predicts the consensus secondarystructure for a set of aligned RNA sequences by using modified dynamic programming algorithm that addcovariance term to the standardenergy model
Improvement in prediction accuracy
Other related programs
• COVE
RNA structure analysis using the covariance model (implementation of the stochastic free grammar method)
• QRNA (Rivas and Eddy 2001)
Searching for conserved RNA structures
• tRNAscan-SE tRNA detection in genome sequences
Sean Eddy’s Lab WUhttp://www.genetics.wustl.edu/eddy
RNA families
• Rfam : General non-coding RNA database
(most of the data is taken from specific databases)
http://www.sanger.ac.uk/Software/Rfam/
Includes many families of non coding RNAs and functionalMotifs, as well as their alignement and their secondary structures
Rfam (currently version 6.1)
• 379 different RNA families or functional
Motifs from mRNA UTRs etc.
GENE
INTRON
Cis ELEMENTS