predicting rna structure and function. non coding dna (98.5% human genome) intergenic repetitive...

27
Predicting RNA Structure and Function

Post on 19-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

PredictingRNA Structure and Function

Non coding DNA (98.5% human genome)

• Intergenic

• Repetitive elements

• Promoters

• Introns

• mRNA untranslated region (UTR)

RNA Molecules

• mRNA• tRNA• rRNA• Other types of RNA

-RNaseP –trimming 5’ end of pre tRNA

-telomerase RNA- maintaining the chromosome ends

-Xist RNA- inactivation of the extra copy of the x chromosome

Some biological functions of ncRNA

• Nuclear export

• mRNA cellular localization

• Control of mRNA stability

• Control of translation

The function of the RNA molecule depends on its folded structure

Control of Iron levels by mRNA secondary structure

G U A GC N N N’ N N’ N N’ N N’C N N’ N N’ N N’ N N’ N N’ 5’ 3’

conserved

Iron Responsive ElementIRE

Recognized byIRP1, IRP2

IRP1/2

5’ 3’F mRNA

5’ 3’TR mRNA

IRP1/2

F: Ferritin = iron storageTR: Transferin receptor = iron uptake

IRE

Low Iron IRE-IRP inhibits translation of ferritinIRE-IRP Inhibition of degradation of TR

High IronIRE-IRP off -> ferritin translated

Transferin receptor degradated

RNA Secondary Structure

U U

C G U A A UG C

5’ 3’

5’G A U C U U G A U C

3’

STEM

LOOP• The RNA molecule folds on itself. • The base pairing is as follows: G C A U G U hydrogen bond.

RNA Secondary structure

G G A U

U GC C GG A U A A U G CA G C U U

INTERNAL LOOP

HAIRPIN LOOP

BULGE

STEM

DANGLING ENDS5’ 3’

RNA secondary Structure representation

tRNA

RNA secondary Structure representation

Large subunitsrRNA

Small subunitrRNA

Legal structure Pseudoknots

RNA secondary structure representation

Examples of known interactions of RNA secondary structural

elementsPseudo-knot

Kissing hairpins

Hairpin-bulge contact

These patterns are excluded from the prediction schemes as their computation is too intensive.

Predicting RNA secondary Structure

• According to base pairing rules only

(Watson Crick A-T G-C and wobble pairs G-T)

sequences may form different structures

• An energy value is associated with each possible structure

• Predict the structure with the minimal free energy (MFE)

Simplifying Assumptions for Structure Prediction

• RNA folds into one minimum free-energy structure.

• There are no knots (base pairs never cross).

• The energy of a particular base pair in a double stranded regions is sequence independent– Neighbors do not influence the energy.

Was solved by dynamic programmingZucker and Steigler 1981

Sequence dependent free-energy values of the base pairs

(nearest neighbor model) U U

C G G C A UG CA UCGAC 3’5’

U U

C G U A A UG CA UCGAC 3’5’

Example values:GC GC GC GCAU GC CG UA -2.3 -2.9 -3.4 -2.1

Free energy computation

U UA A G C G C A G C U A A U C G A U A 3’A5’

-0.3

-0.3

-1.1 mismatch of hairpin-2.9 stacking

+3.3 1nt bulge -2.9 stacking

-1.8 stacking

5’ dangling

-0.9 stacking-1.8 stacking

-2.1 stacking

G= -4.6 KCAL/MOL

+5.9 4 nt loop

Adding Complexity to Energy Calculations

• Stacking energy - We assign negative energies to these between base pair regions.– Energy is influenced by the previous base pair (not by

the base pairs further down).

– These energies are estimated experimentally from small synthetic RNAs.

• Positive energy - added for destabilizing regions such as bulges, loops, etc.

• More than one structure can be predicted

Prediction Tools based on Energy Calculation

Fold, Mfold Zucker & Stiegler (1981) Nuc. Acids Res.

9:133-148Zucker (1989) Science 244:48-52

RNAfoldVienna RNA secondary structure serverHofacker (2003) Nuc. Acids Res. 31:3429-3431

Tools’ Features• Sub-optimal structures

-Provide solutions within a specific energy range.

• Constraints- Regions known experimentally to be single/double

stranded can be defined.

• Statistical significance- Currently lacking in energy based methods- Recently was suggested to estimate a significant stable

and conserved fold in aligned sequences (Washietl ad Hofacker 2004)

• Support by compensatory mutations.

Compensatory Substitutions

U U

C G U A A UG CA UCGAC 3’

G C

5’

Maintain the secondary structure

Evolutionary conservation of RNA molecules can be revealed by

identification of compensatory mutations

G C C U U C G G G CG A C U U C G G U CG G C U U C G G C C

U CU GC GN N’G C

Insight from Multiple Alignment

information from multiple alignment about theprobability of positions i,j to be base-paired.

• Conservation – no additional information• Consistent mutations (GC GU) – support

stem• Inconsistent mutations – does not support stem.• Compensatory mutations – support stem.

RNAalifold (Hofacker 2002)From the vienna RNA package

Predicts the consensus secondarystructure for a set of aligned RNA sequences by using modified dynamic programming algorithm that addcovariance term to the standardenergy model

Improvement in prediction accuracy

Other related programs

• COVE

RNA structure analysis using the covariance model (implementation of the stochastic free grammar method)

• QRNA (Rivas and Eddy 2001)

Searching for conserved RNA structures

• tRNAscan-SE tRNA detection in genome sequences

Sean Eddy’s Lab WUhttp://www.genetics.wustl.edu/eddy

RNA families

• Rfam : General non-coding RNA database

(most of the data is taken from specific databases)

http://www.sanger.ac.uk/Software/Rfam/

Includes many families of non coding RNAs and functionalMotifs, as well as their alignement and their secondary structures

Rfam (currently version 6.1)

• 379 different RNA families or functional

Motifs from mRNA UTRs etc.

GENE

INTRON

Cis ELEMENTS

An example of an RNA family miR-1 MicroRNAs