1 BCB 444/544 F07 ISU Terribilini #24 - RNA Secondary Structure Prediction 10/17/07 BCB 444/544 Lecture 24 Protein Tertiary Structure Prediction #24_Oct17



BCB 444/544

Lecture 24

Protein Tertiary Structure Prediction

#24_Oct17


Required Reading (before lecture)

Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230

Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini)
RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242

Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97 - 112


New Reading & Homework Assignment

ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)

Read:
Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)

• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper; for details see HW#4, posted online & sent by email on Sat Oct 13


Seminars this Week

BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html

• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Sachdev Sidhu (Genentech) Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU) TBA


Chp 15 - Tertiary Structure Prediction

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 15

Protein Tertiary Structure Prediction

• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP


Tertiary Structure Prediction Methods

2 (or 3) Major Methods:
1. Comparative Modeling:
   • Homology Modeling (easiest!)
   • Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)


Steps in Threading

1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores

[Figure: target sequence (ALKKGF…HFDTSE) threaded onto structure templates]


A Local Example: Rapid Threading Approach for Protein Structure Prediction

Kai-Ming Ho, Physics: Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander

Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697


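Simplify: Template structure representation

Template structure → contact matrix $C$ ($N \times N$), with residue indices $i, j = 1, \dots, N$:

$$C_{ij} = \begin{cases} 1, & \text{if } r_{ij} < 6.5\ \text{Å (contact)} \\ 0, & \text{otherwise, or if } i \text{ and } j \text{ are neighbors in sequence (non-contact)} \end{cases}$$

Yungok Ihm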
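The contact-matrix definition above can be sketched in a few lines of Python. This is only an illustration: the function name `contact_matrix`, the use of one 3D coordinate per residue, and the `min_separation=2` rule for "neighbor in sequence" are assumptions, not details given on the slide.

```python
from math import dist

def contact_matrix(coords, cutoff=6.5, min_separation=2):
    """Build a binary N x N contact matrix from one 3D point per residue.

    Assumptions (not from the slides): `coords` holds one representative
    atom per residue, and residues closer than `min_separation` positions
    along the chain count as sequence neighbors (non-contacts).
    """
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            # C_ij = 1 when the distance r_ij is below the cutoff (in Å).
            if dist(coords[i], coords[j]) < cutoff:
                c[i][j] = c[j][i] = 1
    return c

# Four residues on the corners of a 3.8 Å square.
square = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
for row in contact_matrix(square):
    print(row)
```

Adjacent residues are always within the cutoff, so excluding them keeps the matrix focused on contacts that carry fold information.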


Simplify: Energy Function

• Interaction “counts” only if two hydrophobic amino acid residues are in contact

• At residue level, pair-wise hydrophobic interaction is dominant:

$E = \sum_{i,j} C_{ij} U_{ij}$

$C_{ij}$: contact matrix
$U_{ij} = U(\text{residue } i, \text{residue } j)$

MJ: $U = U_{ij}$
LTW: $U = Q_i Q_j$
HP: $U \in \{1, 0\}$

Yungok Ihm
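As a rough illustration of the energy function E = Σ C_ij U_ij with the HP choice of U, the sketch below sums the pair potential over all contacts. The hydrophobic residue set and the names `hp_potential` and `threading_energy` are hypothetical. The sum runs over all ordered (i, j) pairs, so each contact is counted twice.

```python
# Hypothetical hydrophobic set; the slides do not list which residues count.
HYDROPHOBIC = set("AVLIMFWC")

def hp_potential(a, b):
    """HP model: U = 1 only when both residues are hydrophobic, else 0."""
    return 1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0

def threading_energy(sequence, contacts, potential=hp_potential):
    """E = sum over i, j of C_ij * U(residue i, residue j)."""
    n = len(sequence)
    return sum(
        contacts[i][j] * potential(sequence[i], sequence[j])
        for i in range(n)
        for j in range(n)
    )

contacts = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
print(threading_energy("AKVL", contacts))
```

Passing a different `potential` callable (for example a lookup into a 20 x 20 table) would give the MJ-style variant without changing the summation.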


Energy calculation: Contact energy

Miyazawa-Jernigan (MJ) matrix:
• 210 parameters
• Statistical potential

Li-Tang-Wingreen (LTW):
• 20 parameters
• $\tilde{M}_{ij}$ written in terms of $q_i$ and $q_j$; $Q_i$ is obtained from $q_i$ (constants shown on the slide: 0.7976, 0.4062)
• $Q_i$ ~ solubility; $q_i$ ~ hydrophobicity

Contact energy: $E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$, with contact matrix $C$

[Figure: excerpt of the MJ matrix for residues C, M, F, I, L, V, W]

Yungok Ihm


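Summary of Ho Threading Procedure

Template Structure → Contact Matrix ($N \times N$):

$$C_{ij} = \begin{cases} 1, & \text{if } r_{ij} < 6.5\ \text{Å} \\ 0, & \text{otherwise (including neighbors in sequence)} \end{cases}$$

Sequence → Sequence Vector:

AVFMRIHNDIVYNDIANTTQ
$S = (Q_A, Q_V, Q_F, \ldots, Q_E)$
Values shown on the slide: $(0.7997, 0.7989, 1.7911, 0.7946)$

Scoring Function → Contact Energy:

$$E_c = \sum_{i,j=1}^{N} Q_i C_{ij} Q_j$$

Yungok Ihm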
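The procedure summarized above reduces scoring to a quadratic form in the sequence vector. A minimal sketch follows. The per-residue values in `q_table` are made-up placeholders, not the LTW parameters, and the function names are illustrative.

```python
def sequence_vector(sequence, q_table):
    """Map each residue to its Q value to form the sequence vector S."""
    return [q_table[aa] for aa in sequence]

def contact_energy(seq_vec, contacts):
    """E_c = sum over i, j of Q_i * C_ij * Q_j (all ordered pairs)."""
    n = len(seq_vec)
    return sum(
        seq_vec[i] * contacts[i][j] * seq_vec[j]
        for i in range(n)
        for j in range(n)
    )

# Placeholder Q values for illustration only (NOT the published parameters).
q_table = {"A": 1.0, "K": 2.0, "V": 3.0}
contacts = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # residues 0 and 2 in contact
s = sequence_vector("AKV", q_table)
print(contact_energy(s, contacts))
```

Because the energy depends on the sequence only through the vector S, the same contact matrix can be scored against many candidate sequences cheaply.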


Can complexity be further reduced?
Consider simplifying structure representation, too

• Sequence – Structure (1D – 3D problem)
• Sequence – Contact Matrix (1D – 2D problem)
• Sequence – 1D Profile (1D – 1D problem)

[Figure: target sequence ALKKGF…HFDTSE]

Haibo Cao


Represent contact matrix by its dominant eigenvector (1D profile)

• First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure

• Higher ranking (rank > 4) eigenvectors are “sequence blind”

Haibo Cao
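One way to obtain the dominant eigenvector of a contact matrix is power iteration, sketched below. The slides do not say how the eigenvector is computed, so this method and the name `dominant_eigenvector` are assumptions. Iterating with C + I instead of C keeps the same eigenvectors but avoids oscillation when the matrix has eigenvalues of equal magnitude and opposite sign.

```python
def dominant_eigenvector(contacts, iterations=500, tol=1e-12):
    """Power iteration for the eigenvector with the highest eigenvalue.

    Assumes a symmetric, non-negative contact matrix with at least one
    contact. Returns (eigenvalue, unit-length eigenvector).
    """
    n = len(contacts)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        # Apply (C + I) to v, then renormalize to unit length.
        w = [v[i] + sum(contacts[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient v . (C v) recovers the eigenvalue of C itself.
    cv = [sum(contacts[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v

chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # toy 3-residue contact pattern
value, profile = dominant_eigenvector(chain)
print(round(value, 4), [round(x, 4) for x in profile])
```

The resulting vector has one entry per residue position, which is what makes it usable as a 1D profile of the template.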


Threading Alignment Step - now fast!
Align target sequence vector (1D) with eigenvector profile of template structure (1D)

• 1D profile $P$: first eigenvector $V_1$ of the contact matrix
• Maximize the overlap $S \cdot P$ between the sequence (S) and the profile (P), allowing gaps
• Calculate contact energy $E_c$ using the alignment
• Compute a new profile from $C$ and $P$

Cao et al Polymer 45 (2004)
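To make the 1D-to-1D alignment idea concrete, here is a small dynamic-programming sketch that maximizes the overlap between a sequence vector and a profile while allowing gaps. It is a generic Needleman-Wunsch-style recurrence with a single constant gap penalty, chosen for illustration; it is not the published alignment procedure, and the name `align_overlap` and the `gap` value are assumptions.

```python
def align_overlap(seq_vec, profile, gap=0.5):
    """Best total of S_i * P_j over aligned pairs, minus gap penalties."""
    n, m = len(seq_vec), len(profile)
    # score[i][j]: best value aligning the first i sequence entries
    # with the first j profile entries.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + seq_vec[i - 1] * profile[j - 1],  # align i with j
                score[i - 1][j] - gap,  # gap in the profile
                score[i][j - 1] - gap,  # gap in the sequence
            )
    return score[n][m]

print(align_overlap([1.0, 2.0], [1.0, 2.0]))
print(align_overlap([1.0, 2.0], [2.0]))
```

Both inputs are plain vectors, so the cost is proportional to the product of their lengths, which is the reason a 1D profile makes the alignment step fast.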


Parameters for alignment?

• Gap penalty: Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops
  Gap penalties apply to alignment score only, not to energy calculation

• Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and if residue is involved in > 2 contacts, alignment is penalized
  Size penalties apply to alignment score only, not to energy calculation

[Figure: target sequence ALKKGFG…HFDTSE aligned against loop and helix regions of a template]

Yungok Ihm


How to incorporate secondary structure?

• Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)

N+ = total number of matches between predicted secondary structure of target & actual secondary structure of template

N- = total number of mismatches

Ns = total number of residues selected in alignment

“Global fitness” : f = 1 + (N+ - N-) / Ns

Emod = f * Ethreading

Yungok Ihm
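The global fitness formula above is easy to check numerically. In this sketch the two secondary-structure strings are assumed to be already aligned position by position, so that Ns = N+ + N-; the function names and the H/E/C alphabet are illustrative choices.

```python
def global_fitness(predicted_ss, template_ss):
    """f = 1 + (N+ - N-) / Ns for two aligned secondary-structure strings."""
    if len(predicted_ss) != len(template_ss) or not predicted_ss:
        raise ValueError("strings must be non-empty and of equal length")
    n_plus = sum(p == t for p, t in zip(predicted_ss, template_ss))
    n_minus = len(predicted_ss) - n_plus  # assumes every position is match or mismatch
    return 1 + (n_plus - n_minus) / len(predicted_ss)

def modified_energy(e_threading, predicted_ss, template_ss):
    """Emod = f * Ethreading."""
    return global_fitness(predicted_ss, template_ss) * e_threading

print(global_fitness("HHHEEC", "HHHECC"))
print(modified_energy(-10.0, "HHHEEC", "HHHECC"))
```

With this definition f ranges from 0 (no positions agree) to 2 (all positions agree), so a favorable negative threading energy is scaled up when the secondary structures match well.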


How much better is this “fit” than random?

Erelative = Emod – Eshuffled

• Emod: E score modified to reflect fit with predicted 2° structure
• Eshuffled: Shuffled Sequence vs Structure; avg E score for same sequence shuffled (randomized) many times

Yungok Ihm
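The shuffled-sequence baseline can be sketched as follows. The scoring function is passed in as a callable, and the name `relative_energy`, the number of shuffles, and the fixed random seed are assumptions made for the example.

```python
import random

def relative_energy(sequence, energy_fn, n_shuffles=100, seed=0):
    """Erelative = Emod - Eshuffled, with Eshuffled averaged over shuffles.

    `energy_fn` is any callable mapping a sequence string to a score
    against a fixed template (a stand-in for the threading score).
    """
    rng = random.Random(seed)
    e_mod = energy_fn(sequence)
    residues = list(sequence)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, randomized order
        total += energy_fn("".join(residues))
    e_shuffled = total / n_shuffles
    return e_mod - e_shuffled

# Toy score that depends only on composition: shuffling cannot change it.
print(relative_energy("AKVLA", lambda s: float(s.count("A"))))
```

A score that depends only on amino acid composition gives Erelative = 0, which shows what the baseline removes: only the order-dependent part of the fit remains.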


Performance Evaluation? "Blind Test"

CASP5 Competition (CASP7 is most recent)

(Critical Assessment of Protein Structure Prediction)

Given: Amino acid sequence

Goal: Predict 3-D structure (before experimental results published)


Typical Results: (well, actually, our BEST Results):

HO = #1-Ranked CASP5 Prediction for this Target

• Target 174

• PDB ID = 1MG7

Actual Structure

Predicted Structure

T174_1

T174_2

Cao, Ihm, Wang, Dobbs, Ho


Overall Performance in CASP5 Contest

FR Fold Recognition (targets manually assessed by Nick Grishin)

-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
-----------------------------------------------------------
1     24.26    9.00   12.00   9    12    Ginalski
2     21.64    7.00   12.00   7    12    Skolnick Kolinski
3     19.55    8.00   12.50   9    14    Baker
4     16.88    6.00   10.00   6    10    BIOINFO.PL
5     15.25    7.00    7.00   7     7    Shortle
6     14.56    6.50   11.50   7    13    BAKER-ROBETTA
7     13.49    4.00   11.00   4    11    Brooks
8     11.34    3.00    6.00   3     6    Ho-Kai-Ming
9     10.45    3.00    5.50   3     6    Jones-NewFold
-----------------------------------------------------------

• FR NgNW - number of good predictions without weighting for multiple models
• FR NpNW - number of total predictions without weighting for multiple models

~8th out of 180 (M. Levitt, Stanford)


CASP - Check it out!

Critical Assessment of Protein Structure Prediction http://predictioncenter.gc.ucdavis.edu/

• CASP7 contest - 2006:
  • http://www.predictioncenter.org/casp7/Casp7.html

• Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them

• Related contests & resources:

• Protein Function Prediction (part of CASP)

• CAPRI = Critical Assessment of Predicted Interactions

• New: CASPM = CASP for M = Mutant proteins

• Predict effects of small (point) mutations, e.g., SNPs


Another Convenient List of Links for Protein Prediction Servers

http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software


Chp 13 - Protein Structure Visualization, Comparison & Classification

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 13

Protein Structure Visualization, Comparison & Classification

• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification


Protein Structure Comparison Methods

3 Basic Approaches for Aligning Structures (see Xiong textbook for details)

1. Intermolecular
2. Intramolecular
3. Combined

But, very active research area - many recent new methods

3 Popular Methods:
1. DALI = Distance Matrix Alignment of Structures (Holm)
   • FSSP Database
2. SSAP = Sequential Structure Alignment Program (Orengo)
   • CATH Database
3. CE = Combinatorial Extension (Bourne)
   • VAST at NCBI

URLS:

http://en.wikipedia.org/wiki/Structural_alignment_software


Chp 16 - RNA Structure Prediction

SECTION V STRUCTURAL BIOINFORMATICS

Xiong: Chp 16 RNA Structure Prediction (Terribilini)

• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
  • Ab Initio Approach
  • Comparative Approach
• Performance Evaluation


RNA Function

• Storage/transfer of genetic information
• Newly discovered regulatory functions - RNAi pathways especially
• Catalytic


RNA types & functions

Types of RNAs | Primary Function(s)
mRNA - messenger | translation (protein synthesis); regulatory
rRNA - ribosomal | translation (protein synthesis) <catalytic>
tRNA - transfer | translation (protein synthesis)
hnRNA - heterogeneous nuclear | precursors & intermediates of mature mRNAs & other RNAs
scRNA - small cytoplasmic | signal recognition particle (SRP); tRNA processing <catalytic>
snRNA - small nuclear | mRNA processing, poly A addition <catalytic>
snoRNA - small nucleolar | rRNA processing/maturation/methylation
regulatory RNAs (siRNA, miRNA, etc.) | regulation of transcription and translation, other??


RNA Structure

• RNA forms complex 3D structures
• Mainly single stranded
• The single RNA strand can self-hybridize to form base paired regions


Levels of RNA Structure

• Like proteins, RNA has primary, secondary, and tertiary structures
  • Primary structure - base sequence
  • Secondary structure - single stranded or base paired
  • Tertiary structure - 3D structure

Rob Knight, Univ Colorado


RNA Structure Prediction

• RNA tertiary structure is very difficult to predict
• Focus on predicting RNA secondary structure
• Given an RNA sequence, predict the secondary structure of the molecule
• Almost all methods ignore higher order secondary structures like pseudoknots


Base Pairing in RNA

G-C, A-U, G-U ("wobble") & variants

http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs

See: IMB Image Library of Biological Molecules


Common structural motifs in RNA

• Helices
• Loops
  • Hairpin
  • Interior
  • Bulge
  • Multibranch
• Pseudoknots

Fig 6.2, Baxevanis & Ouellette 2005


RNA Secondary Structure Prediction Methods

• Two main types of methods
  • Ab initio - based on calculating the most energetically favorable secondary structure
  • Comparative approach - based on evolutionary comparison of multiple related RNA sequences


Ab Initio Prediction

• Only requires a single RNA sequence
• Calculates minimum free energy structure
• Base pairing lowers free energy of the structure, so methods attempt to find secondary structure with maximal base pairing


Ab Initio Prediction

• Free energy is calculated based on parameters determined in the wet lab

• Known energy associated with each type of base pair

• Base pair formation is not independent - multiple base pairs adjacent to each other are more favorable than individual base pairs - cooperative

• Bulges and loops adjacent to base pairs have a free energy penalty


Ab Initio Energy Calculation Method

• Search for all possible base-pairing patterns

• Calculate the total energy of the structure based on all stabilizing and destabilizing forces

Fig 6.3, Baxevanis & Ouellette 2005


Dot Matrices

• Can be used to find all possible base pair patterns

• Compare the input sequence to itself and put a dot anywhere there is a complementary base

R Knight 2005
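A self-comparison dot matrix of this kind can be sketched in a few lines. The pairing rules used here are G-C, A-U and the G-U wobble pair; the names `PAIRS` and `dot_matrix` are illustrative.

```python
# Allowed base pairs: Watson-Crick (G-C, A-U) plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def dot_matrix(seq):
    """Return an N x N matrix with 1 wherever base i could pair with base j."""
    seq = seq.upper()
    return [[1 if (a, b) in PAIRS else 0 for b in seq] for a in seq]

seq = "GCAU"
print("  " + " ".join(seq))
for base, row in zip(seq, dot_matrix(seq)):
    # Print a dot for a possible pair and a blank otherwise.
    print(base + " " + " ".join("." if cell else " " for cell in row))
```

Runs of dots along anti-diagonals correspond to stretches that could form a helix, which is what the later search steps look for.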


Dynamic Programming

• Finding the best possible secondary structure is difficult - lots of possibilities

• Compare RNA sequence with itself
• Apply scoring scheme based on energy parameters for base pairs, cooperativity, and penalties for destabilizing forces

• Find path that represents the most energetically favorable secondary structure
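As a simplified illustration of dynamic programming on an RNA sequence, the sketch below uses the Nussinov recurrence, which maximizes the number of base pairs rather than minimizing a thermodynamic free energy. It is a stand-in for the energy-based scoring described above, not what Mfold or RNAfold compute. The `min_loop=3` hairpin constraint and the function name are assumptions.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Return (max number of pairs, dot-bracket structure) for `seq`."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)

print(nussinov("GGGAAAUCC"))
```

Like other DP approaches of this form, the recurrence cannot represent pseudoknots, because every pair must nest inside or sit beside every other pair.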


Problem

• DP returns the SINGLE best structure
• There may be many structures with similar energies
• Also, your predicted secondary structure is only as good as the energy parameters used
• Solution - return multiple structures with near optimal energies


Popular Ab Initio Prediction Programs

• Mfold
  • Combines DP with thermodynamic calculations
  • Fairly accurate for short sequences, less accurate as sequence length increases

• RNAfold
  • Returns multiple structures near the optimal structure
  • Computes a larger number of potential secondary structures than Mfold, so it uses a simplified energy function


Comparative Approach

• Uses multiple sequence alignment
• Assumes related sequences fold into the same secondary structure


Covariation

• RNA functional motifs are conserved
• To maintain RNA structure during evolution, a mutation in a base paired residue must be compensated for by a mutation in the base that it pairs with

• Comparative methods search for covariation patterns in MSA
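One common way to quantify covariation between two alignment columns is mutual information, sketched below. The slides do not name a specific measure, so this choice, the function name, and the toy alignment are assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(alignment, col_i, col_j):
    """Mutual information (in bits) between two columns of an MSA.

    `alignment` is a list of equal-length aligned sequences.
    """
    pairs = [(s[col_i], s[col_j]) for s in alignment]
    n = len(pairs)
    f_i = Counter(a for a, _ in pairs)   # base counts in column i
    f_j = Counter(b for _, b in pairs)   # base counts in column j
    f_ij = Counter(pairs)                # joint counts
    return sum(
        (c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
        for (a, b), c in f_ij.items()
    )

# Toy alignment: columns 0 and 3 always form a complementary pair.
msa = ["GAAC", "CAAG", "AAAU", "UAAA"]
print(mutual_information(msa, 0, 3))
print(mutual_information(msa, 1, 2))
```

Columns that vary together score high, while fully conserved columns score zero even if they are paired, which is one reason covariation is usually combined with other evidence.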


Consensus Structures

• Predict secondary structure of each individual sequence

• Compare all structures and see if there is a most common structure


Popular Comparative Prediction Programs

• Two types
  • Require user to provide MSA
  • No MSA required


RNAalifold

• Requires user to provide the MSA
• Creates a scoring matrix combining minimum free energy and covariation information
• DP is used to select the minimum free energy structure


Foldalign

• User provides a pair of unaligned RNA sequences

• Foldalign constructs alignment then computes a commonly conserved structure

• Suitable only for short sequences


Dynalign

• User provides two input sequences
• Dynalign calculates possible secondary structures using algorithm similar to Mfold
• Dynalign compares multiple structures from both sequences to find a common structure


Performance Evaluation

• Ab initio methods achieve correlation coefficient of 20-60%

• Comparative approaches achieve correlation coefficient of 20-80%

• Programs that require user to supply MSA are more accurate

• Comparative programs are consistently more accurate than ab initio programs

• Base-pairs predicted by comparative sequence analysis for large & small subunit rRNAs are 97% accurate when compared with high resolution crystal structures!

- Gutell, Pace
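To show how a predicted secondary structure might be compared with a reference, the sketch below extracts base pairs from dot-bracket strings and reports sensitivity, positive predictive value, and their geometric mean, which is sometimes used as an approximation to a correlation coefficient. The slides do not define how their correlation coefficient is computed, so this scoring and the function names are assumptions.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs

def accuracy(predicted, reference):
    """Return (sensitivity, PPV, geometric mean) over base pairs."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv, (sensitivity * ppv) ** 0.5

print(accuracy("((...))", "(.....)"))
```

Here the prediction recovers the one reference pair but adds an extra pair, so sensitivity is perfect while the positive predictive value is halved.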