dynamic programming (cont’d)

Dynamic Programming (cont’d) CS 466 Saurabh Sinha

Upload: lorand

Post on 18-Feb-2016

TRANSCRIPT

Page 1: Dynamic Programming (cont’d)

Dynamic Programming (cont’d)

CS 466
Saurabh Sinha

Page 2: Dynamic Programming (cont’d)

RNA secondary structure prediction

Page 3: Dynamic Programming (cont’d)

RNA

• RNA is chemically similar to DNA, but it is usually single-stranded, and T (thymine) is replaced by U (uracil).
• Some forms of RNA can form secondary structures by “pairing up” with themselves. This can change their properties dramatically.

tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif

Page 4: Dynamic Programming (cont’d)

RNA

• There’s more to RNA than mRNA

• RNA can adopt interesting non-linear structures, and catalyze reactions

• tRNAs (transfer RNAs) are the “adapters” that implement translation

Page 5: Dynamic Programming (cont’d)

Secondary structure

• Several interesting RNAs have a conserved secondary structure (resulting from base-pairing interactions)
• Sometimes the sequence itself need not be conserved for the function to be retained, as long as the base pairing is preserved
• Predicting the secondary structure is therefore important for homology detection

Page 6: Dynamic Programming (cont’d)

Conserved secondary structure

[Figure: stem-loop diagram of the consensus structure, drawn with N-N’ base pairs]

Consensus binding site for R17 phage coat protein. N = A/C/G/U, N’ is a base complementary to N, Y is C/U, R is A/G.

Source: DEKM

Page 7: Dynamic Programming (cont’d)

Basics of secondary structure

• G-C pairing: three hydrogen bonds (strong)
• A-U pairing: two hydrogen bonds (weaker)
• Base pairs are approximately coplanar

Page 8: Dynamic Programming (cont’d)

Basics of secondary structure

Page 9: Dynamic Programming (cont’d)

Basics of secondary structure

• G-C pairing: three hydrogen bonds (strong)
• A-U pairing: two hydrogen bonds (weaker)
• Base pairs are approximately coplanar
• Base pairs are stacked onto other base pairs (arranged side by side): “stems”

Page 10: Dynamic Programming (cont’d)

Secondary structure elements

Loop: single-stranded subsequences bounded by base pairs

[Figure labels:]
• stem loop: loop at the end of a stem
• single-stranded bases within a stem …
    • … only on one side of stem
    • … on both sides of stem

Page 11: Dynamic Programming (cont’d)

Non-canonical base pairs

• G-C and A-U are the canonical base pairs
• G-U is also possible, almost as stable
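The pairing rules above can be written as a small lookup. This is only a sketch: the names `CANONICAL`, `WOBBLE`, `can_pair` and the `allow_wobble` switch are illustrative and do not come from the slides.

```python
# Canonical Watson-Crick pairs, plus the non-canonical G-U pair.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a, b, allow_wobble=True):
    """Return True if bases a and b may form a base pair.

    allow_wobble is a hypothetical switch for whether G-U counts as a pair.
    """
    pair = (a.upper(), b.upper())
    return pair in CANONICAL or (allow_wobble and pair in WOBBLE)
```

Whether G-U should count depends on the scoring scheme in use; the switch lets a caller try both settings.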

Page 12: Dynamic Programming (cont’d)

Nesting

• Base pairs almost always occur in a nested fashion

• If positions i and j are paired, and positions i’ and j’ are paired, then these two base-pairings are said to be nested if:

• i < i’ < j’ < j OR i’ < i < j < j’

• Non-nested base pairing: pseudoknot
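As a rough illustration of the definition above, the two conditions can be checked directly. The function names are hypothetical, and each pair is assumed to be a tuple (i, j) with i < j. The sketch also separates "crossing" (interleaved) pairs from pairs that simply sit side by side, which the slide's definition does not cover explicitly.

```python
def is_nested(p, q):
    """Slide definition: one pair lies strictly inside the other."""
    (i, j), (i2, j2) = p, q
    return i < i2 < j2 < j or i2 < i < j < j2


def is_crossing(p, q):
    """Pairs interleave (i < i' < j < j'), as in a pseudoknot."""
    # Sort so that the pair with the smaller left end comes first.
    (i, j), (i2, j2) = sorted([p, q])
    return i < i2 < j < j2
```

For example, `is_crossing((2, 11), (9, 18))` is true, since 2 < 9 < 11 < 18, while `is_nested((1, 10), (3, 8))` is true because (3, 8) lies inside (1, 10).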

Page 13: Dynamic Programming (cont’d)

Pseudoknot

[Figure: pseudoknot formed by base pairs (2, 11) and (9, 18)]

(2, 11) and (9, 18): NOT NESTED

Page 14: Dynamic Programming (cont’d)

Pseudoknot problems

• Pseudoknots are not handled by the algorithms we shall see

• Pseudoknots do occur in many important RNAs

• But the total number of pseudoknotted base pairs is typically relatively small

Page 15: Dynamic Programming (cont’d)

Secondary structure prediction

• Find the secondary structure with most base pairs.

• Nussinov’s algorithm
• Recursive: finds best structure for small subsequences, and works its way outwards to larger subsequences

Page 16: Dynamic Programming (cont’d)

Nussinov’s algorithm: idea

• There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences:

(1) Add unpaired position i onto the best structure for subsequence (i+1,j)

[Figure: unpaired position i next to the structure on i+1 … j]

Page 17: Dynamic Programming (cont’d)

Nussinov’s algorithm: idea

• There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences:

(2) Add unpaired position j onto the best structure for subsequence (i,j-1)

[Figure: structure on i … j-1 next to unpaired position j]

Page 18: Dynamic Programming (cont’d)

Nussinov’s algorithm: idea

• There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences:

(3) Add the (i,j) pair onto the best structure for subsequence (i+1,j-1)

[Figure: pair (i,j) enclosing the structure on i+1 … j-1]

Page 19: Dynamic Programming (cont’d)

Nussinov’s algorithm: idea

• There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences:

(4) Combine two optimal substructures (i,k) and (k+1,j)

[Figure: structure on i … k next to structure on k+1 … j]

Page 20: Dynamic Programming (cont’d)

Nussinov RNA folding algorithm

• Given a sequence s of length L with symbols s_1 … s_L. Let δ(i,j) = 1 if s_i and s_j are a complementary base pair, and 0 otherwise.

• We recursively calculate scores g(i,j), which are the maximal number of base pairs that can be formed for subsequence s_i … s_j.

• Dynamic programming

Page 21: Dynamic Programming (cont’d)

Recursion

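• Fill in order of increasing subsequence length, from length 2 up to length L
• g(i,j) = max of
    • g(i+1,j)
    • g(i,j-1)
    • g(i+1,j-1) + δ(i,j)
    • max over i < k < j of [g(i,k) + g(k+1,j)]
• Initialization
    • g(i,i-1) = 0
    • g(i,i) = 0

O(n^2)? No. O(n^3): the table has O(n^2) entries, and the bifurcation case takes O(n) work per entry.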
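The fill stage can be sketched in a few lines of Python. This is one possible rendering, not the course's reference code: the function name `nussinov_fill` and the `pairs` parameter are illustrative, indices are 0-based rather than the slide's 1-based, and only the canonical pairs are scored by default.

```python
def nussinov_fill(seq, pairs=frozenset({"GC", "CG", "AU", "UA"})):
    """Return table g with g[i][j] = max base pairs in seq[i..j] (inclusive)."""
    n = len(seq)
    # All cells start at 0, which covers g(i,i) = 0 and g(i,i-1) = 0.
    g = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # subsequence lengths 2 .. n
        for i in range(n - length + 1):
            j = i + length - 1
            delta = 1 if seq[i] + seq[j] in pairs else 0
            best = max(
                g[i + 1][j],                # (1) position i unpaired
                g[i][j - 1],                # (2) position j unpaired
                g[i + 1][j - 1] + delta,    # (3) i pairs with j
            )
            for k in range(i + 1, j):       # (4) bifurcation at k
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g
```

The three nested loops (length, i, k) make the cubic running time visible. Like the recursion on the slide, the sketch puts no minimum on loop size, so even adjacent bases may pair.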

Page 22: Dynamic Programming (cont’d)

Traceback

• As usual in sequence alignment?
• Optimal sequence alignment is a linear path in the dynamic programming table
• Optimal secondary structure can have “bifurcations”
• Traceback uses a pushdown stack

Page 23: Dynamic Programming (cont’d)

Traceback

Push (1,L) onto stack
Repeat until stack is empty:
    pop (i,j)
    if i >= j: continue
    else if g(i+1,j) = g(i,j): push (i+1,j)
    else if g(i,j-1) = g(i,j): push (i,j-1)
    else if g(i+1,j-1) + δ(i,j) = g(i,j):
        record (i,j) base pair
        push (i+1,j-1)
    else for k = i+1 to j-1:
        if g(i,k) + g(k+1,j) = g(i,j):
            push (k+1,j)
            push (i,k)
            break (for loop)
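The stack-based traceback might look as follows in Python. This is a sketch under stated assumptions: the names `PAIRS`, `delta`, `fill` and `traceback` are illustrative, indices are 0-based, only canonical pairs are scored, and a compact fill step is included so that the block runs on its own.

```python
PAIRS = {"GC", "CG", "AU", "UA"}  # assumption: canonical pairs only


def delta(seq, i, j):
    """1 if seq[i] and seq[j] are complementary, else 0."""
    return 1 if seq[i] + seq[j] in PAIRS else 0


def fill(seq):
    """Compact version of the recursion: g[i][j] = max pairs in seq[i..j]."""
    n = len(seq)
    g = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            g[i][j] = max(
                [g[i + 1][j], g[i][j - 1], g[i + 1][j - 1] + delta(seq, i, j)]
                + [g[i][k] + g[k + 1][j] for k in range(i + 1, j)]
            )
    return g


def traceback(seq, g):
    """Recover one optimal set of base pairs as sorted (i, j) tuples."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if g[i + 1][j] == g[i][j]:            # i left unpaired
            stack.append((i + 1, j))
        elif g[i][j - 1] == g[i][j]:          # j left unpaired
            stack.append((i, j - 1))
        elif g[i + 1][j - 1] + delta(seq, i, j) == g[i][j]:
            pairs.append((i, j))              # i pairs with j
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):         # bifurcation: two subproblems
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return sorted(pairs)
```

A call such as `traceback(seq, fill(seq))` returns one optimal structure. Other structures with the same number of pairs may exist, and which one is found depends on the order in which the cases are tested.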